Explain the working of a custom Lucene analyzer for full-text index search

poem_daga · August 1, 2022, 12:45pm

Following the example given here, we have implemented a custom analyzer that supports case-insensitive exact matches by combining KeywordTokenizerFactory and LowerCaseFilterFactory.

Implementation:

package com.test.nosql.neo4j;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordTokenizerFactory;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.neo4j.annotations.service.ServiceProvider;
import org.neo4j.graphdb.schema.AnalyzerProvider;

import java.io.IOException;

@ServiceProvider
public class KeywordLowerAnalyzerProvider extends AnalyzerProvider {

    public static final String DESCRIPTION = "same as keyword analyzer, but additionally applies a lower case filter to all tokens";
    public static final String ANALYZER_NAME = "keyword_lower";

    public KeywordLowerAnalyzerProvider() {
        super(ANALYZER_NAME);
    }

    @Override
    public Analyzer createAnalyzer() {
        try {
            return CustomAnalyzer.builder()
                    .withTokenizer(KeywordTokenizerFactory.class)
                    .addTokenFilter(LowerCaseFilterFactory.class)
                    .build();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String description() {
        return DESCRIPTION;
    }
}

We can now create an index with the custom analyzer:

CALL db.index.fulltext.createNodeIndex("acc_idx_1", ["Account"], ["nativeId"], { analyzer: "keyword_lower", eventually_consistent: "true" });

Then load some data, for example by creating two nodes:

CREATE (n:Account) SET n.nativeId = 'JOhN DoE' RETURN n;

CREATE (n:Account) SET n.nativeId = 'John Doe' RETURN n;

In the background, the nativeId property is indexed and stored in some index table...
- Question 1: How is the custom analyzer applied? In our example, are KeywordTokenizer and LowerCaseFilter applied sequentially?
  - "JOhN DoE" -> keyword tokenizer -> "JOhN DoE" -> lowercase filter -> "john doe"
  - "John Doe" -> keyword tokenizer -> "John Doe" -> lowercase filter -> "john doe"
- Question 2: What data is actually stored in the index table? The final state ('john doe' in our example), or an intermediate state?
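To make the flow I am assuming in Question 1 concrete, here is a minimal plain-Java sketch of it. It does not use any Lucene or Neo4j classes; KeywordLowerSketch and its methods are hypothetical names that only mimic what I expect the tokenizer and the filter to do, so please treat it as an illustration of my assumption rather than the real implementation.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical stand-in for the analysis chain; no Lucene classes are used.
class KeywordLowerSketch {

    // Mimics a keyword tokenizer: the whole input becomes one single token.
    static List<String> keywordTokenize(String input) {
        return Collections.singletonList(input);
    }

    // Mimics a lower-case filter: every token is lower-cased, none are added or dropped.
    static List<String> lowerCase(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Tokenizer first, then the filter, in the order they are declared on the builder.
    static List<String> analyze(String input) {
        return lowerCase(keywordTokenize(input));
    }

    public static void main(String[] args) {
        System.out.println(analyze("JOhN DoE")); // prints [john doe]
        System.out.println(analyze("John Doe")); // prints [john doe]
    }
}
```

If this assumption holds, both nodes would end up with the same single term, "john doe".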

Now query the graph using the index:

CALL db.index.fulltext.queryNodes('acc_idx_1', 'john DOE') YIELD node AS n RETURN n

Question 3: How is the indexed value compared with the user input at query time? Is the user's input tokenized, converted to lowercase, and then compared?

Overall, how does the custom analyzer work: what values are stored, and how are they compared at query time?
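The behaviour I am guessing at in Question 3 can be sketched in plain Java as follows. This is only a model of my assumption: QueryTimeSketch, index and query are hypothetical names, the map stands in for the real index, and the sketch ignores whatever the query parser might do with whitespace or operators in the query string before the analyzer sees it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model: the same analysis is applied when indexing and when querying.
class QueryTimeSketch {

    // Analyzed term -> original property values that produced it.
    private final Map<String, List<String>> terms = new HashMap<>();

    // Assumed analysis: the whole string is one token, lower-cased.
    static String analyze(String input) {
        return input.toLowerCase(Locale.ROOT);
    }

    // Index time: store the value under its analyzed term.
    void index(String value) {
        terms.computeIfAbsent(analyze(value), k -> new ArrayList<>()).add(value);
    }

    // Query time: analyze the user input the same way, then look up the exact term.
    List<String> query(String userInput) {
        return terms.getOrDefault(analyze(userInput), Collections.emptyList());
    }

    public static void main(String[] args) {
        QueryTimeSketch idx = new QueryTimeSketch();
        idx.index("JOhN DoE");
        idx.index("John Doe");
        System.out.println(idx.query("john DOE")); // prints [JOhN DoE, John Doe]
    }
}
```

Under this model the query 'john DOE' would match both nodes, while a partial input such as 'john' would match neither. Whether the real index behaves this way is exactly what I would like to confirm.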
