I'm using Neo4j's full-text indexes and trying to decide between analyzers for indexing full names that may or may not include diacritics. I came across the standard-folding
analyzer, and the documentation says:
"An analyzer that uses ASCIIFoldingFilter to remove accents (diacritics). Otherwise behaves as the standard English analyzer. Note! This analyzer may have unexpected behavior, such as tokenizing, for all non-ASCII numbers and symbols."
However, it's unclear whether "standard English analyzer" here refers to the english
analyzer (which includes stemming and stop word filtering) or something else.
My Questions:
- Does the standard-folding analyzer also perform stemming like the english analyzer?
- If I want to index full names (e.g., "José García", "Jose Garcia"), where:
  - Diacritics should be ignored (e.g., "José" = "Jose")
  - Stemming should NOT be applied
  - No stop-word filtering is needed

  What is the recommended analyzer to use in this case?
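
To make the requirement concrete, here is a small Python sketch of the matching behavior I'm after. It is not Neo4j or Lucene code: `fold_name` is a hypothetical helper, and stripping combining marks after Unicode decomposition is only a rough stand-in for what ASCIIFoldingFilter does.

```python
import unicodedata


def fold_name(name: str) -> list[str]:
    """Lowercase, strip diacritics, and split on whitespace.

    Deliberately does no stemming and no stop-word removal.
    """
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().split()


# Both spellings should produce the same tokens.
print(fold_name("José García") == fold_name("Jose Garcia"))
```

In other words, the accented and unaccented spellings should index to identical tokens, while words that an English analyzer would stem or drop stay exactly as written.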
Thanks in advance for any clarification!