I have the following setup:
- Elasticsearch stores most of the searchable / document-like data (text fields, city, etc.)
- Neo4j stores relationships and some entity attributes (e.g., gender, graph connections)
Each entity exists in both systems and is linked by a shared eid.
Currently, I query Elasticsearch using apoc.es.getRaw() from Neo4j to retrieve matching documents, then match the returned eids back to Neo4j nodes for further filtering and graph traversal.
The problem:
Some filters exist only in Neo4j (e.g., gender), while others exist only in Elasticsearch (e.g., city).
For example, when searching for “males living in Shiraz”, Elasticsearch may return thousands of documents for city = Shiraz, but only a small subset match gender = male in Neo4j. This results in inefficient queries because large ES result sets must be processed just to find a small valid subset.
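To make the cost concrete, here is a minimal toy sketch of what my current flow effectively does. It uses in-memory dictionaries as stand-ins for the two stores; the data, the 50% / 1% selectivities, and the function names `es_search_city` and `neo4j_filter_gender` are all made up for illustration and are not real client calls.

```python
# Toy stand-ins for the two stores; all names and data here are hypothetical.
ES_DOCS = {  # eid -> Elasticsearch document (searchable fields only)
    eid: {"city": "Shiraz" if eid % 2 == 0 else "Tehran"} for eid in range(10_000)
}
NEO4J_NODES = {  # eid -> Neo4j node properties (graph-side attributes only)
    eid: {"gender": "male" if eid % 100 == 0 else "female"} for eid in range(10_000)
}


def es_search_city(city):
    """Stand-in for the Elasticsearch query: returns every eid matching the city."""
    return [eid for eid, doc in ES_DOCS.items() if doc["city"] == city]


def neo4j_filter_gender(eids, gender):
    """Stand-in for matching the returned eids back to Neo4j nodes and filtering there."""
    return [eid for eid in eids if NEO4J_NODES[eid]["gender"] == gender]


# Step 1: the ES-only filter produces a large candidate set.
candidates = es_search_city("Shiraz")
# Step 2: the Neo4j-only filter discards almost all of it.
matches = neo4j_filter_gender(candidates, "male")

print(len(candidates), len(matches))  # prints: 5000 100
```

In this toy case 5,000 eids cross the boundary between the systems just to keep 100 of them, which is the pattern I would like to avoid.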
Question:
Is there any recommended or established pattern to efficiently handle this kind of cross-database filtering between Neo4j and Elasticsearch?
Specifically:
- Is there a way to push Neo4j filters into Elasticsearch queries?
- Are there known architectural patterns for “joining” or synchronizing filters between the two systems?
- How do people typically avoid scanning large Elasticsearch result sets when part of the filter logic lives in Neo4j?
I am aware that real-time joins are not natively supported, but I’m looking for practical, production-scale solutions (e.g., selective duplication, sync strategies, query decomposition).
Any insights from people running Neo4j + Elasticsearch in production would be appreciated.