Efficient Search Across Elasticsearch and Neo4j Without Pulling Large Result Sets

I have the following setup:

  • Elasticsearch stores most of the searchable / document-like data (text fields, city, etc.)

  • Neo4j stores relationships and some entity attributes (e.g., gender, graph connections)

Each entity exists in both systems and is linked by a shared eid.

Currently, I query Elasticsearch using apoc.es.getRaw() from Neo4j to retrieve matching documents, then match the returned eids back to Neo4j nodes for further filtering and graph traversal.

The problem:
Some filters exist only in Neo4j (e.g., gender), while others exist only in Elasticsearch (e.g., city).
For example, when searching for “males living in Shiraz”, Elasticsearch may return thousands of documents for city = Shiraz, but only a small subset match gender = male in Neo4j. This results in inefficient queries because large ES result sets must be processed just to find a small valid subset.

Question:
Is there any recommended or established pattern to efficiently handle this kind of cross-database filtering between Neo4j and Elasticsearch?

Specifically:

  • Is there a way to push Neo4j filters into Elasticsearch queries?

  • Are there known architectural patterns for “joining” or synchronizing filters between the two systems?

  • How do people typically avoid scanning large Elasticsearch result sets when part of the filter logic lives in Neo4j?

I am aware that real-time joins are not natively supported, but I’m looking for practical, production-scale solutions (e.g., selective duplication, sync strategies, query decomposition, etc.).

Any insights from people running Neo4j + Elasticsearch in production would be appreciated.

You should have a process that does ETL like:

Doc -> Neo4j (relationships) -> Elasticsearch (full-text search)

Your searches are either relational (start in Neo4j, then go from the matching nodes to their documents) or textual (query Elasticsearch first, then filter the hits in Neo4j).
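As a rough illustration of the relational path, here is a minimal Python sketch that takes eids already matched in Neo4j and pushes them into Elasticsearch as a `terms` filter next to the ES-only filters. The function names and the batch size are made up for this example, and it assumes `eid` and `city` are indexed as exact-match (keyword) fields. It only builds the query bodies and does not call either database.

```python
from typing import Any, Dict, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_es_queries(
    eids: List[str],
    es_filters: Dict[str, Any],
    batch_size: int = 10_000,  # hypothetical value, tune for your cluster
) -> List[Dict[str, Any]]:
    """Build one Elasticsearch query body per batch of Neo4j-matched eids.

    Elasticsearch rejects a terms query with more values than
    index.max_terms_count (65,536 by default), so the eids are batched.
    """
    queries = []
    for batch in chunked(eids, batch_size):
        # Filter context: no scoring, and the clauses can be cached.
        filters: List[Dict[str, Any]] = [{"terms": {"eid": batch}}]
        filters += [{"term": {field: value}} for field, value in es_filters.items()]
        queries.append({"query": {"bool": {"filter": filters}}})
    return queries


# eids would come from a Cypher match such as the gender filter.
bodies = build_es_queries(["e1", "e2", "e3"], {"city": "Shiraz"}, batch_size=2)
```

This only pays off when the Neo4j side is the more selective filter. If it matches a large share of all nodes, the eid list gets too long to ship around.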

You can't mix the way you want (they are different technologies, for different purposes), and you need to programmatically determine the path your queries will take.
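A minimal sketch of what "programmatically determine the path" could look like in Python is below. The `FIELD_STORE` map, the `Plan` type and the routing rule are all assumptions made for illustration. A real planner would more likely use selectivity estimates than a fixed rule.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical map of which store owns which filter field.
FIELD_STORE = {"city": "elasticsearch", "gender": "neo4j"}


@dataclass
class Plan:
    first: str  # store to query first: "elasticsearch" or "neo4j"
    es_filters: Dict[str, Any] = field(default_factory=dict)
    neo4j_filters: Dict[str, Any] = field(default_factory=dict)
    full_text: Optional[str] = None


def plan_query(filters: Dict[str, Any], full_text: Optional[str] = None) -> Plan:
    """Split filters by owning store and choose which store to hit first."""
    plan = Plan(first="elasticsearch", full_text=full_text)
    for name, value in filters.items():
        store = FIELD_STORE.get(name)
        if store is None:
            raise ValueError(f"no store registered for filter {name!r}")
        target = plan.es_filters if store == "elasticsearch" else plan.neo4j_filters
        target[name] = value
    # Assumed rule: textual searches start in Elasticsearch; anything else
    # that carries a Neo4j filter starts in Neo4j.
    if full_text is None and plan.neo4j_filters:
        plan.first = "neo4j"
    return plan


plan = plan_query({"gender": "male", "city": "Shiraz"})
```

With this rule, "males living in Shiraz" starts in Neo4j and carries `city` along as an Elasticsearch filter, while any query with a full-text part starts in Elasticsearch.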

If you have different users, you also have to keep each individual's access controls in sync across both technologies.


Is there really no architectural change or general advice that would make this more efficient?
There are thousands of documents being searched, and right now I go through about 1,000 Elasticsearch documents just to find around 10 wanted results. Since Elasticsearch has pagination limits, this is quite inefficient.
I appreciate your reply.
Thanks!

Think of them as two separate databases. Your only alternative is to pre-process the data, find the relationships in Neo4j, and then create indices for those relationships in Elasticsearch.
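As a sketch of that pre-processing idea, the Python below copies a Neo4j-only attribute (`gender`) into the matching Elasticsearch document, so that one query can apply both filters. The index name `entities`, the helper names, and the assumption that each document's `_id` equals its eid are hypothetical. The action dict follows the update format accepted by `helpers.bulk` in the official Python client, but nothing is sent here.

```python
from typing import Any, Dict, Iterable, List

INDEX = "entities"  # hypothetical index name


def to_update_action(row: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one Neo4j row (eid plus attributes) into a partial-update action.

    Assumes the Elasticsearch document _id is the shared eid.
    """
    attributes = {key: value for key, value in row.items() if key != "eid"}
    return {
        "_op_type": "update",
        "_index": INDEX,
        "_id": row["eid"],
        "doc": attributes,  # merged into the existing document
    }


def sync_actions(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build the update actions for a batch of rows exported from Neo4j."""
    return [to_update_action(row) for row in rows]


def combined_query(city: str, gender: str) -> Dict[str, Any]:
    """Single query body that filters on both fields inside Elasticsearch."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"city": city}},
                    {"term": {"gender": gender}},
                ]
            }
        }
    }


# Rows as they might come back from a Cypher query returning eid and gender.
actions = sync_actions([{"eid": "e1", "gender": "male"}, {"eid": "e2", "gender": "female"}])
body = combined_query("Shiraz", "male")
```

The trade-off is that the copied attribute has to be kept in sync whenever it changes in Neo4j. In exchange, "males living in Shiraz" returns only the matching documents instead of every document for the city.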

It shouldn't really be that slow, though. That depends on how much memory your servers have and how well optimized your indices are in both databases.