Hi,
I'm using Neo4j to store relationships between three entity types:
USER, PROFILE and COMPANY.
Here is my schema:
```cypher
(:USER)-[:CONNECTED_WITH]->(:PROFILE)
(:PROFILE)-[:WORKED_AT]->(:COMPANY)
(:PROFILE)-[:CURRENTLY_WORKS_AT]->(:COMPANY)
```
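For anyone who wants to reproduce the shape of the problem, here is a tiny hypothetical dataset matching this schema. All ids ('u1', 'p1', 'acme', ...) are made up for illustration.

```cypher
// Hypothetical sample data mirroring the schema above; every id is invented.
CREATE (u1:USER {id: 'u1'}), (u2:USER {id: 'u2'}),
       (p1:PROFILE {id: 'p1'}), (p2:PROFILE {id: 'p2'}),
       (acme:COMPANY {id: 'acme'}), (globex:COMPANY {id: 'globex'}),
       (u1)-[:CONNECTED_WITH]->(p1),
       (u1)-[:CONNECTED_WITH]->(p2),
       (u2)-[:CONNECTED_WITH]->(p2),
       (p1)-[:WORKED_AT]->(globex),
       (p1)-[:CURRENTLY_WORKS_AT]->(acme),
       (p2)-[:CURRENTLY_WORKS_AT]->(acme);
```

Querying for 'acme' on this data matches three paths (u1 via p1, u1 via p2, u2 via p2), which DISTINCT reduces to u1 and u2. That same fan-out, scaled up, is what makes the large companies expensive.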
A USER may be CONNECTED_WITH many different PROFILE nodes.
A PROFILE may have WORKED_AT many different COMPANY nodes.
A PROFILE has at most one CURRENTLY_WORKS_AT relationship, pointing to a single COMPANY.
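A sketch of how the "at most one CURRENTLY_WORKS_AT per PROFILE" rule could be checked, assuming Neo4j 5 COUNT subquery syntax. Note that it visits every PROFILE node, so on a graph of this size it is a slow one-off check, not something to run routinely.

```cypher
// Find profiles that violate the "at most one current employer" rule.
// Warning: this scans all PROFILE nodes.
MATCH (p:PROFILE)
WHERE COUNT { (p)-[:CURRENTLY_WORKS_AT]->() } > 1
RETURN p.id AS profileId
LIMIT 10;
```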
Every node has an "id" property, and each label has an index on that property.
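For reference, indexes like these would typically be declared as below in Neo4j 5 (the default index type created this way is RANGE). The index names are hypothetical. For this particular query only the COMPANY index matters: it locates the start node, while the traversal itself follows relationship chains from that node and does not use indexes at all.

```cypher
// Hypothetical index names; one RANGE index per label on the id property.
CREATE INDEX user_id IF NOT EXISTS FOR (u:USER) ON (u.id);
CREATE INDEX profile_id IF NOT EXISTS FOR (p:PROFILE) ON (p.id);
CREATE INDEX company_id IF NOT EXISTS FOR (c:COMPANY) ON (c.id);

// List the indexes and confirm they are ONLINE.
SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state;
```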
My goal is to extract the list of user IDs (preferably unique) connected to profiles that currently work at a given company.
This is my query:
```cypher
MATCH (:COMPANY {id: "<id>"})<-[:CURRENTLY_WORKS_AT]-(:PROFILE)<-[:CONNECTED_WITH]-(u:USER)
RETURN DISTINCT u.id
```
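One possible variant, offered as a sketch rather than a fix: it passes the company id as a parameter (the name $companyId is hypothetical) so the plan can be cached, and deduplicates on the USER node before reading its id. This assumes "id" is unique per USER; otherwise the output could contain repeated ids. It does not reduce the number of relationships expanded (still ~126k in the example below), only the number of property reads, so any gain is likely modest.

```cypher
// $companyId is a hypothetical parameter name supplied by the client.
MATCH (:COMPANY {id: $companyId})<-[:CURRENTLY_WORKS_AT]-(:PROFILE)<-[:CONNECTED_WITH]-(u:USER)
// Deduplicate on the node first, so u.id is read once per distinct user
// rather than once per matched path.
WITH DISTINCT u
RETURN u.id;
```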
The query works fine when the COMPANY and PROFILE nodes involved have a low degree. However, some nodes in my graph have a very large number of relationships (a big COMPANY, for example), and for those Neo4j reads heavily from disk (I can see it on my IOPS charts), which makes the queries very slow.
Here is a profile of an example query for a company with over 18k CURRENTLY_WORKS_AT relationships, which expand into 126k CONNECTED_WITH relationships and ultimately lead to ~18k distinct USER nodes. Without profiling, this query took ~30 s to complete the first time, before that part of the graph had been loaded into memory.
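A profile like this can be produced by prefixing the query with PROFILE (EXPLAIN shows the plan without running it). The $companyId parameter is a hypothetical name. In recent versions the per-operator statistics include page cache hits and misses alongside rows and DB hits; a high miss count on the Expand operators suggests the relationship data was read from disk rather than from the page cache.

```cypher
// Runs the query and reports per-operator rows, DB hits and page cache hits/misses.
PROFILE
MATCH (:COMPANY {id: $companyId})<-[:CURRENTLY_WORKS_AT]-(:PROFILE)<-[:CONNECTED_WITH]-(u:USER)
RETURN DISTINCT u.id;
```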
Some COMPANY nodes have hundreds of thousands of CURRENTLY_WORKS_AT relationships, and for those the query doesn't finish in any reasonable time.
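A sketch of how the very wide companies could be detected before running the full traversal, for example to handle them separately. It assumes the planner turns a single-hop COUNT pattern with an anonymous end node into a degree lookup on the company node, which is normally much cheaper than walking every relationship; PROFILE would confirm whether that happens here. $companyId is a hypothetical parameter name.

```cypher
// Number of profiles currently working at the company.
// With an anonymous end node, this can usually be answered from the node's degree.
MATCH (c:COMPANY {id: $companyId})
RETURN COUNT { (c)<-[:CURRENTLY_WORKS_AT]-() } AS currentProfiles;
```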
My DB has ~700,000,000 nodes and ~1,700,000,000 relationships.
Total size of data and native indexes across all databases: ~256 GB.
I'm running Neo4j 5.25.1 CE + APOC + DozerDB 5.25.1.0-alpha.1.
It runs on an AWS EC2 m7g.4xlarge instance (16 vCPUs, 64 GB RAM).
I'm using the default memory configuration.
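With ~256 GB of store files and 64 GB of RAM, at most roughly a quarter of the data can ever sit in the page cache, so it is worth checking what the defaults actually resolved to. A sketch, assuming the SHOW SETTINGS command is available in this 5.x build; `neo4j-admin server memory-recommendation` can also suggest heap and page cache sizes for the machine.

```cypher
// Memory-related settings the server is running with
// (e.g. server.memory.pagecache.size, server.memory.heap.max_size).
SHOW SETTINGS YIELD name, value
WHERE name STARTS WITH 'server.memory'
RETURN name, value;
```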
My main question is: is there anything I can do to improve this, short of moving to a machine with enough RAM to hold the entire database in memory?
I'm new to Neo4j and Cypher, so perhaps I've missed something fairly obvious. Or is there simply nothing to be done when a graph gets this wide rather than tall?
Any help will be much appreciated.