Why does the query run so long?

I'm encountering an issue with my Neo4j database deployed on an AWS EC2 r6i.2xlarge instance (8 vCPU, 64 GB of RAM). My query has been running for a few weeks without completing. Here are some details about my setup:

Neo4j: Community Edition (version: 5.19.0)

Deployment: AWS Marketplace Neo4j Template

Database Schema:

Dao {
    organizationId: String!
    daoId: String!
    name: String!
}

Proposal {
    proposalNativeId: String!
    proposer: String!
    (Proposal)-[:BELONGS_TO]->(Dao)
}

Vote {
    voteNativeId: String!
    voter: String!
    choice: String!
    (Vote)-[:IN]->(Proposal)
}

Indexes:

  • Dao - daoId property
  • Proposal - proposalNativeId property
  • Vote - voteNativeId property

Data Volume:

  • Dao - 1
  • Proposal - 62 (potentially more)
  • Vote - 483 (potentially more)

Query being executed (additionally, I have queries for finding coalitions with larger groups of people):

// Step 1: Match the DAO's proposals and the votes cast on them
    MATCH (d:Dao { daoId: '9ad9fb81-ad04-4830-85c2-212034072580' })<-[:BELONGS_TO]-(p:Proposal)<-[:IN]-(v:Vote)
    WITH d, p, v.voter AS voter, v.choice AS choice
    ORDER BY voter, p.proposalNativeId
    
    // Step 2: Collect votes per voter
    WITH d, voter, COLLECT({proposal: p.proposalNativeId, choice: choice, proposal_choice: p.proposalNativeId + ":" + choice}) AS votes
    WITH d, COLLECT({voter: voter, votes: votes}) AS voterPatterns
  
    
    // Step 3: Compare voting patterns for groups of 8 voters
    UNWIND voterPatterns AS v1
    UNWIND voterPatterns AS v2
    UNWIND voterPatterns AS v3
    UNWIND voterPatterns AS v4
    UNWIND voterPatterns AS v5
    UNWIND voterPatterns AS v6
    UNWIND voterPatterns AS v7
    UNWIND voterPatterns AS v8
    // Step 4: Ensure unique groups of voters
    WITH v1, v2, v3, v4, v5, v6, v7, v8
    WHERE v1.voter < v2.voter AND v2.voter < v3.voter AND v3.voter < v4.voter AND v4.voter < v5.voter AND v5.voter < v6.voter AND v6.voter < v7.voter AND v7.voter < v8.voter
    
    // Step 5: Find common proposals where choices match, adding one voter at a time
    WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection([a IN v1.votes | a.proposal_choice], [b IN v2.votes | b.proposal_choice]) AS commonProposals2
    WHERE SIZE(commonProposals2) > 0

    WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals2, [c IN v3.votes | c.proposal_choice]) AS commonProposals3
    WHERE SIZE(commonProposals3) > 0

    WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals3, [e IN v4.votes | e.proposal_choice]) AS commonProposals4
    WHERE SIZE(commonProposals4) > 0

    WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals4, [f IN v5.votes | f.proposal_choice]) AS commonProposals5
    WHERE SIZE(commonProposals5) > 0

    WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals5, [g IN v6.votes | g.proposal_choice]) AS commonProposals6
    WHERE SIZE(commonProposals6) > 0

    WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals6, [h IN v7.votes | h.proposal_choice]) AS commonProposals7
    WHERE SIZE(commonProposals7) > 0

    WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals7, [i IN v8.votes | i.proposal_choice]) AS commonProposals8
    WHERE SIZE(commonProposals8) > 0
    
    // Step 6: Filter matching choices
    WITH v1,v2, v3, v4, v5, v6, v7, v8, commonProposals8 AS commonProposals
    
    RETURN [v1.voter,v2.voter, v3.voter, v4.voter, v5.voter, v6.voter, v7.voter, v8.voter] AS members, SIZE(commonProposals) AS votedTogether, commonProposals
    ORDER BY SIZE(commonProposals) DESC
    LIMIT 20
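
For a sense of scale, here is a rough Python sketch of how many rows Steps 3 and 4 could produce. It assumes the ordering filter is only applied after all eight UNWINDs have expanded, which is how the query is written but may not be exactly how the planner executes it. The figure of 100 distinct voters is hypothetical; the actual number of distinct voters is not stated above.

```python
from math import comb


def rows_after_unwinds(n_voters: int, group_size: int = 8) -> int:
    """Rows produced by `group_size` nested UNWINDs over the same list."""
    return n_voters ** group_size


def rows_after_ordering_filter(n_voters: int, group_size: int = 8) -> int:
    """Rows that survive v1.voter < v2.voter < ... (one per unordered group)."""
    return comb(n_voters, group_size)


# Hypothetical voter count, chosen only for illustration.
n_voters = 100
print("rows before the filter:", rows_after_unwinds(n_voters))
print("rows after the filter: ", rows_after_ordering_filter(n_voters))
```

Under that assumption the intermediate row count grows as the eighth power of the number of voters, so even a modest voter list leads to a very large cartesian product before any row is filtered out.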

The query retrieves information about voters who voted together on the same proposals with the same choice within a DAO. The goal is to find coalitions of a given size (## people) who voted together on a given number of votes (##).
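
To make the intended result concrete, here is a minimal Python sketch of the same computation on toy data. The voter names, proposals, and the `coalitions` helper are all hypothetical and only illustrate the logic; this is a brute-force enumeration, not a faster replacement for the query.

```python
from itertools import combinations

# Hypothetical toy data: voter -> {proposal: choice}
votes = {
    "alice": {"p1": "yes", "p2": "no", "p3": "yes"},
    "bob": {"p1": "yes", "p2": "no", "p3": "no"},
    "carol": {"p1": "yes", "p2": "yes", "p3": "yes"},
}


def coalitions(votes, size):
    """Return (members, shared "proposal:choice" keys) for every group of
    `size` voters with at least one identical vote, largest overlap first."""
    # Same idea as the proposal_choice strings collected in Step 2.
    patterns = {
        voter: {f"{proposal}:{choice}" for proposal, choice in ballot.items()}
        for voter, ballot in votes.items()
    }
    results = []
    # combinations() yields each unordered group once, like the
    # v1.voter < v2.voter < ... filter in Step 4.
    for members in combinations(sorted(patterns), size):
        common = set.intersection(*(patterns[m] for m in members))
        if common:
            results.append((list(members), sorted(common)))
    results.sort(key=lambda item: len(item[1]), reverse=True)
    return results


for members, common in coalitions(votes, 2):
    print(members, len(common), common)
```

The sketch still visits every group of the requested size, so it shares the query's combinatorial growth; it is only meant to show what a row of the expected output looks like.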

Any insights into resolving this problem would be greatly appreciated. Thank you!

@yurii.pristay

Is this not a duplicate of your prior post, "What is the best AWS EC2 instance for a Neo4j database? - #2 by glilienfield"?

And was @glilienfield's response not sufficient?