I'm encountering an issue with my Neo4j database deployed on an AWS EC2 r6i.2xlarge instance (8 vCPU, 64 GB of RAM). My query has been running for a few weeks and still has not finished. Here are some details about my setup:
Neo4j: Community Edition (version: 5.19.0)
Deployment: AWS Marketplace Neo4j Template
Database Schema:
Dao {
organizationId: String!
daoId: String!
name: String!
}
Proposal {
proposalNativeId: String!
proposer: String!
(Proposal)-[:BELONGS_TO]->(Dao)
}
Vote {
voteNativeId: String!
voter: String!
choice: String!
(Vote)-[:IN]->(Proposal)
}
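Conceptually, Steps 1-2 of the query below reduce every Vote to a `proposalNativeId:choice` key and group those keys by voter. Below is a minimal Python sketch of that reduction. It assumes the votes have already been fetched as flat `(voter, proposalNativeId, choice)` tuples. The tuple shape and the `voter_patterns` name are hypothetical and not part of the original setup.

```python
from collections import defaultdict

# Hypothetical flat row per Vote node: (voter, proposalNativeId, choice).
VoteRow = tuple[str, str, str]


def voter_patterns(rows: list[VoteRow]) -> dict[str, frozenset[str]]:
    """Group votes by voter into a set of "proposal:choice" keys.

    Mirrors the proposal_choice strings built in Steps 1-2 of the query;
    a set is used because only membership matters for the comparison.
    """
    patterns: dict[str, set[str]] = defaultdict(set)
    for voter, proposal_id, choice in rows:
        patterns[voter].add(f"{proposal_id}:{choice}")
    return {voter: frozenset(keys) for voter, keys in patterns.items()}


if __name__ == "__main__":
    sample = [
        ("alice", "p1", "yes"),
        ("alice", "p2", "no"),
        ("bob", "p1", "yes"),
        ("bob", "p2", "yes"),
    ]
    print(voter_patterns(sample))
```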
Indexes:
- Dao - daoId property
- Proposal - proposalNativeId property
- Vote - voteNativeId property
Data Volume:
- Dao - 1
- Proposal - 62 (potentially more)
- Vote - 483 (potentially more)
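Although these volumes are small, each of the eight nested UNWINDs in the query below multiplies the row count by the number of distinct voters. That number is not listed above, but it cannot exceed the 483 Vote nodes. The sketch below computes two quantities for a few hypothetical voter counts:

- how many rows exist after all eight UNWINDs, which is the worst case if the ordering filter in Step 4 is only applied afterwards;
- how many unique 8-voter groups survive that filter and still go through the intersections in Step 5.

The function names are illustrative.

```python
from math import comb


def unwind_rows(n_voters: int, group_size: int = 8) -> int:
    """Rows after `group_size` nested UNWINDs over n_voters patterns.

    Each UNWIND multiplies the row count by n_voters, so in the worst case
    the ordering filter has to inspect n_voters ** group_size rows.
    """
    return n_voters ** group_size


def unique_groups(n_voters: int, group_size: int = 8) -> int:
    """Groups that pass v1.voter < v2.voter < ... : C(n_voters, group_size)."""
    return comb(n_voters, group_size)


if __name__ == "__main__":
    # Hypothetical voter counts; the real count of distinct voters is unknown.
    for n in (20, 50, 100):
        print(f"{n} voters: {unwind_rows(n):,} rows, {unique_groups(n):,} groups")
```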
The query being executed is below (I also have similar queries that look for coalitions of larger groups of voters):
// Step 1: Match votes and collect votes per voter
MATCH (d:Dao { daoId: '9ad9fb81-ad04-4830-85c2-212034072580' })<-[:IN]-(v:Vote)-[:BELONGS_TO]->(p:Proposal)
WITH d, p, v.voter AS voter, v.choice AS choice
ORDER BY voter, p.proposalNativeId
// Step 2: Collect votes per voter
WITH d, voter, COLLECT({proposal: p.proposalNativeId, choice: choice, proposal_choice: p.proposalNativeId + ":" + choice}) AS votes
WITH d, COLLECT({voter: voter, votes: votes}) AS voterPatterns
// Step 3: Compare voting patterns for groups of 8 voters
UNWIND voterPatterns AS v1
UNWIND voterPatterns AS v2
UNWIND voterPatterns AS v3
UNWIND voterPatterns AS v4
UNWIND voterPatterns AS v5
UNWIND voterPatterns AS v6
UNWIND voterPatterns AS v7
UNWIND voterPatterns AS v8
// Step 4: Ensure unique groups of voters
WITH v1,v2, v3, v4, v5, v6, v7, v8
WHERE v1.voter < v2.voter AND v2.voter < v3.voter AND v3.voter < v4.voter AND v4.voter < v5.voter AND v5.voter < v6.voter AND v6.voter < v7.voter AND v7.voter < v8.voter
// Step 5: Find common proposals where choices match
// Each WITH intersects the running result with the next voter's choices (v2, then v3, ..., then v8)
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection([x IN v1.votes | x.proposal_choice], [x IN v2.votes | x.proposal_choice]) AS commonProposals2
WHERE SIZE(commonProposals2) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals2, [x IN v3.votes | x.proposal_choice]) AS commonProposals3
WHERE SIZE(commonProposals3) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals3, [x IN v4.votes | x.proposal_choice]) AS commonProposals4
WHERE SIZE(commonProposals4) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals4, [x IN v5.votes | x.proposal_choice]) AS commonProposals5
WHERE SIZE(commonProposals5) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals5, [x IN v6.votes | x.proposal_choice]) AS commonProposals6
WHERE SIZE(commonProposals6) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals6, [x IN v7.votes | x.proposal_choice]) AS commonProposals7
WHERE SIZE(commonProposals7) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals7, [x IN v8.votes | x.proposal_choice]) AS commonProposals8
WHERE SIZE(commonProposals8) > 0
// Step 6: Keep the final shared choices and rank the groups by how many they share
WITH v1,v2, v3, v4, v5, v6, v7, v8, commonProposals8 AS commonProposals
RETURN [v1.voter,v2.voter, v3.voter, v4.voter, v5.voter, v6.voter, v7.voter, v8.voter] AS members, SIZE(commonProposals) AS votedTogether, commonProposals
ORDER BY SIZE(commonProposals) DESC
LIMIT 20
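Step 5 is meant to keep only the `proposal:choice` keys that all eight voters have in common. Each voter's list is intersected exactly once, and the chain stops as soon as the result is empty, which is the role of each `WHERE SIZE(...) > 0`. Below is the same logic as an illustrative Python sketch over per-voter sets. The `common_choices` name is hypothetical.

```python
def common_choices(patterns: list[frozenset[str]]) -> frozenset[str]:
    """Keys shared by every voter in the group: v1 & v2 & ... & vN.

    Each voter's set is used exactly once, and the loop stops early once
    the intersection is empty, since it can only shrink from there.
    """
    if not patterns:
        return frozenset()
    common = patterns[0]
    for keys in patterns[1:]:
        common = common & keys
        if not common:  # same role as WHERE SIZE(...) > 0 in the query
            break
    return common


if __name__ == "__main__":
    group = [frozenset({"p1:yes", "p2:no"}), frozenset({"p1:yes"})]
    print(sorted(common_choices(group)))
```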
The query finds voters within a DAO who voted on the same proposals with the same choice. The goal is to find coalitions of a given number of people who voted together on a given number of proposals.
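One way to express that goal is sketched below in Python. It assumes the per-voter sets of `proposal:choice` keys are already in memory, and it uses a backtracking search, a different technique from the nested UNWINDs, swapped in for illustration. The search adds voters in sorted order, mirroring `v1.voter < v2.voter < ...`. It abandons a partial group as soon as the group shares fewer than the required number of votes, because adding more voters can only shrink an intersection. `find_coalitions`, `size`, and `min_shared` are hypothetical names.

```python
def find_coalitions(
    patterns: dict[str, frozenset[str]],
    size: int,
    min_shared: int = 1,
    limit: int = 20,
) -> list[tuple[tuple[str, ...], frozenset[str]]]:
    """Groups of `size` voters sharing at least `min_shared` proposal:choice keys.

    Results are sorted by number of shared keys, largest first, and cut to
    `limit`, similar to ORDER BY SIZE(commonProposals) DESC LIMIT 20.
    """
    # Voters with fewer than min_shared votes can never qualify.
    voters = sorted(v for v, keys in patterns.items() if len(keys) >= min_shared)
    results: list[tuple[tuple[str, ...], frozenset[str]]] = []

    def extend(start: int, group: list[str], shared: frozenset[str]) -> None:
        if len(group) == size:
            results.append((tuple(group), shared))
            return
        for i in range(start, len(voters)):
            # Not enough voters left to complete the group on this branch.
            if len(voters) - i < size - len(group):
                break
            voter = voters[i]
            new_shared = shared & patterns[voter] if group else patterns[voter]
            # Prune: a set below min_shared cannot grow back by adding voters.
            if len(new_shared) >= min_shared:
                group.append(voter)
                extend(i + 1, group, new_shared)
                group.pop()

    extend(0, [], frozenset())
    results.sort(key=lambda r: len(r[1]), reverse=True)
    return results[:limit]
```

The pruning helps most when `min_shared` is high relative to how often voters agree. In the worst case the search is still exponential in `size`.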
Any insights into resolving this problem would be greatly appreciated. Thank you!