I am trying to determine the best AWS EC2 instance type for running a Neo4j database. Here are some details about my setup:
Neo4j: Community Edition (version: 5.19.0)
Deployment: AWS Marketplace Neo4j Template
Database Schema:
Dao {
organizationId: String!
daoId: String!
name: String!
}
Proposal {
proposalNativeId: String!
proposer: String!
(Proposal)-[:IN]->(Dao)
}
Vote {
voteNativeId: String!
voter: String!
choice: String!
(Vote)-[:BELONGS_TO]->(Proposal)
}
Indexes:
- Dao - daoId property
- Proposal - proposalNativeId property
- Vote - voteNativeId property
Data Volume:
- Dao - 1
- Proposal - 62 (potentially more)
- Vote - 483 (potentially more)
Query being executed (I also have similar queries for finding coalitions with larger groups of people):
// Step 1: Match votes together with their proposals
MATCH (d:Dao { daoId: '9ad9fb81-ad04-4830-85c2-212034072580' })<-[:IN]-(v:Vote)-[:BELONGS_TO]->(p:Proposal)
WITH d, p, v.voter AS voter, v.choice AS choice
ORDER BY voter, p.proposalNativeId
// Step 2: Collect votes per voter
WITH d, voter, COLLECT({proposal: p.proposalNativeId, choice: choice, proposal_choice: p.proposalNativeId + ":" + choice}) AS votes
WITH d, COLLECT({voter: voter, votes: votes}) AS voterPatterns
// Step 3: Compare voting patterns for groups of 8 voters
UNWIND voterPatterns AS v1
UNWIND voterPatterns AS v2
UNWIND voterPatterns AS v3
UNWIND voterPatterns AS v4
UNWIND voterPatterns AS v5
UNWIND voterPatterns AS v6
UNWIND voterPatterns AS v7
UNWIND voterPatterns AS v8
// Step 4: Ensure unique groups of voters
WITH v1, v2, v3, v4, v5, v6, v7, v8
WHERE v1.voter < v2.voter AND v2.voter < v3.voter AND v3.voter < v4.voter AND v4.voter < v5.voter AND v5.voter < v6.voter AND v6.voter < v7.voter AND v7.voter < v8.voter
// Step 5: Find common proposals where choices match
// Start with the choices shared by v1 and v2, then narrow with each further voter (v3..v8)
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection([a IN v1.votes | a.proposal_choice], [b IN v2.votes | b.proposal_choice]) AS commonProposals2
WHERE SIZE(commonProposals2) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals2, [c IN v3.votes | c.proposal_choice]) AS commonProposals3
WHERE SIZE(commonProposals3) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals3, [d IN v4.votes | d.proposal_choice]) AS commonProposals4
WHERE SIZE(commonProposals4) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals4, [e IN v5.votes | e.proposal_choice]) AS commonProposals5
WHERE SIZE(commonProposals5) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals5, [f IN v6.votes | f.proposal_choice]) AS commonProposals6
WHERE SIZE(commonProposals6) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals6, [g IN v7.votes | g.proposal_choice]) AS commonProposals7
WHERE SIZE(commonProposals7) > 0
WITH v1, v2, v3, v4, v5, v6, v7, v8, apoc.coll.intersection(commonProposals7, [h IN v8.votes | h.proposal_choice]) AS commonProposals8
WHERE SIZE(commonProposals8) > 0
// Step 6: Filter matching choices
WITH v1, v2, v3, v4, v5, v6, v7, v8, commonProposals8 AS commonProposals
RETURN [v1.voter, v2.voter, v3.voter, v4.voter, v5.voter, v6.voter, v7.voter, v8.voter] AS members, SIZE(commonProposals) AS votedTogether, commonProposals
ORDER BY SIZE(commonProposals) DESC
LIMIT 20
I am currently running this query on an AWS EC2 r6i.2xlarge instance. Memory usage sits at 91-96%, and the query has not completed after several days.
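For a sense of scale: as far as I understand, the eight nested UNWINDs in Step 3 produce n^8 rows for n distinct voters, and the ordering filter in Step 4 is applied only afterwards, leaving one row per unordered group of 8. Below is a rough sketch of that arithmetic in Python. The voter counts are hypothetical placeholders (I have not listed the real number of distinct voters above), and the helper names `unwind_rows` and `unique_groups` are made up for illustration.

```python
from math import comb


def unwind_rows(n_voters: int, group_size: int = 8) -> int:
    """Rows produced by `group_size` nested UNWINDs over a list of n_voters items."""
    return n_voters ** group_size


def unique_groups(n_voters: int, group_size: int = 8) -> int:
    """Rows left after the strict v1.voter < v2.voter < ... ordering filter."""
    return comb(n_voters, group_size)


if __name__ == "__main__":
    # Hypothetical voter counts, for illustration only; not measured from my data.
    for n in (20, 50, 100):
        print(f"{n} voters: {unwind_rows(n):,} rows before the filter, "
              f"{unique_groups(n):,} unique groups of 8")
```

If this estimate is right, the row count grows with the eighth power of the number of voters, which may matter more for the runtime than the instance size does.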
What EC2 instance would be ideal for my use case, and should I focus on getting more RAM or more CPU cores, etc.?