cancel
Showing results for 
Search instead for 
Did you mean: 

Finding clusters in Ethereum transfers using Neo4j 3.5.16 and Cypher

andrespra
Node

I’m performing an analysis of token distributions in Ethereum using Neo4j and Cypher. In one of my use cases I need to find clusters. For example, if a token transfer for a particular token was made to 5 different accounts and then these 5 accounts resent all of these tokens to one particular account I need to find those 6 accounts.

The token address needs to be the same in both hops so I start my query by getting all of the token addresses for a particular transfer type. There are four transfer types, and for each transfer type there is a different number of distinct token addresses. This table shows how many transfers and token addresses I have per transfer type:
2X_8_84c4407193f180bc3f1b25c0039bb96c166e739a.png
These are my specs and configuration settings: Neo4j Version is 3.5.16 Community, Database size is 77GB, 8 CPU Cores. 128 GB RAM (only 95 available for Neo).

dbms.memory.heap.initial_size=32g
dbms.memory.heap.max_size=32g
dbms.memory.pagecache.size=55g

And this is my query for the Airdrops transfer type:

MATCH (at:Transfer {eventName:'Airdrop'})
WITH collect(DISTINCT at.address) as tokenAddresses
UNWIND tokenAddresses as tokenAddress
MATCH (t1:Transfer {eventName:'Airdrop',address:tokenAddress})-[:sent_to]->(a2:active)<-[:sent_from]-(t2:Transfer 
{address:tokenAddress})-[:sent_to]->(a3:account)
USING INDEX t1:Transfer(eventName,address)
USING INDEX t2:Transfer(address)
WITH a3.accountId as a3Account, collect(DISTINCT a2.accountId) AS a2Accounts, count(DISTINCT a2) AS 
a2count,collect(DISTINCT t2.address) as tokenAddress
WHERE a2count>=2 AND a2count <=1000
RETURN a2Accounts,a3Account,tokenAddress,a2count+1 AS clusterCount
ORDER BY clusterCount DESC;

So when the transfer type is Airdrop or Distr. my query runs and finishes in 4 seconds and 10 minutes respectively. When the transfer type is Mint it just never ends or just hangs after a while. I haven’t even tried this with Transfer type=Transfer as I know it won’t run. As you can see both accounts and transfers are Nodes, this way I can use indexes on the transfers. Any thoughts or ideas would be greatly and very much appreciated. Thank you.

5 REPLIES 5

Cobra
Ninja
Ninja

Hello @andrespra and welcome to the Neo4j community

Just to know, did you use UNIQUE CONSTRAINTS on all your nodes?

Regards,
Cobra

andrespra
Node

Thank you for your quick reply. I have a Uniqueness constraint on the account nodes for the "accountId" property. Transfers Nodes also have a unique id on the property "TransferId'. That being said, I don't use these indexes on my query as I don't query accounts or transfers by their id. I query them by their "eventName + tokenAddress" properties or just by their "tokenAddress" property, and in nboth cases I'm using indexes as it can be seen on my query after the second MATCH statement:

USING INDEX t1:Transfer(eventName,address)
USING INDEX t2:Transfer(address)

I see, no problem

Try to add UNIQUE CONSTRAINTS also on others Label node like Transfer, it will speed-up your query even if you don't use them, Neo4j will use them

Regards,
Cobra

Ok, actually :Transfer(transferId) was a regular index, not a uniqueness constraint, unlike :account(accountId). I dropped the index, created the index again as a uniqueness constraint and launched my queries again but my issue remains. The problem starts as the amount of nodes I have to check increases. Any thoughts on my memory configuration? I ran Memrec and with my available RAM I tried to match the given values as much as I could. Do you think I need more resources for this? The thing is that the graph is only 77gb in size so I think it should work ok.

Ok good:)

dbms.memory.heap.initial_size=32g
dbms.memory.heap.max_size=32g

You can let heap.initial_size at the minimum (I think it's 500Mo or something) and maybe increase heap.max_size if you can:)

And I never touched dbms.memory.pagecache.size so I can't advice you on this one

Regards,
Cobra

Nodes 2022
Nodes
NODES 2022, Neo4j Online Education Summit

All the sessions of the conference are now available online