How to fetch data from a database having around 60 million nodes of one type?

yeshveer.yadav · October 18, 2019, 4:58am

Hi,
I need to run a simple MATCH query:
match(n:label)-[:has_example]->(m:label2) with m, count(n.property) as c, collect(n.property) as c2 where c>100
return m.property, c,c2 order by c desc limit 10
on 60+ million nodes.

My server is not able to run it. How can I make it run, in parallel or any other way?

benjamin.squire · October 18, 2019, 5:52am

I am confused by your query. You have
collect(n.property) as c, collect(n.property) as c2
What is the point of collecting it twice?

As for running in parallel, one method to do this would be using apoc.periodic.iterate. In this case you can't simply return something, you must either set or create. Since you are grouping by m.property what you could do is something like this:
CALL apoc.periodic.iterate("MATCH (m:label2) WHERE size((m)<-[:has_example]-()) > 0 RETURN m","WITH m MATCH (m)<-[:has_example]-(n:label) WITH m, count(n.property) AS c SET m.count_property = c, m.count_property2 = c",{batchSize:10000,parallel:true,iterateList:true})
Then you could run something like
MATCH (m:label2) RETURN m.property, m.count_property AS c,m.count_property2 AS c2 ORDER BY c DESC LIMIT 10
This can further be sped up with CREATE INDEX ON :label2(count_property)

yeshveer.yadav · October 18, 2019, 6:12am

MATCH (n:user)-[:has_mobile]->(m:Mobile) with m.mobile as Mob_Number, count(n.ID) as c, collect(n.ID) as id
where c>100
return Mob_Number,id, c order by c desc limit 10

This is the exact query; one is count and the other is collect.

I just need to read the data for the top ten counts.

benjamin.squire · October 18, 2019, 6:28am

Then it is the same as above, except that once you get the top ten in the second query, you would expand those in order to get the collection of n.ID.
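A rough sketch of that second query, assuming the apoc.periodic.iterate step above was run against your real labels (user, Mobile, has_mobile) and stored the count on each Mobile node as count_property (that property name is just the one used in my example, not something that exists in your data yet):

```cypher
// Assumes m.count_property was written beforehand by the batched SET step.
// Pick the top ten Mobile nodes by the stored count first ...
MATCH (m:Mobile)
WHERE m.count_property > 100
WITH m
ORDER BY m.count_property DESC
LIMIT 10
// ... then expand only those ten to collect the user IDs.
MATCH (n:user)-[:has_mobile]->(m)
RETURN m.mobile AS Mob_Number, m.count_property AS c, collect(n.ID) AS id
ORDER BY c DESC
```

The collect only runs over the users attached to ten nodes, so the expensive list building is no longer done for every Mobile node.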

michael.hunger · November 12, 2019, 12:38am

What do your memory config and disk IO look like?

Can you run PROFILE instead of EXPLAIN?
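As a sketch, using the placeholder labels from the original post, that would just mean prefixing the query with PROFILE. Unlike EXPLAIN, PROFILE actually executes the query and reports rows and db hits per operator, so on this data set it will take as long as the query itself:

```cypher
// PROFILE runs the query and returns the executed plan with per-operator statistics.
PROFILE
MATCH (n:label)-[:has_example]->(m:label2)
WITH m, count(n.property) AS c, collect(n.property) AS c2
WHERE c > 100
RETURN m.property, c, c2
ORDER BY c DESC
LIMIT 10
```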

You could try this:

match(n:label)-[:has_example]->(m:label2) 
with m, count(*) as c, collect(n) as c2 where c>100
return m.property, c,c2 
order by c desc limit 10

Or even better, just do the cheap aggregation + sorting first, then go and re-fetch the related data for the 10 top nodes.

Please keep in mind that it might blow up the browser if your c is really large, as you would return lists with millions of properties.

match(n:label)-[:has_example]->(m:label2) 
with m, count(*) as c where c>100
with m, c order by c desc limit 10
match (n:label)-[:has_example]->(m) 
with m, c, collect(n.property) as c2 
return m.property, c, c2
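Applied to the exact query posted earlier in the thread, the same pattern might look like the sketch below. Two assumptions: it groups by the Mobile node rather than by m.mobile, which gives the same result only if each mobile number is stored on a single node, and it uses count(*) rather than count(n.ID), which differs only if some users have no ID property.

```cypher
// Cheap pass: count users per Mobile node, keep only the ten largest groups.
MATCH (n:user)-[:has_mobile]->(m:Mobile)
WITH m, count(*) AS c
WHERE c > 100
WITH m, c
ORDER BY c DESC
LIMIT 10
// Expensive pass: collect the IDs for those ten nodes only.
MATCH (n:user)-[:has_mobile]->(m)
RETURN m.mobile AS Mob_Number, collect(n.ID) AS id, c
ORDER BY c DESC
```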
