Getting a list of children runs really slowly

oleg_neo4j · November 30, 2018, 4:05pm

Hi,

I have what should be a simple query to get a list of child nodes (ideally grandchildren later) from a parent node. I have a document node that is connected to words with a :BOW_OF relationship. The result I'm looking for is a row with a document ID and a list of the words in that document.
If I specify the document ID, it is very fast:

MATCH (word)-[r:BOW_OF]->(doc:Desc{id:'12345'}) RETURN doc.id, collect(word.word)

but if I take the id out, and add a LIMIT 1 on the end, it doesn't finish, so I think I'm doing something wrong.

MATCH (word)-[r:BOW_OF]->(doc:Desc) RETURN doc.id, collect(word.word) limit 1

What I would like to get to is without the limit:

MATCH (stem)-[s:STEM_OF]->(word)-[r:BOW_OF]->(doc:Desc) RETURN doc.id, collect(stem.stem)

Is there something I'm doing wrong? Thank you very much!

Oleg

andrew_bowman · November 30, 2018, 7:45pm

Since you're using an aggregation (collect) with respect to the doc.id, ALL results need to be expanded out first before the collect(). Is doc.id unique per :Desc node? If so, your aggregation should instead be by the doc node and not by its id property. That way when you do property access at the end, it only does the access once per node instead of multiple times for every row for which the same node appears.

For your LIMIT 1 approach try this instead:

MATCH (doc:Desc) 
WITH doc
LIMIT 1
MATCH (word)-[r:BOW_OF]->(doc)
WITH doc, collect(word.word) as words
RETURN doc.id, words

Alternately you could use pattern comprehension to get a list of results from a pattern:

MATCH (doc:Desc) 
WITH doc
LIMIT 1
WITH doc, [(word)-[r:BOW_OF]->(doc) | word.word] as words
RETURN doc.id, words

How many :Desc nodes are in your db, and how many word and stem nodes? If the result set is huge you may have some trouble executing this via the browser (especially if the browser is attempting to visualize it). You could try using cypher-shell instead.
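A rough way to get those numbers is sketched below. The thread doesn't show labels for the word and stem nodes, so relationship counts stand in for them here; that substitution is an assumption, and if several documents share a word node the relationship count will be higher than the node count.

```cypher
// Run each statement on its own (cypher-shell accepts them separated by semicolons).

// Number of document nodes
MATCH (doc:Desc) RETURN count(doc) AS docs;

// Number of word-to-document links
MATCH ()-[r:BOW_OF]->() RETURN count(r) AS bowLinks;

// Number of stem-to-word links
MATCH ()-[r:STEM_OF]->() RETURN count(r) AS stemLinks;
```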

For your full query, you would want to take a similar approach, but make sure to get only DISTINCT stems; I'm guessing there are a lot of duplicates there.

MATCH (stem)-[:STEM_OF]->()-[:BOW_OF]->(doc:Desc)
WITH doc, collect(DISTINCT stem) as stems
RETURN doc.id, stems
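If the full result is too large to pull back in one go, the LIMIT-first pattern from the earlier queries can be extended to work through the documents in batches. This is only a sketch: it assumes the stem text lives in a `stem` property (as in the original question), and the batch size of 1000 and the ordering by `doc.id` are illustrative choices, not something confirmed for this dataset.

```cypher
// Pick one batch of documents first, so the expansion and the
// aggregation only run for those documents.
MATCH (doc:Desc)
WITH doc
ORDER BY doc.id      // keeps batches stable between runs
SKIP 0               // raise by the batch size for each following batch
LIMIT 1000           // illustrative batch size
MATCH (stem)-[:STEM_OF]->()-[:BOW_OF]->(doc)
WITH doc, collect(DISTINCT stem.stem) AS stems
RETURN doc.id AS id, stems
```

Collecting `stem.stem` rather than the stem node returns plain strings, which is usually easier to export; whether sorting by `doc.id` is cheap depends on whether that property is indexed.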

oleg_neo4j · December 4, 2018, 5:10pm

Thanks for replying! :) I get it now about aggregating by node instead of property. Yes, every doc.id is unique. I have 200k doc nodes now, but eventually a few million. Each doc node can have ~100-4000 words/stems. No, there shouldn't be any duplicate words/stems, but that will be something to check.

What I'd like to do next is add the classification(s) of each document to the query, to get a result I can train on: classifications and a list of stems. Does this seem like a reasonable query to do that? I don't necessarily need to visualize it, but I'll try to use cypher-shell; I just never have before.
