Neo4j query really slow

Dragotic · April 8, 2019, 10:51am

I am running a local Neo4j database that has 5,000 nodes and 5,000 relationships and is 120 MB in size.

I am running a query:

MATCH p=(:Tweet)-[:REPLIED_TO|RETWEETED_FROM*]->(:Tweet)-[:REPLIED_TO|RETWEETED_FROM]->(:Tweet {type: 'TWEET'})
RETURN p LIMIT 150

With the LIMIT clause, the query returns its result in ~20 seconds. Without LIMIT, it ran for at least 5 minutes before I stopped it.

Do you have any idea why it runs so slowly?

Let me give you a little more insight into the db schema.

I have tweets, retweets, and replies. For each tweet, I'm creating a chain ordered by timestamp. The chain consists of replies and retweets and ends at the specific tweet.

The pattern above returns chains of this type.

andrew_bowman · April 8, 2019, 11:45pm

I think you may not fully understand what this query is doing.

It's finding every single possible path that matches this pattern, and you have no limit on the length of the relationships in this chain, and you're doing this for all possible :Tweets of type 'TWEET'. I think you'll find that the number of possible paths is skyrocketing into the hundreds of thousands if not higher.
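One way to check how quickly the path count grows is to cap the variable-length part and count the matches instead of returning them. The following is only a sketch against the schema described in the question: the upper bound of 5 hops and the aliases paths and longestPath are assumptions made for illustration, not values from this thread.

```cypher
// Sketch only: the 1..5 bound is an arbitrary assumption so the query finishes.
// Raise the upper bound step by step and watch how the count changes.
MATCH p=(:Tweet)-[:REPLIED_TO|RETWEETED_FROM*1..5]->(:Tweet)-[:REPLIED_TO|RETWEETED_FROM]->(:Tweet {type: 'TWEET'})
// count(p) is the number of matching paths; length(p) is the number of relationships in a path.
RETURN count(p) AS paths, max(length(p)) AS longestPath
```

If the count climbs steeply as the bound is raised, the unbounded version of the pattern is matching far more paths than there are chains.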

Even if this is what you want, I'm not sure what you would do with all of those rows that you're returning. For sure the browser can't handle that volume of data and display it.

If you can, please be more specific in what you're trying to do here, as what this is doing currently isn't efficient and doesn't seem useful as it is.

You say you're creating chains, but there are no CREATE or MERGE operations here. Also you mentioned specific tweets, but your query for a :Tweet with the 'TWEET' type doesn't seem specific to me.

Dragotic · April 9, 2019, 8:30am

The chain is created at some prior point. The :Tweet nodes are connected with :REPLIED_TO, :RETWEETED_FROM relationships.

What I want to achieve is to get these types of chains. The longest is probably 250-300 nodes. Is there a better way to get them, instead of the above pattern?

andrew_bowman · April 11, 2019, 8:59pm

One thing you could try is ensuring that both ends of the chain are true end nodes (the tweet at one end doesn't reply to or retweet any other node, and the tweet at the other end is not itself replied to or retweeted), as your query currently finds subchains of any length.

You could give this a try:

MATCH p=(end:Tweet)-[:REPLIED_TO|RETWEETED_FROM*]->(start:Tweet {type: 'TWEET'}) 
WHERE NOT (start)-[:REPLIED_TO|RETWEETED_FROM]->() AND NOT ()-[:REPLIED_TO|RETWEETED_FROM]->(end)
RETURN p LIMIT 150

Start with a lower limit to make sure it's working okay, then scale up. You may also want to post the PROFILE plan of the query (with all elements expanded) so we can see how it's being planned and executed.
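To get that plan, prefix the query with the PROFILE keyword. PROFILE runs the query and reports the rows and db hits for each operator, whereas EXPLAIN only shows the plan without running it. A minimal sketch, where LIMIT 10 is just an assumed small starting value:

```cypher
// Sketch only: LIMIT 10 is an arbitrary low starting point, to be raised once this runs well.
PROFILE
MATCH p=(end:Tweet)-[:REPLIED_TO|RETWEETED_FROM*]->(start:Tweet {type: 'TWEET'})
WHERE NOT (start)-[:REPLIED_TO|RETWEETED_FROM]->() AND NOT ()-[:REPLIED_TO|RETWEETED_FROM]->(end)
RETURN p LIMIT 10
```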

Dragotic · April 12, 2019, 9:36am

Hey @andrew_bowman, thanks a lot. The query did run way faster.

Below you can find the Profile plan of the query that you asked for.
