cancel
Showing results for 
Search instead for 
Did you mean: 

Join the community at Nodes 2022, our free virtual event on November 16 - 17.

Debugging a long running query

Max-Tait
Node Link

Hi all,

I have a problematically large cypher query that seems to run forever.

Little background.

I am running neo4j in a docker container as part of docker compose.
The machine this is running on has 4 vCPUs and 32 GiB of memory.
Neo4j currently uses 5gb of disk space.

Neo4j docker container:

neo4j:
image: neo4j:4.4.3
hostname: neo4j
container_name: neo4j
environment:
- NEO4J_dbms_logs_debug_level:debug
- NEO4J_dbms_connector_https_advertised__address:localhost:7473
- NEO4J_dbms_connector_http_advertised__address:localhost:7474
- NEO4J_dbms_connector_bolt_advertised__address:localhost:7687
- NEO4J_apoc_uuid_enabled=true
- NEO4JLABS_PLUGINS=["apoc","graph-data-science"]
- NEO4J_dbms_security_procedures_unrestricted=gds.*, apoc.*
- NEO4J_dbms_memory_heap_initial__size=12000m
- NEO4J_dbms_memory_heap_max__size=12000m
- NEO4J_dbms_memory_pagecache_size=12100m
- NEO4J_dbms_jvm_additional=-XX:+ExitOnOutOfMemoryError
- NEO4J_dbms_logs_gc_enabled=true

 Relevant database schema:

(d:Document)<-[:IN_DOC]-(t:Table)<-[:IN_TABLE]-(tr:TableRow)<-[:IN_ROW]-(p:Phrase)

Indexes:

CREATE CONSTRAINT ON (p:Phrase) ASSERT p.content IS UNIQUE;
CREATE INDEX on :Phrase(content);
CALL db.index.fulltext.createNodeIndex("Phrases", ["Phrase"], ["content"]);
CREATE INDEX on :TableRow(encoded_content);
CALL db.index.fulltext.createNodeIndex("TableRows", ["TableRow"], ["text"]);

Different versions of the query I have tried

Query v1:

MATCH (d:Document {id: $doc_id})
MATCH (tr:TableRow)-[:IN_TABLE]->(t:Table)-[:IN_DOC]->(d)
MATCH (p1:Phrase)-[c1:IN_ROW]->(tr) 
MATCH (p2:Phrase)-[:IN_ROW]->(tr) 
    WHERE NOT p1=p2
MERGE (p1)-[:CONNECTED_TO {relation: c1.label}]->(p2)

Query v2:

CALL apoc.periodic.iterate(
	'
	 	MATCH (d:Document {id: $doc_id})
		MATCH (tr:TableRow)-[:IN_TABLE]->(t:Table)-[:IN_DOC]->(d)
		MATCH (p1:Phrase)-[c1:IN_ROW]->(tr) MATCH (p2:Phrase)-[:IN_ROW]->(tr) 
			WHERE NOT p1=p2
		RETURN p1, p2, c1 
	', 
	'MERGE (p1)-[:CONNECTED_TO {relation: c1.label}]->(p2)', 
	{batchSize:10000, parallel:true, params:{doc_id: $doc_id}}
);

 Query v3: This was an attempt to use less ram

    CALL apoc.periodic.commit(
	    "
			MATCH (d:Document {id: $doc_id}) 
			MATCH (tr:TableRow)-[:IN_TABLE]->(t:Table)-[:IN_DOC]->(d)
			MATCH (p1:Phrase)-[c1:IN_ROW]->(tr) MATCH (p2:Phrase)-[:IN_ROW]->(tr) 
				WHERE NOT p1=p2 AND NOT (p1)-[:CONNECTED_TO]->(p2)
	    	WITH c1, p2, p1 LIMIT $limit
		    MERGE (p1)-[:CONNECTED_TO {relation: c1.label}]->(p2) RETURN count(*)
	    ", 
	    {limit:1000, doc_id: $doc_id}
    );

Some database stats:

MATCH (d:Document {id:$doc_id}) RETURN count(d)
	1 
MATCH (d:Document {id:$doc_id})--(t:Table) RETURN count(t)
	1
MATCH (d:Document {id:$doc_id})--(t:Table)--(tr:TableRow) RETURN count(tr)
	253,762
MATCH (d:Document {id:$doc_id})--(t:Table)--(tr:TableRow)--(p:Phrase) RETURN count(p)
	7,612,860
MATCH (d:Document {id:$doc_id})--(t:Table)--(tr:TableRow)
MATCH (tr)--(p1:Phrase)
MATCH (tr)--(p2:Phrase)
	WHERE NOT p1=p2 
RETURN count(p2)
	203,009,870

 

So this is a huge query and while it justifiably should take a long time to run, however I haven't been able to get this query to complete while even giving it 12+ hours to run.


Any recommendations on how to fix this query or what I might be doing wrong are greatly appreciated. 

1 ACCEPTED SOLUTION

glilienfield
Ninja
Ninja

I think this is a duplicate of the following post.  I think it was getting marked as spam, so the author reposted it. Maybe the original got unmarked as spam. 

https://community.neo4j.com/t5/neo4j-graph-platform/debugging-a-long-running-query/td-p/57818

View solution in original post

4 REPLIES 4

TrevorS
Community Team
Community Team

Hello @Max-Tait 

Thanks for reaching out!

Are you able to provide the debug.log file for this query? This will help to determine what could be causing this to run for such extended times.

Thanks!

TrevorS
Community Specialist

Hi @TrevorS,

Here is a google drive link to the debug.log file, I wasn't able to find a way to upload it directly to this site.
The log was copied 15 minutes after the query had started but not completed.

glilienfield
Ninja
Ninja

I think this is a duplicate of the following post.  I think it was getting marked as spam, so the author reposted it. Maybe the original got unmarked as spam. 

https://community.neo4j.com/t5/neo4j-graph-platform/debugging-a-long-running-query/td-p/57818

Thanks @glilienfield 

You are correct! I am going to mark your post as a solution and leave this post as-is.
@Max-Tait Can you post your debug log to the main thread?

TrevorS
Community Specialist