Why is this query so slow?

Neo4j Desktop 5.20
Macbook Pro Sonoma 14.5

LOAD CSV WITH HEADERS FROM 'file:///test.rel.csv' AS row
                       MATCH (a {id: row.head})
                       MATCH (b {id: row.tail})
                       CALL apoc.create.relationship(a, row.relationship, {}, b)
                       YIELD rel
                       RETURN rel;

I have about 200K nodes in the graph, and I have created index on the 'id' property of each node. 'test.csv' contains only 20K rows. It should complete immediately. Anything wrong in the query?

@lingvisa

the 1st 2 matches effectively indicate find me a node with any label and which has a proptery named id and its value is row.head and row.tail respectively.

As no node label was specified in the Match statement it can not utilize an index, for an index is built upon a label and property pair and not simply a property. And so for each rows in the CSV each MATCH statement will examine all 200k nides

@dana_canzano Thanks for the good point. I will try to adjust my query to speed up.

@dana_canzano what's the right syntax to add 'label' to the query above:

LOAD CSV WITH HEADERS FROM 'file:///test.rel.csv' AS row
                       MATCH (a {id: row.head})
                       MATCH (b {id: row.tail})
                       CALL apoc.create.relationship(a, row.relationship, {}, b)
                       YIELD rel
                       RETURN rel;

For example:
MATCH (a:row.label {id: row.head})

The 'label' column is from my csv file. But this doesn't work, and why is row.head working? Is there a way that I can directly use LOAD csv file with the code I want as shown here?

I tried this below. It didn't report error messages, but don't load any relationship either. Probably the syntax is wrong:

LOAD CSV WITH HEADERS FROM 'file:///test.rel.csv' AS row
                        WITH row, row.headType AS headLabel, row.tailType AS tailLabel
                        MATCH (a:headLabel {id: row.head})
                        MATCH (b:tailLabel {id: row.tail})
                        CALL apoc.merge.relationship(a, row.relationship, {}, {}, b)
                        YIELD rel
                        RETURN rel;

I'm not sure what your csv looks like, but you're using cypher in a very disadvantageous manner, which is leading to your performance issues. You cannot inject the "label" column value as a node Label, nor can you do it as a relationship type (hence your use of the apoc.create.relationship rather than using CREATE or MERGE Cypher clauses), but you can take column values from the csv and match properties to them. Properties are treated much differently in Cypher than Node Labels and Relationship Types.
I'd highly recommend some graph academy courses to help you better understand the nature of cypher:
Importing CSV data into Neo4j | Graph Databases and Neo4j | Free Neo4j Courses from GraphAcademy
Cypher Fundamentals | Beginner | Free Neo4j Courses from GraphAcademy
Intermediate Cypher Queries | Cypher | Free Neo4j Courses from GraphAcademy

Now to help solve your issue, firstly, you could try using the apoc.create.node() or apoc.merge.node() procedures, similar to how you're using apoc.create.relationship()
apoc.create.node - APOC Extended Documentation (neo4j.com)

But it is best to do things with regular cypher in most cases. I would recommend potentially

  1. filtering the rows before running your cypher
    OR
  2. dividing your data into separate CSV's if you have only a few varying Label values and load each CSV separately using an explicit cypher statement.

Also, creating a uniqueness constraint (if appropriate) or adding an index allows Neo4j to build and index to give you O(1) retrieval speed for that particular node property.

For examples:

//Create constraint 
CREATE CONSTRAINT unique_Person_id 
FOR (p:Person) REQUIRE p.id IS UNIQUE;

//Option 1: Filtering rows for the to run the write cypher for the right label
LOAD CSV WITH HEADERS FROM 'file:///test.rel.csv' AS row
WITH row
CALL{
    WITH row //double with for simple external references for CALL
    WITH row
    WHERE row.label = 'Person'
    MATCH (a:Person {id: row.head})
    MATCH (b:Person {id: row.tail})
    MERGE (a)-[:FRIEND_OF]->(b)
}
CALL{
    WITH row //double with for simple external references for CALL
    WITH row
    WHERE row.label1 = 'Movie' AND row.label2 = 'Person'
    MATCH (a:Movie{id: row.head})
    MATCH (b:Person{id: row.tail})
    MERGE (a)-[:DIRECTED_BY]->(b)
}

//Option 2: Separating Files (preferably files or reference data AKA nodes, and seperate relationship files that show how the nodes connect)
LOAD CSV WITH HEADERS FROM 'file:///test.rel.csv' AS row
MATCH (a:Person {id: row.head})
MATCH (b:Person {id: row.tail})
MERGE (a)-[:FRIEND_OF]->(b);

LOAD CSV WITH HEADERS FROM 'file:///test.rel2.csv' AS row
MATCH (a:Movie{id: row.head})
MATCH (b:Person{id: row.tail})
MERGE (a)-[:DIRECTED_BY]->(b);
1 Like

@gq16, I have test.entity.csv to store entity with columns:
id type name

Then I have test.rel.csv:
head tail relationship headType tailType
where 'head' and 'tail' are 'id' from test.entity.csv. Similarly, 'headType' and 'tailType' are 'type' column from test.entity.csv.

Because I am loading from files, and the node label could be many in my data, I need a way to get the node label (type) directly from each 'row' in LOAD CSV statement.

In both your options, you need to specify the actual label name like "Person" and "Movie". But this is variable from each row and I can't explicitly its actual label name. If I use:
WHERE row.label1 = 'Movie' AND row.label2 = 'Person'

This looks like 'if' logic and if have lots of labels, this quickly becomes unmanagable.

So, is there a way to do something like:

LOAD CSV WITH HEADERS FROM 'file:///test.rel.csv' AS row
                        WITH row, row.headType AS headLabel, row.tailType AS tailLabel
                        MATCH (a:headLabel {id: row.head})
                        MATCH (b:tailLabel {id: row.tail})
                        CALL apoc.merge.relationship(a, row.relationship, {}, {}, b)
                        YIELD rel
                        RETURN rel;

I thought this should work, and it didn't report any error, but it didn't load any relatioship either. Is this logically right? It doesn't report any error

Actually, this way of loading nodes are working:

LOAD CSV WITH HEADERS FROM 'file:///{file}' AS row
                        WITH row WHERE row.id IS NOT NULL AND row.id <> ''
                        WITH row, CASE WHEN row.type IS NOT NULL AND row.type <> '' THEN row.type ELSE 'DefaultLabel' END AS label
                        CALL apoc.merge.node([label], {{id: row.id}}, row) 
                        YIELD node
                        RETURN node;

You can see that "[label]" works in the merge, but why doesn't this exist for 'merge.relationship'?

Actually, the problem is in the Match statement, not the "merge" part

Try using apoc.cypher.run() this may work. I am not able to test since I'm not sure exactly what your file structure is, so feel free to modify and adjust as needed.

LOAD CSV WITH HEADERS FROM 'file:///test.rel.csv' AS row
WITH row, row.headType AS headLabel, row.tailType AS tailLabel
// Remake of MATCH (a:headLabel {id: row.head})
CALL apoc.cypher.run("MATCH (a:" + headLabel + "{id:$headId}) RETURN a", {headId:row.head})
YIELD value
WITH row, tailLabel, value.a AS a
// Remake of MATCH (b:tailLabel {id: row.tail})
CALL apoc.cypher.run("MATCH (b:" + tailLabel + "{id:$tailId}) RETURN b", {tailId:row.tail})
YIELD value
WITH row, a, value.b AS b

CALL apoc.merge.relationship(a, row.relationship, {}, {}, b)
YIELD rel
RETURN rel;

Although I recommend either using some external language driver such as Java or Python to be able to more easily loop through the labels and inject those label strings into the cypher query.

OR

changing your import file structure

OR

Changing your neo4j data model (depends on what your user story is and the goals you have for your model).