CREATE Cypher queries very slow

Hello everyone, I hope I'm not rehashing a topic that's already been created, but I haven't found any problems similar to mine.

I'll start by explaining my project. I'm currently at engineering school, and for a project we need to build a graph database. We have a lot of data: articles from the French legal codes, linked to French legal dictionaries.
For this we wrote several programs and scripts that generate text files containing the statements that create the nodes and relationships. However, we are running into performance problems: we have imported only 1/20 of our data, around 20,000 nodes and 3,000 relationships, and that took almost 2 hours.
I'm trying to find out how to optimise this. I've tried Neo4j Desktop on a MacBook Air and also on my school's computers, which are more powerful, but it made no difference. The only improvement was with the online version of Neo4j, where it was a bit faster.

If I've forgotten anything, let me know and I'll get back to you as soon as possible.
Here's an example of what I have:

CREATE (D3:Dictionnaire {mots:'Abandon '})
CREATE (D4:Dictionnaire {mots:'Abandon de biens '})
CREATE (D5:Dictionnaire {mots:'Abandon de domicile '})
CREATE (D6:Dictionnaire {mots:'Abandon d’enfant '})
CREATE (D7:Dictionnaire {mots:'Abandon de famille '})
CREATE (D8:Dictionnaire {mots:'Abattement '})
CREATE (D9:Dictionnaire {mots:'Abattement supplémentaire '})
CREATE (D10:Dictionnaire {mots:'Ab intestat '})
CREATE (D11:Dictionnaire {mots:'Ab irato '})
CREATE (D12:Dictionnaire {mots:'Abolition '})
CREATE (D13:Dictionnaire {mots:'Abondement '})
CREATE (D14:Dictionnaire {mots:'À bon droit '})
CREATE (D15:Dictionnaire {mots:'Abordage '})
CREATE (D16:Dictionnaire {mots:'Abornement '})
CREATE (D17:Dictionnaire {mots:'Aboutissants '})
CREATE (D18:Dictionnaire {mots:'Abrogation '})
CREATE (D19:Dictionnaire {mots:'Absence '})
CREATE (D20:Dictionnaire {mots:'Absentéisme '})
CREATE (D21:Dictionnaire {mots:'Absentéisme scolaire '})
CREATE (D22:Dictionnaire {mots:'Absolu '})
CREATE (D23:Dictionnaire {mots:'Absolution '})
CREATE (D24:Dictionnaire {mots:'Absolutisme '})
CREATE (D25:Dictionnaire {mots:'Abstention '})
CREATE (D26:Dictionnaire {mots:'Abstention délictueuse '})
CREATE (D27:Dictionnaire {mots:'Abstentionnisme électoral '})
CREATE (D28:Dictionnaire {mots:'Abstrait '})
CREATE (D29:Dictionnaire {mots:'Abstrat '})
CREATE (D30:Dictionnaire {mots:'Abus '})
...
CREATE (D5653:Dictionnaire {mots:'Vigueur '})
CREATE (D5654:Dictionnaire {mots:'Vil '})
CREATE (D5655:Dictionnaire {mots:'Viol '})
CREATE (D5656:Dictionnaire {mots:'Violation de la loi '})
CREATE (D5657:Dictionnaire {mots:'Violence '})
CREATE (D5658:Dictionnaire {mots:''})
CREATE (C1:Code {type:'Civile', article:'Art. 7'})
CREATE (D1154)-[:isIn ]->(C1)
CREATE (C2:Code {type:'Civile', article:'Art. 8'})
CREATE (C3:Code {type:'Civile', article:'Art. 9'})
CREATE (D408)-[:isIn ]->(C3)
CREATE (D1646)-[:isIn ]->(C3)
CREATE (D4009)-[:isIn ]->(C3)
CREATE (D4530)-[:isIn ]->(C3)
CREATE (C4:Code {type:'Civile', article:'Art. 9-1'})
CREATE (D1646)-[:isIn ]->(C4)
CREATE (D1879)-[:isIn ]->(C4)
CREATE (D4043)-[:isIn ]->(C4)
CREATE (D4044)-[:isIn ]->(C4)
CREATE (C5:Code {type:'Civile', article:'Art. 10'})
CREATE (C6:Code {type:'Civile', article:'Art. 11'})
CREATE (C7:Code {type:'Civile', article:'Art. 12'})
CREATE (D1646)-[:isIn ]->(C7)
CREATE (C8:Code {type:'Civile', article:'Art. 14'})
...
CREATE (C3006:Code {type:'Civile', article:'Art. 6'})
CREATE (C3007:Code {type:'Civile', article:'Art. 6-1'})
CREATE (C3008:Code {type:'Civile', article:'Art. 6-2'})
CREATE (EN1:Code {type:'Environnement', article:'Art. L. 110-1'})
CREATE (D1154)-[:isIn ]->(EN1)
CREATE (EN2:Code {type:'Environnement', article:'Art. L. 110-1'})
CREATE (EN3:Code {type:'Environnement', article:'Art. L. 110-1'})
CREATE (EN4:Code {type:'Environnement', article:'Art. L. 110-2'})
CREATE (EN5:Code {type:'Environnement', article:'Art. L. 110-3'})
CREATE (EN6:Code {type:'Environnement', article:'Art. L. 110-4'})
CREATE (D2546)-[:isIn ]->(EN6)
CREATE (EN7:Code {type:'Environnement', article:'Art. L. 110-5'})
CREATE (D2546)-[:isIn ]->(EN7)
CREATE (EN8:Code {type:'Environnement', a

or

MERGE (n:Code {type:'Procédure pénale', article:'Art. 475-1'}); MATCH (d:Dictionnaire {mots:'Absence '}), (node2:Code {type:'Procédure pénale', article:'Art. 475-1'}) CREATE (d)-[:Dist {distance:'0.02729748934154429'}]->(node2);
MERGE (n:Code {type:'Procédure pénale', article:'Art. 462'}); MATCH (d:Dictionnaire {mots:'Absence '}), (node2:Code {type:'Procédure pénale', article:'Art. 462'}) CREATE (d)-[:Dist {distance:'0.171078083057003'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 53'}); MATCH (d:Dictionnaire {mots:'Absence '}), (node2:Loi {type:'du 29 juillet', article:'Art. 53'}) CREATE (d)-[:Dist {distance:'0.08399455234486025'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 29'}); MATCH (d:Dictionnaire {mots:'Absence '}), (node2:Loi {type:'du 29 juillet', article:'Art. 29'}) CREATE (d)-[:Dist {distance:'0.06670416864045477'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 33'}); MATCH (d:Dictionnaire {mots:'Absence '}), (node2:Loi {type:'du 29 juillet', article:'Art. 33'}) CREATE (d)-[:Dist {distance:'0.06503631770093163'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 24'}); MATCH (d:Dictionnaire {mots:'Absence '}), (node2:Loi {type:'du 29 juillet', article:'Art. 24'}) CREATE (d)-[:Dist {distance:'0.0005921364282330649'}]->(node2);
MERGE (n:Code {type:'Procédure pénale', article:'Art. 2'}); MATCH (d:Dictionnaire {mots:'Absence '}), (node2:Code {type:'Procédure pénale', article:'Art. 2'}) CREATE (d)-[:Dist {distance:'0.01086570345807674'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 48-1'}); MATCH (d:Dictionnaire {mots:'Absence '}), (node2:Loi {type:'du 29 juillet', article:'Art. 48-1'}) CREATE (d)-[:Dist {distance:'0.038814542870677406'}]->(node2);
MERGE (n:Code {type:'général des impôts', article:'Art. 1018'}); MATCH (d:Dictionnaire {mots:'Absence '}), (node2:Code {type:'général des impôts', article:'Art. 1018'}) CREATE (d)-[:Dist {distance:'0.2257914890257382'}]->(node2);
MERGE (n:Code {type:'Procédure pénale', article:'Art. 475-1'}); MATCH (d:Dictionnaire {mots:'Absolu '}), (node2:Code {type:'Procédure pénale', article:'Art. 475-1'}) CREATE (d)-[:Dist {distance:'0.19579977893573347'}]->(node2);
MERGE (n:Code {type:'Procédure pénale', article:'Art. 462'}); MATCH (d:Dictionnaire {mots:'Absolu '}), (node2:Code {type:'Procédure pénale', article:'Art. 462'}) CREATE (d)-[:Dist {distance:'0.3220037896731407'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 53'}); MATCH (d:Dictionnaire {mots:'Absolu '}), (node2:Loi {type:'du 29 juillet', article:'Art. 53'}) CREATE (d)-[:Dist {distance:'0.23492025896099794'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 29'}); MATCH (d:Dictionnaire {mots:'Absolu '}), (node2:Loi {type:'du 29 juillet', article:'Art. 29'}) CREATE (d)-[:Dist {distance:'0.21762987525659244'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 33'}); MATCH (d:Dictionnaire {mots:'Absolu '}), (node2:Loi {type:'du 29 juillet', article:'Art. 33'}) CREATE (d)-[:Dist {distance:'0.21596202431706932'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 24'}); MATCH (d:Dictionnaire {mots:'Absolu '}), (node2:Loi {type:'du 29 juillet', article:'Art. 24'}) CREATE (d)-[:Dist {distance:'0.03583412284857098'}]->(node2);
MERGE (n:Code {type:'Procédure pénale', article:'Art. 2'}); MATCH (d:Dictionnaire {mots:'Absolu '}), (node2:Code {type:'Procédure pénale', article:'Art. 2'}) CREATE (d)-[:Dist {distance:'0.17936799305226592'}]->(node2);
MERGE (n:Loi {type:'du 29 juillet', article:'Art. 48-1'}); MATCH (d:Dictionnaire {mots:'Absolu '}), (node2:Loi {type:'du 29 juillet', article:'Art. 48-1'}) CREATE (d)-[:Dist {distance:'0.20731683246486657'}]->(node2);
MERGE (n:Code {type:'général des impôts', article:'Art. 1018'}); MATCH (d:Dictionnaire {mots:'Absolu '}), (node2:Code {type:'général des impôts', article:'Art. 1018'}) CREATE (d)-[:Dist {distance:'0.39429377861992737'}]->(node2);

Do you have any indexes? A MERGE clause makes Neo4j look for the node first and create it only if it is not found. As such, each MERGE contains a MATCH, and this is where an index will speed things up.

Your MERGE specifies two properties to match on, so you can try a composite index on both type and article for each of the labels you are matching on.

create index code_index if not exists
for (n:Code) ON
(n.type, n.article);
create index loi_index if not exists
for (n:Loi) ON
(n.type, n.article);
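A minimal sketch of applying these statements from Python with the official `neo4j` driver package, assuming a local server at `bolt://localhost:7687` and placeholder credentials. The third index (hypothetical name `dico_index`) is an assumption added here: the relationship statements in the question also look up `Dictionnaire` nodes by `mots`, so that lookup would likely benefit from an index too.

```python
# Index statements from the reply above, plus one (hypothetical name
# dico_index) for the Dictionnaire lookups by mots used in the question.
INDEX_STATEMENTS = [
    "CREATE INDEX code_index IF NOT EXISTS FOR (n:Code) ON (n.type, n.article)",
    "CREATE INDEX loi_index IF NOT EXISTS FOR (n:Loi) ON (n.type, n.article)",
    "CREATE INDEX dico_index IF NOT EXISTS FOR (n:Dictionnaire) ON (n.mots)",
]


def create_indexes(uri="bolt://localhost:7687", auth=("neo4j", "password")):
    """Create the indexes, then wait until they are online before importing."""
    # Imported here so this module loads even where the driver is not installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            for statement in INDEX_STATEMENTS:
                session.run(statement).consume()
            # Indexes are populated in the background; block until they are ready.
            session.run("CALL db.awaitIndexes()").consume()
```

Calling `create_indexes()` once before the import should be enough; `IF NOT EXISTS` makes the statements safe to rerun.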

Yes, as Gary said, make sure that you have the right indexes and constraints for individual node lookups (for both MATCH and MERGE). For MERGE, also make sure to merge only on a key property, not on all the properties of a node.

You get the best import performance when you use parameters in your statements and import in batches.

So just pick any driver (JavaScript, Python, Java), read in your file, and pass batches (10-50k rows) of a list of dictionaries/maps as a parameter.

Then use UNWIND $rows AS row CREATE (n:Node {id:row.id, name:row.name});
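A minimal sketch of this pattern with the Python driver (`neo4j` package). The connection details, the batch size of 10,000 and the `type`/`article` row shape are assumptions; each batch is sent as a single `$rows` parameter and runs in its own auto-commit transaction.

```python
from itertools import islice

# Each row becomes one Code node; the query text never changes, only $rows does.
CREATE_CODES = """
UNWIND $rows AS row
CREATE (:Code {type: row.type, article: row.article})
"""


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_rows(query, rows, size=10_000,
                uri="bolt://localhost:7687", auth=("neo4j", "password")):
    """Send `rows` to Neo4j in batches, one transaction per batch."""
    from neo4j import GraphDatabase  # imported lazily; needs `pip install neo4j`

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            for batch in batches(rows, size):
                session.run(query, rows=batch).consume()
```

For example, `import_rows(CREATE_CODES, rows)` with `rows` such as `[{"type": "Civile", "article": "Art. 6"}, ...]`. Because only the parameter changes between batches, Neo4j can reuse the same query plan for every batch.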


If you can't do that for some reason:

In general it is also better to have small statements, e.g. 1-10 lines each, each ending with a semicolon, rather than gigantic ones with tens of thousands of lines.
Otherwise the parser has to deal with those gigantic statements.
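If the statement files have to stay, one way to follow this advice is to post-process the generated file so that every line becomes a standalone statement ending in `;`. Variables such as `D408` only exist inside a single statement, so this sketch (an assumption, not the poster's format) stores each variable name in a hypothetical `id` property and has each relationship MATCH both ends by it.

```python
import re

# Node lines look like: CREATE (D3:Dictionnaire {mots:'Abandon '})
NODE = re.compile(r"^CREATE \((\w+):(\w+) \{(.*)\}\)$")
# Relationship lines look like: CREATE (D1154)-[:isIn ]->(C1)
REL = re.compile(r"^CREATE \((\w+)\)-\[:(\w+) ?\]->\((\w+)\)$")


def split_statements(lines):
    """Turn one giant CREATE script into standalone, ;-terminated statements.

    Lines matching neither pattern (e.g. "..." or a truncated line) are skipped.
    """
    labels = {}  # variable name (e.g. "D408") -> label (e.g. "Dictionnaire")
    out = []
    for line in lines:
        line = line.strip()
        if m := NODE.match(line):
            var, label, props = m.groups()
            labels[var] = label
            # Keep the variable name as an id property so later statements can find the node.
            out.append(f"CREATE (:{label} {{id: '{var}', {props}}});")
        elif m := REL.match(line):
            src, rel_type, dst = m.groups()
            out.append(
                f"MATCH (a:{labels[src]} {{id: '{src}'}}), (b:{labels[dst]} {{id: '{dst}'}}) "
                f"CREATE (a)-[:{rel_type}]->(b);"
            )
    return out
```

Reading `code_civile_DATA.txt` line by line into `split_statements` and writing the results one per line gives a file of small statements. An index on `id` for each label (for example `CREATE INDEX dico_id IF NOT EXISTS FOR (n:Dictionnaire) ON (n.id)`, a hypothetical name) would keep each MATCH a single index lookup.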

Hello to both of you,
First, thank you for your quick responses. Creating the indexes improved the run time of my MERGE/CREATE file, but only when I split the file and run the small statements one by one (that's a big improvement, so I will do that).

But I'm not sure I can use UNWIND $rows as row CREATE (n:Node {id:row.id, name:row.name}); because, from what I understand, I first need to create an index. I parse my file with AWK, so to create the links, don't I still need an id that I won't get with that statement?
Maybe I don't understand, but would it look like this?

:param data => [
    {"type": 'Civile', "article": "Art. 6"},
    {"type": 'Civile', "article": "Art. 2"},
    {"type": 'Civile', "article": "Art. 3"}
]
UNWIND $data as row
CREATE (n:Code {type: row.type, article: row.article})

Maybe it will help if I share the script I use to generate this data:

gawk '
 BEGIN {
   # Article headings such as "Art. 9", "Art. 9-1", "Art. L. 110-1"
   RS = "Art\\. ?[a-zA-Z]?\\.? [1-9]{0,3}-?[0-9]{0,3}-?[0-9]{0,3}"
   nmots = 0
   x = 0
   # Read the dictionary once: store each word and print its CREATE statement
   while ((getline line < "dico2etoile.txt") > 0) {
     n = split(line, words, "**\n")
     for (i = 1; i <= n; i++) {
       mots[++nmots] = words[i]
       print "CREATE (D" nmots ":Dictionnaire {mots:'\''" mots[nmots] "'\''})"
     }
   }
   close("dico2etoile.txt")
 }
 {
   # One Code node per record; RT holds the article heading matched by RS
   x++
   print "CREATE (C" x ":Code {type:'\''Civile'\'', article:'\''" RT "'\''})"
   # Link every dictionary word that occurs in the record text
   for (j = 1; j <= (nmots - 2); j++) {
     if (index($0, mots[j]) > 0) {
       print "CREATE (D" j ")-[:isIn ]->(C" x ")"
     }
   }
 }
' code_civil.txt > code_civile_DATA.txt

Thank you all again for your responses. If you have any ideas on how I can optimise this further, please let me know.

You are approaching this from a different angle: you parse a text file of the data and generate a sequence of Cypher statements, i.e. a Cypher script. This is not the typical approach.

Instead, you want the data in a CSV file that Cypher can read, and then execute the CREATE/MERGE operations from it. You can use APOC or plain Cypher to run the updates in batches. If you use a driver, you can also follow @michael.hunger's suggestion: batch the data and send it through the driver, one transaction per batch.
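On the question of ids raised earlier in the thread: with UNWIND there are no statement variables like `D408`, so the relationships can be created in a second pass that MATCHes both ends by their properties. A minimal Python sketch, assuming `mots` identifies a dictionary entry and (`type`, `article`) identifies an article; the sample data has repeated articles such as `Art. L. 110-1`, which this key would not distinguish. `build_link_rows` is a hypothetical helper that mirrors the substring test `index($0, mots[j])` of the gawk script.

```python
# Second pass: find both ends through their indexed properties, then link them.
CREATE_LINKS = """
UNWIND $rows AS row
MATCH (d:Dictionnaire {mots: row.mots})
MATCH (c:Code {type: row.type, article: row.article})
CREATE (d)-[:isIn]->(c)
"""


def build_link_rows(words, articles, code_type="Civile"):
    """Return one row per (word, article) pair where the word occurs in the text.

    `articles` maps an article number (e.g. "Art. 9") to its text.
    Empty words are skipped.
    """
    return [
        {"mots": word, "type": code_type, "article": article}
        for article, text in articles.items()
        for word in words
        if word and word in text
    ]
```

The nodes would be created first with plain `UNWIND ... CREATE` batches, then these rows sent as `$rows` with `CREATE_LINKS`, once the indexes on `Dictionnaire(mots)` and `Code(type, article)` exist.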

Instead of generating files of creation statements, generate a file of the data. Then use a driver, cypher-shell, or Neo4j Desktop to execute a query that loads the data and creates the nodes.
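Applied to the gawk pipeline, that means writing rows of data instead of statements. A sketch with Python's standard `csv` module, assuming the words and article texts have already been parsed (the `articles` mapping from article number to text is hypothetical). `articles.csv` uses a `type`,`article` header, read as `row.type`/`row.article` by the LOAD CSV query further down; `links.csv` is an additional, hypothetical file for the relationships.

```python
import csv
import os


def write_csv_files(words, articles, code_type="Civile", directory="."):
    """Write articles.csv and links.csv for LOAD CSV; return (articles, links) counts."""
    with open(os.path.join(directory, "articles.csv"), "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["type", "article"])
        for article in articles:
            writer.writerow([code_type, article])

    links = 0
    with open(os.path.join(directory, "links.csv"), "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["mots", "type", "article"])
        for article, text in articles.items():
            for word in words:
                # Same substring test as index($0, mots[j]) in the gawk script.
                if word and word in text:
                    writer.writerow([word, code_type, article])
                    links += 1
    return len(articles), links
```

Both files would then go into the Neo4j `import` directory, which is where `file:///` URLs in LOAD CSV resolve by default.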

Here is an example using the data you provided.

CSV file (articles.csv):

[Screenshot of the CSV file contents, 2023-06-06]

Here is a simple import query that batches the work in transactions of 10,000 rows each. Note that the ':auto' prefix is needed in Neo4j Desktop (Browser); from a driver, the query must run in an auto-commit transaction.

:auto
load csv with headers from "file:///articles.csv" as row
call {
    with row
    CREATE (n:Code {type: row.type, article: row.article})
} in transactions of 10000 rows
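From a driver, `CALL { ... } IN TRANSACTIONS` has to run in an auto-commit transaction, which is what `Session.run` does in the Python driver; inside an explicit or managed transaction the query is rejected. A sketch assuming placeholder connection details, existing `Dictionnaire` nodes, and a hypothetical `links.csv` with `mots`, `type`, `article` columns in the import directory next to `articles.csv`.

```python
LOAD_ARTICLES = """
LOAD CSV WITH HEADERS FROM 'file:///articles.csv' AS row
CALL {
    WITH row
    CREATE (:Code {type: row.type, article: row.article})
} IN TRANSACTIONS OF 10000 ROWS
"""

# Hypothetical second file: one row per (word, article) link.
LOAD_LINKS = """
LOAD CSV WITH HEADERS FROM 'file:///links.csv' AS row
CALL {
    WITH row
    MATCH (d:Dictionnaire {mots: row.mots})
    MATCH (c:Code {type: row.type, article: row.article})
    CREATE (d)-[:isIn]->(c)
} IN TRANSACTIONS OF 10000 ROWS
"""


def run_auto_commit(queries, uri="bolt://localhost:7687", auth=("neo4j", "password")):
    """Run each query in its own auto-commit transaction via Session.run."""
    from neo4j import GraphDatabase  # imported lazily; needs `pip install neo4j`

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            for query in queries:
                session.run(query).consume()
```

Running `run_auto_commit([LOAD_ARTICLES, LOAD_LINKS])` creates the articles before the links, so every MATCH in the second query can find its `Code` node.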

Result: