I have Person nodes with basic string fields such (firstName,lastName, fatherName,motherName) and trying to link nodes based on those fields.
A simple query where I compare motherName to concatenation of first name and last name such as
match(p1:Person) match (p2:Person) where p1.motherName=p2.firstName+' '+ p2.lastName return p1,p2 limit 500
takes a really long time , (removing ' ' from the concatenation does not make a difference )
while if comparing exact fields such as
match(p1:Person) match (p2:Person) where p1.motherName=p2.firstName return p1,p2 limit 500
only takes a few seconds.
I have noticed something peculiar regarding transaction memory which is that in the first query the estimatedUsedHeapMemory is always 2097152 and currentQueryAllocatedBytes is 64, but I see the database is consuming around 7.5 GB of memory.
When running the 2nd query, the numbers for memory used for the heap and query are much bigger
I had successfully ran a query to link persons and fathers, that matches on exact fields, which took 2.5 hours. while the query for the mothers which needs to compare concatenated strings was still running after 9 hours with no result.
Query for father linking, which was successful.
CALL apoc.periodic.iterate(
"match (p1:Person) match(p2:Person) where p1.fatherName=p2.firstName and p1.lastName=p2.lastName and p1.registryTown=p2.registryTown and p1.registryKadaa=p2.registryKadaa and p1.registryMouhafaza=p2.registryMouhafaza and p1.registryNumber=p2.registryNumber and p1.dateOfBirth>p2.dateOfBirth return p1,p2",
"MERGE (p1)-[:CHILD_OF {parentRelationship:'FATHER'}]->(p2)",
{batchSize:5000}
)
I have 4 million nodes, my db size is 3.14 gb , these are my memory settings
- NEO4J_server_memory_heap_max__size=5G
- NEO4J_server_memory_heap_initial__size=5G
- NEO4J_server_memory_pagecache_size=7G