Hi,
I am trying to reduce the running time of the following query. I have made two attempts so far:
Attempt 1:
CALL apoc.periodic.iterate(
"MATCH path=((a:alias) -- (c1:citation) -[p1]-> (t:BIOTERM) <-[p2]- (c2:citation) -- (b:alias))
WHERE id(a) < id(b) AND id(c1) <> id(c2)
WITH a, b, c1.CITATION as CITATION1, c2.CITATION as CITATION2, p1, p2, t, 2 as precision
WITH a, b, CITATION1, CITATION2, p1, p2, t, 10^precision as factor
WITH a, b, CITATION1, CITATION2, t, round(factor* (1/(2+p1.weight+p2.weight))) / factor as weight
RETURN a, b, CITATION1, CITATION2, t, weight",
"CREATE (a)-[e:through_topic]->(b)
SET e.weight= weight
SET e.topic = t.MESH_TERM
SET e.citation1 = CITATION1
SET e.citation2 = CITATION2", {batchSize:2500})
YIELD batches, total, errorMessages
Attempt 2:
CALL apoc.periodic.iterate(
"MATCH path=((c1:citation) -[p1]-> (t:BIOTERM) <-[p2]- (c2:citation))
WHERE id(c1) < id(c2)
WITH c1, c2, c1.CITATION as CITATION1, c2.CITATION as CITATION2, p1, p2, t, 2 as precision
MATCH (a:alias) --> (c1)
WITH a, c2, CITATION1, CITATION2, p1, p2, t, 10^precision as factor
MATCH (b:alias) --> (c2)
WHERE id(a) <> id(b)
WITH a, b, CITATION1, CITATION2, t, round(factor* (1/(2+p1.weight+p2.weight))) / factor as weight
RETURN a, b, CITATION1, CITATION2, t, weight",
"CREATE (a)-[e:through_topic]->(b)
SET e.weight= weight
SET e.topic = t.MESH_TERM
SET e.citation1 = CITATION1
SET e.citation2 = CITATION2", {batchSize:5000})
YIELD batches, total, errorMessages
I feel I could do much better if the second and third MATCH statements in Attempt 2 could be run in parallel. Is there a more elegant way to do this?
Thanks,
Lavanya