Hello!
I have 2 node labels, one of them has around 750K nodes, and another one with almost 50millions nodes. I have a stand-alone neo4j 3.5 running with 8 core and 96g memory. I am creating a relationship between these node labels using apoc.periodic.itererate
but this query is been running for too long.
Some of the node label properties get refreshed every day and I want to be able to refresh the relationship once the node properties are updated in both labels. The last time when I ran the query to update the relationship, it was running for more than 8 hrs and I had to kill it.
Do you guys have any solution for this to speed up this process? Should I be doing this in a different way? If so, would you please let me know the approach.(I'm new to Neo4j).
Thanks!!
call apoc.periodic.iterate("match (i:Inventory) , (p:Products) where p.p3 <> '0' and i.p1= p.p1 OR where p.p3 <> '0' and i.p2= p.p2 return i,p","merge (i) -[:PRODUCES]->(p)",{batchSize:20000, parallel:false})
This is how the query looks like and I have indexes on p1,p2 and p3.
- Negation is expensive
(good formatting isn't)
First, lest get this one running faster:
MATCH (i:Inventory), (p:Products)
WHERE p.p3 <> '0' AND i.p1= p.p1 OR # this one is gonna *kill* your cpy and ram
WHERE p.p3 <> '0' AND i.p2= p.p2
RETURN i,p
So... let's make sure we only need to do that negation once, and group the rest of the WHERE
clause more carefully
MATCH (p:Products)
WHERE p.p3 <> '0'
WITH p
MATCH (i :Inventory)
WHERE i.p1= p.p1 OR i.p2= p.p2
RETURN i,p
...which will get you to a faster solution:
CALL apoc.periodic.iterate("
MATCH (p:Products) WHERE p.p3 <> '0' WITH p
MATCH (i :Inventory) WHERE i.p1= p.p1 OR i.p2= p.p2
RETURN i,p
","
MERGE (i) -[:PRODUCES]->(p)
",{batchSize:20000, parallel:false})
However, these would probably be better handled when importing data, or changes. When you're updating the inventory, the additional Cypher to adjust these relationships wouldn't have to check the entire DB. I strongly advise taking a very close look at whatever is creating/updating this data, as it will be orders of magnitude more efficient to do it then, instead of after the fact.
5 Likes
Thank you @tony.chiboucas!! I'll try your query and see if it helps.
I'm sorry about the formatting. :)
I absolutely agree with you. I should be creating this relationship when the Product
nodes are created/updated batch-wise. I am using golang-neo4j
driver and session.WriteTransaction
to write/update 20000 Product nodes at a time using go-routine. I will just plug in this cypher query into that go-routine function.
Since I am still in the testing phase I wanted to give this query a try by running manually in the browser.
Thanks again!
Could you share the query/command that is creating/updating the nodes to begin with?
sure .. here is it is - I pasted only the data create portions
func InsertToNeo(data interface{}) error {
m := ArrayStructToMap(data)
session, err := NewSession(InfraObj.NeoDriver)
if err != nil {
return err
}
defer session.Close()
_, err = session.WriteTransaction(**execute**(m), neo4j.WithTxTimeout(300*time.Second))
if err != nil {
return err
}
return nil
}
func execute(data interface{}) func(tx neo4j.Transaction) (interface{}, error) {
return func(tx neo4j.Transaction) (interface{}, error) {
result, err := tx.Run(
`unwind $events as event MERGE (a:Products {key: event.key}) SET
a.p1 = event.p1,
a.p2 = event.p2,
a.key = event.key
.
.
.
.
.
a.p10 = event.p10`,
map[string]interface{}{"events": data})
if err != nil {
return nil, err
}
return nil, result.Err()
}
}
1 Like
I am still not sure how to plug these events
(data) to relationship creation query. If you know how to do it please let me know. That would save a ton of my time
Thanks in advance!
UNWIND $events as event
MERGE (a:Products {key: event.key})
SET
a.p1 = event.p1,
a.p2 = event.p2,
a.key = event.key,
...
# Just add at the end...
WITH a
MATCH (i :Inventory)
WHERE i.p1 = a.p1 OR i.p2 = a.p2
MERGE (i)-[:PRODUCES]->(a)
Note: This won't clean up old connections... so as your data changes, old i - p
relationships will persist. You'll want to come up with a strategy for when/how to clear old :PRODUCES
rels.
2 Likes
WOW, I honestly didn't think this is going to be this easy
Thanks so much!!!!
Yes, I am going to wipe the DB and reload the data from scratch.
Ohh .. a qq - do you think I still need to plug in the below query to my Products update
function? or just enough running this query in the create
function like above?
MATCH (i :Inventory)
WHERE i.p1 = a.p1 OR i.p2 = a.p2
MERGE (i)-[:PRODUCES]->(a)
That's the point of doing it during CREATE
, the relationships are created then. No extra work needed. Everything better, faster, stronger.
gotcha.. thanks a ton, Sir!
Have a great day!
I'm sorry to bother you again. 
It seems that it's still taking too long to create the relationship. Below is my code snippet
result, err := tx.Run(
`unwind $events as event MERGE (a:Products {key: event.key}) SET
a.p1 = event.p1,
a.p2 = event.p2,
a.key = event.key,
.
.
.
a.pN = event.pN
WITH a MATCH i:Inventory) WHERE i.p1 = a.p1 OR i.p2 = a.p2
MERGE (i) -[:PRODUCES]->(a)`,
map[string]interface{}{"events": data})
For testing purposes, I am only creating 25 a
records at a time and trying to create PRODUCES
rel with 750K i
records. It's taking forever.... (I just killed the program).
If you have any insight on this please let me know. Thank you!
- Did you create indices on i.p1 and i.p2?
CREATE INDEX inventory_p1 FOR (i:Inventory) ON (i.p1)
CREATE INDEX inventory_p2 FOR (i:Inventory) ON (i.p2)
-
You're missing a parenthesis: WITH a MATCH i:Inventory)
-
Isolate code steps, and get the profile of that Cypher.
result, err := tx.Run(
`UNWIND $events AS event CREATE (a:TempEvent) SET a = event`,
map[string]interface{}{"events": data})
Once that's run once, open a Neo4j Browser, and run the next step there, with EXPLAIN
. That'll show you where the bottleneck is.
EXPLAIN
MATCH (a:TempEvent)
MATCH (i:Inventory) WHERE i.p1 = a.p1 OR i.p2 = a.p2
MERGE (i) -[:PRODUCES]->(a)
Note: simplify
If you really are trying to copy all $event
properties onto a :Product
node, and an $event
never contains an object in a property, you can simply do the following:
result, err := tx.Run(
`UNWIND $events AS event
MERGE (a:Product {key: event.key}) SET a = event`,
map[string]interface{}{"events": data}
)
If an event does not meet that criteria, I'd give that Class a method which returns the structure you want in Neo that does meet that criteria.
Thank you for your time.
- It was created before running the program.
- Sorry it was copy/paste error here
- correct.. an
$event
does not contain an object and your unwind
just worked fine. I'll follow this for creating new nodes.
Here is the EXPLAIN
o/p. I didn't get much out of it. I am reading about EXPLAIN
now and meanwhile, I thought I would share the screenshot here. Thanks!
....i just tried below query in the browser without OR
condition.. it finished within 1ms.
MATCH (a:TempEvent)
MATCH (i:Inventory) WHERE i.p1 = a.p2
MERGE (i) -[:PRODUCES]->(a)
That profile looks good. It is using the index. These should be running pretty quick, even with such a large dataset.
... and how'd it do when you keep the OR
condition?
I had to kill it after 30 mins. I was tracking CPU usage as well while it was running. I did not see it's going beyond 13%
Okay, the for whatever reason, it's trying to load all the :Inventory
before applying the OR
to exclude them. Shouldn't be doing that, but that'll be a tough one to debug.
We can work around the problem by never using an OR, and just calling the MATCH ... MERGE
once for each property. The MERGE
prevents creating duplicate rels, so there will only ever be one or none [:PRODUCES]
for each i : p
pair.
result, err := tx.Run(`
UNWIND $events AS event
MERGE (a:Product {key: event.key})
SET a = event
WITH a MATCH (i:Inventory) WHERE i.p1 = a.p1 MERGE (i)-[:PRODUCES]-(a)
WITH a MATCH (i:Inventory) WHERE i.p2 = a.p2 MERGE (i)-[:PRODUCES]-(a)
`,
map[string]interface{}{"events": data}
)
...sorry there's not really a cleaner way to do that without writing your own Neo4j procedure.
Now at least we know why your process was taking forever... it was trying to load your entire :Inventory
into RAM.
Ok .. that makes sense. Thanks!!
(My earlier statement was incorrect so deleted it)
result, err := tx.Run(`
UNWIND $events AS event
MERGE (a:Product {key: event.key})
SET a = event
WITH a MATCH (i:Inventory) WHERE i.p1 = a.p1 MERGE (i)-[:PRODUCES]-(a)
WITH a MATCH (i:Inventory) WHERE i.p2 = a.p2 MERGE (i)-[:PRODUCES]-(a)
`,
map[string]interface{}{"events": data}
)
I tried this but it was still hanging and my program wouldn't complete. I killed it and interchanged the position of 2 queries in my code and voila it finished within a few ms.
result, err := tx.Run(`
UNWIND $events AS event
MERGE (a:Product {key: event.key})
SET a = event
WITH a MATCH (i:Inventory) WHERE i.p2 = a.p2 MERGE (i)-[:PRODUCES]-(a)
WITH a MATCH (i:Inventory) WHERE i.p1 = a.p1 MERGE (i)-[:PRODUCES]-(a)
`,
map[string]interface{}{"events": data}
)
I am going to test it against 5000 nodes now. Will see how it behaves.
Thanks!