Slow edge creation

Hello
Looking for some guidance, please.
I believe I am doing all the right things, but apparently not.
I have a table with network start addresses and end addresses in it, and code that checks whether one range of addresses lies within another (nested ranges).

Row Num  Network_IP_X     Network_Broadcast_X  Range
1        192.168.010.000  192.168.010.255      192.168.010.000 / 192.168.010.255
2        192.168.010.000  192.168.010.127      192.168.010.000 / 192.168.010.127
3        192.168.010.032  192.168.010.063      192.168.010.032 / 192.168.010.063
4        192.168.020.000  192.168.020.255      192.168.020.000 / 192.168.020.255

Row 3 is nested within row 2, and row 2 is nested within row 1.
I have indexes defined on all three IP address fields, and the following code works fine on small volumes.

PROFILE
MATCH (V:Vlans)
MATCH (VR:Vlans)
WHERE V.Range <> VR.Range
  AND V.Network_Broadcast_X <= VR.Network_Broadcast_X
  AND V.Network_IP_X >= VR.Network_IP_X
CREATE (V)-[:Vlans2Vlans__Aggregates {Aggregate: ['Part of']}]->(VR);
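For readers who want to check the containment logic outside the database, here is a minimal Python sketch of the same predicate (the `rows` list and `nested_pairs` name are mine, not from the original post). Because the addresses are zero-padded dotted strings, plain lexicographic comparison orders them the same way as numeric comparison, which is what makes the Cypher predicates above work:

```python
# Sample ranges from the table above, as (network, broadcast) zero-padded strings.
# Zero-padding makes lexicographic string order match numeric IP order.
rows = [
    ("192.168.010.000", "192.168.010.255"),  # row 1
    ("192.168.010.000", "192.168.010.127"),  # row 2
    ("192.168.010.032", "192.168.010.063"),  # row 3
    ("192.168.020.000", "192.168.020.255"),  # row 4
]

def nested_pairs(rows):
    """Return (inner, outer) 1-based row-number pairs where inner is
    contained in outer, mirroring the WHERE clause of the Cypher query."""
    pairs = []
    for i, (v_ip, v_bc) in enumerate(rows):
        for j, (vr_ip, vr_bc) in enumerate(rows):
            if (v_ip, v_bc) == (vr_ip, vr_bc):
                continue  # V.Range <> VR.Range
            if v_bc <= vr_bc and v_ip >= vr_ip:
                pairs.append((i + 1, j + 1))
    return pairs

print(nested_pairs(rows))  # → [(2, 1), (3, 1), (3, 2)]
```

This reproduces the nesting described above: row 2 inside row 1, and row 3 inside both rows 1 and 2, while row 4 is not nested in anything.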

However, with larger data volumes it takes a while to execute: with 56,000 nodes it takes about 14 minutes to create 195,000 relationships (there is a lot of nesting when dealing with global routing tables).
The profile results indicate that I am using indexes rather than table scans, but I need something quicker, as I am using this type of code/methodology in other areas with larger data sets.
[profile screenshot]

thanks in advance
Ian

Not sure if the Network_IP_X index is the most selective?
Do you have an index on Network_Broadcast_X too?
You can force index usage with USING INDEX VR:Vlans(Network_Broadcast_X).

How many relationships do you expect to be created out of the cross product of the 56,000 Vlans nodes (3.1 bn pairs)?
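That 3.1 bn figure is just the ordered cross product of the node set with itself, minus the self-pairs:

```python
n = 56_000
pairs = n * (n - 1)  # ordered pairs, excluding each node paired with itself
print(pairs)         # 3,135,944,000 ≈ 3.1 billion predicate evaluations
```

Every one of those pairs has to be tested against the WHERE clause, which is why the query cost grows quadratically with the node count.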

In general you will also want to batch those updates, e.g. per 1,000 V nodes (or 10k, depending on how many relationships are created per V node):

profile MATCH (V:Vlans)
CALL { WITH V
  MATCH (VR:Vlans)
  WHERE V.Range <> VR.Range
    AND V.Network_Broadcast_X <= VR.Network_Broadcast_X
    AND V.Network_IP_X >= VR.Network_IP_X
  CREATE (V)-[:Vlans2Vlans__Aggregates {Aggregate: ['Part of']}]->(VR)
} IN TRANSACTIONS OF 1000 ROWS;
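The batching idea itself is language-independent: commit in fixed-size chunks so per-transaction state stays bounded. A minimal sketch of the chunking in Python (names are illustrative, not from the thread):

```python
def batches(items, size):
    """Yield successive fixed-size chunks of a list; the last chunk
    may be smaller. Each chunk would map to one committed transaction."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

work = list(range(10))
print(list(batches(work, 4)))  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Server-side, `IN TRANSACTIONS OF 1000 ROWS` does exactly this kind of chunked commit for the rows flowing into the subquery.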

Hi,
and thanks for the quick response.
Yep, I have all three indexed as part of the data creation:
DROP INDEX I_InfoBlox__Range IF EXISTS;
CREATE INDEX I_InfoBlox__Range FOR (INFB:InfoBlox) ON (INFB.Range);

DROP INDEX I_InfoBlox__Network_IP_X IF EXISTS;
CREATE INDEX I_InfoBlox__Network_IP_X FOR (INFB:InfoBlox) ON (INFB.Network_IP_X);

DROP INDEX I_InfoBlox__Network_Broadcast_X IF EXISTS;
CREATE INDEX I_InfoBlox__Network_Broadcast_X FOR (INFB:InfoBlox) ON (INFB.Network_Broadcast_X);

LOAD CSV WITH HEADERS FROM .................. etc etc

I was working on the assumption that the initial writes might take a hit on creation, but that reads would be faster thereafter.
The eventual cross-product size is hard to say, but I wouldn't expect it (in this case at least) to be much higher than the 392,000 it creates at the moment.

Do I need to force the use of an index? The profile suggested one was already being used, and I am using all three in effect (I think :) ).

I'll add the transactions clause on the next run; will it increase performance, or just reduce the memory overhead?

I don't think this is ever going to be fast. In effect, you are forming a Cartesian product of all your Vlans nodes and then filtering each pair to see if it matches. As @michael.hunger pointed out, there are 3.1 billion pairs in your 56,000-node scenario, and the work grows as the square of your node population.

If you look at the full plan, it shows that the query takes each node and then finds the other nodes that satisfy one of your predicates; it uses one index for that. It then has to row-filter each of those candidate nodes against the other two predicates to get the result for that node.
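That plan shape, one index seek followed by a per-row filter, can be mimicked in a few lines of Python (a sketch with made-up names, not the actual planner): a sorted list stands in for the index on the broadcast address, narrowing on a single predicate, and the remaining predicates are applied as a filter over the candidates:

```python
import bisect

# (start, end) ranges as zero-padded strings, so lexicographic order works
ranges = [
    ("192.168.010.000", "192.168.010.255"),
    ("192.168.010.000", "192.168.010.127"),
    ("192.168.010.032", "192.168.010.063"),
    ("192.168.020.000", "192.168.020.255"),
]

# "Index" on the end address: an outer range VR must satisfy VR.end >= V.end,
# which a single seek into a sorted structure can answer.
by_end = sorted(range(len(ranges)), key=lambda i: ranges[i][1])
ends = [ranges[i][1] for i in by_end]

def outers_of(v):
    """Indexes of ranges containing v: index seek on end, row filter on the rest."""
    v_start, v_end = v
    lo = bisect.bisect_left(ends, v_end)      # index seek: first end >= v's end
    result = []
    for pos in range(lo, len(ends)):          # candidates from the "index"
        i = by_end[pos]
        r_start, r_end = ranges[i]
        # row filter: the remaining predicates, applied per candidate
        if (r_start, r_end) != v and r_start <= v_start:
            result.append(i)
    return sorted(result)

print(outers_of(("192.168.010.032", "192.168.010.063")))  # → [0, 1]
```

The seek is cheap, but the per-candidate filter still runs once per qualifying row, which is where the quadratic cost reappears when the ranges overlap heavily.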

I lived in Hope. :slight_smile:
I knew it was Cartesian and swapped the `,` for a MATCH to stop the interface complaining, but didn't see any improvement in DB hits etc.
Guess I'll set it off running at the start of lunch in future.
Many thanks for your thoughts.

Please don't forget @michael.hunger's suggestion to batch in transactions.

Tried two variants of batching:

:auto MATCH (V:Vlans)
CALL { WITH V
.............
} IN TRANSACTIONS OF 10000 ROWS;

and again with 5000.
Time jumped from ~14 minutes to ~1 hour for both.
Fortunately I am memory rich, so I will run without batching for now.
If I need to delete, that runs in less than 1 minute.