Slow edge creation

Hello
Looking for some guidance, please.
I believe I am doing all the right things, but apparently not.
I have a table with network start addresses and end addresses in it, and code that checks whether one range of addresses lies within another (nested ranges).

Row Num  Network_IP_X     Network_Broadcast_X  Range
1        192.168.010.000  192.168.010.255      192.168.010.000 / 192.168.010.255
2        192.168.010.000  192.168.010.127      192.168.010.000 / 192.168.010.127
3        192.168.010.032  192.168.010.063      192.168.010.032 / 192.168.010.063
4        192.168.020.000  192.168.020.255      192.168.020.000 / 192.168.020.255

Row 3 is nested within row 2, and row 2 is nested within row 1.
I have indexes defined on all three IP address fields, and the following code works fine on small volumes.

PROFILE
MATCH (V:Vlans)
MATCH (VR:Vlans)
WHERE V.Range <> VR.Range
  AND V.Network_Broadcast_X <= VR.Network_Broadcast_X
  AND V.Network_IP_X >= VR.Network_IP_X
CREATE (V)-[:Vlans2Vlans__Aggregates {Aggregate: ['Part of']}]->(VR);
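For readers who want to check the containment logic outside the database, here is a minimal Python sketch of the same predicate (the `rows` list and `nested_pairs` name are mine, not from the original post). Because the addresses are zero-padded dotted strings, plain lexicographic comparison orders them the same way as numeric comparison, which is what makes the Cypher predicates above work:

```python
# Sample ranges from the table above, as (network, broadcast) zero-padded strings.
# Zero-padding makes lexicographic string order match numeric IP order.
rows = [
    ("192.168.010.000", "192.168.010.255"),  # row 1
    ("192.168.010.000", "192.168.010.127"),  # row 2
    ("192.168.010.032", "192.168.010.063"),  # row 3
    ("192.168.020.000", "192.168.020.255"),  # row 4
]

def nested_pairs(rows):
    """Return (inner, outer) 1-based row-number pairs where inner is
    contained in outer, mirroring the WHERE clause of the Cypher query."""
    pairs = []
    for i, (v_ip, v_bc) in enumerate(rows):
        for j, (vr_ip, vr_bc) in enumerate(rows):
            if (v_ip, v_bc) == (vr_ip, vr_bc):
                continue  # V.Range <> VR.Range
            if v_bc <= vr_bc and v_ip >= vr_ip:
                pairs.append((i + 1, j + 1))
    return pairs

print(nested_pairs(rows))  # → [(2, 1), (3, 1), (3, 2)]
```

This reproduces the nesting described above: row 2 inside row 1, and row 3 inside both rows 1 and 2, while row 4 is not nested in anything.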

However, with larger data volumes it takes a while to execute: with 56,000 nodes it takes about 14 minutes to create 195,000 relationships (there is a lot of nesting when dealing with global routing tables).
The profile results indicate that I am using indexes rather than table scans, but I need something quicker, as I am using this type of code/methodology in other areas with larger data sets.
[profile screenshot]

thanks in advance
Ian

Not sure if the Network_IP_X index is the most selective?
Do you have an index on Network_Broadcast_X too?
You can force index usage with USING INDEX VR:Vlans(Network_Broadcast_X).

How many relationships do you expect to be created out of the cross product of the 56,000 Vlans nodes (3.1 bn pairs)?
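That 3.1 bn figure is just the ordered cross product of the node set with itself, minus the self-pairs:

```python
n = 56_000
pairs = n * (n - 1)  # ordered pairs, excluding each node paired with itself
print(pairs)         # 3,135,944,000 ≈ 3.1 billion predicate evaluations
```

Every one of those pairs has to be tested against the WHERE clause, which is why the query cost grows quadratically with the node count.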

In general you will also want to batch those updates, e.g. per 1,000 V nodes (or 10k, depending on how many relationships are created per V node):

profile MATCH (V:Vlans)
CALL { WITH V
  MATCH (VR:Vlans)
  WHERE V.Range <> VR.Range
    AND V.Network_Broadcast_X <= VR.Network_Broadcast_X
    AND V.Network_IP_X >= VR.Network_IP_X
  CREATE (V)-[:Vlans2Vlans__Aggregates {Aggregate: ['Part of']}]->(VR)
} IN TRANSACTIONS OF 1000 ROWS;
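The batching idea itself is language-independent: commit in fixed-size chunks so per-transaction state stays bounded. A minimal sketch of the chunking in Python (names are illustrative, not from the thread):

```python
def batches(items, size):
    """Yield successive fixed-size chunks of a list; the last chunk
    may be smaller. Each chunk would map to one committed transaction."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

work = list(range(10))
print(list(batches(work, 4)))  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Server-side, `IN TRANSACTIONS OF 1000 ROWS` does exactly this kind of chunked commit for the rows flowing into the subquery.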

Hi,
and thanks for the quick response.
Yep, I have all three indexed as part of the data creation:
DROP INDEX I_InfoBlox__Range IF EXISTS;
CREATE INDEX I_InfoBlox__Range FOR (INFB:InfoBlox) ON (INFB.Range);

DROP INDEX I_InfoBlox__Network_IP_X IF EXISTS;
CREATE INDEX I_InfoBlox__Network_IP_X FOR (INFB:InfoBlox) ON (INFB.Network_IP_X);

DROP INDEX I_InfoBlox__Network_Broadcast_X IF EXISTS;
CREATE INDEX I_InfoBlox__Network_Broadcast_X FOR (INFB:InfoBlox) ON (INFB.Network_Broadcast_X);

LOAD CSV WITH HEADERS FROM .................. etc etc

I was working on the assumption that the initial writes might take a hit on creation, but that reads would be faster thereafter.
The eventual cross-product size is hard to say, but I wouldn't expect it (in this case at least) to be much higher than the 392,000 it creates at the moment.

Do I need to force the use of an index? The profile suggested one was already being used, and I am using all three in effect (I think :) ).

I'll add the transactions clause on the next run; will it increase performance, or just reduce the memory overhead?

I don't think this is ever going to be fast. In effect, you are forming a Cartesian product of all your Vlans nodes and then filtering each pair to see if it matches. As @michael.hunger pointed out, there are 3.1 billion pairs in your 56,000-node scenario, and the work grows as the square of your node population.

If you look at the full plan, it shows that the query takes each node and then finds the other nodes that satisfy one of your predicates; it uses one index for that. It then has to row-filter each of those candidate nodes against the other two predicates to get the result for that node.
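That plan shape, one index seek followed by a per-row filter, can be mimicked in a few lines of Python (a sketch with made-up names, not the actual planner): a sorted list stands in for the index on the broadcast address, narrowing on a single predicate, and the remaining predicates are applied as a filter over the candidates:

```python
import bisect

# (start, end) ranges as zero-padded strings, so lexicographic order works
ranges = [
    ("192.168.010.000", "192.168.010.255"),
    ("192.168.010.000", "192.168.010.127"),
    ("192.168.010.032", "192.168.010.063"),
    ("192.168.020.000", "192.168.020.255"),
]

# "Index" on the end address: an outer range VR must satisfy VR.end >= V.end,
# which a single seek into a sorted structure can answer.
by_end = sorted(range(len(ranges)), key=lambda i: ranges[i][1])
ends = [ranges[i][1] for i in by_end]

def outers_of(v):
    """Indexes of ranges containing v: index seek on end, row filter on the rest."""
    v_start, v_end = v
    lo = bisect.bisect_left(ends, v_end)      # index seek: first end >= v's end
    result = []
    for pos in range(lo, len(ends)):          # candidates from the "index"
        i = by_end[pos]
        r_start, r_end = ranges[i]
        # row filter: the remaining predicates, applied per candidate
        if (r_start, r_end) != v and r_start <= v_start:
            result.append(i)
    return sorted(result)

print(outers_of(("192.168.010.032", "192.168.010.063")))  # → [0, 1]
```

The seek is cheap, but the per-candidate filter still runs once per qualifying row, which is where the quadratic cost reappears when the ranges overlap heavily.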

I lived in Hope. :slight_smile:
I knew it was Cartesian and swapped the `,` for a MATCH to stop the interface complaining, but didn't see any improvement in DB hits etc.
Guess I'll set it off running at the start of lunch in future.
Many thanks for your thoughts.

Please don't forget @michael.hunger's suggestion to batch in transactions.

Tried two variants of batching:

:auto MATCH (V:Vlans)
CALL { WITH V
.............
} IN TRANSACTIONS OF 10000 ROWS;

and again with 5000.
Time jumped from ~14 minutes to ~1 hour for both.
Fortunately I am memory rich, so I will run without batching for now.
If I need to delete, that runs in less than 1 minute.