Query Optimisation Help - Aggregate count of relationships between two related nodes

Hi, I have a bit of a challenge optimising a query to run more efficiently, as it currently takes too long.

Just some background: the full database is 90m nodes and 700m relationships, although this query is being tested on a much smaller sample database of 1m nodes and 20m relationships, and the issue is the same.

What I am trying to achieve is to create an aggregate count of a specific relationship between two already related nodes. This is actually working; the issue is that it takes an extremely long time even on the smaller sample database.

So for example, a Person has opened a Campaign 5 times, so there are 5 :OPENED relationships between that Person and that Campaign. The objective is to create a new summary relationship, TOTAL_OPENED, that carries this total count as a property.

// Starting with this
(p:Person)-[:OPENED]->(c:Campaign) (x5 times on the same Person and Campaign node pairs)

// Ending with this
(p:Person)-[newRel:TOTAL_OPENED]->(c:Campaign) SET newRel.total = 5
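
If it helps to reproduce the shape, here is a minimal setup for one such node pair (the emailId and campaignId property names are taken from the query further down; the values are made up):

// Hypothetical sample data: one Person who opened one Campaign 5 times
CREATE (p:Person {emailId: 'alice@example.com'})
CREATE (c:Campaign {campaignId: 'camp-001'})
WITH p, c
UNWIND range(1, 5) AS i
CREATE (p)-[:OPENED]->(c);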

Here is the shape of the model I am starting with (note that this is just one sample node pair being returned - the query actually creates a new relationship for 1m node pairs):

[image: example-relationship1]

Here is how it ends up: (note the new relationship TOTAL_OPENED created)

[image: example-relationship2]

Here is the query being run:

PROFILE MATCH (p:Person)-[rel:OPENED]->(c:Campaign)
WHERE NOT (p)-[:TOTAL_OPENED]->(c)  // skip pairs already summarised
WITH p, rel, c LIMIT 2000           // cap each batch at 2,000 relationship rows
WITH COUNT(c) AS total, p, c        // grouped by (p, c); one row per rel, so this is the :OPENED count
CREATE (p)-[totalRel:TOTAL_OPENED]->(c)
SET totalRel.total = total
RETURN p.emailId, total, c.campaignId

This query is being run with apoc.periodic.commit in batches of 2,000.
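
For completeness, the call is wired up roughly like this (a sketch reconstructed from the query above; the RETURN is changed to count(*) because apoc.periodic.commit keeps re-running the statement until it returns 0, and $limit is the batch size):

CALL apoc.periodic.commit(
  "MATCH (p:Person)-[rel:OPENED]->(c:Campaign)
   WHERE NOT (p)-[:TOTAL_OPENED]->(c)
   WITH p, rel, c LIMIT $limit
   WITH COUNT(c) AS total, p, c
   CREATE (p)-[totalRel:TOTAL_OPENED]->(c)
   SET totalRel.total = total
   RETURN count(*)",
  {limit: 2000})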

I have also tried putting the COUNT(c) AS total before the LIMIT; however, that seems to increase the number of DB hits, not reduce them.
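
That variant was roughly this (the only difference being where the LIMIT sits):

MATCH (p:Person)-[rel:OPENED]->(c:Campaign)
WHERE NOT (p)-[:TOTAL_OPENED]->(c)
WITH COUNT(c) AS total, p, c
LIMIT 2000
CREATE (p)-[totalRel:TOTAL_OPENED]->(c)
SET totalRel.total = total
RETURN p.emailId, total, c.campaignId

Thinking about it, the extra DB hits make sense: the aggregation has to consume every matching row before the LIMIT can apply, so each batch pays for the full scan.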

Here is a profile summary of the query:

[image: profile summary]

Using apoc.periodic.commit in 2k batches, it took 1 hour 6 minutes to actually do this on the 1m sample database.

Appreciate any help and all suggestions, thanks.

Instead of creating a new relationship to accumulate the number of visits, incrementing a count property on your relationship will make the whole thing drastically easier.

Adding a constraint to your Person node will help improve the update query's performance. If you have a lot of Person nodes, the summary will still take a while, as you have to anchor on each person and read a property value for each relationship, but it will still be a lot faster.
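
For example, something along these lines (a sketch; the 3.x constraint syntax, the r.count property name, and the personId/campaignId keys are assumptions):

// Unique constraint so the Person lookup becomes an index seek
CREATE CONSTRAINT ON (p:Person) ASSERT p.personId IS UNIQUE;

// On each new open event, bump the counter instead of adding another relationship
MATCH (p:Person {personId: $personId}), (c:Campaign {campaignId: $campaignId})
MERGE (p)-[r:OPENED]->(c)
  ON CREATE SET r.count = 1
  ON MATCH SET r.count = r.count + 1;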

Thanks Gabriel - I am actually doing this already for future updates. Hence the newRel.total = 5 part; it will be updated using a lookup against a constraint on person.personId.

However, the challenge is that the data already loaded has these original relationships created before this single new relationship needed to be introduced.

Am I just going to have to bite the bullet and run this query until it completes? I guess so. In that case, do you know the best practice for setting the apoc.periodic.commit batch size? I am currently using batches of 2,000.
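
For comparison, I understand the same backfill could also be expressed with apoc.periodic.iterate, where the batch size is an explicit parameter (an untested sketch):

CALL apoc.periodic.iterate(
  "MATCH (p:Person)-[rel:OPENED]->(c:Campaign)
   WITH p, c, count(rel) AS total
   RETURN p, c, total",
  "CREATE (p)-[r:TOTAL_OPENED]->(c) SET r.total = total",
  {batchSize: 2000, parallel: false})

The first statement runs once and streams the aggregated pairs, while the second commits the writes in batches of 2,000, so the WHERE NOT anti-pattern check would not need to be re-evaluated on every batch.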