Ideas to remove duplications in parameterized insert query

Hi all,

You can skip the setup and go straight to the problem; at the end is my idea of the query I'd like.

Background:
I have been using Neo4j to store and access a large multigraph of a particular form: nodes are relatively few (~30k), but there are many edges that have multiple properties, such as a timestamp.

In my use case, I need to query edges that fall into certain periods of time. For that reason, I have followed your advice and implemented edges as nodes in Neo4j. Typically, we have

(a:node)-[:onto]->(e:edge {weight:x, time:y})-[:onto]->(b:node)

This works great!

Additionally, I am now required to keep track of what I call "contextual ties". These are ties between edge-nodes and other normal nodes. I use these to further condition my queries, picking up only edge-nodes that also have a contextual tie to relevant nodes:

(a:node)-[:onto]->(e:edge {weight:x, time:y})-[:onto]->(b:node)
(e:edge {weight:x, time:y})-[:conto]->(c:context {weight:x})-[:conto]->(b:node)

A typical query looks like this:

MATCH p=(a:node)-[:onto]->(r:edge)-[:onto]->(b:node {idx:x})
WHERE ALL(r IN nodes(p) WHERE size([(r)-[:conto]->(:context)-[:conto]->(e:node) WHERE e.idx IN [k,z] | e]) > 0 OR (r:node))
RETURN ...

The second line was an idea from this board; it essentially means the query gives me a, r, b, but only if there is a "context" tie between edge-node r and at least one of the nodes in [k,z].

Again, this works great.

Problem:

For each insert operation, the contextual ties are always the same for every edge-node. The way I insert contextual ties duplicates them once per edge-node, so my database grows far larger than necessary.

Assume I have one focal node a, and two edges to b and c.
Let's say I have one contextual tie to node k.
What I'd like to have is

(a)-(e1)-(b)
(a)-(e2)-(c)
(e1)-(c0)-k
(e2)-(c0)-k (same path c0 to k)

What I get is

(a)-(e1)-(b)
(a)-(e2)-(c)
(e1)-(c1)-k
(e2)-(c2)-k

So there are two, instead of one, context nodes.
In reality, I'll have up to 10 context nodes. So you can imagine if I insert a large number of edges e, then I get n(e)*10 instead of 10 new context nodes.

I obviously need to rely heavily on parameters, since I am adding a lot of connections starting at some node a and adding up to 50 edges to different nodes b,c etc.

Here is my parameterized query, starting always at some ego node and adding alters:

Parameters (not in correct Cypher, sorry, you get the idea):

ego: "a" 
ties: [ {alter: "b", weight:0.5, time: 100}, {alter: "c", weight:0.7, time: 100} ....]
contexts: [{alter: "k", weight:0.3, time: 100}, {alter: "z", weight:0.4, time: 100} ... ]

Query

MATCH (a:node {idx: $ego}) 
WITH a UNWIND $ties as tie 
MATCH (b:node {idx: tie.alter}) 
CREATE (b)<-[:onto]-(r:edge {weight:tie.weight, time:tie.time})<-[:onto]-(a) 
WITH r UNWIND $contexts as con 
MATCH (q:node {idx: con.alter}) WITH r,q,con 
MERGE (r)-[:conto]->(c:context {weight:con.weight, time:con.time })-[:conto]->(q)

Again, this works great, but even though I use MERGE to create contextual ties, it adds a new c:context node for each e:edge node.
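A likely reason, stated as an assumption about this query rather than a confirmed diagnosis: MERGE matches or creates its whole pattern as one unit. Because r has just been created, the full path (r)-[:conto]->(c)-[:conto]->(q) can never already exist, so the entire path, including a new c, is created for every r. An untested sketch that merges the context part on its own and then attaches r separately:

```cypher
// Untested sketch: same parameters ($ego, $ties, $contexts) as the query above.
MATCH (a:node {idx: $ego})
UNWIND $ties AS tie
MATCH (b:node {idx: tie.alter})
CREATE (b)<-[:onto]-(r:edge {weight: tie.weight, time: tie.time})<-[:onto]-(a)
WITH r
UNWIND $contexts AS con
MATCH (q:node {idx: con.alter})
// Merge only the context-to-node part, so an existing c is found and reused
MERGE (c:context {weight: con.weight, time: con.time})-[:conto]->(q)
// r is always new, so its tie to c can be a plain CREATE
CREATE (r)-[:conto]->(c)
```

Note that this MERGE would also reuse a context node left over from an earlier insert if it has the same weight, time and target node, which may or may not be wanted.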

I cannot really come up with a way to get it working otherwise, while still relying on one collection of parameterized lists that I can pass from my application.

I really need to avoid using two queries and re-matching all edges. The first operation is only performant if it is a pure CREATE. Fortunately, edges never need to be merged; they are always unique. Performance of the above query IS good, but it fills the database exceptionally quickly.

I can start with either edges or contexts; however, WITH always seems to pass along only the current row of the UNWIND. Instead, I'd need a WITH that returns ALL edges (or all contexts) that I have created, and then use those to create the other ties.

Something like

MATCH (a:node {idx: $ego}) 
WITH a UNWIND $ties as tie 
MATCH (b:node {idx: tie.alter}) 
CREATE (b)<-[:onto]-(r:edge {weight:tie.weight, time:tie.time})<-[:onto]-(a) 
SAVE ALL r
UNWIND $contexts as con 
MATCH (q:node {idx: con.alter}) WITH r,q,con 
CREATE (c:context {weight:con.weight, time:con.time })-[:conto]->(q)
SAVE ALL c

FOR EACH PRODUCT (r,c)
CREATE (r)-[:conto]->(c)

I just can't get this to work without re-querying for r and c, which would be too slow.
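For reference, the pseudocode above maps fairly directly onto aggregation with collect(), which collapses all rows into one list that later clauses can unwind again. A minimal, untested sketch under that assumption (the alias `edges` is made up for illustration, and performance on large batches is not verified):

```cypher
// Untested sketch: same parameters ($ego, $ties, $contexts) as above.
MATCH (a:node {idx: $ego})
UNWIND $ties AS tie
MATCH (b:node {idx: tie.alter})
CREATE (b)<-[:onto]-(r:edge {weight: tie.weight, time: tie.time})<-[:onto]-(a)
// "SAVE ALL r": aggregate every created edge-node into a single row
WITH collect(r) AS edges
UNWIND $contexts AS con
MATCH (q:node {idx: con.alter})
// One context node per entry in $contexts, not one per edge-node
CREATE (c:context {weight: con.weight, time: con.time})-[:conto]->(q)
WITH edges, c
// "FOR EACH PRODUCT (r,c)": expand the list again for each context
UNWIND edges AS r
CREATE (r)-[:conto]->(c)
```

Each context node is created once per entry in $contexts, and the final UNWIND yields every (r, c) combination without matching the edge-nodes a second time.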

There is probably a simple solution for this. Would you have any ideas?

Many thanks!
