Create relationship when MATCH doesn't find a node in a bulk operation using UNWIND

I am using the nodejs package to run cypher queries against neo4j. I find the UNWIND option very handy to bulk-insert data.
Some of my nodes can have multiple relationships, but since I am doing a bulk insertion, some relationship IDs in the source data are null and then the cypher query fails for those objects.

Here is what I have done:

Some entries to UNWIND:

var entries  =  [
        {
          eid: '12313',
          ident: 'ID1',
          v: 123123,
          ref1_eid: '1492082',
          ref2_eid: '285654'
        },
        {
          eid: '22325',
          ident: 'ID2',
          v: 551,
          ref1_eid: null,
          ref2_eid: '1490392825'
        },
        {
          eid: '1163',
          ident: 'ID3',
          v: 9985,
          ref1_eid: '12466',
          ref2_eid: null
        }
      ];

The cypher query looks like this:

UNWIND $entries as res 
WITH res as res
MERGE (me:Me {eid: res.eid, ident: res.ident, v: res.v})
WITH me, res MATCH (a:TypeA {eid: res.ref1_eid})
MERGE (me)-[:GOES_TO]->(a)-[:GOES_FROM]->(me)
WITH me, res MATCH (b:TypeB {eid: res.ref2_eid})
MERGE (me)-[:GOES_TO]->(b)-[:GOES_FROM]->(me);

I run the command like this then

await session.run(
          query, {entries: entries}
        )

The first object is entered completely with the two relationships. The third one has the first relationship, but not the second, since that is null (which is expected). But the second object does not contain any relationships, since ref1_eid is null and the execution stops after the unsuccessful MATCH.

How could I achieve this bulk insert, where only those relationships are skipped, where nodes can't be found, but all others are added?

This will avoid the query terminating when match on TypeA is not found.

UNWIND $entries as res 
MERGE (me:Me {eid: res.eid, ident: res.ident, v: res.v})
WITH me, res
call {
    WITH me, res
    MATCH (a:TypeA {eid: res.ref1_eid})
    MERGE (me)-[:GOES_TO]->(a)-[:GOES_FROM]->(me)
}
call {
    WITH me, res 
    MATCH (b:TypeB {eid: res.ref2_eid})
    MERGE (me)-[:GOES_TO]->(b)-[:GOES_FROM]->(me)
}

This is a solution if you don't want to match on TypeA or TypeB nodes when their corresponding 'eid' value is null.

UNWIND $entries as res 
MERGE (me:Me {eid: res.eid, ident: res.ident, v: res.v})
WITH me, res
call {
    WITH me, res
    WITH me, res
    WHERE res.ref1_eid is not null
    MATCH (a:TypeA {eid: res.ref1_eid})
    MERGE (me)-[:GOES_TO]->(a)-[:GOES_FROM]->(me)
}
call {
    WITH me, res 
    WITH me, res
    WHERE res.ref2_eid is not null
    MATCH (b:TypeB {eid: res.ref2_eid})
    MERGE (me)-[:GOES_TO]->(b)-[:GOES_FROM]->(me)
}

BTW- I switch the value of 'r' with the value of 'b' in the last merge, as I thought it was a mistake.

1 Like

Thank you so much @glilienfield!
That works as expected.
I think I have to get more familiar with the WITHstatement and why I need to repeat it so many times...

P.S.: I also fixed the r and b values as you said

1 Like

The “with” is used in almost any complex query to derive intermediate results or to aggregate data, or filter data between phases. Sometimes it is required between clauses, such as a “call” after a “merge”. It’s very good to understand.

Thanks!

Now, one last step to success for me: as I am using a large data set for the import with quite some parameters per node, I can't guarantee, that all parameters are not null.
That fact results in the full command not working.

I read about COALESCE to set a default value in case of null, but I would rather nor write the parameter at all...

I would be happy to help if you send the latest query and more realistic date.

In the meantime, or if you want to try yourself, I would look at using the apoc.map.clean function to remove the null values from the node properties that you want to set. There a few ways of creating the map of properties that you want to set for a node from your 'row' data. You can create a node's properties as a literal map, or using map projection.

Here is a simple example showing how to remove the null values from a literal map, while leaving all non-null values in the map:

with {a:"xyz", b:100, c:null, d:1003.54, e:null} as data
return apoc.map.clean(data, [], []) as nullsRemoved

Here is an example of also specifying a key to remove from the map:

with {a:"xyz", b:100, c:null, d:1003.54, e:null} as data
return apoc.map.clean(data, ["d"], []) as specifyKeysToRemoves

Here is an example of using map projection to create a new map of the values you want to extract from another map row:

with {a:"xyz", b:100, c:null, d:1003.54, e:null} as data
with data{.a, .b, .c} as map
return apoc.map.clean(map, [], []) as nodeProperties

Finally, here is an example of creating the node properties map from another map (such as your row data) and merging a node based on its unique id and then setting the node's properties:

with {id:0, a:"xyz", b:100, c:null, d:1003.54, e:null} as row
with apoc.map.clean(row{.id, .a, .b, .c}, [], []) as nodeProperties
merge(n:X{id:0})
set n = nodeProperties
return properties(n)

Map projections also allow you add keys that are not in the original map being projected. Here is an example where newMap is created from projecting all the key/value pairs from 'row' and adding a new key/value pair:

with {id:0, a:"xyz", b:100, d:1003.54} as row, 139 as keyValue
return row{.*, key:keyValue} as newMap

@glilienfield
The problem is not so much removing null/non existing parameters from the parameter list: The query fails, if a parameter is null or not existing.

So I found two possible ways to get this working:

  • COALESCE
UNWIND $entries as res 
MERGE (me:Me {eid: COALESCE(res.eid, ''), ident: COALESCE(res.ident, ''), v: COALESCE(res.v, '')})
WITH me, res
call {
    WITH me, res
    WITH me, res
    WHERE res.ref1_eid is not null
    MATCH (a:TypeA {eid: res.ref1_eid})
    MERGE (me)-[:GOES_TO]->(a)-[:GOES_FROM]->(me)
}
call {
    WITH me, res 
    WITH me, res
    WHERE res.ref2_eid is not null
    MATCH (b:TypeB {eid: res.ref2_eid})
    MERGE (me)-[:GOES_TO]->(b)-[:GOES_FROM]->(me)
}

That way, the null values are still entered with some defined default value, which isn't that nice...

  • SET
UNWIND $entries as res 
MERGE (me:Me {eid: res.eid})
SET me.ident = res.ident
SET me.v = res.v
WITH me, res
call {
    WITH me, res
    WITH me, res
    WHERE res.ref1_eid is not null
    MATCH (a:TypeA {eid: res.ref1_eid})
    MERGE (me)-[:GOES_TO]->(a)-[:GOES_FROM]->(me)
}
call {
    WITH me, res 
    WITH me, res
    WHERE res.ref2_eid is not null
    MATCH (b:TypeB {eid: res.ref2_eid})
    MERGE (me)-[:GOES_TO]->(b)-[:GOES_FROM]->(me)
}

The 'SET' approach looks cleaner:

  • It does not enter a property, if that property is null
  • It also only merges on the eid, so it would update an existing node with that same eid as well.

Yeah, you can’t merge when a property provided is null. Your second option is the one I would use too. It also removes the other two parameters from the merge. Generally you only merge with one property; the one that represents the unique identifier for that label. You then can set others if necessary. This is the best because the properties given in the merge are used to match the node to find it first, before creating it if not found.

Btw, merge has two optional clauses that can be used if needed. One is “on create set” and the other is “on match set”. This allows you to set different properties or values depending on whether the node was found or created. You can still follow these with any number of “set” clauses in case you want to set properties regardless of match vs created.

Good job finding your solution.

1 Like