Skip nulls in apoc.create.vNode

Hi,

I'm trying to create virtual nodes and set a property on some of them only when its value is not null. I build the virtual nodes with a list comprehension, because I have to collect a set of nodes to turn into virtual ones, depending on the value of a property in the original node. There can't be any nulls, as I have to create relationships between the virtual nodes and Neo4j does not handle nulls. Is there a way around this? The code I have so far is below:

MATCH (m)-[x]->(n)
WHERE m.date_live > date({year:2020, month:6})
AND m.date_decommissioned < date({year:2024, month:6})
AND n.date_live > date({year:2020, month:6})
AND n.date_decommissioned < date({year:2024, month:6})
WITH COLLECT(DISTINCT m.name) AS sources,
	COLLECT(DISTINCT n.name) AS targets, m
WITH [source IN sources | apoc.create.vNode(['Source'],{name:source, data_sent:split(m.data_sent, ",")})] AS sourcenodes,
	[target IN targets | apoc.create.vNode(['Target'],{name:target})] AS targetnodes
WITH apoc.map.groupBy(sourcenodes, 'name') AS vsource,
	apoc.map.groupBy(targetnodes, 'name') AS vtarget
MATCH (m)-[x]->(n)
WITH vsource, vtarget, m, x, n
WHERE m.date_live > date({year:2020, month:6})
AND m.date_decommissioned < date({year:2024, month:6})
AND n.date_live > date({year:2020, month:6})
AND n.date_decommissioned < date({year:2024, month:6})
return vsource,
	vtarget

The nodes are collected and made into virtual ones depending on whether their live and decommission dates fall within the boundaries set. I want the data_sent property to be either the list of data sent or not set at all, rather than either the list or a null, because I want relationships between source and target.

Otherwise, is there a way to iteratively copy the entire original node (so name, data_sent, and all the other properties listed in it) so that it won't create categories that don't exist in the original?

Thanks in advance!

You can get a map of an entity 'r' with properties(r). You can then use this to set the properties of another entity.

match(n) where 'some predicate'
match(m) where 'some other predicate'
set m = properties(n)
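
For the "copy the entire original node" case, a minimal sketch might look like the one below. It assumes the function form of apoc.create.vNode (as used in the question) and simply reuses the node's own labels. Since properties(m) only contains properties that are actually stored on the node, it should not introduce any null-valued keys.

```cypher
MATCH (m)
WHERE m.date_live > date({year:2020, month:6})
// properties(m) holds only the properties that exist on m,
// so the virtual node gets no extra, null-valued keys
RETURN apoc.create.vNode(labels(m), properties(m)) AS vnode
```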

You can also pick out specific properties using map projection:

match(n) where 'some predicate'
match(m) where 'some other predicate'
set m = n{.name, .data_sent}

Or, if you want all of a node's properties and want to add additional ones:

match(n) where 'some predicate'
match(m) where 'some other predicate'
set m = n{.*, prop1:1, prop2:"A"}
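
One thing to watch with map projection, relevant to the null problem: as far as I can tell, an explicit selector such as .data_sent yields a null entry when the property is missing, whereas .* only copies what exists. The sketch below uses a literal map (the variable name row is just for illustration) so it can be run without any data.

```cypher
WITH {name: 'A'} AS row
// 'selected' names data_sent explicitly even though it is missing;
// 'everything' copies only the keys that are present
RETURN row{.name, .data_sent} AS selected,
       row{.*} AS everything
```

Comparing the two returned maps should show whether the explicit selector leaves a data_sent key behind in your version.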

Note: I'm not sure what the point of the second match in your query is. You are duplicating the first match and not using its results. If you do need the nodes again later in the query, you may want to collect them like you did their names and pass them through, so you don't have to repeat the match.

You can also use apoc.map.clean to clean specific values from a map.
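
A small sketch of how that could look, using literal rows instead of real nodes (the rows list is made up for illustration). It relies on two assumptions: that split() returns null for a null input, and that apoc.map.clean drops null-valued entries even when both filter lists are empty, which is my understanding of its behavior.

```cypher
WITH [{name: 'A', data_sent: 'x,y'}, {name: 'B', data_sent: null}] AS rows
RETURN [row IN rows |
  // second argument: keys to drop, third argument: values to drop
  apoc.map.clean({name: row.name, data_sent: split(row.data_sent, ',')}, [], [])
] AS cleaned
```

The cleaned maps can then be passed straight to apoc.create.vNode as the property map.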

I can't find another way to display the correct virtual nodes and establish the relationships based on how the original nodes relate to one another in the database. Removing the second MATCH throws the error "Failed to invoke function apoc.create.vRelationship: Caused by: java.lang.NullPointerException: The inserted Start Node is null", which to me looks like it can't find the correct place in memory to create the requisite relationships.

Can you provide the complete query? I don't see where you are creating virtual relationships.

This is the whole query. In the list comprehension line here, it creates source and target nodes from the m sources and n targets that match the date range in the MATCH query above it:

WITH [source IN sources | apoc.create.vNode(['Source'],{name:source, data_sent:split(m.data_sent, ",")})] AS sourcenodes,
	[target IN targets | apoc.create.vNode(['Target'],{name:target})] AS targetnodes

It then creates maps of the virtual nodes that are retrieved from memory with the second match query.

The issue is, when I'm creating a set of virtual nodes, I can do something like {name:source, data_sent:split(m.data_sent, ",")} to set the properties, but then data_sent is set to null on every node that doesn't have one, which means I can't create relationships between the nodes because Cypher doesn't handle nulls (usually I use FOREACH and MERGE, but I can't do that within apoc.create.vNode).

Your first query does not have the code that creates the virtual relationships. That would help me understand.

You should be able to create virtual relationships between the vSource and vTarget nodes.
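
As a rough, self-contained sketch of that: the lookup maps are built the same way as in your query, and a null check guards the call, since looking up a name that is not in the map returns null and apoc.create.vRelationship then fails with the "Start Node is null" error. The names 'A', 'B' and 'missing' are made up for illustration.

```cypher
WITH apoc.map.groupBy([apoc.create.vNode(['Source'], {name: 'A'})], 'name') AS vsource,
     apoc.map.groupBy([apoc.create.vNode(['Target'], {name: 'B'})], 'name') AS vtarget
UNWIND [{from: 'A', to: 'B'}, {from: 'A', to: 'missing'}] AS pair
WITH vsource[pair.from] AS source, vtarget[pair.to] AS target
// skip pairs where either lookup found no virtual node
WHERE source IS NOT NULL AND target IS NOT NULL
RETURN source, target,
       apoc.create.vRelationship(source, 'Sends to', {}, target) AS rel
```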

You can remove null values from the map with apoc.map.clean. You can also remove specific keys and values.
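
For the specific keys and values part, a minimal sketch on a literal map (the key internal_id is hypothetical): the second argument lists keys to drop and the third lists values to drop.

```cypher
RETURN apoc.map.clean(
  {name: 'A', data_sent: '', internal_id: 42},
  ['internal_id'],  // keys to remove
  ['']              // values to remove, here empty strings
) AS cleaned
```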

When I run the query as it is presented in the first post, it creates virtual nodes, just iteratively rather than manually doing it one by one. I can also add relationships when I'm not trying to iteratively attach another property based on the original node, using the code below:

MATCH (m)-[x]->(n)
WHERE m.date_live > date({year:2020, month:6})
AND m.date_decommissioned < date({year:2024, month:6})
AND n.date_live > date({year:2020, month:6})
AND n.date_decommissioned < date({year:2024, month:6})
WITH COLLECT(DISTINCT m) AS sources,
	COLLECT(DISTINCT n.name) AS targets, m, n
WITH [source IN sources | apoc.create.vNode(['Source'],{name:source})] AS sourcenodes,
	[target IN targets | apoc.create.vNode(['Target'],{name:target})] AS targetnodes, m, n
WITH apoc.map.groupBy(sourcenodes, 'name') AS vsource,
	apoc.map.groupBy(targetnodes, 'name') AS vtarget
MATCH (m)-[x]->(n)
WITH vsource, vtarget, m, x, n 
WHERE m.date_live > date({year:2020, month:6})
AND m.date_decommissioned < date({year:2024, month:6})
AND n.date_live > date({year:2020, month:6})
AND n.date_decommissioned < date({year:2024, month:6})
return vsource,
	vtarget,
	apoc.create.vRelationship(vsource[m.name], 'Sends to',
		{date_live:m.date_live, date_decommissioned:m.date_decommissioned},
		vtarget[n.name]) AS rel

The only difference in this code is that I'm not attempting to add another property to the virtual node.

What I ideally want to do is take the data_sent property from the original node and add it to the virtual node, either once the vNodes are created, or during the loop that creates them, or after both nodes and relationships are established, and skip over the null data_sent values so that I can still create the relationships as I can in the above.

I'll take a look into how apoc.map.clean works, thanks :) looks like that might be a promising means of finding a solution!

Where do the 'data_sent' values come from? Are they sometimes null? If so, what behavior do you want?

I think this may do the same as your query, but it avoids a second match.

MATCH (m)-[x]->(n)
WHERE m.date_live > date({year:2020, month:6})
AND m.date_decommissioned < date({year:2024, month:6})
AND n.date_live > date({year:2020, month:6})
AND n.date_decommissioned < date({year:2024, month:6})
WITH COLLECT({m:m, n:n}) as nodes,
[source IN COLLECT(DISTINCT m.name) | apoc.create.vNode(['Source'],{name:source})] AS sourcenodes,
[target IN COLLECT(DISTINCT n.name) | apoc.create.vNode(['Target'],{name:target})] AS targetnodes
WITH apoc.map.groupBy(sourcenodes, 'name') AS vsource,
	apoc.map.groupBy(targetnodes, 'name') AS vtarget, nodes
UNWIND nodes as node
WITH node.n as n, node.m as m, vsource[node.m.name] as source, vtarget[node.n.name] as target
return n, m,
	apoc.create.vRelationship(source, 'Sends to',
		{date_live:m.date_live, date_decommissioned:m.date_decommissioned},
		target) AS rel
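
If data_sent lives on the source nodes and is sometimes null, one possible variant of the query above is sketched here. It is untested against your data and makes a few assumptions: data_sent is a comma-separated string, source names are unique (apoc.map.groupBy keeps a single node per name), and apoc.map.clean drops the data_sent key when split() returns null. The source nodes themselves are collected instead of just their names, so each comprehension step can read its own data_sent.

```cypher
MATCH (m)-[x]->(n)
WHERE m.date_live > date({year:2020, month:6})
  AND m.date_decommissioned < date({year:2024, month:6})
  AND n.date_live > date({year:2020, month:6})
  AND n.date_decommissioned < date({year:2024, month:6})
WITH collect({m: m, n: n}) AS nodes,
     collect(DISTINCT m) AS sources,
     collect(DISTINCT n.name) AS targets
WITH nodes,
     // clean the map first so data_sent is either a list or absent
     [s IN sources | apoc.create.vNode(['Source'],
        apoc.map.clean({name: s.name, data_sent: split(s.data_sent, ',')}, [], []))] AS sourcenodes,
     [t IN targets | apoc.create.vNode(['Target'], {name: t})] AS targetnodes
WITH nodes,
     apoc.map.groupBy(sourcenodes, 'name') AS vsource,
     apoc.map.groupBy(targetnodes, 'name') AS vtarget
UNWIND nodes AS node
WITH node.m AS m, vsource[node.m.name] AS source, vtarget[node.n.name] AS target
RETURN source, target,
       apoc.create.vRelationship(source, 'Sends to',
         {date_live: m.date_live, date_decommissioned: m.date_decommissioned},
         target) AS rel
```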