Updating node labels with virtual nodes and relationships, without changing the graph itself, causes duplicate nodes

I am working with citation data. In academia, paper_0 is cited by paper_1, which is cited by paper_2, and so on. When we created our database, we added a one_hop label to every paper_1 node and a two_hop label to every paper_2 node. This gets complicated when a one_hop node is also a two_hop node relative to a different paper, so any given node may carry both a one_hop label and a two_hop label.
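To gauge how common the overlap is, a quick check can count the nodes that carry both labels. This is a sketch that assumes the labels are exactly one_hop and two_hop and that papers have a paperid property, as in the queries in this thread:

```cypher
// Nodes labelled both one_hop and two_hop: these are the ones that
// cannot be colored unambiguously in Bloom by label alone
MATCH (p:one_hop:two_hop)
RETURN count(p) AS nodesWithBothLabels,
       collect(p.paperid)[..10] AS samplePaperIds
```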

I need a solution that overwrites the label of each node when running what we call the two-hop analysis, so that each node has exactly one label that Bloom can use to color it appropriately. The solution must also leave the graph itself unchanged, which is how I found virtual nodes and relationships.

I am close to a solution but it is not working perfectly:

Match (n:Paper {paperid: '7608367'})
OPTIONAL MATCH (n)<-[r1:REFERS]-(o:one_hop)
OPTIONAL MATCH (o)<-[r2:REFERS]-(t:two_hop)
CALL apoc.create.vNode(['one_hop'],o{.*}) yield node as one
CALL apoc.create.vNode(['two_hop'],{title:t.papertitle}) yield node as two
call apoc.create.vRelationship(one,'REFERS',{},n) yield rel as rel1
call apoc.create.vRelationship(two,'REFERS',{},one) yield rel as rel2
return n, one, two, rel1, rel2

The query above appears to create duplicate nodes.

If I were solving this in SQL, I would simply GROUP BY n, one, but I am not sure how to do that in Cypher.

I am relatively new to Neo4j so I appreciate any patience afforded :)

Thanks in advance for any help, and let me know if you have any questions.

I think the duplication comes from the order in which you match and create the nodes and relationships. Your two OPTIONAL MATCH statements, executed back to back, produce rows in which the same values of 'n' and 'o' are repeated once for every row that the second match returns for that 'o'. Each of those rows then creates another vNode for 'o' and another corresponding vRelationship.

For example, you could get rows of data like the following. When each row is passed to the apoc.create.vNode call on 'o', you get two virtual nodes for 'o1', two for 'o2', one for 'o3', and three for 'o4'.

n, o1, t0
n, o1, t1
n, o2, t2
n, o2, t3
n, o3, t4
n, o4, t5
n, o4, t6
n, o4, t7
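On the GROUP BY question: Cypher has no GROUP BY keyword. Instead, an aggregating function such as collect() in a WITH or RETURN groups the rows by the non-aggregated expressions next to it. A minimal sketch, using the same labels as the original query, that collapses the rows above into one row per (n, o) pair:

```cypher
MATCH (n:Paper {paperid: '7608367'})
OPTIONAL MATCH (n)<-[:REFERS]-(o:one_hop)
OPTIONAL MATCH (o)<-[:REFERS]-(t:two_hop)
// n and o are the implicit grouping keys; collect(t) gathers the t nodes
// of each (n, o) pair into a list (nulls are skipped), so each one_hop
// node now appears on exactly one row
WITH n, o, collect(t) AS twoHops
RETURN o.paperid AS oneHop, size(twoHops) AS twoHopCount
```

With the example rows above, o1 and o2 would each come back once with a count of 2, o3 with 1, and o4 with 3.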

The following query creates the virtual nodes and relationships for the values of 'o' first, then matches on each 'o' to get its corresponding 't' nodes. This should avoid duplicating the virtual nodes of the 'o' nodes.

MATCH (n:Paper {paperid: '7608367'})
MATCH (n)<-[r1:REFERS]-(o:one_hop)
CALL apoc.create.vNode(['one_hop'],o{.*}) yield node as one
CALL apoc.create.vRelationship(one,'REFERS',{},n) yield rel as rel1
WITH n, o, one, rel1
OPTIONAL MATCH (o)<-[r2:REFERS]-(t:two_hop)
CALL apoc.create.vNode(['two_hop'],{title:t.papertitle}) yield node as two
call apoc.create.vRelationship(two,'REFERS',{},one) yield rel as rel2
return n, one, two, rel1, rel2

Hi @glilienfield,

This is great and works exactly as intended! Your efforts are much appreciated.

The query works perfectly in Bloom but seems to break in Browser with an error.


Because our end users will be using Bloom I am not too concerned about the browser issue. That being said, if you do have insight on this I would appreciate it.

One more question, it would be great if I could run:

CALL apoc.create.vNode(['two_hop'],t{.*}) yield node as two

which uses t{.*} instead of {title:t.papertitle}, the same way I do for the one_hop nodes, but I am getting an ominous error.

I assume this is because, with the OPTIONAL MATCH, some rows have no (t) node. Do you have any advice on solving this? I could list all of the properties explicitly, but would rather not:

{title:t.papertitle, citationcount:t.citationcount, ..... paperid:t.paperid}

I don't have any insight into the first issue. Maybe comment out parts of the query and rerun it to identify which line causes the error.

You are correct on the second issue. I suspect it occurs when the match does not return anything, so 't' is null. If 't' is null, 't{.*}' is also null, and passing that null to apoc.create.vNode results in a null pointer exception inside the procedure. This does not happen with '{title:t.papertitle}', which evaluates to '{title:null}' and therefore does not cause a null pointer exception when passed to apoc.create.vNode. Sorry, I don't have data to test it.
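The difference between the two expressions can be checked without any data by binding t to null, which is what an OPTIONAL MATCH produces when it finds nothing. A small sketch:

```cypher
// Simulate a row where OPTIONAL MATCH found no (t) node
WITH null AS t
RETURN t{.*} AS projection,             // a map projection of null is null
       {title: t.papertitle} AS literal // a map literal whose value is null
```

The first column comes back as null, which is what apoc.create.vNode chokes on; the second is a real map that merely contains a null value.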

You can try this refactored query:

MATCH (n:Paper {paperid: '7608367'})
MATCH (n)<-[r1:REFERS]-(o:one_hop)
CALL apoc.create.vNode(['one_hop'],o{.*}) yield node as one
CALL apoc.create.vRelationship(one,'REFERS',{},n) yield rel as rel1
WITH n, o, one, rel1
OPTIONAL MATCH (o)<-[r2:REFERS]-(t:two_hop)
CALL apoc.do.when(
    t is not null,
    "
        CALL apoc.create.vNode(['two_hop'],t{.*}) yield node
        CALL apoc.create.vRelationship(two,'REFERS',{},one) yield rel
        RETURN node as two, rel as rel2
    ",
    "",
    {t:t, one:one}
) yield value
return n, one, rel1, value.two as two, value.rel2 as rel2

Hi @glilienfield,

Thanks for the update and no need to apologize, I didn't provide the data! :smiley:

Interestingly, the updated cypher works in browser but not in Bloom!

Bloom fails with the following error:
AN ERROR OCCURRED WHILE EXECUTING SEARCH: Writing in read access mode not allowed. Attempted write to internal graph 1


I found the threads "Writing in read access mode not allowed when using Bloom and Data explorer for Neo4j Plugins" and "Problem writing to 4.1x · Issue #137 · adam-cowley/neode · GitHub", which seem semi-related.

The answer in the first link seems unlikely to apply at this point, since I can confirm that Bloom can handle vNodes and vRelationships. The second makes me think it may be a version issue (I know our versions are not up to date...). Do you have any ideas?

I am using Neo4j Enterprise 4.4.3 and Neo4j Bloom 2.0.0 on AWS EC2.

The apoc.do.when procedure is annotated as a 'write' procedure, so maybe that is causing the issue. Change it to apoc.when, which is the read-only version.
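The access mode each procedure is registered with can be checked directly. A sketch, assuming Neo4j 4.3 or later (where SHOW PROCEDURES is available) and that APOC is installed; apoc.do.when is expected to report WRITE and apoc.when READ, consistent with the explanation above:

```cypher
// List the registered access mode (READ, WRITE, ...) of the two APOC procedures
SHOW PROCEDURES YIELD name, mode
WHERE name IN ['apoc.when', 'apoc.do.when']
RETURN name, mode
```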

You really are the Neo4j Ninja! Thanks for your help, that worked!

I needed to make a slight modification (adding it here for any future readers). TL;DR: the apoc.create.vRelationship call inside apoc.when referred to two, which is not defined inside the inner query, when it should have pointed to node, the vNode created by the preceding apoc.create.vNode call.

MATCH (n:Paper {paperid: '7608367'})
MATCH (n)<-[r1:REFERS]-(o:one_hop)
CALL apoc.create.vNode(['one_hop'],o{.*}) yield node as vOne
CALL apoc.create.vRelationship(vOne,'REFERS',{},n) yield rel as vRel1
WITH n, o, vOne, vRel1
OPTIONAL MATCH (o)<-[r2:REFERS]-(t:two_hop)
CALL apoc.when(
    t is not null,
    "CALL apoc.create.vNode(['two_hop'],t{.*}) yield node
     CALL apoc.create.vRelationship(node,'REFERS',{},vOne) yield rel 
     RETURN node as vTwo, rel as vRel2",
    "",
    {t:t, vOne:vOne}
) yield value
return n, vOne, vRel1, value.vTwo as two, value.vRel2 as rel2

Thanks again @glilienfield

Good catch on that one. You are very welcome!
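An alternative that avoids apoc.when altogether is to move the two-hop part into a CALL {} subquery that uses a plain MATCH, so t is never null. This is an untested sketch assuming Neo4j 4.x subquery support (CALL {} with an importing WITH is available from 4.0) and the two_hop label used earlier in the thread; whether Bloom renders nodes returned inside lists the same way as single nodes is also an assumption:

```cypher
MATCH (n:Paper {paperid: '7608367'})
MATCH (n)<-[:REFERS]-(o:one_hop)
CALL apoc.create.vNode(['one_hop'], o{.*}) YIELD node AS vOne
CALL apoc.create.vRelationship(vOne, 'REFERS', {}, n) YIELD rel AS vRel1
CALL {
  // Correlated subquery: import the current one_hop node and its vNode
  WITH o, vOne
  // Plain MATCH (not OPTIONAL), so t is never null in here
  MATCH (o)<-[:REFERS]-(t:two_hop)
  CALL apoc.create.vNode(['two_hop'], t{.*}) YIELD node AS vTwo
  CALL apoc.create.vRelationship(vTwo, 'REFERS', {}, vOne) YIELD rel AS vRel2
  // Aggregating without grouping keys always yields exactly one row, so
  // one_hop papers with no citing two_hop paper are kept (with empty lists)
  RETURN collect(vTwo) AS vTwos, collect(vRel2) AS vRel2s
}
RETURN n, vOne, vRel1, vTwos, vRel2s
```

Because the one_hop vNode is created before the subquery, each one_hop paper still gets exactly one virtual node, and no write-mode procedure is involved.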
