How to avoid duplicate node rows in a graph?

DC1 · April 19, 2024, 5:16pm

This may be a newbie question, but I couldn't find an answer.
I have a query that fetches a graph of nodes, but the result repeats the nodes on every row, with unique IDs each time. I'm wondering if I can write the Cypher in a way that dedupes this output.

MATCH (n)-[r]-(n2)
RETURN n as source, r as relation, n2 as target

But some nodes are linked to more than one other node, e.g. A-r-B and A-r-C.
In the Neo4j visualizer's "table" view I see a row for each relation, which includes the start node each time, even if it's the same named node.

Neo4j's own visualizer will dedupe these nodes.

However the data I'm getting back makes it hard to dedupe them since they all have different IDs, and the relations refer to the different node IDs for each row. I would have to write my own code to dedupe them based on looking up the nodes, finding labels, modifying the relations, or something similar.

Is there a way to write a better Cypher query that will just reference the same node ID each time?

I'm using the JS driver, so I have to parse everything manually. In the end I have something like this for each row, and then I dedupe the final nodes afterwards based on the .name field:


export function parseRelation(row: any) {
  const source = row.get("source")
  const relation = row.get("relation")
  const target = row.get("target")
  clog.log("parseRelation", { source, relation, target })

  const clump = {
    nodes: [
      {
        id: source.properties.name,
      },
      {
        id: target.properties.name,
      },
    ],
    links: [
      {
        source: source.properties.name,
        target: target.properties.name,
      },
    ],
  }
  return clump
}
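
For the client-side dedupe step described above, one option is to merge the per-row clumps into a single graph keyed on the same name value that parseRelation uses as the id. This is only a sketch: the Clump types and mergeClumps are names made up here, and it assumes the name is unique per node.

```typescript
// Hypothetical types mirroring the shape returned by parseRelation above.
interface GraphNode { id: string }
interface GraphLink { source: string; target: string }
interface Clump { nodes: GraphNode[]; links: GraphLink[] }

// Merge per-row clumps into one graph, keeping the first occurrence of
// each node id and of each ordered source -> target pair.
// Assumption: `id` (the node's name) uniquely identifies a node.
function mergeClumps(clumps: Clump[]): Clump {
  const seenNodes = new Set<string>()
  const seenLinks = new Set<string>()
  const merged: Clump = { nodes: [], links: [] }

  for (const clump of clumps) {
    for (const node of clump.nodes) {
      if (!seenNodes.has(node.id)) {
        seenNodes.add(node.id)
        merged.nodes.push(node)
      }
    }
    for (const link of clump.links) {
      // JSON.stringify gives an unambiguous key for the ordered pair.
      const key = JSON.stringify([link.source, link.target])
      if (!seenLinks.has(key)) {
        seenLinks.add(key)
        merged.links.push(link)
      }
    }
  }
  return merged
}
```

It could be called as mergeClumps(rows.map(parseRelation)), where rows is whatever array of records is already being walked. Since the undirected MATCH returns each relationship once from each end, the merged links will contain both A -> B and B -> A unless the link key is built from the relationship's actual start and end nodes.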

glilienfield · April 19, 2024, 5:26pm

The query pattern you are matching on will return every single relationship, one per row, so duplicates are going to abound. What is the format you want to return?

DC1 · April 19, 2024, 5:35pm

I do want all the relations, but reusing the same IDs.
The eventual format I want is something like this:

{
  "graph": {
    "nodes": [
      { "id": "A" },
      { "id": "B" },
      { "id": "C" }
    ],
    "links": [
      { "source": "A", "target": "B" },
      { "source": "A", "target": "C" }
    ]
  }
}

I'm also wondering if there's an easier way to turn the data into a JSON object from the JS driver than walking through it all myself.
I'd also like to handle a deeper query, two or three connections deep:

MATCH (n1{chapterId:"chap1"})-[r1]-(n2)-[r2]-(n3)

glilienfield · April 19, 2024, 6:07pm

Let's address the first use case.

Test Data:

create (n:Test{id:0})-[:REL]->(:Test{id:1})
create (n)-[:REL]->(:Test{id:2})
create (n)-[:REL]->(:Test{id:3})
create (n)-[:REL]->(:Test{id:4})
create (n)-[:REL]->(m:Test{id:5})
create (m)-[:REL]->(:Test{id:6})
create (m)-[:REL]->(:Test{id:7})
create (m)-[:REL]->(:Test{id:8})

Query:

match(a)-[r]-()
with collect(distinct a) as nodes, collect(distinct r) as rels
return {
    graph: {
        nodes: [i in nodes | i{.id}],
        links: [i in rels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }
} as result

Query Result:

{
  "graph": {
    "nodes": [
      {
        "id": 0
      },
      {
        "id": 1
      },
      {
        "id": 2
      },
      {
        "id": 3
      },
      {
        "id": 4
      },
      {
        "id": 5
      },
      {
        "id": 6
      },
      {
        "id": 7
      },
      {
        "id": 8
      }
    ],
    "links": [
      {
        "source": 0,
        "target": 1
      },
      {
        "source": 0,
        "target": 2
      },
      {
        "source": 0,
        "target": 3
      },
      {
        "source": 0,
        "target": 4
      },
      {
        "source": 0,
        "target": 5
      },
      {
        "source": 5,
        "target": 6
      },
      {
        "source": 5,
        "target": 7
      },
      {
        "source": 5,
        "target": 8
      }
    ]
  }
}

Performing the match as match(a)-[r]-() will return each relationship in both directions, so 'a' will contain all the nodes. This makes it easier to remove the duplicates.
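
To illustrate why the undirected match puts every node in 'a', here is a small TypeScript simulation of the idea. It is not the driver API and not how Neo4j actually executes the query; the Rel type and function names are made up for illustration. Each stored relationship produces a row from each of its ends, while the relationship itself keeps its stored start and end.

```typescript
// Hypothetical in-memory stand-in for a stored, directed relationship.
interface Rel { start: number; end: number }

// Mimics `match (a)-[r]-()`: each relationship is seen from both ends.
function undirectedRows(rels: Rel[]): { a: number; r: Rel }[] {
  const rows: { a: number; r: Rel }[] = []
  for (const r of rels) {
    rows.push({ a: r.start, r })
    rows.push({ a: r.end, r })
  }
  return rows
}

// Mimics collect(distinct a) and collect(distinct r), followed by the
// projection in the return clause.
function toGraph(rels: Rel[]) {
  const rows = undirectedRows(rels)
  const nodeIds = Array.from(new Set(rows.map(row => row.a)))
  const distinctRels = Array.from(new Set(rows.map(row => row.r)))
  return {
    nodes: nodeIds.map(id => ({ id })),
    // start/end come from the relationship itself, so direction survives
    // even though each relationship was matched from both ends.
    links: distinctRels.map(r => ({ source: r.start, target: r.end })),
  }
}
```

Feeding it the eight relationships from the test data yields nine nodes and eight links, matching the shape of the query result shown above.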

glilienfield · April 19, 2024, 6:21pm

Maybe something like this for the second case:

MATCH (n1{chapterId:"chap1"})-[r1]-(n2)-[r2]-(n3)
with collect(distinct n2) + collect(distinct n3) as allNodes, collect(distinct r1) + collect(distinct r2) as allRels
unwind allNodes as node
with allRels, collect (distinct node) as dedupedNodes
unwind allRels as rel
with dedupedNodes, collect(distinct rel) as dedupedRels
return {
    graph: {
        nodes: [i in dedupedNodes | i{.id}],
        links: [i in dedupedRels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }
} as result

Note: if you don't mind using the APOC library, you can remove the duplicates more easily with the apoc.coll.toSet() function:

MATCH (n1{chapterId:"chap1"})-[r1]-(n2)-[r2]-(n3)
with collect(distinct n2) + collect(distinct n3) as allNodes, collect(distinct r1) + collect(distinct r2) as allRels
with 
    apoc.coll.toSet(allNodes) as dedupedNodes, 
    apoc.coll.toSet(allRels) as dedupedRels
return {
    graph: {
        nodes: [i in dedupedNodes | i{.id}],
        links: [i in dedupedRels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }
} as result

Another option is to remove the duplicates using the 'reduce' operation:

MATCH (n1{chapterId:"chap1"})-[r1]-(n2)-[r2]-(n3)
with collect(distinct n2) + collect(distinct n3) as allNodes, collect(distinct r1) + collect(distinct r2) as allRels
with 
    reduce(s=[], i in allNodes | CASE WHEN i in s THEN s ELSE s+[i] END) as dedupedNodes,
    reduce(s=[], i in allRels | CASE WHEN i in s THEN s ELSE s+[i] END) as dedupedRels
return {
    graph: {
        nodes: [i in dedupedNodes | i{.id}],
        links: [i in dedupedRels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }
} as result
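
For comparison, the reduce expression above has the same shape as an Array.prototype.reduce call on the JS/TS side. A minimal sketch follows; dedupe is a name made up here, and indexOf compares with ===, so for objects coming back from the driver you would probably compare on some key rather than on the object itself.

```typescript
// Same idea as:
//   reduce(s=[], i in list | CASE WHEN i in s THEN s ELSE s+[i] END)
// Keeps the first occurrence of each value and preserves order.
// Each step scans the accumulator, so this is quadratic in list length.
function dedupe<T>(list: T[]): T[] {
  return list.reduce<T[]>(
    (seen, item) => (seen.indexOf(item) !== -1 ? seen : seen.concat([item])),
    [],
  )
}
```

For example, dedupe([1, 2, 1, 3, 2]) gives [1, 2, 3].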

DC1 · April 19, 2024, 10:33pm

Thanks so much for this. I do want to retain the directionality of the relation, so I think I will still get some duplicates.

But this syntax is very interesting. So I guess there are a number of properties/functions available within Cypher that can be called. Now that I know about it, I know where to look!

graph: {
        nodes: [i in nodes | i{.id}],
        links: [i in rels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }

glilienfield · April 19, 2024, 10:52pm

You will get the directionality with the query I provided, because it calls the startNode() and endNode() functions on each relationship. The endNode() is the one being pointed to.

There are many Cypher features I like, but I would suggest you get familiar with list comprehensions and map projections.
