How to avoid duplicate node rows in a graph?

This may be a newbie question, but I couldn't find an answer.
I have a query that fetches a graph of nodes, but the result repeats the nodes on every row, with a different ID each time. Is there a way to write the Cypher so that this output is deduplicated?

      MATCH (n)-[r]-(n2)
        RETURN n as source, r as relation, n2 as target

But some nodes are linked to more than one other node, for example A-r-B and A-r-C.
In the Neo4j visualizer's "table" view I see a row for each relation, and each row includes the start node again even when it is the same named node.


Neo4j's own visualizer dedupes these nodes.

However, the data I'm getting back makes them hard to dedupe, since they all have different IDs and the relations refer to the different node IDs for each row. I would have to write my own code to dedupe them by looking up the nodes, finding labels, modifying the relations, or something similar.

Is there a way to write a better Cypher query that will just reference the same node ID each time?

I'm using the JS driver, so I have to parse the results myself. In the end I have something like this for each row, and then I dedupe the final nodes afterwards based on the .name field:


export function parseRelation(row: any) {
  const source = row.get("source")
  const relation = row.get("relation")
  const target = row.get("target")
  clog.log("parseRelation", { source, relation, target })

  const clump = {
    nodes: [
      {
        id: source.properties.name,
      },
      {
        id: target.properties.name,
      },
    ],
    links: [
      {
        source: source.properties.name,
        target: target.properties.name,
      },
    ],
  }
  return clump
}
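
For reference, the "dedupe afterwards" step described above could be sketched roughly like this. It is only an illustration: the Clump type and the mergeClumps helper are hypothetical names, and it assumes that the name property is unique per node, as in the parser above.

```typescript
// Shape returned by a per-row parser such as parseRelation above.
interface Clump {
  nodes: { id: string }[]
  links: { source: string; target: string }[]
}

// Merge per-row clumps into one graph, keeping each node id and each
// directed source/target pair only once (hypothetical helper).
function mergeClumps(clumps: Clump[]): Clump {
  const nodes = new Map<string, { id: string }>()
  const links = new Map<string, { source: string; target: string }>()
  for (const clump of clumps) {
    for (const node of clump.nodes) {
      if (!nodes.has(node.id)) nodes.set(node.id, node)
    }
    for (const link of clump.links) {
      // Serialising the pair as JSON avoids key collisions between names.
      const key = JSON.stringify([link.source, link.target])
      if (!links.has(key)) links.set(key, link)
    }
  }
  return {
    nodes: Array.from(nodes.values()),
    links: Array.from(links.values()),
  }
}

// Three parsed rows, one of them a repeat of the A-B row.
const merged = mergeClumps([
  { nodes: [{ id: "A" }, { id: "B" }], links: [{ source: "A", target: "B" }] },
  { nodes: [{ id: "A" }, { id: "C" }], links: [{ source: "A", target: "C" }] },
  { nodes: [{ id: "A" }, { id: "B" }], links: [{ source: "A", target: "B" }] },
])
```

Note that this keys links on the row's source/target order. With an undirected pattern each relationship is matched from both ends, so A to B and B to A would be kept as two separate links unless the relationship's own start and end are used instead.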

The query pattern you are matching on returns every single relationship, one per row, so duplicates will abound. What format do you want to return?

I do want all the relations, but reusing the same IDs.
The eventual format I want is something like this:

{
  "graph": {
    "nodes": [
      { "id": "A" },
      { "id": "B" },
      { "id": "C" }
    ],
    "links": [
      {
        "source": "A",
        "target": "B"
      },
      {
        "source": "A",
        "target": "C"
      }
    ]
  }
}
  • I'm also wondering whether there's an easier way to turn the data into a JSON object with the JS driver than walking through it all myself.

  • Also, how would this work for a deeper query, 2 or 3 connections deep:

MATCH (n1{chapterId:"chap1"})-[r1]-(n2)-[r2]-(n3)

Let's address the first use case.

Test Data:

create (n:Test{id:0})-[:REL]->(:Test{id:1})
create (n)-[:REL]->(:Test{id:2})
create (n)-[:REL]->(:Test{id:3})
create (n)-[:REL]->(:Test{id:4})
create (n)-[:REL]->(m:Test{id:5})
create (m)-[:REL]->(:Test{id:6})
create (m)-[:REL]->(:Test{id:7})
create (m)-[:REL]->(:Test{id:8})

Query:

match(a)-[r]-()
with collect(distinct a) as nodes, collect(distinct r) as rels
return {
    graph: {
        nodes: [i in nodes | i{.id}],
        links: [i in rels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }
} as result

Query Result:

{
  "graph": {
    "nodes": [
      {
        "id": 0
      },
      {
        "id": 1
      },
      {
        "id": 2
      },
      {
        "id": 3
      },
      {
        "id": 4
      },
      {
        "id": 5
      },
      {
        "id": 6
      },
      {
        "id": 7
      },
      {
        "id": 8
      }
    ],
    "links": [
      {
        "source": 0,
        "target": 1
      },
      {
        "source": 0,
        "target": 2
      },
      {
        "source": 0,
        "target": 3
      },
      {
        "source": 0,
        "target": 4
      },
      {
        "source": 0,
        "target": 5
      },
      {
        "source": 5,
        "target": 6
      },
      {
        "source": 5,
        "target": 7
      },
      {
        "source": 5,
        "target": 8
      }
    ]
  }
}

Because match(a)-[r]-() has no direction, each relationship is matched from both of its ends, so 'a' gets bound to every node that has at least one relationship. Collecting the distinct values of 'a' and 'r' is then enough to remove the duplicates.
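
To see why this works, here is a small in-memory sketch of the same idea in TypeScript. It does not talk to Neo4j; the Rel and MatchRow types and all variable names are made up for illustration, and plain numbers stand in for the test nodes.

```typescript
// A stored relationship has one fixed direction: start -> end.
interface Rel { relId: number; start: number; end: number }
// One row of the undirected match: the node bound to `a` plus the relationship.
interface MatchRow { a: number; r: Rel }

// Same shape as the test data: 0 -> 1..5 and 5 -> 6..8.
const pairs: [number, number][] = [
  [0, 1], [0, 2], [0, 3], [0, 4], [0, 5], [5, 6], [5, 7], [5, 8],
]
const storedRels: Rel[] = pairs.map(([start, end], relId) => ({ relId, start, end }))

// match(a)-[r]-() ignores direction, so every relationship yields two rows:
// one with `a` bound to its start node and one with `a` bound to its end node.
const matchRows: MatchRow[] = []
for (const r of storedRels) {
  matchRows.push({ a: r.start, r }, { a: r.end, r })
}

// Counterparts of collect(distinct a) and collect(distinct r).
// The Set of relationships works by object identity here.
const distinctNodeIds = Array.from(new Set(matchRows.map((row) => row.a)))
const distinctRels = Array.from(new Set(matchRows.map((row) => row.r)))

// Direction comes from the relationship itself, not from the row it was
// found in, which mirrors startNode(i) and endNode(i) in the query.
const simulated = {
  graph: {
    nodes: distinctNodeIds.map((id) => ({ id })),
    links: distinctRels.map((r) => ({ source: r.start, target: r.end })),
  },
}
```

The 8 relationships produce 16 rows, which collapse back to 9 nodes and 8 links once the distinct values are collected.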

Maybe something like this for the second case:

MATCH (n1{chapterId:"chap1"})-[r1]-(n2)-[r2]-(n3)
with collect(distinct n2) + collect(distinct n3) as allNodes, collect(distinct r1) + collect(distinct r2) as allRels
unwind allNodes as node
with allRels, collect (distinct node) as dedupedNodes
unwind allRels as rel
with dedupedNodes, collect(distinct rel) as dedupedRels
return {
    graph: {
        nodes: [i in dedupedNodes | i{.id}],
        links: [i in dedupedRels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }
} as result

Note: if you don't mind using the APOC library, you can remove the duplicates more easily with the apoc.coll.toSet() function:

MATCH (n1{chapterId:"chap1"})-[r1]-(n2)-[r2]-(n3)
with collect(distinct n2) + collect(distinct n3) as allNodes, collect(distinct r1) + collect(distinct r2) as allRels
with 
    apoc.coll.toSet(allNodes) as dedupedNodes, 
    apoc.coll.toSet(allRels) as dedupedRels
return {
    graph: {
        nodes: [i in dedupedNodes | i{.id}],
        links: [i in dedupedRels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }
} as result

Another option is to remove the duplicates using the reduce() function:

MATCH (n1{chapterId:"chap1"})-[r1]-(n2)-[r2]-(n3)
with collect(distinct n2) + collect(distinct n3) as allNodes, collect(distinct r1) + collect(distinct r2) as allRels
with 
    reduce(s=[], i in allNodes | CASE WHEN i in s THEN s ELSE s+[i] END) as dedupedNodes,
    reduce(s=[], i in allRels | CASE WHEN i in s THEN s ELSE s+[i] END) as dedupedRels
return {
    graph: {
        nodes: [i in dedupedNodes | i{.id}],
        links: [i in dedupedRels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }
} as result
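
If it helps to read the last two variants in client-side terms, the two dedupe strategies could be sketched in TypeScript roughly as below. The function names are made up for this sketch, and it uses plain numeric ids, since a JS Set compares objects by reference rather than by node identity.

```typescript
// Rough counterpart of apoc.coll.toSet(list): a JS Set drops repeated
// values and keeps them in first-seen order.
function toSet<T>(list: T[]): T[] {
  return Array.from(new Set(list))
}

// Rough counterpart of
//   reduce(s=[], i in list | CASE WHEN i in s THEN s ELSE s+[i] END)
// The accumulator `s` only grows when `i` is not in it yet.
function dedupeWithReduce<T>(list: T[]): T[] {
  return list.reduce<T[]>(
    (s, i) => (s.indexOf(i) !== -1 ? s : s.concat([i])),
    [],
  )
}

// Hypothetical ids, as if two collected lists had been concatenated.
const allNodeIds = [1, 2, 2, 3, 1]
const viaSet = toSet(allNodeIds)
const viaReduce = dedupeWithReduce(allNodeIds)
```

Both calls leave one copy of each id. The reduce form rescans the accumulator for every element, so it is the slower of the two on long lists.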

Thanks so much for this. I do want to retain the directionality of the relation, so I think I will still get some duplicates.

But this syntax is very interesting. So I guess there are a number of properties and functions available within Cypher that can be called. Now that I know about them, I know where to look!

graph: {
        nodes: [i in nodes | i{.id}],
        links: [i in rels | {
            source: startNode(i).id,
            target: endNode(i).id
        }]
    }

You will get the directionality with the query I provided, because it calls the startNode() and endNode() functions on each relationship. endNode() returns the node being pointed to.

There are many cypher features I like, but I would suggest you get familiar with list comprehension and map projections.
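
On the earlier question about getting JSON out of the JS driver: since the map projection already shapes the data inside Cypher, the client side can shrink to reading one column. A minimal sketch, assuming the query returns a single row with a 'result' column as above. RecordLike, GraphData and readGraph are names invented here, and fakeRecord stands in for a real driver record, which exposes get(key) as used in parseRelation.

```typescript
interface GraphData {
  nodes: { id: number }[]
  links: { source: number; target: number }[]
}

// Minimal stand-in for the part of a driver record used here (assumption:
// only get(key) is needed, as in parseRelation earlier in the thread).
interface RecordLike {
  get(key: string): unknown
}

// The query returns one map per row, so there is nothing left to walk.
function readGraph(record: RecordLike): GraphData {
  const result = record.get("result") as { graph: GraphData }
  return result.graph
}

// Fake record for illustration only; a real one would come from the driver.
const fakeRecord: RecordLike = {
  get: (key) =>
    key === "result"
      ? {
          graph: {
            nodes: [{ id: 0 }, { id: 1 }],
            links: [{ source: 0, target: 1 }],
          },
        }
      : undefined,
}

const graphJson = JSON.stringify(readGraph(fakeRecord))
```

Depending on how the driver is configured, integer ids may arrive as the driver's own integer type rather than plain numbers, so a conversion step may still be needed before serialising.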
