How to avoid getting paths and get only list of unique nodes and relationships?

igordata · August 18, 2020, 2:46pm

Hi everyone,

I have a node and I want to get all connected nodes in several steps.

MATCH (n:tx)-[:tx2tx*1..3]-(m:tx)
WHERE ID(m)=1791789334
RETURN *

What I get is all possible paths with node m in them, in total about 2k paths. I don't need the path information at all. I have about 40 nodes and 60 relationships. How can I get just a list of nodes and a list of relationships?

Adding DISTINCT doesn't work, and neither does collect().

Cobra · August 18, 2020, 2:59pm

Hello @igordata and welcome to the Neo4j community

You are not too far away:

MATCH (n:tx)-[r:tx2tx*1..3]-(m:tx)
WHERE ID(m)=1791789334
RETURN collect(DISTINCT m) AS nodes, collect(DISTINCT r) AS relations

Regards,
Cobra

igordata · August 18, 2020, 3:07pm

Thanks, but it returns the same nodes several times, and all the relations come in arrays of 3. It looks like these are the relations from each path, in the same order the path has them. Is there a way to get only the nodes and relations, as just two lists?

Cobra · August 18, 2020, 3:09pm

Can you show me what it returns with the query and what you would like to get please (a little example)?

igordata · August 18, 2020, 3:20pm

This is part of the data I get. The relationship with id 1546728594 is included about 16 times in my data.

,
    "relations": [
      [
        {
          "identity": 1546728594,  <-- Here it goes
          "start": 1927833568,
          "end": 1791789334,
          "type": "tx2tx",
          "properties": {

          }
        }
      ],
      [
        {
          "identity": 1546728593,
          "start": 1927833568,
          "end": 1785591632,
          "type": "tx2tx",
          "properties": {

          }
        },
        {
          "identity": 1546728594, <-- and here
          "start": 1927833568,
          "end": 1791789334,
          "type": "tx2tx",
          "properties": {

          }
        }
      ],
      [
        {
          "identity": 1803520488,
          "start": 2005583212,
          "end": 1785591632,
          "type": "tx2tx",
          "properties": {

          }
        },
        {
          "identity": 1546728593,
          "start": 1927833568,
          "end": 1785591632,
          "type": "tx2tx",
          "properties": {

          }
        },
        {
          "identity": 1546728594,  <-- and again
          "start": 1927833568,
          "end": 1791789334,
          "type": "tx2tx",
          "properties": {

          }
        }
      ],
      [
        {
          "identity": 1679121836,
          "start": 1968412308,
          "end": 1785591632,
          "type": "tx2tx",
          "properties": {

          }
        },
        {
          "identity": 1546728593,
          "start": 1927833568,
          "end": 1785591632,
          "type": "tx2tx",
          "properties": {

          }
        },
        {
          "identity": 1546728594,  <-- and so on...
          "start": 1927833568,
          "end": 1791789334,
          "type": "tx2tx",
          "properties": {

          }
        }
      ],

Sorry if I'm not explaining the problem clearly enough. I need a list of unique relationships so that I can go more steps out from my starting node. With 0..5 from m everything becomes extremely large and slow. If I could get rid of the paths, that would save me a lot of time and memory.
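
A likely explanation for the nested arrays above: in a variable-length pattern such as [r:tx2tx*1..3], r is bound to a list of relationships (one list per path), so collect(DISTINCT r) de-duplicates whole lists rather than individual relationships. A minimal plain-Cypher sketch that unwinds each path's relationships before collecting is shown below. It reuses the labels, relationship type, and node id from the question; it collects only the far-end nodes n, so the start node m is not included unless a cycle leads back to it.

```cypher
// Sketch only: same pattern as in the question, but bound to a path p.
MATCH p = (m:tx)-[:tx2tx*1..3]-(n:tx)
WHERE id(m) = 1791789334
// One row per relationship on each path, instead of one list per path.
UNWIND relationships(p) AS rel
// DISTINCT now compares single nodes and single relationships.
RETURN collect(DISTINCT n)   AS nodes,
       collect(DISTINCT rel) AS relations
```

This still enumerates every path before de-duplicating, so it addresses the shape of the result but not the cost of expanding to 0..5 steps.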

Cobra · August 18, 2020, 3:31pm

MATCH (n:tx)-[r:tx2tx*1..3]-(m:tx)
WHERE ID(m)=1791789334
RETURN apoc.coll.toSet(apoc.coll.flatten(collect(m))) AS nodes,
       apoc.coll.toSet(apoc.coll.flatten(collect(r))) AS relations
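
To illustrate what the two APOC calls in this query do, here is a toy sketch on literal lists (the numbers are made up and stand in for relationships). It assumes the APOC plugin is installed. apoc.coll.flatten turns the list of lists into one flat list, and apoc.coll.toSet then drops repeated values, so one would expect a single list in which 2 appears only once.

```cypher
// Toy input: two "paths" that share the element 2.
RETURN apoc.coll.toSet(
         apoc.coll.flatten([[1, 2], [2, 3]])
       ) AS unique_values
```

Note that in the query above m is the node fixed by the WHERE clause, so collect(m) only ever contains that one node; the nodes at the other end of each path are bound to n.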

igordata · August 18, 2020, 3:36pm

I got an error about an unknown APOC function. I'll read about how to enable it; it looks like a must-have. Thanks.

Cobra · August 18, 2020, 3:37pm

Yeah, you must install the APOC plugin.

If you can't do something in plain Cypher, APOC can maybe do it.

igordata · August 18, 2020, 3:48pm

Found that kind of solution:

// uniq
MATCH ptx=(n:tx)-[:tx2tx*1..3]-(m:tx)
MATCH po=(o:output)-[rtxo]-(n)
WHERE ID(m)=1791789334
UNWIND nodes(ptx) as node
UNWIND relationships(ptx) as relationship
UNWIND nodes(po) as outputs
RETURN collect(distinct node) as nodes,
       collect(distinct relationship) as relationships,
       collect(distinct outputs) as outputs

Not sure how beautiful it is, but it works fine for now

igordata · August 18, 2020, 4:02pm

It's not a perfect solution because I'm still getting duplicates from po that are already in ptx, but I'll stick with it for this week.

Cobra · August 18, 2020, 4:19pm

You could merge the lists with a list comprehension.

igordata · August 18, 2020, 4:41pm

Sorry, how? Could you please provide me an example?

I have already grown my query to this:

// uniq with outputs with fringe
MATCH ptx=(m:tx)-[:tx2tx*1..3]-(n:tx)-[:tx2tx]-(f:tx) WHERE ID(m)=1791789334
MATCH po=(o:output)-[rtxo]-(n)
MATCH pa=(a:addr)-[ra]-(o)
UNWIND n as txnode
UNWIND f as fnode
UNWIND relationships(ptx) as txrel
UNWIND o as onode
UNWIND relationships(po) as orel
UNWIND a as anode
UNWIND ra as arel
RETURN
collect(distinct txnode) as txnodes,
collect(distinct txrel) as txrels,
collect(distinct onode) as onodes,
collect(distinct orel) as orels,
collect(distinct fnode) as fnodes,
collect(distinct anode) as anodes,
collect(distinct arel) as arels

I get the n nodes in the f nodes list, which is bad. Is there a way to politely ask the f nodes to be only those that are not already n nodes? I mean, I get 30 n nodes and 300 f nodes, and those 30 n nodes are included in f.

Thank you, btw
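
One way the list comprehension suggested earlier could be applied to the fringe question is sketched below. It is a reduced version of the query above (outputs and addresses are left out for brevity), and the names txnodes, candidates, and fnodes are illustrative. The idea is to collect the n nodes first, then keep only those neighbours f that are neither in that list nor the start node m.

```cypher
// Sketch only: collect the tx nodes within 1..3 steps of m.
MATCH (m:tx)-[:tx2tx*1..3]-(n:tx)
WHERE id(m) = 1791789334
WITH m, collect(DISTINCT n) AS txnodes
// Expand one more step from each collected node.
UNWIND txnodes AS n
MATCH (n)-[:tx2tx]-(f:tx)
WITH m, txnodes, collect(DISTINCT f) AS candidates
// List comprehension: keep only fringe nodes not already seen.
RETURN txnodes,
       [f IN candidates WHERE NOT f IN txnodes AND f <> m] AS fnodes
```

The same filter can also be written as a WHERE NOT f IN txnodes predicate on the second MATCH.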

ameyasoft · August 18, 2020, 5:29pm

Try this:

MATCH (c) WHERE id(c) = 1791789334 CALL apoc.path.subgraphAll(c, {}) YIELD nodes, relationships
UNWIND nodes as n1
UNWIND relationships as r1
RETURN distinct type(r1) as rel, count(r1) as Cnt2, labels(n1) as lbl, count(n1) as Cnt order by rel

igordata · August 20, 2020, 9:33pm

APOC rules.
I ended up with this, and it works great and is damn fast:


MATCH (t:tx) where id(t) = 1791789334
CALL apoc.path.subgraphAll(t, {
    relationshipFilter: "tx2tx",
    minLevel: 0,
    maxLevel: 4
})
YIELD nodes, relationships
UNWIND nodes as node
WITH node, nodes, relationships
OPTIONAL MATCH po=(a:addr)--(o:output)--(node)
OPTIONAL MATCH pf=(tf:tx)--(o) WHERE NOT tf IN nodes
RETURN
nodes,
relationships,
apoc.coll.toSet(apoc.coll.flatten(collect(o))) AS onodes,
apoc.coll.toSet(apoc.coll.flatten(collect(relationships(po)))) AS orels,
apoc.coll.toSet(apoc.coll.flatten(collect(a))) AS anodes,
apoc.coll.toSet(apoc.coll.flatten(collect(tf))) AS fnodes,
apoc.coll.toSet(apoc.coll.flatten(collect(relationships(pf)))) AS frels

Thank you guys for your help!
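
For the question in the title alone (one list of unique nodes and one list of unique relationships, without the outputs, addresses, and fringe), the query above can be reduced to the sketch below. It assumes the APOC plugin is installed and uses maxLevel: 3 to mirror the *1..3 range from the first post.

```cypher
// Sketch only: unique nodes and relationships around one tx node.
MATCH (t:tx) WHERE id(t) = 1791789334
CALL apoc.path.subgraphAll(t, {
    relationshipFilter: "tx2tx",
    maxLevel: 3
})
YIELD nodes, relationships
RETURN nodes, relationships
```

The start node is part of the returned nodes. The relationships list is, as far as I understand the procedure, made up of the relationships between the returned nodes, so it may differ slightly from the union of relationships on the 1..3-step paths.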

pesle.fanny · December 3, 2021, 11:02am

Thank you very much! APOC is so powerful! Your solution downsized my JSON export from 73 MB (with so many duplicates) to 500 KB.
