How to export data with path id of each path in neo4j

nishspeak · January 23, 2020, 9:22pm

I have created a graph with 3 paths as below.

Data:
a,b
b,c
d,e
f,e
g,h

Graph
a->b->c
d->e<-f
g->h

Desired output
a,uuid_1
b,uuid_1
c,uuid_1
d,uuid_2
e,uuid_2
f,uuid_2
g,uuid_3
h,uuid_3

Note: I have 50 million nodes.

terryfranklin82 · January 23, 2020, 9:39pm

Wouldn't g and h return just 78, in keeping with the pattern? You have no 'node 9' from what I can see.

Aside from that, what is the problem you're actually trying to solve? There may be simpler options.

nishspeak · January 23, 2020, 10:17pm

Actually, in the desired output the second column can be any unique ID (a UUID), not something like the node ID.
I just want to assign a unique ID to each graph (connected group) in the database so I can export it to CSV.

terryfranklin82 · January 23, 2020, 11:21pm

Assigning an id to each node as you create them and ensuring the ids are unique would be straightforward:

// create a unique constraint
CREATE CONSTRAINT unique_id on (n:Node) ASSERT n.id IS UNIQUE

// set id as you create nodes
CREATE (n:Node) set n.id = 1

But I still get the feeling that's not what you're looking for. Can you try rephrasing your question?

nishspeak · January 23, 2020, 11:37pm

I want to assign a unique ID at the group level, not at the node level. Here, three groups have been created.
Each group will have its own unique ID, and every node in a group will share that group's ID. It can be added as a new property on the nodes.

I have made some changes to my desired output.
Thanks in advance.

terryfranklin82 · January 24, 2020, 12:34am

With 50 million nodes, there must be a large number of groups too I guess?

What defines a group? I mean, what is the underlying logic you use to turn a->b->c into group 1?

Do you have group nodes with properties besides an id, or is it just a way of encapsulating the path a->b->c?

nishspeak · January 24, 2020, 1:45am

With 50 million nodes, there must be a large number of groups too I guess?
yes
What defines a group? I mean, what is the underlying logic you use to turn a->b->c into group 1?
yes
Do you have group nodes with properties besides an id, or is it just a way of encapsulating the path a->b->c?
What do you mean by group nodes?

terryfranklin82 · January 24, 2020, 2:28am

Are there any (:Group) nodes in your graph? What defines a group from an outside perspective?

You could for example have (g:Group) where g.uuid = 1, and relate nodes (a), (b) and (c) to (g) somehow:

(a)-[:BELONGS_TO]->(g)

or alternatively attach a uuid property to the relationships you already have:

(a)-[:IS_GROUPED_WITH {uuid:1}]->(b)-[:IS_GROUPED_WITH {uuid:1}]->(c)

but it really comes down to how you plan out your data model.

nishspeak · January 24, 2020, 3:31am

I have only one type of node.

My loading logic:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM 'file:///headers.csv' as line
MERGE (per1:person1 {person1: line.p1})
MERGE (per2:person1 {person1: line.p2})
CREATE (per1)-[:knows]->(per2)

terryfranklin82 · January 24, 2020, 4:20am

Your label and property names are a bit confusing; having a 'person1' property on a 'person1' node will be hard to manage. You'll also probably have an easier time if you follow the Neo4j conventions (node labels begin with an uppercase letter, relationship types are all uppercase):

MERGE (p1:Person {id: line.p1})
MERGE (p2:Person {id: line.p2})
CREATE (p1)-[:KNOWS]->(p2)

But to try and solve your original issue - it looks like each "group" of people appears on 1 line from your CSV? If that's the case, and there is a property on each line to indicate the group number (e.g. p0), you could do:

MERGE (p1:Person {id: line.p1, groupId: line.p0})
MERGE (p2:Person {id: line.p2, groupId: line.p0})
CREATE (p1)-[:KNOWS]->(p2)

If there is no group id available, you could use apoc.load.csv to get a unique line number for each row of your CSV, and make that the stand-in group id (with a header row, the `map` column holds the row keyed by column name, so `line.p1` resolves):

CALL apoc.load.csv('file:///headers.csv')
YIELD lineNo, map AS line
MERGE (p1:Person {id: line.p1, groupId: lineNo})
MERGE (p2:Person {id: line.p2, groupId: lineNo})
CREATE (p1)-[:KNOWS]->(p2)

nishspeak · January 24, 2020, 12:18pm

Thanks a lot for the solution!

Actually, I have already loaded the file, including the relationships, using the Neo4j import tool.
Now I just want to export the data with a group ID, as below, in an optimized way.

node,group_id
a,uuid_1
b,uuid_1
c,uuid_1
d,uuid_2
e,uuid_2
f,uuid_2
g,uuid_3
h,uuid_3

The approach you are suggesting would take too long to load.
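As a minimal sketch of the final export step, assuming the group assignment already exists (for example as a `groupId` property holding an internal node id), the Python below turns `(node, raw group key)` pairs into the `node,group_id` CSV shown above. The function name `export_groups` is hypothetical, and the `uuid_N` labels are placeholders that match the desired output, not real UUIDs.

```python
import csv
import io


def export_groups(assignment):
    """assignment: iterable of (node, raw_group_key) pairs, e.g. rows of
    node name and groupId. Returns CSV text with compact group labels."""
    labels = {}  # raw group key -> placeholder label, in order of first appearance
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["node", "group_id"])
    for node, key in assignment:
        if key not in labels:
            labels[key] = f"uuid_{len(labels) + 1}"  # placeholder, not a real UUID
        writer.writerow([node, labels[key]])
    return out.getvalue()
```

Relabeling keeps the exported ids short and independent of Neo4j's internal ids, which can be reused after nodes are deleted.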

terryfranklin82 · January 26, 2020, 10:42pm

If all of your nodes and relationships already exist in the graph, and you have no existing value for the group ids (they just need to be unique), you can use the apoc.path.subgraphNodes procedure to identify each unique cluster, and then tag its nodes with a randomly generated UUID (through another APOC function) to indicate their group:

match (p:Person) where p.groupId is null
with p, apoc.create.uuid() as newGroupId
call apoc.path.subgraphNodes(p, {relationshipFilter:"KNOWS", labelFilter:"Person"}) yield node as sibling
set p.groupId = newGroupId, sibling.groupId = newGroupId
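To make the logic of that query concrete, here is a small offline model in Python (the name `group_by_traversal` is hypothetical, and this is not how APOC is implemented). It walks each group once with a breadth-first search, following `KNOWS` in both directions as a relationship filter without an arrow does, and gives every group a fresh UUID in the spirit of `apoc.create.uuid()`. The `if start in group` check plays the role of `WHERE p.groupId IS NULL`.

```python
import uuid
from collections import defaultdict, deque


def group_by_traversal(edges):
    """Assign one fresh UUID string per connected group of nodes."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)  # direction is ignored, so d->e<-f forms one group
    group = {}
    for start in adj:
        if start in group:  # already reached from an earlier start node
            continue
        gid = str(uuid.uuid4())
        group[start] = gid
        queue = deque([start])
        while queue:  # breadth-first walk over the whole group
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in group:
                    group[nxt] = gid
                    queue.append(nxt)
    return group
```

For the thread's data, `group_by_traversal([("a", "b"), ("b", "c"), ("d", "e"), ("f", "e"), ("g", "h")])` puts a, b, c in one group, d, e, f in a second and g, h in a third.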

nishspeak · January 29, 2020, 11:41pm

The above solution works fine on a small dataset.
But on the big dataset (50 million nodes) it runs forever.

I can load the CSV again if Neo4j has a better option.

Thanks a lot for the reply.

terryfranklin82 · January 30, 2020, 12:02am

One of the periodic execution procedures in APOC (such as apoc.periodic.commit or apoc.periodic.iterate) can probably help with that.

nishspeak · January 30, 2020, 12:05am

Thanks a lot, let me try this one.

nishspeak · February 6, 2020, 4:04pm

I really appreciate your help.

I have made a minor change to the query, replacing the UUID with the node's internal id, id(p).

In the apoc.periodic.commit procedure, what does the limit clause mean?
In my case it only runs for the 10000 nodes I passed as the limit, but it is supposed to run over all nodes in batches of that size.

call apoc.periodic.commit(
'match (p:Person) where p.groupId is null with p limit $limit
call apoc.path.subgraphNodes(p, {relationshipFilter:"KNOWS", labelFilter:"Person"}) yield node as sibling
set p.groupId = id(p), sibling.groupId = id(p)', {limit:10000});

Thanks in advance.

nishspeak · February 6, 2020, 6:31pm

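Issue resolved.

I just added a RETURN clause at the end: apoc.periodic.commit re-runs the statement in separate transactions until it returns 0, so the statement has to report how many rows it updated.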

Thanks a lot for the solution.
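The batching behaviour can be modelled in a few lines of Python. This is a sketch under stated assumptions, not APOC's code: `make_statement` and `periodic_commit` are hypothetical names, the subgraph traversal is omitted, and a statement that returns nothing is treated as reporting 0, which matches the symptom above (only the first 10000 nodes were processed).

```python
def make_statement(node_ids, returns_count=True):
    """Hypothetical stand-in for the Cypher statement passed to
    apoc.periodic.commit; group_id plays the role of the groupId property."""
    group_id = {}

    def statement(params):
        # MATCH (p:Person) WHERE p.groupId IS NULL WITH p LIMIT $limit
        batch = [n for n in node_ids if n not in group_id][: params["limit"]]
        for n in batch:
            group_id[n] = n  # SET p.groupId = id(p); traversal omitted
        # With RETURN count(*) the batch size is reported; without it, nothing is.
        return len(batch) if returns_count else None

    return statement, group_id


def periodic_commit(statement, params):
    """Re-run `statement` until it reports 0 updated rows and return how many
    times it was executed. A missing count is treated as 0 (assumption)."""
    executions = 0
    while True:
        executions += 1
        updated = statement(params) or 0
        if updated == 0:
            return executions
```

With 25 nodes and `{"limit": 10}`, the counting version runs four times (10, 10, 5, then 0) and assigns every node, while the version without a count stops after the first batch of 10.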

nishspeak · February 6, 2020, 10:10pm

I tried one more approach, and it also works fine.
But I still need to check the performance of this solution.

CALL algo.unionFind('Person', 'KNOWS', 
{write: true,writeProperty: 'groupId'}) 
yield nodes RETURN nodes
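algo.unionFind is named after the union-find (disjoint-set) technique, which computes the same grouping as the traversal approach: nodes connected by relationships in either direction share one group id. The Python below is a minimal single-threaded sketch of that technique with union by size and path compression, handling each relationship once; `union_find_groups` is a hypothetical name, and the library's own implementation is not shown here.

```python
def union_find_groups(edges):
    """Map every node to the root of its group (weakly connected component)."""
    parent, size = {}, {}

    def find(x):
        if x not in parent:
            parent[x], size[x] = x, 1
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:  # path compression: point nodes at the root
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:  # union by size: attach the smaller tree
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for a, b in edges:
        union(a, b)
    return {node: find(node) for node in parent}
```

The resulting root per node works as a group id in the same way `writeProperty: 'groupId'` stores one value per component.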
