Facing difficulties creating a subgraph from the original one in Neo4j

I am trying to make a subgraph from the original graph in Neo4j. The situation is given below.

Here nodes B and C are connected with node A. I want to make a subgraph where B and C are connected if both B and C have a relationship with A. I am confused about which Cypher query would be appropriate for solving this problem.
I am also thinking of using NetworkX, since Neo4j has a NetworkX binding, but kindly help me with how I would approach this problem with Neo4j Cypher. I am using Neo4j 3.5.8.

So you just want to create a relationship between nodes B and C provided some condition is met, or do you want to clone nodes B and C, and connect the cloned nodes if A was connected to both of them?

If it's the latter case, what do you want to do if A wasn't connected to both of them? Should they still be cloned, but not connect the clones?

Also, will you be matching directly to these nodes? Or are B and C supposed to be labels? Some pseudo-Cypher might help us understand.

I am trying to create a relationship between B and C provided some condition is met. The idea is: if B works for A and C works for A, then I will create a subgraph where B and C are connected. Here A, B, and C are node labels.

Okay, so something like this?

MATCH (a:A)<-[:WORKS_FOR]-(b:B), (a)<-[:WORKS_FOR]-(c:C)
MERGE (b)-[:WORKS_WITH]-(c)

This is explicitly looking for patterns where an :A node is connected to a :B node and a :C node, and if so it will MERGE a relationship between b and c (change up the relationship types to whatever you're using in your graph, of course).
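
If you want to sanity-check it on a tiny sample first, something like this should work (the name values here are just placeholders):

CREATE (a:A {name: 'a1'}),
       (b:B {name: 'b1'})-[:WORKS_FOR]->(a),
       (c:C {name: 'c1'})-[:WORKS_FOR]->(a)

// after running the MATCH/MERGE above, this should return exactly one row
MATCH (b:B)-[:WORKS_WITH]-(c:C)
RETURN b, c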

Thanks for your help, I will try it and then let you know :)

@andrew_bowman another thing has come to mind. I have two node labels, A and C, each holding multiple records (for the A label there are records a1, a2, ... up to n; for C there are c1, c2, ...), each with a unique ID. The scenario is:

for a specific A node, let's assume a record a1 is connected with two different records c1 and c2. If c1 and c2 are both connected with a1, then I want to show that c1 and c2 are related or connected. How could I approach this? The desired subgraph would consist of any number of connected C nodes, as long as they share a common A.
For the base case,
MATCH (a:A)<-[:WORKS_FOR]-(b:B), (a)<-[:WORKS_FOR]-(c:C)
MERGE (b)-[:WORKS_WITH]-(c)
the query works for connecting 2 nodes, but what about n nodes?
MATCH (a:A)<-[:WORKS_FOR]-(b:C) RETURN b shows all the C nodes that are connected with A.
I want those C nodes that match a common A to be connected with each other.
I am not sure how to MERGE multiple nodes with this condition.
I am new, so kindly help me to learn.

The query I gave should work fine for multiple nodes.

MERGE first checks to see if the desired relationship already exists, and only creates one if the relationship isn't already there, so you shouldn't see multiple :WORKS_WITH relationships between nodes even in cases where the same pairing of nodes occurs more than once.
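
If you want to double-check that, a quick sketch like this counts the :WORKS_WITH relationships per pair of nodes; every pair should show a count of 1:

MATCH (b)-[r:WORKS_WITH]-(c)
WHERE id(b) < id(c) // keep one orientation so each relationship is counted once
RETURN id(b), id(c), count(r) as relsBetweenPair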

As for connecting nodes that work for a single common node, you can use this:

MATCH (a:A)<-[:WORKS_FOR]-(c:C)
WITH a, collect(c) as cNodes
UNWIND cNodes as c1
UNWIND cNodes as c2
// now we have a cartesian product, need to filter out mirrored results
WITH c1, c2
WHERE id(c1) < id(c2) // prevents mirrored results
MERGE (c1)-[:WORKS_WITH]-(c2)

Hello @andrew_bowman, I tried this but it is taking a huge amount of time; to be honest, the query is still running. The node volume is 8 million and the relationship count is 799940444. Is there any way I could reduce this amount of time? As it stands I can't test this query. My cloud RAM size is 240 GB with a 16-core CPU. Kindly help me.

It would have been good to know the volume ahead of time. You need to batch this rather than doing it in a single transaction; it is going to blow your heap if you try this with that many nodes.

You should look at using apoc.periodic.iterate() for this.
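
For example, the first query in this thread could be batched roughly like this (the batch size is just a starting point to tune):

CALL apoc.periodic.iterate(
'MATCH (a:A)<-[:WORKS_FOR]-(b:B), (a)<-[:WORKS_FOR]-(c:C) RETURN b, c', // driving query: streams the pairs to process
'MERGE (b)-[:WORKS_WITH]-(c)', // update query: run against each batch of rows
{batchSize:10000})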

@andrew_bowman
My APOC query is like this:
CALL apoc.periodic.iterate('
MATCH(a:A)<-[:WORKS_FOR]-(c:C)
WITH a, collect(c) as cNodes
','
UNWIND cNodes as c1
UNWIND cNodes as c2
WITH c1, c2
WHERE id(c1)<id(c2)
MERGE (c1)-[:work_with]-(c2)
return c1 c2
',{batchSize:10000, iterateList:true, parallel:true})
but my query is not right, I guess; it is showing the below error:
Failed to invoke procedure apoc.periodic.iterate: Caused by: org.neo4j.cypher.internal.v3_5.util.SyntaxException: Query cannot conclude with WITH (must be RETURN or an update clause) (line 3, column 1 (offset: 64))
I have tried returning c1 and c2 here.
I guess I am making some silly mistake but can't figure it out.

The collect() probably shouldn't be in the driving query; that's an eager aggregation, so it won't stream well.

How many :A nodes are there in the graph, how many :C nodes, and how many :WORKS_FOR relationships?

@andrew_bowman
total count of A nodes = 345561
total count of C nodes = 29501836
total count of WORKS_FOR relationships = 141362938

In that case I'd suggest using a pattern comprehension to get the c nodes that work for a, and doing the UNWIND portion and filtering in the driving query, reserving the MERGE for the updating query:

// driving query

MATCH (a:A)
WHERE (a)<-[:WORKS_FOR]-() // degree check
WITH a, [(a)<-[:WORKS_FOR]-(c:C) | c] as cNodes
UNWIND cNodes as c1
UNWIND cNodes as c2
WITH c1, c2
WHERE id(c1) < id(c2)
RETURN c1, c2 // this was missing from your driving query, which was the source of the syntax error

// updating query

MERGE (c1)-[:work_with]-(c2) // no return needed

I wouldn't use parallel:true for this one, as that may cause lock contention and possible deadlock, provided that :C nodes have multiple :WORKS_FOR relationships to multiple :A nodes.
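
Put together, the full call would look something like this (a sketch based on the pieces above, with the batch size from your example and parallel left off as mentioned):

CALL apoc.periodic.iterate('
MATCH (a:A)
WHERE (a)<-[:WORKS_FOR]-() // degree check
WITH a, [(a)<-[:WORKS_FOR]-(c:C) | c] as cNodes
UNWIND cNodes as c1
UNWIND cNodes as c2
WITH c1, c2
WHERE id(c1) < id(c2)
RETURN c1, c2
','
MERGE (c1)-[:work_with]-(c2)
', {batchSize:10000, iterateList:true})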


@andrew_bowman thanks a lot, it worked for me, but it took 8 hours to complete the query. I am thinking of filtering the subgraph by date to reduce the time. Here my (primary) works_for relationship comes from CSV files where the fields are like first name, last name, time, relation (that is, the works_for). The LOAD CSV query helped me, and I have written a Python script for pipeline purposes. So the situation is this: I want to make a subgraph where C nodes WORKS_FOR common A nodes within a date range, let's assume works_for in (1/1/2019-4/4/2019). Any suggestion will be highly appreciated.
My idea was to find the common A nodes within a specific time. I have come up with the below query:

match(n:A)
WHERE (a)<-[:WORKS_FOR{date(datetime(n.TIMESTAMP))='2019-09-04' or date(datetime(n.TIMESTAMP)< '2019-09-05')}]-()
return a
Invalid input '(': expected an identifier character, whitespace, ':' or '}' (line 2, column 26 (offset: 44))
"WHERE (a)<-[:WORKS_FOR{date(datetime(n.TIMESTAMP))=
The query is throwing this error. Maybe I am missing something very silly.
The timestamp format is like:
2019-09-04T10:31:42

The degree pattern check I used before only works when we only have the label and the origin node, so adding properties there won't work. We need to move this back into the MATCH pattern and use a WHERE clause for the timestamp range.

MATCH (a:A)<-[r:WORKS_FOR]-(c:C)
WITH a, c, date(datetime(r.TIMESTAMP)) as timestamp
WHERE timestamp < date('2019-09-05')
...

That said, this may take a while to match, and you won't be able to use the pattern comprehension in this case, since you need the pattern in the MATCH clause.
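
Filled out, the rest looks much like before, just with the relationship in the MATCH and the date filter applied before collecting (the cutoff date is the one from your example; for your volume you would still want to wrap this in apoc.periodic.iterate):

MATCH (a:A)<-[r:WORKS_FOR]-(c:C)
WITH a, c, date(datetime(r.TIMESTAMP)) as timestamp
WHERE timestamp < date('2019-09-05')
WITH a, collect(c) as cNodes
UNWIND cNodes as c1
UNWIND cNodes as c2
WITH c1, c2
WHERE id(c1) < id(c2)
MERGE (c1)-[:work_with]-(c2)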

Since it looks like the date is in string form, it may be faster to create a fulltext schema index on your :WORKS_FOR relationships specifying the TIMESTAMP property; that should allow you to make a fulltext query call to look up relationships with that property in a range.
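
I haven't tested this against your data, but a sketch would be something like the following (the index name is arbitrary; the keyword analyzer keeps each TIMESTAMP string as a single term, so a lexicographic range query over the ISO-8601 strings works):

CALL db.index.fulltext.createRelationshipIndex("worksForTimestamp", ["WORKS_FOR"], ["TIMESTAMP"], {analyzer: "keyword"})

CALL db.index.fulltext.queryRelationships("worksForTimestamp",
  'TIMESTAMP:["2019-01-01T00:00:00" TO "2019-04-04T23:59:59"]')
YIELD relationship
RETURN count(relationship)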

@andrew_bowman ok, thanks, I will look into it. Another related question: how could I update these relationships? A relationship (works_for) has already been made here. In the future, if I add more C nodes into the graph DB which share a common A, will the above pattern comprehension query help to update the "WORKS_FOR" relationships? I have already started it but could not see any increase in the total nodes. The situation is: some new C nodes are inserted into the DB that have a WORKS_FOR relationship with common A nodes, but none of these nodes has a connecting relationship to any of the others.