How to project a graph?


(Kaptenh) #1

Hi!
I have a graph with two types of nodes; let's call them G for groups and P for persons. Persons belong to a group (call that relation r1), and persons know other persons (call that r2).
I want to "project" the graph to a graph of only Groups with a link between them if there is a two person chain between them, i.e. if there is a path
(G1)-[r1]-(P1)-[r2]-(P2)-[r1]-(G2),
I want this to show as
(G1)--(G2).
There may be (in fact, there are) tons of different such paths between any two groups where one path exists, but I am only interested in one.

Starting with a given group I've tried using the subgraphAll and expandConfig algorithms to do this. They work alright, but not quite.

expandConfig: gives me all the paths between Groups reachable from the start node with paths of the given type. Good, but takes quite a bit of time when the maxLevel goes up.

subgraphAll: When specifying the Group as an end node (using >Group) it is really fast and gives all the groups reachable in this way, but no relationships.
When not doing that it seems about as fast as expandConfig. I'm also not entirely sure how to recreate the graph from this, as the relationship list doesn't specify
which nodes the relationships are between?

Just to clarify, say the graph looks something like

(G 1)--(P 1)--(P 2)--(G 2)--(P 3)--(P 4)--(G 3)
           \         /
              (P 5)

then expandConfig gives a list containing: (G1)--(P1)--(P2)--(G2), (G1)--(P1)--(P5)--(G2), (G2)--(P3)--(P4)--(G3)
subgraphAll gives me [(G1), (P1), (P2), (P5), (G2), (P3), (P4), (G3)] and a list of relationships I don't really know what to do with

subgraphAll with endnodes just gives [(G1), (G2), (G3)]

subgraphAll with endnodes is really fast, and I am thinking that it has to find paths between the groups to do its thing. Is there any way to get it to return just one of them?
Is there any other way to solve the problem fast?
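
On the relationship list from subgraphAll: each relationship value still carries its endpoints, and Cypher's startNode() and endNode() functions can read them back. A minimal sketch, assuming the relationship types are literally named r1 and r2 (the placeholders used above) and that the start group is looked up by a hypothetical id property passed in as a $startId parameter:

```cypher
// Assumptions: relationship types r1/r2 and the `id` property are the
// thread's placeholders; $startId is a hypothetical query parameter.
MATCH (start:Group {id: $startId})
CALL apoc.path.subgraphAll(start, {relationshipFilter: "r1|r2"})
YIELD nodes, relationships
// One row per relationship, with both endpoints made explicit
UNWIND relationships AS rel
RETURN startNode(rel) AS from, type(rel) AS relType, endNode(rel) AS to
```

This only makes the endpoints visible; it does not by itself collapse the Person nodes into Group-to-Group links.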


(Michael Hunger) #2

You can run subgraphAll per Group and aggregate the results?

What about a regular pattern match + aggregation?

MATCH (g1:Group)<--(:Person)--(:Person)-->(g2:Group)
RETURN g1, g2, count(*)

Possibly add rel-types.
Can you run the above with PROFILE?
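
With rel-types and PROFILE added, the query might look roughly like the sketch below. It assumes the relationship types are literally named r1 and r2 (the placeholders from the question) and that r1 points from Person to Group; the `g1 <> g2` filter is an extra assumption that drops groups linked only to themselves.

```cypher
// Assumptions: r1 = Person-to-Group membership, r2 = Person knows Person
// (placeholder type names taken from the question).
PROFILE
MATCH (g1:Group)<-[:r1]-(:Person)-[:r2]-(:Person)-[:r1]->(g2:Group)
WHERE g1 <> g2
// count(*) aggregates all matching chains into one row per group pair
RETURN g1, g2, count(*) AS paths
```

Each pair shows up once per direction (g1, g2 and g2, g1), since the pattern is symmetric.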

How big is your graph and what's your machine setup?


(Kaptenh) #3

Right now the graph contains roughly 3700 groups and 170000 persons.
The machine at the moment is a Linux box with 64G memory and 8 cores, but that might change.

I wouldn't be interested in doing this for the entire graph at once, as that is too big. Instead I would have a start Group and then project the component of the graph that the start node is in.

Would one then do something like:
match (start group) CALL apoc.path.subgraphNodes(start, ) YIELD node as g return start_group, CALL apoc.create.vRelationship(someparams), end_group ?

Or are there more effective ways? Possible to do a final aggregation maybe?

EDIT:
The following:
match (start:Group) where start.id =
CALL apoc.path.subgraphNodes(start, {relationshipFilter: "r1,r2,r1", labelFilter: ">Group", uniqueness: "RELATIONSHIP_LEVEL"}) yield node as group
CALL apoc.path.expandConfig(group, {maxLevel: 3, relationshipFilter: "r1,r2,r1", labelFilter: ">Group", uniqueness: "RELATIONSHIP_LEVEL"}) yield path
with nodes(path) as g
with g[0] as start, g[size(g)-1] as end
CALL apoc.create.vRelationship(start, "CONNECTED_TO", {}, end) YIELD rel
return start, rel, end;

Seems to work pretty well. The only problem is that subgraphNodes doesn't return the start node.

Is there any way to add it to the yielded nodes so I don't have to run a separate, identical expandConfig for that node?
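
One possible way, as an untested sketch: collect the yielded groups into a list, prepend the start node, and unwind the combined list again before the expandConfig step. The id property and the $startId parameter are hypothetical stand-ins for however the start group is actually selected.

```cypher
// Assumptions: `id` and $startId are hypothetical; the filters are the
// ones from the query above.
MATCH (start:Group {id: $startId})
CALL apoc.path.subgraphNodes(start, {relationshipFilter: "r1,r2,r1", labelFilter: ">Group"})
YIELD node
// Gather the reachable groups, then put the start node in front
WITH start, collect(node) AS reached
UNWIND ([start] + reached) AS grp
// DISTINCT guards against the start node appearing twice
RETURN DISTINCT grp
```

From here `grp` could be fed into the expandConfig call in place of `group`. Note that if subgraphNodes yields no rows at all, the collect step produces no row either, so an isolated start group would not be returned by this form.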