Create a named graph as a subgraph of another named graph

Hi, all,

I have a MultiDiGraph whose edges have the properties date, frame1, frame2 and frame3, where frame1, frame2 and frame3 are boolean.

Given a specific date, I have to calculate PageRank and betweenness for each frame property, so six metrics in total.

Currently I create three named graphs for a given date. For example, for 2020-03-15 I create one named graph for 2020-03-15/frame1, one for 2020-03-15/frame2 and one for 2020-03-15/frame3, and calculate the two metrics on each of them. This is about twice as fast as calculating the six metrics with six anonymous graphs, one per metric/date/frame combination.

Building the named graph takes about 90% of the total time needed to calculate the two metrics. I wonder if there is a way to create one named graph for 2020-03-15 and then create the three frame subgraphs as new named graphs derived from this date graph, which is already in memory, so that the three frame graphs are built faster. I know this sounds a little confusing.

Does anyone have a suggestion on this, or does it not make sense?

Thanks in advance, Laufer

Hey Laufer,

I think what you're looking for is something like gds.beta.graph.subgraph().

It should allow you to filter your existing "2020-03-15" graph into separate sub-graphs based on your frame1/frame2/frame3 properties, though I'm not sure how much of a performance lift it will provide. But definitely report back with the results.
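As a rough illustration of that workflow, the sketch below filters an existing in-memory date graph into one frame-specific named graph and runs both metrics on it. The graph names are hypothetical, and it assumes frame1 was projected as a numeric (0/1) relationship property. In the GDS 1.x releases where this feature is in the beta tier, the procedure appears to be registered as gds.beta.graph.create.subgraph, so check the exact name against your installed version.

```cypher
// Assumption: 'date_20200315' is already in the graph catalog and
// stores frame1 as a numeric (0/1) relationship property.

// 1. Filter the in-memory date graph into a frame-specific named graph.
CALL gds.beta.graph.create.subgraph(
  'date_20200315_frame1',   // hypothetical name for the new subgraph
  'date_20200315',          // hypothetical name of the existing date graph
  '*',                      // node filter: keep every node
  'r.frame1 = 1'            // relationship filter: keep only frame1 edges
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount;

// 2. Run both metrics against the filtered graph.
CALL gds.pageRank.stream('date_20200315_frame1')
YIELD nodeId, score
RETURN nodeId, score ORDER BY score DESC LIMIT 10;

CALL gds.betweenness.stream('date_20200315_frame1')
YIELD nodeId, score
RETURN nodeId, score ORDER BY score DESC LIMIT 10;

// 3. Drop the filtered graph to release its memory.
CALL gds.graph.drop('date_20200315_frame1') YIELD graphName;
```

With '*' as the node filter, nodes that lose all their relationships remain in the subgraph as isolated nodes, so scores may differ slightly from a graph projected with only the frame's endpoints.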

I hope that offers some help,

Sean

Thank you very much, Sean.
I will try and return the results.

Best regards, Laufer

Hi, Sean,

Coming back to report the results.
There is an error in the documentation regarding the name of the procedure: the correct name is gds.beta.graph.create.subgraph().

I create the date graph with the query:

CALL gds.graph.create.cypher(
  'date_20200519',
  'MATCH (p)-[{data: date("2020-05-19")}]-()
   RETURN DISTINCT id(p) AS id',
  'MATCH (p1)-[r {data: date("2020-05-19")}]->(p2)
   RETURN id(p1) AS source, id(p2) AS target,
          r.atrib_resp AS atrib_resp, r.conflito AS conflito,
          r.moralidade AS moralidade, r.conseq_pandemia AS conseq_pandemia,
          r.med_contencao AS med_contencao, r.met_tratamento AS met_tratamento',
  {validateRelationships: true}
)

Actually, I have six frames: atrib_resp, conflito, moralidade, conseq_pandemia, med_contencao, met_tratamento. The date graph has 222,244 nodes and 444,246 edges. If I run, for example, PageRank over it, it works fine.
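One way to double-check those counts, and to confirm which relationship properties were actually stored in the projection, is to inspect the graph catalog entry. This is only a sketch; the exact set of columns returned by gds.graph.list may vary between GDS versions.

```cypher
// Inspect the projected date graph: node and relationship counts,
// plus the schema listing the stored relationship properties.
CALL gds.graph.list('date_20200519')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName, nodeCount, relationshipCount, schema;
```

If the six frame properties do not appear under the relationship schema, a filter such as 'r.atrib_resp = 1' has nothing to match against.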

But when I try to create a subgraph, I get an out-of-bounds error:

CALL gds.beta.graph.create.subgraph('date_20200519_atrib_resp_subgraph', 'date_20200519', '*', 'r.atrib_resp = 1')
YIELD graphName, fromGraphName, nodeCount, relationshipCount

ERROR Neo.ClientError.Procedure.ProcedureCallFailed
Failed to invoke procedure gds.beta.graph.create.subgraph: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 45647 out of bounds for length 3473

I searched for this error and found a similar issue that has been closed as resolved: "gds.graph.create.cypher ArrayIndexOutOfBoundsException error", Issue #15 in neo4j/graph-data-science on GitHub.
That issue was about edges whose nodes were not present in the graph. I cannot see where I am making a mistake.
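A quick way to rule out the situation described in that issue is to compare the number of nodes returned by the node query with the number of distinct endpoints returned by the relationship query. The sketch below reuses the two MATCH patterns from the projection above; the column names projectedNodes and relationshipEndpoints are made up for illustration.

```cypher
// Count the nodes the node query would project.
MATCH (p)-[{data: date("2020-05-19")}]-()
WITH count(DISTINCT p) AS projectedNodes
// Count the distinct endpoints of the relationships that would be projected.
MATCH (p1)-[r {data: date("2020-05-19")}]->(p2)
UNWIND [p1, p2] AS endpoint
RETURN projectedNodes, count(DISTINCT endpoint) AS relationshipEndpoints;
```

Because the node query matches the same relationships without a direction, the two numbers should be equal here. If they are, every relationship endpoint is covered by the node set, and the cause is probably something other than what that issue describes.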

Do you have any idea what might be causing this?

Thank you, Laufer