Trying to get my head around graph creation in GDS

andy_hegedus · June 11, 2020, 4:41pm

Trying to get a clear understanding of what happens when a graph is created in GDS and what are the limits of the nodes and relationships included.
From the documentation (just to prove I am actually reading it), the following example is given:

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n) WHERE n:Author OR n:Article OR n:Book RETURN id(n) AS id, labels(n) AS labels',
    'MATCH (n:Author)-[r:WROTE]->(m) RETURN id(n) AS source, id(m) AS target'
)

Am I correct in understanding the ONLY type of nodes in the created graph are Author, Article and Book?
Are the relationships included between those nodes limited to :WROTE? For example, if there was another relationship :Is_published_in that connects articles and books, that relationship would not be included in the created graph, correct?
Does the variable r in [r:WROTE] have any function in the created map, since it is not id, source, or target?
Is there any way to visualize the created graph in the desktop browser? This would help me check if I am getting what I think I want.

Andy

soham.dhodapkar · June 11, 2020, 8:33pm

Hi @andy_hegedus,
gds.graph.create.cypher allows you to create an in-memory graph using the Cypher projection given in the statement, so that you don't have to pull in the entire graph. This lets you pick out just the entities in the graph that you want to use to run GDS algorithms. To answer your questions:

Yes, since the Cypher statements fetch only Author, Article and Book nodes, only those nodes will be present in the in-memory graph.
Correct. As with the nodes, only the relationships returned by the relationship query will be in the created graph.
I am not sure what you mean by map, but here the variable r is not used. When there are multiple relationship types, type(r) can be used to distinguish between them.
For example, with [r:WROTE|PUBLISHED], returning type(r) tells us which of the two relationship types was matched.
Currently, there is no way to visualize an in-memory graph. However, you can export the graph to another database and work on it there. Refer to the Graph Catalog documentation for details on export.
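
As a rough sketch of the type(r) idea above, and of one way to check the result without a visualization: this assumes a PUBLISHED relationship type exists alongside WROTE (it is only the second type used as an example here), and the graph name 'my-typed-graph' is made up for illustration. The extra WHERE clause is there so the relationship query only returns relationships whose end nodes were also loaded by the node query.

```cypher
// Project WROTE and PUBLISHED relationships and keep their types apart.
// 'my-typed-graph' and the PUBLISHED type are illustrative assumptions.
CALL gds.graph.create.cypher(
    'my-typed-graph',
    'MATCH (n) WHERE n:Author OR n:Article OR n:Book RETURN id(n) AS id, labels(n) AS labels',
    'MATCH (n)-[r:WROTE|PUBLISHED]->(m)
     WHERE (n:Author OR n:Article OR n:Book) AND (m:Author OR m:Article OR m:Book)
     RETURN id(n) AS source, id(m) AS target, type(r) AS type'
);

// Without a visualization, the catalog counts are a quick sanity check.
CALL gds.graph.list('my-typed-graph')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;
```

Comparing nodeCount and relationshipCount against plain MATCH ... RETURN count(*) queries on the source database is one way to confirm the projection holds what you expect.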

andy_hegedus · June 11, 2020, 9:03pm

Hi Soham,
Thank you for your response.
The schema looks like this. My goal is to work with a subgraph that represents the patents assigned to a given company, but that also includes the relationships. I don't know if there is a way to contain the subgraph because, as you can see, the patents reference (cite) other patents, and patents are further classified by cpc, which also reference (Reports_to) other cpc.
So it is clear how to identify the nodes, but less clear on how to specify the relationships when there are 3 potential players but only source and target. Or do I reuse source and target multiple times?

You can ignore the techHub node.
[image: graph schema]

soham.dhodapkar · June 12, 2020, 12:01am

Good question. I'm going to tag in a GDS expert here.
@alicia.frame Can you please take a look at this?

andy_hegedus · June 12, 2020, 12:12am

Thanks Soham,

I do love it when I ask a good question!
Andy

alicia.frame · June 12, 2020, 3:40pm

The first Cypher query in your graph.create.cypher call identifies the nodes to load into your in-memory graph, and the second creates the relationships between them. For the example in your original post, you can actually use the native loader:

CALL gds.graph.create('my-native-graph', ['Author','Article','Book'], ['WROTE','PUBLISHED'])

If you don't need to create relationships that aren't present in your source graph, and you don't need to change labels, structure, etc then you can use the native loaders, which are typically much faster.

For your specific request, it sounds like you either need to refactor your graph to include the relationship you're trying to describe, or write a cypher query that retrieves the appropriate relationship.
It's probably most useful to start from: what algorithms are you trying to run, and what questions are you trying to answer? Algorithms like Louvain or PageRank expect a monopartite graph (one type of node), so you'd want to connect companies to each other based on shared patents; whereas node similarity is for a multipartite graph, measuring source node similarity based on targets.
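
A minimal sketch of the monopartite idea, under loud assumptions: the label Company, the label Patent, the relationship type ASSIGNED_TO and the graph name 'company-graph' are all hypothetical stand-ins, since the actual names in the schema are not given in this thread. The relationship query still returns only a source and a target; the patent in the middle of the pattern is collapsed into a direct company-to-company relationship.

```cypher
// Hypothetical names: Company, Patent, ASSIGNED_TO, 'company-graph'.
// Nodes: companies only, so the projected graph is monopartite.
// Relationships: two companies are linked when a patent is assigned to both;
// the extra 'weight' column is loaded as a relationship property.
CALL gds.graph.create.cypher(
    'company-graph',
    'MATCH (c:Company) RETURN id(c) AS id',
    'MATCH (c1:Company)<-[:ASSIGNED_TO]-(p:Patent)-[:ASSIGNED_TO]->(c2:Company)
     WHERE c1 <> c2
     RETURN id(c1) AS source, id(c2) AS target, count(p) AS weight'
)
```

The same approach should extend to longer patterns (for example through citations or cpc classifications): however many node types the MATCH pattern passes through, only the two end nodes are returned as source and target.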

andreas_kuczera · December 6, 2020, 6:40pm

Hi, any news about visualizing an in-memory graph? It would be a nice feature for checking code and from a didactic point of view.
