Trying to get my head around graph creation in GDS

I'm trying to get a clear understanding of what happens when a graph is created in GDS, and what limits apply to the nodes and relationships included.
From the documentation (just to prove I am actually reading it), the following example is given:

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n) WHERE n:Author OR n:Article OR n:Book RETURN id(n) AS id, labels(n) AS labels',
    'MATCH (n:Author)-[r:WROTE]->(m) RETURN id(n) AS source, id(m) AS target'
)
  1. Am I correct in understanding the ONLY type of nodes in the created graph are Author, Article and Book?
  2. Are the relationships included between those nodes limited to :WROTE? For example, if there was another relationship :Is_published_in that connects articles and books, that relationship would not be included in the created graph, correct?
  3. Does the variable r in [r:WROTE] have any function in the created map, since it is not id, source, or target?
  4. Is there any way to visualize the created graph in the desktop browser? This would help me check if I am getting what I think I want.

Andy

Hi @andy.hegedus,
gds.graph.create.cypher allows you to create an in-memory graph using the Cypher projection given in the statement, so that you don't have to pull in the entire graph. This lets you pick out just the entities in the graph that you want to run GDS algorithms on. To answer your questions:

  1. Yes, since the node query only fetches Author, Article and Book nodes, only those nodes will be present in the in-memory graph.
  2. Correct. As with the nodes, only the relationships returned by the relationship query will be in the created graph.
  3. I am not sure what you mean by map, but here the variable r is not used. When the pattern matches multiple relationship types, type(r) can be used to distinguish between them.
    For example, with [r:WROTE|PUBLISHED], returning type(r) tells us which of the two relationship types was matched.
  4. Currently, there is no way to visualize an in-memory graph, although you can export it to another database and work on it there. Refer to the Graph Catalog documentation for details on export.
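
As a rough sketch of the type(r) point, the documentation example could be extended along these lines. PUBLISHED is only the hypothetical second relationship type from answer 3, and the graph name 'my-typed-graph' is made up. The target filter is there because every relationship endpoint has to be one of the projected nodes. The gds.graph.list call does not visualize anything, but its node and relationship counts give a quick sanity check on what was actually projected.

```cypher
// Project both relationship types and record which type each row came from.
// PUBLISHED and the graph name are assumptions for illustration only.
CALL gds.graph.create.cypher(
    'my-typed-graph',
    'MATCH (n) WHERE n:Author OR n:Article OR n:Book RETURN id(n) AS id, labels(n) AS labels',
    'MATCH (n:Author)-[r:WROTE|PUBLISHED]->(m) WHERE m:Article OR m:Book RETURN id(n) AS source, id(m) AS target, type(r) AS type'
);

// Check how many nodes and relationships ended up in the in-memory graph.
CALL gds.graph.list('my-typed-graph')
YIELD graphName, nodeCount, relationshipCount;
```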

Hi Soham,
Thank you for your response.
The schema looks like this. My goal is to work with a subgraph that represents the patents assigned to a given company, but also to include the relationships. I don't know if there is a way to contain the subgraph because, as you can see, the patents reference (cite) other patents, and patents are further classified by cpc, which in turn reference (Reports_to) other cpc.
So it is clear how to identify the nodes, but less clear how to specify the relationships when there are 3 potential players but only source and target. Or do I reuse source and target multiple times?

You can ignore the techHub node.
[Image: graph schema]

Good question. I'm going to tag in a GDS expert here.
@alicia.frame Can you please take a look at this?

Thanks Soham,

I do love it when I ask a good question!
Andy

The first Cypher query in your gds.graph.create.cypher call identifies the nodes to load into your in-memory graph, and the second creates the relationships between them. For the example in your original post, you can actually use the native loader:

CALL gds.graph.create('my-native-graph', ['Author', 'Article', 'Book'], ['WROTE', 'PUBLISHED'])

If you don't need to create relationships that aren't present in your source graph, and you don't need to change labels, structure, etc then you can use the native loaders, which are typically much faster.

For your specific request, it sounds like you either need to refactor your graph to include the relationship you're trying to describe, or write a cypher query that retrieves the appropriate relationship.
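
A minimal sketch of the second option, assuming a schema I can only guess at: the Company and Patent labels, the Assigned_to and Cites relationship types, the name property, the company name 'Acme' and the graph name are all hypothetical placeholders. The node query keeps only the patents assigned to one company, and the relationship query keeps only citations where both patents belong to that company, which is what contains the subgraph. Each returned row is one relationship with one source and one target, so source and target are not reused within a row.

```cypher
// Every label, relationship type and property name here is an assumption,
// not taken from the real schema.
CALL gds.graph.create.cypher(
    'company-patents',
    // Nodes: patents assigned to the chosen company.
    'MATCH (:Company {name: $company})<-[:Assigned_to]-(p:Patent) RETURN DISTINCT id(p) AS id',
    // Relationships: citations between two patents of that same company.
    'MATCH (c:Company {name: $company})<-[:Assigned_to]-(p1:Patent)-[:Cites]->(p2:Patent)-[:Assigned_to]->(c) RETURN id(p1) AS source, id(p2) AS target',
    {parameters: {company: 'Acme'}}
)
```

To bring in the cpc nodes as well, the node query would need to return them too, and the relationship query could gain further branches (for example joined with UNION ALL) that each still return one source and one target per row.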
It's probably most useful to start from: what algorithms are you trying to run, and what questions are you trying to answer? Algorithms like Louvain or PageRank expect a monopartite graph (one type of node), so you'd want to connect companies to each other based on shared patents, whereas node similarity is for a multipartite graph, measuring source node similarity based on targets.
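
For the monopartite case, one possible sketch is to project only company nodes and derive company-to-company relationships from the patents they share. The Company and Patent labels, the Assigned_to relationship type, the name property and the graph name are assumptions for illustration, and the count of shared patents is stored as a relationship property called weight.

```cypher
// Labels, relationship type and graph name are hypothetical.
CALL gds.graph.create.cypher(
    'company-graph',
    'MATCH (c:Company) RETURN id(c) AS id',
    // One relationship per ordered company pair, weighted by shared patents.
    // Each pair appears in both directions, so the projection is symmetric.
    'MATCH (c1:Company)<-[:Assigned_to]-(p:Patent)-[:Assigned_to]->(c2:Company) WHERE c1 <> c2 RETURN id(c1) AS source, id(c2) AS target, count(p) AS weight'
);

// Run Louvain on the derived company-to-company graph.
CALL gds.louvain.stream('company-graph', {relationshipWeightProperty: 'weight'})
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS company, communityId
ORDER BY communityId;
```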