Random walk with Native Projection after GDS v 1.8 update

I had been using a GDS version earlier than 1.6 for a long time. I had no problems with the 'alpha' version of the random walk algorithm, but with the 'beta' version I am seeing a big difference. For example:

My graph has many node labels, but I want to limit the walk to nodes with the following labels: ['Person', 'Location', 'Manager', 'Specialty', 'Activity']. The relationships are all directed, but for the sake of the random walk I want to treat them as 'UNDIRECTED'. For instance, (Person1)-[:KNOWS]->(Manager1): even though the relationship is directed, it's possible that Manager1 also knows Person1.


To do this on older GDS versions I would create the Native projection as follows:
PROJECTION 1:

	CALL gds.graph.create(
	                'myGraph',
	                ['Person', 'Location', 'Manager', 'Specialty', 'Activity'],
	                { undirected: {type: '*', orientation: 'UNDIRECTED'} }
	            ) YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels;

It would yield 68,591 nodes and 8,341,820 relationships. With the older GDS version, I could then run the random walk and specify a source node as follows:

MATCH (n:Person)
	WHERE n.name = 'John'
	CALL gds.beta.randomWalk.stream(
	  'myGraph',
	  {
	    sourceNodes: n, walksPerNode: 1, walkLength: 3, inOutFactor: 0.1, returnFactor: 5.0
	  }
	)
	YIELD nodeIds, path
	RETURN nodeIds, [node IN nodes(path) | node.name ] AS traversed_nodes, [node IN nodes(path) | labels(node) ]  AS traversed_labels

Note: in the 'alpha' version, sourceNodes was 'start', walksPerNode was 'walks', and walkLength was 'steps' (or something similar).


It would return ['John', 'Alice', 'Seattle'] for example.



However, after upgrading to GDS v1.8.2, the same query yields no results. It does return results if I specify the projection the following way:
PROJECTION 2:

CALL gds.graph.create(
                'myGraph',
                '*',
                { undirected: {type: '*', orientation: 'UNDIRECTED'} }
            ) YIELD graphName AS graph, nodeProjection, nodeCount AS nodes, relationshipProjection, relationshipCount AS rels;

nodes: 41526446, rels: 145444158


Running the same random walk on PROJECTION 2, I get exactly the behavior I want, but the walks pass through nodes I don't care about. For instance, with the latter projection traversed_nodes yields ['John', 'linux experience', 'Software developer']. The second element, 'linux experience', has the node label 'Skills', for which there are over 8 million descriptions with misspellings and/or blanks.


What has changed in version 1.8.2 that requires an adjustment to Native PROJECTION 1 above? How can I limit the label types similar to PROJECTION 1, while also specifying to treat directed relationships as undirected?

The difference I see between the two projections is that projection 1 specifies node labels (Person, Location, etc), while projection 2 uses * - which covers all node labels. That would explain why your random walks traverse Skill nodes in the second projection, but not the first.

It is possible to specify a source node with gds.beta.randomWalk - you just need to select it outside of the configuration parameters:

MATCH (p:Person)
WHERE p.name = 'John'
WITH COLLECT(p) as sourceNodes
CALL gds.beta.randomWalk.stream(
  'myGraph',
  {
    sourceNodes: sourceNodes,
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.name ] AS pages

The syntax and behavior of graph creation haven't changed. If the walk on the original (node-filtered) projection isn't returning any results, I'd check whether the start node is being found. You could also try a lower walk length, to see whether only shorter walks (two hops, say) are possible in the filtered graph.
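The checks suggested above could be run roughly as follows. This is only a sketch: it assumes the label list and the 'John' example from the question, and that the projection is still in the catalog under the name 'myGraph'. The statements are meant to be run one at a time (or in a client that accepts multiple statements separated by semicolons).

```cypher
// 1. Confirm the start node is found and carries one of the projected labels.
MATCH (n:Person)
WHERE n.name = 'John'
RETURN id(n) AS nodeId, labels(n) AS nodeLabels;

// 2. Count John's direct neighbours per label, ignoring direction, and flag
//    which labels are part of PROJECTION 1. If every neighbour has a label
//    outside the list (e.g. 'Skills'), John is isolated in the filtered graph.
MATCH (n:Person)-[]-(m)
WHERE n.name = 'John'
UNWIND labels(m) AS neighbourLabel
RETURN neighbourLabel,
       neighbourLabel IN ['Person', 'Location', 'Manager', 'Specialty', 'Activity'] AS inProjection,
       count(*) AS neighbours
ORDER BY neighbours DESC;

// 3. Check what the catalog reports for the projected graph.
CALL gds.graph.list('myGraph')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;
```

If the second query shows no neighbours with inProjection = true, that would be consistent with the walk working on the '*' projection but not on the label-filtered one.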

If that doesn't solve it, you can open an issue in our team's GitHub repository: Issues · neo4j/graph-data-science · GitHub

Hello,

Did you achieve your data sampling with this large graph?
I am looking for an example of a large graph together with its sampled version (whatever data sampling algorithm was used). I would be grateful if you could share these resources.

Sincerely,

In older versions of GDS (Graph Data Science) before version 1.6, to achieve the desired functionality of limiting nodes to certain types and treating relationships as undirected for a random walk, you would typically set up a Native projection like this:

PROJECTION 1: [Specify your criteria for nodes and relationships]

This would allow you to filter your graph to only include nodes of types ['Person', 'Location', 'Manager', 'Specialty', 'Activity'] and treat the relationships as undirected for the random walk algorithm.