I used GDS versions below 1.6 for a long time. I had no problems with the 'alpha' version of the random walk algorithm, but with the 'beta' version I am seeing a big difference in behavior. For example:
My graph has many node labels, but I want to limit the projection to the following labels: ['Person', 'Location', 'Manager', 'Specialty', 'Activity']. The relationships are all directed, but for the sake of the random walk I want to treat them as 'UNDIRECTED'. For instance, given (Person1)-[:KNOWS]->(Manager1), even though the relationship is directed, it's possible that Manager1 also knows Person1.
To do this on older GDS versions I would create the Native projection as follows:
PROJECTION 1:
CALL gds.graph.create(
  'myGraph',
  ['Person', 'Location', 'Manager', 'Specialty', 'Activity'],
  { undirected: {type: '*', orientation: 'UNDIRECTED'} }
) YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels;
It would yield 68,591 nodes and 8,341,820 relationships. With the older GDS version, I could then run the random walk and specify a source node as follows:
MATCH (n:Person)
WHERE n.name = 'John'
CALL gds.beta.randomWalk.stream(
  'myGraph',
  {
    sourceNodes: n,
    walksPerNode: 1,
    walkLength: 3,
    inOutFactor: 0.1,
    returnFactor: 5.0
  }
)
YIELD nodeIds, path
RETURN nodeIds,
       [node IN nodes(path) | node.name] AS traversed_nodes,
       [node IN nodes(path) | labels(node)] AS traversed_labels
Note: in the 'alpha' version, sourceNodes was 'start', walksPerNode was 'walks', and walkLength was 'steps' (or something similar).
It would return ['John', 'Alice', 'Seattle'], for example.
However, after upgrading to GDS v1.8.2, the same query yields no results. But if I create the projection the following way:
PROJECTION 2:
CALL gds.graph.create(
  'myGraph',
  '*',
  { undirected: {type: '*', orientation: 'UNDIRECTED'} }
) YIELD graphName AS graph, nodeProjection, nodeCount AS nodes, relationshipProjection, relationshipCount AS rels;
This yields 41,526,446 nodes and 145,444,158 relationships.
Using the same random walk with PROJECTION 2, I get exactly what I want, but with nodes I don't care about. For instance, with this projection traversed_nodes yields ['John', 'linux experience', 'Software developer']. The second element, 'linux experience', has the node label 'Skills', for which there are over 8 million descriptions with misspellings and/or blanks.
What has changed in version 1.8.2 that requires an adjustment to native PROJECTION 1 above? How can I limit the node labels as in PROJECTION 1, while still treating directed relationships as undirected?
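For anyone trying to reproduce this, here is a minimal sketch of two sanity checks that might help narrow it down. It assumes the in-memory graph is still named 'myGraph' and that the source node is the Person named 'John', as in the queries above. The first statement reads the node and relationship counts back from the graph catalog. The second runs against the stored graph, not the projection, and counts John's direct neighbours that carry one of the five projected labels. If that count is zero, an empty walk on PROJECTION 1 would be expected regardless of GDS version. This is only a diagnostic, not an explanation of what changed in 1.8.2.

```cypher
// Check 1: what does the in-memory graph from PROJECTION 1 actually contain?
CALL gds.graph.list('myGraph')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;

// Check 2: does the source node have any neighbours (in either direction)
// whose labels are among those kept by PROJECTION 1?
MATCH (n:Person {name: 'John'})--(m)
WHERE any(l IN labels(m) WHERE l IN ['Person', 'Location', 'Manager', 'Specialty', 'Activity'])
RETURN labels(m) AS neighbourLabels, count(*) AS neighbours
ORDER BY neighbours DESC;
```

The two statements are independent and can be run separately if the client does not accept multiple statements at once.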