Error in graph-algorithms-algo-3.4.7

slygren · October 9, 2018, 9:49pm

Hi,

I've been trying to get algo.pageRank.stream to work on my data, but from what I understand there seems to be a bug in how the function returns nodeIds.

Following the examples provided for the library, here is my simple query:

call algo.pageRank.stream(
'MATCH (n) RETURN id(n) as id',
'MATCH (n1:DataState)-[r:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n1) as source, id(n2) as target'
,{graph:'cypher'}
) yield nodeId, score
with nodeId, score
return nodeId limit 10

And the result

Quite obviously, those IDs do not belong to any :DataState or :DisplacementState nodes; they belong to completely different nodes that are unrelated to the query in question.

Any suggestions?

michael.hunger · October 9, 2018, 10:18pm

You didn't limit the nodes in the node-list to those labels.

MATCH (n) WHERE n:DataState OR n:DisplacementState RETURN id(n) as id

slygren · October 9, 2018, 10:24pm

OK, I get that, but isn't id(n) supposed to cover all IDs and act as a 'function' to resolve the IDs for the next match string? (I believe I've seen you use it like this yourself in one example.)

Anyway, any suggestions on how to get the IDs for only the :DataState and :DisplacementState nodes here?

michael.hunger · October 9, 2018, 10:32pm

See my query above.

Yes, that's intentional: the node list specifies the graph and the relationship list fills it out.
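To illustrate the point about the two statements, here is a minimal Python sketch of the projection rule as described in this thread. It is not the library's code: the helper name `project` and the node IDs are made up for illustration, and it only assumes that the node query defines the node set and that relationship rows are kept when both endpoints are in that set.

```python
def project(node_ids, rel_rows):
    """Hypothetical helper, not part of the graph algorithms library.

    node_ids: IDs returned by the node query (this defines the graph).
    rel_rows: (source, target) pairs returned by the relationship query.
    """
    nodes = set(node_ids)
    # Keep a relationship only if both endpoints are in the projected node set.
    rels = [(s, t) for s, t in rel_rows if s in nodes and t in nodes]
    return nodes, rels


# Made-up IDs: 1 and 2 stand for labelled nodes, 3 and 4 for unrelated nodes
# that a plain MATCH (n) node query would also pull in.
nodes, rels = project([1, 2, 3, 4], [(1, 2)])
print(sorted(nodes), rels)
```

With a node list restricted to the two labels, nodes 3 and 4 would simply not be part of the projected graph, so they could not show up in the results.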

slygren · October 10, 2018, 7:23am

OK, so following your answer I tried to be very specific and return IDs only for nodes that participate in the second query, this time with a slightly more advanced one, like so:

call algo.pageRank.stream(
'MATCH (f:Functional_Location)-[:HAS_TAG]->(:HistoryTag)-[:IN_STATEPERIOD]->(:StatePeriod)-[:HAS_DATASTATE]->(:DataState)-[:MAKE_DISPLACEMENTSTATE]->(n:DisplacementState) with collect(f)+collect(n) as nodes unwind nodes as n return id(n) as id',
'MATCH (n1:Functional_Location)-[:HAS_TAG]->(:HistoryTag)-[:IN_STATEPERIOD]->(:StatePeriod)-[:HAS_DATASTATE]->(:DataState)-[r:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n2) as source, id(n1) as target, count(r) as weight'
,{graph:'cypher'}
) yield nodeId,score
return nodeId

I tested the first query, and from what I can see only the source and target node IDs (for the second query) are returned. However, the result returned from PageRank is all 0s, i.e. none of the nodes/nodeIds I expected from either query. Obviously I must be fundamentally mistaken about how this works. Could you please explain what's wrong?

By the way, the intention here is to run the algorithm and then build a virtual graph from the data to stream to Gephi.

slygren · October 10, 2018, 10:46am

Regarding the initial question, it's worth mentioning that a PageRank call yielding 'node' as opposed to 'nodeId' does in fact give the expected result:

call algo.pageRank.stream(
'MATCH (n) RETURN id(n) as id',
'MATCH (n1:DataState)-[:MAKE_DISPLACEMENTSTATE]->(d1:DisplacementState) RETURN id(n1) as source, id(d1) as target'
,{graph:'cypher'}
) yield node,score 
WITH node, score ORDER BY score DESC limit 10
return node.type as type, score;

And that's even when using the approach of collecting all node IDs:

MATCH (n) RETURN id(n) as id

Anyone care to elaborate?

slygren · October 10, 2018, 11:05am

Answering myself: it turns out the key point here is ordering and limiting the results, like so:

ORDER BY score DESC limit 10

michael.hunger · October 10, 2018, 2:57pm

So you're all good now?

slygren · October 10, 2018, 3:52pm

Not really. Even though the last approach works, I still don't understand why the second one doesn't. I also struggle to understand why, when using the MATCH (n) RETURN id(n) pattern together with a more specific second query, the PageRank call returns all nodes/nodeIds.

michael.hunger · October 10, 2018, 7:40pm

As I said, the node query builds up the graph, so you get all nodes in that projected graph.

The relationship query only adds relationships between nodes that are in that projected graph.

So all IDs are returned, even for nodes that have no connections; their PageRank simply stays at the initial value of 0.15.
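A small Python sketch can make the 0.15 value concrete. It is not the library's implementation: it assumes the non-normalized PageRank update PR(n) = (1 - d) + d * sum(PR(s) / outdegree(s)) with a damping factor d = 0.85 (chosen because it matches the 0.15 mentioned above), and the function name `pagerank` and the node IDs are made up for illustration.

```python
def pagerank(node_ids, rels, damping=0.85, iterations=20):
    """Illustrative, non-normalized PageRank (an assumption, not the library's code)."""
    nodes = list(node_ids)
    out_degree = {n: 0 for n in nodes}
    for source, _target in rels:
        out_degree[source] += 1

    # Every node starts at (1 - damping), i.e. 0.15 for damping = 0.85.
    scores = {n: 1.0 - damping for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for source, target in rels:
            incoming[target] += scores[source] / out_degree[source]
        scores = {n: (1.0 - damping) + damping * incoming[n] for n in nodes}
    return scores


# Made-up IDs: one relationship 1 -> 2, while nodes 3 and 4 are isolated.
scores = pagerank([1, 2, 3, 4], [(1, 2)])
for node_id, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(node_id, round(score, 4))
```

In this sketch, nodes without incoming relationships keep the base score while node 2 ends up higher, which is consistent with why adding ORDER BY score DESC and a limit surfaces the connected nodes first.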
