I've been trying to get algo.pageRank.stream to work on my data, but as far as I can tell there seems to be a bug in how the procedure returns node ids.
Following the examples provided for the library, here is my simple query:
call algo.pageRank.stream(
'MATCH (n) RETURN id(n) as id',
'MATCH (n1:DataState)-[r:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n1) as source, id(n2) as target'
,{graph:'cypher'}
) yield nodeId, score
with nodeId, score
return nodeId limit 10
The returned ids clearly do not belong to any :DataState or :DisplacementState nodes; they belong to completely different nodes that are unrelated to the query in question.
OK, I get that, but isn't id(n) supposed to cover all ids and act as a 'function' that resolves the ids for the next match string? (I believe I've seen you use it like this yourself in one example.)
Anyway, any suggestions on how to get ids for only :DataState and :DisplacementState nodes here?
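One way to double-check which labels the returned ids actually belong to could be to look the nodes up again right after the call. This is only a sketch that reuses the call above unchanged; the extra MATCH and the labels(n) column are additions for inspection, not part of the library example:

```cypher
CALL algo.pageRank.stream(
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (n1:DataState)-[r:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n1) AS source, id(n2) AS target',
  {graph:'cypher'}
) YIELD nodeId, score
// Look each returned id up again to see which labels it carries
MATCH (n) WHERE id(n) = nodeId
RETURN nodeId, labels(n) AS labels, score
ORDER BY score DESC
LIMIT 10
```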
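For reference, one way the restriction might be written is to filter the node statement by label. This is an untested sketch that reuses only the labels and relationship type from the query above, and it assumes the first statement is what determines which nodes are loaded into the projected graph:

```cypher
CALL algo.pageRank.stream(
  // Node statement: only nodes carrying one of the two labels of interest
  'MATCH (n) WHERE n:DataState OR n:DisplacementState RETURN id(n) AS id',
  // Relationship statement: unchanged from the original query
  'MATCH (n1:DataState)-[:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n1) AS source, id(n2) AS target',
  {graph:'cypher'}
) YIELD nodeId, score
RETURN nodeId, score
ORDER BY score DESC
LIMIT 10
```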
OK, so following your answer I tried to be very specific and return ids only for nodes that participate in the second query, this time with a slightly more advanced query:
call algo.pageRank.stream(
'MATCH (f:Functional_Location)-[:HAS_TAG]->(:HistoryTag)-[:IN_STATEPERIOD]->(:StatePeriod)-[:HAS_DATASTATE]->(:DataState)-[:MAKE_DISPLACEMENTSTATE]->(n:DisplacementState) with collect(f)+collect(n) as nodes unwind nodes as n return id(n) as id',
'MATCH (n1:Functional_Location)-[:HAS_TAG]->(:HistoryTag)-[:IN_STATEPERIOD]->(:StatePeriod)-[:HAS_DATASTATE]->(:DataState)-[r:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n2) as source, id(n1) as target, count(r) as weight'
,{graph:'cypher'}
) yield nodeId,score
return nodeId
I tested the first query on its own, and from what I can see it returns only the source and target node ids used by the second query. However, the result returned from PageRank is all 0's, i.e. none of the nodes/nodeIds I would expect from either query. Obviously I must be fundamentally mistaken about how this works. Could you please explain what is wrong?
Btw, the intention here is to run the algorithm and then build a virtual graph from the data to stream to Gephi.
Regarding the initial question, it's worth mentioning that a PageRank call yielding 'node' instead of 'nodeId' does in fact give the expected result:
call algo.pageRank.stream(
'MATCH (n) RETURN id(n) as id',
'MATCH (n1:DataState)-[:MAKE_DISPLACEMENTSTATE]->(d1:DisplacementState) RETURN id(n1) as source, id(d1) as target'
,{graph:'cypher'}
) yield node,score
WITH node, score ORDER BY score DESC limit 10
return node.type as type, score;
And that's even when using the approach of collecting all node ids.
Not really: even though the last approach works, I still don't understand why the second one doesn't. I also struggle to understand why, when using the MATCH (n) RETURN id(n) pattern together with a more specific second query, the PageRank call returns all nodes/nodeIds.