Neo4j PageRank algorithm is not working

apoc

(Babu Ganesh0708) #1

Hi All,

I am trying to use the PageRank algorithm to get a rough estimate of how important the software data is:

MATCH (s:Software)
WITH collect(s) AS n
CALL apoc.algo.pageRank(n) YIELD node, score
RETURN node.DisplayName, score ORDER BY score DESC LIMIT 10

The score value is "0.0" for the top 10 records, and I am not sure why the rank is not being calculated from the data. I am also getting a warning saying that the query used a deprecated procedure ('apoc.algo.pageRank') that is no longer supported.

Please correct me if I am doing anything wrong in the query, and kindly share the right documentation for using the PageRank algorithm.

Is this the right documentation to follow for graph algorithms?

https://neo4j.com/docs/graph-algorithms/current/algorithms/page-rank/

Note: I am using Neo4j 3.4.1 Community Edition with the APOC jar (3.4.0.3).

Thanks,
Ganeshbabu R


(Mark Needham) #2

The documentation you're referring to is for the Graph Algorithms library, so you want to use the procedures from that plugin. They all start with algo. rather than apoc.

In this case it'd be:

CALL algo.pageRank.stream("Software", null) YIELD nodeId, score
RETURN algo.getNodeById(nodeId), score
ORDER BY score DESC
LIMIT 10

(Babu Ganesh0708) #3

I tried executing the command below:

CALL algo.pageRank.stream('Software','installed', {iterations:20, dampingFactor:0.85, concurrency:4}) YIELD node, score
RETURN algo.getNodeById(node), score
ORDER BY score DESC
LIMIT 10

but I am getting an error.

Do I need to use the latest jar, or do I need to make any changes in the Neo4j configuration file?

Please let me know your thoughts.

Regards,
Ganeshbabu R


(Michael Hunger) #4

You have to install the graph algorithms library, e.g. in Neo4j Desktop.


(Babu Ganesh0708) #5

Thanks, I got this link and followed the steps to install the graph algorithms library outside of Neo4j Desktop, and now it works fine.

Regards,
Ganeshbabu R


(Babu Ganesh0708) #6

@michael.hunger/@mark.needham

I am just starting to learn Neo4j and am trying to build a graph from my asset data. The data has both software and hardware information; below is a sample JSON record:

{
	"DisplayName": "Google Chrome",
	"DisplayVersion": " 67.0.3396.99",
	"Publisher": " Google Inc.",
	"InstallDate": "20160629",
	"EstimatedSize": "",
	"HostName": "LTP-1001",
	"Manufacturer": "Dell Inc.",
	"Model": "Vostro 3458",
	"CPU": "Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz",
	"RAM": "3 GB",
	"IPAddress": "10.101.52.199 fe80::1040:f590:6d5:4346",
	"HDDCapacity": "465.66",
	"HDDSpace": "87.44 %",
	"OperatingSystem": "Microsoft Windows 7 Professional ",
	"ServicePack": "0",
	"LastReboot": "20180801130720.109999+330"
}

Note: similarly, the data has different software mapped to different hardware.

Below is the Cypher query I used to create the nodes:

CALL apoc.load.json("file:/home/test/AssetInfo.json") YIELD value AS data
WITH data WHERE data.DisplayName <> 'null' AND data.DisplayName <> ''
MERGE (s:Software {DisplayName: data.DisplayName})
MERGE (h:Hardware {HostName: data.HostName})
SET h.Manufacturer = data.Manufacturer
SET h.Model = data.Model
SET h.CPU = data.CPU
SET h.RAM = data.RAM
SET h.IPAddress = data.IPAddress
SET h.HDDCapacity = data.HDDCapacity
SET h.HDDSpace = data.HDDSpace
SET h.OperatingSystem = data.OperatingSystem
SET h.ServicePack = data.ServicePack
SET h.UserloggedIn = data.UserloggedIn
SET h.LastReboot = data.LastReboot
WITH s, h, data, COUNT(*) AS count
MERGE (s)-[i:installed]->(h)
ON CREATE SET i.DisplayVersion = data.DisplayVersion,
              i.Publisher = data.Publisher,
              i.InstallDate = data.InstallDate,
              i.EstimatedSize = data.EstimatedSize,
              i.count = count
RETURN s,h

Now I want to build a graph that shows the most important software connected to the hardware devices. I am using the PageRank graph algorithm for this, but I am unable to get the right PageRank score.

Below is the PageRank query I executed. The score is the same for every software name, and I do not understand how the score values are generated. Please correct me if I am doing anything wrong in the setup.

CALL algo.pageRank.stream('Software', 'installed', {iterations:20, dampingFactor:0.85})
YIELD nodeId, score
MATCH (node) WHERE id(node) = nodeId
RETURN node.DisplayName AS page,score
ORDER BY score DESC LIMIT 10

I am following this link to understand the PageRank algorithm:

https://neo4j.com/docs/graph-algorithms/3.4/algorithms/page-rank/

I also tried using a Cypher projection with PageRank. Below is the query:

CALL algo.pageRank.stream(
  'MATCH (s:Software)-[:installed]->(h:Hardware) RETURN id(s) AS source, id(h) AS target')
YIELD node, score
WITH node, score ORDER BY score DESC LIMIT 10
RETURN node.HostName, node.DisplayName, score

I am a little confused about how the PageRank score is calculated in this case. Please share your thoughts and help me resolve this.

Regards,
Ganeshbabu R


(Mark Needham) #7

Hi,

With this query:

CALL algo.pageRank.stream(
  'MATCH (s:Software)-[:installed]->(h:Hardware) RETURN id(s) AS source, id(h) AS target')
YIELD node, score
WITH node, score ORDER BY score DESC LIMIT 10
RETURN node.HostName, node.DisplayName, score

As you haven't passed the config parameter graph: "cypher", it actually calculates PageRank across all nodes and labels. If you want to do that more explicitly, you could do this:

CALL algo.pageRank.stream(null, null)
YIELD node, score
WITH node, score ORDER BY score DESC LIMIT 10
RETURN node.HostName, node.DisplayName, score

The reason you get scores of 0.15 when you pass in Software and installed is that there are no relationships between Software nodes, and therefore the algorithm returns the initial PageRank assigned to each node (which is 0.15).
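
To make the 0.15 figure concrete, here is a minimal Python sketch of a non-normalized PageRank iteration. It is a simplified illustration, not the Graph Algorithms library's implementation. The pagerank function, the "7-Zip" node, and the two-software, one-host graph are assumptions made up for the example; only the damping factor (0.85), the iteration count (20), and the installed direction (Software to Hardware) come from the thread.

```python
def pagerank(nodes, edges, damping=0.85, iterations=20):
    """Simplified, non-normalized PageRank (illustrative sketch only).

    Every node starts at (1 - damping) and, on each iteration, receives
    damping * (sum of score / out-degree over its incoming neighbours).
    """
    out_degree = {n: 0 for n in nodes}
    for source, _ in edges:
        out_degree[source] += 1

    scores = {n: 1.0 - damping for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for source, target in edges:
            incoming[target] += scores[source] / out_degree[source]
        scores = {n: (1.0 - damping) + damping * incoming[n] for n in nodes}
    return scores


# Hypothetical sample graph: two Software nodes installed on one Hardware node.
software = ["Google Chrome", "7-Zip"]
hardware = ["LTP-1001"]
installed = [("Google Chrome", "LTP-1001"), ("7-Zip", "LTP-1001")]

# Case 1: only Software nodes are loaded. Every installed relationship ends
# on a Hardware node, so no relationship survives in the projection.
software_only = pagerank(software, [])

# Case 2: all nodes and relationships are loaded.
whole_graph = pagerank(software + hardware, installed)

for name, score in {**software_only, **whole_graph}.items():
    print(name, round(score, 3))
```

In this sketch, every Software node stays at the starting value of 0.15 in both cases because nothing points at it. Only the Hardware node rises above 0.15 (to roughly 0.405 here) once the whole graph is loaded, since both installed relationships point to it.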