GDS PageRank issue

Please provide the following information if you ran into a more serious issue:

  • Version 2025.02, browser version not given, GDS version 12.14
  • Implementing the GDS PageRank algorithm
  • Dataset: yellowtrip dataset 2022.03 Parquet file

I ran into a problem when executing the PageRank algorithm. Previously I had loaded the data and created the graph incorrectly: I treated every pickup/dropoff pair as a single relationship, which resulted in 42 nodes and 702 relationships for the dataset.

Now, after an update, I am able to add multiple relationships between nodes, which gives the correct count of 1530 relationships. However, PageRank returns the same scores for both graphs. I noticed that the weight property is missing.

I am unable to add a weight property to the PageRank run.
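
For context, the two relationship counts above correspond to two ways of turning trips into relationships: one relationship per distinct (pickup, dropoff) pair, or one relationship per trip. A minimal sketch of the difference, using made-up rows (the sample IDs and helper names are hypothetical, not taken from the dataset or the original code):

```python
# Hypothetical sample trips as (PULocationID, DOLocationID) pairs.
trips = [(1, 2), (1, 2), (2, 3), (1, 2), (3, 1), (2, 3)]


def count_relationships(trips, one_per_trip):
    """Count the relationships a loader would create from (pickup, dropoff) pairs."""
    if one_per_trip:
        # One relationship per row: parallel relationships are kept.
        return len(trips)
    # One relationship per distinct pair: repeated trips collapse into one.
    return len(set(trips))


def count_nodes(trips):
    """Count distinct locations appearing as a pickup or a dropoff."""
    return len({location for pair in trips for location in pair})


print(count_nodes(trips))
print(count_relationships(trips, one_per_trip=False))
print(count_relationships(trips, one_per_trip=True))
```

Both variants produce the same set of nodes; only the number of relationships changes.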

My graph creation

#code

with self.driver.session() as session:
    for _, row in trips.iterrows():
        pickup_location = row['PULocationID']
        dropoff_location = row['DOLocationID']
        pickup_time = row['tpep_pickup_datetime']
        dropoff_time = row['tpep_dropoff_datetime']
        distance = row['trip_distance']
        fare = row['fare_amount']

        # Create Location nodes for Pickup and Dropoff locations if they don't exist
        session.run(
            "MERGE (p:Location {id: $pickup_location}) "
            "SET p.name = $pickup_location",
            pickup_location=pickup_location
        )

        session.run(
            "MERGE (d:Location {id: $dropoff_location}) "
            "SET d.name = $dropoff_location",
            dropoff_location=dropoff_location
        )

        # Create TRIP relationship between Pickup and Dropoff locations
        session.run(
            "MATCH (p:Location {id: $pickup_location}), (d:Location {id: $dropoff_location}) "
            "CREATE (p)-[:TRIP {pickup_dt: $pickup_time, dropoff_dt: $dropoff_time, "
            "distance: $distance, fare: $fare}]->(d)",
            pickup_location=pickup_location,
            dropoff_location=dropoff_location,
            pickup_time=pickup_time,
            dropoff_time=dropoff_time,
            distance=distance,
            fare=fare
        )

    # Count nodes
    result = session.run("MATCH (n) RETURN count(n) AS num_nodes")
    num_nodes = result.single()["num_nodes"]
    print(f"Total Nodes: {num_nodes}")

    # Count relationships
    result = session.run("MATCH ()-[r]->() RETURN count(r) AS num_relationships")
    num_relationships = result.single()["num_relationships"]

    result = session.run("MATCH (a)-[:TRIP]->(b) RETURN COUNT(*) AS num_edges")
    num_edges = result.single()["num_edges"]

My pagerank query

with self._driver.session() as session:
    # Create the in-memory graph for GDS
    session.run("CALL gds.graph.project('myGraph', 'Location', { TRIP:{ properties:'distance' } })")

    print("running queries now")
    result = session.run("CALL gds.graph.list() YIELD graphName, nodeCount, relationshipCount")
    print("called the query")
    for record in result:
        print(f"Graph: {record['graphName']}, Nodes: {record['nodeCount']}, Relationships: {record['relationshipCount']}")

    print("completed now moving on to the graph")

    # Run the PageRank algorithm
    query = f"""
    CALL gds.pageRank.stream('myGraph', {{
        maxIterations: {max_iterations},
        dampingFactor: 0.85,
        relationshipWeightProperty: 'distance'
    }})
    YIELD nodeId, score
    RETURN gds.util.asNode(nodeId).name AS location, score
    ORDER BY score DESC
    """
    result = session.run(query)

    rankings = result.data()

    if not rankings:
        return None, None  # Handle empty results

    max_rank = max(rankings, key=lambda x: x['score'])
    min_rank = min(rankings, key=lambda x: x['score'])
    print("lowest rank nodes", min_rank)
    print("highest rank nodes", max_rank)
    print("Rankings", rankings)

    return max_rank, min_rank

Q1. Do I need a weighted graph to run PageRank with weights? The reference scores I was given do not match what I get.
Q2. I used distance as the weight property, but this fails. I could not find a reference on how to add the weight property when creating the graph. I believe I have set the weight property option in the GDS PageRank call correctly.
Q3. Could data types affect the graph? Does the property need to be a specific data type, and do all values have to be the same type?

No matter what I do, and even though the relationships have changed, I can't seem to get the correct PageRank values. I know that I am creating a directed graph here.

Hi @alokesh1,

Would it be possible to share some more information?

As far as I can see, both the native projection and the PageRank query appear correct (the PageRank call has {{ and }} in the configuration where the normal format is { and }, but I do not know if that is something Python-specific).
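
On the doubled braces: in a Python f-string, `{{` and `}}` are escapes that render as single literal braces, so the Cypher text sent to the server contains the usual `{` and `}`. A small sketch of that rendering (the `build_pagerank_query` helper is hypothetical and only mirrors the f-string from the question; it does not contact a database):

```python
def build_pagerank_query(max_iterations):
    """Render a PageRank call the same way the f-string in the question does."""
    return f"""
    CALL gds.pageRank.stream('myGraph', {{
        maxIterations: {max_iterations},
        dampingFactor: 0.85
    }})
    YIELD nodeId, score
    RETURN nodeId, score
    """


query = build_pagerank_query(20)
# The printed text contains single braces and the substituted iteration count.
print(query)
```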

Could you please explain how "i used distance as weighted property but this fails" fails for Q2? Does it give invalid results, or does it throw some kind of error?

As for the rest of your questions:
Q1. If no weight property is specified, PageRank assumes a weight of 1.0 for all relationships. What do you mean by "scores do not match"?
Q3. The accepted data should just be numbers; as long as the input is not, for example, an array, this should not matter for PageRank.
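
To illustrate the Q1 answer: in a weighted PageRank, each relationship passes on a share of its source node's score in proportion to its weight, so giving every relationship the same weight produces the same ranking as an unweighted run. The following is a minimal textbook-style sketch in plain Python, not the GDS implementation (absolute scores from GDS may differ because of how it initializes and scales them); the toy edge lists are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Simplified weighted PageRank over (source, target, weight) tuples.

    Assumes every weight is positive. Nodes without outgoing
    relationships simply stop passing their score on.
    """
    nodes = sorted({node for s, t, _ in edges for node in (s, t)})
    total_out = {node: 0.0 for node in nodes}
    for source, _, weight in edges:
        total_out[source] += weight

    scores = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source, target, weight in edges:
            # Each relationship carries a weight-proportional share of the score.
            updated[target] += damping * scores[source] * weight / total_out[source]
        scores = updated
    return scores


# Hypothetical toy graph: A->B, A->C, B->C, C->A.
unit_edges = [("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0), ("C", "A", 1.0)]
heavy_edges = [("A", "B", 5.0), ("A", "C", 1.0), ("B", "C", 1.0), ("C", "A", 1.0)]

unit_scores = pagerank(unit_edges, iterations=50)
heavy_scores = pagerank(heavy_edges, iterations=50)
print(unit_scores)
print(heavy_scores)
```

In this toy graph, raising the weight of A->B shifts score from C towards B, which is the kind of change a weight property such as distance would introduce.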

If you give me the answers, I hope I can help you more. Also, please have a look at the documentation; it has some detailed examples of how to project a graph and run queries on it.

Best regards,
Ioannis.

New to neo4j.

It throws the following error
#code

Traceback (most recent call last):
  File "/cse511/interface.py", line 116, in <module>
    main()
  File "/cse511/interface.py", line 109, in main
    max_rank, min_rank = interface.pagerank(20, "distance")
  File "/cse511/interface.py", line 62, in pagerank
    result = session.run("CALL gds.graph.list() YIELD graphName, nodeCount, relationshipCount")
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/work/session.py", line 310, in run
    self._auto_result._buffer_all()
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/work/result.py", line 459, in _buffer_all
    self._buffer()
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/work/result.py", line 448, in _buffer
    for record in self:
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/work/result.py", line 398, in __iter__
    self._connection.fetch_message()
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_common.py", line 184, in inner
    func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_bolt.py", line 864, in fetch_message
    res = self._process_message(tag, fields)
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_bolt5.py", line 1208, in _process_message
    response.on_failure(summary_metadata or {})
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_common.py", line 254, in on_failure
    raise self._hydrate_error(metadata)
neo4j.exceptions.ClientError: {code: Neo.ClientError.Procedure.ProcedureCallFailed} {message: Failed to invoke procedure gds.pageRank.stream: Caused by: java.lang.IllegalArgumentException: Relationship weight property distance not found in relationship types ['TRIP']. Properties existing on all relationship types: }

    response.on_failure(summary_metadata or {})
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_common.py", line 254, in on_failure
    raise self._hydrate_error(metadata)
neo4j.exceptions.ClientError: {code: Neo.ClientError.Procedure.ProcedureCallFailed} {message: Failed to invoke procedure gds.graph.project: Caused by: java.lang.NoSuchMethodError: 'void org.neo4j.internal.kernel.api.Read.relationshipProperties(long, long, org.neo4j.storageengine.api.Reference, org.neo4j.storageengine.api.PropertySelection, org.neo4j.internal.kernel.api.PropertyCursor)'}
The command '/bin/sh -c neo4j start && python3 interface.py && neo4j stop' returned a non-zero code: 1

This is the error from the code shown above.

However, when I change the code so that the projected graph has no weight property:

session.run("CALL gds.graph.project('myGraph', 'Location','TRIP')")

and use the weight property only in the PageRank calculation:

CALL gds.pageRank.stream('myGraph', {{
    maxIterations: {max_iterations},
    dampingFactor: 0.85,
    relationshipWeightProperty: 'distance'
}})

Isn't the graph supposed to assume a weight of 1 for each relationship if none is specified? Instead, this fails with:

  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_bolt5.py", line 1208, in _process_message
    response.on_failure(summary_metadata or {})
  File "/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_common.py", line 254, in on_failure
    raise self._hydrate_error(metadata)
neo4j.exceptions.ClientError: {code: Neo.ClientError.Procedure.ProcedureCallFailed} {message: Failed to invoke procedure gds.pageRank.stream: Caused by: java.lang.IllegalArgumentException: Relationship weight property distance not found in relationship types ['TRIP']. Properties existing on all relationship types: }

I understand this is because no weight property is specified in the projection.

Q. How can the PageRank scores remain the same after the number of relationships has increased?

These are the errors that are thrown. Please let me know if anything else is needed.

I have added the distance property to the graph, yet it fails.

Hi @alokesh1,

Thanks for reporting back to me.

Let me answer your second question first.

For unweighted graphs,
relationshipWeightProperty: 'distance'
should not be part of the query. You can just run

CALL gds.pageRank.stream('myGraph', {{
    maxIterations: {max_iterations},
    dampingFactor: 0.85
}})

and PageRank will automatically assign a weight of 1.0 to the relationships.

For the first question, the error says that the property distance cannot be found.

Just to be clear, in case I've missed anything, could you try running the following Cypher projection query?

MATCH (source:Location)-[r:TRIP]->(target:Location)
RETURN gds.graph.project(
  'myGraph',
  source,
  target,
  { relationshipProperties: r { .distance } }
)

Then, you can call

CALL gds.graph.relationshipProperties.stream('myGraph',['distance']) YIELD propertyValue RETURN propertyValue LIMIT 1

This should return a numerical value if the relationship property has been projected correctly.
If so, your query will work normally afterwards.

Otherwise, it suggests to me that distance does not exist on TRIP in the first place, which you can verify by running

MATCH ()-[r:TRIP]->() RETURN r.distance

for example.
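
The same check can be scripted from Python. The sketch below assumes a `session` object with the neo4j driver's `run(...).single()` interface; the helper name `count_trips_missing_distance` is hypothetical. It relies on Cypher's `count(expression)` counting only non-null values.

```python
CHECK_QUERY = """
MATCH ()-[r:TRIP]->()
RETURN count(r) AS total, count(r.distance) AS with_distance
"""


def count_trips_missing_distance(session):
    """Return how many TRIP relationships have no distance property."""
    record = session.run(CHECK_QUERY).single()
    # count(r.distance) skips nulls, so the difference is the number missing.
    return record["total"] - record["with_distance"]
```

A result of 0 would mean every TRIP relationship carries a distance value; anything larger points at the loading step rather than at the projection.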

I do not understand the question below:

Q. How can the PageRank scores remain the same after the number of relationships has increased?

Good luck, and I'll be waiting for your answers to see how it goes.
Ioannis.

Thank you @ioannis_panagio, I was able to load the weighted graph now, and the PageRank scores match the expected values perfectly.

I do have another small issue, related to Neo4j security: I am unable to set a password for Neo4j after it is installed. I am running these commands via Docker along with some other files, but I cannot set an initial password for the Neo4j application.

Instead, I have resorted to disabling authentication, like so:

RUN sed -i 's/^#dbms.security.auth_enabled=.*/dbms.security.auth_enabled=false/' /etc/neo4j/neo4j.conf

I did try the following two commands:

RUN echo "NEO4J_AUTH=neo4j/mypassword" >> /etc/environment

RUN echo "NEO4J_AUTH=neo4j/mypassword"

Both fail to reset the default password.

The command below is how I install Neo4j; I am not allowed to change any of these commands.

# Run the data loader script
RUN chmod +x /cse511/data_loader.py && \
    neo4j start && \
    python3 data_loader.py && \
    neo4j stop

But I still fail to reset the password after installation, which also happens through Docker. The project requires that all tasks happen via Docker, i.e. installation, setup, loading data, etc.

If you have a suggestion on this, please let me know. Thank you anyway, awaiting your response :)

Hi again @alokesh1,

Happy to hear it has worked out!

For your new question, I'd suggest you open a new topic with a more related title so that someone more familiar can see it and answer.

Best,
Ioannis.
