GDS graph catalog function to get all loaded node IDs

I'm trying to iterate over all nodes in a named virtual graph (created by gds.graph.create), and I can't seem to find a function that streams/returns all nodeIds back without doing any additional computation.
For example:

YIELD nodeId, score

will return what I need, but will perform additional work (computing the score for each node).
I only need to get the nodeIds from the graph.
Am I missing something or do I need to fork gds and write my own algorithm for this?


Hello @naveh,

Maybe I misunderstood, but you don't need GDS to get node IDs. If you only want the internal Neo4j ID of each node, there is the id() function.


If you don't need to use GDS for something else, then @Cobra's suggestion of using id() will work for you.

If you do need a list of IDs from your analytics graph, you can use gds.graph.streamNodeProperties and specify nodeId.

Thanks for your replies!
Unfortunately I do need the graph for other GDS functions later on.
When I use gds.graph.streamNodeProperty or gds.graph.streamNodeProperties I get this error:

Failed to invoke procedure gds.graph.streamNodeProperty: Caused by: java.lang.IllegalArgumentException: No node projection with property key(s) ['nodeId'] found.

For testing purposes, the graph is created with all nodes and all relationships of a certain type, like this:
CALL gds.graph.create('myGraph', '*', 'RELATIONSHIP_TYPE', {})

I'm also looking for an answer to this. I think this rewording might be good for clarification:

How do we create a GDS graph (e.g., using a native projection) and then simply retrieve the nodes and relationships in that graph without applying a specific graph algorithm?

gds.graph.streamNodeProperty only works once you apply an algorithm to the named graph, but I need to instead retrieve the contents of the graph for use in an external graph machine learning framework.

gds.graph.export and gds.beta.graph.export.csv are unfortunately not suitable due to the client/server configuration of our Neo4j database.

If you're not running any GDS procedures or algorithms, do you need the in-memory graph at all? Or would it be simpler to use the Cypher id() function to retrieve the data?

Sorry, I totally missed this reply!

I'm not sure how the id() function would solve my use case. The database I'm working with contains on the order of 1M nodes and 2M relationships, and I need a way to rapidly stream native projections of that database (containing, for example, 700k nodes and 1.2M relationships) into an external machine learning model implemented in Python.
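Whichever query ends up producing the rows, a Python ML framework will usually want contiguous zero-based node indices rather than Neo4j's internal ids. Here is a minimal sketch of that remapping, assuming relationships arrive as (source, target) pairs of internal ids. The function name `reindex_edges` and the tiny sample data are invented for illustration.

```python
def reindex_edges(node_ids, edges):
    """Map arbitrary node ids to 0..n-1 and rewrite the edges accordingly.

    node_ids: iterable of internal ids (any hashable values)
    edges: iterable of (source_id, target_id) pairs
    Returns (index_of, edge_index), where edge_index is [sources, targets].
    """
    index_of = {}
    for node_id in node_ids:
        # setdefault keeps the first index if an id is repeated
        index_of.setdefault(node_id, len(index_of))

    sources, targets = [], []
    for src, dst in edges:
        # Skip relationships that leave the selected node set
        if src in index_of and dst in index_of:
            sources.append(index_of[src])
            targets.append(index_of[dst])
    return index_of, [sources, targets]


# Made-up sample: three nodes, one edge pointing outside the selection
index_of, edge_index = reindex_edges([17, 42, 99], [(17, 42), (99, 17), (42, 1000)])
print(edge_index)  # [[0, 2], [1, 0]]
```

The two-list layout (all sources, then all targets) is one common shape for edge lists in graph ML libraries. Adapt it to whatever your framework expects.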

The only way I can see the id() function helping is if I were to tediously build Cypher queries that identified a spanning tree of each of the nodes in my native projection, but that seems both computationally infeasible and inflexible.

To make it a little more concrete, the database I'm working with has nodes corresponding to different types of biological entities (chemicals, genes, metabolic pathways, diseases, etc.), and I'm trying to extract the subgraph containing 3 specific node types ("Chemical", "Gene", and "Assay") and all of the relationships linking those nodes.
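For a label-restricted subgraph like this, plain Cypher may already be enough, with no spanning-tree style traversal: one query filters nodes by label, and a second keeps only relationships whose endpoints both carry one of those labels. Below is a minimal Python sketch that only builds the two query strings. The helper name `build_subgraph_queries` and the returned column names (`nodeId`, `source`, `target`, `relType`) are made up for illustration, and actually running the queries (for example through a Neo4j Python driver session) is left out. It relies on the id() function mentioned earlier in the thread, and I have not measured how it performs at the 700k-node scale.

```python
import re

# Labels are interpolated into the query text, so restrict them to plain
# identifiers to avoid malformed or injected Cypher.
_LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _label_predicate(var, labels):
    """Return e.g. '(n:Chemical OR n:Gene)' for variable 'n'."""
    return "(" + " OR ".join(f"{var}:{label}" for label in labels) + ")"


def build_subgraph_queries(labels):
    """Build (node_query, relationship_query) for the induced subgraph.

    Hypothetical helper, not part of GDS or the Neo4j driver.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("at least one label is required")
    for label in labels:
        if not _LABEL_RE.match(label):
            raise ValueError(f"unsupported label name: {label!r}")

    node_query = (
        f"MATCH (n) WHERE {_label_predicate('n', labels)} "
        "RETURN id(n) AS nodeId, labels(n) AS labels"
    )
    rel_query = (
        "MATCH (a)-[r]->(b) "
        f"WHERE {_label_predicate('a', labels)} AND {_label_predicate('b', labels)} "
        "RETURN id(a) AS source, id(b) AS target, type(r) AS relType"
    )
    return node_query, rel_query


node_query, rel_query = build_subgraph_queries(["Chemical", "Gene", "Assay"])
print(node_query)
print(rel_query)
```

Because both endpoints are filtered, the second query returns exactly the relationships linking the selected node types, which is the same subgraph a native projection over those three labels would contain.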