Resources for understanding "Graph Data Science" execution modes

admin3 · May 22, 2020, 9:13am

Hi,
I am very new to GDS and don't have a database (yet) to test things on.

I am having difficulty understanding the execution modes stream, write, mutate and stats. When should I use one rather than another? Are there any use cases, with the reasons a specific mode was chosen? I understand the concept, but an example would really help.

Also, when should I use stored graphs and when anonymous graphs? Again, are there any example use cases and reasons for choosing one projection over the other?

I know it's a long discussion, that's why I am asking for resources.

Thanks.

mark.needham · May 22, 2020, 3:15pm

I am having difficulties in understanding the execution modes stream, write, mutate and stats.

stream - this one returns the results to the client. So you'd use this if you wanted to see the results and not use them again later on, or if you want to do some processing on the results yourself.
write - this one stores the results back to the database. So you'd use this if you wanted to query the results afterwards. I find that I use the stream version when I'm playing around and the write version when I'm happy that it does what I want.
stats - this one returns summary statistics about the run (for example a histogram of the computed values) without storing the results anywhere.
mutate - this is a more advanced mode that stores the results in the in-memory projected graph rather than in the database, so that other algorithms run on that projection can use them.
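
As a rough sketch of how the four modes differ at the call site, the Python helper below only builds the Cypher text for each mode. It assumes the GDS 1.x procedure names for PageRank; the graph name 'follows' and the property name 'pagerank' are hypothetical, and nothing here talks to a database.

```python
GRAPH = "follows"  # hypothetical name of an in-memory projected graph


def pagerank_call(mode: str, graph: str = GRAPH, prop: str = "pagerank") -> str:
    """Build the Cypher CALL for gds.pageRank in the given execution mode."""
    if mode == "stream":
        # Results are returned to the client; nothing is stored.
        return (
            f"CALL gds.pageRank.stream('{graph}', {{maxIterations: 20}}) "
            "YIELD nodeId, score "
            "RETURN nodeId, score ORDER BY score DESC LIMIT 10"
        )
    if mode == "write":
        # Scores are stored as a node property in the Neo4j database.
        return f"CALL gds.pageRank.write('{graph}', {{writeProperty: '{prop}'}})"
    if mode == "mutate":
        # Scores are stored on the in-memory projection only.
        return f"CALL gds.pageRank.mutate('{graph}', {{mutateProperty: '{prop}'}})"
    if mode == "stats":
        # Only summary information about the run comes back.
        return f"CALL gds.pageRank.stats('{graph}', {{maxIterations: 20}})"
    raise ValueError(f"unknown execution mode: {mode}")


for mode in ("stream", "write", "mutate", "stats"):
    print(pagerank_call(mode))
```

The only differences between the calls are the procedure suffix and the property setting: write needs a writeProperty, mutate needs a mutateProperty, and stream and stats need neither.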

You can read more about the modes here: Running algorithms - Neo4j Graph Data Science

Also, when to use stored graphs and when to use anonymous graphs. Again, are there any example use cases and reasons for choosing one projection over the other?

Both of these approaches load an in-memory graph.

When you're getting started I would use the anonymous graphs. The anonymous graph approach loads the Neo4j graph into memory for each algorithm that you run and discards the projection once that algorithm finishes.

The stored (named) graph technique is a more advanced feature. I would use that when you want to run multiple algorithms over the same graph, because the projection is loaded once and then reused.
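
To make the difference concrete, here is a small sketch that lists the Cypher statements each approach would issue. It assumes GDS 1.x syntax (gds.graph.create and inline anonymous projections; newer releases changed both, so check the docs for your version). The label 'User' and relationship type 'FOLLOWS' are hypothetical, and the code only builds strings.

```python
from typing import List

NODE_LABEL = "User"    # hypothetical node label
REL_TYPE = "FOLLOWS"   # hypothetical relationship type


def anonymous_run(algo: str) -> List[str]:
    """One statement: the projection is described inline and loaded for this call only."""
    projection = (
        f"{{nodeProjection: '{NODE_LABEL}', relationshipProjection: '{REL_TYPE}'}}"
    )
    return [f"CALL gds.{algo}.stream({projection})"]


def named_runs(graph: str, algos: List[str]) -> List[str]:
    """Load the projection once, run every algorithm against it, then drop it."""
    steps = [f"CALL gds.graph.create('{graph}', '{NODE_LABEL}', '{REL_TYPE}')"]
    steps += [f"CALL gds.{algo}.stream('{graph}', {{}})" for algo in algos]
    steps.append(f"CALL gds.graph.drop('{graph}')")
    return steps


print(anonymous_run("pageRank"))
for step in named_runs("follows", ["pageRank", "louvain"]):
    print(step)
```

With the anonymous form, running two algorithms means paying for two loads; with the named form the load happens once, however many algorithms follow.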

admin3 · May 22, 2020, 6:13pm

Thanks for the extensive reply.

I guess what worries me is how fast the results come back, and I won’t know until I try.
I am going to build a recommendation page. Think of it as exactly like Instagram, where going to “browse” shows you personalised recommendations for other accounts you may want to follow.

So I’m going to use the personalised PageRank algorithm (using sourceNodes) to provide recommendations to my users. For starters I’m thinking of stream mode on an anonymous graph. What worries me is what happens when you have millions of users, each following 50+ accounts. Won’t the algorithm become slow, especially as you grow?
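
For readers unfamiliar with the idea, here is a toy pure-Python sketch of personalised PageRank on a tiny follow graph. It is not the GDS implementation (the normalisation and dangling-node handling are simplifying assumptions) and the account names are made up; it only illustrates that the walk keeps restarting at the source account, so scores reflect closeness to that one user.

```python
def personalized_pagerank(follows, source, damping=0.85, iterations=20):
    """follows maps each account to the list of accounts it follows."""
    nodes = set(follows) | {t for targets in follows.values() for t in targets}
    score = {n: 0.0 for n in nodes}
    score[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            targets = follows.get(n, [])
            if targets:
                # Spread this account's damped score evenly over who it follows.
                share = damping * score[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Accounts that follow nobody hand their score back to the source.
                nxt[source] += damping * score[n]
        # The restart always jumps back to the source, which is what
        # makes the ranking "personalised".
        nxt[source] += (1 - damping) * sum(score.values())
        score = nxt
    return score


def recommend(follows, source):
    """Rank accounts the source does not already follow, best first."""
    scores = personalized_pagerank(follows, source)
    known = set(follows.get(source, [])) | {source}
    candidates = [n for n in scores if n not in known]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)


follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}
print(recommend(follows, "alice"))
```

Each iteration touches every relationship once, so the cost of a single run grows with the size of the projected graph, which is why the speed question is a fair one to test.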

mark.needham · May 22, 2020, 8:16pm

For this use case I think you'd want to use a named graph. With a named graph, the in-memory projected graph is loaded once, up front.

So in the diagram above, steps 1 & 2 are done once at the beginning and won't need to be re-run every time that you run the PageRank algorithm.

I haven't tried running the PageRank algorithm in a more 'real-time' scenario, so I'd suggest testing it out on a sample dataset to make sure that you're going to get the performance that you need.
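
One way to run that kind of test is sketched below: a per-user query against a named graph that is assumed to have been created already, plus a tiny timing loop. The graph name 'follows', the 'User' label and the 'id' property are hypothetical, and run_query stands in for whatever function sends Cypher to your database (for example a thin wrapper around a driver session); the stub here does nothing, so the numbers it prints are meaningless until you plug in a real one.

```python
import time
from statistics import median

# Personalised PageRank for one user, streamed from a named graph that
# is assumed to exist already (GDS 1.x style call).
RECOMMEND_QUERY = """
MATCH (u:User {id: $userId})
CALL gds.pageRank.stream('follows', {sourceNodes: [u], maxIterations: 20})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS account, score
ORDER BY score DESC LIMIT 10
"""


def time_recommendations(run_query, user_ids):
    """Run the query once per user and return the median latency in seconds."""
    latencies = []
    for user_id in user_ids:
        start = time.perf_counter()
        run_query(RECOMMEND_QUERY, {"userId": user_id})
        latencies.append(time.perf_counter() - start)
    return median(latencies)


def stub_run_query(query, params):
    """Placeholder: replace with a function that executes the query."""
    return []


print(time_recommendations(stub_run_query, ["u1", "u2", "u3"]))
```

Timing a sample of user ids against a dataset of realistic size should show whether streaming per request is fast enough, or whether scores need to be computed ahead of time.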
