Resources for understanding "Graph Data Science" execution modes

Hi,
I am very new to GDS and don't have a database (yet) to test things on.

I am having difficulty understanding the execution modes stream, write, mutate and stats. When should I use one rather than another? Are there any use cases, with the reasons a specific mode was chosen? I understand the concept but an example would really help.

Also, when to use stored graphs and when to use anonymous graphs. Again, are there any example use cases and reasons for choosing one projection over the other?

I know it's a long discussion, that's why I am asking for resources.

Thanks.

I am having difficulty understanding the execution modes stream, write, mutate and stats.

  • stream - this one returns the results to the client. So you'd use this if you wanted to see the results and not use them again later on, or if you want to do some processing on the results yourself.
  • write - this one stores the results back to the database. So you'd use this if you wanted to query the results afterwards. I find that I use the stream version when I'm playing around and the write version when I'm happy that it does what I want.
  • stats - this one returns a summary of the algorithm's result (for example counts or a distribution of values) without returning per-node results or storing anything. So you'd use this to check what a run would produce before committing to write or mutate.
  • mutate - this is a more advanced mode that stores the results on the in-memory projected graph rather than in the database, so that a later algorithm run on the same projection can use them.

You can read more about the modes here: https://neo4j.com/docs/graph-data-science/current/common-usage/running-algos/#running-algos-stats
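
Since you don't have a database yet, here is a toy model in plain Python of where the results end up in each mode. It is only a sketch and not the GDS API: `database`, `projection`, `out_degree` and `run` are made-up stand-ins, and the "algorithm" is just an out-degree count.

```python
# Toy model of the four execution modes. NOT the GDS API: every name here is
# a hypothetical stand-in used only to show where the results end up.

database = {"alice": {}, "bob": {}, "carol": {}}  # stored node properties
projection = {
    "edges": {"alice": ["bob", "carol"], "bob": ["carol"], "carol": []},
    "properties": {},  # lives in memory only
}


def out_degree(graph):
    """Stand-in algorithm: number of outgoing relationships per node."""
    return {node: len(targets) for node, targets in graph["edges"].items()}


def run(graph, db, mode):
    scores = out_degree(graph)
    if mode == "stream":
        # Hand the per-node results to the caller; nothing is stored.
        return sorted(scores.items())
    if mode == "stats":
        # Return a summary only; no per-node rows, nothing is stored.
        values = list(scores.values())
        return {"nodeCount": len(values), "min": min(values), "max": max(values)}
    if mode == "mutate":
        # Store the results on the in-memory projection, not in the database.
        graph["properties"]["degree"] = scores
        return {"nodePropertiesWritten": len(scores)}
    if mode == "write":
        # Persist the results to the database so they can be queried later.
        for node, score in scores.items():
            db[node]["degree"] = score
        return {"nodePropertiesWritten": len(scores)}
    raise ValueError(f"unknown mode: {mode}")


print(run(projection, database, "stream"))
print(run(projection, database, "stats"))
run(projection, database, "mutate")
print(database["alice"])  # still empty: mutate did not touch the database
run(projection, database, "write")
print(database["alice"])  # now carries the stored degree
```

The thing to notice is that stream and stats leave both the projection and the database untouched, mutate changes only the projection, and write changes the database.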

Also, when to use stored graphs and when to use anonymous graphs. Again, are there any example use cases and reasons for choosing one projection over the other?

Both of these approaches load an in-memory graph.

When you're getting started I would use the anonymous graphs. The anonymous graph approach loads the Neo4j graph into memory for each algorithm that you run.

The stored graph technique is a more advanced feature. I would use that when you want to run multiple algorithms over the same graph, because the graph is loaded into memory once and then reused.

Thanks for the extensive reply.

I guess what bothers me is the speed of the results so I won’t know till I try.
I am going to build a recommendation page. Think of it like Instagram, where going to “browse” shows you personalised recommendations for other accounts you may want to follow.

So I’m going to use the personalised PageRank algorithm (using sourceNodes) to provide recommendations to my users. For starters I’m thinking of stream mode on an anonymous graph. What bothers me is what happens when you have millions of users, each following 50+ accounts. Won’t the algorithm become slow, especially as you grow?

For this use case I think you'd want to have a named (stored) graph then. With a named graph, the in-memory projected graph is loaded once, up front.

So in the diagram above, steps 1 & 2 are done once at the beginning and won't need to be re-run every time that you run the PageRank algorithm.

I haven't tried running the PageRank algorithm in a more real-time scenario, so I'd suggest testing it out on a sample dataset to make sure that you're going to get the performance that you need.
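
If you want to get a feel for the recommendation logic before you have a database, here is a small plain-Python sketch of personalised PageRank by power iteration. It is a textbook version under my own assumptions (dangling nodes send their score back to the source nodes, scores sum to 1), so GDS's implementation and score scaling may well differ. The `follows` data and the `recommend` helper are made up for illustration.

```python
# Textbook personalised PageRank on a tiny made-up "follows" graph.
# This is a sketch, not the GDS implementation; GDS may scale scores differently.

follows = {
    "ann": ["bob"],
    "bob": ["cat", "dan"],
    "cat": ["dan"],
    "dan": [],
    "eve": ["ann"],
}


def personalized_pagerank(graph, source_nodes, damping=0.85, iterations=50):
    sources = set(source_nodes)
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    # The random walk restarts at the source nodes only.
    restart = {n: (1.0 / len(sources) if n in sources else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's score evenly over the accounts it follows.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Assumption: a node that follows nobody hands its score
                # back to the source nodes.
                for source in sources:
                    nxt[source] += damping * rank[node] / len(sources)
        rank = nxt
    return rank


def recommend(graph, user, limit=3):
    """Rank accounts the user does not follow yet, highest score first."""
    scores = personalized_pagerank(graph, [user])
    known = set(graph.get(user, [])) | {user}
    candidates = [(n, s) for n, s in scores.items() if n not in known and s > 0]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in candidates[:limit]]


print(recommend(follows, "ann"))
```

For "ann" this ranks "dan" ahead of "cat", because "dan" is reached both through "bob" and through "cat". It says nothing about speed at millions of users, which is the part you'd still need to measure on a realistic sample.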