Pearson Similarity

Hi everyone,

I am new to Neo4j and I need some assistance.

I am following a blog post - Diversifying your portfolio.

One of the queries calls the Pearson Similarity algorithm, and I have downloaded the GDS plugin.

My Neo4j Desktop version is 5.12.0 and the GDS library version is 2.6.0.

This is the error message I got:
'There is no procedure with the name gds.alpha.similarity.pearson.write registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.'

Can someone help me out on this?

You are calling the procedure with 'alpha' in its name. Try removing that, as the procedure may have been promoted.

You can look for the procedure in the browser. Try the following query:

SHOW PROCEDURES WHERE name CONTAINS 'pearson'

See if you find yours and what the correct path is.

Hi @fatehali786shaikh ,

This blog post is based on an old version of GDS and unfortunately does not reflect the current state of the library.

In particular, the procedure gds.alpha.similarity.pearson.write has been removed.
You can instead use gds.knn.write in its place, by specifying the similarity metric to be equal to Pearson.
This is achieved by setting the configuration parameter nodeProperties to {input: 'Pearson'}, where input is the name of your property.

You can read more about the way the algorithm works on the documentation page.

There is one more crucial thing to consider:
Since version 2.0.0 of the library, you must first project your graph from Neo4j into GDS before you can execute any algorithm. You can read more about this concept on the documentation page.

Here are the two queries that will allow you to compute the Pearson similarity between stocks:

(1) Graph projection phase:

CALL gds.graph.project('graph', ['Stock'],'*', {nodeProperties:'close_array'}) YIELD *

(2) Pearson calculation

CALL gds.knn.write('graph', {
    writeRelationshipType: 'SIMILAR',
    writeProperty: 'score',
    topK: 3,
    similarityCutoff:0.2,
    nodeProperties:{close_array:'Pearson'}
}) YIELD *
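To check what the write step produced, a query along these lines could be used. It is only a sketch: it assumes the SIMILAR relationship type and score property from the query above, and returns whole nodes because the Stock property names are not shown in this thread.

```cypher
// Inspect the ten highest-scoring SIMILAR relationships written by gds.knn.write.
// Assumes writeRelationshipType 'SIMILAR' and writeProperty 'score' as configured above.
MATCH (a:Stock)-[r:SIMILAR]->(b:Stock)
RETURN a, b, r.score AS score
ORDER BY score DESC
LIMIT 10;
```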

Depending on the needs of the rest of the tutorial, you might have to project different graphs, and you can use gds.graph.drop to drop existing ones that you do not need anymore.
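As a rough illustration of that housekeeping, the following lists the graphs currently in the catalog and then drops the one named 'graph' (the name used in the projection above; adjust it if yours differs).

```cypher
// List the graphs currently held in the GDS graph catalog.
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;

// Drop the projection named 'graph' once it is no longer needed.
// The second argument (failIfMissing = false) avoids an error if the graph does not exist.
CALL gds.graph.drop('graph', false)
YIELD graphName
RETURN graphName;
```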

Let us know if the above solves your problem or if you need any more help,

Best regards,
Ioannis.

Thank you, Ioannis, for your prompt and accurate solution.

I have another question: the blog then runs the Louvain Modularity algorithm. I believe this isn't present in the library either.

The blog shares the following code:

CALL gds.louvain.write({
  nodeProjection:'Stock',
  relationshipProjection:'SIMILAR',
  writeProperty:'louvain'
})

It would be a great help to know if there is an alternative to this algorithm?

Hi again @fatehali786shaikh,

The Louvain algorithm is still available (see the docs for more information). The reason the above query fails is again that GDS has dropped support for anonymous projections.

So, what you need to do is project another graph, with the SIMILAR relationship type:

CALL gds.graph.project(
  'graph2',
  'Stock',
  'SIMILAR'
) 
YIELD *

and then just run Louvain on the new graph:

CALL gds.louvain.write('graph2', {writeProperty: 'louvain'}) YIELD *

Note that you can also use mutate mode for the first KNN query. This adds the relationships to the projected graph (in this case graph) rather than to the Neo4j database, which then allows you to execute Louvain without performing another projection.
These concepts are all covered in the documentation pages I have linked in this topic, and they are generally useful to know when working with GDS, as they make the workflow much simpler.
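A minimal sketch of that mutate-based variant, assuming the projection 'graph' from step (1) still exists in the catalog. The relationshipTypes filter on the Louvain call is an addition for safety, not something the blog post uses.

```cypher
// Add SIMILAR relationships to the in-memory graph only (nothing is written to Neo4j).
CALL gds.knn.mutate('graph', {
    mutateRelationshipType: 'SIMILAR',
    mutateProperty: 'score',
    topK: 3,
    similarityCutoff: 0.2,
    nodeProperties: {close_array: 'Pearson'}
})
YIELD relationshipsWritten
RETURN relationshipsWritten;

// Run Louvain on the same projection, restricted to the new SIMILAR relationships,
// and write the community id back to the database as the 'louvain' node property.
CALL gds.louvain.write('graph', {
    relationshipTypes: ['SIMILAR'],
    writeProperty: 'louvain'
})
YIELD communityCount, modularity
RETURN communityCount, modularity;
```

With this variant the SIMILAR relationships exist only in the projection, so any later step of the tutorial that matches them in the database would still need the write mode shown earlier.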

I hope that this gives you a way forward!

Best regards,
Ioannis.

P.S. Your Louvain results might be slightly different from what is shown in the post, as we have modified the algorithm's behavior since then (I am personally finding five communities instead of four).

Thanks for your help again, Ioannis.

I have a quick question as I got stuck here.

I am following this GitHub link - blogs/stock_diversity/Stock diversification analysis.ipynb at master · tomasonjo/blogs · GitHub

The following query doesn't load or show any results when I run it in Neo4j:

MATCH (s:Stock)-[:TRADING_DAY]->(day)
WHERE NOT exists { ()-[:NEXT_DAY]->(day) }
MATCH p=(day)-[:NEXT_DAY*0..]->(next_day)
SET next_day.index = length(p)

Apologies for asking multiple questions.
Really appreciate your support! :)

Hi again @fatehali786shaikh ,

What output were you expecting from this code segment?
When I execute it from my Neo4j Browser, I get the following output:
Set 9180 properties, completed after 2123 ms.

This is a SET operation, which assigns node properties, so it shouldn't return anything more besides that. As you can see in that notebook, there is no output following the execution of the query.
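If you want to confirm that the properties were actually set, a quick check like the one below could help. It is a sketch that assumes only the day nodes carry an index property.

```cypher
// Count the nodes that received an 'index' property and show the range of values.
MATCH (d)
WHERE d.index IS NOT NULL
RETURN count(d) AS nodesWithIndex,
       min(d.index) AS minIndex,
       max(d.index) AS maxIndex;
```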

Let me know if that clears your confusion.

Best regards,
Ioannis.

Hi @ioannis_panagio

I am unable to get that output as well.

The query is just running endlessly. (below image for reference)

Update: I have got the below error.

Appreciate your continuous support!

Hi again @fatehali786shaikh ,

I am not sure about what could be causing this, probably something is not set up properly.

Does every query (e.g., even a normal MATCH (n) RETURN (n)) fail?

I will try to forward this to someone who is more familiar with these and we'll get back to you.

Best,
Ioannis.

The feedback I got is that you are potentially pushing your machine too hard, and that you could try breaking the query down into smaller chunks.
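One possible way to chunk it, sketched here under the assumption that you are on Neo4j 5 (which supports CALL { ... } IN TRANSACTIONS), is to process a few stocks per transaction. The batch size of 5 is arbitrary.

```cypher
// In Neo4j Browser, prefix this query with :auto so that it runs in an implicit transaction.
// Each batch handles 5 Stock nodes and commits before the next batch starts.
MATCH (s:Stock)
CALL {
    WITH s
    MATCH (s)-[:TRADING_DAY]->(day)
    WHERE NOT EXISTS { ()-[:NEXT_DAY]->(day) }
    MATCH p = (day)-[:NEXT_DAY*0..]->(next_day)
    SET next_day.index = length(p)
} IN TRANSACTIONS OF 5 ROWS;
```

This does the same work as the original query, but memory is released after every batch instead of being held until the whole statement finishes.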

I would also suggest perhaps dropping the database and starting the tutorial from scratch to see if the problem happens again.

Best,
Ioannis.