Trying to write a Cypher query for node similarity to design a movie recommendation system

node_labels
1 ["Movie"]
2 ["ProductionCompany"]
3 ["Genre"]
4 ["SpokenLanguage"]
5 ["ProductionCountry"]
6 ["Person"]
7 ["User"]

relationship_type
1 "RATING"
2 "ACTED_IN"
3 "PRODUCED_IN"
4 "PRODUCED_BY"
5 "CREWED_IN"
6 "HAS_GENRE"
7 "HAS_SPOKEN_LANGUAGE"
8 "SIMILAR"

I have the above nodes and relationships and have the required gds projection as well.

CALL gds.graph.project('movies', 
  ['User', 'Movie', 'ProductionCompany', 'Genre', 'SpokenLanguage', 'ProductionCountry', 'Person'], 
  {
    RATING: {orientation: 'UNDIRECTED'},
    HAS_GENRE: {orientation: 'UNDIRECTED'},
    HAS_SPOKEN_LANGUAGE: {orientation: 'UNDIRECTED'},
    ACTED_IN: {orientation: 'UNDIRECTED'}
  }
);

I am trying to use Node Similarity to come up with movie recommendations, either within a Genre or across movies that share more neighbors. I am a little lost when applying gds.alpha.nodeSimilarity.filtered.stream and the related APIs. I would love some clarity on how to frame my Cypher query, or pointers to where I might be going wrong.

Hey @abhishekshankar88,
for movie recommendations, you want to find (:Movie)-[:SIMILAR]-(:Movie).

With node similarity, you will find movies with the most common neighbor nodes.
I would try CALL gds.nodeSimilarity.filtered.stream('movies', {sourceNodeFilter:'Movie' , targetNodeFilter:'Movie' } ).
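
As a minimal sketch of how that call could be completed into a full query: the version below assumes the 'movies' projection from the question exists and that Movie nodes carry a `title` property (as used later in this thread). The `topK` value is only an illustrative choice, and depending on the GDS version the procedure may still live under the `gds.alpha` namespace.

```cypher
// Stream Movie-to-Movie similarities from the in-memory graph 'movies'.
// Assumption: Movie nodes have a `title` property.
CALL gds.nodeSimilarity.filtered.stream('movies', {
  sourceNodeFilter: 'Movie',   // only compare from Movie nodes
  targetNodeFilter: 'Movie',   // only compare to Movie nodes
  topK: 5                      // illustrative: keep the 5 best matches per movie
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).title AS movie,
       gds.util.asNode(node2).title AS similarMovie,
       similarity
ORDER BY movie, similarity DESC;
```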

An alternative approach would be to first create node embeddings, for example with FastRP (Fast Random Projection).
These can be combined with filtered KNN to find similar Movies. I would assume this gets you better recommendations, since the embeddings take more than the immediate neighbors into account.
You can find this workflow at End-to-end workflow - Neo4j Graph Data Science.
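
Roughly, that two-step workflow might look like the sketch below. The property name `embedding`, the embedding dimension, and `topK` are placeholder choices rather than values from the thread, and on older GDS releases the filtered KNN procedure may be under `gds.alpha.knn.filtered`.

```cypher
// Step 1: compute FastRP embeddings and keep them in the in-memory graph.
// `embedding` is a placeholder property name.
CALL gds.fastRP.mutate('movies', {
  embeddingDimension: 128,
  randomSeed: 42,
  mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten;

// Step 2: filtered KNN on the embeddings, restricted to Movie nodes.
CALL gds.knn.filtered.stream('movies', {
  nodeProperties: ['embedding'],
  sourceNodeFilter: 'Movie',
  targetNodeFilter: 'Movie',
  topK: 5
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).title AS movie,
       gds.util.asNode(node2).title AS similarMovie,
       similarity
ORDER BY movie, similarity DESC;
```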

Hope this gives a better starting point :)

Hello @florentin_dorre, thanks for the input. I tried FastRP and got similar movies, but I also wanted to know whether it is possible to add multiple node filters. For example, what if I want to find the most similar movies inside a particular genre? I tried passing a list to the NodeFilter options but was met with errors. Any advice on this? I have added my code below.

CALL gds.graph.project('movies', 
              ['Movie','User','Genre','SpokenLanguage'], 
              {
                RATING:{orientation: 'UNDIRECTED',properties: 'rating'},
                HAS_GENRE:{orientation: 'UNDIRECTED'},
                HAS_SPOKEN_LANGUAGE:{orientation:'UNDIRECTED'}
              }
            );

CALL gds.fastRP.mutate(
  'movies',
  {
    embeddingDimension: 100,
    randomSeed: 42,
    // mutateProperty takes a single property name, not a list
    mutateProperty: 'similarities',
    iterationWeights: [1, 1, 1, 1]
  }
)
YIELD nodePropertiesWritten;

MATCH (m1:Movie)-[:HAS_GENRE]->(:Genre {name: "Fantasy"})<-[:HAS_GENRE]-(m2:Movie)-[:SIMILAR]-(m3:Movie)
WHERE m1 <> m3 AND m1 <> m2 AND m2 <> m3 // Ensure distinct movies
RETURN DISTINCT m3.title AS SimilarMovie

This was one approach I was trying. Please let me know where I might be going wrong.

In your code example, you must have omitted the call to gds.knn.write?
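
For reference, the missing step could look something like this sketch. It assumes the FastRP embedding was stored under the property `similarities` as in the code above; `score` and the `topK` value are hypothetical choices, and `SIMILAR` is taken from the relationship types listed in the question.

```cypher
// Run KNN on the embedding property and write the result back to the database
// as SIMILAR relationships between Movie nodes.
CALL gds.knn.write('movies', {
  nodeLabels: ['Movie'],             // only compare movies with each other
  nodeProperties: ['similarities'],  // embedding written by gds.fastRP.mutate
  topK: 10,                          // illustrative value
  writeRelationshipType: 'SIMILAR',
  writeProperty: 'score'             // hypothetical property name
})
YIELD nodesCompared, relationshipsWritten;
```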

I don't understand why m1, m2 and m3 should be distinct.
Reading m2 as the query movie and m3 as the recommendation, shouldn't you make sure that both m2 and m3 have the requested genre?
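
One possible way to express that, as a sketch only: `$title` is a hypothetical query parameter for the movie you start from, and the query assumes SIMILAR relationships have already been written between Movie nodes.

```cypher
// m2 is the query movie, m3 the recommendation; both must be linked
// to the same requested genre node g.
MATCH (m2:Movie {title: $title})-[:HAS_GENRE]->(g:Genre {name: 'Fantasy'}),
      (m2)-[:SIMILAR]-(m3:Movie)-[:HAS_GENRE]->(g)
RETURN DISTINCT m3.title AS SimilarMovie;
```

As far as I know, the sourceNodeFilter and targetNodeFilter options of the filtered procedures accept a single label, a single node ID, or a list of node IDs, so restricting by genre at KNN time would mean collecting the IDs of that genre's movies first rather than passing a list of labels.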

You could also dictionary encode the genres and use them as feature properties for the fastRP embedding.

Hey @florentin_dorre, thanks for your reply. I was able to experiment with gds.knn.write and create a similarity relationship among the movies. Yes, I can see that I made an error in the query, and I have corrected it (I was only experimenting with that query, so it was not of much importance). Thanks for your help.
