Hey,
Yeah, that makes sense. Sorry for the delayed reply - I've had this tab open while thinking about the best way to do this. I think something like this would work:
MATCH (user:User)-[:HAS_TAG]->(tag)
WITH {item:id(user), categories: collect(id(tag))} as data
WITH collect(data) AS userTags
MATCH (post:Post)-[:HAS_TAG]->(tag)
WITH userTags, {item:id(post), categories: collect(id(tag))} as data
WITH userTags, collect(data) AS postTags
WITH userTags, postTags,
[userTag in userTags | userTag.item] AS sourceIds,
[postTag in postTags | postTag.item] AS targetIds
CALL algo.similarity.jaccard.stream(userTags + postTags, {topK: 1, similarityCutoff: 0.0, sourceIds: sourceIds, targetIds: targetIds})
YIELD item1, item2, count1, count2, intersection, similarity
RETURN algo.getNodeById(item1).name AS from, algo.getNodeById(item2).name AS to, similarity
ORDER BY from
And then, as you suggested, if you don't specify sourceIds or targetIds, it'll assume you want to use all ids.
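To make the intended semantics concrete, here's a rough plain-Python sketch of what I have in mind. It isn't the library's implementation, and the names (top_k_similar, source_ids, target_ids, cutoff) are made up for illustration. It assumes the same list of {item, categories} maps that the query above collects, and that leaving out either id list means "use every item".

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of category ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_similar(data, source_ids=None, target_ids=None, top_k=1, cutoff=0.0):
    """For each source item, return its top_k most similar target items.

    data is a list of {"item": id, "categories": [ids]} maps.
    Omitting source_ids or target_ids is treated as "all items" (assumption).
    """
    categories = {row["item"]: row["categories"] for row in data}
    sources = list(categories) if source_ids is None else source_ids
    targets = list(categories) if target_ids is None else target_ids

    results = []
    for s in sources:
        # Score every target except the source itself.
        scored = [
            (s, t, jaccard(categories[s], categories[t]))
            for t in targets
            if t != s
        ]
        scored = [row for row in scored if row[2] >= cutoff]
        scored.sort(key=lambda row: row[2], reverse=True)
        results.extend(scored[:top_k])
    return results


# Hypothetical ids: 1 and 2 stand in for users, 10 and 11 for posts.
data = [
    {"item": 1, "categories": [100, 101]},
    {"item": 2, "categories": [102]},
    {"item": 10, "categories": [100, 101, 102]},
    {"item": 11, "categories": [102]},
]
user_to_post = top_k_similar(data, source_ids=[1, 2], target_ids=[10, 11])
all_to_all = top_k_similar(data)
```

With the id lists supplied, users are only ever compared against posts; without them, every item is compared against every other item.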
For the weight-based similarity procedures we have support for Cypher statements, e.g. for Cosine similarity (see Similarity - Neo4j Graph Data Science):
WITH "MATCH (person:Person)-[likes:LIKES]->(c)
RETURN id(person) AS item, id(c) AS category, likes.score AS weight" AS query
CALL algo.similarity.cosine(query, {
graph: 'cypher', topK: 1, similarityCutoff: 0.1, write:true
})
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p95
RETURN nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, p95
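For anyone following along, this is roughly what happens with the (item, category, weight) rows such a query returns, sketched in plain Python. It's only an illustration: the helper names are hypothetical, and it assumes a category missing from one item counts as weight 0, which may not match how the procedure handles missing values.

```python
import math
from collections import defaultdict


def weight_vectors(rows):
    """Group (item, category, weight) rows into {item: {category: weight}}."""
    vectors = defaultdict(dict)
    for item, category, weight in rows:
        vectors[item][category] = weight
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two sparse vectors; a missing category counts as 0."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical rows in the shape: item, category, weight.
rows = [
    (1, 100, 3.0), (1, 101, 4.0),
    (2, 100, 3.0), (2, 101, 4.0),
    (3, 102, 5.0),
]
vectors = weight_vectors(rows)
same = cosine(vectors[1], vectors[2])      # identical weights on shared categories
disjoint = cosine(vectors[1], vectors[3])  # no categories in common
```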
In terms of keeping the functions reasonably uniform, I guess we could make a version that takes a single query plus source and target ids, like this:
WITH "MATCH (user:User)-[:HAS_TAG]->(tag)
RETURN id(user) AS item, id(tag) AS category
UNION
MATCH (post:Post)-[:HAS_TAG]->(tag)
RETURN id(post) AS item, id(tag) AS category" AS query
MATCH (p:Post)
WITH query, collect(id(p)) as targetIds
MATCH (u:User)
WITH query, targetIds, collect(id(u)) AS sourceIds
CALL algo.similarity.cosine(query, {
graph: 'cypher', topK: 1, similarityCutoff: 0.1, write:true, sourceIds: sourceIds, targetIds: targetIds
})
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p95
RETURN nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, p95
That doesn't actually look very nice, to be honest! Perhaps instead we could have the sourceIds and targetIds treated as Cypher queries too if you have graph: "cypher"?
WITH "MATCH (user:User)-[:HAS_TAG]->(tag)
RETURN id(user) AS item, id(tag) AS category
UNION
MATCH (post:Post)-[:HAS_TAG]->(tag)
RETURN id(post) AS item, id(tag) AS category" AS query,
"MATCH (u:User) RETURN id(u) AS item" as sourceQuery,
"MATCH (p:Post) RETURN id(p) AS item" as targetQuery
CALL algo.similarity.cosine(query, {
graph: 'cypher', topK: 1, similarityCutoff: 0.1, write:true, sourceIds: sourceQuery, targetIds: targetQuery
})
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p95
RETURN nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, p95
I guess to start with we can just do the version without the Cypher queries, though...