Advice/help getting started with recommendation algorithms


(Jed Christiansen) #1

Hey, everyone. I'm trying to prototype and test some algorithms for custom recommendation systems, but my graph isn't as simple as the examples here: https://neo4j.com/docs/graph-algorithms/current/algorithms/centrality/

My graph has a bunch of startups (with specific categories) and funding data (funding rounds, each with 1-n investor nodes). So something like this:

(Category)-[:]-(Company)-[:]-(FundingRound)-[:]-(Investor)

My goal:
Given a set of categories/characteristics, I'd like to recommend a list of investors.

It feels like I should be able to specify a subgraph (leading from a specific set of categories through to the investors) and then run some sort of PageRank or Centrality algorithm on that, but I feel I'm getting stuck on a combination of theory and execution. :slight_smile:

Here's a sample approach:

CALL algo.pageRank.stream(
  'MATCH (cat:Category {name:"artificial-intelligence"})<-[]-(c:Company)<-[]-(f:FundingRound)<-[]-(inv:Investor) RETURN inv',
  {graph:'cypher'}
)
YIELD node,score with node,score order by score desc limit 20

This is a particularly simple example - I plan to expand this to include multiple categories once I figure out if/how I can do this.

But I'm currently getting this error:

Neo.ClientError.Statement.SyntaxError: Type mismatch: expected String but was Map (line 3, column 3 (offset: 163))
"  {graph:'cypher'}"

(with a caret under the "c" in "cypher")

So I guess I have two questions:

  1. Am I on the right track with my overall approach, or have I totally missed the plot?
  2. Are there obvious syntax errors I'm missing, or something in the execution I don't understand?

I'd really appreciate any/all help!


(Bratanic Tomaz) #2

When you use graph:'cypher', you need to provide two statements: the first returns the ids of the nodes, and the second returns the source and target ids of the relationships.
Check the documentation for more.

If you can tell us more about what your projected graph should look like, we can help further.
Another concept that might be of interest is categorical PageRank, which I've written a blog post about.


(Jed Christiansen) #3

Ahhhh!! Thank you! That was the key that I somehow missed. This query (and similar variations) has worked for me:

CALL algo.pageRank.stream(
  'MATCH (inv:Investor) RETURN id(inv) as id',
  'MATCH (c:Company)-[]->(btow:BToWhat {name:"b2b-smb"})
   MATCH (inv1:Investor)-[]->(:FundingRound)-[]->(c:Company)<-[]-(:FundingRound)<-[]-(inv2:Investor)
   RETURN id(inv1) as source, id(inv2) as target',
  {graph:'cypher'}
)
YIELD nodeId, score
MATCH (inv:Investor) where id(inv) = nodeId
RETURN inv.investorName, score
ORDER BY score DESC

What I seem to be getting from this graph is the most networked investors among those who have invested in companies that sell B2B (to SMBs). The investors at the top of the list have either invested in a lot of B2B-SMB companies or invested in B2B-SMB companies that have a lot of other investors.

That's not entirely what I was aiming for, but I've got a MUCH better idea of how to use Neo4j for this now. :slight_smile:
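
For anyone following along without the data, the ranking behaviour described above can be reproduced on a toy graph. The Python sketch below is only an illustration under several assumptions. The company and investor names are made up. It links every pair of distinct investors in the same company, which is a simplification of the Cypher pattern in the query. It also uses a plain normalized power-iteration PageRank rather than the library's implementation, so absolute scores will not match what algo.pageRank returns.

```python
from itertools import permutations

# Hypothetical toy data: company -> investors across all of its funding rounds.
company_investors = {
    "acme": ["alpha", "beta", "gamma"],
    "bolt": ["alpha", "delta"],
    "core": ["alpha", "beta"],
}


def co_investment_edges(company_investors):
    """Return directed (source, target) pairs for every two distinct co-investors."""
    edges = []
    for investors in company_investors.values():
        # Both directions are emitted, so the projection is effectively undirected.
        edges.extend(permutations(sorted(set(investors)), 2))
    return edges


def pagerank(nodes, edges, damping=0.85, iterations=20):
    """Plain power-iteration PageRank; repeated edges act as extra weight."""
    out_degree = {n: 0 for n in nodes}
    for source, _ in edges:
        out_degree[source] += 1
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is spread evenly.
        dangling = sum(scores[n] for n in nodes if out_degree[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_scores = {n: base for n in nodes}
        for source, target in edges:
            new_scores[target] += damping * scores[source] / out_degree[source]
        scores = new_scores
    return scores


nodes = sorted({inv for invs in company_investors.values() for inv in invs})
edges = co_investment_edges(company_investors)
scores = pagerank(nodes, edges)
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy data, "alpha" has invested in the most companies and "beta" shares two companies with "alpha", so both end up near the top. That is the same mix of "many companies" and "companies with many other investors" that the query above rewards. It measures how connected an investor is within the chosen category, which is not necessarily the same as how relevant that investor is to it.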