Passing a list parameter into a Cypher projection

We're trying to combine Elastic Search with a Neo4j knowledge graph to improve search.

We use a keyword search (e.g. "tax") to return the top n results from Elastic Search, which we pass in as a parameter called es_results. This lets us "discover" 5 nodes / documents / pieces of content (:Cid) in the graph. These provide a starting point from which we generate a sub-graph by following relationships to closely related content, based on text similarity and on our recommender system (node2vec trained on the hyperlinks between content and on how users move around the site).

The following query works and produces a small sub-graph (we can experiment later with more hops).

:param es_results => ['5ef78674-7631-11e4-a3cb-005056011aef', '5ef79922-7631-11e4-a3cb-005056011aef', '5ef7a51e-7631-11e4-a3cb-005056011aef', '5ef792f3-7631-11e4-a3cb-005056011aef', '5ef7b0e3-7631-11e4-a3cb-005056011aef']

MATCH (cid:Cid)
WHERE cid.contentID IN $es_results
WITH cid
MATCH path = (c1:Cid)<-[r1:HAS_SUGGESTED_ORDERED_RELATED_ITEMS]-(cid)-[r2:HAS_SIMILAR_CONTENT]->(c2:Cid)
RETURN cid, collect(path) as paths;
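To make the semantics of this path query explicit, here is a small in-memory Python model of it. It assumes the graph is available as a hypothetical list of `(source, rel_type, target)` tuples and that every node is a :Cid. One thing it makes visible: the single path pattern only matches seeds that have *both* an outgoing `HAS_SUGGESTED_ORDERED_RELATED_ITEMS` and an outgoing `HAS_SIMILAR_CONTENT` relationship, so a seed missing either one drops out of the result entirely.

```python
from collections import defaultdict

RELATED = "HAS_SUGGESTED_ORDERED_RELATED_ITEMS"
SIMILAR = "HAS_SIMILAR_CONTENT"


def seed_paths(edges, es_results):
    """Model of the path query above, using the RELATED/SIMILAR constants:
    MATCH path = (c1)<-[:RELATED]-(cid)-[:SIMILAR]->(c2)
    RETURN cid, collect(path)
    Paths are returned as (c1, cid, c2) tuples."""
    related = defaultdict(list)  # cid -> targets of outgoing RELATED edges
    similar = defaultdict(list)  # cid -> targets of outgoing SIMILAR edges
    for source, rel_type, target in edges:
        if rel_type == RELATED:
            related[source].append(target)
        elif rel_type == SIMILAR:
            similar[source].append(target)

    result = {}
    for cid in es_results:
        # Every combination of one RELATED and one SIMILAR neighbour is a path.
        paths = [(c1, cid, c2) for c1 in related[cid] for c2 in similar[cid]]
        if paths:  # MATCH drops seeds lacking either relationship type
            result[cid] = paths
    return result


edges = [
    ("a", RELATED, "x"),
    ("a", SIMILAR, "y"),
    ("a", SIMILAR, "z"),
    ("b", SIMILAR, "y"),  # "b" has no RELATED edge, so it yields no paths
]
print(seed_paths(edges, ["a", "b"]))
```

If seeds are going missing from the sub-graph, splitting the pattern into two OPTIONAL MATCH clauses (or using the `|` relationship union as in the later queries) would keep them.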

What we want to try next is to inspect this sub-graph for similar content in the neighbourhood that is popular, as measured by degree centrality, for example. I've tried following the Cypher projection guidance, without success; I'm struggling with how to pass this sub-graph to the algorithm.

Do I pass parameters as in this example? Doing so got the call to run, but not as expected: only 5 nodes were considered.

CALL algo.degree(
  'MATCH (u:Cid) WHERE u.contentID IN $es_results RETURN id(u) as id',
  'MATCH (u:Cid)-[:HAS_SUGGESTED_ORDERED_RELATED_ITEMS|:HAS_SIMILAR_CONTENT]->(u2:Cid) RETURN id(u) as source, id(u2) as target',
  {graph:'cypher', write: false, writeProperty: "outDegree", params:{es_results:$es_results}}
)

╒═══════╤════════════╤═══════════════╤═════════════╤═══════╤═══════════════╕
│"nodes"│"loadMillis"│"computeMillis"│"writeMillis"│"write"│"writeProperty"│
╞═══════╪════════════╪═══════════════╪═════════════╪═══════╪═══════════════╡
│5      │822         │0              │-1           │false  │null           │
└───────┴────────────┴───────────────┴─────────────┴───────┴───────────────┘
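Why only 5 nodes? Our understanding, which is worth checking against the Graph Algorithms documentation, is that a Cypher projection keeps only relationship rows whose source and target both appear in the node query's results. Since the node query returns just the five ES results, every relationship to a one-hop neighbour would be discarded before the degree is computed. A small Python model of that filtering step (function names and data are hypothetical):

```python
def cypher_projection(node_ids, rel_rows):
    """Hypothetical model of a Cypher projection: relationship rows are kept
    only if both endpoints were returned by the node query."""
    nodes = set(node_ids)
    rels = [(s, t) for (s, t) in rel_rows if s in nodes and t in nodes]
    return nodes, rels


def out_degree(nodes, rels):
    """Count outgoing relationships per projected node."""
    degree = {n: 0 for n in nodes}
    for source, _ in rels:
        degree[source] += 1
    return degree


# The node query returned only the seeds; the relationship query also saw neighbours.
seeds = ["a", "b"]
rel_rows = [("a", "x"), ("a", "y"), ("b", "a")]
nodes, rels = cypher_projection(seeds, rel_rows)
print(len(nodes), rels, out_degree(nodes, rels))
```

If that assumption holds, the node query needs to return the seeds *and* their neighbours, not the seeds alone.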

What about matching on all nodes first to get every Cid id? I'm not sure why this would help, but it seemed worth a try; it feels like we're missing something.

CALL algo.degree(
  'MATCH (u:Cid)  RETURN id(u) as id',
  'MATCH (u1:Cid)-[:HAS_SUGGESTED_ORDERED_RELATED_ITEMS|:HAS_SIMILAR_CONTENT]->(u2:Cid) WHERE u1.contentID IN $es_results RETURN id(u1) as source, id(u2) as target',
  {graph:'cypher', write: false, writeProperty: "outDegree", params:{es_results:$es_results}}
)

╒═══════╤════════════╤═══════════════╤═════════════╤═══════╤═══════════════╕
│"nodes"│"loadMillis"│"computeMillis"│"writeMillis"│"write"│"writeProperty"│
╞═══════╪════════════╪═══════════════╪═════════════╪═══════╪═══════════════╡
│344735 │134         │24             │-1           │false  │null           │
└───────┴────────────┴───────────────┴─────────────┴───────┴───────────────┘
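The 344,735-node result suggests the node query now loads every :Cid, while the relationship query keeps only relationships leaving the five seeds. Assuming that is what happens, the sketch below (plain Python, hypothetical edge list) shows why the scores still cannot rank the neighbourhood, whichever direction the degree is counted in: out-degree is non-zero only for the seeds, and in-degree only counts how many seeds point at a node, not how connected it is overall.

```python
from collections import Counter


def degrees_from_seed_edges(all_nodes, edges, seeds):
    """Model of the second projection: every node is loaded, but only
    relationships leaving a seed survive the relationship query."""
    seeds = set(seeds)
    kept = [(s, t) for (s, t) in edges if s in seeds]
    out_deg = Counter(s for s, _ in kept)
    in_deg = Counter(t for _, t in kept)
    # Counter returns 0 for missing keys, so every loaded node gets a score.
    return ({n: out_deg[n] for n in all_nodes},
            {n: in_deg[n] for n in all_nodes})


all_nodes = ["a", "b", "x", "y", "z"]
edges = [("a", "x"), ("b", "x"), ("a", "y"), ("x", "y"), ("y", "z"), ("z", "y")]
out_d, in_d = degrees_from_seed_edges(all_nodes, edges, ["a", "b"])
print(out_d)  # only the seeds have a non-zero out-degree
print(in_d)   # "y" scores 1 here, although three relationships point at it
```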

What we want returned is a list of the most influential nodes in the sub-graph, excluding those returned by Elastic Search: the influential hubs in the neighbourhood. These could then provide relevant alternative content in the search neighbourhood for users to consider.
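Stated as an algorithm, the goal is: expand one hop from the ES results, count each node's relationships within that neighbourhood, drop the ES results themselves, and keep the top n. A minimal in-memory sketch, assuming a hypothetical list of `(source, target)` pairs already restricted to the relationship types of interest:

```python
from collections import Counter


def neighbourhood_hubs(edges, seeds, top_n=5):
    """Rank the seeds' one-hop neighbours by degree inside the induced
    sub-graph (seeds + neighbours), leaving the seeds themselves out."""
    seeds = set(seeds)
    neighbours = {t for (s, t) in edges if s in seeds} - seeds
    members = seeds | neighbours
    degree = Counter()
    for s, t in edges:
        if s in members and t in members:  # relationship lies inside the sub-graph
            degree[s] += 1
            degree[t] += 1
    # Highest degree first; ties broken by id so the order is deterministic.
    ranked = sorted(neighbours, key=lambda n: (-degree[n], n))
    return [(n, degree[n]) for n in ranked[:top_n]]


edges = [("a", "x"), ("a", "y"), ("b", "x"), ("x", "y"), ("y", "q"), ("q", "x")]
print(neighbourhood_hubs(edges, ["a", "b"]))
```

Whether degree is counted inside the sub-graph (as here) or over the whole graph is a design choice: the former rewards nodes that are central to this particular search, the latter rewards generally popular nodes.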

Any help, or alternative / better approaches to combining ES and the graph database, would be appreciated! :sweat_smile:

Update: tried an alternative approach.

MATCH (cid:Cid)
WHERE cid.contentID IN $es_results
WITH cid
MATCH (cid)-[:HAS_SIMILAR_CONTENT|:HAS_SUGGESTED_ORDERED_RELATED_ITEMS]->(c2:Cid)
RETURN collect(distinct cid.name) AS esResult, c2.name AS base_path, c2.description AS description,
       apoc.node.degree(c2, 'HAS_SIMILAR_CONTENT|HAS_SUGGESTED_ORDERED_RELATED_ITEMS|USER_MOVEMENT') AS totalDegree
ORDER BY totalDegree DESC
LIMIT 5;
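Here is a Python model of this update query, on a hypothetical `(source, type, target)` edge list. It keeps the query's shape (collect the seeds that reach each neighbour, then rank neighbours by total degree over the three relationship types, counted in both directions) and adds one filter the query does not have: neighbours that are themselves ES results are dropped, matching the goal of excluding them. In Cypher that would be an extra `WHERE NOT c2.contentID IN $es_results` after the second MATCH.

```python
from collections import defaultdict

SIMILAR = "HAS_SIMILAR_CONTENT"
RELATED = "HAS_SUGGESTED_ORDERED_RELATED_ITEMS"
MOVEMENT = "USER_MOVEMENT"


def rank_by_total_degree(edges, seeds, limit=5):
    """Neighbours reached from the seeds via SIMILAR|RELATED, ranked by
    global degree over all three types (no self-loops assumed)."""
    seeds = set(seeds)
    reached_from = defaultdict(set)  # c2 -> seeds that point at it
    for s, r, t in edges:
        # Extra filter (not in the Cypher above): skip neighbours that are seeds.
        if s in seeds and r in (SIMILAR, RELATED) and t not in seeds:
            reached_from[t].add(s)
    total_degree = defaultdict(int)  # both directions, like apoc.node.degree
    for s, r, t in edges:
        if r in (SIMILAR, RELATED, MOVEMENT):
            total_degree[s] += 1
            total_degree[t] += 1
    rows = [(sorted(src), c2, total_degree[c2]) for c2, src in reached_from.items()]
    rows.sort(key=lambda row: (-row[2], row[1]))  # ORDER BY totalDegree DESC
    return rows[:limit]  # LIMIT


edges = [
    ("a", SIMILAR, "x"), ("b", RELATED, "x"), ("a", RELATED, "y"),
    ("a", SIMILAR, "b"), ("x", MOVEMENT, "z"), ("z", MOVEMENT, "y"),
    ("y", SIMILAR, "x"),
]
print(rank_by_total_degree(edges, ["a", "b"]))
```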



Would it be better to use a pre-computed centrality score of some kind?
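Pre-computing usually means paying for the centrality once over the whole graph and only reading it at search time. With the procedure used above, that could be a run with `write: true`, so the score is stored under `writeProperty` on each node, after which the search query orders by that property (e.g. `ORDER BY c2.outDegree DESC`, if `outDegree` were the property written). The trade-off is staleness: the score only changes when the batch job is re-run. A Python sketch of the two phases, with a hypothetical edge list:

```python
from collections import Counter


def precompute_degree(edges):
    """Batch step, run once (e.g. on a schedule): total degree of every node."""
    degree = Counter()
    for s, t in edges:
        degree[s] += 1
        degree[t] += 1
    return dict(degree)


def hubs_from_scores(scores, neighbours, seeds, top_n=5):
    """Query-time step: a plain lookup and sort on the stored score."""
    candidates = set(neighbours) - set(seeds)
    return sorted(candidates, key=lambda n: (-scores.get(n, 0), n))[:top_n]


scores = precompute_degree([("a", "x"), ("b", "x"), ("x", "y"), ("y", "z")])
print(hubs_from_scores(scores, ["x", "z", "b"], seeds=["a", "b"]))
```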

Following up on this, we added some extra filtering by document type to get more relevant results, as in the query below. Joe told us about using multiple labels, so we will look at adding document_type as an additional label on our Cids to make the query more performant.

MATCH (cid:Cid)
WHERE cid.contentID IN $es_results
WITH cid
MATCH (cid)-[:HAS_SIMILAR_CONTENT|:HAS_SUGGESTED_ORDERED_RELATED_ITEMS]->(c2:Cid)
WHERE single(document_type IN c2.documentType WHERE document_type = "['detailed_guide']")
RETURN collect(distinct cid.name) AS esResult, c2.name AS base_path, c2.description AS description,
       apoc.node.degree(c2, 'HAS_SIMILAR_CONTENT|HAS_SUGGESTED_ORDERED_RELATED_ITEMS|USER_MOVEMENT') AS totalDegree,
       c2.documentType AS document_type
ORDER BY totalDegree DESC
;
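With document type as a label, the filter moves into the pattern itself, e.g. `(c2:Cid:DetailedGuide)`, where `DetailedGuide` is a hypothetical label name; the label can then be checked directly instead of reading and comparing a property on every candidate. Note also that `single()` only passes when exactly one list element matches, whereas `any()` is the looser "at least one" predicate. A Python sketch of filter-then-rank, with labels modelled as a hypothetical dict of sets:

```python
def ranked_by_label(candidates, labels, degree, required="DetailedGuide"):
    """Keep candidates carrying the `required` label, highest degree first.

    labels: node id -> set of labels; "DetailedGuide" is a hypothetical label
    standing in for document_type = detailed_guide."""
    kept = [n for n in candidates if required in labels.get(n, set())]
    return sorted(kept, key=lambda n: (-degree.get(n, 0), n))


labels = {
    "x": {"Cid", "DetailedGuide"},
    "y": {"Cid"},                  # high degree, but the wrong type
    "z": {"Cid", "DetailedGuide"},
}
degree = {"x": 2, "y": 9, "z": 5}
print(ranked_by_label(["x", "y", "z"], labels, degree))
```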