Merge nodes within a larger graph on a given relation value

Hello everybody!

I recently started using Neo4j in my application and I'm working on a rather complex query. So far I've managed to get it working by breaking it down into smaller queries and gluing them together with some Ruby. I'm guessing there has to be a way to do it all in Cypher, but I haven't figured it out yet.

In my app, I have Item nodes and two types of relations between them: :SIMILAR_TO and :CONNECTED_TO. The :SIMILAR_TO relation has a Float property called value.

The requirement is that when two Items are similar enough, they are considered identical. Given a threshold ($some_number), I need to collapse all nodes joined by a :SIMILAR_TO relation whose value is less than that threshold. I then render the resulting graph, which should contain all Item nodes for the selected IDs (in $item_ids) and only the :CONNECTED_TO relations between them.
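To make the requirement concrete, here is a small sketch of the result I'm after, written in plain Ruby over made-up in-memory data rather than against Neo4j. The `collapse_similar` helper, the array shapes and the sample values are all hypothetical and only there for illustration. I'm also assuming that :SIMILAR_TO can be treated as undirected for merging and that chains of similar items (a similar to b, b similar to c) end up in one group.

```ruby
# Hypothetical helper, not part of my app: plain arrays stand in for the graph.
#   item_ids  - selected item IDs, e.g. [1, 2, 3]
#   similar   - [from_id, to_id, value] triples for :SIMILAR_TO
#   connected - [from_id, to_id] pairs for :CONNECTED_TO
#   threshold - plays the role of $some_number
def collapse_similar(item_ids, similar, connected, threshold)
  # Every selected item starts out as its own group.
  parent = item_ids.to_h { |id| [id, id] }

  # Follow parent links up to the representative of a group.
  find = lambda do |id|
    id = parent[id] until parent[id] == id
    id
  end

  # Merge both endpoints of every sufficiently similar pair of selected items.
  similar.each do |from, to, value|
    next unless value < threshold && parent.key?(from) && parent.key?(to)

    a = find.(from)
    b = find.(to)
    parent[[a, b].max] = [a, b].min # smallest ID represents the group
  end

  # Items without a qualifying :SIMILAR_TO stay as single-member groups.
  groups = item_ids.group_by { |id| find.(id) }

  # Keep only :CONNECTED_TO, re-pointed at the group representatives.
  edges = connected
          .select { |from, to| parent.key?(from) && parent.key?(to) }
          .map { |from, to| [find.(from), find.(to)] }
          .reject { |from, to| from == to }
          .uniq

  { groups: groups, edges: edges }
end

result = collapse_similar(
  [1, 2, 3, 4],
  [[1, 2, 0.1], [2, 3, 0.9]],
  [[1, 3], [2, 3], [3, 4], [1, 2]],
  0.5
)
p result[:groups] # {1=>[1, 2], 3=>[3], 4=>[4]}
p result[:edges]  # [[1, 3], [3, 4]]
```

Items 1 and 2 collapse into one group, items 3 and 4 are kept even though they have no qualifying :SIMILAR_TO, and only :CONNECTED_TO edges between the resulting groups remain. That is the shape of output I'd like to get from a single Cypher query.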

This is how I collapse the 'identical' nodes:

MATCH (i:Item)-[s:SIMILAR_TO]->(j:Item)
WHERE s.value < $some_number
AND id(i) IN $item_ids
AND id(j) IN $item_ids
WITH i + collect(j) AS identical
CALL apoc.nodes.collapse(identical, { properties: 'combine' })
YIELD from, rel, to
RETURN from, rel, to

I have two problems with this query:

(1) it leaves out other items with the provided IDs if they don't have a :SIMILAR_TO relation satisfying the value criterion. They end up orphaned, I think because the end node IDs on the relations no longer match the IDs of the virtual nodes, which are negative (at least that's what seems to be happening), and

(2) it returns all relationships in rel, i.e. both :SIMILAR_TO and :CONNECTED_TO.

What I'm trying to do is something like:

MATCH (i:Item)-->(j:Item)
WHERE id(i) IN $item_ids
AND id(j) IN $item_ids

// Then collapse these nodes
MATCH (i)-[s:SIMILAR_TO]->(j)
WHERE s.value < $some_number
WITH i + collect(j) AS identical
CALL apoc.nodes.collapse(identical, { properties: 'combine' })
YIELD from, rel, to
RETURN from, rel, to

// Then return the results in this format
MATCH (from)-[rel:CONNECTED_TO]->(to)

As I said, so far I've been able to do it by breaking it down into three separate queries and passing the results from one to the next one in Ruby. It works, but it's not the cleanest piece of code.

I'm sure there's a nice way to do it in Cypher and I hope you'll be able to help me. Thanks in advance!