If I have a bunch of nodes identified by a MATCH, how do I get the relationships between those nodes and only those nodes? The Neo4j browser shows this to me when it shows the nodes matched by a query, but I've been unable to duplicate it.
I thought the following would work, but it gives me no results.
MATCH (rt:Something)
WITH rt,collect(rt) as rtc
MATCH (rt)-[r]-(rt2:Something)
WHERE rt2 in rtc
return r
I can issue the first line MATCH in the browser to see the set of nodes that I'm after. There are 7 nodes matched, and they are fully and directly interconnected by one type of relationship - which the browser shows me. So if the nodes are 1 through 7, 1 has one relationship with each of 2 through 7, 2 has one relationship with 1 and 3 through 7, etc. I can see the exact structure that I expect, but I cannot come up with a query that will return the relationships between those 7 nodes (and only those nodes).
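For what it's worth, one likely reason the query above returns nothing: in `WITH rt, collect(rt) AS rtc`, `rt` acts as a grouping key, so every row gets a one-element list holding only that node. `rt2 IN rtc` can then only be true when `rt2` is `rt` itself, which would require a relationship from a node to itself. A minimal sketch that collects the whole set first and unwinds it afterwards, assuming the `Something` label from the query above:

```cypher
// Collect the full set once, with no other column in the WITH,
// so the list contains every matched node.
MATCH (rt:Something)
WITH collect(rt) AS rtc
// Turn the list back into rows while keeping the list itself available.
UNWIND rtc AS rt
MATCH (rt)-[r]-(rt2)
WHERE rt2 IN rtc
// The undirected pattern finds each relationship from both ends,
// so DISTINCT removes the duplicates.
RETURN DISTINCT r
```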
Well, here are the seven nodes connected by relationships, for whatever value that provides. The nodes are returned by the initial MATCH. The browser fills in the relationships.
The node type is RecordTopology and the relationship is CONNECTED_TO.
I think I'm missing a big piece of the equation here...
...you just did.
It looks like you're simplifying the problem to make it easier for us to help you (THANK YOU!). However, in this case, I think the complexity of your graph and the question of how to isolate those "7 nodes" are the real problem you need to solve.
Could you give me a little more context, and maybe some of the data on those nodes, and more nodes?
MATCH p=(:RecordTopology)-[]-(:RecordTopology)
RETURN p LIMIT 120
No, I have a query that returns the nodes. I don't have a means of referring to the relationships so that I can modify their attributes. That's the reason I'm here.
Could you elaborate on that? Why does the manner in which nodes are located have anything to do with obtaining additional information about them?
It's a bunch of RecordTopology nodes that are heavily interconnected by CONNECTED_TO relationships.
Unwinding that starting set is the same as not collecting them in the first place, but since it looks like you have a collection to work with, I'll show both methods:
MATCH (n:Topo) WITH n LIMIT 7 WITH collect(n) as startingSet
UNWIND startingSet AS n RETURN n
...will give you the exact same data to work with as...
MATCH (n:Topo) WITH n LIMIT 7
Get the relationships
Hang on a minute... this sounds like what you're running into... let's take a closer look at the graph...
... ah-ha... Topos 0 through 6 don't link to each other at all, but most of them link to 7 and 8, so let's use a more specific subset:
MATCH (n:Topo)
WHERE id(n) IN [1,2,3,4,7,8,19]
MATCH (n)-[r:TO]-()
RETURN r
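Note that the pattern above leaves the far end of each relationship unconstrained, so it can also return relationships that lead to nodes outside the subset. A minimal sketch that constrains both ends, assuming the same `Topo` label, `TO` relationship type, and ID list as above:

```cypher
// Both endpoints must be in the chosen subset.
// The directed pattern returns each relationship exactly once.
MATCH (n:Topo)-[r:TO]->(m:Topo)
WHERE id(n) IN [1,2,3,4,7,8,19]
  AND id(m) IN [1,2,3,4,7,8,19]
RETURN r
```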
Thank you a ton for going through the work to figure this out. It's much appreciated.
This is what I ended up going with:
MATCH <something that produces an n>
WITH collect(id(n)) AS c
MATCH (n:Topo) WHERE id(n) IN c
MATCH (n2:Topo) WHERE id(n2) IN c
MATCH (n)-[r:TO]-(n2)
RETURN r
I'm a little disappointed that Cypher doesn't have a more natural way of referring to an element of a pattern match (n) twice. But I probably just don't understand how to Cypher well enough yet.
Yes, that's a much cleaner notation and it works just fine. Thank you. The resulting performance of the query is unchanged.
I noted that when I PROFILE each style of query (mine with node labels and your queries without them),
MATCH (n)-[r:TO]-(n2)
will generate 344 dbhits while
MATCH (n:Topo)-[r:TO]-(n2:Topo)
will generate 408 dbhits.
That sure looks like the unlabeled query provides better performance - which I find somewhat counterintuitive. I would have thought that providing more details about what I'm after would always be better.
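A comparison like this can be reproduced by prefixing each variant with `PROFILE` and reading the db hits from the resulting plan. A sketch, assuming the ID list used earlier in the thread; the actual counts depend on the data:

```cypher
// Variant without labels on the end nodes.
PROFILE
MATCH (n)-[r:TO]-(n2)
WHERE id(n) IN [1,2,3,4,7,8,19] AND id(n2) IN [1,2,3,4,7,8,19]
RETURN r;

// Variant with labels; each node needs an additional label check.
PROFILE
MATCH (n:Topo)-[r:TO]-(n2:Topo)
WHERE id(n) IN [1,2,3,4,7,8,19] AND id(n2) IN [1,2,3,4,7,8,19]
RETURN r;
```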
That's because when you use node IDs, you get the node directly. When you add a label to the pattern, the query engine has to do one extra check to confirm that the node fetched by ID has the label you specified. That's why you see more db hits.
For index lookups, a label is mandatory. In traversals, if you know the relationship already identifies the target node unambiguously, leaving the label off makes the query faster.
Also, if you are using properties on those nodes to isolate the specific nodes you're after, you should index those properties and use them in your WHERE clause. That should speed things up a bit.
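As a rough sketch of the indexing suggestion: the property `name`, the index name `topo_name`, and the values in the list are all hypothetical and stand in for whatever actually identifies your nodes.

```cypher
// Neo4j 4.x and later syntax.
// Older 3.x releases use: CREATE INDEX ON :Topo(name)
CREATE INDEX topo_name FOR (n:Topo) ON (n.name);

// Find the starting set through the indexed property (label required),
// then keep only the relationships whose other end is also in the set.
MATCH (n:Topo)
WHERE n.name IN ['a', 'b', 'c']
WITH collect(n) AS nodes
UNWIND nodes AS a
MATCH (a)-[r:TO]->(b)
WHERE b IN nodes
RETURN r;
```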