My goal is to create two paths, extract the nodes and relationships from each path and then use the nodes and relationships from both paths to do more discovery.
All is well until I uncomment the second MATCH (see my code snippet). With the code commented as you see it below, p1 returns the correct number of nodes and relationships. But when the second MATCH is introduced, the result of RETURN p1 is then incorrect (fewer nodes and relationships are returned). RETURN
My reading suggests that this should work? Can anyone explain why creating p2 (the second MATCH) impacts p1?
WITH ['A19','38K'] as EqpGroup
MATCH p1 = (f1:Flight)-[e1:Edge]->(f2:Flight)-[e2:Edge*1..3]->(fx:Flight)
WHERE f1.seg_Num = 6312
AND f2.seg_Num = 7831
AND ALL(fx IN nodes(p1) WHERE (fx.seg_Eqp IN EqpGroup))
WITH p1, EqpGroup
//*******
//MATCH p2 = (f3:Flight)-[e3:Edge]->(f4:Flight)-[e4:Edge*1..3]->(fy:Flight)
//WHERE f3.seg_Num = 112
//AND f4.seg_Num = 6313
//AND ALL(fy IN nodes(p2) WHERE (fy.seg_Eqp IN EqpGroup))
//WITH p1, p2
//*******
RETURN p1
You left out how you are collecting the nodes. Can you provide that?
The one issue I have with this query is that you are forming a Cartesian product between the p1 and p2 results, as the second match is not correlated with the first match.
UNION is a common way of writing a query like this. I need to know how you want to results in order to recommend a solution.
The network in this case are the possible paths which two types of aircraft can follow from city to city. Flights are joined by edges representing time on ground between flights. As you can see, the first and second flight for each path is pre-determined and causes each path to be somewhat distinct.
The query below appears to correctly return nodes which are common to both pA and pB. Does this appear to be better formed?
My goal is to discover ground events where fleet types can be interchanged. A feasible interchange point should exist if any pair of common nodes each have at least two inbound edges - one edge from a node in pA and one edge from a node in pB.
I will collect relationships next, but I am not sure how to use those relationships to query for the desired pattern described above. Any guidance would be appreciated.
WITH ['A19','38K'] as EqpGroup
//
MATCH pA = (f1:Flight)-[]->(f2:Flight)-[r1*1..3]->(fx:Flight)
WHERE f1.seg_Num = 6312
AND f2.seg_Num = 7831
AND ALL(fx IN nodes(pA) WHERE (fx.seg_Eqp IN EqpGroup)) //Filters the nodes in the path
WITH collect(Nodes(pA)) AS pathA, EqpGroup
UNWIND pathA as pathpairsA
UNWIND pathpairsA as NodesA //19 distinct nodes collected from P1
//
MATCH pB = (f3:Flight)-[]->(f4:Flight)-[r2*1..3]->(fy:Flight)
WHERE f3.seg_Num = 112
AND f4.seg_Num = 6313
AND ALL(fy IN nodes(pB) WHERE (fy.seg_Eqp IN EqpGroup))
WITH NodesA, collect(Nodes(pB)) AS pathB
UNWIND pathB as pathpairsB
UNWIND pathpairsB as NodesB
//
WITH NodesA, NodesB
WHERE NodesA = NodesB
RETURN DISTINCT(NodesA)
You changed the approach, but it did not remove the cartesian product issue. This results from you passing all the nodes in NodesA in individual rows. This will cause the cartesian product when following match is executed that is not correlated with the nodes passed to it.
Note: The process of collecting Nodes(Pa) followed by an unwind, results in Nodes(pA); thus it is doesn't make sense to collect something then unwind the result.
Here is a potential solution:
WITH ['A19','38K'] as EqpGroup
MATCH pA = (f1:Flight)-[]->(f2:Flight)-[r1*1..3]->(fx:Flight)
WHERE f1.seg_Num = 6312
AND f2.seg_Num = 7831
AND ALL(fx IN nodes(pA) WHERE (fx.seg_Eqp IN EqpGroup))
UNWIND Nodes(pA) as allPathNodesA
WITH EqpGroup, COLLECT(DISTINCT allPathNodesA) as NodesA
//
MATCH pB = (f3:Flight)-[]->(f4:Flight)-[r2*1..3]->(fy:Flight)
WHERE f3.seg_Num = 112
AND f4.seg_Num = 6313
AND ALL(fy IN nodes(pB) WHERE (fy.seg_Eqp IN EqpGroup))
UNWIND Nodes(pB) as allPathNodesB
WITH NodesA, COLLECT(DISTINCT allPathNodesB) as NodesB
//
RETURN [i in NodesA where i in NodesB] as commonPathNodes
One issue with the above approach is that it will return no results when either match does not return a result. If you want an empty set returned in this case, you can use the COLLECT subquery. Here is one solution.
WITH ['A19','38K'] as EqpGroup
WITH COLLECT {
MATCH pA = (f1:Flight)-[]->(f2:Flight)-[r1*1..3]->(fx:Flight)
WHERE f1.seg_Num = 6312
AND f2.seg_Num = 7831
AND ALL(fx IN nodes(pA) WHERE (fx.seg_Eqp IN EqpGroup))
UNWIND Nodes(pA) as allPathNodesA
RETURN DISTINCT allPathNodesA
} as NodesA,
COLLECT {
MATCH pB = (f3:Flight)-[]->(f4:Flight)-[r2*1..3]->(fy:Flight)
WHERE f3.seg_Num = 112
AND f4.seg_Num = 6313
AND ALL(fy IN nodes(pB) WHERE (fy.seg_Eqp IN EqpGroup))
UNWIND Nodes(pB) as allPathNodesB
RETURN DISTINCT allPathNodesB
} as NodesB
RETURN [i in NodesA where i in NodesB] as commonPathNodes
Here is the same approach using CALL subqueries instead of COLLECT subqueries:
WITH ['A19','38K'] as EqpGroup
CALL (EqpGroup) {
MATCH pA = (f1:Flight)-[]->(f2:Flight)-[r1*1..3]->(fx:Flight)
WHERE f1.seg_Num = 6312
AND f2.seg_Num = 7831
AND ALL(fx IN nodes(pA) WHERE (fx.seg_Eqp IN EqpGroup))
UNWIND Nodes(pA) as allPathNodesA
RETURN COLLECT(DISTINCT allPathNodesA) as NodesA
}
CALL (EqpGroup) {
MATCH pB = (f3:Flight)-[]->(f4:Flight)-[r2*1..3]->(fy:Flight)
WHERE f3.seg_Num = 112
AND f4.seg_Num = 6313
AND ALL(fy IN nodes(pB) WHERE (fy.seg_Eqp IN EqpGroup))
UNWIND Nodes(pB) as allPathNodesB
RETURN COLLECT(DISTINCT allPathNodesB) as NodesB
}
RETURN [i in NodesA where i in NodesB] as commonPathNodes
Very helpful sir. I understand my error. Of your three suggestions, which would you favor as I now involve the relationships associated to these common nodes?
The last dragon I am trying to slay involves creating and examining the relationships. To preserve my self-esteem, I won't paste my attempt. Seems simple, but I am not getting it.
Now that the common nodes are discovered, any set of common nodes joined by the relationships below would represent a feasible location in the network for two aircraft to mutually swap between Path A and Path B. All of these relationships would need to be true. No need for an edge directly between (N1 and N2) or (N3 and N4):
I don’t understand your data model. Don’t flights go between airports or bases? Would it be common airports along both paths where flights could switch?
Isn’t a flight unique for the day and it is defined by where it starts, where it ends, and what Tim it takes off.
Can you give me a little more clarity? Give me what you got. It may help and don’t worry about shame. We all start from the same place.
Because of complex business rules which constrain the swap-ability across aircraft types (time on ground, aircraft capacity, route capability, etc.), I chose to pre-construct the network with tools other than Neo. The vertices and edges which I have imported into Neo represent the network of possible flight paths for two distinct aircraft types. The first two nodes of each path force a swap between types (at a feasible convergence point). The goal is to find the next location in the network where the two fleet types converge again (an opportunity to undo the swap).
Your suggestion #3 works well and very quickly outputs the nodes common to both networks. In the two paths below 10K nodes and 150K relationships (three operating days for two fleet types) returns 4 common nodes.
If I can join the set of common nodes to any relationship from either pA or pB where the start node and end node are both matched in the set of common nodes, the resulting pattern will determine if the edges represent a feasible location to un-swap.
I haven't been successful with my attempts to unwind RelsA and RelsB and join those relationships to the commonNodes (when the start node and end node are both in the set of commonNodes).
Appreciate your thoughts...
WITH ['A19','38K'] as EqpGroup
CALL (EqpGroup) {
MATCH pA = (f1:Flight)-[]->(f2:Flight)-[r1*1..3]->(fx:Flight)
WHERE f1.seg_Num = 6312
AND f2.seg_Num = 7831
AND ALL(fx IN nodes(pA) WHERE (fx.seg_Eqp IN EqpGroup))
UNWIND Nodes(pA) as allPathNodesA
UNWIND Relationships(pA) as allRelsA
RETURN COLLECT(DISTINCT allPathNodesA) as NodesA, COLLECT(DISTINCT allRelsA) as RelsA
}
CALL (EqpGroup) {
MATCH pB = (f3:Flight)-[]->(f4:Flight)-[r2*1..3]->(fy:Flight)
WHERE f3.seg_Num = 112
AND f4.seg_Num = 6313
AND ALL(fy IN nodes(pB) WHERE (fy.seg_Eqp IN EqpGroup))
UNWIND Nodes(pB) as allPathNodesB
UNWIND Relationships(pB) as allRelsB
RETURN COLLECT(DISTINCT allPathNodesB) as NodesB, COLLECT(DISTINCT allRelsB) as RelsB
}
WITH [i in NodesA where i in NodesB] as commonPathNodes, RelsA, RelsB
UNWIND commonPathNodes AS individualNodes
WITH individualNodes, RelsA, RelsB
CREATE (newNodes:commonFLT {flightID: individualNodes.seg_Num})
WITH COLLECT(newNodes) AS commonNodes, RelsA, RelsB
//
// Next Steps
// 1. Unwind relationships
// 2. Union relationships from pA and pB to create a single list of relationships?
// 3. Join all relationships which include only commonNodes to the commonNodes
I created a simple example which includes the pattern of nodes and relationships which I would like to return to the user. The query below generates the depicted network.
After more thought, I believe the problem can be solved in the context of a single path - if I can successfully MATCH and return only those groups of nodes (there will always be four) which are connected exactly as 3A, 4A, 7B and 8B below.
Chip
WITH ['A19','38K'] as EqpGroup
MATCH path = (f1:Flight)-[]->(f2:Flight)-[r1*1..3]->(fx:Flight)
WHERE f1.seg_Num = "1A" OR f1.seg_Num = "5B"
AND f2.seg_Num = "2A" OR f2.seg_Num = "6B"
AND ALL(fx IN nodes(path) WHERE (fx.seg_Eqp IN EqpGroup))
RETURN path
Correct. The query below seems to generate the desired result (also attached).
In this prototype, a flight node has two properties. A distinct id number (seg_Num) and a given equipment type (seg_Eqp). As mentioned, we are identifying opportunities to swap the type of aircraft (seg_Eqp). To do so, there must be a mutual zig-zag of no more than one hop.
Is it correct that all nodes from the first MATCH statement (including f1 and fx would be evaluated in the "fy" position found in the second MATCH statement?
Any better way to achieve this desired result?
WITH ['A19','38K'] as EqpGroup
//
MATCH path1 = (f1:Flight)-[]->(f2:Flight)-[r1*1..3]->(fx:Flight)
WHERE f1.seg_Num = "1A" OR f1.seg_Num = "5B"
AND f2.seg_Num = "2A" OR f2.seg_Num = "6B"
AND ALL(fx IN nodes(path1) WHERE (fx.seg_Eqp IN EqpGroup))
WITH path1
MATCH path2 = (f2:Flight)-[]->(fx:Flight)<-[]-(fy:Flight)
WHERE f2.seg_Eqp = fx.seg_Eqp
AND fy.seg_Eqp <> fx.seg_Eqp
RETURN path2
You need to be careful when visualizing your results in desktop, as the relationships are automatically should between any nodes returned. As such, it is not necessarily true that path2 contains all those rendered relationships. There is a setting to turn off this behavior. I would do so and return the query. It is under settings when in the editor. The setting is "Connect result nodes".
It looks like we can ignore common nodes and you want two nodes, one in each path, that are related to the same two nodes, one from each path. Is that correct?
Yes. Exactly. Like the four-node pattern in my last post (involving 3A, 4A, 7B, 8B).
Also, when I configured desktop to not "connect result nodes" I determined that my single path query may not be reliable. So, I returned to the dual path approach.
The RETURN statement below is added for validation only. Everything up to the RETURN statement works well and fast. We extract those nodes which are common to both paths and then create a group of new nodes from those. All good.
If I can simply join the commonPathnodes to the relationships which are collected from pA and pB, this may actually end well. This will allow me to see the relationships and identify the pattern we discuss above.
The issue now is that RelsAB seems to be a list rather than a relationship object
If it were not for your help, I would likely abandon ship and go back to my old ways. I very much appreciate your guidance.
WITH ['A05','38K'] as EqpGroup
CALL (EqpGroup) {
MATCH pA = (f1:Flight)-[]->(f2:Flight)-[r1*1..3]->(fx:Flight)
WHERE f1.seg_Num = 1141
AND f2.seg_Num = 2009
AND ALL(fx IN nodes(pA) WHERE (fx.seg_Eqp IN EqpGroup))
UNWIND Nodes(pA) as allPathNodesA
UNWIND Relationships(pA) as allRelsA
RETURN COLLECT(DISTINCT allPathNodesA) as NodesA, COLLECT(DISTINCT allRelsA) as RelsA
}
CALL (EqpGroup) {
MATCH pB = (f3:Flight)-[]->(f4:Flight)-[r2*1..3]->(fy:Flight)
WHERE f3.seg_Num = 2008
AND f4.seg_Num = 5128
AND ALL(fy IN nodes(pB) WHERE (fy.seg_Eqp IN EqpGroup))
UNWIND Nodes(pB) as allPathNodesB
UNWIND Relationships(pB) as allRelsB
RETURN COLLECT(DISTINCT allPathNodesB) as NodesB, COLLECT(DISTINCT allRelsB) as RelsB
}
WITH [i in NodesA where i in NodesB] as commonPathNodes, RelsA, RelsB
WITH RelsA + RelsB as RelsAB, commonPathNodes
UNWIND commonPathNodes AS individualNodes
WITH individualNodes, RelsAB
CREATE (newNodes:FLT {NewID: individualNodes.seg_Num})
WITH newNodes, RelsAB
MATCH (newNodes:FLT)
RETURN newNodes //Correctly returns the commonPathNodes
//Here is what I am trying to do but with no success......
//WHERE newNodes.NewID IN [RelsAB.startNodeId, RelsAB.endNodeId]
//WITH RelsAB, Collect(newNodes) as nodes
//MERGE (start:Node {id: RelsAB.startNodeId})
//MERGE (end:Node {id: RelsAB.endNodeId})
//MERGE (start)-[r:Edge]->(end)
Sorry, I still don't fully understand. You illustration of the four nodes (two from each path, with crossover relationships) does not depict commonality between the nodes.
In the query you are finding nodes that are identical and are on both pathA and pathB. This means you have graphs similar to the ones depicted in my diagram. The difference between the rows are possibilities as the length of the variable length relationship transition from 1 to 3 hops (note: I only illustrated one possibility for the case of length 3). Am I viewing this correctly?
Sorry if I am creating confusion. Let's try a different approach.
There are 18K distinct FLIGHTS in the database. The query below requires all hops to begin from two specific root nodes. The relationships join FLIGHTs operated by either of the EqpGroup aircraft types.
The resulting network (pathCombined) contains only 300 of the 18K FLIGHT nodes. All good.
Here is where I am struggling. How do I use only the 300 nodes and relationships returned in pathCombined to find this pattern in pathCombined? fx, fy, fz and fm are distinct FLIGHTS.
WITH
['19W', '38K'] as EqpGroup,
[11369, 3455] as RootNodes
//
MATCH pathCombined = (fx:Flight)-[*1..5]->(fy:Flight)
WHERE ALL(node IN nodes(pathCombined) WHERE (node.seg_Eqp IN EqpGroup))
AND fx.seg_Num IN RootNodes
...
...
The query switch at little. How do you get 300 distinct nodes from pathCombined, as the path can have a maximum of 6 nodes. Do you get multiple paths, where the total nodes included in all the paths is 300 nodes?
Are you looking for that pattern to exist somewhere along a given pathCombined? If so, that will not happen, as the pathCombined consists only of outgoing relationships, while you pattern has both in and outgoing relationships.
Right. Each of the paths which begin at the two root nodes have many different branches. These branches contain 300 connected, distinct nodes (out of the entire population of 18K).
The pattern I am trying to express in cypher is this:
And, when I run the query below on the entire population of 18K nodes I get many such crossed patterns. So it does work. I have validated that the patterns I am returning are indeed correct.
I simply need to understand how I MATCH this pattern using the 300 nodes and their relationships which are returned by the previously discussed 5 hop query, Can the nodes and relationships be unwound and then those lists somehow used in the below MATCH?
WITH
['19W', '38K'] as EqpGroup,
//
MATCH pathCombined = (fx:Flight)-->(fy:Flight)<--(fz:Flight)-->(fm:Flight)<--(fx:Flight)
WHERE ALL(node IN nodes(pathCombined) WHERE (node.seg_Eqp IN EqpGroup))
AND fz.seg_Eqp <> fm.seg_Eqp
AND fx.seg_Eqp <> fy.seg_Eqp
AND fz.seg_Eqp = fy.seg_Eqp
AND fx.seg_Eqp = fm.seg_Eqp
AND fx.seg_Num <> fm.seg_Num
AND fx.seg_Num <> fy.seg_Num
AND fz.seg_Num <> fm.seg_Num
AND fz.seg_Num <> fy.seg_Num
RETURN DISTINCT
"fz " + fz.seg_Num + " -> fy " + fy.seg_Num + " " + fz.seg_Eqp + " -> " + fy.seg_Eqp + "
fz " + fz.seg_Num + " -> fm " + fm.seg_Num + " " + fz.seg_Eqp + " -> " + fm.seg_Eqp + "
fx " + fx.seg_Num + " -> fy " + fy.seg_Num + " " + fx.seg_Eqp + " -> " + fy.seg_Eqp + "
fx " + fx.seg_Num + " -> fm " + fm.seg_Num + " " + fx.seg_Eqp + " -> " + fm.seg_Eqp
RETURN pathCombined from the above returns many patterns in the 18K population of flights connected exactly as I define in the MATCH statement. Here is a snippet...