I can write the query for it pretty easily with the following:
MATCH (from:Tank)<-[:DRAINS]-(l: Transfer)-[:FEEDS]->(to: Tank)
MATCH (l)-[:IS]->(o:Code)
WHERE from.community_id = 1043
RETURN from, to, l, o
But when I try to aggregate on the data, because tank 3063 in the center is both a from and a to tank, I am getting duplicate results for it in any of my aggregations. I try to WITH DISTINCT it back to one, but because it is connected to two legs it is coming back twice still. Is there something obviously wrong with the query I have written? I feel like I have tried a hundred different versions at this point...
Here is the most recent attempt at a query:
MATCH (from:Tank)<-[:DRAINS]-(l:Transfer)-[:FEEDS]->(to: Tank)
WHERE from.community_id = 1043
WITH DISTINCT COLLECT(from) + COLLECT(to) as tanks, l
MATCH (l)-[:IS]->(o:Code)
UNWIND tanks as tank
WITH sum(tank.flow) AS netFlow, sum(o.weight) as codeWeight, tank.community_id as community,
RETURN community, netFlow, codeWeight
I appreciate any second set of eyes looking at my query!
I'm not entirely sure what you want (or what you are getting that isn't correct).
could you provide a small example dataset (CREATE statements), and the text of desired output?
Try this:
MATCH (from:Tank)<-[:DRAINS]-(l:Transfer)-[:FEEDS]->(to: Tank)
WHERE from.community_id = 1043
MATCH (l)-[:IS]->(o:Code)
MATCH (from1:Tank)<-[:DRAINS]-(l1:Transfer))-[:FEEDS]->(to)
WHERE from1.community_id <> from.community_id
MATCH (l1)-[:IS]->(o)
WITH COLLECT(DISTINCT from) + COLLECT(DISTINCT from1) as tanks, o
UNWIND tanks as tank
WITH sum(tank.flow) AS netFlow, sum(o.weight) as codeWeight, tank.community_id as community
RETURN community, netFlow, codeWeight
This will create the sub-graph of data I am working with (There will actually be tons of these little subgraphs identified by the community_id, I am just working with one as an example)
also I don't know if there is an easier way to put a table in here with mark up, but Geeze formatting that correctly (even after copying it out of neo4j) sucked... lol
This one fails because "tanks" contains: ["3013","6016","6016","3013"] and I end up with duplicates, I essentially need the "tanks" collection to include ["3013", "3063", "6016"]
EDIT:
Also, this part WHERE from1.community_id <> from.community_id breaks it too because all three tanks and both legs should ALL have the same community ID, I am identifying them as a group with gds.wcc.write and that show I am grouping on them to identify the flow of the whole group.
I'm pretty sure I still don't understand what we're doing here, but I imagine you could rapidly narrow the search space by also adding group id to the query, so you only consider the group you are working with. Here is a first attempt, I didn't know why there was a second path search out to code so I just dropped it.
MATCH (from:Tank)<-[:DRAINS]-(l:Transfer)-[:FEEDS]->(to: Tank)
WHERE from.community_id = 1
MATCH (l)-[:IS]->(o:Code)
WITH o, collect(distinct from)+collect(distinct to) as tanks
UNWIND tanks as t
return t.community_id as community, collect(distinct t.name) as equipment, sum(t.flow) as netFlow, sum(o.weight) as opCodeWeight, collect(distinct o.name) as opcodes
output
community equipment netFlow opCodeWeight opcodes
1 [6016,3013,3063] 0 300 [6206]
Or from the longer path.
MATCH (from:Tank)<-[:DRAINS]-(l:Transfer)-[:FEEDS]->(to: Tank)<-[:FEEDS]-(l1:Transfer)-[:DRAINS]->(from1:Tank)
WHERE from.community_id = 1
MATCH (l)-[:IS]->(o:Code)
WITH o, collect(distinct from)+collect(distinct to)+collect(from1) as tanks
UNWIND tanks as t
return t.community_id as community, collect(distinct t.name) as equipment, sum(t.flow) as netFlow, sum(o.weight) as opCodeWeight, collect(distinct o.name) as opcodes
Essentially, I am grouping all connected Tanks via their bridge Leg nodes. Each Leg is associated with a Code, that is weighted. I am using the Weakly Connected Communities algorithm to identify each of the groups with an ID, and then using that ID to group them in the end output query.
I need to look at each community, and get the sum of community flow, each Tank in the community, and the weight associated with the leg Code. I need to identify flows that are either too high or too low as my end output.
Try this:
match (l:Transfer)-[]-(a:Tank)
with collect(distinct a.name) as a1
match (l1:Transfer)-[]-(b:Tank)
where b.name in a1
match (l1)-[]-(o:Code)
with distinct b.community_id as cid, sum( distinct b.flow) AS netFlow, sum(distinct o.weight) as codeWeight, a1, o.name as opcode
return cid, a1, netFlow, codeWeight, opcode
The only issue was when you had more items in the graph, it wasn't grouping each subgraph by the community_id. So I added a.commiunity_id to the first WITH statement and it worked perfectly
match (l:Leg)-[]-(a:Tank)
with collect(distinct a.name) as a1, a.community_id as cid
match (l:Leg)-[]-(b:Tank)
where b.name in a1
match (l)-[]-(o:Code)
with distinct b.community_id as cid, sum( distinct b.flow) AS netFlow, sum(distinct o.weight) as codeWeight, a1, COLLECT(DISTINCT o.name) as opcode
return cid, a1, netFlow, codeWeight, opcode
I adjusted it so that each leg has a different code as well and it still outputs as expected!