Query Syntax to avoid duplicate results

I am chasing a pattern of data like this:

I can write the query for it pretty easily with the following:

MATCH (from:Tank)<-[:DRAINS]-(l: Transfer)-[:FEEDS]->(to: Tank)
MATCH (l)-[:IS]->(o:Code)
WHERE from.community_id = 1043
RETURN from, to, l, o

But when I try to aggregate on the data, because tank 3063 in the center is both a from and a to tank, I am getting duplicate results for it in any of my aggregations. I try to WITH DISTINCT it back to one, but because it is connected to two legs it is coming back twice still. Is there something obviously wrong with the query I have written? I feel like I have tried a hundred different versions at this point...

Here is the most recent attempt at a query:

MATCH (from:Tank)<-[:DRAINS]-(l:Transfer)-[:FEEDS]->(to: Tank)
WHERE from.community_id = 1043
WITH DISTINCT COLLECT(from) + COLLECT(to) as tanks, l
MATCH (l)-[:IS]->(o:Code)
UNWIND tanks as tank
WITH sum(tank.flow) AS netFlow, sum(o.weight) as codeWeight, tank.community_id as community, 
RETURN community, netFlow, codeWeight

I appreciate any second set of eyes looking at my query!

I'm not entirely sure what you want (or what you are getting that isn't correct).
could you provide a small example dataset (CREATE statements), and the text of desired output?

Try this:

MATCH (from:Tank)<-[:DRAINS]-(l:Transfer)-[:FEEDS]->(to: Tank)
WHERE from.community_id = 1043
MATCH (l)-[:IS]->(o:Code)

MATCH (from1:Tank)<-[:DRAINS]-(l1:Transfer))-[:FEEDS]->(to)
WHERE from1.community_id <> from.community_id
MATCH (l1)-[:IS]->(o)

WITH COLLECT(DISTINCT from) + COLLECT(DISTINCT from1) as tanks, o
UNWIND tanks as tank

WITH sum(tank.flow) AS netFlow, sum(o.weight) as codeWeight, tank.community_id as community 
RETURN community, netFlow, codeWeight
1 Like

This will create the sub-graph of data I am working with (There will actually be tons of these little subgraphs identified by the community_id, I am just working with one as an example)

CREATE (:Tank {name: 6016, flow: -150, community_id: 1})<-[:DRAINS]-(l1:Transfer {name: 5, community_id:1})-[:FEEDS]->(:Tank {name: 3063, flow: 200, community_id: 1})<-[:FEEDS]-(l2: Transfer {name: 3, community_id: 1})-[:DRAINS]-> (:Tank {name: 3013, flow: -50, community_id: 1}), (l1) -[:IS]->(o:Code {name: 6206, weight:100})<-[:IS]-(l2)

My desired output is the following

╒═══════════╤═════════════════════════════╤═════════════════╤══════════════╤═════════════════════════════╕
│"community"│"equipment"                   │"netFlow"         │"opCodeWeight"│"opcodes"                     │
╞═══════════╪═════════════════════════════╪═════════════════╪══════════════╪═════════════════════════════╡
│1043       │["3013","3063","6016"].        │0.0              │100           │["6206"]                      │
└───────────┴─────────────────────────────┴─────────────────┴──────────────┴─────────────────────────────┘

This is what I am currently getting:

╒═══════════╤═════════════════════════════╤═════════════════╤══════════════╤═════════════════════════════╕
│"community"│"equipment"                   │"netFlow"         │"opCodeWeight"│"opcodes"                     │
╞═══════════╪═════════════════════════════╪═════════════════╪══════════════╪═════════════════════════════╡
│1043       │["3013","3063","6016","3063"] │-100              │400           │["6206","6206","6206","6206"] │
└───────────┴─────────────────────────────┴─────────────────┴──────────────┴─────────────────────────────┘

also I don't know if there is an easier way to put a table in here with mark up, but Geeze formatting that correctly (even after copying it out of neo4j) sucked... lol

Thanks for any help!

This one fails because "tanks" contains: ["3013","6016","6016","3013"] and I end up with duplicates, I essentially need the "tanks" collection to include ["3013", "3063", "6016"]

EDIT:
Also, this part WHERE from1.community_id <> from.community_id breaks it too because all three tanks and both legs should ALL have the same community ID, I am identifying them as a group with gds.wcc.write and that show I am grouping on them to identify the flow of the whole group.

It's not easier, but to get a clean table I export to csv, then copy paste from the app I open the csv with (e.g. libreoffice, vi)

1 Like

I'm pretty sure I still don't understand what we're doing here, but I imagine you could rapidly narrow the search space by also adding group id to the query, so you only consider the group you are working with. Here is a first attempt, I didn't know why there was a second path search out to code so I just dropped it.

MATCH (from:Tank)<-[:DRAINS]-(l:Transfer)-[:FEEDS]->(to: Tank)
WHERE from.community_id = 1
MATCH (l)-[:IS]->(o:Code)
WITH o, collect(distinct from)+collect(distinct to) as tanks
UNWIND tanks as t
return t.community_id as community, collect(distinct t.name) as equipment, sum(t.flow) as netFlow, sum(o.weight) as opCodeWeight, collect(distinct o.name) as opcodes

output

community     equipment         netFlow opCodeWeight    opcodes
1             [6016,3013,3063]  0       300             [6206]

Or from the longer path.

MATCH (from:Tank)<-[:DRAINS]-(l:Transfer)-[:FEEDS]->(to: Tank)<-[:FEEDS]-(l1:Transfer)-[:DRAINS]->(from1:Tank)
WHERE from.community_id = 1
MATCH (l)-[:IS]->(o:Code)
WITH o, collect(distinct from)+collect(distinct to)+collect(from1) as tanks
UNWIND tanks as t
return t.community_id as community, collect(distinct t.name) as equipment, sum(t.flow) as netFlow, sum(o.weight) as opCodeWeight, collect(distinct o.name) as opcodes

1 [6016, 3013, 3063] -200 500 [6206]

1 Like

That looks like its getting what I need!

Essentially, I am grouping all connected Tanks via their bridge Leg nodes. Each Leg is associated with a Code, that is weighted. I am using the Weakly Connected Communities algorithm to identify each of the groups with an ID, and then using that ID to group them in the end output query.

I need to look at each community, and get the sum of community flow, each Tank in the community, and the weight associated with the leg Code. I need to identify flows that are either too high or too low as my end output.

Try this:

match (l:Transfer)-[]-(a:Tank)
with collect(distinct a.name) as a1  
match (l1:Transfer)-[]-(b:Tank)
where b.name in a1
match (l1)-[]-(o:Code)

with distinct b.community_id as cid, sum( distinct b.flow) AS netFlow, sum(distinct o.weight) as codeWeight, a1, o.name as opcode
return cid, a1, netFlow, codeWeight, opcode

Result:

1 Like

One minor change and we have a winner!

The only issue was when you had more items in the graph, it wasn't grouping each subgraph by the community_id. So I added a.commiunity_id to the first WITH statement and it worked perfectly

match (l:Leg)-[]-(a:Tank)
with collect(distinct a.name) as a1, a.community_id as cid  
match (l:Leg)-[]-(b:Tank)
where b.name in a1
match (l)-[]-(o:Code)

with distinct b.community_id as cid, sum( distinct b.flow) AS netFlow, sum(distinct o.weight) as codeWeight, a1, COLLECT(DISTINCT o.name) as opcode
return cid, a1, netFlow, codeWeight, opcode

I adjusted it so that each leg has a different code as well and it still outputs as expected!

Thank you all so much for the help!