How to Choose Relationship with Lowest Property Value

Hello,

How do I choose the relationship with the lowest property rule value between a set of nodes?

Say this is my graph schema:

For the Person node, how can I tune my cypher query so that it traverses through the PART_OF_LAW3 relationship type which contains the lowest property value stored in the law_number property type?

I tried the following:

MATCH(p:Person)-[:CONTAINS]->(Attribute)-[part_rel]->(Law_Num)
WITH COLLECT(part_rel.law_number) AS laws, p
MATCH(p)-[:CONTAINS]->(Attribute)-[part_rel]->(Law_Num)
WHERE part_rel.la_number=min(laws)
RETURN p AS Person, Law_Num

Any help would be greatly appreciated.

Thank you

I would try sorting the results in an ascending order by the value of the relationship, and limiting by 1 to only get the lowest value.
Something like:

MATCH (p:Person)-[:CONTAINS]->(Attribute)-[part_rel]->(Law_Num)
WITH p as Person, Law_Num as Law, part_rel.law_number as relation_value
ORDER BY relation_value ASC
RETURN Person, Law LIMIT 1

Hi @SiAs ,

Thank you for your input. Your method works!

Now, the thing is, I have 100,000 Person nodes and 200,000 laws. What if I wanted to compute the unique relationship between Person and Law nodes where I only need to traverse the lowest law number? I also have indeces on person_id, attribute_x, law_number on the property (New Feature in Neo4j for relationship indeces), and Law_X.law_id.

I'm currently testing this query with 20 Person nodes and it's taking several hours since this has to compute the unique relationship between 20 Person nodes to 200,000 rule values. Therefore 20x200,000 comes out to 400,000 computations.

If I wanted to do this for all 100,000 Person nodes, then I end up with 20,000,000,000 computations (100,000 Person nodes times 200,000 law nodes).

I slightly modified my query to this:

MATCH(p:Person)-[:CONTAINS]->(Attribute)-[part_rel]->(Law_Num)
WITH min(part_rel.law_number) AS minlaw, p, Attribute, Law_num
MATCH(p)-[:CONTAINS]->(Attribute)-[part_rel]->(Law_Num)
WHERE part_rel.la_number=minlaw
RETURN *

Is it possible to make this query faster?

There is an APOC procedure, unsurprisingly, to help with this type of query. For some reason Cypher has this odd behavior that even when you specify the lowest law number in the PART_OF_LAW property, it will still display additional rows that contain different fields on other columns. This article written by @andrew_bowman further explains the problem I'm seeing with Cypher:

This query can now be written as:

MATCH(p:Person)-[:CONTAINS]->(Attributes)-[part_rel]->(Laws)
WITH p, apoc.minItems(Attributes.attribute_field, part_rel.law_number) AS minData, Law
RETURN p.name, minData.items AS Attribute, minData.value AS min_law, Law.title

This gives me the output I need, but I have two questions:

  1. Is it possible to include additional items in apoc.minItems()? For instance, in my graph all the Attribute nodes also have a category, but if I include the attribute.category in the RETURN statement, the apoc function does not return the lowest law number any more.

  2. What if instead of choosing the lowest law number between Attribute nodes and Law nodes, I want to return all of the relationships that EXCLUDE the lowest law number?

Thank you

For 1, if you want to keep additional properties per item (and not as part of the grouping key), then you can include it in some structure as the first parameter to the minItems() function. If you want multiple properties from the same node, you can use map projection to select them:

...
WITH p, apoc.minItems(Attributes {.attribute_field, .category}, part_rel.law_number) AS minData, Law
...

Alternately you can use your own structure, whether a map or a list:

...
WITH p, apoc.minItems({attribute:Attributes.attribute_field, category:Attributes.category}, part_rel.law_number) AS minData, Law
...

For 2, you will need to find the lowest law number first, so you could use a regular min() aggregation, or you could use minItems() to find the part_rel relationships that have the lowest law number, then re-MATCH to your pattern WHERE part_rel NOT IN lowestLawRels.

Thank you very much! I was struggling with this, but this makes a lot of sense. I'll chew on this immediately and I'll report my findings here.

Hi @andrew_bowman ,

This query is working as expected when I anchor the :Person node with the unique person ID and the :Law node with the unique law_id. This returns the relationship with the lowest law number property between the Attribute nodes and the Law nodes.

But when I try to generalize the query to all 300 Person from a specific location to all Law nodes, it does not seem to choose the lowest law_number any more.

How can I generalize this query so that it works on every single Person node? Let's say I want to find all of the applicable Law nodes based on the lowest law_number to the Person nodes that live in Smallville. I tried this query but it's not working:

MATCH(t:Town {name: "smallville"})<-[:LIVES_IN]-(p:Person)-[:CONTAINS]->(Attributes)-[part_rel]->(Laws)
WITH p, apoc.minItemsAttributes {.attribute_field, .category}, part_rel.law_number) AS minData, Law
RETURN
p.name,
minData.items AS Attribute,
minData.value AS min_law,
Law.title

If I were doing this in Python, I would loop through every Person ID and then a second loop to loop through the Law nodes, and at every inner loop, I would execute the cypher query above that is proven to work for a single person mapped to a single law. This is a Big O(n^2) operation so it's not too bad, but I was still wondering if this could be done directly in Cypher.

Thank you for all of your help.

What's happening under the hood will be similar, but Cypher is declarative, not imperative. It's important to understand that most Cypher operations (such as matches and such) execute per row, and yield rows. So your initial MATCH to a p:Person node will generate a row per path, extracting the p variable, so subsequent operations, since they execute per row, would effectively execute per person.

We could improve the query a bit with subqueries such that further work happens scoped per person, that works more like your looping approach, and can also keep the aggregation cheap, so instead of doing a large aggregation across the full result set from your initial MATCH, it would be doing many small aggregations, one per person. We can look at that a bit later, that's an optimization, but we need to address the issue of bad results.

The problem is with your grouping key. The grouping key is the context for the aggregation, and when we perform an aggregation (such as max(), collect(), count(), and any of the APOC agg functions, such as apoc.agg.minItems()), the non-aggregation terms form the grouping key. Your issue is that Law (or Laws, there's some inconsistency so the variables you're using) is present when you aggregate, so it's part of the grouping key. It's calculating the minItems() per combo of person / law, which is not what you want. You need to remove Law from your WITH clause when you aggregate, then the minItems() aggregation will be grouped per person.

1 Like

Yes, thank you. It's working. How can I proceed with the subquery approach to optimize my query? In reality I'm working with far more Person nodes than 300 which all need to be mapped (per the lowest law number) to thousands of laws.

Any help would be greatly appreciated.

Sure, here's the same approach using subqueries to scope the work per-person:

MATCH (:Town {name: "smallville"})<-[:LIVES_IN]-(p:Person)
CALL {
 WITH p
 MATCH (p)-[:CONTAINS]->(Attribute)-[part_rel]->(Law)
 WITH apoc.agg.minItems(Attribute, part_rel.law_number) AS minData
 WITH [item IN minData.items | item {.attribute_field, .category}] as attributes, minData.value as min_law
 RETURN attributes, min_law
}
RETURN p.name as person, attributes, min_law

The subquery call executes per row, and the prior MATCH ensures a person who lives in smallville is on each row.

This means all the subquery work happens per-person, even if we drop the p out of scope, the context of the subquery is still for an individual person, so our aggregation is implicitly finding he attribute associated with the minimum law_number per person.

We can delay the property projection from Attribute, since we should only apply it to the results of the aggregation instead of unnecessarily for all attribute nodes since most will be filtered out anyway. We apply the projection on each attribute in the list with the same min law value (as there may be ties for attributes with the same law).

We only need to return the attributes and min_law from the subquery, since the p is still in scope from earlier.