How to deal with list of properties and do a comparison between two lists in neo4j?

Hello everyone,

I am currently working on a Neo4j database (Community Edition 4.4.4) and have run into an issue. I am wondering how Neo4j handles list properties (on nodes or on links).
I am trying to find the longest coherent path. At each step of the path, I want to check that the information on the nodes is coherent with the information on the links. So for each step, I want to compare each item of a list property on a link with a list property on a node.

Here is my code:

MATCH (n:Node{prop1: ['X', 'Y', 'Z']})-[r:links*]->(next)
WHERE not((next)-[r:links]->())
AND extract (rel in r | rel.prop_rel) // what should I add here, or how should I modify this line?
RETURN [t in nodes(path) | t.name] AS path

Let's say node A has prop1, which is a list of values. I want to compare each of X, Y and Z with the list stored on "r". If at least one of X, Y and Z is in the relationship's list, I can move forward along the path; if not, I have to start over.

Could you help, please?
Do not hesitate to ask questions if anything is unclear.

Thanks a lot,

Leila

To verify my understanding, are you looking for paths with that pattern that also have the characteristic that each node and subsequent relationship have at least one value in common among their corresponding ‘prop1’ arrays?

Or

Does the ‘prop1’ list in each node and relationship represent a set of property keys, where each node and subsequent relationship must have at least one key/value pair in common?

Or

Neither?

I would say the first one:
Each relationship must have at least one item of its property list (prop_rel) in the property list (prop1) of the node it points to.
If not, we can't go any further along the current path. We have to go back to the lower level or start over from another node.

I'm still not clear on the requirement. I don't understand the 'level' concept. Anyway, maybe we can reach your goal iteratively. The following query finds all paths matching your pattern in which each node and the relationship that follows it have at least one value in common in their respective prop1 lists.

MATCH p=(n:Node{prop1: ['X', 'Y', 'Z']})-[:links*]->(next)
WHERE not((next)-[:links]->())
WITH p, relationships(p) as relationships
WHERE all(r in relationships where any(i in startNode(r).prop1 where i in r.prop1))
RETURN [t in nodes(p) | t.name] AS path

I re-read your requirement and adjusted the query to be more accurate about which list must be in which list. I also adjusted the property names for each list, as they don't have the same key.

Is this what you are looking for?

MATCH p=(n:Node{prop1: ['X', 'Y', 'Z']})-[:links*]->(next)
WHERE not exists ((next)-[:links]->())
WITH p, relationships(p) as relationships
WHERE all(r in relationships where any(i in r.prop_rel where i in startNode(r).prop1))
RETURN [t in nodes(p) | t.name] AS path

Thanks a lot for your time.
I am going to give the query a try. It seems to cover all the requirements, but I don't really get the "i in startNode(r)" part. If A is linked to B by a links relationship, what I want is to compare links.prop_rel with B.prop1, meaning that the intersection of the two lists has to contain at least one element. So it may need to be endNode rather than startNode (if that exists).
That said, I will try the query anyway and get back to you.

Thanks again,

Leila

You are very welcome.

Your Cypher pattern is a directed path from a node through a variable number of ‘links’ relationships. The query captures all the relationships along each matching path with the relationships(p) call. This is a collection. Each element is a relationship, and each relationship gives access to its start node, end node, and properties via startNode(), endNode(), and property access. The start and end nodes are determined by the direction of the relationship. You mentioned you wanted a node and its subsequent relationship to have common values, so I compared the startNode properties to the relationship properties. If you prefer to compare the relationship's properties to its subsequent node’s properties, replace ‘startNode’ with ‘endNode’.
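
For reference, the endNode variant described above could look like the sketch below. It assumes the same label, relationship type, and property names used earlier in this thread (Node, links, prop1, prop_rel, name) and is untested against your data.

```cypher
// Keep only paths where every relationship shares at least one value
// with the node it points to (endNode), rather than the node it leaves.
MATCH p = (n:Node {prop1: ['X', 'Y', 'Z']})-[:links*]->(next)
WHERE NOT exists((next)-[:links]->())   // 'next' has no outgoing links
WITH p, relationships(p) AS rels
WHERE all(r IN rels WHERE any(i IN r.prop_rel WHERE i IN endNode(r).prop1))
RETURN [t IN nodes(p) | t.name] AS path
```

With this version the start node's own prop1 is only used to find the anchor; it is never compared with a relationship.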

startNode(r) is the start node of relationship ‘r’, while startNode(r).prop1 is the list stored in that node's ‘prop1’ property. The ‘i in startNode(r).prop1’ is a predicate testing whether the value of ‘i’ is in that list. Because it is wrapped in any(), it has to be true for at least one value of ‘i’ taken from r.prop_rel.
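
To see the predicate in isolation, here is a small sketch that runs without any graph data. The literal lists are made up for illustration: nodeProps stands in for startNode(r).prop1 and each relProps for r.prop_rel.

```cypher
// Two hypothetical relationship lists checked against one node list.
UNWIND [['Q', 'Y'], ['Q', 'R']] AS relProps
WITH relProps, ['X', 'Y', 'Z'] AS nodeProps
// any() is true as soon as one element of relProps is found in nodeProps
RETURN relProps, any(i IN relProps WHERE i IN nodeProps) AS overlaps
```

The first row should come back with overlaps = true (because of 'Y') and the second with overlaps = false.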

Okay thanks, I got it and understood the subtlety.
I ran the query on a graph with 1227 nodes and 2935 relationships and it is still running. Do you think it's normal for it to take this long?

Thank you :)

Leila

Sorry, I don't have any specific insight. The query does a lot of list comparisons. Maybe investigate the bottleneck in two steps. First, is it taking a long time to find the anchor node? You are looking for nodes with a specific list as a property value, so the query would have to inspect each node and each element of its list. BTW, the list has to be exactly as you wrote it, with the elements in that order, for a match. You can test this phase of the query by timing the simplified query:

MATCH (n:Node{prop1: ['X', 'Y', 'Z']}) return id(n)

If this is slow, do you have another property with a single value that you can use to identify the anchor node?
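
On the ordering point, a quick check with literal lists shows that list equality in Cypher is order-sensitive. The second query is only a sketch of an order-insensitive anchor lookup, using the Node label and prop1 property from this thread; I have not run it on your data.

```cypher
// List equality compares element by element, so order matters.
RETURN ['X', 'Y', 'Z'] = ['X', 'Y', 'Z'] AS sameOrder,
       ['X', 'Y', 'Z'] = ['Z', 'Y', 'X'] AS shuffled;

// Sketch: match the anchor whatever the element order is.
// A list of size 3 that contains X, Y and Z holds exactly those values.
MATCH (n:Node)
WHERE size(n.prop1) = 3 AND all(v IN ['X', 'Y', 'Z'] WHERE v IN n.prop1)
RETURN id(n);
```

The first query should return sameOrder = true and shuffled = false.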

You can test the speed of the second part of the query if you give it the id of the node with prop1=['X', 'Y', 'Z'] from the first test.

MATCH p=(n:Node where id(n)=)-[:links*]->(next)
WHERE not exists ((next)-[:links]->())
WITH p, relationships(p) as relationships
WHERE all(r in relationships where any(i in r.prop_rel where i in startNode(r).prop1))
RETURN [t in nodes(p) | t.name] AS path

If the second part is slow, maybe using an apoc function to determine if any overlap exists between the two sets is faster than the following cypher:

any(i in r.prop_rel where i in startNode(r).prop1)

Instead, use:

not isEmpty(apoc.coll.intersection(r.prop_rel, startNode(r).prop1))

Updated query with APOC:

MATCH p=(n:Node where id(n)=)-[:links*]->(next)
WHERE not exists ((next)-[:links]->())
WITH p, relationships(p) as relationships
WHERE all(r in relationships where not isEmpty(apoc.coll.intersection(r.prop_rel, startNode(r).prop1)))
RETURN [t in nodes(p) | t.name] AS path
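
Before rerunning the full query, it may be worth checking that the two overlap tests agree on a small example. This sketch uses made-up literal lists and assumes the APOC plugin is installed on your server.

```cypher
// Same overlap test written two ways, on hypothetical lists.
WITH ['Q', 'Y'] AS relProps, ['X', 'Y', 'Z'] AS nodeProps
RETURN any(i IN relProps WHERE i IN nodeProps) AS viaAny,
       NOT isEmpty(apoc.coll.intersection(relProps, nodeProps)) AS viaApoc
```

Both columns should be true here, since 'Y' is in both lists. Whether the APOC version is actually faster is something to measure, for example by prefixing each full query with PROFILE.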

Many thanks for all the details and for the clarity of your explanations.
I am going to give these a try, constrain the query further, and use APOC instead.

Thanks again,

Leila