Depth first search on Neo4j with filtering on node properties content

nathalie.jeanray · March 11, 2022, 8:48am

Hello Neo4j Community,

I would like to perform a depth-first search on my graph to get all the paths starting from a given node ('N1456' in my example), where all the nodes on these paths must have the same property "PROPERTY_TO_FILTER". My graph is composed of two types of nodes and two types of relationships. So far, I have tested the following query:

WITH "
MATCH (my_node{name : 'N1456'})
CALL apoc.path.expandConfig(my_node, {uniqueness:'NODE_GLOBAL', bfs : FALSE}) YIELD path
WITH path, my_node, last(nodes(path)) as subgraph
WHERE my_node<> subgraph and my_node.my_property CONTAINS 'PROPERTY_TO_FILTER'
RETURN nodes(path), length(path) AS len
ORDER BY len DESC" AS query
CALL apoc.export.json.query(query, "my_results.json", {})
YIELD properties, data
RETURN properties, data;

However, the results are not the ones I expected. I get a list of paths, but only the first node has the property "PROPERTY_TO_FILTER"; the filter is not applied to the other nodes...

I guess I should put a filter at the apoc.path.expandConfig level, but I see in the documentation that it is only possible to filter on node labels, not on node properties.

Could someone help please ?

Best regards,
Nathalie

cobra · March 11, 2022, 9:07am

Hello @nathalie.jeanray

You can use predicate functions like all() in your case:

MATCH (n:Node {id: 0})
CALL apoc.path.expandConfig(n, {uniqueness: 'NODE_GLOBAL', bfs: FALSE}) YIELD path
WHERE all(node IN nodes(path) WHERE node.my_property CONTAINS 'PROPERTY_TO_FILTER')
RETURN nodes(path), length(path) AS len
ORDER BY len DESC

Regards,
Cobra

nathalie.jeanray · March 11, 2022, 9:55am

Hello @cobra ,

Thanks for your answer.
However, I don't get the expected result.
It looks like the DFS isn't launched :-/
I guess it may be because the graph has the following structure:

n_Node -> rel:a -> m_Node -> rel:b -> n_Node

and m_Node does not have the property "PROPERTY_TO_FILTER". So, I'd like to run my DFS over all the nodes of the graph, applying the "PROPERTY_TO_FILTER" filter to the n_Node nodes while always keeping the m_Node nodes that sit between them.

Would it be possible ?

cobra · March 11, 2022, 10:05am

So you only want the first node and the last node of the path to have the property "PROPERTY_TO_FILTER"? In my query, the all() predicate makes sure all the nodes of the path meet the condition; that's why your path is not returned.

nathalie.jeanray · March 11, 2022, 10:16am

Actually, no:

I'd like all the "n" nodes (Country) to have the property value "PROPERTY_TO_FILTER", but I also want the nodes located between each pair of Country nodes (Town). The "Town" nodes don't have the same properties as the "Country" ones. This is why I cannot filter them (and I don't want to :) ).

I'd like to get the paths looking like that :

Country_1 --> Town_x --> Country_4 --> Town_y --> Country_59 --> Town_a --> Country_45

Country nodes, all having as property value "PROPERTY_TO_FILTER"

cobra · March 11, 2022, 10:24am

I think I get it, in the WHERE clause, if it's a Country it will check the property value otherwise it will ignore the node:

MATCH (n:Country {name: "France"})
CALL apoc.path.expandConfig(n, {uniqueness: 'NODE_GLOBAL', bfs: FALSE}) YIELD path
WHERE all(node IN nodes(path) WHERE ("Country" IN labels(n) AND node.my_property CONTAINS 'PROPERTY_TO_FILTER') OR ("Town" IN labels(n)))
RETURN nodes(path), length(path) AS len
ORDER BY len DESC

nathalie.jeanray · March 11, 2022, 10:47am

Well, it looks like the same result...
The first node is correct, but it only returns this.
Normally I should get, from this starting node :

Country_1 --> Town_x --> Country_2 --> Town_a --> Country_10 --> Town_d --> Country_20
Country_1 --> Town_y --> Country_4 --> Town_b --> Country_08
Country_1 --> Town_z --> Country_6 --> Town_c --> Country_11
Country_1 --> Town_w --> Country_7

But I only get as result :

Country_1

cobra · March 11, 2022, 10:56am

Can you provide a little dataset that I can load into Neo4j?

nathalie.jeanray · March 11, 2022, 12:27pm

Please, find my data here :

I'm loading it that way :

LOAD CSV WITH HEADERS FROM 'file:///neo4j_exp.csv' AS row
MERGE (p:Country{name : row.IDA, id: row.COUNTRYA, groupe: row.Groupe_ATLAS})
MERGE(s:Town {name : row.RESIDUE, id :row.index_residue})
MERGE (m:Country {name : row.IDB, id: row.COUNTRYB, residue_exp: row.RESIDUE_EXP, groupe: row.Groupe_Experimental})
WITH p,s,m,row
CREATE (p)-[rel:action]->(s)-[relation:on]->(m)
RETURN rel, relation;

As you can see, the filter is applied to the value of the "groupe" property of the Country nodes, which can take the values COND1_COMP1, ATLAS, COND1_COMP2, etc.

So I'd like to find the paths encompassing the Country nodes having one specific "groupe" (COND1_COMP1 as an example), linked by nodes Town, without any condition on them.

Thanks again for your precious help.

Best regards
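
One way to keep the filter inside the traversal itself, sketched here under assumptions rather than taken from the thread: collect the Country nodes whose groupe is not wanted and pass them to apoc.path.expandConfig as blacklistNodes. That is the option name in APOC 4.x; newer releases may document it under a different name, so check the documentation for your version. $startName is a hypothetical parameter holding the start node's name, and the start node is assumed to have the wanted groupe itself. The relationship types action and on are the ones created by the load statement above.

```cypher
// Country nodes that must not appear on any returned path.
MATCH (c:Country)
WHERE c.groupe IS NULL OR c.groupe <> 'COND1_COMP1'
WITH collect(c) AS excluded

// $startName is a hypothetical parameter, e.g. the name of a Country node.
MATCH (start:Country {name: $startName})
CALL apoc.path.expandConfig(start, {
  relationshipFilter: 'action>|on>',  // follow Country -> Town -> Country
  blacklistNodes: excluded,           // prune unwanted Country nodes during expansion
  uniqueness: 'NODE_GLOBAL',
  bfs: false                          // depth-first
}) YIELD path
RETURN [x IN nodes(path) | x.name] AS names, length(path) AS len
ORDER BY len DESC
```

Pruning during the expansion can give different results from filtering the yielded paths afterwards: with NODE_GLOBAL uniqueness each node is visited only once, so a node first reached through an unwanted Country would not be reached again through a valid route.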

cobra · March 11, 2022, 1:04pm

Do you have a start node I can use? I think there is a problem with the data, but I'm sure the query answers the question:

MATCH (n:Country) 
WHERE n.groupe_exp = "COND1_COMP1"
CALL apoc.path.expandConfig(n, {uniqueness: 'NODE_GLOBAL', bfs: FALSE}) YIELD path
WHERE all(n IN nodes(path) WHERE ("Town" IN labels(n)) OR ("Country" IN labels(n) AND n.groupe_exp = "COND1_COMP1"))
RETURN nodes(path), length(path) AS len
ORDER BY len DESC

It only returns paths of length 0 or 1.

nathalie.jeanray · March 11, 2022, 1:10pm

On the condition, we should use actually:

WHERE all(n IN nodes(path) WHERE ("Town" IN labels(n)) OR ("Country" IN labels(n) AND n.groupe_exp IN ['ATLAS','COND1_COMP1']))

I tried with start node 'Q13315' .

cobra · March 11, 2022, 1:14pm

This first node you gave me doesn't have the property groupe. I really think the data is not properly loaded into your database. You should have one file for nodes and another one for relations then load nodes then relationships.

nathalie.jeanray · March 11, 2022, 1:28pm

It's strange that I see one in my database.
Normally the property groupe has at least value "ATLAS".

I guess I made a typo when I wrote the load request in the chat :

LOAD CSV WITH HEADERS FROM 'file:///neo4j_exp.csv' AS row
MERGE (p:Country{name : row.IDA, id: row.COUNTRYA, groupe: row.Groupe_ATLAS})
MERGE(s:Town {name : row.RESIDUE, id :row.index_residue})
MERGE (m:Country {name : row.IDB, id: row.COUNTRYB, residue_exp: row.RESIDUE_EXP, groupe: row.Groupe_Experimental})
WITH p,s,m,row
CREATE (p)-[rel:action]->(s)-[relation:on]->(m)
RETURN rel, relation;

I tried with another node : 'Q13315'

MATCH (n:Country {name: 'Q13315'}) WHERE n.groupe IN ['ATLAS','COND1_COMP1']

It is possible that some Country nodes don't have all the possible values in the property groupe.
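
A quick way to check whether the load produced several Country nodes for the same name, each carrying its own groupe, is sketched below. It only assumes the label and property names used in the load statement above.

```cypher
// List Country names that exist as more than one node, with their groupe values.
MATCH (c:Country)
WITH c.name AS name, count(c) AS copies, collect(DISTINCT c.groupe) AS groupes
WHERE copies > 1
RETURN name, copies, groupes
ORDER BY copies DESC
LIMIT 20
```

If this returns rows, the same Country exists as several nodes, and which one a MATCH on the name picks up is no longer predictable.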

cobra · March 11, 2022, 1:33pm

The MERGE clause matches on every property in the pattern, so a node that differs in any one of them is created again as a new node. So normally, you MERGE a node based on its id and then SET its properties, like MERGE (c:Country {id: row.COUNTRYA}) SET c.name = row.IDA, etc.

That's why, it's better to have a file for nodes and another one for relationships.
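
Applied to the single CSV file, the MERGE-then-SET pattern described above could look like the following. This is only a sketch: it assumes that row.COUNTRYA, row.COUNTRYB and row.index_residue identify a node uniquely and are never empty, which the thread does not confirm.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///neo4j_exp.csv' AS row
// Merge on the key only, so a repeated id matches the existing node.
MERGE (p:Country {id: row.COUNTRYA})
SET p.name = row.IDA, p.groupe = row.Groupe_ATLAS
MERGE (s:Town {id: row.index_residue})
SET s.name = row.RESIDUE
MERGE (m:Country {id: row.COUNTRYB})
SET m.name = row.IDB, m.residue_exp = row.RESIDUE_EXP, m.groupe = row.Groupe_Experimental
// MERGE on the relationships avoids duplicates if the file is loaded twice.
MERGE (p)-[:action]->(s)
MERGE (s)-[:on]->(m);
```

Note that SET overwrites groupe on every row, so a Country that appears with different groupe values keeps only the last one read. If several values have to be kept per Country, the model would need to change, for example by storing a list.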

nathalie.jeanray · March 11, 2022, 2:28pm

Well, I tried with this request :

WITH "MATCH (n:Country{name : 'P11309'})
WHERE n.groupe IN ['ATLAS','COND3_COMP1']
CALL apoc.path.expandConfig(n, {uniqueness:'NODE_GLOBAL', bfs : FALSE}) YIELD path
WHERE all(node IN nodes(path) WHERE ('Town' IN labels(n)) OR ('Country' IN labels(n) AND n.groupe in ['ATLAS','COND3_COMP1']))
RETURN nodes(path), length(path) AS len
ORDER BY len DESC" AS query
CALL apoc.export.json.query(query, "P11309-start-'ATLAS','COND3_COMP1'.json", {})
YIELD properties, data
RETURN properties, data;

It gave me 7212394 properties, but again I can see "groupe" property values other than COND3_COMP1 and ATLAS in my JSON. It is so weird...
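
A possible explanation, offered as a reading of the query rather than something confirmed in the thread: inside all(), the iteration variable is node, but the predicate tests labels(n) and n.groupe, which belong to the start node. Since the start node is a Country whose groupe is already in the list, the predicate is true for every path and nothing gets filtered. The earlier Country/Town query has the same mix-up; there it would make every Town node fail the test and leave only the start node. A sketch of the predicate written against node instead:

```cypher
MATCH (n:Country {name: 'P11309'})
WHERE n.groupe IN ['ATLAS', 'COND3_COMP1']
CALL apoc.path.expandConfig(n, {uniqueness: 'NODE_GLOBAL', bfs: false}) YIELD path
// Every test below refers to the iteration variable node, not to the start node n.
WHERE all(node IN nodes(path)
          WHERE node:Town
             OR (node:Country AND node.groupe IN ['ATLAS', 'COND3_COMP1']))
RETURN [x IN nodes(path) | x.name] AS names, length(path) AS len
ORDER BY len DESC
```

Returning only the names keeps the export small. To wrap this in apoc.export.json.query as before, the query has to be passed as a string again.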

cobra · March 11, 2022, 2:31pm

Can you rework your data file to have one file for nodes and another for relations?

After that, we will load properly and test the query.

nathalie.jeanray · March 11, 2022, 2:49pm

Yes, thank you. Could you please advise on what the files should look like?
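
As an illustration only, one possible layout is one file per node label plus one file for the relationships. The file names and column names below are hypothetical, chosen for this sketch; none of them come from the thread.

```cypher
// countries.csv (hypothetical columns): id,name,groupe
LOAD CSV WITH HEADERS FROM 'file:///countries.csv' AS row
MERGE (c:Country {id: row.id})
SET c.name = row.name, c.groupe = row.groupe;

// towns.csv (hypothetical columns): id,name
LOAD CSV WITH HEADERS FROM 'file:///towns.csv' AS row
MERGE (t:Town {id: row.id})
SET t.name = row.name;

// relations.csv (hypothetical columns): country_from,town,country_to
LOAD CSV WITH HEADERS FROM 'file:///relations.csv' AS row
MATCH (a:Country {id: row.country_from})
MATCH (t:Town {id: row.town})
MATCH (b:Country {id: row.country_to})
MERGE (a)-[:action]->(t)
MERGE (t)-[:on]->(b);
```

Loading the nodes first means the relationship file only has to look up existing nodes by id, so each Country ends up as a single node with a single groupe value.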

nathalie.jeanray · March 11, 2022, 3:34pm

I actually followed these instructions :

So, I first loaded the nodes, then I added the relationships with MERGE.

I still have the same issue however :-/

cobra · March 11, 2022, 3:45pm

You have one file for nodes and one for relations ?

nathalie.jeanray · March 14, 2022, 7:18am

Not yet, I used the same one, as the article did also. But I can also try with two files if you think it can make a difference ?
