Depth first search on Neo4j with filtering on node properties content

Hello Neo4j Community,

I would like to perform a depth-first search on my graph to get all the paths starting from a given node ('N1456' in my example), where all the nodes of these paths must have the same property value "PROPERTY_TO_FILTER". My graph is composed of two types of nodes and two types of relationships. So far, I have tested the following query:

WITH "
MATCH (my_node{name : 'N1456'})
CALL apoc.path.expandConfig(my_node, {uniqueness:'NODE_GLOBAL', bfs : FALSE}) YIELD path
WITH path, my_node, last(nodes(path)) as subgraph
WHERE my_node<> subgraph and my_node.my_property CONTAINS 'PROPERTY_TO_FILTER'
RETURN nodes(path), length(path) AS len
ORDER BY len DESC" AS query
CALL apoc.export.json.query(query, "my_results.json", {})
YIELD properties, data
RETURN properties, data;

However, the results are not the ones expected. I get a list of paths, but only the first node has the property "PROPERTY_TO_FILTER"; the filter is not applied to the other nodes...

I guess I should put a filter at the apoc.path.expandConfig level, but I see in the documentation that it is only possible to filter on node labels, not on node properties.

Could someone help please ?

Best regards,
Nathalie

Hello @nathalie.jeanray :slight_smile:

You can use predicate functions like all() in your case:

MATCH (n:Node {id: 0})
CALL apoc.path.expandConfig(n, {uniqueness: 'NODE_GLOBAL', bfs: FALSE}) YIELD path
WHERE all(node IN nodes(path) WHERE node.my_property CONTAINS 'PROPERTY_TO_FILTER')
RETURN nodes(path), length(path) AS len
ORDER BY len DESC

Regards,
Cobra

Hello @Cobra ,

Thanks for your answer.
However, I don't get the expected result.
It looks like the DFS isn't launched :-/
I guess it may be due to the fact that the graph has the following structure:

n_Node -> rel:a -> m_Node -> rel:b -> n_Node

and m_Node doesn't have the property "PROPERTY_TO_FILTER". So I'd like to perform my DFS on all the nodes of the graph, applying the filter "PROPERTY_TO_FILTER" to the n_Node nodes, but systematically keeping all the m_Node nodes between them.

Would it be possible ?

So you only want the first node and the last node of the path to have the property "PROPERTY_TO_FILTER"? In my query, the all() predicate makes sure all the nodes of the path meet the condition, which is why your path is not returned.

Actually not :

I'd like all the "n" nodes (Country) to have the property value "PROPERTY_TO_FILTER", but I also want the nodes located between each pair of Country nodes (Town). The "Town" nodes don't have the same properties as the "Country" ones. This is why I cannot filter them (and I don't want to :) ).

I'd like to get paths that look like this:

Country_1 --> Town_x --> Country_4 --> Town_y --> Country_59 --> Town_a --> Country_45

Country nodes, all having as property value "PROPERTY_TO_FILTER"

I think I get it, in the WHERE clause, if it's a Country it will check the property value otherwise it will ignore the node:

MATCH (n:Country {name: "France"})
CALL apoc.path.expandConfig(n, {uniqueness: 'NODE_GLOBAL', bfs: FALSE}) YIELD path
WHERE all(node IN nodes(path) WHERE ("Country" IN labels(node) AND node.my_property CONTAINS 'PROPERTY_TO_FILTER') OR ("Town" IN labels(node)))
RETURN nodes(path), length(path) AS len
ORDER BY len DESC

Well, it looks like the same result...
The first node is correct, but it is the only thing returned.
Normally I should get, from this starting node :

Country_1 --> Town_x --> Country_2 --> Town_a --> Country_10 --> Town_d --> Country_20
Country_1 --> Town_y --> Country_4 --> Town_b --> Country_08
Country_1 --> Town_z --> Country_6 --> Town_c --> Country_11
Country_1 --> Town_w --> Country_7

But I only get as result :

Country_1

Can you provide a little dataset that I can load into Neo4j?

Please, find my data here :

I'm loading it that way :

LOAD CSV WITH HEADERS FROM 'file:///neo4j_exp.csv' AS row
MERGE (p:Country{name : row.IDA, id: row.COUNTRYA, groupe: row.Groupe_ATLAS})
MERGE(s:Town {name : row.RESIDUE, id :row.index_residue})
MERGE (m:Country {name : row.IDB, id: row.COUNTRYB, residue_exp: row.RESIDUE_EXP, groupe: row.Groupe_Experimental})
WITH p,s,m,row
CREATE (p)-[rel:action]->(s)-[relation:on]->(m)
RETURN rel, relation;

As you can see, the filter is applied to the value of the "groupe" property of the Country nodes, which can have the values COND1_COMP1, ATLAS, COND1_COMP2, etc.

So I'd like to find the paths made of Country nodes having one specific "groupe" (COND1_COMP1, for example), linked by Town nodes, without any condition on the latter.

Thanks again for your precious help.

Best regards

Do you have a start node I can use? I think there is a problem with the data. But I'm sure the query answers the problem:

MATCH (n:Country) 
WHERE n.groupe_exp = "COND1_COMP1"
CALL apoc.path.expandConfig(n, {uniqueness: 'NODE_GLOBAL', bfs: FALSE}) YIELD path
WHERE all(n IN nodes(path) WHERE ("Town" IN labels(n)) OR ("Country" IN labels(n) AND n.groupe_exp = "COND1_COMP1"))
RETURN nodes(path), length(path) AS len
ORDER BY len DESC

It only returns paths of length 0 or 1.

For the condition, we should actually use:

WHERE all(n IN nodes(path) WHERE ("Town" IN labels(n)) OR ("Country" IN labels(n) AND n.groupe_exp IN ['ATLAS','COND1_COMP1']))

I tried with start node 'Q13315' .

This first node you gave me doesn't have the property groupe. I really think the data is not properly loaded into your database. You should have one file for nodes and another one for relationships, then load the nodes first and the relationships second.

It's strange that I see one in my database.
Normally the property groupe has at least value "ATLAS".

I guess I made a typo when I wrote the load request in the chat :

LOAD CSV WITH HEADERS FROM 'file:///neo4j_exp.csv' AS row
MERGE (p:Country{name : row.IDA, id: row.COUNTRYA, groupe: row.Groupe_ATLAS})
MERGE(s:Town {name : row.RESIDUE, id :row.index_residue})
MERGE (m:Country {name : row.IDB, id: row.COUNTRYB, residue_exp: row.RESIDUE_EXP, groupe: row.Groupe_Experimental})
WITH p,s,m,row
CREATE (p)-[rel:action]->(s)-[relation:on]->(m)
RETURN rel, relation;

I tried with another node : 'Q13315'

MATCH (n:Country{name:'Q13315'}) WHERE n.groupe IN ['ATLAS','COND1_COMP1']

It is possible that some Country nodes don't have all the possible values in the property groupe.

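The MERGE clause matches on every property written in the pattern, so if any one value differs it creates a new node instead of updating the existing one. So normally, you MERGE a node based on its id and then SET its other properties, like MERGE (c:Country {id: row.COUNTRYA}) SET c.name = row.IDA, etc.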

That's why, it's better to have a file for nodes and another one for relationships.
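
As a minimal sketch of that MERGE-then-SET pattern applied to the load query posted earlier in this thread (assuming, without having checked the data, that COUNTRYA, COUNTRYB and index_residue uniquely identify their nodes and are never empty):

```cypher
// Sketch only: the key columns are an assumption, adapt them to the real data.
LOAD CSV WITH HEADERS FROM 'file:///neo4j_exp.csv' AS row
// Match or create each node on its key alone...
MERGE (p:Country {id: row.COUNTRYA})
// ...then set the remaining properties separately.
SET p.name = row.IDA, p.groupe = row.Groupe_ATLAS
MERGE (s:Town {id: row.index_residue})
SET s.name = row.RESIDUE
MERGE (m:Country {id: row.COUNTRYB})
SET m.name = row.IDB, m.residue_exp = row.RESIDUE_EXP, m.groupe = row.Groupe_Experimental
// MERGE the relationships too, so re-running the load does not duplicate them.
MERGE (p)-[:action]->(s)
MERGE (s)-[:on]->(m);
```

Note that SET overwrites groupe each time the same Country appears in another row, so the last row wins. If a Country can belong to several groups, that needs to be modelled differently.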

Well, I tried with this request :

WITH "MATCH (n:Country{name : 'P11309'})
WHERE n.groupe IN ['ATLAS','COND3_COMP1']
CALL apoc.path.expandConfig(n, {uniqueness:'NODE_GLOBAL', bfs : FALSE}) YIELD path
WHERE all(node IN nodes(path) WHERE ('Town' IN labels(n)) OR ('Country' IN labels(n) AND n.groupe in ['ATLAS','COND3_COMP1']))
RETURN nodes(path), length(path) AS len
ORDER BY len DESC" AS query
CALL apoc.export.json.query(query, "P11309-start-'ATLAS','COND3_COMP1'.json", {})
YIELD properties, data
RETURN properties, data;

It gave me 7212394 properties, but again I can see "groupe" property values other than COND3_COMP1 and ATLAS in my JSON. It is so weird...
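
One likely cause, judging only from the query as posted: inside all(node IN nodes(path) WHERE ...), the predicate tests labels(n) and n.groupe, that is, the start node n, not the iteration variable node. Since the start node has already passed the first WHERE, the predicate is true for every path and nothing is filtered out. A minimal sketch of the same query with the predicate written against node (the export wrapper is left out, and the label and property names are the ones used in the query above):

```cypher
MATCH (n:Country {name: 'P11309'})
WHERE n.groupe IN ['ATLAS', 'COND3_COMP1']
CALL apoc.path.expandConfig(n, {uniqueness: 'NODE_GLOBAL', bfs: false}) YIELD path
// Test every node of the path (node), not the start node (n).
WHERE all(node IN nodes(path)
          WHERE 'Town' IN labels(node)
             OR ('Country' IN labels(node) AND node.groupe IN ['ATLAS', 'COND3_COMP1']))
RETURN nodes(path), length(path) AS len
ORDER BY len DESC
```

This only discards paths after the expansion has produced them; it does not stop the traversal itself at a Country node that fails the condition.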

Can you rework your data file to have one file for nodes and another for relations?

After that, we will load properly and test the query.

Yes, thank you. Could you please advise on what the files should look like?

I actually followed these instructions :

So, I first loaded the nodes, then I added the relationships with MERGE.

I still have the same issue however :-/

Do you have one file for nodes and one for relationships?

Not yet, I used the same one, as the article also did. But I can also try with two files if you think it can make a difference?