we are working on a traceability graph for a product production process.
This is a pilot for us and we are still learning modelling and the cypher query language (after following some Neo4j training).
The production process has diverging an converging paths. A simplified version of our graph is shown below. All Products (green) in the graph can be sold and in that case will have a quality check (not always shown).
I would like to return a path from a sold product back to a supplied product that did not pass a quality check. The supplied product ("Crop" or "Ingredient") has two leave nodes; a supplier and a certificate.
I can define a query that returns the supplied product and either the supplier or the certificate info, but not both.
Example query:
MATCH path = ((:Customer)<-[DELIVERED_TO]-(:Product:SalesItem)<-[*1..4]-(c:Crop)-[IS_CERTIFIED]->(:Certificate {passed_inspection:'false'})) RETURN path
I would like to add something like the following to return the Farm-information
, (c)-[GROWN_ON]->(f:Farm)
Is this possible with a Cypher query?
Do we need to modify our model?
Thanks
Your query is for a general path. There is not anchor node. This will find all such paths in your database. What is the query given the specific product that you are concerned with, i.e. how do we identify the product?
Also, if you have a specific product and are concerned with the certificate and farm, do you need a path back to the customer?
Yes, we would like to retrieve the path from sales item back to the source.
We can't start with a specific product as the purpose is to trace back all sold products to a source-product that is faulty (does not comply to a certain quality requirement).
The only constraint, not shown in this example, is that the query will be limited to a given time-period (for example 'yesterday').
The question can also be turned around: given a faulty source-product, give me the supplier and all sold products that were produced based on this source-product.
I hope that this helps to clarify our requirements.
I don't know what's your end-goal, but some recommendations as well:
add temporality to your certificate node, eventually you will want to filter by time ranges.
your query is returning a path (means that your query will output a sequence of connected nodes and relationships that satisfy the conditions you've specified) and you want to add nodes, so something like this:
(Personally prefer the anchor node(s) to be front-end, so i've inverted the query's order)
MATCH path = ((:Certificate {passed_inspection:'false'})
<-[IS_CERTIFIED]-(c:Crop)
-[*1..4]->(:Product:SalesItem)
-[DELIVERED_TO]->(:Customer))
WITH path AS p
MATCH (c)-[GROWN_ON]->(f:Farm)
RETURN p, f
but it is likely this can get huge, so be mindful if you want to split the query, return only nodes (e.g. Customers affected) if that can be the case
Which node is represented by the "faulty source-product"?
--> Crop or Ingredient, but it depends on the value of the certificate-node.
Which node is represented by the "supplier"
--> Farm or Supplier
Which nodes represent all the sold products that were produced based on the faulty source product?
--> All of the "green" nodes can be sold to a customer... But let's focus on the the Product nodes that are linked to the Customer nodes.
Thanks. I wasn't aware that I could relate the "(c)-[GROWN_ON]->(f:Farm)"-part to the rest of the path like this.
But somehow this query returns all Farm-nodes in my database. Even if the crop-node is not related.
I modified the query a bit to return the path in both parts of the query
MATCH path = ((:Certificate {passed_inspection:'false'})
<-[IS_CERTIFIED]-(c:Crop)
-[*1..4]->(:Product:SalesItem)
-[DELIVERED_TO]->(:Customer))
WITH path AS p
MATCH p2 = ((c)-[GROWN_ON]->(f:Farm))
RETURN p, p2
The Farm-nodes are colored light-blue, the Crop (and Seed-) nodes dark-blue. The certificate-node is green, the Customer-node orange.
Note that for now we like to return the full path because it helps with validating the results.
In the final application we will most probably only return the Customer-nodes (as you mentioned in your 3rd bullet).
We will also add temporality. But this causes a different problem. I will ask a question on that in a separate post.
It seems like there are some more generalized scenarios based on the other types of "product" nodes. We can try to generalized the query if you want the full list.
Anyways, if you want something like a report of all faulty crop products, you can format the output a little. Also, I like pattern comprehension with I need to additional results and don't need to join the results. Here is an example:
MATCH (:Certificate {passed_inspection:'false'})<-[IS_CERTIFIED]-(c:Crop)
RETURN
c as crop,
[(c)-[*1..4]->(p:Product&SalesItem) | p.name] as soldProducts,
head([(c)-[GROWN_ON]->(f:Farm|Supplier) | f.name]) as supplier
I extracted "name" properties from the sold products and farm just to show what is possible. You can return the entire node, or a subset of properties too.
P.S. The issue with the earlier query that returned all Farm-nodes is that the 'c' variable was not passed in the 'WITH' clause, so the match to create 'p2' was not constrained to the 'Crop' node from the first match.
I marked the simplified query by Josh as the solution as that provides exactly what we needed; a complete path that includes both the Certificate and Farm nodes related to the Crop. We need this at this moment in time for verification of the query results.
Gary's solution that only returns the node properties make the query much more efficient.
So i guess that this will be used for the final solution when we have much, much more data.
Surprising for us is to see that there are multiple possible query definitions.
Yes, you will find there a many ways to approach a neo4j query.
Anyways, you can also return the nodes instead of extracting specific properties:
MATCH (:Certificate {passed_inspection:'false'})<-[IS_CERTIFIED]-(c:Crop)
RETURN
c as crop,
[(c)-[*1..4]->(p:Product&SalesItem) | p] as soldProducts,
head([(c)-[GROWN_ON]->(f:Farm|Supplier) | f]) as supplier