I am very new to graph databases and to Neo4j. I am trying to understand the limitations of the tool and would like to see whether I can solve the following problem with the database alone (without using the application layer at all).
My goal is to calculate the score properties of the Asset nodes and the Process nodes. The score should be the average of the attributes related to each asset and its children, weighted by the type. If the attribute isn't associated with the type the attribute should be weighted at 0. In the case of a circular reference, each child should only appear once in the calculation.
My direct questions are:
Is this "schema" viable? Or should I try to use properties instead of the attribute nodes I have created (value/weight)?
Is there a query which will calculate the score value as described above (with this or any other approach)?
My progress so far:
Calculating the average value of attributes without consideration for weight (from type).
For Process -->
MATCH path = (p:Process {name: "process 1"})-[:PROCESS_ASSET*]->(a:Asset)-[:CHILD_ASSET*]->(b:Asset)-[:ATTRIBUTE_VALUE_OF]->(v)
RETURN avg(v.value)
For Asset --> (I would expect this can be done in a more elegant way) // doesn't work
MATCH (a:Asset {name: "asset 1"})-[:CHILD_ASSET*]->(b:Asset)-[:ATTRIBUTE_VALUE_OF]->(v1)
MATCH (a:Asset {name: "asset 1"})-[:ATTRIBUTE_VALUE_OF]->(v2)
RETURN (avg(v1.value) + avg(v2.value)) / 2
Updated -->
MATCH (:Asset {name: "asset 1"})-[:CHILD_ASSET*]->(c)
WITH collect(c) AS uc
MATCH (p:Asset) WHERE p.name = "asset 1"
WITH collect(p) AS up, uc
UNWIND (up + uc) AS v
WITH v
MATCH (v)-[:ATTRIBUTE_VALUE_OF]-(r)
RETURN avg(r.value)
Is there a reason you have broken out the attribute and weight values as entities? Could we simplify the schema by incorporating these as relationship properties? Maybe something like the following:
Does this accurately represent your domain model? We can help once we understand the data model.
Hi Gary, Thanks for the warm welcome and the fast and helpful response.
To answer your question, I have no problem simplifying the model into this format. What you have shown looks good to me. I will work on updating things on my end and keep working (and updating the OP as I go).
What result do you expect from the second query? I think the query will include attribute '1' multiple times. I think the first match will include it twice and the second match once, for the graph presented.
I have updated the query with something that actually works. NB: as mentioned, this does not take the weights into consideration. I will have to address this once I understand things better.
MATCH (:Asset {name: "asset 1"})-[:CHILD_ASSET*]->(c)
WITH collect(c) AS uc
MATCH (p:Asset) WHERE p.name = "asset 1"
WITH collect(p) AS up, uc
UNWIND (up + uc) AS v
WITH v
MATCH (v)-[:ATTRIBUTE_VALUE_OF]-(r)
RETURN avg(r.value)
I believe your query will result in the average equalling avg(r1.value, r1.value, r2.value). Is this what you want? If so, I think you can get the same result with the following query:
MATCH (a:Asset {name: "asset 1"})-[:CHILD_ASSET*0..]->(:Asset)-[:ATTRIBUTE_VALUE_OF]->(v)
RETURN avg(v.value)
The key difference is that it uses a variable-length path pattern that includes a zero-length option. As such, it includes attribute 1 attached directly to asset 1, as well as the other two paths through the CHILD_ASSET relationship. A bare "*" means one or more hops, so it does not cover the zero-length case.
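If each asset should contribute only once even when it is reachable through several paths (for example with a circular reference), one option is to deduplicate the assets before matching their attributes. This is only a sketch; it assumes the same labels and relationship types as the query above, and the variable name b is mine:

```cypher
// Collect the root asset (zero hops) and every descendant, once each.
MATCH (a:Asset {name: "asset 1"})-[:CHILD_ASSET*0..]->(b:Asset)
WITH DISTINCT b
// Then average the attribute values of those distinct assets.
MATCH (b)-[:ATTRIBUTE_VALUE_OF]->(v)
RETURN avg(v.value)
```

Note that an attribute node shared by two different assets would still be counted once per asset here. Whether that is what you want depends on how you define the score.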
To give you a little guidance on a way to incorporate the weight, the following is an example that assumes the data model in the diagram and the above query are correct. It finds all the paths to attributes that are related to types. Attributes without a type will not be included, which is effectively the same as assuming a weight of zero in the average calculation. The WHERE clause keeps only those types that are attached to the original asset. The query then averages the product of each attribute's value and the type's weight. Maybe it can also help with ideas if you end up with a different data model.
MATCH (a:Asset {name: "asset 1"})-[:HAS_ASSET*0..]->()-[r1:HAS_ATTRIBUTE]->(:Attribute)<-[r2:HAS_ATTRIBUTE]-(t:Type)
WHERE exists((a)-[:HAS_TYPE]->(t))
RETURN avg(r1.value * r2.weight)
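The query above returns the plain mean of value * weight. If what you are after is a normalized weighted average (sum of value * weight divided by the sum of the weights), a variation along these lines might work. It is a sketch under the same assumed data model; the alias weightedScore and the variable b are mine, and I have not guarded against a total weight of zero:

```cypher
// Root asset plus all descendants, each counted once.
MATCH (a:Asset {name: "asset 1"})-[:HAS_ASSET*0..]->(b)
WITH DISTINCT a, b
// Attributes of those assets that are also weighted by a type...
MATCH (b)-[r1:HAS_ATTRIBUTE]->(:Attribute)<-[r2:HAS_ATTRIBUTE]-(t:Type)
// ...where the type belongs to the original asset.
WHERE exists((a)-[:HAS_TYPE]->(t))
// toFloat avoids integer division if values and weights are stored as integers.
RETURN toFloat(sum(r1.value * r2.weight)) / sum(r2.weight) AS weightedScore
```

Here attributes with no matching type are left out of both the numerator and the denominator. If they should instead pull the score down, as a weight of 0 does in a plain average, the denominator would need to count them as well.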