Schema Design / Queries

I am very new to graph databases and to Neo4j. I am trying to understand the limitations of the tools, and I would like to see whether I can solve the following problem with the database alone (without having to use the application layer at all).

I have the following graph:
[Image: diagram of the graph]

It is also defined by this GraphQL schema:

  type Process {
    id: ID! @id
    name: String! @unique
    assets: [Asset!]! @relationship(type: "PROCESS_ASSET", direction: OUT)
    score: Int! @computed(from: ["id"])
  }

  type Asset {
    id: ID! @id
    name: String! @unique
    type: AssetType! @relationship(type: "TYPE_OF", direction: OUT)
    childAssets: [Asset!]! @relationship(type: "CHILD_ASSET", direction: OUT)
    attributes: [AttributeValue!]!
      @relationship(type: "ATTRIBUTE_VALUE_OF", direction: OUT)
    score: Int @computed(from: ["id"])
  }

  type AssetType {
    id: ID! @id
    name: String! @unique
    attributes: [AttributeWeight!]!
      @relationship(type: "ATTRIBUTE_WEIGHT_OF", direction: OUT)
  }

  type Attribute {
    id: ID! @id
    name: String! @unique
  }

  type AttributeValue {
    id: ID! @id
    attribute: Attribute!
      @relationship(type: "VALUE_ATTRIBUTE_OF", direction: OUT)
    value: Int!
  }

  type AttributeWeight {
    id: ID! @id
    attribute: Attribute!
      @relationship(type: "WEIGHT_ATTRIBUTE_OF", direction: OUT)
    weight: Int!
  }

My goal is to calculate the score properties of the Asset nodes and the Process nodes. The score should be the average of the attributes related to each asset and its children, weighted by the type. If an attribute isn't associated with the type, it should be given a weight of 0. In the case of a circular reference, each child should appear only once in the calculation.
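As a reference point for checking query results, here is a minimal Python sketch of the score described above. It rests on assumptions that are not in the original post: each attribute value is weighted by the type of the asset that holds it, the score is the plain mean of value × weight, and the dicts (`children`, `asset_type`, `values`, `weights`) are hypothetical in-memory stand-ins for the graph, not anything Neo4j provides. The visited set is what makes each asset count only once, even with a circular reference.

```python
from collections import deque


def reachable_assets(roots, children):
    """Return every root plus all of its descendants, each exactly once."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # already-visited assets are skipped, so cycles terminate
                seen.add(child)
                queue.append(child)
    return seen


def score(roots, children, asset_type, values, weights):
    """Mean of value * weight over the attribute values of all reachable assets.

    Hypothetical rules: a value is weighted by the type of the asset holding it,
    and an attribute missing from that type gets weight 0.
    Returns None when no attribute values are found.
    """
    terms = []
    for asset in reachable_assets(roots, children):
        type_weights = weights.get(asset_type.get(asset), {})
        for attribute, value in values.get(asset, {}).items():
            terms.append(value * type_weights.get(attribute, 0))
    return sum(terms) / len(terms) if terms else None


# Hypothetical data: asset 1 -> asset 2 -> asset 3 -> asset 1 (a circular reference).
children = {"asset 1": ["asset 2"], "asset 2": ["asset 3"], "asset 3": ["asset 1"]}
asset_type = {"asset 1": "type A", "asset 2": "type A", "asset 3": "type A"}
values = {"asset 1": {"attr 1": 4}, "asset 2": {"attr 2": 2}, "asset 3": {"attr 3": 6}}
weights = {"type A": {"attr 1": 2, "attr 2": 3}}  # attr 3 is not on type A, so weight 0

print(score(["asset 1"], children, asset_type, values, weights))  # (8 + 6 + 0) / 3
```

Passing several roots (for example, all the assets of a process) gives a process score under the same rules, with any shared assets counted once.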

My direct questions are:

  1. Is this "schema" viable, or should I use properties instead of the attribute nodes I have created (value/weight)?
  2. Is there a query which will calculate the score value as described above (with this or any other approach)?

My progress so far:

Calculating the average value of the attributes without considering the weight (from the type):
For Process -->

MATCH path = (p:Process {name: "process 1"})-[:PROCESS_ASSET*]->(a:Asset)-[:CHILD_ASSET*]->(b:Asset)-[:ATTRIBUTE_VALUE_OF]->(v)
RETURN avg(v.value)

For Asset --> (I expect this can be done in a more elegant way; it doesn't work)

MATCH (a:Asset {name: "asset 1"})-[:CHILD_ASSET*]->(b:Asset)-[:ATTRIBUTE_VALUE_OF]->(v1)
MATCH (a:Asset {name: "asset 1"})-[:ATTRIBUTE_VALUE_OF]->(v2)
RETURN (avg(v1.value) + avg(v2.value)) / 2
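One arithmetic point worth checking in this query: (avg(v1.value) + avg(v2.value)) / 2 gives the asset's own attributes and its descendants' attributes equal weight as two groups, which matches a plain average over all the attributes only when both groups are the same size. A quick Python check with made-up numbers:

```python
# Hypothetical attribute values: one on the asset itself, three on its children.
own_values = [4]
child_values = [2, 6, 10]

# Average of the two group averages, as in the query above: (4 + 6) / 2 = 5.0
average_of_averages = (sum(own_values) / len(own_values)
                       + sum(child_values) / len(child_values)) / 2

# Plain average over all four values: 22 / 4 = 5.5
all_values = own_values + child_values
pooled_average = sum(all_values) / len(all_values)

print(average_of_averages, pooled_average)
```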

First, welcome aboard.

Is there a reason you have broken out the attribute and weight values as entities? Could we simplify the schema by incorporating these as relationship properties? Maybe something like the following:

[Screenshot (2022-03-31): proposed data model]

Does this accurately represent your domain model? We can help once we understand the data model.

Hi Gary, thanks for the warm welcome and the fast, helpful response.

To answer your question: I have no problem simplifying the model into this format. What you have shown looks good to me. I will work on updating things on my end and keep working (and update the original post as I go).

What result do you expect from the second query? I think it will include attribute '1' multiple times: for the graph presented, the first match will include it twice and the second match once.

I have updated the query to something that actually works. NB: as mentioned, this does not take the weights into consideration; I will have to address that once I understand things better.

MATCH (:Asset {name: "asset 1"})-[:CHILD_ASSET*]->(c)
WITH collect(c) AS uc
MATCH (p:Asset) WHERE p.name = "asset 1"
WITH collect(p) AS up, uc
UNWIND (up + uc) AS v
WITH v
MATCH (v)-[:ATTRIBUTE_VALUE_OF]-(r)
RETURN avg(r.value)

I believe your query will return the average of r.value, r1.value and r2.value. Is this what you want? If so, I think you can get the same result with the following query:

MATCH (a:Asset {name: "asset 1"})-[:CHILD_ASSET*0..]->(:Asset)-[:ATTRIBUTE_VALUE_OF]->(v)
RETURN avg(v.value)

The key difference is that it uses a variable-length pattern that allows a length of 0. As a result, it includes attribute 1, which is attached directly to asset 1, as well as the attributes reached via the other two paths through the "CHILD_ASSET" relationship. Using "*" on its own as the variable-length criterion means a minimum length of 1, so it doesn't include zero.
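To make the difference between `*` and `*0..` concrete, and to show how a circular reference can make the same asset turn up more than once, here is a small Python model of how such a pattern expands. It is a simplified sketch, not Neo4j's implementation: it only models the rule that a relationship is not reused within one path, and the edge list is hypothetical.

```python
def path_end_nodes(start, edges, min_len=1):
    """Collect the end node of every path that starts at `start`.

    Rough model of variable-length matching: a relationship may not repeat
    within a single path, but the same node can be reached by several paths.
    min_len=1 models `*`, min_len=0 models `*0..`.
    """
    ends = []

    def walk(node, used, length):
        if length >= min_len:
            ends.append(node)
        for i, (src, dst) in enumerate(edges):
            if src == node and i not in used:  # each relationship once per path
                walk(dst, used | {i}, length + 1)

    walk(start, frozenset(), 0)
    return ends


# Hypothetical CHILD_ASSET edges forming a cycle: asset 1 -> asset 2 -> asset 3 -> asset 1.
edges = [("asset 1", "asset 2"), ("asset 2", "asset 3"), ("asset 3", "asset 1")]

print(path_end_nodes("asset 1", edges))             # ['asset 2', 'asset 3', 'asset 1']
print(path_end_nodes("asset 1", edges, min_len=0))  # ['asset 1', 'asset 2', 'asset 3', 'asset 1']
print(sorted(set(path_end_nodes("asset 1", edges, min_len=0))))  # each asset once
```

With the cycle, asset 1 comes back as its own descendant, so its attributes would be averaged twice. In Cypher, one way to get each asset only once is to de-duplicate the matched nodes before aggregating, for example with `WITH DISTINCT` or `collect(DISTINCT ...)`.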

I have to make a correction. The intermediate node can't have a type.

MATCH (a:Asset {name: "asset 1"})-[:CHILD_ASSET*0..]->()-[:ATTRIBUTE_VALUE_OF]->(v)
RETURN avg(v.value)

To give you a little guidance on incorporating the weight, here is an example that assumes the data model in the diagram and the query above are correct. It finds all the paths to attributes that are related to types. Attributes without a type are not included. Note that this is not quite the same as giving them a weight of zero: an excluded attribute does not count toward the denominator of the average, whereas a zero-weighted one would. The WHERE clause ensures that only paths whose type also ties back to the original asset are included. The query then averages the product of each attribute's value and the type's weight. Maybe it can also give you ideas if you end up with a different data model.

MATCH (a:Asset {name: "asset 1"})-[:HAS_ASSET*0..]->()-[r1:HAS_ATTRIBUTE]->(:Attribute)<-[r2:HAS_ATTRIBUTE]-(t:Type)
where exists((a)-[:HAS_TYPE]->(t))
RETURN avg(r1.value * r2.weight) 
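To see what leaving an attribute out of the match does to the result, here is a tiny Python comparison with made-up numbers. Dropping an attribute removes it from the denominator of the average, while giving it a weight of zero keeps it there, so the two agree only when every attribute has a weight.

```python
# Hypothetical (value, weight) pairs for the attributes reached from one asset;
# weight None marks an attribute that is not related to the asset's type.
pairs = [(4, 2), (2, 3), (6, None)]

# What the query computes: only attributes related to the type take part.
included = [v * w for v, w in pairs if w is not None]
avg_excluding = sum(included) / len(included)  # (8 + 6) / 2 = 7.0

# Treating unrelated attributes as weight 0 keeps them in the denominator.
zero_weighted = [v * (w if w is not None else 0) for v, w in pairs]
avg_zero_weight = sum(zero_weighted) / len(zero_weighted)  # (8 + 6 + 0) / 3

print(avg_excluding, avg_zero_weight)
```

If the zero-weight behaviour is what you want, one option in Cypher is to fetch the weight with OPTIONAL MATCH and use coalesce(r2.weight, 0) in the product, though this sketch has not been tested against this data model.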

Wow! Thanks a tonne, this has me very excited for neo4j :smiley:

Final GraphQL Schema -->

type Process {
  id: ID! @id
  name: String! @unique
  assets: [Asset!]! @relationship(type: "HAS_ASSET", direction: OUT)
  score: Float!
    @cypher(
      statement: "MATCH (p:Process {id: this.id})-[:HAS_ASSET]->(a:Asset)-[:HAS_ASSET*0..]->()-[r1:HAS_ATTRIBUTE]->(:Attribute)<-[r2:HAS_ATTRIBUTE]-(t:AssetType) WHERE EXISTS((a)-[:HAS_TYPE]->(t)) RETURN avg(r1.value * r2.value)"
    )
}

type Asset {
  id: ID! @id
  name: String! @unique
  type: AssetType! @relationship(type: "HAS_TYPE", direction: OUT)
  childAssets: [Asset!]! @relationship(type: "HAS_ASSET", direction: OUT)
  attributes: [Attribute!]!
    @relationship(
      type: "HAS_ATTRIBUTE"
      direction: OUT
      properties: "HasAttribute"
    )
  score: Float!
    @cypher(
      statement: "MATCH (a:Asset {id: this.id})-[:HAS_ASSET*0..]->()-[r1:HAS_ATTRIBUTE]->(:Attribute)<-[r2:HAS_ATTRIBUTE]-(t:AssetType) WHERE EXISTS((a)-[:HAS_TYPE]->(t)) RETURN avg(r1.value * r2.value)"
    )
}

type AssetType {
  id: ID! @id
  name: String! @unique
  attributes: [Attribute!]!
    @relationship(
      type: "HAS_ATTRIBUTE"
      direction: OUT
      properties: "HasAttribute"
    )
}

type Attribute {
  id: ID! @id
  name: String! @unique
}

interface HasAttribute @relationshipProperties {
  value: Int!
}

Graph -->
[Image: diagram of the updated graph]

Terrific, and glad to help. Your excitement is warranted. Cypher is so powerful and rocks compared with SQL.