Schema Design / Queries

jvm986 · March 31, 2022, 12:39pm

I am very new to graph databases and to Neo4j. I am trying to understand the limitations of the tools, and would like to see whether I can solve the following problem with the database alone (without using the application layer at all).

I have the following graph:
[Image: graph]

Also defined by this graphql schema:

  type Process {
    id: ID! @id
    name: String! @unique
    assets: [Asset!]! @relationship(type: "PROCESS_ASSET", direction: OUT)
    score: Int! @computed(from: ["id"])
  }

  type Asset {
    id: ID! @id
    name: String! @unique
    type: AssetType! @relationship(type: "TYPE_OF", direction: OUT)
    childAssets: [Asset!]! @relationship(type: "CHILD_ASSET", direction: OUT)
    attributes: [AttributeValue!]!
      @relationship(type: "ATTRIBUTE_VALUE_OF", direction: OUT)
    score: Int @computed(from: ["id"])
  }

  type AssetType {
    id: ID! @id
    name: String! @unique
    attributes: [AttributeWeight!]!
      @relationship(type: "ATTRIBUTE_WEIGHT_OF", direction: OUT)
  }

  type Attribute {
    id: ID! @id
    name: String! @unique
  }

  type AttributeValue {
    id: ID! @id
    attribute: Attribute!
      @relationship(type: "VALUE_ATTRIBUTE_OF", direction: OUT)
    value: Int!
  }

  type AttributeWeight {
    id: ID! @id
    attribute: Attribute!
      @relationship(type: "WEIGHT_ATTRIBUTE_OF", direction: OUT)
    weight: Int!
  }

My goal is to calculate the score properties of the Asset nodes and the Process nodes. The score should be the average of the attributes related to each asset and its children, weighted by the type. If the attribute isn't associated with the type the attribute should be weighted at 0. In the case of a circular reference, each child should only appear once in the calculation.

My direct questions are:

Is this "schema" viable, or should I use properties instead of the attribute nodes I have created (value/weight)?
Is there a query which will calculate the score value as described above (with this or any other approach)?

My progress so far:

Calculating the average value of attributes without consideration for weight (from type).
For Process -->

MATCH path = (p:Process {name: "process 1"})-[:PROCESS_ASSET*]->(a:Asset)-[:CHILD_ASSET*]->(b:Asset)-[:ATTRIBUTE_VALUE_OF]->(v)
RETURN avg(v.value)

For Asset --> (I would expect this can be done in a more elegant way) // doesn't work

MATCH (a:Asset {name: "asset 1"})-[:CHILD_ASSET*]->(b:Asset)-[:ATTRIBUTE_VALUE_OF]->(v1)
MATCH (a:Asset {name: "asset 1"})-[:ATTRIBUTE_VALUE_OF]->(v2)
RETURN (avg(v1.value) + avg(v2.value)) / 2
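
A side note on why the two-MATCH version above can give an unexpected number: averaging the two averages is not the same as averaging all the values together when the two groups differ in size. Here is a minimal Python sketch of just that arithmetic. The values are made up for illustration and are not taken from the graph in this thread.

```python
from statistics import mean

# Hypothetical attribute values, not read from the real graph.
own_values = [4]           # values attached directly to the asset
child_values = [1, 2, 3]   # values attached to its descendant assets

# What (avg(v1.value) + avg(v2.value)) / 2 computes: a mean of two means.
mean_of_means = (mean(child_values) + mean(own_values)) / 2

# What a single average over the asset and all its descendants would give.
pooled_mean = mean(own_values + child_values)

print(mean_of_means, pooled_mean)
```

With these numbers the mean of means is 3.0 while the pooled mean is 2.5, because the single directly attached value gets as much influence as the three child values combined.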

Updated -->

MATCH (:Asset {name: "asset 1"})-[:CHILD_ASSET*]->(c)
WITH collect(c) AS uc
MATCH (p:Asset) WHERE p.name = "asset 1"
WITH collect(p) AS up, uc
UNWIND (up + uc) AS v
WITH v
MATCH (v)-[:ATTRIBUTE_VALUE_OF]-(r)
RETURN avg(r.value)

glilienfield · March 31, 2022, 3:27pm

First, welcome aboard.

Is there a reason you have broken out the attribute and weight values as entities? Could we simplify the schema by incorporating these as relationship properties? Maybe something like the following:

[Image: Screen Shot 2022-03-31 at 11.01.07 AM]

Does this accurately represent your domain model? We can help once we understand the data model.

jvm986 · March 31, 2022, 3:38pm

Hi Gary, Thanks for the warm welcome and the fast and helpful response.

To answer your question, I have no problem simplifying the model into this format. What you have shown looks good to me. I will work on updating things on my end and keep working (updating the OP as I go).

glilienfield · March 31, 2022, 3:38pm

What result do you expect from the second query? I think the query will include attribute '1' multiple times. I think the first match will include it twice and the second match once, for the graph presented.

jvm986 · March 31, 2022, 3:39pm

I have updated the query with something that actually works. NB: as mentioned, this does not take the weights into consideration; I will address that once I understand things better.

MATCH (:Asset {name: "asset 1"})-[:CHILD_ASSET*]->(c)
WITH collect(c) AS uc
MATCH (p:Asset) WHERE p.name = "asset 1"
WITH collect(p) AS up, uc
UNWIND (up + uc) AS v
WITH v
MATCH (v)-[:ATTRIBUTE_VALUE_OF]-(r)
RETURN avg(r.value)

glilienfield · March 31, 2022, 4:00pm

I believe your query will result in the average equalling avg(r1.value, r1.value, r2.value). Is this what you want? If so, I think you can get the same result with the following query:

MATCH (a:Asset {name: "asset 1"})-[:CHILD_ASSET*0..]->(:Asset)-[:ATTRIBUTE_VALUE_OF]->(v)
RETURN avg(v.value)

The key difference is that it uses a variable-length path pattern that includes a '0' length option. As such, it is able to include attribute 1 directly attached to asset 1, as well as the other two paths through the "CHILD_ASSET" relationship. A bare "*" matches one or more hops, so it doesn't include the zero-length case.
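
To make the hop-count difference concrete, here is a rough in-memory Python model of how I understand variable-length matching to behave: one result per path, with no relationship reused inside a path. The edge list is hypothetical and deliberately contains a cycle, and `path_ends` is an illustrative helper, not anything from Neo4j.

```python
# Hypothetical CHILD_ASSET edges as (parent, child); the last edge closes a cycle.
CHILD_ASSET = [
    ("asset 1", "asset 2"),
    ("asset 2", "asset 3"),
    ("asset 3", "asset 1"),
]

def path_ends(start, edges, min_hops):
    """Return the end node of every path from start that uses each edge at most once."""
    ends = []

    def walk(node, used, hops):
        if hops >= min_hops:
            ends.append(node)
        for i, (src, dst) in enumerate(edges):
            if src == node and i not in used:
                walk(dst, used | {i}, hops + 1)

    walk(start, frozenset(), 0)
    return ends

one_or_more = path_ends("asset 1", CHILD_ASSET, min_hops=1)   # models [:CHILD_ASSET*]
zero_or_more = path_ends("asset 1", CHILD_ASSET, min_hops=0)  # models [:CHILD_ASSET*0..]
distinct_assets = sorted(set(zero_or_more))                   # models deduplicating the end nodes

print(one_or_more)      # start node only shows up via the cycle
print(zero_or_more)     # start node shows up twice: zero hops and via the cycle
print(distinct_assets)
```

In this model the zero-length option adds the starting asset itself, and the cycle makes it appear a second time. If each asset should count only once, as the original question asks, deduplicating the matched assets (for example with WITH DISTINCT) before reading their attributes would be one way to get there.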

glilienfield · March 31, 2022, 4:12pm

I have to make a correction. The intermediate node can't have a type.

MATCH (a:Asset {name: "asset 1"})-[:CHILD_ASSET*0..]->()-[:ATTRIBUTE_VALUE_OF]->(v)
RETURN avg(v.value)

glilienfield · March 31, 2022, 6:21pm

To give you a little guidance on a way to incorporate the weight, the following is an example assuming the data model in the diagram and the above query are correct. It finds all the paths to attributes that are also related to a type. Attributes that are not attached to the type will not be included. That is close to assuming a weight of zero, with one difference: excluded attributes also drop out of the row count that avg() divides by, so the result only matches a zero weight if you normalize by the total weight instead. The WHERE clause keeps only the paths whose type is the type of the original asset. It then averages the product of the attribute's value and the type's weight. Maybe it can also help with ideas if you end up with a different data model.

MATCH (a:Asset {name: "asset 1"})-[:HAS_ASSET*0..]->()-[r1:HAS_ATTRIBUTE]->(:Attribute)<-[r2:HAS_ATTRIBUTE]-(t:Type)
WHERE exists((a)-[:HAS_TYPE]->(t))
RETURN avg(r1.value * r2.weight)
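
One subtlety worth checking against your definition of "weighted average" is what happens to the denominator. Here is a small Python sketch of the arithmetic only. The attribute names, values and weights are invented for illustration.

```python
# Hypothetical data: attribute -> value on the asset, attribute -> weight on its type.
asset_values = {"cost": 2, "risk": 4, "age": 6}
type_weights = {"cost": 3, "risk": 1}   # "age" is not linked to the type

# Rows the pattern would produce: only attributes present on both sides.
matched = [asset_values[k] * type_weights[k] for k in asset_values if k in type_weights]
avg_matched = sum(matched) / len(matched)

# Treating a missing weight as 0 keeps the row, so the denominator grows.
zero_filled = [v * type_weights.get(k, 0) for k, v in asset_values.items()]
avg_zero_filled = sum(zero_filled) / len(zero_filled)

# Normalizing by the total weight gives the same answer either way.
weighted_mean = sum(zero_filled) / sum(type_weights.values())

print(avg_matched, avg_zero_filled, weighted_mean)
```

With these numbers, averaging only the matched products gives 5.0, keeping the unweighted attribute as a zero gives about 3.33, and dividing by the total weight gives 2.5. Which of the three is the intended score depends on the domain, so it is worth deciding before wiring the query into the schema.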

jvm986 · March 31, 2022, 7:22pm

Wow! Thanks a tonne, this has me very excited about Neo4j.

Final GraphQL Schema -->

type Process {
  id: ID! @id
  name: String! @unique
  assets: [Asset!]! @relationship(type: "HAS_ASSET", direction: OUT)
  score: Float!
    @cypher(
      statement: "MATCH (p:Process {id: this.id})-[:HAS_ASSET]->(a:Asset)-[:HAS_ASSET*0..]->()-[r1:HAS_ATTRIBUTE]->(:Attribute)<-[r2:HAS_ATTRIBUTE]-(t:AssetType) WHERE EXISTS((a)-[:HAS_TYPE]->(t)) RETURN avg(r1.value * r2.value)"
    )
}

type Asset {
  id: ID! @id
  name: String! @unique
  type: AssetType! @relationship(type: "HAS_TYPE", direction: OUT)
  childAssets: [Asset!]! @relationship(type: "HAS_ASSET", direction: OUT)
  attributes: [Attribute!]!
    @relationship(
      type: "HAS_ATTRIBUTE"
      direction: OUT
      properties: "HasAttribute"
    )
  score: Float!
    @cypher(
      statement: "MATCH (a:Asset {id: this.id})-[:HAS_ASSET*0..]->()-[r1:HAS_ATTRIBUTE]->(:Attribute)<-[r2:HAS_ATTRIBUTE]-(t:AssetType) WHERE EXISTS((a)-[:HAS_TYPE]->(t)) RETURN avg(r1.value * r2.value)"
    )
}

type AssetType {
  id: ID! @id
  name: String! @unique
  attributes: [Attribute!]!
    @relationship(
      type: "HAS_ATTRIBUTE"
      direction: OUT
      properties: "HasAttribute"
    )
}

type Attribute {
  id: ID! @id
  name: String! @unique
}

interface HasAttribute @relationshipProperties {
  value: Int!
}

Graph -->
[Image: graph (1)]

glilienfield · March 31, 2022, 10:50pm

Terrific, and glad to help. Your excitement is warranted. Cypher is so powerful and rocks over SQL.
