There are several **:Variety** nodes. Each one is connected to the nodes **:Control** (no treatment applied), **:P1R** (one treatment applied) and **:P2B** (another treatment applied). Each of these nodes (**:Control**, **:P1R**, **:P2B**) is connected to the **:Protein** node by the relationship **:CONTAINS** (contains a certain percentage of protein).

**The protein content of each variety must be determined.**

Is such a variant of the subgraph possible?

How can it be determined to which variety the value in the Protein node belongs?

A couple of questions to help understand your domain:

- What is the formula to determine the protein content for one variety?
- Why do you have three CONTAINS relationships connecting each treatment and control node to the protein node?

For each variety there is a control group and a group with the applied treatment.

The control group of Variety 1 (Clavera) contains x% protein (mean value).

The second group of the same variety, treated with the first compound (P1R), contains x% protein.

The third group of the same variety, treated with P2B, contains x% protein.

And the same goes for all varieties.

Based on this model, the influence of the treatment on the protein content will be assessed. The Pearson Similarity algorithm could be used.
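As a rough illustration of that idea, here is a minimal sketch that assumes the Neo4j Graph Data Science plugin is installed, since `gds.similarity.pearson` comes from that library. The two lists are made-up placeholder values, not measurements; each position is meant to stand for one variety, in the same order in both lists.

```cypher
// Placeholder numbers only: mean protein % per variety,
// listed in the same variety order for both groups.
WITH [12.1, 13.4, 11.8, 12.7] AS control,
     [12.9, 13.9, 12.5, 13.6] AS p1r
// Requires the Graph Data Science library.
RETURN gds.similarity.pearson(control, p1r) AS pearson
```

Note that this score only says how closely the treated values track the control values across varieties. It says nothing about how large the change in protein content is, so the per-variety differences would probably need to be looked at as well.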

Per your model, you need to store some information about the Variety node as a property on the CONTAINS relationship:

```
CREATE (pl)-[rel:CONTAINS {percentages: 33.56, paid: 4}]->(prot)
```

With this, you can find the protein percentage for each Variety.
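Reading the values back could look roughly like the sketch below. It assumes the `CONTAINS` relationships carry the `percentages` and `paid` properties from the statement above, that `paid` is the property identifying the variety, and that the target node has the label `:Protein` as in the original question.

```cypher
// Assumption: rel.paid identifies the variety and
// rel.percentages holds the protein percentage.
MATCH (t)-[rel:CONTAINS]->(prot:Protein)
RETURN rel.paid        AS variety,
       labels(t)       AS treatment,   // e.g. ["Control"], ["P1R"], ["P2B"]
       rel.percentages AS protein
ORDER BY variety
```

Each row then ties one protein value to both a variety and a treatment, which is what the single shared :Protein node cannot do on its own.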