Create Relationships with a count property based on distinct properties

Beamfreak · July 13, 2023, 4:23pm

Hello there,

I can't seem to solve this problem of mine.
I have a class ALGORITHM with an OVERPERFORMS relationship connecting it to other ALGORITHM nodes, and a RESULT that should rank these algorithms by their number of OVERPERFORMS relationships compared to the total number of algorithms. Additionally, a group of algorithms can be compared in different experiments.
The problem, however, is that each OVERPERFORMS relationship carries a metric (e.g., F-Score), and my result should only count the OVERPERFORMS relationships with the same metric in the same experiment for each algorithm.

Here's what I tried:

MATCH (b:RESULT)<-[y:HAS_RESULT]-(a:EXPERIMENT)<-[z:EVALUATED_IN]-(m:ALGORITHM)-[x:OVERPERFORMS]->(n:ALGORITHM) 
WHERE a.ID = x.Experiment AND m.ID <> n.ID
WITH m as pipe, z.Metric as metr, a.ID as exp, count(x) as cc, x.Metric as xm, b
CALL apoc.create.relationship(b, "RANKS", 
    apoc.map.fromValues([
        "Number_of_Pipelines", cc, 
        "Experiment", exp, 
        "Metric", metr])
    , pipe)
YIELD rel
RETURN pipe.ID , rel
;

But when I do this, I get multiple relationships with the wrong count.
Here's an example of what I get for one algorithm:

[:RANKS {Metric: "Precision",Experiment: "ID152",Number_of_Pipelines: 4}]
[:RANKS {Metric: "Precision",Experiment: "ID152",Number_of_Pipelines: 4}]
[:RANKS {Metric: "Precision",Experiment: "ID152",Number_of_Pipelines: 3}]
[:RANKS {Metric: "F1Score",Experiment: "ID152",Number_of_Pipelines: 4}]  
[:RANKS {Metric: "F1Score",Experiment: "ID152",Number_of_Pipelines: 4}]  
[:RANKS {Metric: "F1Score",Experiment: "ID152",Number_of_Pipelines: 3}]  
[:RANKS {Metric: "Recall",Experiment: "ID152",Number_of_Pipelines: 4}]   
[:RANKS {Metric: "Recall",Experiment: "ID152",Number_of_Pipelines: 4}]   
[:RANKS {Metric: "Recall",Experiment: "ID152",Number_of_Pipelines: 3}]

Here's what it should look like:

[:RANKS {Metric: "Precision",Experiment: "ID152",Number_of_Pipelines: 4}]
[:RANKS {Metric: "F1Score",Experiment: "ID152",Number_of_Pipelines: 4}]  
[:RANKS {Metric: "Recall",Experiment: "ID152",Number_of_Pipelines: 3}]

Now, if I delete x.Metric as xm from my WITH, count adds up everything regardless of the metric:

[:RANKS {Metric: "Precision",Experiment: "ID152",Number_of_Pipelines: 11}]
[:RANKS {Metric: "F1Score",Experiment: "ID152",Number_of_Pipelines: 11}]  
[:RANKS {Metric: "Recall",Experiment: "ID152",Number_of_Pipelines: 11}]

I appreciate any help or ideas. Maybe there is a way to specify which of the duplicate relationships should be deleted (2 and 3 for the first; 1 and 3 for the second; 1 and 2 for the third)?

Best regards

nathan.smith1 · July 13, 2023, 4:58pm

Does it help if you add AND x.Metric = z.Metric as part of your WHERE clause?
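
A minimal sketch of that suggestion applied to the query above. It assumes the labels and property names exactly as written in the original query (RESULT, EXPERIMENT, ALGORITHM, ID, Metric, Experiment), and it only returns the counts so the numbers can be checked before any relationship is created:

```cypher
MATCH (b:RESULT)<-[:HAS_RESULT]-(a:EXPERIMENT)<-[z:EVALUATED_IN]-(m:ALGORITHM)-[x:OVERPERFORMS]->(n:ALGORITHM)
WHERE a.ID = x.Experiment
  AND x.Metric = z.Metric   // only pair an evaluation with OVERPERFORMS of the same metric
  AND m.ID <> n.ID
// b, pipe, exp and metr are the grouping keys; count(x) is computed per group
WITH b, m AS pipe, a.ID AS exp, z.Metric AS metr, count(x) AS cc
RETURN pipe.ID, exp, metr, cc
```

With the metric filter in place, x.Metric no longer needs to be a separate column in WITH, because it always equals z.Metric.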

ameyasoft · July 13, 2023, 9:44pm

Try this: modify your third line as shown below:

WITH distinct b, m as pipe, z.Metric as metr, a.ID as exp, count(x) as cc, x.Metric as xm
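
To see why the columns listed in WITH change the counts: every non-aggregated expression in WITH becomes part of the grouping key for count(). The toy query below needs no data in the database; the two lists are made-up stand-ins for the metrics found on the EVALUATED_IN and OVERPERFORMS relationships of a single algorithm.

```cypher
// Made-up stand-ins: metrics on EVALUATED_IN (z) and on OVERPERFORMS (x)
UNWIND ["Precision", "Recall"] AS zMetric
UNWIND ["Precision", "Precision", "Recall"] AS xMetric
// Without a filter, every z row is paired with every x row (2 * 3 = 6 rows)
WITH zMetric, xMetric
WHERE xMetric = zMetric   // keep only pairs with matching metrics
// zMetric is the only grouping key left, so there is one count per metric
RETURN zMetric AS metr, count(*) AS cc
```

Dropping the WHERE line and grouping by both zMetric and xMetric instead returns one row for each of the four metric combinations, which is the same pattern as the duplicated RANKS relationships in the question.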

Beamfreak · July 14, 2023, 8:59am

Thanks. It seems like that solved the problem.
Do you have any idea how I can create RANKS for the algorithms that have NO overperforms, so that Number_of_Pipelines is always 0?

I tried it with where none ()-[:RANKS]->() but that only works for the first experiment for each algorithm.
Another approach was:

MATCH (b:Result)<-[y:HAS_RESULT]-(a:Experiment)<-[z:EVALUATED_IN]-(m:Algorithm)<-[x:OVERPERFORMS]-(n:Algorithm),
(b)-[ra:RANKS]->(m)
with collect(ra) as ran, a, b, x, y, z, m, n
WHERE a.ID = x.Experiment AND x.Metric = z.Metric and m.ID <> n.ID 
and none (rank in ran where rank.Metric = z.Metric and rank.Experiment = a.ID)

But that gives me no changes and no records.
What I need is to test whether a relationship with the given experiment and metric already exists, and if not, create one.

glilienfield · July 14, 2023, 1:03pm

I think some of your difficulty extracting the information you want may be due to your data model. What I see is a chain of relationships between entities, but with an additional constraint that nodes along the path must also have equal identifiers, and another that two entities must have the same metric type. I feel a data model should not have such constraints, as it means some nodes along the path are not actually related. The identifier constraint is kind of like an inner join in a relational database. If the Experiment and OVERPERFORMS are related via the ID constraint, then maybe move the OVERPERFORMS relationship from the Algorithm and put it on the Experiment that resulted in this Algorithm outperforming the other algorithms. If not there, maybe the Result node. Then you can drop the identifier from the OVERPERFORMS relationship, as it is no longer needed. With either of these changes, you could easily count the number of other algorithms a single algorithm outperforms in the same experiment and with the same metric.

Also, there would be query performance implications with this type of model. Typically you would be able to traverse the paths looking for specific node labels and relationship types. This should be fairly fast. In your case, all the potential paths will need to be filtered to remove the ones that don't have the extra constraint(s). This means accessing the node/relationship properties to perform the filtering.

Of course, I don't fully understand your model and don't know the decisions you went through to come up with it, so I apologize if I am off base. These are just things I think about when I create a model for my needs.

Beamfreak · July 14, 2023, 6:32pm

Thanks for your input. Yeah, I'm sure my model is definitely improvable in many aspects, but this is my first graph database, and I wanted it to function based on my defined ontology (which is probably also a bit too complex), so there is definitely room for improvement.
Sadly I don't have the time anymore to redo my ontology and GDB structure, but I will keep that in mind for the future.

Though I think I managed to solve my problem with the missing RANKS like this:

MATCH (b:Result)<-[y:HAS_RESULT]-(a:Experiment)<-[z:EVALUATED_IN]-(m:Algorithm)<-[x:OVERPERFORMS]-(n:Algorithm)
where a.ID = x.Experiment AND x.Metric = z.Metric and not exists((m)<-[:RANKS {Metric: x.Metric}]-(b))
WITH m as pipe, z.Metric as metr, a.ID as exp, b
CALL apoc.create.relationship(b, "RANKS", 
    apoc.map.fromValues([
        "Number_of_Pipelines", "0", 
        "Experiment", exp, 
        "Metric", metr])
    , pipe)
YIELD rel
RETURN pipe.ID , rel
;

glilienfield · July 14, 2023, 9:21pm

Btw, you don’t need the apoc function to create the map for the relationship properties, since the keys are literal strings. You can pass a literal map like this:

{Number_of_Pipelines: "0", 
        Experiment: exp, 
        Metric: metr}

Also, since your relationship type is known and fixed, you can create the relationship with plain Cypher. You could even use MERGE to create the relationship, and it would not create it again if it already exists.
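
A minimal sketch of the plain Cypher version, assuming the labels and properties used in the query above and that every EVALUATED_IN relationship has a Metric. It counts with OPTIONAL MATCH so that algorithms without any matching OVERPERFORMS get 0, and MERGE keys the relationship on Experiment and Metric so that re-running the query does not create duplicates. It follows OVERPERFORMS outwards from m, as in the first query of the thread, and stores the count as an integer rather than the string "0"; both are assumptions about the intended result.

```cypher
MATCH (b:Result)<-[:HAS_RESULT]-(a:Experiment)<-[z:EVALUATED_IN]-(m:Algorithm)
// OPTIONAL MATCH keeps algorithms that overperform nothing; x is then null
OPTIONAL MATCH (m)-[x:OVERPERFORMS]->(n:Algorithm)
WHERE x.Experiment = a.ID AND x.Metric = z.Metric AND m.ID <> n.ID
// count(x) skips nulls, so those algorithms get 0
WITH b, m, a.ID AS exp, z.Metric AS metr, count(x) AS cc
// One RANKS relationship per result, algorithm, experiment and metric
MERGE (b)-[r:RANKS {Experiment: exp, Metric: metr}]->(m)
SET r.Number_of_Pipelines = cc
RETURN m.ID, exp, metr, cc
```

Because Number_of_Pipelines is set outside the MERGE pattern, running the query again updates the count on the existing relationship instead of adding a second one.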
