How can I compute the frequency count for each cluster separately?


(Mehdi Ajroud) #1

I am using this query to count the frequency of each "CPV" within a cluster, using "total = 238", which is the number of distinct contracts:

MATCH (c1:Contrat1018)<-[r:HAS1018]-(p:CPV1018)-[x:HAS1018]->(c2:Contrat1018)
WITH DISTINCT p.id AS CPV, collect(p.id) AS Total, toFloat(count(p.id)) AS Occurence, c1.clusterId AS Cluster, c2.clusterId AS Cluster2, 3980 AS total
WHERE Cluster = Cluster2
RETURN Cluster, Cluster2, CPV, Occurence, toFloat((Occurence / total) * 100) AS frequence
ORDER BY Occurence DESC

But the result I am getting is the frequency of each CPV across all clusters. What I actually need is the frequency of each CPV within its own cluster (the cluster it belongs to).

Here is the result I am getting:

I need your help, please :) Thanks in advance!


(Mehdi Ajroud) #2

Note that the total (3980 here) will differ from one cluster to another, since it is the number of distinct contracts, and that number varies per cluster.
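To make the per-cluster normalization concrete, here is a small Python sketch of the arithmetic (not Cypher, and not a drop-in replacement for the query). It assumes each CPV-to-contract relationship is available as a hypothetical `(cpv_id, contract_id, cluster_id)` tuple, where `cluster_id` is the contract's `clusterId`, and it defines "occurrence" as the number of distinct contracts in the cluster linked to that CPV. The point is that the divisor is each cluster's own distinct-contract count rather than one global constant such as 3980.

```python
from collections import defaultdict


def per_cluster_frequency(links):
    """links: iterable of hypothetical (cpv_id, contract_id, cluster_id) tuples,
    one per CPV -> contract relationship; cluster_id is the contract's clusterId.
    Returns {(cluster_id, cpv_id): frequency in percent}."""
    contracts_in_cluster = defaultdict(set)  # cluster -> distinct contracts
    contracts_per_cpv = defaultdict(set)     # (cluster, cpv) -> distinct contracts

    for cpv, contract, cluster in links:
        contracts_in_cluster[cluster].add(contract)
        contracts_per_cpv[(cluster, cpv)].add(contract)

    result = {}
    for (cluster, cpv), contracts in contracts_per_cpv.items():
        # The divisor is this cluster's own contract count, not a global total.
        total = len(contracts_in_cluster[cluster])
        result[(cluster, cpv)] = 100.0 * len(contracts) / total
    return result


# Hypothetical sample data: cluster 1 has 3 contracts, cluster 2 has 2.
links = [
    ("cpvA", "k1", 1), ("cpvA", "k2", 1), ("cpvB", "k2", 1),
    ("cpvB", "k3", 1), ("cpvC", "k3", 1),
    ("cpvA", "k4", 2), ("cpvD", "k4", 2), ("cpvD", "k5", 2),
]

for key, freq in sorted(per_cluster_frequency(links).items()):
    print(key, round(freq, 1))
```

Here cpvA gets a different frequency in each cluster (2 of 3 contracts in cluster 1, 1 of 2 in cluster 2), which a single constant divisor cannot express.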


(Andrew Bowman) #3

Which of your properties are unique to nodes of the given types?

I would guess that id is a unique property of :CPV1018 nodes. Is that so? What about clusterId of :Contrat1018 nodes? Will c1 and c2 ever be the same node, or are they going to be different nodes with the same clusterId?

You will probably want an id predicate on c1 and c2 to avoid symmetric results (2 rows with the same elements but with the c1 and c2 nodes switched). You can do this with WHERE id(c1) < id(c2). This also excludes rows where c1 = c2, so if that is a valid possibility, use <= instead.
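The effect of the `id(c1) < id(c2)` predicate can be illustrated outside the database with a short Python sketch. It assumes a single CPV linked to three contracts by one relationship each, and uses hypothetical integer ids in place of Neo4j's internal node ids; the unfiltered list stands in for the rows the two-sided pattern would match.

```python
from itertools import permutations

# Hypothetical internal node ids of three contracts linked to the same CPV.
contract_ids = [10, 11, 12]

# Without a predicate, every ordered pair of distinct contracts is matched,
# so each pair shows up twice, e.g. (10, 11) and (11, 10).
unfiltered = list(permutations(contract_ids, 2))

# Keeping only c1 < c2 (the analogue of WHERE id(c1) < id(c2)) leaves
# exactly one row per unordered pair.
deduplicated = [(c1, c2) for c1, c2 in unfiltered if c1 < c2]

print(len(unfiltered), unfiltered)
print(len(deduplicated), deduplicated)
```

With n contracts this cuts the rows from n(n-1) to n(n-1)/2, which also keeps any count based on these rows from being doubled.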


(Andrew Bowman) #4

Okay, so how do you determine the working cluster for a CPV?


(Mehdi Ajroud) #5

Hey Andrew,
I don't have any unique property on my nodes. Contracts also use id.
About clusterId: all nodes in my DB (Contract and CPV, in my case) have a property called clusterId. I used this query to generate the clusterId property:

CALL algo.unionFind('CPV1018', 'HAS1018', {write:true, partitionProperty:"clusterId",weightProperty:'weight',
defaultValue:0.0, threshold:2.0, concurrency: 1})
YIELD nodes, setCount, loadMillis, computeMillis, writeMillis;
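For readers unfamiliar with union-find, here is a minimal Python sketch of how connected-component ids like `clusterId` come about. It is not the Neo4j Graph Algorithms implementation. The rule that an edge counts only when its weight is above `threshold`, with `defaultValue` used for missing weights, is an assumption based on my reading of the `threshold`/`defaultValue` options; the node names and edges are hypothetical.

```python
def union_find_clusters(nodes, weighted_edges, threshold=2.0, default_weight=0.0):
    """Give every node a cluster id so that nodes joined by a path of
    qualifying edges share the same id. ASSUMPTION: an edge qualifies when
    its weight (default_weight if None) is strictly above threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, weight in weighted_edges:
        w = default_weight if weight is None else weight
        if w > threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two components

    # The root node of each component serves as its cluster id.
    return {n: find(n) for n in nodes}


# Hypothetical graph: only a-b and b-c are heavy enough to connect nodes.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 3.0), ("b", "c", 5.0), ("c", "d", 1.0), ("d", "e", None)]
clusters = union_find_clusters(nodes, edges)
print(clusters)
```

Nodes that are never connected by a qualifying edge (d and e here) each end up in a cluster of their own.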

Yes, c1 and c2 are different nodes with the same clusterId.
I will try the WHERE condition and let you know.


(Mehdi Ajroud) #6

I used this query:

CALL algo.unionFind('CPV1018', 'HAS1018', {write:true, partitionProperty:"clusterId",weightProperty:'weight',
defaultValue:0.0, threshold:2.0, concurrency: 1})
YIELD nodes, setCount, loadMillis, computeMillis, writeMillis;

(Mehdi Ajroud) #7

@andrew.bowman, should I use FOREACH to go through each cluster and count only the frequency of the CPVs present in that cluster?


(Andrew Bowman) #8

No, FOREACH only accepts updating clauses (such as CREATE, MERGE, SET or DELETE). You can't use MATCH or WITH inside it, and you can't get anything back out of it.


(Mehdi Ajroud) #9

Thank you @andrew.bowman for your answer :)