Hi Tony.
I found a way to do what I needed!!!!
First of all, thanks again for taking the time to help me and provide great insights. I really appreciate it!!
This is my graph for one topuser.
For a selected cluster (topuser=522 and direct neighbours=444,770,1320,1137,192), I needed to calculate a "clustering coefficient", which should be the total number of existing relationships within this cluster (the "links" variable in my code, 12 for this cluster) divided by the total number of possible relationships (for 5 direct neighbours, that is 5*(5-1)=20).
As per my first post, I expected to count the number of unique pairs of node ids. To make it easier to see what I did, I have listed my attempts below.
MATCH (u1 {id:522})-[r1:InteractsWith]-(u2)
WITH collect(distinct u2.id) as neighbours
MATCH (u2)-[r2:InteractsWith]-(u3)
WHERE u2.id in neighbours AND u3.id in neighbours
RETURN .....
Attempt 101
RETURN u2.id, u3.id
=
╒═══════╤═══════╕
│"u2.id"│"u3.id"│
╞═══════╪═══════╡
│192    │1137   │
├───────┼───────┤
│192    │1137   │
├───────┼───────┤
│444    │1137   │
├───────┼───────┤
│444    │1137   │
├───────┼───────┤
...
├───────┼───────┤
│770    │1137   │
├───────┼───────┤
│770    │1320   │
└───────┴───────┘
*74 rows*
Attempt 102
RETURN [u2.id, u3.id]
= [u2.id, u3.id]
[192, 1137]
[192, 1137]
[444, 1137]
[444, 1137]
...
[770, 1137]
[770, 1320]
*74 rows*
Attempt 103
RETURN distinct [u2.id, u3.id]
= [u2.id, u3.id]
[192, 1137]
[444, 1137]
...
[770, 444]
*14 rows*
Attempt 104
RETURN collect(distinct [u2.id, u3.id])
= [[192, 1137], [444, 1137], [444, 1320], [444, 770], [1320, 1137], [1320, 770], [1320, 444], [1137, 444], [1137, 770], [1137, 1320], [1137, 192], [770, 1320], [770, 1137], [770, 444]]
*1 row (with 14 pairs)*
Attempt 1001
RETURN size(collect(distinct [u2.id, u3.id]))
=14
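As a quick cross-check outside Neo4j, here is a small Python sketch (not part of my query, just a sanity check) that takes the 14 distinct pairs from Attempt 104 and counts them as undirected relationships. It suggests that halving the count of distinct ordered pairs gives the same number as counting unordered pairs, at least for this cluster where every pair shows up in both directions. The variable names are only illustrative.

```python
# Distinct ordered pairs exactly as returned by Attempt 104.
ordered_pairs = [
    [192, 1137], [444, 1137], [444, 1320], [444, 770],
    [1320, 1137], [1320, 770], [1320, 444], [1137, 444],
    [1137, 770], [1137, 1320], [1137, 192], [770, 1320],
    [770, 1137], [770, 444],
]

# Treat [a, b] and [b, a] as the same relationship.
unordered_pairs = {frozenset(pair) for pair in ordered_pairs}

neighbour_links = len(unordered_pairs)
halved = len(ordered_pairs) // 2  # what size(...)/2 does in the query

print(neighbour_links, halved)  # both are 7 for this cluster
```

Adding the 5 relationships between the topuser and its direct neighbours gives 7 + 5 = 12 links, and 12/20 = 0.6 for topuser 522.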
Then, my final code was....
WITH [394,2067,1087,209,554,1627,999,516,461,668] as topusers
UNWIND topusers as topuser
MATCH (u1 {id: topuser})-[r1:InteractsWith]-(u2)
WITH topuser, collect(distinct u2.id) as neighbours, count(distinct u2.id) as neighboursCount
// So far: select a topuser, list its "direct neighbours" and count them.
MATCH (u2)-[r2:InteractsWith]-(u3)
WHERE u2.id in neighbours AND u3.id in neighbours
// So far: keep only the neighbours' neighbours that are themselves in the "direct neighbours" list.
WITH topuser, neighboursCount, size(collect(distinct [u2.id, u3.id]))/2+neighboursCount as links
// So far: take all pairs of neighbours (node ids), keep only the distinct pairs (to ignore multiple
// edges between them), collect the unique pairs into a list (the unique relationships between these
// neighbours), take the list size (how many relationships there are), divide by two (to remove
// bi-directional duplicates, i.e. A-B and B-A must be counted as 1), add the number of relationships
// between the topuser and its "direct neighbours", and save the result as the total existing links
// among all "direct neighbours" including the topuser.
WITH topuser, toFloat(links)/(neighboursCount*(neighboursCount-1)) as coefficient
RETURN topuser, round(100*coefficient)/100 as coefficient order by coefficient desc
// Finally, for each topuser, calculate the coefficient rounded to two decimal places.
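To double-check the logic of the query without a database, the same steps can be sketched in Python on an in-memory edge list. This is only a rough mirror of the query under a few assumptions of mine: the function name `coefficient` and the `edges` list are hypothetical, the edge directions and the duplicate edges in the sample data are made up, self-loops are skipped, and a topuser with fewer than two neighbours returns `None` instead of dividing by zero. The neighbour pairs themselves are the ones from Attempt 104.

```python
def coefficient(edges, topuser):
    """Mirror the query's steps on a list of (a, b) user-id pairs."""
    # Direct neighbours of the topuser, ignoring direction and duplicates.
    neighbours = set()
    for a, b in edges:
        if a == topuser and b != topuser:
            neighbours.add(b)
        elif b == topuser and a != topuser:
            neighbours.add(a)
    k = len(neighbours)
    if k < 2:
        return None  # assumption: avoid a zero denominator k*(k-1)

    # Unique undirected relationships among the neighbours themselves
    # (assumption: self-loops are skipped).
    between = {
        frozenset((a, b))
        for a, b in edges
        if a in neighbours and b in neighbours and a != b
    }
    links = len(between) + k  # plus one link per topuser-neighbour pair
    return round(links / (k * (k - 1)), 2)


# Hypothetical edge list for the topuser=522 cluster, with repeated
# and reversed edges to show that they are only counted once.
edges = [
    (522, 444), (522, 770), (1320, 522), (522, 1137), (192, 522),
    (192, 1137), (192, 1137),
    (444, 1137), (1137, 444),
    (444, 1320), (444, 770), (1320, 1137), (1320, 770), (1137, 770),
]

print(coefficient(edges, 522))  # 0.6, i.e. 12 links / 20 possible
```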
Labels and schema
Finally, I do have labels. I just didn't use them in my query. Am I missing something?
Labels:
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;
CREATE CONSTRAINT ON (t:Team) ASSERT t.id IS UNIQUE;
CREATE CONSTRAINT ON (c:TeamChatSession) ASSERT c.id IS UNIQUE;
CREATE CONSTRAINT ON (i:ChatItem) ASSERT i.id IS UNIQUE;
Simplified graph schema:
(u)-[:CreateChat]->(i)-[:PartOf]->(c)-[:OwnedBy]->(t)
(u)-[:CreatesSession] ->(c)
(u)-[:Joins] ->(c)
(u)-[:Leaves] ->(c)
(u)<-[:Mentioned]-(i)
(i1)-[:ResponseTo]->(i2)
(u1)-[:InteractsWith]->(u2)
Final considerations
I know I've written a lot, but I wanted to give proper feedback on your help and thoughts, and hopefully help others who might have the same question.
Thanks again!!!!!
