Hi Tony.
I found a way to do what I needed!!!!
First of all, thanks again for taking the time to help me and provide great insights. I really appreciate it!!
This is my graph for one topuser.
For a selected cluster (topuser=522 and direct neighbours=444,770,1320,1137,192), I needed to calculate a “clustering coefficient”: the total number of existing relationships within this cluster (the “links” variable in my code, which is 12 for this cluster) divided by the total number of possible relationships (for 5 direct neighbours, there are 5*(5-1)=20).
As per my first post, I expected to count the number of unique pairs of node ids. To make it easier to see what I did, I list my attempts below.
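As a quick check of the arithmetic outside Neo4j, here is a minimal Python sketch of that ratio. It follows the definition used in this post; the function name `cluster_coefficient` is made up for illustration and is not part of the Cypher query.

```python
def cluster_coefficient(links: int, neighbours: int) -> float:
    """Existing links divided by possible links, as defined in this post."""
    possible = neighbours * (neighbours - 1)  # 5 * (5 - 1) = 20 for this cluster
    return links / possible


# Cluster of topuser 522: 12 existing links, 5 direct neighbours.
print(cluster_coefficient(12, 5))  # 0.6
```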
MATCH (u1 {id:522})-[r1:InteractsWith]-(u2)
WITH collect(distinct u2.id) as neighbours
MATCH (u2)-[r2:InteractsWith]-(u3)
WHERE u2.id in neighbours AND u3.id in neighbours
RETURN .....
Attempt 101
RETURN u2.id, u3.id
= ╒═══════╤═══════╕
│"u2.id"│"u3.id"│
╞═══════╪═══════╡
│192 │1137 │
├───────┼───────┤
│192 │1137 │
├───────┼───────┤
│444 │1137 │
├───────┼───────┤
│444 │1137 │
├───────┼───────┤
...
├───────┼───────┤
│770 │1137 │
├───────┼───────┤
│770 │1320 │
└───────┴───────┘
*74 rows*
Attempt 102
RETURN [u2.id, u3.id]
= [u2.id, u3.id]
[192, 1137]
[192, 1137]
[444, 1137]
[444, 1137]
...
[770, 1137]
[770, 1320]
*74 rows*
Attempt 103
RETURN distinct [u2.id, u3.id]
= [u2.id, u3.id]
[192, 1137]
[444, 1137]
...
[770, 444]
*14 rows*
Attempt 104
RETURN collect(distinct [u2.id, u3.id])
= [[192, 1137], [444, 1137], [444, 1320], [444, 770], [1320, 1137], [1320, 770], [1320, 444], [1137, 444], [1137, 770], [1137, 1320], [1137, 192], [770, 1320], [770, 1137], [770, 444]]
*1 row (with 14 pairs)*
Attempt 1001
RETURN size(collect(distinct [u2.id, u3.id]))
=14
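The 14 distinct pairs still contain every relation twice, once in each direction. A small Python sketch (an illustration only, using the pairs from Attempt 104) shows that ignoring the order leaves 7 unique pairs, which is why dividing by two works here.

```python
# The 14 distinct directed pairs returned by Attempt 104.
directed_pairs = [
    (192, 1137), (444, 1137), (444, 1320), (444, 770),
    (1320, 1137), (1320, 770), (1320, 444), (1137, 444),
    (1137, 770), (1137, 1320), (1137, 192), (770, 1320),
    (770, 1137), (770, 444),
]

# A frozenset ignores order, so (192, 1137) and (1137, 192) collapse into one entry.
undirected_pairs = {frozenset(pair) for pair in directed_pairs}

print(len(directed_pairs))    # 14
print(len(undirected_pairs))  # 7
```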
Then, my final code was....
WITH [394,2067,1087,209,554,1627,999,516,461,668] AS topusers
UNWIND topusers AS topuser
MATCH (u1 {id: topuser})-[r1:InteractsWith]-(u2)
WITH topuser, collect(DISTINCT u2.id) AS neighbours, count(DISTINCT u2.id) AS neighboursCount
// So far: select a topuser, list its “direct neighbours” and count them.
MATCH (u2)-[r2:InteractsWith]-(u3)
WHERE u2.id IN neighbours AND u3.id IN neighbours
// So far: keep only the neighbours’ neighbours that are themselves in the “direct neighbours” list.
WITH topuser, neighboursCount, size(collect(DISTINCT [u2.id, u3.id]))/2+neighboursCount AS links
// So far: take all pairs of neighbours (node ids), keep only the distinct pairs (to ignore multiple edges between them),
// collect them into a list (the unique relations between these neighbours) and take the list size (how many relations there are).
// Divide by two, because each relation shows up in both directions (A-B and B-A must be counted as 1),
// then add the number of relations between the topuser and its “direct neighbours”.
// The result is the total number of existing links among the “direct neighbours” plus the topuser.
WITH topuser, toFloat(links)/(neighboursCount*(neighboursCount-1)) AS coefficient
RETURN topuser, round(100*coefficient)/100 AS coefficient ORDER BY coefficient DESC
// Finally, for each topuser, calculate the coefficient rounded to two decimal places.
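To double-check the steps without a database, the same logic can be sketched in Python on an in-memory edge list. This is only an illustration under stated assumptions: the `coefficient` helper and the `edges` list are hypothetical, the edge directions and repeated edges are made up (only the neighbours and pairs come from the cluster of topuser 522 above), and self-loops are skipped, which the Cypher query does not do explicitly.

```python
def coefficient(topuser, edges):
    """Mirror the query's steps for one topuser on a list of (a, b) edges."""
    # Treat every edge as undirected and drop duplicates (multiple edges).
    # Assumption: self-loops are ignored.
    undirected = {frozenset(e) for e in edges if e[0] != e[1]}
    # Direct neighbours of the topuser.
    neighbours = {n for e in undirected if topuser in e for n in e if n != topuser}
    k = len(neighbours)
    if k < 2:
        return None  # k * (k - 1) would be zero
    # Unique links between neighbours (A-B and B-A counted once).
    between = sum(1 for e in undirected if e <= neighbours)
    links = between + k  # add the topuser-to-neighbour links
    return round(links / (k * (k - 1)), 2)


# Illustrative edge list for the cluster of topuser 522.
edges = [
    (522, 444), (522, 770), (522, 1320), (522, 1137), (522, 192),
    (192, 1137), (192, 1137),  # repeated edge, collapsed to one link
    (444, 1137), (444, 1137),
    (444, 1320), (444, 770), (1320, 1137), (1320, 770), (1137, 770),
]

print(coefficient(522, edges))  # 0.6, i.e. 12 links / 20 possible
```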
Labels and schema
Finally, I do have labels. I just didn’t use any in my query. Am I missing something?
Labels:
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;
CREATE CONSTRAINT ON (t:Team) ASSERT t.id IS UNIQUE;
CREATE CONSTRAINT ON (c:TeamChatSession) ASSERT c.id IS UNIQUE;
CREATE CONSTRAINT ON (i:ChatItem) ASSERT i.id IS UNIQUE;
Simplified graph schema:
(u)-[:CreateChat]->(i)-[:PartOf]->(c)-[:OwnedBy]->(t)
(u)-[:CreatesSession] ->(c)
(u)-[:Joins] ->(c)
(u)-[:Leaves] ->(c)
(u)<-[:Mentioned]-(i)
(i1)-[:ResponseTo]->(i2)
(u1)-[:InteractsWith]->(u2)
Final considerations
I know I’ve written a lot, but I wanted to give proper feedback on your help and thoughts, and hopefully help others who might have the same question.
Thanks again!!!!!
