Getting to a certain label

So I have an identifier for a node, let's say a user id. Starting from this user id, I want to get the subgraph showing how that user is connected to other entities, such as other users or institutions like a bank or a company. Let's say I want to go out to eight degrees of separation:

(u {id: '32dsf51'})-[*1..8]-()

I go that deep because a transaction between two entities is itself represented as a node.

My question is whether we can count the number of unique node labels. For example, I want to know how many bank accounts there are within the subgraph above, and which ones they are.

I assume we use the function Node(), but I don't know what to put inside it. Is this where we use FOREACH?

When you do something like this, you're matching a path, not a node. On that path (that could be between 1 and 8 hops long) there could be a lot of nodes!

So you want the nodes() function, and you want to use it together with binding the path, like this:

MATCH p = (u {id: '32dsf51'})-[*1..8]-()
RETURN length(p), nodes(p)

If you want to count the unique node labels of everything in nodes(p), this is left as an exercise to the reader. :) But what you want to look into is that nodes(p) returns a list. You'll have lots of paths, so you'll have lots of lists. You'll need to work through those lists and build the unique labels of all of the nodes.
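For anyone who wants a starting point for that exercise, here is a minimal sketch in plain Cypher. It assumes the lookup property is called id (as in the question) and simply flattens every path's node list, deduplicates the nodes, and then groups by label. Note that it includes the starting node, and that an unbounded-type expansion up to 8 hops can produce a very large number of paths, so treat it as an illustration rather than a tuned query.

```cypher
// Sketch only: the property name `id` is an assumption taken from the question.
MATCH p = (u {id: '32dsf51'})-[*1..8]-()
UNWIND nodes(p) AS n          // one row per node on each path
WITH DISTINCT n               // the same node shows up on many paths
UNWIND labels(n) AS label     // one row per (node, label) pair
RETURN label, count(*) AS nodeCount
ORDER BY nodeCount DESC
```

Each returned row is one label together with the number of distinct nodes that carry it, so the number of rows is the number of unique labels.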

I'll add on a bit.

For getting distinct nodes of a subgraph, APOC Procedures should help you out, notably the path expander proc apoc.path.subgraphNodes(). This uses a special type of expansion behavior that is optimized for finding distinct nodes and otherwise pruning potential paths if we've visited a node previously.

Once we have these nodes we can UNWIND the labels of those nodes and get the count of distinct labels.

Oh, and you should definitely be using labels yourself in your match pattern, as otherwise this will do an all nodes scan to find u, which will hurt the performance of your query (you'll also want an index on the label+id for quick lookup).

So let's assume that we're using the label :Node in your graph (replace it with whatever you're actually using).
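The index mentioned above could be created roughly as follows. This is a sketch that assumes the placeholder label :Node and a property named id; the index name node_id is arbitrary, and the exact syntax depends on your Neo4j version.

```cypher
// Neo4j 4.x and later (index name `node_id` is just an example):
CREATE INDEX node_id FOR (n:Node) ON (n.id);

// Neo4j 3.x equivalent:
// CREATE INDEX ON :Node(id);
```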

The query would be:

MATCH (u:Node {id:'32dsf51'}) // though you'll want to parameterize this
CALL apoc.path.subgraphNodes(u, {maxLevel:8}) YIELD node
WITH node
SKIP 1 // ignore the starting node
UNWIND labels(node) as label
RETURN count(DISTINCT label) as uniqueNodeLabels
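If you also want to see which labels occur and which nodes carry them, a variation along these lines might help. It is only a sketch: $userId is a hypothetical parameter name, :Node is still the placeholder label, and it assumes the nodes you care about have an id property (nodes without one are simply left out of the collected list).

```cypher
// Sketch: per-label counts instead of a single total.
MATCH (u:Node {id: $userId})                    // $userId is a hypothetical parameter
CALL apoc.path.subgraphNodes(u, {maxLevel: 8}) YIELD node
WITH u, node
WHERE node <> u                                 // drop the starting node explicitly
UNWIND labels(node) AS label
RETURN label,
       count(node) AS nodeCount,                // how many nodes have this label
       collect(node.id) AS ids                  // which ones (null ids are skipped)
ORDER BY nodeCount DESC
```

The row for your bank account label then gives both the count and the list of ids in one pass.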

Thanks, David and Andrew,

I haven't explored APOC much yet, but I will definitely download it and give it a try.

Something that I tried:

MATCH (b:node)
WHERE b.id = '32dsf51' // total_amount > 5000000
MATCH (b)-[*1..3]-(u)
RETURN collect(CASE WHEN any(x IN labels(u) WHERE x = 'LABEL1') THEN u.id ELSE null END) AS list_of_label1_id,
       count(*) AS num_label1

This gives me the result that I want, but is there a better way to filter by label that is faster than CASE WHEN?

If you're filtering for only specific, known labels that are hardcoded (in this case just 'LABEL1'), then you can use a list comprehension to do both filtering and extraction (of the node id) at once:

MATCH (b:node)-[*1..3]-(u)  // just do the entire pattern here
WHERE b.id = '32dsf51' // total_amount > 5000000
// filter on the label and extract the id in one list comprehension
WITH [node IN collect(u) WHERE node:LABEL1 | node.id] AS list_of_label1_id
RETURN list_of_label1_id, size(list_of_label1_id) AS num_label1