Query slow when relationship depth is 2

Hi, guys!

I have the following nodes and relationships in my graph:

(c:Contact {tel: properties.tel})
(r:Role {roleId: properties.roleId})
(c:Contact)-[ri:ROLE_IS]->(r:Role)
(c1:Contact)-[htc:HAS_TEL_CONTACT]->(c2:Contact)

There are about 100 million nodes and 200 million relationships in the graph.

I want to find the roles of a contact's contacts' contacts, which means following the HAS_TEL_CONTACT relationship to a depth of 2. I wrote the following CQL:

MATCH (:Contact {tel: 'xxxx'})-[:HAS_TEL_CONTACT]->(:Contact)-[:HAS_TEL_CONTACT]->(:Contact)-[:ROLE_IS]->(r:Role)
WITH DISTINCT r
RETURN r.roleId;

But the query is too slow for me. I profiled it, and the result is below:

It takes about 2 minutes to finish, which is much slower than doing the same thing in MySQL.
I am running Neo4j Community Edition v3.5.4. The CPU has 8 cores. The initial and max heap size are both 16G.

Is there anything I can improve in the CQL or the Neo4j configuration to make the query faster?

Thanks!

I'm also very much a beginner with CQL, but how about a query like this?

MATCH(c1:Contact)
MATCH(c2:Contact)
MATCH(c3:Contact)
MATCH(r:Role)
MERGE(c1)-[htc:HAS_TEL_CONTACT]->(c2)-[htc:HAS_TEL_CONTACT]->(c3)-[:ROLE_IS]->(r)
WITH DISTINCT r
RETURN r.roleId

However, I'm not sure the syntax I have written is correct. I hope it gives you a hint. Let me know if it works for you.

Thanks,
Bhojendra

Try this query:

MATCH (cr:Contact)-[:ROLE_IS]->(r:Role)
WITH COLLECT(r) as r1, COLLECT(cr) as cr1
UNWIND r1 as r2
UNWIND cr1 as cr2

MATCH (:Contact {tel: 'xxxx'})-[:HAS_TEL_CONTACT]->(:Contact)-[:HAS_TEL_CONTACT]->(cr2)-[:ROLE_IS]->(r2)
WITH DISTINCT r2
RETURN r2.roleId;

It doesn't work. And I don't need MERGE; I only need MATCH.

Thanks for your reply! The query is also very slow, even slower than mine. It seems the first MATCH takes too much time.

Let's see the count of distinct contacts vs non-distinct. Try running each of these, noting both the number of results and the time taken:

MATCH (:Contact {tel: 'xxxx'})-[:HAS_TEL_CONTACT*2]->(c:Contact)
RETURN count(c) as count

and

MATCH (:Contact {tel: 'xxxx'})-[:HAS_TEL_CONTACT*2]->(c:Contact)
RETURN count(DISTINCT c) as distinctCount

I picked a tel from my graph at random.
The first query (without DISTINCT) takes 3233 ms. The count is 46156.
The second query (with DISTINCT) takes 144 ms, which is very fast, although I think that is because it hit the cache. The count is 35296.
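
Those counts suggest that roughly a quarter of the two-hop paths end at a contact already reached by another path (46156 paths vs. 35296 distinct contacts), so the ROLE_IS expansion in the original query repeats work for those duplicates. A minimal sketch of deduplicating after each hop follows. It assumes the same labels and relationship types as in the question; the variable names c1 and c2 are only illustrative, and it has not been run against this data, so PROFILE it before drawing conclusions:

```cypher
// Hop 1: contacts of the starting contact, each kept once
MATCH (:Contact {tel: 'xxxx'})-[:HAS_TEL_CONTACT]->(c1:Contact)
WITH DISTINCT c1
// Hop 2: contacts of those contacts, again deduplicated
MATCH (c1)-[:HAS_TEL_CONTACT]->(c2:Contact)
WITH DISTINCT c2
// Expand roles only once per distinct second-level contact
MATCH (c2)-[:ROLE_IS]->(r:Role)
RETURN DISTINCT r.roleId;
```

Note that this returns distinct roleId values rather than one row per distinct Role node, which only differs from the original query if several Role nodes share the same roleId.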

Did you create an index on that node property? It will make your query faster.
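
For reference, a sketch of how single-property indexes can be created and listed in Neo4j 3.5, the version mentioned in the question. The labels and properties are taken from the model in the question; later Neo4j versions use a different CREATE INDEX syntax:

```cypher
// Run each statement separately
// Index used to look up the starting contact by tel
CREATE INDEX ON :Contact(tel);
// Index on the role identifier
CREATE INDEX ON :Role(roleId);
// List existing indexes and check that they are online
CALL db.indexes();
```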

I suggest using EXPLAIN to understand how many nodes are being processed.
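
As a sketch, both keywords are simply placed in front of the query. EXPLAIN shows the planned operators with estimated row counts without running the query, while PROFILE runs it and reports actual rows and db hits per operator. The query used here is the one from the question:

```cypher
// Plan only, nothing is executed
EXPLAIN
MATCH (:Contact {tel: 'xxxx'})-[:HAS_TEL_CONTACT]->(:Contact)-[:HAS_TEL_CONTACT]->(:Contact)-[:ROLE_IS]->(r:Role)
WITH DISTINCT r
RETURN r.roleId;

// Executes the query and reports actual rows and db hits
PROFILE
MATCH (:Contact {tel: 'xxxx'})-[:HAS_TEL_CONTACT]->(:Contact)-[:HAS_TEL_CONTACT]->(:Contact)-[:ROLE_IS]->(r:Role)
WITH DISTINCT r
RETURN r.roleId;
```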

Yes, I have created indexes on the "tel" property and the "roleId" property. You can see the number of nodes processed in the profile result picture.