How to get query plan and how to create and use an index on id?

jayxu688 · August 2, 2020, 12:48pm

Hi,
I am a new user of Neo4j. I am using Neo4j 3.5 now.

My first question: how can I show a query plan like the one below, which presents the performance data more clearly?

```
Compiler CYPHER 4.1
Planner COST
Runtime INTERPRETED
Runtime version 4.1

+-----------------+-------------------------------------------------------------------------------------------------+----------------+------+---------+-----------------+-------------------+----------------------+
| Operator        | Details                                                                                         | Estimated Rows | Rows | DB Hits | Page Cache Hits | Page Cache Misses | Page Cache Hit Ratio |
+-----------------+-------------------------------------------------------------------------------------------------+----------------+------+---------+-----------------+-------------------+----------------------+
| +ProduceResults | person                                                                                          |              1 |    1 |       0 |               0 |                 0 |               0.0000 |
| |               +-------------------------------------------------------------------------------------------------+----------------+------+---------+-----------------+-------------------+----------------------+
| +NodeIndexSeek  | person:Person(firstname, surname) WHERE firstname STARTS WITH $autostring_0 AND exists(surname) |              1 |    1 |       2 |               0 |                 0 |               0.0000 |
+-----------------+-------------------------------------------------------------------------------------------------+----------------+------+---------+-----------------+-------------------+----------------------+

Total database accesses: 2, total allocated memory: 0
```
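A minimal sketch for Neo4j 3.5. The query behind the plan above was not posted, so the Person query here is an assumed reconstruction. EXPLAIN returns the plan without running the query; PROFILE runs it and adds the actual rows and db hits for each operator. In Neo4j Browser the result appears in the Plan view, and cypher-shell prints it as a text table. As far as I know, the Memory (Bytes) column and the "total allocated memory" figure only appear in 4.1+ output like the one above, so 3.5 will not show them.

```cypher
// Assumption: a :Person(firstname, surname) data set like the one the plan above came from.

// EXPLAIN: plan only; the query is not executed, so no actual rows or db hits are reported.
EXPLAIN
MATCH (person:Person)
WHERE person.firstname STARTS WITH 'And' AND exists(person.surname)
RETURN person;

// PROFILE: the query is executed; each operator reports its actual rows and db hits.
PROFILE
MATCH (person:Person)
WHERE person.firstname STARTS WITH 'And' AND exists(person.surname)
RETURN person;
```

Run the two statements one at a time in Browser (or enable its multi-statement editor), or pass them to cypher-shell.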

Second question: I created an index on id, but it does not seem to be used.

```
CREATE INDEX ON :Artifact(id)
```

Then I executed this Cypher query:

```
explain Match (some:Artifact)-[:DEPEND_ON*]->(a:Artifact {gav:"org.slf4j:slf4j-api:1.7.21"})
where not exists(()-[:DEPEND_ON]->(some)) and 0<ID(some)<1000
return distinct some.gav
```

The resulting plan was posted as a screenshot, which was not preserved.

anthapu · August 2, 2020, 3:08pm

You have to create an index on a property, not on the ID. The internal graph ID, ID(some), does not need an index.

Also, it is not a good idea to use IDs in your queries like this: node IDs are not guaranteed to be as sequential as you might think, and Neo4j can reuse the IDs of deleted nodes.
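To illustrate the difference: an equality lookup on the internal ID needs no index, because the planner can fetch the node directly, and the profile should start with a NodeByIdSeek operator. CREATE INDEX ON :Artifact(id) instead indexes a property named id, which is unrelated to id(n). A hedged sketch; 179110 is the node ID mentioned later in this thread, and whether Artifact nodes store an id property at all is an assumption.

```cypher
// Lookup by internal node ID: planned as NodeByIdSeek, no index involved.
PROFILE
MATCH (a)
WHERE id(a) = 179110
RETURN a;

// Lookup by a *property* named id: this is what CREATE INDEX ON :Artifact(id) covers.
// Assumption: Artifact nodes actually have an id property; otherwise this returns nothing.
PROFILE
MATCH (a:Artifact {id: 179110})
RETURN a;
```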

You need an index on the gav property here.

```
CREATE INDEX ON :Artifact(gav)
```

The lookup on gav will then use this index.
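A sketch of the full cycle on 3.5: create the index, wait until it is online, check its state, then profile the lookup. If the index is used, the plan should start with NodeIndexSeek on :Artifact(gav) rather than NodeByLabelScan followed by a Filter. The statements are meant to be run one at a time.

```cypher
// Create an index on the property used in the lookup.
CREATE INDEX ON :Artifact(gav);

// Block until all indexes are ONLINE (the argument is a timeout in seconds).
CALL db.indexes();
CALL db.awaitIndexes(300);

// Expect NodeIndexSeek on :Artifact(gav) as the first operator.
PROFILE
MATCH (a:Artifact {gav: "org.slf4j:slf4j-api:1.7.21"})
RETURN a;
```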

jayxu688 · August 2, 2020, 4:40pm

Thanks.

The requirement is to query all the Artifact nodes that directly or indirectly depend on the Artifact with gav org.slf4j:slf4j-api:1.7.21.

This graph is built from the Maven Central Repository (a Java library repository), and a huge number of other Artifacts depend on the org.slf4j:slf4j-api:1.7.21 Artifact, either directly or indirectly.

The simplest data model is (:Artifact)-[:DEPEND_ON]->(:Artifact) (the diagram was not preserved).

My solution is as follows:
Step 1: Find all the Artifact nodes that no other Artifact depends on. (If other Artifacts depended on one of these nodes, they would depend on org.slf4j:slf4j-api:1.7.21 too, so the search starts from the top of each dependency chain.) Call this result a.

Step 2: Filter result a to keep only the nodes that actually reach org.slf4j:slf4j-api:1.7.21 via DEPEND_ON*. Call this result b.
Cypher:

```
Match (some:Artifact) where 0<id(some)<1000 and not exists(()-[:DEPEND_ON]->(some)) with collect(some) as col
Match (a:Artifact) where id(a)=179110  with a
FOREACH (n IN col| match p=shortestpath((n)-[:DEPEND_ON*]->(a))
return case p when p then n end AS result)
```

Running it fails with an error (the screenshot was not preserved).

Step 3: Iterate over result b and collect the distinct nodes on all paths from each node in b to org.slf4j:slf4j-api:1.7.21. That is the final result for the requirement.
For example:

```
match (a:Artifact {gav:"net.wessendorf.kafka:kafka-cdi-extension:0.0.9"}),
      (b:Artifact {gav:"org.slf4j:slf4j-api:1.7.21"}),
      p=allshortestpaths((a)-[:DEPEND_ON*]->(b))
return p
```


I also tried SKIP ... LIMIT to implement pagination, but as the SKIP value grows, the query becomes extremely slow.

```
MATCH (a:Artifact {gav: "org.slf4j:slf4j-api:1.7.21"})<-[:DEPEND_ON*]-(some)
where a<>some return distinct some skip n limit m
```

Is there a better way to do this?
Or how can I make my Cypher query above run?

andrew_bowman · August 4, 2020, 6:09pm

You need to understand that WITH changes which variables are in scope. Only the variables you include stay in scope; any others are dropped.

In the query that's erroring out, the problem is WITH a. Change it to WITH a, col so that col remains in scope.

Also, you can't use MATCH inside a FOREACH (only write clauses are allowed), so you'll need to use UNWIND on col instead.
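A self-contained sketch of both points, using literal values instead of the Artifact data. FOREACH (n IN col | MATCH ...) is rejected because FOREACH only accepts updating clauses such as CREATE, MERGE, SET, REMOVE and DELETE; UNWIND instead turns the list into rows that later read clauses can work on.

```cypher
// WITH passes on only the variables it names: both col and a stay in scope here.
WITH [1, 2, 3] AS col, 'target' AS a
// UNWIND turns the list into one row per element...
UNWIND col AS n
// ...so later read clauses (MATCH, WHERE, RETURN) run once per element.
RETURN a, n;
```

This returns three rows, each pairing 'target' with one element of the list. Dropping a from the WITH would make the RETURN fail with an "a not defined" error, which is the same scoping problem as in the original query.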

jayxu688 · August 4, 2020, 6:21pm

Thanks, Andrew.
So, what should the correct Cypher look like?
I am a new user of Neo4j.
Thanks for your help.

andrew_bowman · August 4, 2020, 6:25pm

This compiles:

```
MATCH (some:Artifact)
WHERE 0 < id(some) < 1000 and not ()-[:DEPEND_ON]->(some)
WITH collect(some) as col
MATCH (a:Artifact)
WHERE id(a)=179110
WITH a, col
UNWIND col as n
MATCH p = shortestpath((n)-[:DEPEND_ON*]->(a))
RETURN CASE p WHEN p THEN n END AS result
```

Though I'm not sure it will do what you want. I'm not quite sure what you're intending by that RETURN.
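One possible reading of that RETURN: a MATCH on shortestPath produces no row at all when no path exists, so CASE p WHEN p THEN n END simply evaluates to n. If the goal is only to keep the start nodes that reach the target (steps 1 and 2), a pattern predicate checks reachability without building paths. A hedged sketch; the ID bounds and 179110 are taken from the queries above.

```cypher
MATCH (a:Artifact)
WHERE id(a) = 179110
MATCH (n:Artifact)
WHERE 0 < id(n) < 1000
  AND NOT ()-[:DEPEND_ON]->(n)
  // Pattern predicate: true if at least one DEPEND_ON path leads from n to a.
  AND (n)-[:DEPEND_ON*]->(a)
RETURN n.gav;
```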

jayxu688 · August 4, 2020, 6:48pm

The requirement is to query all the Artifact nodes that directly or indirectly depend on the Artifact with the property gav: org.slf4j:slf4j-api:1.7.21, whose id is 179110.

I want to get all the nodes shown in the diagram (not preserved).


andrew_bowman · August 4, 2020, 8:27pm

Do you need paths to each of these nodes, or do you just need the nodes?

You also did some pre-matching based on ids. Is that still needed, or is it enough to get all connected :Artifact nodes that don't have anything depending on them?

jayxu688 · August 4, 2020, 9:22pm

Hi, Andrew,
I want to get all the :Artifact nodes related to the target node: not only the nodes that nothing depends on, but also the nodes on the paths from those start nodes to the target node in the diagram, excluding the target node itself.

In short: all the :Artifact nodes that depend on the target node, either directly or indirectly.

The pre-matching is not necessary.
Is that clear?

andrew_bowman · August 4, 2020, 10:39pm

That sounds like it should be a fairly simple query, so let me know if there's something I've overlooked:

```
MATCH (a:Artifact)<-[:DEPEND_ON*]-(n:Artifact)
RETURN DISTINCT n
```

On a larger graph it may be more efficient to use the APOC path expander procedures:

```
MATCH (a:Artifact)
CALL apoc.path.subgraphNodes(a, {relationshipFilter:'<DEPEND_ON', labelFilter:'>Artifact'}) YIELD node
RETURN node
```

jayxu688 · August 4, 2020, 11:24pm

Hi, Andrew
Thank you very much.
You saved my day.

I had tried APOC before, but I used apoc.path.expand... That was the wrong direction.
I tried your query like this:

```
MATCH (a:Artifact {gav:"org.slf4j:slf4j-api:1.7.21"})
CALL apoc.path.subgraphNodes(a, {relationshipFilter:'<DEPEND_ON', labelFilter:'>Artifact'}) YIELD node
RETURN node
```

And it's quite efficient:
Started streaming 164139 records after 6 ms and completed after 9674 ms, displaying first 1000 rows.
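subgraphNodes also returns the start node itself, so the result above includes org.slf4j:slf4j-api:1.7.21, which was meant to be excluded. A hedged sketch that filters it out with YIELD ... WHERE and returns only the gav property, which may also reduce the amount of data streamed back to the client compared with returning whole nodes.

```cypher
MATCH (a:Artifact {gav: "org.slf4j:slf4j-api:1.7.21"})
CALL apoc.path.subgraphNodes(a, {
  relationshipFilter: '<DEPEND_ON',  // follow DEPEND_ON backwards, i.e. towards dependents of a
  labelFilter: '>Artifact'           // end-node filter: only :Artifact nodes are returned
}) YIELD node
WHERE node <> a                      // drop the target node itself
RETURN node.gav;
```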

One more question:
How can I get the execution time of a Cypher query and the memory it allocates in Neo4j Desktop?
Or is there any other way to get this data?

I used PROFILE MATCH ..., but it does not show the total allocated memory.
I want to get information like the output below. My Neo4j version is 3.5.15.

```
Compiler CYPHER 4.1
Planner COST
Runtime INTERPRETED
Runtime version 4.1

+-------------------+---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| Operator          | Details                                           | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits | Page Cache Misses | Page Cache Hit Ratio | Order      |
+-------------------+---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +ProduceResults   | `p.name`, `count(m)`                              |             13 |  102 |       0 |                |               0 |                 0 |               0.0000 | p.name ASC |
| |                 +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +Sort             | `p.name` ASC                                      |             13 |  102 |       0 |          22048 |               0 |                 0 |               0.0000 | p.name ASC |
| |                 +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +EagerAggregation | cache[p.name] AS `p.name`, count(m) AS `count(m)` |             13 |  102 |       0 |          13768 |               0 |                 0 |               0.0000 |            |
| |                 +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +Filter           | m:Movie                                           |            172 |  172 |     172 |                |               0 |                 0 |               0.0000 |            |
| |                 +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +Expand(All)      | (p)-[anon_17:ACTED_IN]->(m)                       |            172 |  172 |     297 |                |               0 |                 0 |               0.0000 |            |
| |                 +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +NodeIndexScan    | p:Person(name) WHERE exists(name), cache[p.name]  |            125 |  125 |     126 |                |               0 |                 0 |               0.0000 |            |
+-------------------+---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+

Total database accesses: 595, total allocated memory: 32672
```
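On 3.5, one hedged option for memory: dbms.listQueries() lists the currently running queries with elapsedTimeMillis and allocatedBytes columns. As far as I know, allocatedBytes is only populated when allocation tracking is enabled in neo4j.conf (dbms.track_query_allocation=true), and the procedure may require Enterprise Edition; both points are assumptions worth checking against the 3.5 operations manual. The elapsed time alone is already visible in Browser's "Started streaming ... completed after ... ms" line.

```cypher
// Run from a second session while the slow query is still executing.
// Assumption: allocation tracking is enabled, otherwise allocatedBytes may be null.
CALL dbms.listQueries()
YIELD queryId, query, elapsedTimeMillis, allocatedBytes
RETURN queryId, query, elapsedTimeMillis, allocatedBytes;
```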
