I discovered that Directed Relationship Index Seek does not work properly inside an `OPTIONAL MATCH`.
Neo4j Version: 4.4.4
Operating System: Ubuntu 20.04 / Docker
API: Cypher
Steps to reproduce
- Add test data
MERGE (ip:IP{val: '192.168.1.1'}) - [open:Open] -> (port:Port{val: '22'}) - [bind:Bind{ip: ip.val}] -> (service:Service{val: 'ssh'});
MERGE (ip:IP{val: '192.168.1.2'}) - [open:Open] -> (port:Port{val: '80'}) - [bind:Bind{ip: ip.val}] -> (service:Service{val: 'http'});
MERGE (ip:IP{val: '192.168.1.3'}) - [open:Open] -> (port:Port{val: '443'}) - [bind:Bind{ip: ip.val}] -> (service:Service{val: 'https'});
MERGE (ip:IP{val: '192.168.1.6'}) - [open:Open] -> (port:Port{val: '5432'}) - [bind:Bind{ip: ip.val}] -> (service:Service{val: 'postgresql'});
MERGE (ip:IP{val: '192.168.1.4'}) - [open:Open] -> (port:Port{val: '6379'}) - [bind:Bind{ip: ip.val}] -> (service:Service{val: 'redis'});
CREATE INDEX index_bind_ip IF NOT EXISTS FOR () - [bind:Bind] - () ON (bind.ip);
CREATE CONSTRAINT unique_index_ip_val IF NOT EXISTS FOR (n:ip) REQUIRE n.val IS UNIQUE;
CREATE CONSTRAINT unique_index_port_val IF NOT EXISTS FOR (n:port) REQUIRE n.val IS UNIQUE;
CREATE CONSTRAINT unique_index_service_val IF NOT EXISTS FOR (n:service) REQUIRE n.val IS UNIQUE;
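Before profiling, it may be worth confirming that the relationship index exists and has finished populating. The following is a minimal check, assuming the `SHOW INDEXES` command with `YIELD` and `WHERE` as documented for Neo4j 4.4; the column selection is only a suggestion.

```cypher
// Look up the relationship index created in the step above by its name.
// A usable index is expected to report state ONLINE and entityType RELATIONSHIP.
SHOW INDEXES
YIELD name, state, entityType, labelsOrTypes, properties
WHERE name = 'index_bind_ip';
```

If no row comes back, or the state is not `ONLINE`, the planner cannot use the index regardless of how the query is written.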
- Run a Cypher query
MATCH (ip:IP) - [open:Open] -> (port:Port)
OPTIONAL MATCH (port) - [bind:Bind] -> (service:Service) where bind.ip = ip.val
RETURN count(1)
- The query above runs very slowly when the dataset is large. It should be more efficient, because I added indexes on all nodes and relationships in the first step. After profiling it via `profile ...`, I discovered that the index named `index_bind_ip`, on the `ip` property of the `Bind` relationship, is not used. The corresponding execution plan was attached as an SVG image.
- I then added `using index ...` to force Neo4j to use the relationship index:
PROFILE MATCH (ip:IP) - [open:Open] -> (port:Port)
OPTIONAL MATCH (port) - [bind:Bind] -> (service:Service) using index bind:Bind(ip) where bind.ip = ip.val
RETURN count(1)
It fails with the following error:
Failed to fulfil the hints of the query.
Could not solve these hints: `USING INDEX bind:Bind(ip)`
Then I replaced `ip.val` with `""`, and the execution plan is divided into two parts:
PROFILE MATCH (ip:IP) - [open:Open] -> (port:Port)
OPTIONAL MATCH (port) - [bind:Bind] -> (service:Service) using index bind:Bind(ip) where bind.ip = ""
RETURN count(1)
With the original predicate, the left branch cannot find `ip.val`, which is bound in the right branch, so the planner complains.
- I searched the related documentation about `query tuning` at https://neo4j.com/docs/cypher-manual/current/query-tuning/using/. My understanding is that Neo4j sets a `starting point` for each `using index ...`, and the resulting branches are then executed in parallel. However, `bind.ip` should refer to `ip.val`, while `ip` ends up in another branch once `using index ...` is applied, so the original logic is broken. I want to know if there is any solution (or workaround) for this conflict.
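One rewrite that might be worth profiling is to move the `OPTIONAL MATCH` into a correlated subquery, so that `ip` and `port` are passed in explicitly. This is only a sketch: it should return the same count as the original query, but whether the planner then chooses a relationship index seek is an assumption that has to be checked with `PROFILE` on the actual data.

```cypher
// Same row count as the original query: every (ip, port) pair yields
// at least one row, with bind and service set to null when nothing matches.
PROFILE
MATCH (ip:IP) - [open:Open] -> (port:Port)
CALL {
  WITH ip, port  // import the outer variables into the subquery
  OPTIONAL MATCH (port) - [bind:Bind] -> (service:Service)
  WHERE bind.ip = ip.val
  RETURN bind, service
}
RETURN count(1)
```

If the resulting plan still contains no index seek on `Bind(ip)`, this rewrite does not help and the original question stands.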

