WITH .. WHERE exists((e1)--(e2)) takes too much time (to check whether any relationship exists between nodes)

glilienfield · July 26, 2022, 8:05am

I reformatted the query so I can understand the predicate. Adding line feeds, I got the following:

MATCH(subject:Entity)-[:`https://www.wikidata.org/wiki/Property:P31`|`https://www.wikidata.org/wiki/Property:P279`]->(adjective:Entity), (subject)-[]->(subjectProp) 
where (
(adjective.name = "Painting" and (subjectProp.name = "Watercolor" or subjectProp.value =~ "(?i).*Watercolor.*")) 
or 
(adjective.name = "Watercolor Painting") 
or 
(adjective.name = "Painting" and (subjectProp.name = "Watercolor" or subjectProp.value =~ "(?i).*Watercolor.*")) 
or 
(adjective.name = "Watercolor Painting")
) 
RETURN DISTINCT subject SKIP 0 LIMIT 10

Is this what you intended? The same predicate appears twice in the 'where' clause. Also, one condition (adjective.name = "Watercolor Painting") places no constraint on 'subjectProp', so you are probably getting a lot of rows whenever that condition matches. Definitely remove the redundant predicates, as the query plan was more complex with the extra filtering. Here is the query without the duplicates:

MATCH(subject:Entity)-[:`https://www.wikidata.org/wiki/Property:P31`|`https://www.wikidata.org/wiki/Property:P279`]->(adjective:Entity), (subject)-[]->(subjectProp) 
where (
(adjective.name = "Painting" and (subjectProp.name = "Watercolor" or subjectProp.value =~ "(?i).*Watercolor.*")) 
or 
(adjective.name = "Watercolor Painting") 
) 
RETURN DISTINCT subject SKIP 0 LIMIT 10

cuneyttyler · July 26, 2022, 9:02am

I'm sorry, I should have organized the query. The repeated part is situational, though: it changes according to the user's search input. I ran the second query you provided with the PROFILE keyword. It first does an index seek on Adjective. It then expands the subject--adjective match, which results in 80k rows. After that it expands the subject--subjectProp relationship, which results in 3 million rows; that means each Entity has about 40 subjectProp nodes, which is indeed the case. Finally it applies one more step (see the attached screenshot) that results in 6 million rows, and I didn't quite understand what it is doing there. That's why the query is expensive. I would have expected that, though, because it needs to expand every Entity to check whether it has the desired subjectProp.name. I'm not sure whether there is an efficient way to run this query; it is simply 'filter nodes by relationship'.

Screenshot from 2022-07-26 12-10-11.png

EDIT: What I'm thinking is that I need to reorganize my data so that each Entity has a property containing all of its relationship names and values, and then create a text index on that property. When I search for subjectProp.name and subjectProp.value, I would not match subjectProp at all but simply do an index search on the property I created. That way the query won't expand to millions of rows. I hope a TEXT index will be enough, rather than a FULL TEXT index.
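A minimal sketch of that idea, assuming Neo4j 4.4 or later (where TEXT indexes are available) and assuming 'name' and 'value' are string properties. The property name 'searchText' and the index name 'entity_search_text' are made up for illustration. The text is lowercased on write because a TEXT index serves CONTAINS, STARTS WITH and ENDS WITH, which are case-sensitive, and does not serve the regex '=~' predicate.

```cypher
// 1) Denormalize: collect the names and values of each Entity's outgoing
//    neighbours into one lowercased string property (hypothetical name: searchText).
//    On a graph with millions of nodes this write would need to be batched.
MATCH (e:Entity)-->(p)
WITH e, collect(DISTINCT toLower(coalesce(p.name, '') + ' ' + coalesce(p.value, ''))) AS parts
SET e.searchText = reduce(acc = '', part IN parts | acc + ' ' + part);

// 2) Index the new property (hypothetical index name).
CREATE TEXT INDEX entity_search_text IF NOT EXISTS
FOR (e:Entity) ON (e.searchText);

// 3) Search without expanding (subject)-->(subjectProp).
//    The search term must be lowercased by the application as well.
MATCH (subject:Entity)-[:`https://www.wikidata.org/wiki/Property:P31`|`https://www.wikidata.org/wiki/Property:P279`]->(adjective:Entity)
WHERE adjective.name = "Painting" AND subject.searchText CONTAINS "watercolor"
RETURN DISTINCT subject SKIP 0 LIMIT 10;
```

Note that this is not strictly equivalent to the original predicate: it does a substring match on both name and value, whereas the original compares subjectProp.name for equality. Whether the planner actually uses the text index here is worth confirming with PROFILE.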

glilienfield · July 26, 2022, 1:41pm

The second 'or' clause does not have a constraint on the subjectProp variable, and the query returns just the subject, not the subjectProp. As a result, you can remove the '(subject)-->(subjectProp)' match for this scenario. That avoids expanding the results of the first match pattern in the second scenario. Is there a reason to include this pattern, considering you are not using the subjectProp node? Are you trying to ensure that a relationship from subject to another entity exists, besides the relationship matched in the first pattern? If so, we can implement it as a check for existence instead of expanding the result set unnecessarily.

Here is the query with the (subject)-->(subjectProp) pattern removed from the branch that has no constraint on subjectProp. I used a 'UNION' clause to separate the two branches.

call {
match (subject:Entity)-[:`https://www.wikidata.org/wiki/Property:P31`|`https://www.wikidata.org/wiki/Property:P279`]->(adjective:Entity), (subject)-[]->(subjectProp)
where adjective.name = "Painting" and (subjectProp.name = "Watercolor" or subjectProp.value =~ "(?i).*Watercolor.*")
return subject
UNION
match (subject:Entity)-[:`https://www.wikidata.org/wiki/Property:P31`|`https://www.wikidata.org/wiki/Property:P279`]->(adjective:Entity)
where adjective.name = "Watercolor Painting"
return subject
}
RETURN DISTINCT subject SKIP 0 LIMIT 10

If you do want to ensure that a relationship exists from 'subject' to another entity, other than the entity matched in the (subject)-->(adjective) pattern, then we need to check that there is more than one outgoing relationship from the 'subject' entity.

call {
match (subject:Entity)-[:`https://www.wikidata.org/wiki/Property:P31`|`https://www.wikidata.org/wiki/Property:P279`]->(adjective:Entity), (subject)-[]->(subjectProp)
where adjective.name = "Painting" and (subjectProp.name = "Watercolor" or subjectProp.value =~ "(?i).*Watercolor.*")
return subject
UNION
match (subject:Entity)-[:`https://www.wikidata.org/wiki/Property:P31`|`https://www.wikidata.org/wiki/Property:P279`]->(adjective:Entity)
where adjective.name = "Watercolor Painting"
call {
    with subject
    match (subject)-[]->(subjectProp)
    return count(*) as cnt
}
with subject, cnt
where cnt>1
return subject
}
RETURN DISTINCT subject SKIP 0 LIMIT 10

BTW, do you really want ten records from the query, or is that limit just for test purposes? If you really only want ten, then I would think you could limit the number of records earlier, so the query doesn't generate millions of rows and then keep only ten.
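One way to express "check for existence instead of expanding" for the 'Painting' branch is an existential subquery, available from Neo4j 4.0. This is only a sketch under that assumption, not a measured improvement: the subquery yields no extra rows, so each subject--adjective match stays a single row instead of being multiplied by its roughly 40 neighbours.

```cypher
// Keep one row per (subject)-->(adjective) match and test the neighbours
// inside EXISTS { ... } instead of expanding them into the result set.
MATCH (subject:Entity)-[:`https://www.wikidata.org/wiki/Property:P31`|`https://www.wikidata.org/wiki/Property:P279`]->(adjective:Entity)
WHERE adjective.name = "Painting"
  AND EXISTS {
    MATCH (subject)-->(subjectProp)
    WHERE subjectProp.name = "Watercolor"
       OR subjectProp.value =~ "(?i).*Watercolor.*"
  }
RETURN DISTINCT subject SKIP 0 LIMIT 10
```

The neighbours of each candidate subject still have to be inspected until one matches, so the number of db hits may remain high; comparing the PROFILE output of both forms would show whether it helps on this data.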

cuneyttyler · July 26, 2022, 2:31pm

Thanks for the answer. My application is semantic search, and I have Entities plus the relationships that define them. For example, my paintings are Entity nodes that have an 'instanceof' (Property:P31) relationship to the 'Painting' Entity. When I want to search for just 'paintings' there is no need for subjectProp, but when I want to search for 'Watercolor paintings' I need to filter the Painting entities down to those that have a relationship to anything containing 'Watercolor'. One of the above queries does this. I'm copying it here:

MATCH(subject:Entity)-[:`https://www.wikidata.org/wiki/Property:P31`|`https://www.wikidata.org/wiki/Property:P279`]->(adjective:Entity), (subject)-[]->(subjectProp) 
where (
(adjective.name = "Painting" and (subjectProp.name = "Watercolor" or subjectProp.value =~ "(?i).*Watercolor.*")) 
or 
(adjective.name = "Watercolor Painting") 
) 
RETURN DISTINCT subject SKIP 0 LIMIT 10

The point you make is interesting: the second 'or' clause has no subjectProp filter. That's a big issue. The query you provided now runs in 8 seconds; the inner 'call' and the with..where part are unnecessary for my case.

Maybe 8 seconds is as low as this kind of query can go (it might decrease on a machine with a lot of cores), but for a production search app that is too much. How about the solution I mentioned in the EDIT section of my previous response? I am creating an additional property on each Entity containing the names of its connected entities, and I am creating an index on that property. This is simply building a search index by hand. How does that sound to you? I'm running this now and I'll see how it performs. It seems to me to be the only solution for now for executing such a query on a huge graph in approximately 1 second, because the last query you provided also has millions of db hits.

About LIMIT: I simply perform paging with SKIP and LIMIT in my web page (SKIP is omitted here).
