Tuning Cypher queries for list operations

I have imported one RDF file into Neo4j using NeoSemantics.
Total number of nodes: 1071461 (about 1 million)
Total number of relationships: 2553482 (about 2.5 million)
When I run traversal queries against this imported graph, execution time is usually around 1-100 milliseconds. But when I use lists in my queries, execution time is much higher, ranging from 1.5 to 5 seconds.

Example of a list query (response time 1.5-5 seconds):

MATCH (n:Resource)
WHERE n.skos__prefLabel = "Anti-Allergic Agents"
WITH n.skos__notation AS dm_notation
MATCH (n)
WHERE dm_notation IN n.ns3__PA
RETURN n.skos__prefLabel

Example of a traversal query (response time 1-100 ms):

MATCH (n:Resource {skos__prefLabel: 'metabolism'})<-[:rdfs__subClassOf]-(p)
RETURN p.skos__prefLabel

Is there any way to bring down the execution time of such queries?
Any help would be appreciated. Thanks in advance!
P.S.:
I have added single-property indexes on Resource nodes for the skos__prefLabel, skos__notation and ns3__PA properties.

I think it makes sense that you have quite a big difference in performance here, for a few reasons:

  • Filtering on property values is generally going to be slower than following labels and relationships, especially when no index can serve the predicate.

  • Your WITH clause passes on a property of the resource, but anything not listed in the WITH clause isn't carried over. Since n isn't carried over, the n in your second MATCH (n) is a new, unbound variable with no label, so that MATCH literally does a comparison on every node in your entire graph. I am unsure if this is intentional or not.

  • Also, you're still in a bad spot because you're taking each row's property and comparing it against EVERY other row EACH time. You could probably improve it with something like this (I'm tired so someone else please help out):

MATCH (n:MyLabel { myFilterProp: "someValue" })
WITH collect(DISTINCT n.myPropToCheck) AS checkList
// m is a fresh variable; the label keeps the scan off the rest of the graph
MATCH (m:MyLabel)
WHERE any(prop IN m.listProp WHERE prop IN checkList)
RETURN m
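
To see where the time actually goes, the query can be prefixed with PROFILE, which executes it and reports the operators and database hits of the plan. A minimal sketch using the property names from the question; the expectation written in the comments is an assumption about the likely plan, not a measured result:

```cypher
// PROFILE runs the query and shows the executed plan with db hits per operator.
// Assumption: the first MATCH is served by the index on skos__prefLabel,
// while the second is a label scan plus a filter, because a membership test
// against a list property typically cannot be answered from the ns3__PA index.
PROFILE
MATCH (n:Resource { skos__prefLabel: "Anti-Allergic Agents" })
WITH n.skos__notation AS dm_notation
MATCH (m:Resource)
WHERE dm_notation IN m.ns3__PA
RETURN m.skos__prefLabel
```

If the plan shows most db hits in the filter over all Resource nodes, the list comparison itself is the bottleneck rather than the initial lookup.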

Thank you for the suggestion. It did improve my response time, which came down from 1.5 seconds to 900 ms! But that is still higher than the execution time I expected. I agree that this will take longer than relationship searches, and I want to bring the execution time down as much as possible. With RDF4J I get response times for such queries in the range of 30-100 ms.

MATCH (n:Resource { skos__prefLabel : "Anti-Allergic Agents" })
WITH n.skos__notation AS dm_notation
MATCH (m:owl__Class) WHERE any(prop IN m.ns3__PA WHERE prop=dm_notation)
RETURN m.skos__prefLabel 

Can this query be optimized further? Also, does response time depend on the memory configuration of Neo4j? I have the maximum heap size set to the default of 512 MB.

Well,

How many elements are usually in this list? Is there any option to turn it into a label? You can always create relationships to a new node that represents this property value for the whole model. If you are aiming for performance, you may need to adjust your model into something more graph-like.
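
A rough sketch of what that remodeling could look like, assuming every value in ns3__PA is the skos__notation of some Resource node (which is what the original query implies). The relationship type HAS_PA is a made-up name for illustration, and on a graph of this size the first statement may need to be run in batches:

```cypher
// One-off refactoring: turn each list entry into a relationship.
// HAS_PA is a hypothetical relationship type, not something NeoSemantics creates.
MATCH (m:Resource)
WHERE m.ns3__PA IS NOT NULL
UNWIND m.ns3__PA AS notation
MATCH (target:Resource { skos__notation: notation })
MERGE (m)-[:HAS_PA]->(target);

// The list query then becomes a plain traversal from the indexed start node.
MATCH (n:Resource { skos__prefLabel: "Anti-Allergic Agents" })<-[:HAS_PA]-(m)
RETURN m.skos__prefLabel;
```

The second query has the same shape as the fast rdfs__subClassOf traversal from the question: an index lookup on skos__prefLabel followed by a single relationship hop.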

Bennu
