Query optimization

bratanic_tomaz · December 16, 2020, 3:18pm

I have the following query:

MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid:Concept)
USING INDEX covid:Concept(cui)
where drug.cui in ["C0031853"] and covid.cui in ["C5203670","C5203676","C5203671"] 
WITH drug,r1,c2,r2,covid order by (r1.freq+r2.freq) desc limit 20
CALL {
    WITH drug,r1,c2
    match (c_xy)<-[r_in_xy:IS_IN]-(s_xy)<-[r_extr_xy:Extracted_From]-(i_xy:Instance {predicate: type(r1)})-[r_sub_x:Inst_Subject]->(drug),
      (i_xy)-[r_obj_y:Inst_Object]->(c2)
    WITH c_xy, s_xy
    LIMIT 3
    RETURN collect("PMID:"+c_xy.pmid+"-"+c_xy.pyear+": "+s_xy.sentence) as sent_xy
}
CALL {
    WITH c2,r2,covid
    match (c_yz)<-[r_in_yz:IS_IN]-(s_yz)<-[r_extr_yz:Extracted_From]-(i_yz:Instance{predicate:type(r2)})-[r_sub_y:Inst_Subject]->(c2),
      (i_yz)-[r_obj_z:Inst_Object]->(covid)
    WITH c_yz, s_yz
    LIMIT 3
    RETURN collect("PMID:"+c_yz.pmid+"-"+c_yz.pyear+": "+s_yz.sentence) as sent_yz
}
RETURN sent_xy,drug.cui+":"+drug.name+"-"+type(r1)+"-> "+c2.cui+":"+c2.name+"-"+type(r2)+"-> "+covid.cui+":"+covid.name as Drug_Y_Covid,sent_yz;

The query plan is as follows:

My question is how can I optimize this query. I think the two main issues are:

MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid:Concept)
USING INDEX covid:Concept(cui)
where drug.cui in ["C0031853"] and covid.cui in ["C5203670","C5203676","C5203671"] 
WITH drug,r1,c2,r2,covid order by (r1.freq+r2.freq) desc limit 20

As you can notice I have already used a hint index to start from covid instead of the drug. Are there any further improvements I could make? As of now, the query gets relatively fast from covid to c2, but then it needs to expand on all the relationships of c2 and that is quite slow and then also filter the drug by its cui id. In this case, index doesn't play any role.

The second issue is the filtering of the instance predicate. There are a lot of instance nodes, so this is not really fast. I have tried adding an index on the property, but the query planner doesn't use it. My question is how could I go about optimizing my query, and secondly maybe refactoring my graph for best performance. I was thinking of using labels instead of properties for predicate, but then again filtering by dynamic node label is not as effective.

match (c_xy)<-[r_in_xy:IS_IN]-(s_xy)<-[r_extr_xy:Extracted_From]-(i_xy:Instance {predicate: type(r1)})-[r_sub_x:Inst_Subject]->(drug)

Any help appreciated

david_allen · December 16, 2020, 3:25pm

In the middle column of your query plan, it looks like the database is expanding all nodes off of all possible drugs, which is a problem -- and not using an index. Drug isn't labeled either, which is contributing.

Try rewriting this:

MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid:Concept)
USING INDEX covid:Concept(cui)
where drug.cui in ["C0031853"] and covid.cui in ["C5203670","C5203676","C5203671"]

To something like this:

MATCH (drug:Drug { cui: "C0031853" })
WITH drug
MATCH (covid:Concept)
WHERE  covid.cui in ["C5203670","C5203676","C5203671"] 
WITH drug, covid
MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid)

With the rest as usual. Notice the pattern of matching BOTH endpoints of the path before expanding the path, see if this helps

bratanic_tomaz · December 16, 2020, 3:49pm

david_allen:

MATCH (drug:Drug { cui: "C0031853" })
WITH drug
MATCH (covid:Concept)
WHERE  covid.cui in ["C5203670","C5203676","C5203671"] 
WITH drug, covid
MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid)

It doesn't seem to improve performance, it actually makes it worse by almost doubling the db hits...
When looking at the query planner it seems that it starts expanding from the drug rather than the covid, which makes it worse.

I have also tried using

MATCH (drug:Drug { cui: "C0031853" })
WITH drug
MATCH (covid:Concept)
WHERE  covid.cui in ["C5203670","C5203676","C5203671"] 
WITH drug, covid
MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid)
USING JOIN ON covid

but it is not obeying me and doesn't start to expand paths from covid, but sticks to the drug.

Topic		Replies	Views
Please help me tune my query Cypher performance , cypher	3	444	September 25, 2021
Need help with query Cypher	1	732	January 20, 2019
Exhaustive query optimization Cypher apoc , querying , performance , cypher , merge	4	345	April 8, 2022
Query Tuning Optional Match Cypher	1	398	March 1, 2020
How to optimize this query Neo4j Graph Platform cypher	1	245	January 4, 2022

Get Certified in June!

Query optimization

Related topics