Multiple matches in a single query vs single match in multiple queries

Hey community folks,

I have a question about implementing a query to fetch the details related to a node.
This is my schema:

I want to fetch the details of a patent, and I wrote the following query:

match (p:PATENT {app_num : "17909583"})
optional match (p)-[:HAS_GAU]->(gau:GAU)
optional match (p)-[:IS_OF_TYPE]->(app_type:APP_TYPE)
optional match (p)-[:HAS_TERM_DATA]->(td:TERM_DATA)-[:HAS_TERM_ADJUSTMENTS]->(ta:TERM_ADJUSTMENTS)
optional match (p)<-[:IS_ASSOCIATED_TO]-(lf:LAW_FIRM)
optional match (p)<-[:INVENTED]-(i:INVENTOR)
optional match (p)<-[:IS_APPLICANT_OF]-(a:APPLICANT)
optional match (p)<-[:EXAMINED]-(ex:EXAMINER)
optional match (p)<-[:IS_ASSOCIATED_TO_PATENT]-(at:ATTORNEY)
optional match (p)-[:HAS_PRIORITY_CLAIM]->(pc:PRIORITY_CLAIM)
optional match (p)-[:HAS_PROSECUTION]->(txn:PROSECUTION_NODE)
optional match (p)-[:HAS_FILE]->(f)
with properties(p) as bib_data, gau.gau as gau, app_type.type as app_type, td, collect(ta) as adjustments, properties(lf) as law_firm, collect(properties(i)) as inventors, collect(properties(a)) as applicants, properties(ex) as examiner, collect(properties(at)) as attorneys, collect(properties(pc)) as priority_claims, collect(properties(txn)) as transaction_history
return {biblio : bib_data, gau : gau, application_type : app_type, term_history : {term_data : td, adjustments : adjustments}, law_firm : law_firm, inventors : inventors, applicants : applicants, examiner : examiner, attorneys : attorneys, priority_claims : priority_claims, transaction_history : transaction_history} as patent

problems with this query:

  • It produces duplicate entries in the objects I collect(). I think that is because each successive match is executed once per existing row, so the patent p gets paired with every combination of the other matches, hence the duplicates. (Please correct me if I am wrong; this is my speculation as a beginner.)

  • I have also read that Cypher's query planner optimizes the query. I don't know exactly how it does that, but I read that a query should contain only one merge statement for optimal performance, and I wonder whether the same applies to matches.
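The row multiplication behind those duplicates can be mimicked in plain Python. This is a toy sketch (the names and values are made up for illustration): each "match" pairs every existing row with every new value, so two inventors and three priority claims yield six rows, and a plain collect() would then see each inventor three times.

```python
# Start with one row for the matched patent.
rows = [{"p": "17909583"}]

def expand(rows, key, values):
    """Mimic a successive MATCH: pair every current row with every new value."""
    return [{**row, key: v} for row in rows for v in values]

rows = expand(rows, "inventor", ["i1", "i2"])          # 2 rows
rows = expand(rows, "claim", ["c1", "c2", "c3"])       # 2 * 3 = 6 rows

print(len(rows))                                       # 6
print([r["inventor"] for r in rows].count("i1"))       # i1 now appears 3 times
```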

I am building this application with the Neo4j Python driver.
So should I write one Cypher query that returns all the related details, or should I write multiple Cypher queries to fetch the different pieces of data and then aggregate the results in Python?

Which one would be faster?

My end goal is to fetch the details of a patent, compare them against a new file for the same patent, and then upsert whatever details are new relative to the patent currently in the database.


EDIT

This is the query I came up with to remove the duplicates:

match (p:PATENT {app_num : "17909583"})
with p

optional match (p)-[:HAS_GAU]->(gau:GAU)
with gau.gau as gau, p

optional match (p)-[:IS_OF_TYPE]->(app_type:APP_TYPE)
with app_type.type as app_type, gau, p

optional match (p)-[:HAS_TERM_DATA]->(td:TERM_DATA)-[:HAS_TERM_ADJUSTMENTS]->(ta:TERM_ADJUSTMENTS)
with distinct ta, td, p, app_type, gau
with collect(properties(ta)) as term_adjustments, td, p, app_type, gau


optional match (p)<-[:IS_ASSOCIATED_TO]-(lf:LAW_FIRM)
with properties(lf) as law_firm,  term_adjustments, td, p, app_type, gau

optional match (p)<-[:INVENTED]-(i:INVENTOR)
with distinct i,  law_firm,  term_adjustments, td, p, app_type, gau
with collect(properties(i)) as inventors, term_adjustments, td, p, law_firm,   app_type, gau

optional match (p)<-[:IS_APPLICANT_OF]-(a:APPLICANT)
with distinct a, inventors, term_adjustments, td, p, law_firm,   app_type, gau
with collect(properties(a)) as applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau

optional match (p)<-[:EXAMINED]-(ex:EXAMINER)
with properties(ex) as examiner, applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau

optional match (p)<-[:IS_ASSOCIATED_TO_PATENT]-(at:ATTORNEY)
with distinct at, examiner, applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau
with collect(properties(at)) as attorneys, examiner, applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau

optional match (p)-[:HAS_PRIORITY_CLAIM]->(pc:PRIORITY_CLAIM)
with distinct pc, attorneys, examiner, applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau
with collect(properties(pc)) as priority_claims, attorneys, examiner, applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau

optional match (p)-[:HAS_PROSECUTION]->(txn:PROSECUTION_NODE)
with distinct txn, priority_claims, attorneys, examiner, applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau
with collect(properties(txn)) as transaction_history, priority_claims, attorneys, examiner, applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau

optional match (p)-[:HAS_FILE]->(f:FILE_NODE)
with distinct f, transaction_history, priority_claims, attorneys, examiner, applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau
with collect(properties(f)) as file_history, transaction_history, priority_claims, attorneys, examiner, applicants, inventors, term_adjustments, td, p, law_firm, app_type, gau

with properties(p) as bib_data, term_adjustments, td as term_data, app_type, gau, law_firm, inventors, applicants, examiner, attorneys, priority_claims, transaction_history, file_history
return {biblio : bib_data, gau : gau, app_type : app_type, term_history : {term_data : term_data , adjustments : term_adjustments}, law_firm : law_firm, inventors : inventors, applicants : applicants, examiner : examiner, attorneys : attorneys, priority_claims : priority_claims, transaction_history : transaction_history, file_history : file_history} as patent_data

Yes, in your first query you will get duplicates whenever a successive match returns multiple rows. This is because the next match is executed once for each row resulting from the query so far, and the previous results are repeated across the new rows. It gets successively worse as you keep chaining matches the way you did in your first approach.

In your second approach you collect the new results at each step, so only one record is appended to each row from the previous step.

Here is another approach you can consider; it may be a little easier to understand. It leverages pattern comprehensions (Cypher's list-comprehension syntax over graph patterns).

match (p:PATENT {app_num : "17909583"})
optional match (p)-[:HAS_TERM_DATA]->(td:TERM_DATA)-[:HAS_TERM_ADJUSTMENTS]->(ta:TERM_ADJUSTMENTS)
with p, {term_data: td, adjustments: collect(properties(ta))} as term_history
return {
    biblio : properties(p) , 
    gau : [(p)-[:HAS_GAU]->(gau:GAU)|gau.gau][0], 
    application_type : [(p)-[:IS_OF_TYPE]->(app_type:APP_TYPE)|app_type.type][0], 
    term_history : term_history,
    law_firm : [(p)<-[:IS_ASSOCIATED_TO]-(lf:LAW_FIRM)|properties(lf)][0], 
    inventors : [(p)<-[:INVENTED]-(i:INVENTOR)|properties(i)], 
    applicants : [(p)<-[:IS_APPLICANT_OF]-(a:APPLICANT)|properties(a)], 
    examiner : [(p)<-[:EXAMINED]-(ex:EXAMINER)|properties(ex)][0], 
    attorneys : [(p)<-[:IS_ASSOCIATED_TO_PATENT]-(at:ATTORNEY)|properties(at)], 
    priority_claims : [(p)-[:HAS_PRIORITY_CLAIM]->(pc:PRIORITY_CLAIM)|properties(pc)], 
    transaction_history : [ (p)-[:HAS_PROSECUTION]->(txn:PROSECUTION_NODE)|properties(txn)],
    file: [(p)-[:HAS_FILE]->(f)|f.title][0]
 } as patent

Hey @glilienfield, this worked wonders with a few changes, so thanks for that.

I have two things to ask you:

  1. Can you please share a reference where this list comprehension syntax can be learned?

  2. Will the query planner know how to optimize this query, given the multiple patterns in it? Or would it be faster to write a separate query for each pattern and then aggregate the results in Python?

TIA,
Aman

Map projection is also very helpful.
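For reference, a map projection lets you pick specific properties straight off a node and mix in computed entries. A minimal sketch (the .title property is an assumption; substitute your own property names):

```cypher
// Map projection: select chosen properties of p and add a computed entry.
// .title is hypothetical — use whatever properties your PATENT nodes have.
match (p:PATENT {app_num : "17909583"})
return p {.app_num, .title,
          inventors : [(p)<-[:INVENTED]-(i:INVENTOR) | properties(i)]} as patent
```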

I am not sure about the planner. I would have to look at the query plan to see how it plans the query.
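In case it helps: prefixing a query with EXPLAIN shows the plan without running the query, while PROFILE runs it and reports rows and db hits per operator. A minimal sketch:

```cypher
// PROFILE executes the query and annotates each plan operator with
// actual row counts and db hits; use EXPLAIN to see the plan only.
profile
match (p:PATENT {app_num : "17909583"})
optional match (p)<-[:INVENTED]-(i:INVENTOR)
return p, collect(i)
```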

You certainly could do this in Python, but I would run all the queries inside a single transaction function.


Okay, thank you so much. If there is anything you could share on how to analyze a query plan, that would be a great help.
(I am quite a beginner to all of this tech, actually.)

TIA,
Aman