cancel
Showing results for 
Search instead for 
Did you mean: 

optional match with multipl nodes

redha_benhisse1
Node Clone

Hi
I would like to make an optiomal match for multiple nodes. My configuration is like follow :

redha_benhisse1_0-1664728893780.png

If I want to do a left join (optimal match) on several nodes that will match the following sql query :

select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit
from p_lineorder
left join dates on lo_orderdate = d_datekey
left join customer on lo_custkey = c_custkey
left join supplier on lo_suppkey = s_suppkey
left join part on lo_partkey = p_partkey
where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;

is it ideal to do like the 1st cypher query or like the 2nd

1st cypher query :

profile match (c:customer) where c.C_REGION starts with "AMERICA"
optional match (c)<-[:order_customer]-(l:lineorder)-[:order_part]->(p:part),(d:date)<-[:order_date]-(l)-[:order_supplier]->(s:supplier)
using join on l
where (p.P_MFGR = "MFGR#2" or p.P_MFGR = "MFGR#2")
and s.S_REGION starts with "AMERICA"
return d.D_YEAR, c.C_NATION, sum(l.LO_REVENUE) as revenu, sum(l.LO_SUPPLYCOST) as supplycost
ORDER BY d.D_YEAR, c.C_NATION;

2nd cypher query :

profile match (c:customer) where c.C_REGION starts with "AMERICA"
optional match (c)<-[:order_customer]-(l:lineorder)-[:order_part]->(p:part)
optional match (d:date)<-[:order_date]-(l)-[:order_supplier]->(s:supplier)
using join on l
where (p.P_MFGR = "MFGR#2" or p.P_MFGR = "MFGR#2")
and s.S_REGION starts with "AMERICA"
return d.D_YEAR, c.C_NATION, sum(l.LO_REVENUE) as revenu, sum(l.LO_SUPPLYCOST) as supplycost
ORDER BY d.D_YEAR, c.C_NATION;

Should I use one optimal match with a semi-colon in the middle or two optimal matches (or some other formulation of the query?!). If there were linear nodes the problem does not arise, but as there is a star structure with a central node, the query is not clear.

Thank you in advance

2 REPLIES 2

glilienfield
Ninja
Ninja

My understand is they will produce the same results, except for one scenario in which they are different. Neo4j will not traverse the same relationship twice when matching a pattern. As such, if you have two patterns that share a common relationship, then the results will be different. As an example, consider the following path:

create(a:Node{id:0})-[:REL]->(b:Node{id:1})-[:REL]->(c:Node{id:2})

The following query returns the results below. It results in one path: a--b--c

match(a:Node{id:0})--(b) , (b)--(c) return a, b, c

 Screen Shot 2022-10-02 at 9.02.01 PM.png

In contrast, the following query returns the results below. It results in two paths: a--b--c and a--b--a. The difference is there are two possibilities for the second match associated with the first match: b--c and b--a. Since the second match is independent of the first, the second match can traverse back towards 'a' from 'b'.  This is not allowed when the two match patterns are on the same match. 

match(a:Node{id:0})--(b:Node) 
match (b)--(c) 
return a, b, c

 Screen Shot 2022-10-02 at 9.07.31 PM.png

As a note, this should only be possible when specifying undirected match patterns, as these are. Providing a direction would not allow the reverse transversal. Modifying the second query with direction criteria eliminates the second match.

match(a:Node{id:0})-->(b:Node) 
match (b)-->(c) 
return a, b, c

Screen Shot 2022-10-02 at 9.14.27 PM.png

I would think the two forms are equivalent in your case, since you have directed relationships. Do you get the same results with each form? 

Does each line order have just one customer, one part, one date, and one supplier? If not, you will get the Cartesian product of the sequential matches. 

In your query, the first match will give you all the line order nodes associated with the customer criteria. If there is only one part per lineorder, then there will be one row per lineorder since the part has only one relationship to lineorder.  The second match will result in one row for each lineorder resulting from the first match, since, there is only one date and one supplier associated with the lineorder. 

thanks

Indeed, in my case, each lineorder is associated with one customer, one part and one date and my relationships are directed as shown in the following diagram:

redha_benhisse1_0-1664784045226.png

In any case, your answer was very clear.

In my configuration, if the nodes were on the same path, the question does not arise but as they are not on the same path but all exit from the central lineorder node, I had to make a match between 3 and add a 4th afterwards but I wanted to find the optimal request