I successfully crated the following cypher query.
there are no errors on the run but my suspicious is that the second unwind is incorrect.
unwind range(0,size(vEmp)-1) as emp_index
MATCH x=(n:Employee{Workforce:'Y'})-[r:MTD2023]->(m:Month)
with collect(n.ID) as vEmp,collect(m.ID) as vMonth,collect(m:SalaryMTD)as vMTD, n.Salary as vSalary, m.DaysMonth as vDaysMonth,m.DaysYear as vDaysYear,(n.Salary/365)*m.DaysMonth as vSalaryMTD, (n.Salary/365)*m.DaysYear as vSalaryYTD
unwind range(0,size(vMonth)-1) as index
unwind range(0,size(vEmp)-1) as emp_index
merge (p:Month{ID:vMonth[index]})
merge (e:Employee{ID:vEmp[emp_index]})
merge (e)-[r:MTD2023]->(p)
set r.MTD2023=vSalaryMTD
Return r LIMIT 50
What is it that you are trying to calculate and update? In summary, you have a match that gives you a collections of rows of employees and months. You then group and collect employees and months together that have the same exact set of values for: n.Salary, vDaysMonth, vDaysYear, vSalaryMTD, and vSalaryYTD. Then you form the Cartesian product between the employee and month elements through the double-unwind, just to update the the relationship between the existing employee and month with the value of vSalaryMTD. I am thinking this can be simplified. I don't have insight into the data, but are you getting vMonth lists that have just one element?
First, Thank you for taking the time. You are correct in the description of the steps.
I was able to resolve the issue by adding call function e.g. call{with }
Can you please let me know how can I run this query more efficient.
Thanks in advance!
MATCH (n:Employee) where n.Workforce='Y'
with n.ID as EmpList,n.Salary as EmpSalary
MATCH (m:Month) where m.Year=2023
with m.DaysMonth as vDaysMonth,m.DaysYear as vDaysYear,collect(m.ID) as vPeriod,collect(EmpList) as EmpID,EmpSalary
call{with EmpSalary,vDaysMonth,vDaysYear
return round((EmpSalary/365)*vDaysMonth) as SalaryMTD, round((EmpSalary/365)*vDaysYear) as SalaryYTD}
With SalaryMTD,SalaryYTD,EmpID,vPeriod,vDaysMonth,vDaysYear,EmpSalary
unwind range(0,size(EmpID)-1) as index
unwind range(0,size(vPeriod)-1) as index2
merge(e:Employee{ID:EmpID[index]})
merge(p:Month{ID:vPeriod[index2]})
merge(e)-[r:Paycheck]->(p)
SET r.MTD2023=SalaryMTD
SET r.Period=vPeriod
SET r.DaysMonth=vDaysMonth
SET r.AnnualSalary=EmpSalary
return e,p,r
What do the relationship properties daysMonths and daysYear represent?
You changed this. You are no longer matching on an existing relationship between employee and month, but still are creating it. Is the purpose to create a relationship between each employee and every month and set the relationship properties based on that months?
I took a stab at it. I think this does the same thing:
match (m:Month) where m.Year=2023
with collect(m) as months
match (n:Employee) where n.Workforce='Y'
unwind months as m
merge(n)-[r:Paycheck]->(m)
SET r.MTD2023=round((n.Salary/365)*m.DaysMonth)
SET r.Period=[i in months | i.id]
SET r.DaysMonth=m.DaysYear
SET r.AnnualSalary=n.Salary
return n, r, m
Sure, they each seem to be attributes of that type of entity. One note though, in a graph database you generally don't store values that refer to another entity; you create a relationship instead. It looks like you are doing something like that with your month's parentNode and nextNode attributes. You should just create relationships to these nodes such as, HAS_PARENT and HAS_NEXT, instead of storing test attributes. Then you can find any month's parent and next node quickly, instead of matching on the other node's primary key.
I got the property wrong. I didn't check your original code first. I used 'id' instead of 'ID' for the month property.
match (m:Month) where m.Year=2023
with collect(m) as months
match (n:Employee) where n.Workforce='Y'
unwind months as m
merge(n)-[r:Paycheck]->(m)
SET r.MTD2023=round((n.Salary/365)*m.DaysMonth)
SET r.Period=[i in months | i.ID]
SET r.DaysMonth=m.DaysYear
SET r.AnnualSalary=n.Salary
return n, r, m
If I may ask one more question please, the query successfully run with 2 employees,
when I try to run a presser test with 40K employees I'm receiving the following error
The allocation of an extra 2.0 MiB would use more than the limit 250.0 MiB. Currently using 248.0 MiB. dbms.memory.transaction.global_max_size threshold reached
My understanding is that due to the unwind of 40,000 rows. I'm using the free version for testing.
Is there workaround?
Maybe it is due to trying to save 40,000 records in a single transaction. You can trying using a 'call subquery in transactions' or the apoc.periodic.iterate equivalent.
He is a refactor of the code to back the updates into transactions of 10,000 records. You can try a smaller number if the memory issues still persists.
match (m:Month) where m.Year=2023
with collect(m) as months
match (n:Employee) where n.Workforce='Y'
call {
with n, months
foreach(m in months |
merge(n)-[r:Paycheck]->(m)
SET r.MTD2023=round((n.Salary/365)*m.DaysMonth)
SET r.Period=[i in months | i.ID]
SET r.DaysMonth=m.DaysYear
SET r.AnnualSalary=n.Salary
)
} in transactions of 10000 rows
I also removed the return statement, as this slows things down and requires memory to buffer the the results. It seems unnecessary when updating 40,000 records.
the query : please note that I replaced Employee node with List
:auto match (m:Month) where m.Year=2023
with collect(m) as months
match (n:List) where n.Workforce='Y'
call {
with n, months
foreach(m in months |
merge(n)-[r:Paycheck]->(m)
SET r.MTD2023=round((n.Salary/365)*m.DaysMonth)
SET r.Period=m.ID
SET r.DaysMonth=m.DaysYear
SET r.AnnualSalary=n.Salary
)
}
in transactions of 10000 rows