Hi Neo4j community,
I have now been trying for a while and reading up on unwind, collect, etc and try to achieve something I have no issues with using Python and running a nested for loop. However when I do this I will perform a similar query many, many times and this is super slow eventually where I belive running this as one single query it might be a lot faster.
My situation is the following. I have got a graph database with Employee nodes whose (artificial) name property is
Department_i_Employee_j with i,j flexibel, i.e. I have
and so on. They also have a deparment property which I guess is not relevant for what I want to do.
What I would like to achieve is having two list, one for the a certain set of departments (giving me the i) and one for a certain number of employees (giving me the maximum j for the respective i) in the respective department and I want to update respective nodes.
So say given the lists [0,1,2] and [4,2,6] I would like to update
As mentioned I eventually like to update them and using a nested for loop in Python and updating each vertex using a separate query works, but takes really long for bigger instances.
I have tried many things using unwind, collect, foreach and just cannot seem to get there, so I would like to do something like this
Hi @pska752 !
First at all, there's a couple of things on your query that are not needed. It should be enough with:
MATCH (e:Employe) WITH COUNT(*)/5 AS deptSize UNWIND range(0,2) as depts UNWIND range(1,deptSize) AS number MATCH (e:Employe) WHERE e.name ="Department_"+toString(depts)+"_Employee_"+toString(number) return e
Second... It work anyway. I have tried with random data w/o problems. Can you try this query and send some screenshots of your data in case it doesn't go well?
Thanks for the quick response. Here are some screenshots of what my DB looks like showing two different vertices from two different departments.
Now what I would like to do is to perform a particular update (for each node the same) to the "first" x% of vertices in each department, i.e. in the query as described above the first 20% in each department.
So if Deparmtment 0 has 10 employees, then the first 2 get updated, if Department 1 has 25 employees, then the first 5 employees and so on.
Neither the query you mentioned
nor the one I mentioned
seem to result in any changes.
I hope this helps. Thanks a lot for your help.
Hi @pska752 !
1. I see that you properties have a "New" prefix already. In that case, your query should look like:
MATCH (e:Employe) WITH COUNT(*)/5 AS deptSize UNWIND range(0,2) as depts UNWIND range(1,deptSize) AS number MATCH (e:Employe) WHERE e.name ="New_Department_"+toString(depts)+"_Employee_"+toString(number) return e
2. Most importantly, Why don't you have some nodes labeled as "Department" with relationships between Employees and Departments? This model could be more graphy.
Thanks for your help and for confirming my point saying "Why don't you have some nodes labeled as "Department" with relationships between Employees and Departments? This model could be more graphy."
This is exactly my intention as I conduct experiments to evaluate update efficiency in different models for the same data as what you described by introducing Department nodes is eventually the goal.
The query still does not seem to work and not noticing the missing the "New" prefix was a silly mistake on my end. I also noticed that Employe in the MATCH was missing another "e" at the end. Yet still not working.
I will double check for any typos, etc. and see if I can get it to work.
Thanks for all your help,
Hey guys. From what I read from your requirement, you want to give a set of dept numbers and corresponding max number of employees to update. Your example has as inputs two lists, one representing the department numbers and one representing each departments max number of employes. Instead of two separate lists as you specified, I used a list of ordered pairs, where each pair represents the department number and its max number of employee. The following query should do what you want based on this change:
with [[0,4], [1,2], [2,6]] as updates unwind updates as update with update as dept, update as maxEmp with dept, range(1, maxEmp) as empNumbers unwind empNumbers as empNumber MATCH (e:Employee) WHERE e.name ="Department_"+toString(dept)+"_Employee_"+toString(empNum) SET e.newProperty = 'new'
Is this what you are looking for? Your original query is a little strange, as it is calculating the department size based on the property e.department, then it is trying to correlate those values to 'Department_i' in the e.name property. I don't see how to relate them, so you can assign a size to the number of employees.
Thanks a lot and yes, this is pretty close to what I want to do.
Now the thing is that I don't have those two lists explicitly given.
The first list [0,1,2] will actually be all departments that are present which is easy to get and the second list will depend on the department sizes for example only (the "first") 20% of employees in each department will be updated. For this I would look how many employee nodes have the property department = 'Department_0' which gives me the first entry in the second list and so on.
If I then combine these two lists to get a list of lists (of length 2) like you did then it should work.
I will try that and keep you posted.
Thanks a lot for your help.
I got it, so let's work on the first part and then merge into one query. My follow up question is 'how does the department property correlate to the number in the name property?' We need to derive a department number and its corresponding size.
If there is no relationship, can we derive it by determining the number of distinct departments and then assuming each is identified with a number? The problem with this is if a department is deleted, will its number be reused? Or, is deleting a department a non-issue?
I hope I understand your question correctly.
I am aware of the fact that it would make more sense to have vertices labelled Department with names Department_0, etc. and edges between employees and departments. I have this inefficient graph model to compare queries between this and the more efficient model with the Department vertices.
Now as far as this model is concerned the property department tells me which department the employee is in. The fact that 'Department' also appears as part of the name string has got nothing necessarily to do with the department they are in, it was only because when artificially creating new employees I tried to keep their names unique, so I don't have something like 'Employee_1' several times, always with a different value for the property department.
I basically tried to avoid having lots of 'John Smith' and 'Jane Does' 🙂
I hope this makes it clearer what I tried to achieve here.
Still trying to get my head around the query you posted above and how to adjust it.
Sorry, I honestly don't understand. Is this mock data used for testing purposes, as why do you have some much data varying by index and I see no real data in your screenshots?
Anyway, I just made something up that may help you adapt it to your situation. The following query just counts the number of employees per 'department' and then creates a numbered list with a department size assigned to each number. The department numbers and corresponding sizes don't correlate to the 'department_i' values, but we never determined that linkage.
match (e:Employee) with e.department AS dep, COUNT(*) AS size with collect(size) as deptSizes with deptSizes, range(0, size(deptSizes)-1) as deptNumbers unwind deptNumbers as deptNumber with deptNumber, deptSizes[deptNumber] as deptSize return deptNumber, deptSize
We can use the above query as the list of departments and max size in the earlier query. The following query is a merge of the two queries, so we first calculate the ordered pairs (deptNo, deputize) and then use in the second part of the query to match and update.
match (e:Employee) with e.department AS dep, COUNT(*) AS size with collect(size) as deptSizes with deptSizes, range(0, size(deptSizes)-1) as deptNumbers unwind deptNumbers as deptNumber with deptNumber, deptSizes[deptNumber] as deptSize with deptNumber, range(1, deptSize) as empNumbers unwind empNumbers as empNumber MATCH (e:Employee) WHERE e.name ="Department_"+toString(deptNumber)+"_Employee_"+toString(empNumber) SET e.newProperty = 'new'
If you want to update only 20% of the employees per department, multiple the count(*) by the fraction, i.e. '0.20*count(*)'
Hope this helps....
Apologies for the long delay. Thank you again for all the help and pointing me in the right direction.
I eventually managed to make it work with the query
MATCH (e:Employee) WITH e.department AS dep, COUNT(*) AS size, toInteger(FLOOR(0.5*COUNT(*))) AS fraction UNWIND RANGE(0,fraction) AS numbers MATCH (e:Employee) WHERE e.department = dep AND e.name = 'New_' + toString(dep) + '_Employee_' + toString(numbers) RETURN e
All the help to finally get there has been much appreciated 🙂