cancel
Showing results for 
Search instead for 
Did you mean: 

Nested match query help

pska752
Node

Hi Neo4j community,

 

I have now been trying for a while and reading up on unwind, collect, etc and try to achieve something I have no issues with using Python and running a nested for loop. However when I do this I will perform a similar query many, many times and this is super slow eventually where I belive running this as one single query it might be a lot faster.

 

My situation is the following. I have got a graph database with Employee nodes whose (artificial) name property is

Department_i_Employee_j with i,j flexibel, i.e. I have

Department_0_Employee_1

Department_0_Employee_2

Department_1_Employee_1

and so on. They also have a deparment property which I guess is not relevant for what I want to do.

What I would like to achieve is having two list, one for the a certain set of departments (giving me the i) and one for a certain number of employees (giving me the maximum j for the respective i) in the respective department and I want to update respective nodes.

So say given the lists [0,1,2] and [4,2,6] I would like to update

Department_0_Employee_1

...

Department_0_Employee_4

 

Department_1_Employee_1

Department_1_Employee_2

 

Department_2_Employee_1

...

Department_2_Employee_6

As mentioned I eventually like to update them and using a nested for loop in Python and updating each vertex using a separate query works, but takes really long for bigger instances.

 

I have tried many things using unwind, collect, foreach and just cannot seem to get there, so I would like to do something like this

 

MATCH (e:Employee) 
WITH e.department AS dep, COUNT(*)/5 AS deptSize
UNWIND range(0,2) as depts
UNWIND deptSize as sizes
UNWIND range(1,sizes) AS number
MATCH (e:Employee) WHERE e.name ="Department_"+toString(depts)+"_Employee_"+toString(number)
SET e.newProperty = 'new'
 
however this for example does nothing at all.
 
Any help would be much appreciated and I am sure it is not too complicated and the answer is almost there, but have been trying for a while and reading up on it and just don't quite get there.
 
Thanks a lot for helping
9 REPLIES 9

bennu_neo
Neo4j
Neo4j

Hi @pska752 !

First at all, there's a couple of things on your query that are not needed. It should be enough with:

MATCH (e:Employe) 
WITH COUNT(*)/5 AS deptSize
UNWIND range(0,2) as depts
UNWIND range(1,deptSize) AS number
MATCH (e:Employe) 
WHERE e.name ="Department_"+toString(depts)+"_Employee_"+toString(number)
return e

Second... It work anyway. I have tried with random data w/o problems. Can you try this query and send some screenshots of your data in case it doesn't go well?

 

Oh, y’all wanted a twist, ey?

Hi Bennu,

Thanks for the quick response. Here are some screenshots of what my DB looks like showing two different vertices from two different departments.

 

pska752_0-1655168891990.png

pska752_1-1655168948605.png

Now what I would like to do is to perform a particular update (for each node the same) to the "first" x% of vertices in each department, i.e. in the query as described above the first 20% in each department.

So if Deparmtment 0 has 10 employees, then the first 2 get updated, if Department 1 has 25 employees, then the first 5 employees and so on.

Neither the query you mentioned

pska752_2-1655169139392.png

nor the one I mentioned

pska752_3-1655169178748.png

seem to result in any changes.

I hope this helps. Thanks a lot for your help.

 

Cheers,

Philipp

Hi @pska752 !

2 things.

1. I see that you properties have a "New" prefix already. In that case, your query should look like:

MATCH (e:Employe) 
WITH COUNT(*)/5 AS deptSize
UNWIND range(0,2) as depts
UNWIND range(1,deptSize) AS number
MATCH (e:Employe) 
WHERE e.name ="New_Department_"+toString(depts)+"_Employee_"+toString(number)
return e

2. Most importantly, Why don't you have some nodes labeled as "Department" with relationships between Employees and Departments? This model could be more graphy.

Oh, y’all wanted a twist, ey?

Hi Bennu,

Thanks for your help and for confirming my point saying "Why don't you have some nodes labeled as "Department" with relationships between Employees and Departments? This model could be more graphy."

 

This is exactly my intention as I conduct experiments to evaluate update efficiency in different models for the same data as what you described by introducing Department nodes is eventually the goal.

 

The query still does not seem to work and not noticing the missing the "New" prefix was a silly mistake on my end. I also noticed that Employe in the MATCH was missing another "e" at the end. Yet still not working.

I will double check for any typos, etc. and see if I can get it to work.

Thanks for all your help,

Philipp

glilienfield
Ninja
Ninja

Hey guys. From what I read from your requirement, you want to give a set of dept numbers and corresponding max number of employees to update. Your example has as inputs two lists, one representing the department numbers and one representing each departments max number of employes. Instead of two separate lists as you specified, I used a list of ordered pairs, where each pair represents the department number and its max number of employee. The following query should do what you want based on this change:

with [[0,4], [1,2], [2,6]] as updates
unwind updates as update
with update[0] as dept, update[1] as maxEmp
with dept, range(1, maxEmp) as empNumbers
unwind empNumbers as empNumber
MATCH (e:Employee) WHERE e.name ="Department_"+toString(dept)+"_Employee_"+toString(empNum)
SET e.newProperty = 'new'

 Is this what you are looking for? Your original query is a little strange, as it is calculating the department size based on the property e.department, then it is trying to correlate those values to 'Department_i' in the e.name property. I don't see how to relate them, so you can assign a size to the number of employees. 

Hi,

Thanks a lot and yes, this is pretty close to what I want to do.

Now the thing is that I don't have those two lists explicitly given.

The first list [0,1,2] will actually be all departments that are present which is easy to get and the second list will depend on the department sizes for example only (the "first") 20% of employees in each department will be updated. For this I would look how many employee nodes have the property department = 'Department_0' which gives me the first entry in the second list and so on.

If I then combine these two lists to get a list of lists (of length 2) like you did then it should work.

I will try that and keep you posted.

Thanks a lot for your help.

 

I got it, so let's work on the first part and then merge into one query.  My follow up question is 'how does the department property correlate to the number in the name property?' We need to derive a department number and its corresponding size.

If there is no relationship, can we derive it by determining the number of distinct departments and then assuming each is identified with a number? The problem with this is if a department is deleted, will its number be reused? Or, is deleting a department a non-issue? 

Hi,

I hope I understand your question correctly.

pska752_0-1655862359313.png

I am aware of the fact that it would make more sense to have vertices labelled Department with names Department_0, etc. and edges between employees and departments. I have this inefficient graph model to compare queries between this and the more efficient model with the Department vertices.

Now as far as this model is concerned the property department tells me which department the employee is in. The fact that 'Department' also appears as part of the name string has got nothing necessarily to do with the department they are in, it was only because when artificially creating new employees I tried to keep their names unique, so I don't have something like 'Employee_1' several times, always with a different value for the property department.

I basically tried to avoid having lots of 'John Smith' and 'Jane Does' 🙂

I hope this makes it clearer what I tried to achieve here.

Still trying to get my head around the query you posted above and how to adjust it.

Best,

Philipp

Sorry, I honestly don't understand.  Is this mock data used for testing purposes, as why do you have some much data varying by index and I see no real data in your screenshots?  

Anyway, I just made something up that may help you adapt it to your situation. The following query just counts the number of employees per 'department' and then creates a numbered list with a department size assigned to each number. The department numbers and corresponding sizes don't correlate to the 'department_i' values, but we never determined that linkage. 

match (e:Employee) 
with e.department AS dep, COUNT(*) AS size
with collect(size) as deptSizes
with deptSizes, range(0, size(deptSizes)-1) as deptNumbers
unwind deptNumbers as deptNumber
with deptNumber, deptSizes[deptNumber] as deptSize
return deptNumber, deptSize

We can use the above query as the list of departments and max size in the earlier query.  The following query is a merge of the two queries, so we first calculate the ordered pairs (deptNo, deputize) and then use in the second part of the query to match and update.

match (e:Employee) 
with e.department AS dep, COUNT(*) AS size
with collect(size) as deptSizes
with deptSizes, range(0, size(deptSizes)-1) as deptNumbers
unwind deptNumbers as deptNumber
with deptNumber, deptSizes[deptNumber] as deptSize
with deptNumber, range(1, deptSize) as empNumbers
unwind empNumbers as empNumber
MATCH (e:Employee) WHERE e.name ="Department_"+toString(deptNumber)+"_Employee_"+toString(empNumber)
SET e.newProperty = 'new'

If you want to update only 20% of the employees per department, multiple the count(*) by the fraction, i.e. '0.20*count(*)'

Hope this helps....