Derive single value based on a property a node B related to node A from direct and indirect relationships

Hello,

I've just started learning Neo4j to do a POC for my current project. I want to have a final value for one of the node called Employee, say certified or not certified based on the direct relation with the Credential node or via the the group certificate when multiple Employee nodes can be grouped for certificates for group activities. Every certificate/credential will have an expiry date, and if any of the certificate gets expired then the employee will be called non-compliant and has to do the certification again. This is how the data model looks like:

I was able to get the list of all employees for a company and their associated certificates(direct/ via group) and mapped them with each employee.

MATCH (c:Company { id: 1000})
WITH c {
  .id,
  .name,
  employees: [(e:Employee)-[:WORKS_AT]->(c) | e {
    .empId,
    .name,
    credentials: [(e)<-[:ATTACHED_TO]-(cr:Credential) | cr {
      .credentialId,
      .credentialType,
      .expiration,
      status: CASE WHEN duration.inDays(date(), cr.expiration).days > 0 then 'Compliant' ELSE 'Expired' END
    }],
    egCredentials: [(e)-[:PART_OF]->(:EmployeeGroup)<-[:ATTACHED_TO]-(cr:Credential) | cr {
      .credentialId,
      .credentialType,
      .expiration,
      status: CASE WHEN duration.inDays(date(), cr.expiration).days > 0 then 'Compliant' ELSE 'Expired' END
    }]
  }]
}
RETURN
    c;

This would give me data like

---
c
---
{
"id": XXX,
"name": "XYX Company",
"employees": [
     // this employee should be marked as non-compliant 
     // as one the certificate is expired
     {
     "empId": XXXXX,
     "name": "ABC",
     "credentials": [
          {
                "credentialType": "AWS Training",
                "credentialId": XXXYYX,
                "expiration": "2024-06-03",
                 "status": "Compliant"
          },
         {
              "credentialType": "Java Traing",
              "credentialId": XXHAVFDS,
              "expiration": "2022-08-19",
              "status": "Expired"
          },
         // ....
      ],
      "egCredentials": [
            // similar values like above credentials key
       ]
  },
  // ....
]

But what ultimately I wanted was to have a single value based on both the properties (credentials[] & egCredentials[]) of each Employee node in the list.

I have 2 use cases here, one for single employee's compliance check and one for projecting the forecast for current date till the next 11 months where I should get the number of compliant/non-compliant employees by the month.

  • Use case 1, single employee compliance check
    return the employee node with the status as compliant or non-compliant based the validity of all the credentials which are attached to the Employee directly or via EmployeeGroup. If any of the credential from both the relation is expired then the whole status should be non-compliant
{
   "empId": XXXX,
   "name": "ABC",
   "status": "compliant", // or "non-compliant"
   "credentials": [] // should be a merged list if possible from both direct and via EmployeeGroup relation to Credential node
}
  • Use case 2, forecast of compliant employees for the whole year

The first item should be today's date, next items will be from next month's start or end date does not matter as we'll count for whole month and not for any individual date. Meaning, if the credential is expiring on 1st of the month or last of the month; for future forecast we can show that it will be non-compliant in that month.

[
  {
   "month": current-month,
    "compliantEmployeeCount": X,
    "nonCompliantEmployeeCount": X
  },
  {
    "month": current-month + 1,
    "compliantEmployeeCount": X,
    "nonCompliantEmployeeCount": X
  },
  {
    "month": current-month + 2,
    "compliantEmployeeCount": X,
    "nonCompliantEmployeeCount": X
  }
  // till.. current-month + 11
]

We have it in the SQL query we did earlier for our MySQL DB. But that query is pretty much crazy nesting of sub-queries.

Can someone help me here with the cypher query and explain the parts so that I can understand it and explain further to my team? Or if you have some tutorials/videos for reference that would be really helpful.

Thank you,
Hiren

Try this for use case 1:

MATCH (c:Company { id: 1000})
MATCH (e:Employee)-[:WORKS_AT]->(c)
WITH c, e, 
[(e)<-[:ATTACHED_TO]-(cr:Credential) | cr {
      .credentialId,
      .credentialType,
      .expiration,
      status: CASE WHEN duration.inDays(date(), cr.expiration).days > 0 then 'Compliant' ELSE 'Expired' END
    }] as credentials,
[(e)-[:PART_OF]->(:EmployeeGroup)<-[:ATTACHED_TO]-(cr:Credential) | cr {
      .credentialId,
      .credentialType,
      .expiration,
      status: CASE WHEN duration.inDays(date(), cr.expiration).days > 0 then 'Compliant' ELSE 'Expired' END
    }] as egCredentials
RETURN c {
  .id,
  .name,
  employees: collect(e {
    .empId,
    .name,
    credentials: credentials,
    egCredentials: egCredentials,
    status: CASE WHEN any(i in credentials + egCredentials where i.status = 'Expired') THEN 'non-compliant' ELSE 'compliant' END 
})
}

I just noticed you wanted the credentials to be merged.

MATCH (c:Company { id: 1000})
MATCH (e:Employee)-[:WORKS_AT]->(c)
WITH c, e, 
[(e)<-[:ATTACHED_TO]-(cr:Credential) | cr {
      .credentialId,
      .credentialType,
      .expiration,
      status: CASE WHEN duration.inDays(date(), cr.expiration).days > 0 then 'Compliant' ELSE 'Expired' END
    }] + 
[(e)-[:PART_OF]->(:EmployeeGroup)<-[:ATTACHED_TO]-(cr:Credential) | cr {
      .credentialId,
      .credentialType,
      .expiration,
      status: CASE WHEN duration.inDays(date(), cr.expiration).days > 0 then 'Compliant' ELSE 'Expired' END
    }] as credentials
RETURN c {
  .id,
  .name,
  employees: collect(e {
    .empId,
    .name,
    credentials: credentials,
    status: CASE WHEN any(i in credentials where i.status = 'Expired') THEN 'non-compliant' ELSE 'compliant' END 
})
}

You can try this for use case 2. I did it kinda quick and didn't look to optimize or compact it. See if it works first. I will explain it if it works. I don't have any test data to test it against. It gives you a report for current month and 11 out months. I also leveraged the apoc.coll.min function, so this is dependent on the apoc library.

MATCH (c:Company { id: 1000})
MATCH (e:Employee)-[:WORKS_AT]->(c)
call{
    with e
    return count(*) as total_employees
}
WITH [(e)<-[:ATTACHED_TO]-(cr:Credential) |  date.truncate('month', cr.expiration)] +  
[(e)-[:PART_OF]->(:EmployeeGroup)<-[:ATTACHED_TO]-(cr:Credential) | date.truncate('month', cr.expiration)] as employee_expiration_dates, total_employees
WITH apoc.coll.min(employee_expiration_dates) as min_employee_expiration_date, total_employees
WITH total_employees, collect(min_employee_expiration_date) as all_employees_min_expiration_date, date.truncate('month', date()) as current_month
UNWIND range(0,11) as month_index
WITH total_employees, all_employees_min_expiration_date, current_month + duration({months:month_index}) as out_month
WITH total_employees, all_employees_min_expiration_date,  out_month,
size([i in all_employees_min_expiration_date where i <= out_month | 1]) as num_expired
RETURN collect({
    year: out_month.year,
    month: out_month.month,
    compliantEmployeeCount: total_employees - num_expired,
    nonCompliantEmployeeCount: num_expired
}) as result

I tried this query but this gives an error,

Aggregation column contains implicit grouping expressions. 
For example, in 'RETURN n.a, n.a + n.b + count(*)' the aggregation expression 'n.a + n.b + count(*)' includes the implicit grouping key 'n.b'. It may be possible to rewrite the query by extracting these grouping/aggregation expressions into a preceding WITH clause.
Illegal expression(s): c, id, name (line 16, column 8 (offset: 606))
"RETURN c {"
        ^

I did not get that error, but implicit grouping is deprecated. Maybe you are using a newer version where feature has been removed.

The implicit grouping is from using 'collect' in the map projection. Try the following refactored version.

MATCH (c:Company { id: 1000})
MATCH (e:Employee)-[:WORKS_AT]->(c)
WITH c, e, 
[(e)<-[:ATTACHED_TO]-(cr:Credential) | cr {
      .credentialId,
      .credentialType,
      .expiration,
      status: CASE WHEN duration.inDays(date(), cr.expiration).days > 0 then 'Compliant' ELSE 'Expired' END
    }] + 
[(e)-[:PART_OF]->(:EmployeeGroup)<-[:ATTACHED_TO]-(cr:Credential) | cr {
      .credentialId,
      .credentialType,
      .expiration,
      status: CASE WHEN duration.inDays(date(), cr.expiration).days > 0 then 'Compliant' ELSE 'Expired' END
    }] as credentials
WITH c, collect(e {
    .empId,
    .name,
    credentials: credentials,
    status: CASE WHEN any(i in credentials where i.status = 'Expired') THEN 'non-compliant' ELSE 'compliant' END 
}) as employees
RETURN c {
  .id,
  .name,
  employees: employees
}
1 Like

Hi @glilienfield ,
That query worked successfully!. I'm using newer version of Neo4j desktop v1.5.7. I'm still getting used to write complex queries, especially the WITH clause is somewhat confusing as we have to use it to make the variables' available for next set of ops. I will try to understand this and share my findings on this and get back to you.

Thank you so much for your help!

1 Like