How to aggregate responses to request during a specific time period


(Mail2michaelennis) #1

My data consist of a series of request over time [:REQUEST] and responses [:RESPONSE] each with respective created date. I need only the responses within the date range I get requests. So I only want the responses to the request I got today even though I may have made responses to older request today also....e.g. I got 10 request over the past week....5 over the past two days. I responded to all 10.I need to know how many I got over the past two days and how many of THOSE I responded to. I have tried various variation of this Cypher with inconsistent results:

<MATCH (d:Dealer {email: $email})-[s:SUPPLY]->(m)
WITH d, COLLECT(toLower( AS makeList
WHERE apoc.coll.contains(makeList,toLower(r.makeVehicle))
AND r.createdAt >= $startDate AND r.createdAt <= $endDate
**above cypher collects request received **

WITH d,r, COUNT(r) AS Requests ** I need to use this count() later**
MATCH (d)-[resp:RESPONSE]->(r) I only want to look at responses to request in period above
WITH d,resp,r,Requests
MATCH (d)<-[order:ORDER]-(r) I now need to capture orders to only to those responses
WITH d,r,order, resp, COLLECT(order.paymentType) AS PaymentTypes I need a list of paymentTypes
RETURN toInteger(COUNT(Requests)), toInteger(COUNT(resp)), toInteger(COUNT(order)), PaymentTypes> I need a list of the paymentTypes

How can I create a cypher to get the result I need?

(Andrew Bowman) #2

Hi there!

So there are a few suggestions you can apply here.

First, make sure you have the following indexes in place if you don't already:

I'm not sure you need the apoc contains function call here, you may want to see if replacing this with: WHERE toLower(r.makeVehicle) in makeList

Your last segment is a little confusing.

If you mean for Requests to be the count of all :Requests you matched to just previously, then there's a small problem here. In Cypher, aggregations are with respect to the non-aggregation variables, which become the grouping key. As such, having r in the WITH clause along with COUNT(r) is a problem, as for each row here you will get a single request and the count of that single request, which will always be 1. In order to get the count yet keep the r nodes, you'll need to aggregate the r nodes, get the size of the collection, then UNWIND back to rows:

WITH d, collect(r) as allRequests
WITH d, allRequests, size(allRequests) as Requests
UNWIND allRequests as r

It looks like at the end what you're looking for are aggregations of counts for the various nodes as well as the distinct payment types. You're running into another aggregation issue with collecting order.paymentType when order is present in the WITH clause, that will collect with respect to each order rather than collecting payment types across all orders. This query with suggested changes may work better for you:

MATCH (d:Dealer {email: $email})-[:SUPPLY]->(m)
WITH d, COLLECT(toLower( AS makeList
MATCH (r:Request)
WHERE toLower(r.makeVehicle) in makeList
AND $startDate <= r.createdAt <= $endDate

WITH d, r, size((d)-[:RESPONSE]->(r)) as requestResponses // for each request, the number of responses
WITH d, collect(r) as allRequests, sum(requestResponses) as responseCount // sum all response counts
WITH d, responseCount, allRequests, size(allRequests) as requestCount
UNWIND allRequests as r

MATCH (d)<-[order:ORDER]-(r) 
RETURN d, responseCount, requestCount, count(order) as orderCount, collect(DISTINCT order.paymentType) as paymentTypes

(Mail2michaelennis) #3

Thanks Andrew....I will test and let you know if this works

(Mail2michaelennis) #4

@andrew.bowman for the most part this worked. Since I need the list of paymentTypes for all orders I removed the 'DISTINCT'. However, I still have one of the inconsistencies....If anyone of the actions (request,response or orders) is zero the entire query returns zero for all.... no rows. How could this be?

(Andrew Bowman) #5

MATCH only finds existing matched patterns in the graph, so if no such pattern exists then the row will be wiped out. Use OPTIONAL MATCH instead for those.

You may also need to handle the UNWIND when the allRequests collection is empty, as that will also wipe out rows. You can change that to:

UNWIND CASE WHEN size(allRequests) = 0 THEN [null] ELSE allRequests END as r

(Mail2michaelennis) #6

Ok...I will check this out and let you know. However, I am not sure I understand the role of the UNWIND CASE? does this UNWIND only applies to the case where Request has zero rows? If Request is zero then all other actions would be zero....In the instant case the Orders is zero.So the possible scenarios are any of the actions could be zero but if only Request is non-zero then we want to show that Response and Orders are zero. If request is zero then all actions should return zero.

(Andrew Bowman) #7

UNWIND basically performs a cartesian product of each row with the list elements for the list on that row, so if there is no list, or an empty list, then the row will be wiped out. This is a workaround to prevent that, replacing an empty list with a single element list (null for the element) so you'll keep the row, get a null for r, and be able to continue processing that row (remember to be using OPTIONAL MATCH in place of MATCH for all the rest of the matches in the query, excepting the first).

Feel free to try out the query with my change, vs with UNWIND without the case and see the differences! It would wipe out the row for that dealer rather than report the dealer with with counts of 0.

(Mail2michaelennis) #8

@andrew.bowman Thanks....I believe this works for all the test case tried so far. I think this is it....Still not clear what's going on with the WITH line the idea of it wiping out the data you already retrieved looks like an area in which Neo4j can do some improvement. It would be more intuitive if the data remains.

(Andrew Bowman) #9

I think it may be more useful to get a better understanding of how Cypher operates. Cypher operations create records (rows) and those rows also act as inputs to Cypher operations.

Filtering steps remove the full row based upon the predicates.

The thing to keep in mind is that the variables do not exist in isolation from each other, they are variables per row, and the row itself is the input and output, and is subject to the results of a predicate succeeding or failing.

UNWIND multiplies out the row by doing a cartesian product with each element of a list, so multiplying by 0 (empty list) means no rows for the given input row. I have asked for some kind of OPTIONAL UNWIND functionality (similar to OPTIONAL MATCH) that would more gracefully handle an empty or nonexistent list without wiping out the row, hopefully that may eventually make it in.