Cypher to make incremental changes to nodes/relationships sourced from external source

apoc
performance
import

(Paul Drangeid) #1

I only started writing Cypher after attending GraphConnect in September, so as I am trying some more advanced queries, I'd appreciate any critique on methods, efficiencies (or frankly, just sloppy code).

I am ingesting data from a service ticket database. I've already written a script that does the bulk data load (first initial), and now want to write the more efficient incremental updates and modifications.

The cypher I am presenting here would be responsible for MODIFYING any existing tickets that have changed since the last refresh. (I would see this procedure being run recurring every 5-10 minutes or so)
It needs to see if data changed (and thus relationships). I am assuming (correctly?) that I want to be efficient and perform comparisons in cypher code (executing reads and logic in memory) vs over-aggressive SET commands creating unnecessary writes to the Neo4j DB.

I read from the external DB and pull modified tickets.
I then perform OPTIONAL MATCH for data that has relationships (the coalesce is in case the row has NULL data. I create 'None Provided' label nodes so there are place-holder relationships)
then additional OPTIONAL MATCH to see the EXISTING relationships with the EXISTING ticket (before it has been modified with changes)
Using FOREACH to delete relationships which are no longer accurate
Using FOREACH to update node properties that are no longer accurate
Using MERGE to create the NEW relationships based on the updated values from the external Database.
Finally, a timestamp to prevent this ticket from getting processed in the next refresh, unless the ticket in the source database were to be modified again (and thus get a newer UPDATEDATE value)

Yes I am aware there is some redundancy in having all those data pieces as both node properties AND relationships.
I'm still exploring the modeling and will probably be removing some of the node properties in a future version.
Before I repeat this concept with my other data from this system I thought I would run it by the community. Thanks for any input / suggestions!

Without further delay, here's the cypher code:

// FIRST we should look at any tickets MODIFIED (BUT NOT CREATED) after the last ticket ingestion
MATCH (cdl:Crmdataload {name:'CommitCRM'}), (lt:Ticket)
WITH cdl,max(lt.createdon) as lastnewticket
WITH cdl,apoc.date.format(lastnewticket,'ms','yyyy-MM-dd HH:mm:ss.sss') AS lastnewticket,timestamp() AS howsoonisnow,apoc.date.format(cdl.ticketrefresh,'ms','yyyy-MM-dd HH:mm:ss.sss', 'CST') AS lastticketrefresh
CALL apoc.load.driver("com.extendedsystems.jdbc.advantage.ADSDriver")
call apoc.load.jdbc('jdbc:extendedsystems:advantage://commitcrm:6262;catalog=//commitcrm/CommitCRM/Db','SELECT TICKETNO,CARDID,CONTACTID,WORKERID,PRIORITY,OPENDATETIME,CLOSEDATETIME,STATUS,KIND
,CREATEDATE,CREATEUSER,UPDATEDATE,UPDATEUSER,SOURCE,ASSETRECID,DUEDATETIME from Tickets WHERE CREATEDATE < \''+lastnewticket+'\' AND UPDATEDATE > \''+lastticketrefresh+'\'') YIELD row
MATCH (t:Ticket {ticketnumber:row.TICKETNO})
OPTIONAL MATCH (e:Crmemployee {acctrecid:row.WORKERID})
OPTIONAL MATCH (a1:Company {acctrecid:row.CARDID})
OPTIONAL MATCH (a:Company {acctrecid:coalesce(a1.acctrecid,'None Provided')})
OPTIONAL MATCH (c1:Contact {acctrecid:row.CONTACTID})
OPTIONAL MATCH (c:Contact {acctrecid:coalesce(c1.acctrecid,'None Provided')})
OPTIONAL MATCH (src:Ticketsource {name:coalesce(row.SOURCE,'None Provided')})
OPTIONAL MATCH (tt:Tickettype {name:coalesce(row.KIND,'None Provided')})
OPTIONAL MATCH (ts:Ticketstatus {statcode:coalesce(row.STATUS,'None Provided')})
//MERGE (t:Ticket {ticketnumber:row.TICKETNO})
OPTIONAL MATCH (t)-[mbr:MANAGED_BY]-(oe:Crmemployee)
OPTIONAL MATCH (t)-[tor:TICKET_OF_CLIENT]-(oa:Company)
OPTIONAL MATCH (t)-[tfr:TICKET_FOR]-(oc:Contact)
OPTIONAL MATCH (t)-[tsr:TICKET_SOURCE]-(ots:Ticketsource)
OPTIONAL MATCH (t)-[ttr:TICKET_TYPE]-(ott:Tickettype)
OPTIONAL MATCH (t)-[tsr:TICKET_STATUS]-(ots:Ticketstatus)
//set t.manager = e.name,t.openedon=row.OPENDATETIME,t.closedon=row.CLOSEDATETIME,t.createdon=row.CREATEDATE,t.updatedon=row.UPDATEDATE
FOREACH (ignoreMe in CASE WHEN not(exists(oe.acctrecid)) or (row.WORKERID <> oe.acctrecid) THEN [1] ELSE [] END | DELETE mbr)
FOREACH (ignoreMe in CASE WHEN not(exists(oa.acctrecid)) or (row.CARDID <> oa.acctrecid) THEN [1] ELSE [] END | DELETE tor)
FOREACH (ignoreMe in CASE WHEN not(exists(oc.acctrecid)) or (row.CONTACTID <> oc.acctrecid)  THEN [1] ELSE [] END | DELETE tfr)
FOREACH (ignoreMe in CASE WHEN not(exists(ots.name)) or (row.SOURCE <> ots.name)  THEN [1] ELSE [] END | DELETE tsr)
FOREACH (ignoreMe in CASE WHEN not(exists(ott.name)) or (row.KIND <> ott.name)  THEN [1] ELSE [] END | DELETE ttr)
FOREACH (ignoreMe in CASE WHEN not(exists(ots.name)) or (row.STATUS <> ots.name)  THEN [1] ELSE [] END | DELETE tsr)
FOREACH (ignoreMe in CASE WHEN (exists(e.name)) and (t.manager <> e.name) THEN [1] ELSE [] END | SET t.manager=e.name)
FOREACH (ignoreMe in CASE WHEN (exists(a.name)) and (t.account <> a.name) THEN [1] ELSE [] END | SET t.account=a.name)
FOREACH (ignoreMe in CASE WHEN (exists(c.name)) and (t.contact <> c.name)  THEN [1] ELSE [] END | SET t.contact=c.name)
FOREACH (ignoreMe in CASE WHEN (exists(src.name)) and (t.source <> src.name)  THEN [1] ELSE [] END | SET t.source=src.name)
FOREACH (ignoreMe in CASE WHEN (exists(tt.name)) and (t.type <> tt.name)  THEN [1] ELSE [] END | SET t.type=tt.name)
FOREACH (ignoreMe in CASE WHEN (exists(ts.name)) and (t.status <> ts.name)  THEN [1] ELSE [] END | SET t.status=ts.name)
MERGE (t)-[:MANAGED_BY]->(e)
MERGE (t)-[:TICKET_OF_CLIENT]->(a)
MERGE (t)-[:TICKET_FOR]->(c)
MERGE (t)-[:TICKET_SOURCE]->(src)
MERGE (t)-[:TICKET_TYPE]->(tt)
// SET the timestamp for when this refresh procedure last ran
SET cdl.ticketrefresh=howsoonisnow
RETURN t.ticketnumber,oe.name,oa.name,oc.name;
//RETURN count(t) AS `Updated (:Ticket) when data or relationships have changed;