Import from JSON where CASE statement would help

WardCunningham · November 11, 2018, 10:01pm

I am importing four json files using a process similar to that described by Michael Hunger.

Three of the files load successfully but one is giving me trouble because I can't get the cypher logic right. I feel like I need to use a case statement but that doesn't seem to be useful for choosing between different merge strategies. The file describes dataflows in a computer system. The json is an array of objects where each object describes a flow of one of three types: a database connection, a restful api, and a async-queue producer/consumer. The particular file is available online:

http://context.asia.wiki.org/plugin/json/dataflow-diagram

My Cypher code tries to use CASE to choose which merges to apply.

unwind {json} as obj
case obj.type
when 'store' then
  create (d:DB {store: obj.store})
  merge (f:PROGRAM {name: obj.from})
  merge (f)-[:STORE]->(d)
when ...

This fails with the message:

Invalid input 's': expected 'l/L' (line 2, column 3 (offset: 23))
"case obj.type"
   ^

Now having studied this grammar:
https://s3.amazonaws.com/artifacts.opencypher.org/M12/railroad/CaseExpression.html
I realize case was never intended to choose between merge queries. I'm thinking now I have to create incomplete nodes with only the type field and then run a series of match-merge sequences, one for each type in my json. Or is there a better way?

I should mention that I'm running these Cypher scripts as part of a Docker build.

github.com

WardCunningham/context-db/blob/master/Dockerfile

FROM neo4j:3.1.5

RUN apk --no-cache upgrade && \
    apk --no-cache add curl jq

WORKDIR /
COPY . /

# Configure neo4j for read/write
COPY neo4j.write.conf /var/lib/neo4j/conf/neo4j.conf

WORKDIR /var/lib/neo4j
ENV PATH /var/lib/neo4j/bin:$PATH
RUN neo4j start && \
    sleep 15 && \
    cat /var/lib/neo4j/logs/neo4j.log && \
    echo ================================ && \
    sh /load_json.sh organization-chart && \
    sh /load_json.sh source-code-control && \
    sh /load_json.sh dataflow-diagram && \

This file has been truncated. show original

WardCunningham · November 12, 2018, 12:22am

I think I have acceptable logic now but it appears to be surprisingly inefficient running 100x slower than my other cypher for similarly sized json file. Here is my approach (be sure to scroll to see every line):

create (s:SOURCE {
  name:'dataflow-diagram',
  title:'Dataflow Diagram'})
with s

unwind {json} as obj
  create (c:CASE)
  set c=obj
with s

match (c:CASE {type:'store'})
merge (d:DB {name:c.store})
merge (g:PROGRAM {name:c.from})
merge (g)-[:QUERIES {source:s.name}]->(d)
with s

match (c:CASE {type:'rest'})
merge (f:PROGRAM {name:c.from})
merge (t:PROGRAM {name:c.to})
merge (f)-[:CALLS {source:s.name}]->(t)
with s

match (c:CASE {type:'queue'})
create (q:QUEUE {name:c.queue})
merge (r:PROGRAM {name:c.read})
merge (r)-[:READS {source:s.name}]->(q)
merge (w:PROGRAM {name:c.write})
merge (w)-[:WRITES {source:s.name}]->(q)

I've included the source attribution that I omitted in my previous code as a simplification. Now I include it so that I am directly cutting and pasting my work. This is the terminal output I get while loading my data one file at a time.

organization-chart
real	0m 1.46s
user	0m 0.00s
sys	0m 0.00s

source-code-control
real	0m 1.19s
user	0m 0.00s
sys	0m 0.00s

dataflow-diagram
Cannot merge node using null property value for name
real	1m 49.67s
user	0m 0.00s
sys	0m 0.00s

service-traffic-reports
real	0m 0.50s
user	0m 0.00s
sys	0m 0.00s

I can't explain my "null property value" error either. I've inspected my data with jq and haven't found anything amiss. I could be struggling too long. Time to set this project aside.

WardCunningham · November 13, 2018, 4:00pm

lyonwj · November 13, 2018, 11:04pm

Hi Ward -

There is a bit of a hack you can use for this sort of conditional logic that involves creating dummy lists using CASE and then iterating over them. Something like this:

create (s:SOURCE {
  name:'dataflow-diagram',
  title:'Dataflow Diagram'})
with s

UNWIND {json} AS obj
WITH *,
  CASE WHEN obj.type == 'store' THEN [1] ELSE [] END AS stores,
  CASE WHEN obj.type == 'rest' THEN [1] ELSE [] END AS rests,
  CASE WHEN obj.type == 'queue' THEN [1] ELSE [] END AS queues


FOREACH (_ IN stores |
  MERGE (d:DB {name: obj.store})
  MERGE (g:PROGRAM {name: obj.from})
  MERGE (g)-[:QUERIES {source: s.name}]->(d)
)

FOREACH (_ IN rests |
  MERGE (f:PROGRAM {name: obj.from})
  MERGE (t:PROGRAM {name: obj.to})
  MERGE (f)-[:CALLS {source: s.name}]->(t)
)

FOREACH (_ IN queues |
  CREATE (q:QUEUE {name: obj.queue})
  MERGE (r:PROGRAM {name: obj.read})
  MERGE (r)-[:READS {source: s.name}]->(q)
  MERGE (w:PROGRAM {name: obj.write})
  MERGE (w)-[:WRITES {source: s.name}]->(q)
)

Also, do you have uniqueness constraints created for any of the properties you are merging on?

WardCunningham · November 14, 2018, 6:49am

Thanks for the tip William. The decoding section that I finally used distinguished queue reads from queue writes. This was my source of null values since only one or the other was present. There was also a fourth type, calls to external sites.

  CASE WHEN obj.type='store'
    THEN [1] ELSE [] END AS stores,
  CASE WHEN obj.type='rest'
    THEN [1] ELSE [] END AS rests,
  CASE WHEN obj.type='queue'
    and obj.read is not null
    THEN [1] ELSE [] END AS reads,
  CASE WHEN obj.type='queue'
    and obj.write is not null
    THEN [1] ELSE [] END AS writes,
  CASE WHEN obj.type='site'
    THEN [1] ELSE [] END AS sites

This version ran in a half second which was consistent with the other files. I wonder what my version was doing with all of those seconds?

dataflow-diagram
real	0m 0.59s
user	0m 0.00s
sys	    0m 0.00s

Topic		Replies	Views
Error using CASE WHEN statment Cypher	8	4023	April 21, 2020
Conditionally create or merge a node Cypher cypher , operations	16	5258	July 20, 2021
Conditionally add a property to a node when importing CSV with empty spaces Cypher	8	982	November 5, 2020
Import nodes from recursively structured json Cypher apoc , cypher , import	2	1664	February 26, 2020
Cannot perform queries with "WHERE" clause to data imported from JSON file Cypher apoc , cypher	2	306	June 18, 2021

August 🏄 🏖️ 🏊

Import from JSON where CASE statement would help

Related topics