Data import from EDL 1.0 to Neo4j


(12kunal34) #1

Hi everyone,

I have my data in EDL 1.0 and I want to import it into Neo4j.
My data is available in the publish layer of EDL, and I can access it through Impala, where I can write queries and fetch the data.
Could you please tell me the best way to do this?
I have never imported data through HDFS before.

Thanks in advance.


(Stefan Armbruster) #2

Please provide more context. What is EDL 1.0? I only know the acronym as "Eclipse Distribution License".


(12kunal34) #3

Hi Stefan, thanks for your reply.

EDL stands for Enterprise Data Lake; in other words, my data is on HDFS.
I need to import it into Neo4j.


(Stefan Armbruster) #4

The APOC library allows accessing a Hive data source via apoc.load.jdbc.
Additionally, all procedures that take URLs accept hdfs:// style URLs, e.g. to load CSV, JSON, XML or other formats.


(12kunal34) #5

Can we access Impala through APOC?


(Stefan Armbruster) #6

I have never used Impala. A quick search shows it has a JDBC driver, so I assume apoc.load.jdbc will work with it.


(12kunal34) #7

Thank you so much, Stefan.
Could you please show me sample syntax for loading data with the APOC procedure above?


(12kunal34) #8

Hi Stefan,
I tried it with Hive and used the connection string below:

WITH 'jdbc:hive2://my_Usrname:my_password@ip-172-31-6-58.ap-south-1.compute.internal:10000/cts687382_test' as url
CALL apoc.load.jdbc(url,'student') YIELD row
RETURN row.rollno, row.class;
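To rule out Neo4j and APOC, the same connection can first be tried from a plain JDBC program. This is a minimal sketch, assuming the Hive JDBC jar and its dependencies are on the classpath and that, as in the query above, the credentials are embedded as user:password@ in the URL. Hive's documented examples pass the user and password as separate arguments to DriverManager.getConnection, so the hypothetical splitCredentials helper separates them. The query `SELECT * FROM student LIMIT 5` assumes the student table from the Cypher query.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class HiveUrlCheck {
    /**
     * Hypothetical helper: turns "jdbc:hive2://user:pass@host:port/db" into
     * {"jdbc:hive2://host:port/db", "user", "pass"}. User and password are
     * null when the URL carries no credentials.
     */
    static String[] splitCredentials(String jdbcUrl) {
        int schemeIdx = jdbcUrl.indexOf("://");
        if (schemeIdx < 0) {
            return new String[] {jdbcUrl, null, null};
        }
        int schemeEnd = schemeIdx + 3;
        int slash = jdbcUrl.indexOf('/', schemeEnd);
        int authorityEnd = slash < 0 ? jdbcUrl.length() : slash;
        // Last '@' before the path, so a password containing '@' (but not '/') is kept intact.
        int at = jdbcUrl.lastIndexOf('@', authorityEnd - 1);
        if (at < schemeEnd) {
            return new String[] {jdbcUrl, null, null};
        }
        String userInfo = jdbcUrl.substring(schemeEnd, at);
        int colon = userInfo.indexOf(':');
        String user = colon < 0 ? userInfo : userInfo.substring(0, colon);
        String password = colon < 0 ? null : userInfo.substring(colon + 1);
        String cleanUrl = jdbcUrl.substring(0, schemeEnd) + jdbcUrl.substring(at + 1);
        return new String[] {cleanUrl, user, password};
    }

    public static void main(String[] args) throws Exception {
        String[] parts = splitCredentials(args[0]);
        // Driver class name as used with apoc.load.driver later in this thread.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(parts[0], parts[1], parts[2]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM student LIMIT 5")) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    if (i > 1) line.append('\t');
                    line.append(rs.getString(i));
                }
                System.out.println(line);
            }
        }
    }
}
```

If this program fails with the same error, the problem is in the driver jars or the URL rather than in APOC.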

but running this gives the error below:

I have downloaded the Hive JDBC connector and put it in the plugins folder.
Please help me find what I missed here.


(Stefan Armbruster) #9

Are there any suspicious entries in logs/debug.log? You can also try to explicitly load a JDBC driver via CALL apoc.load.driver("com.mysql.jdbc.Driver"); (replacing the class name, of course, with the Hive counterpart).


(12kunal34) #10

Hi,
please find my log file below:


2018-11-04 18:08:05.771+0000 INFO [o.n.k.i.DiagnosticsManager] LAST_TRANSACTION_COMMIT_TIMESTAMP (Commit time timestamp for last committed transaction): 1539811587053
2018-11-04 18:08:05.771+0000 INFO [o.n.k.i.DiagnosticsManager] UPGRADE_TRANSACTION_COMMIT_TIMESTAMP (Commit timestamp of transaction the most recent upgrade was performed at): 0
2018-11-04 18:08:05.771+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for NEO_STORE_RECORDS END ---
2018-11-04 18:08:05.787+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for TRANSACTION_RANGE START ---
2018-11-04 18:08:05.787+0000 INFO [o.n.k.i.DiagnosticsManager] Transaction log:
2018-11-04 18:08:05.787+0000 INFO [o.n.k.i.DiagnosticsManager] Oldest transaction 2 found in log with version 0
2018-11-04 18:08:05.803+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for TRANSACTION_RANGE END ---
2018-11-04 18:08:05.803+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for KernelDiagnostics:StoreFiles START ---
2018-11-04 18:08:05.803+0000 INFO [o.n.k.i.DiagnosticsManager] Disk space on partition (Total / Free / Free %): 69688356864 / 2212962304 / 3
Storage files: (filename : modification date - size)
2018-11-04 18:08:05.818+0000 INFO [o.n.k.i.DiagnosticsManager]   New folder:
2018-11-04 18:08:05.818+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2018-11-04T02:52:10-0800 - 0.00 B
2018-11-04 18:08:05.818+0000 INFO [o.n.k.i.DiagnosticsManager]   certificates:
2018-11-04 18:08:05.834+0000 INFO [o.n.k.i.DiagnosticsManager]     neo4j.cert: 2017-09-23T00:11:21-0700 - 1002.00 B
2018-11-04 18:08:05.834+0000 INFO [o.n.k.i.DiagnosticsManager]     neo4j.key: 2017-09-23T00:11:21-0700 - 1.69 kB
2018-11-04 18:08:05.834+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2017-09-23T00:11:21-0700 - 2.67 kB
2018-11-04 18:08:05.849+0000 INFO [o.n.k.i.DiagnosticsManager]   data:
2018-11-04 18:08:05.849+0000 INFO [o.n.k.i.DiagnosticsManager]     dbms:
2018-11-04 18:08:05.865+0000 INFO [o.n.k.i.DiagnosticsManager]       auth: 2017-09-23T00:12:44-0700 - 113.00 B
2018-11-04 18:08:05.865+0000 INFO [o.n.k.i.DiagnosticsManager]     - Total: 2017-09-23T00:12:44-0700 - 113.00 B
2018-11-04 18:08:05.865+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2017-09-23T00:11:26-0700 - 113.00 B
2018-11-04 18:08:05.881+0000 INFO [o.n.k.i.DiagnosticsManager]   import:
2018-11-04 18:08:05.881+0000 INFO [o.n.k.i.DiagnosticsManager]     test.csv: 2018-10-17T12:09:02-0700 - 87.00 B
2018-11-04 18:08:05.881+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2018-10-17T12:24:50-0700 - 87.00 B
2018-11-04 18:08:05.896+0000 INFO [o.n.k.i.DiagnosticsManager]   index:
2018-11-04 18:08:05.896+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2017-09-23T00:11:24-0700 - 0.00 B
2018-11-04 18:08:05.896+0000 INFO [o.n.k.i.DiagnosticsManager]   logs:
2018-11-04 18:08:05.912+0000 INFO [o.n.k.i.DiagnosticsManager]     debug.log: 2018-11-04T10:08:05-0800 - 5.14 MB
2018-11-04 18:08:05.912+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2017-09-23T00:11:18-0700 - 5.14 MB
2018-11-04 18:08:05.928+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore: 2018-10-17T14:39:19-0700 - 8.00 kB
2018-11-04 18:08:05.928+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.counts.db.a: 2018-10-17T14:39:19-0700 - 960.00 B
2018-11-04 18:08:05.928+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.counts.db.b: 2018-10-17T14:24:18-0700 - 928.00 B
2018-11-04 18:08:05.943+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:05.959+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labelscanstore.db: 2018-11-04T10:07:46-0800 - 48.00 kB
2018-11-04 18:08:05.959+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labeltokenstore.db: 2018-10-17T12:39:13-0700 - 8.00 kB
2018-11-04 18:08:05.959+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labeltokenstore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:05.974+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labeltokenstore.db.names: 2018-10-17T12:39:13-0700 - 8.00 kB
2018-11-04 18:08:05.974+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labeltokenstore.db.names.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:05.990+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.nodestore.db: 2018-10-17T14:39:19-0700 - 16.00 kB
2018-11-04 18:08:05.990+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.nodestore.db.id: 2018-11-04T10:07:45-0800 - 873.00 B
2018-11-04 18:08:05.990+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.nodestore.db.labels: 2017-09-23T00:11:21-0700 - 8.00 kB
2018-11-04 18:08:06.006+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.nodestore.db.labels.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.006+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db: 2018-11-04T02:43:26-0800 - 135.45 kB
2018-11-04 18:08:06.006+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.arrays: 2018-10-17T14:39:19-0700 - 8.00 kB
2018-11-04 18:08:06.021+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.arrays.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.021+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.021+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.index: 2018-10-17T13:39:16-0700 - 8.00 kB
2018-11-04 18:08:06.037+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.index.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.037+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.index.keys: 2018-10-17T13:39:16-0700 - 8.00 kB
2018-11-04 18:08:06.053+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.index.keys.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.053+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.strings: 2017-09-23T01:12:02-0700 - 16.00 kB
2018-11-04 18:08:06.053+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.strings.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.068+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshipgroupstore.db: 2017-09-23T01:12:02-0700 - 8.00 kB
2018-11-04 18:08:06.068+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshipgroupstore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.068+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshipstore.db: 2017-09-23T01:12:02-0700 - 23.91 kB
2018-11-04 18:08:06.084+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshipstore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.084+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshiptypestore.db: 2017-09-27T00:46:23-0700 - 8.00 kB
2018-11-04 18:08:06.084+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshiptypestore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.099+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshiptypestore.db.names: 2017-09-23T00:32:56-0700 - 8.00 kB
2018-11-04 18:08:06.099+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshiptypestore.db.names.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.115+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.schemastore.db: 2017-09-23T00:11:23-0700 - 8.00 kB
2018-11-04 18:08:06.115+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.schemastore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.115+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.transaction.db.0: 2018-11-04T10:07:42-0800 - 451.49 kB
2018-11-04 18:08:06.131+0000 INFO [o.n.k.i.DiagnosticsManager]   plugins:
2018-11-04 18:08:06.131+0000 INFO [o.n.k.i.DiagnosticsManager]     apoc-3.2.3.6-all.jar: 2018-11-04T02:42:07-0800 - 7.01 MB
2018-11-04 18:08:06.131+0000 INFO [o.n.k.i.DiagnosticsManager]     hadoop-common-3.1.1.jar: 2018-11-04T03:52:09-0800 - 3.85 MB
2018-11-04 18:08:06.146+0000 INFO [o.n.k.i.DiagnosticsManager]     hive-exec-3.1.0.jar: 2018-11-04T09:39:48-0800 - 38.72 MB
2018-11-04 18:08:06.153+0000 INFO [o.n.k.i.DiagnosticsManager]     hive-jdbc-3.1.0.jar: 2018-11-04T01:12:06-0800 - 122.33 kB
2018-11-04 18:08:06.157+0000 INFO [o.n.k.i.DiagnosticsManager]     httpasyncclient-4.0-beta4.jar: 2018-11-04T10:04:52-0800 - 150.93 kB
2018-11-04 18:08:06.161+0000 INFO [o.n.k.i.DiagnosticsManager]     httpclient-4.5.jar: 2018-11-04T09:48:31-0800 - 710.51 kB
2018-11-04 18:08:06.169+0000 INFO [o.n.k.i.DiagnosticsManager]     libthrift-0.9.3.jar: 2018-11-04T09:15:49-0800 - 228.71 kB
2018-11-04 18:08:06.177+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2018-11-04T10:06:15-0800 - 50.76 MB
2018-11-04 18:08:06.181+0000 INFO [o.n.k.i.DiagnosticsManager]   store_lock: 2017-09-23T00:11:21-0700 - 0.00 B
2018-11-04 18:08:06.185+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for KernelDiagnostics:StoreFiles END ---
2018-11-04 18:08:06.209+0000 INFO [o.n.k.i.DiagnosticsManager] --- SERVER STARTED START ---
2018-11-04 18:08:06.813+0000 INFO [o.n.k.i.DiagnosticsManager] --- SERVER STARTED END ---
2018-11-04 21:33:22.577+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 10798696ms.
2018-11-04 21:33:43.232+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 12549ms.
2018-11-05 03:19:37.205+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 20752015ms.
2018-11-05 03:19:56.070+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 324ms.

and I tried CALL apoc.load.driver("org.apache.hive.jdbc.HiveDriver");

It returns nothing but executes without error, and after adding many dependency jars I now get the error below:

Failed to invoke procedure `apoc.load.jdbc`: Caused by: java.lang.NoClassDefFoundError: org/apache/http/HttpRequestInterceptor

(Stefan Armbruster) #11

A NoClassDefFoundError might mean you're missing one of the dependent jars required to load the JDBC driver class. Which jars did you add to the plugins folder?
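One way to narrow down which jar is missing is to try loading the relevant classes from a plain JVM that uses the same jars. For example, org.apache.http.HttpRequestInterceptor from the error message belongs to Apache HttpComponents httpcore, and the plugins listing in the log above contains httpclient-4.5.jar but no httpcore jar, which would be consistent with this error. This is a minimal sketch, assuming it is compiled next to a copy of the plugins folder and run with something like `java -cp "plugins/*:." ClasspathCheck` (use `;` instead of `:` on Windows). Neo4j's own plugin class loading may differ, so treat the result as a hint rather than proof.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClasspathCheck {
    /**
     * Tries to load (and initialize) each class and records why it failed.
     * ClassNotFoundException: the class itself is not on the classpath.
     * LinkageError (includes NoClassDefFoundError): the class was found, but
     * something it needs (superclass, interface, static initializer) failed.
     */
    static Map<String, String> check(List<String> classNames) {
        Map<String, String> problems = new LinkedHashMap<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                Class.forName(name, true, loader);
            } catch (ClassNotFoundException e) {
                problems.put(name, "not found");
            } catch (LinkageError e) {
                problems.put(name, e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Defaults are the two class names that appear in this thread.
        List<String> names = args.length > 0 ? Arrays.asList(args) : Arrays.asList(
                "org.apache.hive.jdbc.HiveDriver",
                "org.apache.http.HttpRequestInterceptor");
        Map<String, String> problems = check(names);
        if (problems.isEmpty()) {
            System.out.println("all classes loaded");
        }
        for (Map.Entry<String, String> e : problems.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```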


(12kunal34) #12

Hi Stefan,

please find below a screenshot of the jars available in the plugins folder.


(Stefan Armbruster) #13

The APOC docs for 3.2 advise adding these (https://neo4j-contrib.github.io/neo4j-apoc-procedures/index32.html):

  • hadoop-common-2.7.3.2.6.1.0-129.jar
  • hive-exec-1.2.1000.2.6.1.0-129.jar
  • hive-jdbc-1.2.1000.2.6.1.0-129.jar
  • hive-metastore-1.2.1000.2.6.1.0-129.jar
  • hive-service-1.2.1000.2.6.1.0-129.jar
  • httpclient-4.4.jar
  • httpcore-4.4.jar
  • libfb303-0.9.2.jar
  • libthrift-0.9.3.jar
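Comparing this list with the plugins folder can be automated. This is a minimal sketch that matches only on artifact-name prefixes and ignores versions; the prefix list is derived from the docs list above, and the prefix match is coarse (e.g. "hive-service-" would also match a hive-service-rpc jar). Applied to the plugins listing in the log above, the artifacts with no counterpart are hive-metastore, hive-service, httpcore and libfb303. Note that the docs list HDP builds of Hive 1.2 while the plugins folder holds Hive 3.1.0 jars, so version mismatches may cause further problems that this check cannot see.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PluginJarCheck {
    // Artifact-name prefixes taken from the APOC 3.2 docs list; versions are ignored.
    static final List<String> REQUIRED = Arrays.asList(
            "hadoop-common-", "hive-exec-", "hive-jdbc-", "hive-metastore-",
            "hive-service-", "httpclient-", "httpcore-", "libfb303-", "libthrift-");

    /** Returns the required prefixes for which no .jar in jarNames matches. */
    static List<String> missing(List<String> jarNames) {
        List<String> result = new ArrayList<>();
        for (String prefix : REQUIRED) {
            boolean found = false;
            for (String jar : jarNames) {
                if (jar.startsWith(prefix) && jar.endsWith(".jar")) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                result.add(prefix);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "plugins");
        String[] names = dir.list();
        if (names == null) {
            System.err.println("cannot read directory " + dir);
            return;
        }
        System.out.println("missing: " + missing(Arrays.asList(names)));
    }
}
```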