Export data to csv using py2neo

Khaled · November 11, 2019, 8:07am

Hi there, I am brand new to Neo4j.
Does anyone know how to export data, including relationships and nodes, to a CSV file using py2neo?

Thanks for any help that anyone can offer.
Khaled

dana_canzano · November 12, 2019, 1:35pm

Rather than writing your own CSV exporter, you could use APOC. Its export functions are described at https://neo4j.com/docs/labs/apoc/current/export/

Installation is also described in the same document.

Khaled · November 12, 2019, 1:48pm

Thank you for replying.
I have tried APOC export, but it is too slow since I have more than 200M nodes.
Any suggestions to speed up the export process?

dana_canzano · November 12, 2019, 1:51pm

Do you have more detail on 'it is too slow'? If you change the Cypher query to simply return count(*) rather than, for example, return person.name, person.age, person.address, does this significantly affect performance? If so, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.
Can you post the EXPLAIN plan of the query?
Have you configured min/max heap and pagecache in neo4j.conf?
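
One way to run that comparison from Python, as a rough sketch only. The helper names (to_count_query, time_query, compare) are hypothetical, the RETURN rewrite is a naive text substitution that assumes a single trailing RETURN clause, and the official neo4j driver is assumed to be installed:

```python
import re
import time


def to_count_query(query):
    """Replace the last RETURN clause of a Cypher query with RETURN count(*).

    Naive text rewrite: assumes the final RETURN keyword starts the
    projection and that nothing (ORDER BY, LIMIT) needs to be preserved.
    """
    matches = list(re.finditer(r"\bRETURN\b", query, flags=re.IGNORECASE))
    if not matches:
        raise ValueError("query has no RETURN clause")
    return query[: matches[-1].start()] + "RETURN count(*)"


def time_query(session, query):
    """Run a query, pull every record, and return (row_count, seconds)."""
    start = time.perf_counter()
    rows = sum(1 for _ in session.run(query))
    return rows, time.perf_counter() - start


def compare(uri, auth, query):
    """Time the count-only variant against the full projection."""
    from neo4j import GraphDatabase  # imported lazily; assumed installed

    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session() as session:
            return {
                "count_only": time_query(session, to_count_query(query)),
                "full": time_query(session, query),
            }
    finally:
        driver.close()
```

If the count-only run is nearly as slow as the full one, the time is probably going into the match itself rather than into returning or writing the rows. Note that the second run may benefit from a warmer page cache, so it can be worth repeating both.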

Khaled · November 12, 2019, 2:07pm

This is the query I used:

CALL apoc.export.csv.query("MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) RETURN m.userID ,u.address,v.txhash,v.n_inputs,v.unixtim,r.value,m.balances ", "inputs.csv", {batchSize:200000, parallel:false})

I have configured the min/max heap and pagecache in neo4j.conf.
Note that my machine has 8G of RAM.

Regarding py2neo, I have come up with the following script:

from neo4j import GraphDatabase
import csv

driver = GraphDatabase.driver(uri="bolt://localhost:7687", auth=("neo4j", "123"))

q1 = "MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) RETURN m.userID ,u.address,v.txhash,v.n_inputs,v.unixtim,r.value,m.balances"

# Stream the result and write one CSV row per record
with driver.session() as session, open('result.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    nod = session.run(q1)

    for j in nod:
        writer.writerow(j)

driver.close()

It works fine, but it is also slow. Any suggestions?
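
A possible variation on the script above, offered as a sketch rather than a proven fix: it writes rows in chunks with writerows instead of one writerow call per record. The function names and the batch_size default are hypothetical, and whether this helps at all depends on where the time is actually spent (the query itself, the Bolt transfer, or the disk).

```python
import csv
from itertools import islice


def write_in_batches(records, csv_file, batch_size=10000):
    """Write row-like records to csv_file in chunks; return the row count."""
    writer = csv.writer(csv_file)
    iterator = iter(records)
    total = 0
    while True:
        # Pull at most batch_size records so memory use stays bounded
        batch = [list(record) for record in islice(iterator, batch_size)]
        if not batch:
            return total
        writer.writerows(batch)
        total += len(batch)


def export_query(uri, auth, query, path, batch_size=10000):
    """Stream a Cypher query result into a CSV file at path."""
    from neo4j import GraphDatabase  # imported lazily; assumed installed

    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session() as session, open(path, "w", newline="") as f:
            return write_in_batches(session.run(query), f, batch_size)
    finally:
        driver.close()
```

For example, export_query("bolt://localhost:7687", ("neo4j", "123"), q1, "result.csv") would reproduce the original export with chunked writes.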

dana_canzano · November 12, 2019, 2:32pm

8G of RAM is quite small. Is that 8G to be split between pagecache and heap?

What about:

If you change the Cypher query to simply return count(*) rather than, for example, return person.name, person.age, person.address, does this significantly affect performance? If so, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.

Also, for what it's worth, as you have no WHERE clause in your MATCH statement, the query is effectively a NodeByLabelScan and there is no opportunity to use indexes. It is effectively a table scan in the RDBMS world.
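
To check which operators the planner picks, the plan can be pulled through the driver by prefixing the query with EXPLAIN. A minimal sketch, assuming a 4.x or later Python driver where the result summary exposes the plan as a nested dict with "operatorType" and "children" keys; operator_names and explain_operators are hypothetical helper names:

```python
def operator_names(plan):
    """Collect operatorType values from a nested plan dict, depth first."""
    names = [plan.get("operatorType", "?")]
    for child in plan.get("children", []):
        names.extend(operator_names(child))
    return names


def explain_operators(session, query):
    """Ask the server for the plan without executing the query."""
    summary = session.run("EXPLAIN " + query).consume()
    # summary.plan is assumed to be a dict here (4.x+ driver behaviour)
    return operator_names(summary.plan)
```

A label scan appearing as a leaf of the plan, with no index seek, would be consistent with the point above.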
