Hi there, I am brand new to Neo4j.
Does anyone know how to export data, including relationships and nodes, to a CSV file using py2neo?
Thanks for any help that anyone can offer.
khaled
Rather than writing your own CSV exporter, you could use APOC. Its export functions are described at https://neo4j.com/docs/labs/apoc/current/export/
Installation is also described in the same document.
Thank you for replying.
I have tried APOC export, but it is too slow since I have more than 200M nodes.
Any suggestions for speeding up the export?
Do you have more detail on "it is too slow"? If you change the Cypher query to simply return count(*) rather than, for example, return person.name, person.age, person.address, does this significantly affect performance? If so, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.
Can you post the EXPLAIN plan of the query?
Have you configured min/max heap and page cache in neo4j.conf?
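The comparison suggested above (full query versus count(*)) can be timed from Python. This is only a rough sketch: `to_count_query` and `time_query` are hypothetical helper names, the rewrite assumes the query has a single RETURN clause, and `run` is assumed to be any callable that takes a query string and returns an iterable of rows (for example `session.run` from the neo4j Python driver).

```python
import re
import time


def to_count_query(query: str) -> str:
    """Swap the RETURN clause for RETURN count(*).

    Assumes the query contains a single RETURN clause.
    """
    match_part = re.split(r"\bRETURN\b", query, flags=re.IGNORECASE)[0]
    return match_part.rstrip() + " RETURN count(*)"


def time_query(run, query):
    """Run `query`, consume every row, and return (row_count, seconds)."""
    start = time.perf_counter()
    rows = sum(1 for _ in run(query))
    return rows, time.perf_counter() - start
```

With an open session, `time_query(session.run, q)` and `time_query(session.run, to_count_query(q))` could then be compared. If the count(*) version is much faster, the time is probably going into fetching properties and writing the file rather than into the pattern match itself.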
This is the query I used:
CALL apoc.export.csv.query("MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) RETURN m.userID ,u.address,v.txhash,v.n_inputs,v.unixtim,r.value,m.balances ", "inputs.csv", {batchSize:200000, parallel:false})
I have configured the min/max heap and page cache in neo4j.conf.
Note that my machine has 8 GB of RAM.
Regarding py2neo, I have come up with the following script (it actually uses the official neo4j Python driver rather than py2neo):
from neo4j import GraphDatabase
import csv

driver = GraphDatabase.driver(uri="bolt://localhost:7687", auth=("neo4j", "123"))
q1 = "MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) RETURN m.userID, u.address, v.txhash, v.n_inputs, v.unixtim, r.value, m.balances"

# Open the output file and a session; both are closed automatically on exit
with open('result.csv', 'w', newline='') as csvFile, driver.session() as session:
    writer = csv.writer(csvFile)
    # Stream records from the server and write each one as a CSV row
    for record in session.run(q1):
        writer.writerow(record)

driver.close()
It works fine, but it is also slow. Any suggestions?
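One small variation on the script above is to write rows in batches with a header line. This is a minimal sketch, not a guaranteed speed-up: `export_rows` is a hypothetical helper, `records` is assumed to be any iterable whose items are themselves iterable over column values (as the driver's records are), and the batch size is an arbitrary choice. If the query itself is the bottleneck, batching the writes will not help much.

```python
import csv
from itertools import islice


def export_rows(records, path, header=None, batch_size=10000):
    """Stream `records` into a CSV file, writing `batch_size` rows at a time.

    Returns the number of data rows written (the header is not counted).
    """
    written = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        if header:
            writer.writerow(header)
        it = iter(records)
        while True:
            # Pull at most batch_size rows so memory use stays bounded
            batch = list(islice(it, batch_size))
            if not batch:
                break
            writer.writerows(batch)
            written += len(batch)
    return written


# Possible usage with the neo4j driver (assumes an open session and query q1):
#     result = session.run(q1)
#     export_rows(result, "result.csv", header=result.keys())
```

Because the helper only needs an iterable, it can be tried on a small list of tuples first to check the output format before running it against the full result.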
8 GB of RAM is quite small. Is that 8 GB to be split between the page cache and the heap?
What about my earlier question: if you change the Cypher query to simply return count(*) rather than, for example, return person.name, person.age, person.address, does this significantly affect performance? If so, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.
Also, for what it's worth, since you have no WHERE clause in your MATCH statement, the query is effectively a NodeByLabelScan and there is no opportunity to use indexes. It is the equivalent of a table scan in the RDBMS world.
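One way to give the planner a predicate to work with is to export in ranges rather than in a single pass. The sketch below is an assumption-laden illustration, not something from the thread: it assumes v.unixtim is numeric, that an index on :Txhash(unixtim) exists or could be created, and that `chunked_queries` (a hypothetical helper) is given sensible start, end, and step values for the data.

```python
def chunked_queries(start, end, step):
    """Yield (query, params) pairs covering [start, end) in windows of `step`.

    Each query restricts v.unixtim to a half-open range [lo, hi).
    """
    query = (
        "MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) "
        "WHERE v.unixtim >= $lo AND v.unixtim < $hi "
        "RETURN m.userID, u.address, v.txhash, v.n_inputs, "
        "v.unixtim, r.value, m.balances"
    )
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield query, {"lo": lo, "hi": hi}
        lo = hi


# Possible usage (assumes an open session):
#     for query, params in chunked_queries(first_ts, last_ts, 86400):
#         for record in session.run(query, params):
#             ...
```

Whether this is faster overall depends on the data and the index; running EXPLAIN on one of the chunked queries would show whether the planner actually picks an index seek instead of a label scan.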