Importing a large table from PostgreSQL


(Lkapusta) #1

I have a table in PostgreSQL that is ~100 GB in size. When I try to import that table:

/home/user/neo4j-etl-cli-1.2.1/bin/neo4j-etl export --url jdbc:postgresql://127.0.0.1:5432/base --user user --password pass --schema myschema --fs 100 --import-tool /usr/share/neo4j/bin --csv-directory /tmp/csv --force --using cypher:neo4j-shell --neo4j:user neouser --neo4j:password neopass --destination /tmp/import/graph.db/

I get "Command failed due to error (RuntimeException: Execution of CSV export process is failed org.postgresql.util.PSQLException: Ran out of memory retrieving query results.). Rerun with --debug flag for detailed diagnostic information."

I tried experimenting with the fetch-size flag and still got "ran out of memory".
Is there any other way to import large tables into Neo4j?

And how can I exclude the table example_table from the export? Would something like this work?
neo4j-etl-cli-1.2.1/bin/neo4j-etl export --exclusion-mode-tables exclude --tables example_table


(Ryan Boyd) #2

@michael.hunger or @albertodelazzari may have some additional suggestions. however, the current way of retrieving the data over JDBC might not work for 100 GB.

i'd suggest exporting manually from Postgres using COPY functionality:
https://www.postgresql.org/docs/current/static/sql-copy.html

there are some examples here:
http://www.postgresqltutorial.com/export-postgresql-table-to-csv-file/
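
for example, something along these lines (just a sketch; the table and file names are placeholders. a server-side COPY writes to a path the Postgres server process can reach, while the psql \copy variant writes the file on the client side):

COPY myschema.mytable TO '/tmp/csv/mytable.csv' WITH (FORMAT csv, HEADER);

\copy myschema.mytable TO '/tmp/csv/mytable.csv' CSV HEADER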

after exporting, you can use neo4j-admin import to import the data into neo4j:
https://neo4j.com/docs/operations-manual/current/tutorial/import-tool/
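
the invocation would be roughly like this (a sketch only; the exact flags differ between Neo4j versions, and the CSV files need the :ID / :START_ID / :END_ID / :TYPE header columns described in the import-tool docs, which a plain Postgres export won't include):

neo4j-admin import --nodes /tmp/csv/mytable_nodes.csv --relationships /tmp/csv/mytable_rels.csv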

hope this helps!


(Lkapusta) #3

Thanks. I will have to write a custom exporter to split those records into several CSV files.

Could you tell me how to use the exclude-tables flag in neo4j-etl? I have to skip that one large table during the export/import.


(Alberto De Lazzari) #4

Hi,
you can find some useful information here.
You can use the --exclusion-mode-tables parameter to define the exclusion policy, and then use --tables to specify the list of tables to exclude or include.

Hope this can be helpful.


(Lkapusta) #5

Thank you.
I've read the ETL documentation that you linked, but I still can't exclude big tables from the import.
Let's say I have a schema myschema in PostgreSQL with many tables, and I want to exclude two of them: firsttable and secondtable. I tried:
--rdbms:schema myschema --exclusion-mode-tables --tables firsttable secondtable
and
--rdbms:schema myschema --exclusion-mode-tables --tables myschema.firsttable myschema.secondtable

The ETL tool ignored those options, generated CSV files for both tables, and imported them.


(Michael Hunger) #6

Did you run that on the command line?

Can you share your full command line?


(Alberto De Lazzari) #7

Hi, the command should be this one:

--rdbms:schema myschema --exclusion-mode-tables exclude --tables firsttable secondtable

You need to specify which policy to use for the "table exclusion mode", because the default is none, and in that case the --tables option is ignored.
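
Applied to the command from your first post, it would look roughly like this (the two table names are just examples; everything else is taken from your original invocation):

/home/user/neo4j-etl-cli-1.2.1/bin/neo4j-etl export --url jdbc:postgresql://127.0.0.1:5432/base --user user --password pass --schema myschema --exclusion-mode-tables exclude --tables firsttable secondtable --fs 100 --import-tool /usr/share/neo4j/bin --csv-directory /tmp/csv --force --using cypher:neo4j-shell --neo4j:user neouser --neo4j:password neopass --destination /tmp/import/graph.db/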


(J Delaunay) #8

You could also try Talend for injecting data into Neo4j.

Regards,