Importing a large table from PostgreSQL

(Lkapusta) #1

I have a table in PostgreSQL that is ~100 GB in size. When I try to import that table:

/home/user/neo4j-etl-cli-1.2.1/bin/neo4j-etl export --url jdbc:postgresql:// --user user --password pass --schema myschema --fs 100 --import-tool /usr/share/neo4j/bin --csv-directory /tmp/csv --force --using cypher:neo4j-shell --neo4j:user neouser --neo4j:password neopass --destination /tmp/import/graph.db/

I get "Command failed due to error (RuntimeException: Execution of CSV export process is failed org.postgresql.util.PSQLException: Ran out of memory retrieving query results.). Rerun with --debug flag for detailed diagnostic information."

I tried experimenting with the fetch-size flag and I still got "ran out of memory".
Is there any other way to import large tables into Neo4j?

And how can I exclude the table example_table from the export? Is this the right usage?
neo4j-etl-cli-1.2.1/bin/neo4j-etl export --exclusion-mode-tables exclude --tables example_table

(Ryan Boyd) #2

@michael.hunger or @albertodelazzari may have some additional suggestions. however, the current way of retrieving the data over JDBC might not work for 100GB.

i'd suggest exporting manually from Postgres using the COPY functionality.

there are some examples of COPY usage in the PostgreSQL documentation.

after exporting, you can use neo4j-admin import to load the data into neo4j.
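a rough sketch of that pipeline (database, table, file, and path names below are placeholders, not taken from your setup, and neo4j-admin import flags vary a bit between Neo4j versions):

```shell
# Client-side export: \copy wraps COPY ... TO STDOUT, so rows are streamed to
# the local file instead of being materialized as one 100 GB result in memory.
psql -d mydb -c "\copy myschema.mytable TO 'nodes.csv' WITH (FORMAT csv, HEADER)"

# Bulk-load into an empty database. The CSV header first has to be adapted to
# the import tool's :ID / :LABEL column conventions.
neo4j-admin import --database=graph.db --nodes=nodes.csv
```

neo4j-admin import only works against a database that doesn't exist yet, so this is for an initial load, not for adding to an existing graph.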

hope this helps!

(Lkapusta) #3

Thanks. I will have to write a custom exporter to split those records into several CSV files.
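For the splitting step alone, a custom exporter may not be necessary: assuming the export is a headered CSV, standard coreutils can chunk it. Here nodes.csv is a tiny stand-in generated inline so the sketch is self-contained, and the chunk size is 2 rows for illustration (it would be in the millions in practice):

```shell
# Stand-in for the (much larger) COPY export.
printf 'id,name\n' > nodes.csv
printf '%s,row%s\n' 1 1 2 2 3 3 4 4 5 5 >> nodes.csv

# Split the body into fixed-size chunks, then prepend the header to each
# chunk so every part is a valid CSV on its own.
head -n 1 nodes.csv > header.csv
tail -n +2 nodes.csv | split -l 2 - chunk_
for f in chunk_*; do
  cat header.csv "$f" > "part_${f}.csv"
  rm "$f"
done
rm header.csv
```

This produces part_chunk_aa.csv, part_chunk_ab.csv, … each starting with the original header line.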

Could you tell me how to use the exclude-tables flag in neo4j-etl? I need to skip that one large table during the export and import.

(Alberto De Lazzari) #4

you can find some useful information in the neo4j-etl documentation.
You can use the --exclusion-mode-tables parameter to define the exclusion policy, and then use --tables to specify the list of tables to exclude or include.

Hope this can be helpful.

(Lkapusta) #5

Thank you.
I've read the ETL documentation that you linked, but I still can't exclude big tables from the import.
Let's say I have a schema myschema in PostgreSQL with many tables, and I want to exclude two of them, firsttable and secondtable. I tried:
--rdbms:schema myschema --exclusion-mode-tables --tables firsttable secondtable
--rdbms:schema myschema --exclusion-mode-tables --tables myschema.firsttable myschema.secondtable

The ETL tool ignored those options, generated CSV files for both tables, and imported them.

(Michael Hunger) #6

Did you run that on the command-line?

Can you share your full commandline?

(Alberto De Lazzari) #7

Hi, the command should be this one:

--rdbms:schema myschema --exclusion-mode-tables exclude --tables firsttable secondtable

You need to specify which policy to use for the "table exclusion mode", because the default is none, in which case --tables is ignored.
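Put together with the command from the first post, the full invocation would look something like this (connection string, credentials, and paths are placeholders from that post, not values I know to be correct for your environment):

```shell
# Export everything in myschema EXCEPT firsttable and secondtable.
neo4j-etl export \
  --url "jdbc:postgresql://localhost/mydb" --user user --password pass \
  --schema myschema \
  --exclusion-mode-tables exclude --tables firsttable secondtable \
  --import-tool /usr/share/neo4j/bin --csv-directory /tmp/csv \
  --destination /tmp/import/graph.db/ --force
```

The key difference from your attempts is the explicit `exclude` value immediately after --exclusion-mode-tables; without it the policy stays at its default of none.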

(J Delaunay) #8

You could also try Talend for injecting data into Neo4j.