I have a very simple MySQL database consisting of only three tables:
companies
products
companies_products
Table companies
has ~470.000 rows of companies with some address data, website, and categories.
Table products
has ~46.000 rows and only contains a product name.
Table companies_products
is an m:n table that links products to the companies that produce or sell said products. It has ~4.900.000 rows (product_id, company_id, and an indicator for seller, bulk seller, or producer).
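For the LOAD CSV attempt further down I exported the join table to CSV roughly like this (a sketch, not my exact statement: the OUTFILE path and the header aliases are assumptions, and the path has to sit under MySQL's secure_file_priv directory):

-- first SELECT only produces the header row that LOAD CSV WITH HEADERS expects
SELECT 'productid', 'companyid', 'seller', 'bulk_seller', 'producer'
UNION ALL
SELECT product_id, company_id, seller, bulk_seller, producer
FROM companies_products
INTO OUTFILE 'C:/ProgramData/MySQL/Uploads/comp_prod.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';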
The upload has to run only once; I don't need updates at the moment. The graph model is a very simple (:company)-[:has]->(:product).
What is the preferred or recommended method to do this?
I've been trying to import this data into a local graph using my freshly downloaded Neo4j Desktop. Using the ETL Tool with Online Direct Import and standard batch sizes, it has now been running for 24 hours (!!) and has completed 390.000 rows.
I tried loading the 4.9 million relationship rows with LOAD CSV wrapped in CALL { ... } IN TRANSACTIONS, and that is also running way too slow. I tried different IN TRANSACTIONS OF xy ROWS batch sizes, to no effect.
Here's the LOAD CSV code:
:auto LOAD CSV WITH HEADERS FROM 'file:///comp_prod.CSV' AS line
CALL {
  WITH line
  MATCH (p:Products {id: toInteger(line.productid)})
  MATCH (c:Companies {id: toInteger(line.companyid)})
  MERGE (c)-[:has {
    seller: toInteger(line.seller),
    bulk_seller: toInteger(line.bulk_seller),
    producer: toInteger(line.producer)
  }]->(p)
} IN TRANSACTIONS
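One thing I'm unsure about is whether the two MATCH lookups are hitting indexes at all. These are the statements I believe need to run before the load (Neo4j 5 syntax; the index names are my own):

// indexes on the id properties used by the two MATCH lookups
CREATE INDEX companies_id IF NOT EXISTS FOR (c:Companies) ON (c.id);
CREATE INDEX products_id IF NOT EXISTS FOR (p:Products) ON (p.id);

If I understand correctly, without these every MATCH is a full label scan over ~470.000 or ~46.000 nodes, repeated for each of the 4.9 million rows, which alone could explain the runtime.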
Is there a recommendation for the batch size? My machine is a moderately powerful i7-7700 with 32 GB of RAM.
Should larger imports only be done with the database stopped? What am I doing wrong here?
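If stopped is the way to go, I assume the tool is neo4j-admin database import. This is roughly what I would try from the command line (Neo4j 5 syntax; the file names are made up, and the header files would need the :ID, :START_ID, :END_ID conventions):

bin\neo4j-admin database import full ^
  --nodes=Companies=import\companies_header.csv,import\companies.csv ^
  --nodes=Products=import\products_header.csv,import\products.csv ^
  --relationships=has=import\comp_prod_header.csv,import\comp_prod.csv ^
  --overwrite-destination ^
  neo4j

If I read the docs right, this replaces the target database entirely, which would be fine here since the upload only has to run once.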
Edit: It's Neo4j 5.13 on a Windows 10 machine.