I have a very simple MySQL database consisting of only three tables: companies, products, and companies_products.
Table companies has ~470,000 rows of companies with some address data, website, and categories.
Table products has ~46,000 rows and only contains a product name.
Table companies_products is an m:n table that links products to the companies that produce or sell them. It has ~4,900,000 rows (product_id, company_id, and indicators for seller, bulk seller, or producer).
The import only has to run once; I don't need updates at the moment. The graph is very simple: (:company)-[:has]->(:product).
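To make the target model concrete, a single linked pair should end up looking like this (the ids and flag values here are made up for illustration):

```cypher
// Hypothetical example: company 42 produces (but does not sell) product 7
MERGE (c:Companies {id: 42})
MERGE (p:Products {id: 7})
MERGE (c)-[:has {seller: 0, bulk_seller: 0, producer: 1}]->(p)
```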
What is the preferred or recommended method to do this?
I've been trying to import this data into a local graph using a freshly downloaded Neo4j Desktop. Using the ETL Tool with online direct import and standard batch sizes, it has now been running for 24 hours (!!) and has completed 390,000 rows.
I tried loading the 4.9 million relationship rows with LOAD CSV and CALL { … } IN TRANSACTIONS, and it is also running way too slow. I tried different IN TRANSACTIONS OF xy ROWS batch sizes to no effect.
Here's the LOAD CSV code:
:auto LOAD CSV WITH HEADERS FROM 'file:///comp_prod.CSV' AS line
CALL {
  WITH line
  MATCH (p:Products {id: toInteger(line.productid)})
  MATCH (c:Companies {id: toInteger(line.companyid)})
  // MERGE must reuse the matched variables c and p;
  // with unbound variables it would create new blank nodes on every row
  MERGE (c)-[:has {
    seller: toInteger(line.seller),
    bulk_seller: toInteger(line.bulk_seller),
    producer: toInteger(line.producer)
  }]->(p)
} IN TRANSACTIONS
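One thing I have not done yet is create any indexes, so I assume the two MATCHes scan every :Companies / :Products node for each CSV row. Would something like this (the index names are my own) be required before the load?

```cypher
CREATE INDEX companies_id IF NOT EXISTS FOR (c:Companies) ON (c.id);
CREATE INDEX products_id IF NOT EXISTS FOR (p:Products) ON (p.id);
```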
Are there any recommendations for batch size? My machine is a moderately powerful i7-7700 with 32 GB of RAM.
Should larger imports only be done with the database stopped? What am I doing wrong here?
Edit: It's Neo4j 5.13 on a Windows 10 machine.