Hello there!
I am new to Neo4j, so I would be happy to receive any help or hints. I am using Neo4j 5.20.0 on a Linux server with 32 GB RAM and 8 vCPUs, together with Neo4j Desktop 1.5.8.
I want to analyse order data: 370,000 items, 900,000,000 sales, and 130,000,000 orders. Synchronizing the data took 3 days, and it takes up roughly 300 GB in Neo4j. I loaded it in small batches of 32,000 rows, but now it seems that I did it wrongly. I created the relationships as (Order)-[:SALE]->(Item), and I created an index: CREATE INDEX item_index FOR (i:Item) ON (i.id).
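Just to rule out index problems, this is roughly how I check that the index is ONLINE (assuming the SHOW INDEXES columns in Neo4j 5 are named like this):
// Check only, not a fix: confirm the item index exists and is ONLINE
SHOW INDEXES YIELD name, state, labelsOrTypes, properties
WHERE name = 'item_index'
RETURN name, state, labelsOrTypes, properties;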
So my questions:
- Should I have modelled it the other way around, i.e. (Item)-[:SALE]->(Order)?
- Is it normal for a load of this size to take 3 days? The items loaded in about 60 seconds using multiple threads, with one batch of items taking around 3–4 seconds. One batch of sales and orders took around 6–8 seconds, but I ran into problems with multithreaded loading, so I loaded them in a single thread (see the sketch after the load query below).
This is how I load my sales and orders data:
UNWIND $sales AS sale
CREATE (o:Order {number: sale[3], date: sale[0]})
WITH o, sale
MATCH (i:Item {id: sale[2]})
CREATE (o)-[r:SALE {
    order_number: sale[3], number_in_order: sale[4],
    price: sale[5], valuerub: sale[6], valuesht: sale[7],
    val: sale[8],
    sebes: sale[9], price_type_id: sale[10]
}]->(i)
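Because the multithreaded client-side batching was fragile, I was also considering loading from a CSV export and letting Neo4j batch the commits itself with CALL { ... } IN TRANSACTIONS (run as an implicit transaction, e.g. prefixed with :auto in Browser). This is only a sketch of what I have in mind; the file name and column names below are placeholders, not my real export:
// Sketch only: server-side batched load from a hypothetical sales.csv export
LOAD CSV WITH HEADERS FROM 'file:///sales.csv' AS row
CALL {
    WITH row
    // same pattern as above: one Order node per row, linked to the matched Item
    CREATE (o:Order {number: row.order_number, date: row.date})
    WITH o, row
    MATCH (i:Item {id: toInteger(row.item_id)})
    CREATE (o)-[:SALE {order_number: row.order_number, price: toFloat(row.price)}]->(i)
} IN TRANSACTIONS OF 10000 ROWS;
With the index on Item.id the MATCH inside the subquery should stay cheap, but I do not know whether this would actually be faster than my current approach.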
- I want, for example, to analyse which items are often bought together with a given item, so the query looks like this:
MATCH (target:Item {id: 155868})<-[:SALE]-(o:Order)-[:SALE]->(i:Item)
WHERE i.id <> 155868
RETURN i.id AS item_id, COUNT(*) AS co_occurrence_count
ORDER BY co_occurrence_count DESC;
However, when I run it, the query takes 97,477 ms to execute. My 8 cores sit at 0–2% utilization and RAM usage does not increase. Do I need to adjust some Neo4j settings to make it run faster?
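I have not tuned any memory settings yet; I only know how to look up the current values, roughly like this (assuming the Neo4j 5 names for the heap and page cache settings are the relevant ones):
// Inspect current memory-related settings (names assumed for Neo4j 5)
SHOW SETTINGS
YIELD name, value
WHERE name IN ['server.memory.heap.initial_size',
               'server.memory.heap.max_size',
               'server.memory.pagecache.size']
RETURN name, value;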
Plan of the query: