Modeling nodes representing present and history

tomazk8 · October 13, 2024, 2:06pm

Just a topic to think about...

In a store-management system I'd like to keep the sales history as a singly linked list, where each new sale is inserted at the head and the sales are connected by PREVIOUS_SALE relationships in one long chain. To get the history I'd then traverse down the list, i.e. *0..100. No problem there. The challenge is that sales reference other nodes, such as Products and other entities. Imagine having 100k sales, each referencing products: in the worst case a single Product could end up with 100k incoming relationships. This degrades performance a lot when a query traverses a Product's incoming relationships to find something unrelated to Sales, because Neo4j still has to touch the incoming relationships from Sales in order to filter them out.
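To make the chain concrete, here is a minimal in-memory sketch in Python (not Neo4j, and not code from the post). The class and method names are hypothetical. The `previous_sale` field plays the role of the PREVIOUS_SALE relationship, and `max_hops=100` mirrors the `*0..100` bound, so the head plus up to 100 older sales are returned.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sale:
    sale_id: int
    # Stands in for the PREVIOUS_SALE relationship (hypothetical field name).
    previous_sale: Optional["Sale"] = None


class SaleHistory:
    """Singly linked list of sales; the newest sale is always the head."""

    def __init__(self) -> None:
        self.head: Optional[Sale] = None

    def add_sale(self, sale_id: int) -> Sale:
        # The new sale points at the old head and then becomes the head.
        sale = Sale(sale_id, previous_sale=self.head)
        self.head = sale
        return sale

    def history(self, max_hops: int = 100) -> List[int]:
        # Visit the head (0 hops) and follow at most max_hops links.
        sale_ids = []
        node = self.head
        hops = 0
        while node is not None and hops <= max_hops:
            sale_ids.append(node.sale_id)
            node = node.previous_sale
            hops += 1
        return sale_ids
```

Inserting at the head only touches the new node and the head pointer, so the cost of adding a sale does not depend on how long the history already is.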

In general, the nodes in such a system can be split into two groups: those used in the present (Product) and those that were used in, or are part of, the past (Sale). The latter are queried when, e.g., retrieving history or computing statistics. Because the history keeps growing, it is not OK for the history to reference the present directly: the relationship count on the present nodes grows constantly and leads to slow performance, as described above. But we do need those relationships to query things.

What if Neo4j supported mono-directional relationships that can only be traversed in the forward direction? Sometimes there is no reason to traverse backwards, which is true in my case and probably in other cases too. Since Neo4j does not support such relationships, the solution is to introduce an intermediate node. Instead of referencing Sale->Product, it would be Sale->X->Product. Sales would reference X (N:1) and X would reference the Product (1:1). This would not slow down any queries that work only with products, and getting from a Sale to a Product would only require hopping over one extra node. In general, the same would apply to many other cases.
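The effect on relationship counts can be sketched with a toy in-memory graph in Python. This is not Neo4j and says nothing about how Neo4j stores or filters relationships; it only counts edges. The `Graph` class, the node names, and the `REFERS_TO` relationship type for the X->Product hop are all hypothetical, chosen for illustration.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory graph of directed, typed relationships (illustration only)."""

    def __init__(self):
        self.incoming = defaultdict(list)  # node -> [(rel_type, source)]
        self.outgoing = defaultdict(list)  # node -> [(rel_type, target)]

    def relate(self, source, rel_type, target):
        self.outgoing[source].append((rel_type, target))
        self.incoming[target].append((rel_type, source))

    def in_degree(self, node):
        return len(self.incoming[node])


def build_direct(n_sales):
    # Sale -> Product: every sale adds one incoming relationship to the product.
    g = Graph()
    for i in range(n_sales):
        g.relate(f"sale-{i}", "SOLD_PRODUCT", "product-A")
    return g


def build_with_intermediate(n_sales):
    # Sale -> X -> Product: sales attach to the intermediate node "ref-A" (N:1),
    # and "ref-A" holds the single relationship to the product (1:1).
    g = Graph()
    g.relate("ref-A", "REFERS_TO", "product-A")  # REFERS_TO is an assumed name
    for i in range(n_sales):
        g.relate(f"sale-{i}", "SOLD_PRODUCT", "ref-A")
    return g


def products_of(g, sale):
    # Resolve a sale's products, hopping over an intermediate node if there is one.
    products = []
    for _, target in g.outgoing[sale]:
        refs = [t for rel, t in g.outgoing[target] if rel == "REFERS_TO"]
        products.extend(refs or [target])
    return products
```

With 100,000 sales, `build_direct` leaves `product-A` with 100,000 incoming relationships, while `build_with_intermediate` leaves it with one. The growth moves to `ref-A`, which only history queries need to touch.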

nguyentuananh92 · October 13, 2024, 3:58pm

Can you share the dump file so we can discuss this further? I'm also interested in the problem you describe.

nguyentuananh92 · October 13, 2024, 4:11pm

I am not sure whether I have understood the model correctly. Is it something like this?

tomazk8 · October 13, 2024, 5:24pm

I don't have the software built yet, but I was thinking of rewriting my small old store-management software to use Neo4j.

Basically, the Sale needs to know which products were sold, so maybe something like Sale-[:SOLD_PRODUCT]->Product.

I also have a Purchase node that stores the details of purchased products. There, too, the relationship is Purchase-[:PURCHASED_PRODUCT]->Product.

Then there is a Stock node (not the best name for it) that stores the products for a specific physical location inside one of the stores: Stock-[:CONTAINS]->Product.

The operation to find the Stock that holds a product would traverse backwards over the CONTAINS relationship. But if Sale and Purchase referenced this Product directly, Neo4j would have to find that one incoming relationship among a great many, so the lookup would slow down as the history grows and the number of incoming relationships increases. With intermediate nodes, the Product would only ever have three relationships.

With intermediate nodes, the size of the history would not affect the performance of operations in the present.
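A rough Python sketch of the Stock lookup under both models follows. It assumes, as the post does, that the lookup examines every incoming relationship of the Product; whether Neo4j actually does that is not modeled here. The node names, the `REFERS_TO` type, and the use of one intermediate node each for sales and purchases are assumptions made for illustration.

```python
def incoming_direct(n_sales, n_purchases):
    # Product referenced directly by its Stock and by every Sale and Purchase.
    rels = [("CONTAINS", "stock-1")]
    rels += [("SOLD_PRODUCT", f"sale-{i}") for i in range(n_sales)]
    rels += [("PURCHASED_PRODUCT", f"purchase-{i}") for i in range(n_purchases)]
    return rels


def incoming_with_intermediates():
    # Product referenced by its Stock and by two intermediate nodes (assumed
    # layout); this list stays the same size however long the history gets.
    return [
        ("CONTAINS", "stock-1"),
        ("REFERS_TO", "sales-ref"),
        ("REFERS_TO", "purchases-ref"),
    ]


def find_stocks(incoming):
    # Naive backward lookup: examine every incoming relationship, keep CONTAINS.
    examined = 0
    stocks = []
    for rel_type, source in incoming:
        examined += 1
        if rel_type == "CONTAINS":
            stocks.append(source)
    return stocks, examined
```

In this toy model, `find_stocks(incoming_direct(1000, 500))` examines 1501 relationships to find `stock-1`, while `find_stocks(incoming_with_intermediates())` examines 3.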

nguyentuananh92 · October 14, 2024, 3:13pm

I don't think the software you want to build matters yet. Just model the database to meet all of your requirements first. I'm interested in the history you are talking about. Can you share the dump file so we can discuss it in more detail?
Thanks.

tomazk8 · October 14, 2024, 4:01pm

Nguyen, what exactly do you mean by "dump file"?

nguyentuananh92 · October 14, 2024, 4:07pm

If you have created a database and added some data, you can dump the database so we can discuss it in more detail.
Follow the instructions here:
Create a DBMS dump file - Neo4j Desktop

tomazk8 · October 14, 2024, 4:31pm

No, I don't have the database built yet. It was just an idea I was experimenting with.

nguyentuananh92 · October 14, 2024, 4:34pm

Try https://arrows.app/
to model it: list the use cases and the data you want to have, then add some sample data to show in more detail what you want. The model may change a lot between the ideal and reality.

hakan.lofqvist1 · October 15, 2024, 1:53pm

I think it is a matter of "perspectives". Here I tried to illustrate 3 different perspectives on "history" and how all 3 can co-exist in the same model:

Green would be the history of sales from Product A's perspective
Blue would be the history from the all-sales perspective
Red would be the history from Customer A's perspective
