In the store-management system I'd like to store the history of sales as a singly linked list: new sales are inserted at the top and connected via PREVIOUS_SALE relationships in one long chain. To get the history I'd then traverse the list downwards, e.g. with a variable-length pattern such as [:PREVIOUS_SALE*0..100]. No problem there. The challenge is that sales reference other nodes, such as Products and other entities. Imagine having 100k sales, each referencing products; in the worst case a single Product could end up with 100k incoming relationships. This degrades performance significantly when a query traverses a Product's incoming relationships looking for something unrelated to sales, because Neo4j still has to touch every incoming relationship from a Sale just to filter it out.
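To make the growth problem concrete, here is a toy in-memory model in Python. It is not Neo4j's storage engine or any Neo4j API. `Node`, `relate`, `SalesHistory` and `latest` are hypothetical helpers. Each relationship is recorded on both of its end nodes, and every Sale references its Product directly. The newest sale is the head of the list, and `latest()` mimics walking `[:PREVIOUS_SALE*0..100]` from that head.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    props: dict = field(default_factory=dict)
    # Relationships are kept on both ends as (type, other_node) pairs.
    out_rels: list = field(default_factory=list)
    in_rels: list = field(default_factory=list)


def relate(src: Node, rel_type: str, dst: Node) -> None:
    """Create src-[:rel_type]->dst and record it on both nodes."""
    src.out_rels.append((rel_type, dst))
    dst.in_rels.append((rel_type, src))


class SalesHistory:
    """Singly linked list of sales: head is the newest sale, and each sale
    points to the previous one via PREVIOUS_SALE (hypothetical helper)."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def add_sale(self, product: Node, sale_id: int) -> Node:
        sale = Node("Sale", {"id": sale_id})
        if self.head is not None:
            relate(sale, "PREVIOUS_SALE", self.head)
        relate(sale, "SOLD_PRODUCT", product)  # direct Sale -> Product reference
        self.head = sale
        return sale

    def latest(self, max_hops: int = 100) -> list:
        """Collect sales on paths of length 0..max_hops from the head."""
        result, node = [], self.head
        while node is not None and len(result) <= max_hops:
            result.append(node)
            node = next((n for t, n in node.out_rels if t == "PREVIOUS_SALE"), None)
        return result


product = Node("Product", {"name": "Widget"})
history = SalesHistory()
for i in range(1000):
    history.add_sale(product, i)

print(len(history.latest()))  # 101 (path lengths 0..100)
print(len(product.in_rels))   # 1000: one incoming SOLD_PRODUCT per sale
```

Reading the recent history stays bounded at 101 nodes. The Product's incoming relationship count, however, grows by one with every sale.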
In general, nodes in such a system can be split into two groups: those used in the present (Product) and those that were used in, or are part of, the past (Sale). The past ones are queried when, e.g., retrieving history or computing statistics. Because the history keeps growing, it is a problem for the history to reference the present: the relationship count on present-day nodes grows constantly and, as described above, slows down queries. But we do need those relationships to query things.
What if Neo4j supported unidirectional relationships that could only be traversed forward? Sometimes there is no reason to traverse backwards, so this would help in my case and probably in others too. Since Neo4j doesn't support such relationships, the workaround is to introduce an intermediate node: instead of Sale->Product, it would be Sale->X->Product. Sales would reference X (N:1), and X would reference the Product (1:1). This would not slow down queries that work only with products, and getting from a Sale to its Product would require just one extra hop. The same approach would apply to many other cases.
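The Sale->X->Product idea can be sketched with the same kind of toy in-memory model. This is not Neo4j code. `ProductRef` (standing in for X), `REFERS_TO` and the helpers are hypothetical names, and the sale count is scaled down from 100k to keep the example quick. The sales' fan-in lands on the intermediate node, so the Product itself keeps a single incoming relationship.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    out_rels: list = field(default_factory=list)  # (type, target) pairs
    in_rels: list = field(default_factory=list)   # (type, source) pairs


def relate(src: Node, rel_type: str, dst: Node) -> None:
    """Create src-[:rel_type]->dst and record it on both nodes."""
    src.out_rels.append((rel_type, dst))
    dst.in_rels.append((rel_type, src))


def follow(node: Node, rel_type: str) -> Node:
    """Follow the first outgoing relationship of the given type."""
    return next(n for t, n in node.out_rels if t == rel_type)


NUM_SALES = 10_000  # scaled down from the 100k in the text

product = Node("Product")
product_ref = Node("ProductRef")             # hypothetical intermediate node "X"
relate(product_ref, "REFERS_TO", product)    # X -> Product, 1:1

sales = []
for _ in range(NUM_SALES):
    sale = Node("Sale")
    relate(sale, "SOLD_PRODUCT", product_ref)  # Sale -> X, N:1
    sales.append(sale)

print(len(product.in_rels))      # 1: the Product only sees X
print(len(product_ref.in_rels))  # 10000: the history's fan-in sits on X

# Getting from a Sale to its Product is one extra hop over X.
assert follow(follow(sales[0], "SOLD_PRODUCT"), "REFERS_TO") is product
```

The trade-off is one extra node per referenced Product and one extra hop when going from history to the present.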
I don't have the software built yet, but I was thinking of rewriting my small old store-management software to use Neo4j.
Basically, the Sale needs to know which products were sold, so maybe something like Sale-[:SOLD_PRODUCT]->Product.
I also have a Purchase node that stores the details of purchased products. Here, too, there is Purchase-[:PURCHASED_PRODUCT]->Product.
Then there is a Stock node (not the best name for it) that stores the products at a specific physical location inside one of the stores: Stock-[:CONTAINS]->Product.
Finding the Stock that holds a product means traversing the CONTAINS relationship backwards. But if Sale and Purchase referenced this Product directly, Neo4j would have to pick that one incoming relationship out of many. The lookup would get slower as the history, and with it the number of incoming relationships, grows. With intermediate nodes, the Product would only ever have three relationships: CONTAINS from its Stock plus one from each intermediate node for Sales and Purchases.
With intermediate nodes, the size of the history would not affect the performance of present-day operations.
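The Stock lookup can be sketched as a worst-case linear scan over a Product's incoming relationships. This is a toy model only. It deliberately ignores any per-type or per-direction grouping the real database may do, so the numbers illustrate the argument rather than measure Neo4j. `find_stock`, `build`, `SoldRef`, `PurchasedRef` and `REFERS_TO` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    in_rels: list = field(default_factory=list)  # (type, source) pairs


def relate(src: Node, rel_type: str, dst: Node) -> None:
    """Record src-[:rel_type]->dst; only the incoming side is needed here."""
    dst.in_rels.append((rel_type, src))


def find_stock(product: Node):
    """Scan incoming relationships until a CONTAINS is found.
    Returns (stock_or_None, number_of_relationships_examined)."""
    examined = 0
    for rel_type, src in product.in_rels:
        examined += 1
        if rel_type == "CONTAINS":
            return src, examined
    return None, examined


def build(history_size: int, use_intermediate: bool):
    product, stock = Node("Product"), Node("Stock")
    if use_intermediate:
        # Hypothetical intermediate nodes: one for sales, one for purchases.
        sold_ref, purchased_ref = Node("SoldRef"), Node("PurchasedRef")
        relate(sold_ref, "REFERS_TO", product)
        relate(purchased_ref, "REFERS_TO", product)
        sale_target, purchase_target = sold_ref, purchased_ref
    else:
        sale_target = purchase_target = product
    for _ in range(history_size):
        relate(Node("Sale"), "SOLD_PRODUCT", sale_target)
        relate(Node("Purchase"), "PURCHASED_PRODUCT", purchase_target)
    # Added last: the worst case for a linear scan.
    relate(stock, "CONTAINS", product)
    return product, stock


for use_intermediate in (False, True):
    product, stock = build(10_000, use_intermediate)
    found, examined = find_stock(product)
    print(use_intermediate, found is stock, examined, len(product.in_rels))
# False True 20001 20001
# True True 3 3
```

In the direct model, the scan grows with the history. With intermediate nodes, the Product keeps exactly the three relationships described above, however long the history becomes.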
I think you shouldn't worry about the software you want to build yet. Just model the database to meet all the requirements you have first. I'm interested in the history you are talking about. Can you share a dump file so we can discuss it in more detail?
Thanks.
If you have created a database and added some data, you can dump it so we can discuss it in more detail.
Follow the instructions here: Create a DBMS dump file - Neo4j Desktop
Let's try https://arrows.app/ to model it: list the use cases and the data you want to have, then add some sample data to show in more detail what you want. The model may change a lot between the ideal and reality.
I think it is a matter of "perspectives". Here I tried to illustrate 3 different perspectives on "history" and how all 3 can co-exist in the same model: