Graph Data Modeling Question

Hi @TonyOD27,
I really like how kindly you explained each and every point. Let me go through it in detail and update you, but I think I got the core of it:

1. Query execution time is your concern, and you want to confirm whether the number of relationships on the super node is the issue.

Here are 3 quick pointers to share:
[1]. Aggregations always carry some performance cost (in a graph database just as in an RDBMS). So when you run COUNT(DISTINCT node), it has to load all your matched nodes into memory and needs more computation, because DISTINCT must keep track of every node it has already counted.
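
To show why DISTINCT costs more than a plain count, here is a rough Python sketch. It is only an illustration, not how Neo4j implements aggregation; the function names and the sample IDs are made up. A plain count needs one counter, while a distinct count has to remember every value it has seen so far.

```python
def count_all(rows):
    """Plain count: a single integer of state, no matter how many rows."""
    total = 0
    for _ in rows:
        total += 1
    return total


def count_distinct(rows):
    """Distinct count: must remember each value already seen.

    The `seen` set grows with the number of distinct values, which is
    where the extra memory and hashing work comes from.
    """
    seen = set()
    for row in rows:
        seen.add(row)
    return len(seen)


# Hypothetical node IDs returned by a match, with duplicates.
node_ids = [1, 2, 2, 3, 3, 3]
total_rows = count_all(node_ids)
distinct_rows = count_distinct(node_ids)
```

With millions of matched rows, that remembered set is what takes the memory.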

[2]. You mentioned a heap size of 24 GB. We faced a similar issue (a blind mistake) because we ignored the memory configuration while setting up our initial instance (AWS cloud), so I wanted to confirm whether it is the same on your side.

Please re-check your server's RAM configuration. I hope it is greater than 24 GB.
There is a recommended formula:
Memory = Heap Size + Page Cache + 2-3 GB for OS

We set the heap size and the page cache, but the sum of those two values was equal to the server's RAM itself, so the OS had no space left for its own I/O and processes. As a result, even a simple MATCH query took far too long. Then we understood the logic and the importance of checking heap size + page cache + OS memory (a minimum of 2-3 GB should be left for the OS).

For example, if the instance has 24 GB RAM, a recommended configuration looks like this:
Initial heap size: 8 GB, max heap size: 12 GB (= 50% of server memory)
Page cache: 8 GB
That leaves 4 GB for the OS
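
The arithmetic above can be checked with a small Python sketch. The function names and the 2 GB default threshold are my own assumptions taken from the formula in this post, not an official Neo4j sizing tool.

```python
def os_headroom_gb(total_ram_gb, heap_max_gb, page_cache_gb):
    """RAM left for the OS once max heap and page cache are reserved."""
    return total_ram_gb - (heap_max_gb + page_cache_gb)


def leaves_room_for_os(total_ram_gb, heap_max_gb, page_cache_gb, min_os_gb=2):
    """True if at least `min_os_gb` (assumed 2 GB minimum) stays free for the OS."""
    return os_headroom_gb(total_ram_gb, heap_max_gb, page_cache_gb) >= min_os_gb


# The 24 GB example from this post: 12 GB max heap + 8 GB page cache.
headroom = os_headroom_gb(24, 12, 8)

# The mistake we made: heap + page cache equal to the whole server RAM.
our_mistake_ok = leaves_room_for_os(24, 16, 8)
```

Here `headroom` comes out to 4 GB, and `our_mistake_ok` is False because nothing is left for the OS.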

I was trying to find the link to the actual memory estimation technique, but I forgot to bookmark it.
Please do check this, even though it might account for only 1% of the issue.

Just one more point from my own learning, since it may help you in other scenarios.
[3]. Neo4j stores data together with its relationships, as we know. I remember an example: in a library, books are sorted and placed in racks, and each rack is labeled, like 'Graph Books', 'DB Books', 'OS', 'Server', 'Programming'. So when a user wants a 'Neo4j' book, he/she does not need to look from rack 1 to rack 10 but can go directly to 'Graph Books', rack 6, and take out the books related to graphs. This example helped me see the value of multiple relationship names: having multiple relationship types actually helps to fetch the data quickly, compared with having just one relationship type. This is the principle behind fan-out.

Example: (:Customer)-[:ORDER]->(:Product) over the last 3 years would be a very large number of relationships, but
I modeled the same thing in 2 ways based on my business need:

1. (:Customer:GuestUser)-[:ORDER]->(:Product)
Here, instead of searching the whole customer base, I have a label that gives me a subset of customers, so the performance/query load is balanced.
2. (:Customer:MonthlyCustomer)-[:ORDER_JAN_1WEEK]->(:Product)

Basically, these are subsets of customers aligned to some property/category, which helped the business get the context.

So this is what I tried, and it worked well for a Customer360 view. What I learned is that it is perfectly OK to have multiple relationships between 2 nodes when we have millions of connections. Of course, each business scenario is different, and what we want to check/query is the key differentiator here.
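
The library-rack idea can be pictured with a toy Python model. This is only an analogy and not Neo4j's real storage format; the class, the node name, and the product IDs are hypothetical. Relationships are bucketed by type, so asking for one specific type never touches the other buckets.

```python
from collections import defaultdict


class ToyNode:
    """Toy node that keeps its outgoing relationships grouped by type."""

    def __init__(self, name):
        self.name = name
        self.rels_by_type = defaultdict(list)  # relationship type -> targets

    def add_rel(self, rel_type, target):
        self.rels_by_type[rel_type].append(target)

    def targets(self, rel_type):
        # Go straight to one "rack" instead of walking every relationship.
        return list(self.rels_by_type.get(rel_type, []))

    def degree(self):
        return sum(len(targets) for targets in self.rels_by_type.values())


customer = ToyNode("customer-1")
for i in range(1000):
    customer.add_rel("ORDER", f"product-{i}")  # generic, high-volume type
for i in range(5):
    customer.add_rel("ORDER_JAN_1WEEK", f"product-{i}")  # specific type

jan_week1 = customer.targets("ORDER_JAN_1WEEK")
```

The node has 1005 relationships in total, but the lookup for `ORDER_JAN_1WEEK` only reads the 5 in its own bucket. The trade-off is that a query across all orders then has to name several relationship types.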

Instead of 'Dept' as a property on the 'Person' node, would it be OK to have 'DEPARTMENT' as a new node (Sales, HR, Finance, Operations)? Basically, grouping by segments may work for you, I think, if it is aligned to your business scope.

Hope it helps!

With Smiles,
Senthil Chidambaram
