
Read query performance -- property vs labeled node for list/array

Graph Buddy

I would like to optimize the read query performance (response time, and perhaps memory use) of storing a list of key/value pairs as a property, compared with storing the same data as a graph of labeled nodes.

The specific context is a list of 21 five-character strings, where each string is a key, assigned by the US Census Bureau, of another item managed in the database. Each such list is also associated with a date, so there is an instance of this list for each day within a given range (currently 2020-2022).

My current code is working, but computes this list over and over in a subquery. I'd like to instead compute each list once and then somehow store the resulting list.

One obvious approach is to compute the list and then store it as the value of a property on an already-existing labeled node (where there is one containing node for each day).
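A minimal sketch of the property approach, assuming a hypothetical schema in which each day is a `(:Day {date})` node and the precomputed list goes into a hypothetical `censusKeys` property. `tx` is any object with a `run(query, **params)` method, such as a transaction from the official neo4j Python driver. This is untested against a real database and is not a benchmark:

```python
# Hypothetical schema: one (:Day {date}) node per day already exists.
# The precomputed list is stored once as a list-of-strings property.

WRITE_LIST = """
MATCH (d:Day {date: date($day)})
SET d.censusKeys = $keys
"""

READ_LIST = """
MATCH (d:Day {date: date($day)})
RETURN d.censusKeys AS keys
"""


def store_daily_list(tx, day, keys):
    """Compute the list elsewhere, then write it once onto the day's node."""
    tx.run(WRITE_LIST, day=day, keys=list(keys))


def read_daily_list(tx, day):
    """Read the list back with a single node lookup; no traversal is needed."""
    record = tx.run(READ_LIST, day=day).single()
    return record["keys"] if record else None
```

The read cost here is one node lookup plus one property read per day, which is why this option tends to suit lists that are written once and always read whole.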

A different approach is to create a labeled node for each element of the list (containing a date/value pair), link the elements together with a :PRIOR relationship, and use a path query and list comprehension to collect the desired list.
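A rough sketch of the linked-list alternative, under the same assumptions as any hypothetical schema: one `(:Entry {date, pos, key})` node per element (the label and the `pos`/`key` property names are illustrative), each pointing at its predecessor via `:PRIOR`. The read uses a variable-length path plus a list comprehension, as described above. `tx` is again any object exposing `run(query, **params)`:

```python
# Hypothetical schema: (:Entry {date, pos, key}) per element,
# with (later)-[:PRIOR]->(earlier) links within one day's chain.

WRITE_CHAIN = """
UNWIND range(0, size($keys) - 1) AS i
CREATE (e:Entry {date: date($day), pos: i, key: $keys[i]})
WITH e ORDER BY e.pos
WITH collect(e) AS entries
UNWIND range(1, size(entries) - 1) AS i
WITH entries[i] AS cur, entries[i - 1] AS prev
CREATE (cur)-[:PRIOR]->(prev)
"""

# Start at the newest element (no incoming :PRIOR), walk to the oldest
# (no outgoing :PRIOR), then reverse so keys come back in insertion order.
READ_CHAIN = """
MATCH (head:Entry {date: date($day)})
WHERE NOT ()-[:PRIOR]->(head)
MATCH p = (head)-[:PRIOR*0..]->(tail)
WHERE NOT (tail)-[:PRIOR]->()
RETURN reverse([n IN nodes(p) | n.key]) AS keys
"""


def store_daily_chain(tx, day, keys):
    """Create one node per element and link consecutive elements."""
    tx.run(WRITE_CHAIN, day=day, keys=list(keys))


def read_daily_chain(tx, day):
    """Collect the chain for one day back into a list of keys."""
    record = tx.run(READ_CHAIN, day=day).single()
    return record["keys"] if record else None
```

Compared with the property version, every read walks about 21 relationships per day, so this design likely pays off only if the individual elements are themselves queried or connected to other nodes.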

Is there a relevant "Neo4J best practice" that offers guidance about design decisions like this? I can make it work either way (I think). I can spike each implementation and measure the results myself. Since I'm surely not the first Neo4J developer to confront this question, I'm hoping that this community can offer some collective wisdom.

What is "Neo4J best practice" for storing multiple seldom-changing lists of key/value pairs?



Hi @tms 

For your information, properties are more expensive than labels in terms of storage, and also in terms of performance if they are not indexed. You can check this explanation of categorical variables to make the best decision for your data and use case. Also note that lists are not indexed in Neo4j, so querying them costs a lot in performance.

It would be good if you shared some more details on your model and the data you currently have.
It depends a lot on how you want to use the data.

The drawback of the list is that you cannot easily update it inline, and you cannot index the list values.
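To make those two drawbacks concrete, here is a small sketch reusing a hypothetical `(:Day {date, censusKeys})` schema (names are illustrative, not from the thread). Changing one element means rewriting the whole list property, and a "which days contain this key?" lookup has to filter every `:Day` node, which (per the reply above) a property index will not help with:

```python
# Hypothetical schema: (:Day {date, censusKeys}) with censusKeys a list of strings.

# No in-place element update: the whole list is rebuilt and written back.
REPLACE_KEY = """
MATCH (d:Day {date: date($day)})
SET d.censusKeys = [k IN d.censusKeys | CASE WHEN k = $old THEN $new ELSE k END]
"""

# Membership test on a list property: evaluated per :Day node as a filter.
FIND_DAYS = """
MATCH (d:Day)
WHERE $key IN d.censusKeys
RETURN d.date AS day
ORDER BY day
"""


def replace_key(tx, day, old, new):
    """Swap one key for another in a single day's list."""
    tx.run(REPLACE_KEY, day=day, old=old, new=new)


def days_containing(tx, key):
    """Return the dates whose list contains the given key."""
    return [record["day"] for record in tx.run(FIND_DAYS, key=key)]
```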

If it's only about indexing/storing the 5 values each, you can also just use 5 separate properties à la value_2022 and index them if you want to.

A linked list of nodes makes more sense for real event chains that have a meaning in the domain and are frequently used as such. In that case you can use either an event list or, if the values are aggregated over time spans, even a time tree.

But today, for many use cases, it's good enough to use a date/localdate property and index it to get range queries over a set of nodes.
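A minimal sketch of that suggestion, assuming the same hypothetical `(:Day {date, censusKeys})` nodes: create an index on the `date` property (the index name `day_date` is illustrative), then fetch a contiguous range of days with one query. The `CREATE INDEX ... IF NOT EXISTS FOR ... ON ...` form is the Neo4j 4.x/5.x syntax; older versions may differ:

```python
# Hypothetical schema: (:Day {date, censusKeys}); index name is illustrative.

CREATE_DATE_INDEX = """
CREATE INDEX day_date IF NOT EXISTS
FOR (d:Day) ON (d.date)
"""

# Range predicate on the indexed date property.
READ_RANGE = """
MATCH (d:Day)
WHERE d.date >= date($start) AND d.date <= date($end)
RETURN d.date AS day, d.censusKeys AS keys
ORDER BY day
"""


def ensure_date_index(tx):
    """Create the date index once; run it in its own transaction."""
    tx.run(CREATE_DATE_INDEX)


def read_range(tx, start, end):
    """Return (day, keys) pairs for every day in [start, end]."""
    return [(r["day"], r["keys"]) for r in tx.run(READ_RANGE, start=start, end=end)]
```

Schema changes and data writes generally cannot share one transaction in Neo4j, which is why the index creation is kept in a separate function.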




