Length-X Array Property vs. X Separate Single Properties

Hello,

I have a question about node properties. It has two different perspectives, but first, here is the question itself:

What is the difference between using a single array property of length 24 and 24 separate single-value properties on a node?

Example:

Node1:
TIME : [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0]

Node2:
TIME_1: 0
TIME_2: 0
TIME_3: 0
TIME_4: 0
.
.
.
TIME_24: 0

  1. What is the difference between these 2 nodes in an in-memory (named) graph with respect to memory consumption? The important point is that I am NOT only interested in the final memory usage (coming from the estimate functions); I am also interested in the intermediate steps. For example, my final memory usage is 49 GB, but my memory peak is around 260 GB while creating the in-memory graph. My biggest constraint is this 260 GB. (See the projection sketch after this list for what I mean by the two variants.)

  2. What is the difference between these 2 nodes in a machine learning algorithm, for example FastRP?
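
For illustration, I mean projecting the two variants roughly like this (a simplified sketch only, assuming GDS's gds.graph.project native projection; older GDS versions use gds.graph.create instead, and the MyNode label and graph names are placeholders, not my actual schema):

// Variant A: one array property of length 24 per node
CALL gds.graph.project(
  'graph-array',
  { MyNode: { properties: ['TIME'] } },
  '*'
);

// Variant B: 24 separate scalar properties per node
CALL gds.graph.project(
  'graph-scalars',
  { MyNode: { properties: [
    'TIME_1', 'TIME_2', 'TIME_3',
    // ... and so on, up to 'TIME_24'
    'TIME_24'
  ] } },
  '*'
);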

Thank you.

Hello @berkay.coskuner98,
(1) Storing all properties in a single array is the more memory-efficient option for now, as there is some static overhead per node property, but it should not amount to much.
Also, during computation, it is preferred to project a single array property.
Furthermore, the memory consumption of FastRP is mainly driven by the embeddingDimension (and your nodeCount).
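
For example, you can get a feeling for that with the estimate mode, roughly like this (a sketch; the graph name and dimension are placeholders, and the procedure and field names are as in recent GDS versions):

// Rough FastRP memory estimate for the projected graph
CALL gds.fastRP.stream.estimate('graph-array', {
  embeddingDimension: 128,
  featureProperties: ['TIME']
})
YIELD requiredMemory, bytesMin, bytesMax;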

(2) The result should be the same. The main difference is what you need to define as featureProperties.
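
For illustration, the call would differ only in the featureProperties list, e.g. (a sketch; the graph names are placeholders):

// Array variant: one list-valued feature property
CALL gds.fastRP.stream('graph-array', {
  embeddingDimension: 128,
  featureProperties: ['TIME']
})
YIELD nodeId, embedding;

// Scalar variant: list all 24 scalar properties individually
CALL gds.fastRP.stream('graph-scalars', {
  embeddingDimension: 128,
  featureProperties: [
    'TIME_1', 'TIME_2', 'TIME_3',
    // ... and so on, up to 'TIME_24'
    'TIME_24'
  ]
})
YIELD nodeId, embedding;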

Hope that answers your question.

Hello Florentin,

(1) Thank you for your reply. You are saying that keeping all of the properties in an array is better than keeping them separately. Did I understand correctly?

If I understand it correctly, I have observed the opposite situation.

Keeping 24 separate properties instead of a 24-element array seems to be more memory efficient. In my experiment, the in-memory graph with the 24-element array took up 79 GB of space, while the in-memory graph with 24 separate properties took up 49 GB. I don't quite understand why.

Just to confirm, we are talking about a big graph here, about 500 million nodes, right?

(2) Thank you so much. I understand that they are always flattened, no matter whether it is an array or single variables?

(1) Yes, I would expect the array version to be better. This is surprising; we will investigate and get back to you.

(2) Yes, that's true.

Hello Florentin,

Thank you for your reply.

  1. I would be very happy if you could, because I have some doubts about that.

  2. Understood. Thank you.

Hej @berkay.coskuner98

I started investigating your case a little, and I think the first conclusion is that the behavior you observed is actually somewhat expected.
This is due to how we currently store node properties.

Primitive node property values, for example longs, are basically stored in a paged primitive array like long[][]. This means there is next to no overhead for storing a single node property for every node, except for the array header.

Array node properties are stored similarly, i.e. as long[][][] (note the additional dimension here).
In this case the property array stores a pointer to the actual property array of each node, which introduces an overhead of roughly one additional long value per node on a 64-bit machine.
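
As a rough back-of-the-envelope figure (assuming the roughly 500 million nodes you mentioned): 500,000,000 nodes * 8 bytes per pointer is on the order of 4 GB of extra overhead for the array variant, not counting the per-node array headers themselves.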

However, I have so far been unable to reproduce the huge gap between the two variants that you observed.

Could you supply some additional information like:

  • The number of nodes
  • The projection query
  • The data type stored for every node (long, double, float)

How did you obtain the size measurement? Is it from gds.graph.list or the return fields of the projection query?
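
For reference, the numbers I would compare are along these lines (a sketch; 'graph-array' is a placeholder, and the yielded fields are as in the GDS version I have at hand, so they may differ in yours):

CALL gds.graph.list('graph-array')
YIELD graphName, nodeCount, memoryUsage, sizeInBytes
RETURN graphName, nodeCount, memoryUsage, sizeInBytes;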

Thanks a lot!
Best, Max