Length-X Array Property vs. X Separate Single Properties

Hello,

I have a question about node properties. It has two different perspectives, but first, here is the question itself:

What is the difference between using a single array property of length 24 and 24 separate single-value properties on a node?

Example:

Node1:
TIME : [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0]

Node2:
TIME_1: 0
TIME_2: 0
TIME_3: 0
TIME_4: 0
.
.
.
TIME_24: 0

  1. What is the difference between these 2 nodes in an in-memory (named) graph with respect to memory consumption? The important point is that I am NOT only interested in the final memory usage (coming from the estimate functions); I am also interested in the intermediate steps. For example, my final memory usage is 49 GB, but my memory peak is around 260 GB while creating the in-memory graph. My biggest constraint is this 260 GB. (See the projection sketch after this list for what I mean by the two variants.)

  2. What is the difference between these 2 nodes in a machine learning algorithm, for example FastRP?
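
For illustration, I mean projecting the two variants roughly like this (a simplified sketch only, assuming GDS's gds.graph.project native projection; older GDS versions use gds.graph.create instead, and the MyNode label and graph names are placeholders, not my actual schema):

// Variant A: one array property of length 24 per node
CALL gds.graph.project(
  'graph-array',
  { MyNode: { properties: ['TIME'] } },
  '*'
);

// Variant B: 24 separate scalar properties per node
CALL gds.graph.project(
  'graph-scalars',
  { MyNode: { properties: [
    'TIME_1', 'TIME_2', 'TIME_3',
    // ... and so on, up to 'TIME_24'
    'TIME_24'
  ] } },
  '*'
);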

Thank you.

Hello @berkay.coskuner98,
(1) Storing all properties in a single array is the more memory-efficient option for now, as there is some static overhead per node property, but it should not amount to much.
Also, during computation, it is preferred to project a single array property.
Furthermore, the memory consumption of FastRP is mainly driven by the embeddingDimension (and your nodeCount).
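
For example, you can get a feeling for that with the estimate mode, roughly like this (a sketch; the graph name and dimension are placeholders, and the procedure and field names are as in recent GDS versions):

// Rough FastRP memory estimate for the projected graph
CALL gds.fastRP.stream.estimate('graph-array', {
  embeddingDimension: 128,
  featureProperties: ['TIME']
})
YIELD requiredMemory, bytesMin, bytesMax;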

(2) The result should be the same. The main difference is what you need to define as featureProperties.
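
For illustration, the call would differ only in the featureProperties list, e.g. (a sketch; the graph names are placeholders):

// Array variant: one list-valued feature property
CALL gds.fastRP.stream('graph-array', {
  embeddingDimension: 128,
  featureProperties: ['TIME']
})
YIELD nodeId, embedding;

// Scalar variant: list all 24 scalar properties individually
CALL gds.fastRP.stream('graph-scalars', {
  embeddingDimension: 128,
  featureProperties: [
    'TIME_1', 'TIME_2', 'TIME_3',
    // ... and so on, up to 'TIME_24'
    'TIME_24'
  ]
})
YIELD nodeId, embedding;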

Hope that answers your question.

Hello Florentin,

(1) Thank you for your reply. You are saying that keeping all of the properties in an array is better than keeping them separately. Did I understand correctly?

If I understand it correctly, I have observed the opposite situation.

Keeping 24 separate properties instead of a 24-element array seems to be more memory efficient. In my experiment, the in-memory graph with the 24-element array took up 79 GB of space, while the in-memory graph with 24 separate properties took up 49 GB. I don't quite understand why.

Just to confirm, we are talking about a big graph here, about 500 million nodes, right?

(2) Thank you so much. I understand that they are always flattened, no matter whether it is an array or single variables?

(1) Yes, I would expect the array version to be better. This is surprising; we will investigate and get back to you.

(2) Yes, that's true.

Hello Florentin,

Thank you for your reply.

  1. I would be very happy if you could, because I have some doubts about that.

  2. Understood. Thank you.

Hej @berkay.coskuner98

I started investigating your case a little, and I think the first conclusion is that the behavior you observed is actually somewhat expected.
This is due to how we currently store node properties.

Primitive node property values, for example longs, are basically stored in a paged primitive array like long[][]. This means there is next to no overhead for storing a single node property for every node, except for the array header.

Array node properties are stored similarly, i.e. as long[][][] (note the additional dimension here).
In this case the property array stores a pointer to the actual property array of each node, which introduces an overhead of roughly one additional long value per node on a 64-bit machine.
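
As a rough back-of-the-envelope figure (assuming the roughly 500 million nodes you mentioned): 500,000,000 nodes * 8 bytes per pointer is on the order of 4 GB of extra overhead for the array variant, not counting the per-node array headers themselves.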

However, I have so far been unable to reproduce the huge gap between the two variants that you observed.

Could you supply some additional information like:

  • The number of nodes
  • The projection query
  • The data type stored for every node (long, double, float)

How did you obtain the size measurement? Is it from gds.graph.list or the return fields of the projection query?
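
For reference, the numbers I would compare are along these lines (a sketch; 'graph-array' is a placeholder, and the yielded fields are as in the GDS version I have at hand, so they may differ in yours):

CALL gds.graph.list('graph-array')
YIELD graphName, nodeCount, memoryUsage, sizeInBytes
RETURN graphName, nodeCount, memoryUsage, sizeInBytes;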

Thanks a lot!
Best, Max