Pregel: Store Messages in Nodes

Hi everyone,

I am currently trying to write a procedure using the Pregel-API, where I would like to use the compute-method, to store each of the incoming messages for a node. However, I am kind of struggling with the Messages-object right now. The problem lies in the fact, that the Messages-class implements an Iterable and the node-schema only allows for an array of doubles to be stored. Hence, I must somehow cast the Messages-Iterable into an array of doubles. Since this happens on each node, I am looking for a rather efficient way of doing so. E.g. for the schema below, how to I get to the array of doubles defined by REACHEDBY-field...

@Override
public PregelSchema schema (DistanceConfig config) {
    return new PregelSchema.Builder()
        .add(REACHEDBY, ValueType.DOUBLE_ARRAY, Visibility.PUBLIC)
        .add(DISTANCE, ValueType.DOUBLE, Visibility.PUBLIC)
        .build();
}

... From the Messages-object passed to the compute-method:

@Override
public void compute(ComputeContext<DistanceConfig> context, Messages messages) {
    .
    .
    .
    else if (step <= maxIter) {
        if (!messages.isEmpty()) {
                ??? What to do here ???
    }
    .
    .
    .
}

Has anyone else any experience with that? What would be the preferred way of performing this kind of operation?

Thanks a lot,

Lorenz

Hi Lorenz.

At the moment, we do not expose the number of messages, i.e., you have to consume them in order to figure out how many there are. This means you can't allocate an array upfront. Without changing the API, I see two options:

  1. Collect the messages in a list and turn them into an array afterwards (memory overhead)
  2. Allocate an array and grow it if necessary. That would require you to store the actual length of the array in a second node value.

Looking at the node value name, I assume you want to store all nodes that are connected to that node? In theory, that might explode quickly and for a connected graph you'd end up with quadratic space. Maybe something to think about. If you tell me about the general idea of your implementation I might be able to suggest a different approach.