Neo4j procedure: Relationship property missing in multi-threaded graph traversal

Hi !

I'm currently working on a custom Neo4j procedure to perform a specialized graph traversal that's not feasible with standard Cypher queries. To optimize performance, I've implemented a multi-threaded solution using ForkJoinPool. However, this approach has introduced some perplexing bugs that disappear when I remove the multi-threading.

While traversing the graph, i'm retrieving a name property on some relationships.

The issue is that this implementation raises very strange exceptions:

  • Property 'name' not found on relationship
  • Caused by: org.neo4j.exceptions.UnderlyingStorageException: Access to record Node[77753,used=false,created=false,rel=-1,prop=-1,labels=Inline(0x0:[]),light,fixedRefs=false] went out of bounds of the page. The record size is 15 bytes, and the access was at offset 3315 bytes into page 142, and the pages have a capacity of 8192 bytes. The mapped store file in question is /data/databases/neo4j/neostore.nodestore.db
  • Caused by: org.neo4j.graphdb.NotFoundException: No such property, 'hash'.

Example implementation

This is an example of what the implementation looks like:

package example;

import org.neo4j.graphdb.*;
import org.neo4j.procedure.*;
import org.neo4j.logging.Log;
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Stream;

public class ParallelTraversalProcedure {

    @Context
    public Transaction tx;

    @Context
    public Log log;

    private static final ForkJoinPool forkJoinPool = new ForkJoinPool();

    @Procedure(value = "example.parallelTraversal", mode = Mode.READ)
    @Description("Parallel traversal to reproduce property access bug")
    public Stream<TraversalResult> parallelTraversal(
            @Name("startNodeId") long startNodeId) {

        ConcurrentLinkedQueue<TraversalResult> results = new ConcurrentLinkedQueue<>();
        Node startNode = tx.getNodeById(startNodeId);

        forkJoinPool.invoke(new TraversalTask(startNode, "", results));

        return results.stream();
    }

    private class TraversalTask extends RecursiveAction {
        private final Node node;
        private final String path;
        private final ConcurrentLinkedQueue<TraversalResult> results;

        TraversalTask(Node node, String path, ConcurrentLinkedQueue<TraversalResult> results) {
            this.node = node;
            this.path = path;
            this.results = results;
        }

        @Override
        protected void compute() {
            List<TraversalTask> subTasks = new ArrayList<>();

            for (Relationship r : node.getRelationships(Direction.OUTGOING)) {
                try {
                    String name = (String) r.getProperty("name");
                    String newPath = path.isEmpty() ? name : path + "/" + name;
                    Node childNode = r.getEndNode();

                    results.add(new TraversalResult(newPath, r.getType().name(), name));

                    // Create subtask if the child node is a Tree
                    if (childNode.hasLabel(Label.label("Tree"))) {
                        subTasks.add(new TraversalTask(childNode, newPath, results));
                    }
                } catch (NotFoundException e) {
                    log.error("Property 'name' not found on relationship. " +
                              "Start node ID: " + node.getId() + 
                              ", End node ID: " + r.getEndNode().getId() + 
                              ", Relationship type: " + r.getType().name());
                    throw e; // Rethrow to stop the procedure
                }
            }

            invokeAll(subTasks);
        }
    }

    public static class TraversalResult {
        public String path;
        public String relationType;
        public String name;

        public TraversalResult(String path, String relationType, String name) {
            this.path = path;
            this.relationType = relationType;
            this.name = name;
        }
    }
}

So at this point, I don't understand how the graph traversal, even if it's multi-threaded and might have concurrency or synchronization issues, can face a situation where a relationship property is just nowhere to be found ?

Property 'name' not found on relationship. Details: Root node ID: 123, Child node ID: 36, Relationship type: HAS_CHILD_TREE, Root node labels: [Tree], Child node labels: [Tree]

I double-checked those relationships:

MATCH (n:Tree)-[r]->(b:Tree)
WHERE ID(n) = 123 AND ID(b) = 36
RETURN count(r.name)

So the name property is there, but Neo4j raises an Exception for some of them during the traversal, due to my parallelized implementation.

Questions

  • What could be causing these property access errors in a multi-threaded context, even though the properties clearly exist?
  • Are there any best practices or specific APIs I should be using to ensure read safety during parallel graph traversals?
  • Could there be any concurrency or synchronization issues that I'm overlooking?

Any insights or suggestions would be greatly appreciated. I'm happy to provide more details or clarifications if needed.

Thanks à lot !

  • Neo4j 5.23

Not sure if this matters, but when you check your data with this query

You should probably include the Relationship Type:

MATCH (n:Tree)-[r:HAS_CHILD_TREE]->(b:Tree)
WHERE ID(n) = 123 AND ID(b) = 36
RETURN count(r.name)

As for creating procedures. I would defenetly make it work "single threaded" first, and make it faster only if needed (and if there are other queries running at the same time, do you really want your procedure to hog a lot of resources?).

I don't have the details in front of me right now, but I am pretty sure there is a bunch more that needs to be done to traversals in parallel.

Thank you for your reply, @hakan.lofqvist1.

I'd like to clarify a few points:

  1. The query I posted was merely to demonstrate the existence of the relationship property, despite Neo4j's complaints in the logs.
  2. I can confirm that the procedure works quite well in a single-threaded environment.
  3. However, anticipating larger graphs and concurrent requests for graph traversal, I wanted to explore a parallel implementation proactively.

Regarding your question about resource usage: My primary goal was to return results as quickly as possible. Given the nature of the traversal, a multi-core approach seemed logical.

I don't have the details in front of me right now, but I am pretty sure there is a bunch more that needs to be done to traversals in parallel.

I'd be happy to have a look at any example procedure or documentation regarding multi-threaded graph traversals in a Neo4j procedure !

Thanks a lot !

I will check but can't promise anything this close to the weekend.

@hakan.lofqvist1 there is no hurry !

Although i'd love to hack more on Neo4j this week-end, I'm not blocked by this issue :slight_smile:

Thanks again !

Are you writing integration tests with the test-harness? I think these issues can be spotted there/may shorten your feedback loop.

I think you have overly complicated this task. Typically you use a fork/join to iteratively break down the main task into smaller and smaller tasks to get to an acceptable unit of work, at which point you execute small tasks.

In your case, you are iteratively traversing the graph and completing the work at each level already. As I see it, there is no need for the fork/join usage. You are just adding overhead in creating your TraversalTasks and submitting each to execute in a thread pool. Further, the unit of work you are completing is not compute intensive.

If this was me, I would just create a procedure that traverses the graph recursively and returns the list of TraversalResult items that are derived just as you are doing now. I have written a suite of custom procedures that traverse subgraphs using recursion that calculate complex metrics along the paths and it executes very quickly.

I would be glad to help if needed.

1 Like

Here are some useful comments regarding Threads and Transactions Introduction to Neo4j Plugins • Bert Radke

Thank you @glilienfield for offering to assist.

1 Like

@glilienfield thank you very much for your feedback.

My intention was to ensure from the start that my PoC was scalable on Neo4j.
This post is actually a follow-up to an earlier post that I started here, where I first attempted to solve my problem with plain Cypher:

You are correct, my usage of ForkJoin just isn't adapted to the problem here !

Also from the testing I've done so far, the results are already solving my scalability issue, but I'm keeping this in mind in case i need to revisit this problem in a few months.

Thanks a lot ! :100: