Hi!
I'm currently working on a custom Neo4j procedure to perform a specialized graph traversal that's not feasible with standard Cypher queries. To optimize performance, I've implemented a multi-threaded solution using ForkJoinPool. However, this approach has introduced some perplexing bugs that disappear when I remove the multi-threading.
While traversing the graph, I'm retrieving a name property on some relationships.
The issue is that this implementation raises very strange exceptions:
Property 'name' not found on relationship
Caused by: org.neo4j.exceptions.UnderlyingStorageException: Access to record Node[77753,used=false,created=false,rel=-1,prop=-1,labels=Inline(0x0:[]),light,fixedRefs=false] went out of bounds of the page. The record size is 15 bytes, and the access was at offset 3315 bytes into page 142, and the pages have a capacity of 8192 bytes. The mapped store file in question is /data/databases/neo4j/neostore.nodestore.db
Caused by: org.neo4j.graphdb.NotFoundException: No such property, 'hash'.
Example implementation
This is an example of what the implementation looks like:
package example;

import org.neo4j.graphdb.*;
import org.neo4j.procedure.*;
import org.neo4j.logging.Log;

import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Stream;

public class ParallelTraversalProcedure {

    @Context
    public Transaction tx;

    @Context
    public Log log;

    private static final ForkJoinPool forkJoinPool = new ForkJoinPool();

    @Procedure(value = "example.parallelTraversal", mode = Mode.READ)
    @Description("Parallel traversal to reproduce property access bug")
    public Stream<TraversalResult> parallelTraversal(
            @Name("startNodeId") long startNodeId) {
        ConcurrentLinkedQueue<TraversalResult> results = new ConcurrentLinkedQueue<>();
        Node startNode = tx.getNodeById(startNodeId);
        forkJoinPool.invoke(new TraversalTask(startNode, "", results));
        return results.stream();
    }

    private class TraversalTask extends RecursiveAction {
        private final Node node;
        private final String path;
        private final ConcurrentLinkedQueue<TraversalResult> results;

        TraversalTask(Node node, String path, ConcurrentLinkedQueue<TraversalResult> results) {
            this.node = node;
            this.path = path;
            this.results = results;
        }

        @Override
        protected void compute() {
            List<TraversalTask> subTasks = new ArrayList<>();
            for (Relationship r : node.getRelationships(Direction.OUTGOING)) {
                try {
                    String name = (String) r.getProperty("name");
                    String newPath = path.isEmpty() ? name : path + "/" + name;
                    Node childNode = r.getEndNode();
                    results.add(new TraversalResult(newPath, r.getType().name(), name));
                    // Create subtask if the child node is a Tree
                    if (childNode.hasLabel(Label.label("Tree"))) {
                        subTasks.add(new TraversalTask(childNode, newPath, results));
                    }
                } catch (NotFoundException e) {
                    log.error("Property 'name' not found on relationship. " +
                            "Start node ID: " + node.getId() +
                            ", End node ID: " + r.getEndNode().getId() +
                            ", Relationship type: " + r.getType().name());
                    throw e; // Rethrow to stop the procedure
                }
            }
            invokeAll(subTasks);
        }
    }

    public static class TraversalResult {
        public String path;
        public String relationType;
        public String name;

        public TraversalResult(String path, String relationType, String name) {
            this.path = path;
            this.relationType = relationType;
            this.name = name;
        }
    }
}
So at this point, I don't understand how the graph traversal, even if it's multi-threaded and might have concurrency or synchronization issues, can end up in a situation where a relationship property is simply nowhere to be found:
Property 'name' not found on relationship. Details: Root node ID: 123, Child node ID: 36, Relationship type: HAS_CHILD_TREE, Root node labels: [Tree], Child node labels: [Tree]
I double-checked those relationships:
MATCH (n:Tree)-[r]->(b:Tree)
WHERE ID(n) = 123 AND ID(b) = 36
RETURN count(r.name)
So the name property is there, but Neo4j raises an Exception for some of them during the traversal, due to my parallelized implementation.
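One hypothesis that would fit these symptoms, offered as an assumption rather than a confirmed diagnosis: if the injected transaction keeps internal, stateful readers that are not meant to be used from several threads at once, one thread can reposition a reader between another thread's "locate" and "read" steps. The sketch below reproduces that class of failure deterministically in plain Java. `Cursor`, `seek` and `property` are hypothetical stand-ins invented for the illustration, not Neo4j types or methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class SharedCursorDemo {

    // Hypothetical stand-in for a stateful, non-thread-safe record reader (NOT a Neo4j class).
    static final class Cursor {
        private final Map<Integer, Map<String, String>> store;
        private int position = -1;

        Cursor(Map<Integer, Map<String, String>> store) {
            this.store = store;
        }

        // Step 1: position the reader on a record.
        void seek(int recordId) {
            position = recordId;
        }

        // Step 2: read a property from whatever record the reader currently points at.
        String property(String key) {
            String value = store.get(position).get(key);
            if (value == null) {
                throw new NoSuchElementException("No such property, '" + key + "'.");
            }
            return value;
        }
    }

    // Replays, on a single thread, the interleaving two unsynchronized readers could produce.
    static String run() {
        Map<Integer, Map<String, String>> store = new HashMap<>();
        store.put(1, Map.of("name", "src"));   // record that does have 'name'
        store.put(2, Map.of("hash", "abc"));   // unrelated record without 'name'

        Cursor shared = new Cursor(store);
        shared.seek(1);  // reader A locates record 1
        shared.seek(2);  // reader B repositions the same cursor before A reads
        try {
            return shared.property("name");  // reader A now reads record 2 by mistake
        } catch (NoSuchElementException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints: No such property, 'name'.
    }
}
```

The data is intact in this sketch, yet the read fails because the reader was moved underneath the caller. That would also be consistent with a property existing when checked from Cypher but being reported as missing during the parallel traversal.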
Questions
- What could be causing these property access errors in a multi-threaded context, even though the properties clearly exist?
- Are there any best practices or specific APIs I should be using to ensure read safety during parallel graph traversals?
- Could there be any concurrency or synchronization issues that I'm overlooking?
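For the second question, one pattern that might be worth testing is to stop sharing the transaction and its `Node` objects between tasks: each task receives only a node ID, opens its own read session, and closes it when done. The sketch below shows that shape in plain Java so it runs on its own. `ReadSession`, `Edge` and `Walk` are hypothetical stand-ins, not Neo4j types. In a real procedure the rough equivalent would presumably be injecting `GraphDatabaseService` with `@Context`, calling `beginTx()` inside each task and re-fetching the node by ID, but that mapping is an assumption to check against the Neo4j documentation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Supplier;

class PerTaskSessionDemo {

    // Hypothetical edge: a relationship name plus the ID of its end node.
    static final class Edge {
        final String name;
        final long target;

        Edge(String name, long target) {
            this.name = name;
            this.target = target;
        }
    }

    // Hypothetical stand-in for a read transaction owned by exactly one task.
    interface ReadSession extends AutoCloseable {
        List<Edge> outgoing(long nodeId);

        @Override
        void close();
    }

    static final class Walk extends RecursiveTask<List<String>> {
        private final Supplier<ReadSession> open;
        private final long nodeId;   // tasks carry IDs, never session-bound objects
        private final String path;

        Walk(Supplier<ReadSession> open, long nodeId, String path) {
            this.open = open;
            this.nodeId = nodeId;
            this.path = path;
        }

        @Override
        protected List<String> compute() {
            List<String> paths = new ArrayList<>();
            List<Walk> children = new ArrayList<>();
            // The session is opened and closed by this task only.
            try (ReadSession session = open.get()) {
                for (Edge e : session.outgoing(nodeId)) {
                    String next = path.isEmpty() ? e.name : path + "/" + e.name;
                    paths.add(next);
                    children.add(new Walk(open, e.target, next));
                }
            }
            // Results are returned and merged instead of written to a shared queue.
            for (Walk child : invokeAll(children)) {
                paths.addAll(child.join());
            }
            return paths;
        }
    }

    static List<String> run() {
        Map<Long, List<Edge>> graph = new HashMap<>();
        graph.put(1L, List.of(new Edge("src", 2L), new Edge("README", 3L)));
        graph.put(2L, List.of(new Edge("main", 4L)));

        // Each call hands out a fresh session over the read-only in-memory graph.
        Supplier<ReadSession> open = () -> new ReadSession() {
            @Override
            public List<Edge> outgoing(long nodeId) {
                return graph.getOrDefault(nodeId, List.of());
            }

            @Override
            public void close() {
                // Nothing to release in this in-memory stand-in.
            }
        };
        return ForkJoinPool.commonPool().invoke(new Walk(open, 1L, ""));
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints: [src, README, src/main]
    }
}
```

The trade-off is that separate sessions no longer share a single consistent snapshot, and opening one per task has a cost, so batching several nodes per task may be preferable for large trees.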
Any insights or suggestions would be greatly appreciated. I'm happy to provide more details or clarifications if needed.
Thanks a lot!
- Neo4j 5.23