Process tens of thousands of MERGE commands

Hi Neo4j Community,

I'm relatively new to Neo4j and am seeking advice from more experienced users on how to improve the performance of inserting hierarchical data into my database.

I'm working on a project where I consume a stream of items, each represented by its path from root to leaf, along with a timestamp indicating when it was received. The data has a hierarchical structure, and I aim to represent it in a tree-like structure within my Neo4j database.

For example, an Item might be represented as an array with a path and a timestamp, like so: (Array(a,b,c,d), timestamp).

Here's an example of my current Cypher statement for inserting an item, where some fields are static (unchanging) and others are dynamic (unique to each item):

MERGE (a1)-[:N]->(a2:D2 {l: '<dynamic>'})
MERGE (a2)-[:N]->(a3:D3 {l: '<static>'})
...
MERGE (a9)-[:N]->(a10:D10 {l: '<dynamic>'})
MERGE (a10)-[:N]->(a11:D11 {l: '<dynamic>'})
ON CREATE SET a11.v = [<timestamp>]
ON MATCH SET a11.v = a11.v + <timestamp>

This approach represents how I update or insert an item into the database.

However, when dealing with approximately 20,000 such statements that need to be executed in a transaction, I encounter significant performance issues. I'm using the Neo4j Java driver, constructing a single large query and executing it in an async session.

When spreading these statements out over the course of a minute, the operations take about 20-30 seconds to complete. However, attempting to perform all insertions at once at the end of the minute results in the operation not completing at all.

I'm looking for any suggestions on how to optimize this process. Is there a more efficient way to structure my queries or manage transactions to handle this volume of data more effectively? I'm starting to feel a bit desperate for a solution.

Thank you in advance for any advice or insights you can provide!

Let’s see if I understand.

Are you getting the stream and adding one Item per stream element?  
Are you trying to batch stream elements and adding multiple Items in one operation? 
In your example, the array elements represent path elements, where each is one hop?  Are you saying you have Items with 20,000 path elements?

You may want to consider writing a custom procedure to process one item. It will probably be faster than a sequence of MATCH statements to traverse your graph and update it, because you can work at the entity level with the Java API. You need access to the Neo4j server(s) in order to install your custom plugin, so it is not an option if you are using Aura.

Well, I explained it a bit off. It's not a stream but an Array.

I'm consuming a stream, aggregating items into an array, and then once a minute I try to upload it all at once to Neo4j.

So I have an array of, let's say, 20,000 items that need to be inserted into the DB every minute.

Are you getting the stream and adding one Item per stream element?
Yes

Are you trying to batch stream elements and adding multiple Items in one operation?
I haven't tried batching of elements yet.

In your example, the array elements represent path elements, where each is one hop? Are you saying you have Items with 20,000 path elements?
Yes, I have 20,000 paths, where each path represents the path from root to leaf and is around 10 hops.

Can you show me an example of such procedure?

One thing that could be improved: after the first MERGE that doesn't match anything, I can safely call CREATE for the rest of the path, because it will be a totally independent branch in the tree and therefore doesn't need matching.

Example:
path1: "a/b/1/..."
path2: "a/b/2/..."

MERGE (a1:D0 {l: "a"})-[:N]->(a2:D1 {l: "b"}) - this finds an already existing relationship
MERGE (a2)-[:N]->(a3:D2 {l: "2"}) - this doesn't match anything, so every MERGE statement after this is a waste of resources and could be rewritten as CREATE (a3)-[:N]->(...)-[:N]->(...)

Can you give me a concrete example of an array you want to process?

In your example paths, does the path describe the entire desired path for one item?

In some cases, the tail of the path may not already exist, and needs to be created?

When the segment at the start of the path exists, are you updating any of it, or just navigating to the part where the rest of the path does not exist and needs to be created?

Are these hierarchical trees originating from a root node?

I can give you an example custom procedure once I understand the input data and desired outcome.

Can you give me a concrete example of an array you want to process?

  • What do you mean by concrete example of an array?
    In my code it's defined Vector<(Vector<String>, Instant)>

In your example paths, does the path describe the entire desired path for one item?

  • Yes

In some cases, the tail of the path may not already exist, and needs to be created?

  • Yes, exactly. Even the root node may not exist, but it probably already does from previous insertions.

When the segment at the start of the path exists, are you updating any of it, or just navigating to the part where the rest of the path does not exist and needs to be created?

  • Just navigating; essentially I want to find the first node that doesn't yet exist and create the rest of the path.

Are these hierarchical trees originating from a root node?

  • Yes.

I'll try to explain what the program does:

Let's say this is how the tree structure looks in the DB so far:

    A
   / \
  1   2
  |   |
  C   C

We receive these items to insert.

item1: Vector("A", "3", "D"), 1970-01-01T00:00:00
item2: Vector("A", "1", "C"), 1970-01-01T10:00:00
item3: Vector("B", "E", "C", "10"), 1970-01-01T12:00:00
item4: Vector("B", "E", "C", "20"), 1970-01-01T14:00:00

the structure after update:

      A                B
   /  |  \             |
  1   2  3             E
  |   |  |             |
  C   C  D             C
                      /  \  
                     10  20

Does this help clear things up?

Yes, that clarifies it a lot.

Statements to confirm:

  1. The input would be a list of items, where each item is a list of strings?
  2. The strings represent the value of each node's 'l' property?
  3. The data represents each node's 'v' property?
  4. Is the size of the list of items the one that could be 20,000?

If my understanding is correct, I would consider creating a custom procedure that takes the list of lists as an input and creates the paths. The item lists have a lot of duplicate segments, so I would consider processing the lists of lists into a map representing a parent and its children. I would then recursively navigate the data to create the trees more efficiently.

I can give you a sample procedure once I have the new answers.

1. The input would be a list of items, where each item is a list of strings?

  • Yes.

2. The strings represent the value of each node's 'l' property?

  • Yes

3. The data represents each node's 'v' property?

  • Only the leaf node has the 'v' property.

4. Is the size of the list of items the one that could be 20,000?

  • Yes

Also I would like to add that I try to assign each string a node whose label captures its depth (:D<depth>) from the root.

Meaning if I have an item that is represented by Vector("A", "1", "B") and timestamp "1970-01-01T10:00:00"

I want to create these nodes (if they don’t exist):

:D0 with property l: "A"
:D1 with property l: "1"
:D2 with property l: "B" and v: ["1970-01-01T10:00:00"]

If I'm updating these (the path already exists in the DB), I would just append the timestamp to the leaf node.
Vector("A", "1", "B") and timestamp "1970-01-01T11:00:00"

:D0 with property l: "A"
:D1 with property l: "1"
:D2 with property l: "B" and v: ["1970-01-01T10:00:00", "1970-01-01T11:00:00"]

You really want to encode the depth with a particular label? I think a depth property would be better.

Isn't there an inherent problem with tracking the depth statically? What happens when an update inserts a new node in the chain, or a root node becomes the child of another node? Is it possible that a node's depth changes, making static depth values an issue to update? Typically the depth would be derived when you query for a child node.

Sure, I don't see a big difference whether it's a node label or a property. Feel free to encode it as a property.

My thought process was that if I encode the root as :D0, it's easier for Neo4j to find, because it's looking at a relatively small subset of nodes instead of searching for the node by its property value.

isn't there an inherent problem with tracking the depth statically, what happens when a update inserts a new node in the chain or a root node becomes the child of a root node?

  • Nope, the depth of a node NEVER changes. The only thing that changes is the array on the leaf node (it gets updated).

What do I do when I have multiple timestamps, as you have in this example?

:D0 with property l: "A"
:D1 with property l: "1"
:D2 with property l: "B" and v: ["1970-01-01T10:00:00", "1970-01-01T11:00:00"]

Ok, here is something quick and dirty I put together to give you an example of the most important concepts: finding a node, creating a node, and creating a relationship between two nodes. I returned the created nodes so you can see how to return data, and so it would display the results if you executed it in Neo4j Desktop.

I set the 'v' property of the last node in the list to the passed timestamp.

This is not optimized, nor probably how I would do it in production. I would probably look at a recursive algorithm to compact the code.

package customProcedures;

import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.logging.Log;
import org.neo4j.procedure.*;

import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

public class MergeListOfNodesProcedure {

    @Context
    public Log log;

    @Context
    public Transaction tx;

    @Procedure(name = "custom.mergeList", mode = Mode.WRITE)
    @Description("Merge and link a list of nodes specified by their parameter 'l'")
    public Stream<MergeResult> mergeList(@Name("nodeLabel") String labelString, @Name("list") List<String> nodeIds, @Name("timestamp") String timestamp) {

        Objects.requireNonNull(nodeIds);
        Objects.requireNonNull(timestamp);
        Objects.requireNonNull(labelString);

        if (!nodeIds.isEmpty()) {
            Label label = Label.label(labelString);
            List<Node> results = processList(label, nodeIds, timestamp);
            return results.stream().map(MergeResult::of);
        } else {
            return Stream.empty();
        }
    }

    private List<Node> processList(Label label, List<String> nodeIds, String timestamp) {
        String rootNodeId = nodeIds.get(0);
        Node rootNode = mergeRootNode(label, rootNodeId);
        List<String> otherNodeIds = nodeIds.stream().skip(1).toList();
        if (!otherNodeIds.isEmpty()) {
            List<Node> nodes = new ArrayList<>();
            nodes.add(rootNode);
            Node parentNode = rootNode;
            for (String childNodeId : otherNodeIds) {
                Node childNode = mergeChildNode(label, parentNode, childNodeId);
                nodes.add(childNode);
                parentNode = childNode;
            }
            parentNode.setProperty("v", ZonedDateTime.parse(timestamp));
            return nodes;
        } else {
            return Collections.singletonList(rootNode);
        }
    }

    private Node mergeRootNode(Label label, String nodeId) {
        Node rootNode = tx.findNode(label, "l", nodeId);
        if (rootNode == null) {
            rootNode = tx.createNode(label);
            rootNode.setProperty("l", nodeId);
        }
        return rootNode;
    }

    private Node mergeChildNode(Label label, Node parentNode, String childNodeId) {
        Node childNode = tx.findNode(label, "l", childNodeId);
        if (childNode == null) {
            childNode = tx.createNode(label);
            childNode.setProperty("l", childNodeId);
        }
        parentNode.createRelationshipTo(childNode, RelationshipType.withName("N"));
        return childNode;
    }

    public static class MergeResult {
        public Node node;

        private MergeResult(Node nodes) {
            this.node = nodes;
        }

        public static MergeResult of(Node nodes) {
            return new MergeResult(nodes);
        }
    }
}

The custom procedure processes one list of nodes. The way to process your list of lists is to unwind the list of lists into rows of a single list and call the custom procedure for each list. Here is an example.

with [
    ["a1", "b1", "c1", "2020-01-01T10:00:00-05:00"],
    ["a2", "b2", "c2", "2022-01-01T10:00:00-05:00"],
    ["a3", "b3", "c3", "2023-01-01T10:00:00-05:00"]
] as lists
unwind lists as list
with list[..3] as ids, list[3] as timestamp
call custom.mergeList("D", ids, timestamp) yield node
return node

You can modify the behavior of the custom procedure to take the entire list of lists and process it at once. It is up to you. I do like the approach I took though, as it is more flexible.

I created two unit tests to verify that it works, as well as to give you an example of how to set up the test harness.

package customProcedures;

import org.junit.jupiter.api.*;
import org.neo4j.driver.*;
import org.neo4j.driver.types.Node;
import org.neo4j.harness.Neo4j;
import org.neo4j.harness.Neo4jBuilders;

import java.time.ZonedDateTime;
import java.util.List;
import java.util.stream.Collectors;

import static org.junit.jupiter.api.Assertions.*;

class MergeListOfNodesProcedureTest {
    static Driver driver;
    static Neo4j neo4j;


    @BeforeAll
    static void setup_db() {
        neo4j = Neo4jBuilders.newInProcessBuilder()
                .withProcedure(MergeListOfNodesProcedure.class)
                .build();

        driver = GraphDatabase.driver(neo4j.boltURI(), Config.builder()
                .withoutEncryption()
                .build());
    }

    @AfterAll
    static void tear_down() {
        driver.close();
        neo4j.close();
    }

    @BeforeEach
    void delete_data() {
        try (Session session = driver.session()) {
            session.run("match(n) detach delete n");
        }
    }

    @Test
    void test_list_of_three_non_existent_nodes() {
        String cypher = """
                call custom.mergeList("D", ["a", "b", "c"], "1970-01-01T10:00:00-05:00") yield node
                return node""";

        List<Node> nodes = getCypherResults(cypher);
        assertEquals(3, nodes.size());
        assertTrue(nodes.stream().allMatch(x->x.hasLabel("D")));
        assertIterableEquals(List.of("a", "b", "c"), nodes.stream().map(v -> v.get("l").asString()).sorted().collect(Collectors.toList()));
        assertTrue(nodes.stream().filter(x->"c".equals(x.get("l").asString())).allMatch(y->ZonedDateTime.parse("1970-01-01T10:00:00-05:00").equals(y.get("v").asZonedDateTime())));
    }

    @Test
    void test_list_of_three_nodes_with_first_two_existing() {
        String cypher = """
                create(n1:D{l:"x"})
                create(n2:D{l:"y"})
                create(n1)-[:N]->(n2)
                with n1, n2
                call custom.mergeList("D", ["x", "y", "z"], "2020-11-01T10:24:15-05:00") yield node
                return node""";

        List<Node> nodes = getCypherResults(cypher);
        assertEquals(3, nodes.size());
        assertTrue(nodes.stream().allMatch(x->x.hasLabel("D")));
        assertIterableEquals(List.of("x", "y", "z"), nodes.stream().map(v -> v.get("l").asString()).sorted().collect(Collectors.toList()));
        assertTrue(nodes.stream().filter(x->"z".equals(x.get("l").asString())).allMatch(y->ZonedDateTime.parse("2020-11-01T10:24:15-05:00").equals(y.get("v").asZonedDateTime())));
    }

    private List<Node> getCypherResults(String cypher) {
        try (Session session = driver.session()) {
            Result result = session.run(cypher);
            return result.list(x -> x.get("node").asNode());
        }
    }
}

I can send you the IntelliJ project via email if you want it.

Property v is of type List<Long>. I want to create this property if it doesn't exist; otherwise I want to append the timestamp to the list.

So you will always receive one, but the property "v" is a list and the list is updated with new values. Got it.
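For reference, appending to such a list property boils down to growing a `long[]`, since that is how the embedded API represents a list of Longs. A minimal sketch (the helper class and method names are just illustrative):

```java
import java.util.Arrays;

// Sketch: append one value to an array-valued property, creating the
// array on first write. This mirrors the ON CREATE / ON MATCH pair
// from the original Cypher statement.
public class ValueAppender {

    public static long[] append(long[] existing, long value) {
        if (existing == null) {
            return new long[]{value};          // ON CREATE SET v = [value]
        }
        long[] grown = Arrays.copyOf(existing, existing.length + 1);
        grown[existing.length] = value;        // ON MATCH SET v = v + value
        return grown;
    }
}
```

In the procedure you would read the property with `node.getProperty("v", null)`, run it through something like this, and write it back with `setProperty`.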

You inspired me tbh, I didn't know about the custom java-procedures that could be created.

I've come up with a possible solution; the only thing I don't like is the iteration, because there could possibly be tens of thousands of relationships later on.

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.procedure.*;

import java.util.Arrays;
import java.util.List;

public class UpdatePathIter {

    private enum RelTypes implements RelationshipType {
        NEXT
    }

    @Context
    public Transaction tx;

    @Procedure(name = "updatePathIter", mode = Mode.WRITE)
    public void upsertPath(@Name("path") List<String> path, @Name("value") Long value) {
        Node previousNode = getOrCreateNode(path.get(0), "D0");

        for (int i = 1; i < path.size(); i++) {
            String propertyValue = path.get(i);
            // scan the outgoing NEXT relationships for a child with the matching 'l'
            Node nextNode = previousNode.getRelationships(Direction.OUTGOING, RelTypes.NEXT).stream()
                    .map(Relationship::getEndNode)
                    .filter(endNode -> propertyValue.equals(endNode.getProperty("l")))
                    .findFirst()
                    .orElse(null);
            if (nextNode == null) {
                nextNode = tx.createNode(Label.label("D" + i));
                nextNode.setProperty("l", propertyValue);
                previousNode.createRelationshipTo(nextNode, RelTypes.NEXT);
            }
            previousNode = nextNode;
        }

        // append 'value' to the leaf node's 'v' list, creating it on first write
        long[] values = (long[]) previousNode.getProperty("v", null);
        if (values == null) {
            previousNode.setProperty("v", new long[]{value});
        } else {
            long[] newValues = Arrays.copyOf(values, values.length + 1);
            newValues[values.length] = value;
            previousNode.setProperty("v", newValues);
        }
    }

    private Node getOrCreateNode(String propertyValue, String label) {
        Node node = tx.findNode(Label.label(label), "l", propertyValue);
        if (node == null) {
            node = tx.createNode(Label.label(label));
            node.setProperty("l", propertyValue);
        }
        return node;
    }
}

Nice. I did it quickly and forgot to search for the children among the related nodes; I just searched for the node directly. You got it right.

Anyway, iterating over the relationships is required to traverse the graph. You are doing everything manually at this level.

Nodes with that many relationships will impact performance when searching for the next node.

My POC was to demonstrate what is possible; you need to optimize execution. The example graphs you showed had lists that shared the first N nodes in common, where only the leaves were updated. It would be wasteful to traverse these same paths multiple times just to update the individual leaves. Your input data, represented as lists of paths, does not facilitate optimal execution, as each path is processed independently. You could consider preprocessing the list of lists into new data structures optimized for traversing the graph, so you visit each node just once.

Here is one suggestion. You iterate through your lists and maintain one set of the root nodes (the first node in each list) and a map of nodes to a set of their child nodes. In this context I am referring to a node as the value of the property that identifies the node, as we don't have the actual nodes at this point. The first structure would be a Set<String> and the map would be a Map<String, Set<String>>. You would create a recursive algorithm that starts with a root node and traverses the graph by doing the following at each step:

  1. Find or create the current node.
  2. Retrieve the set of child nodes from the map for the current node.
  3. Create/find/link each child node.
  4. Call your recursive method once for each child.
  5. Return from the method when the child set is empty (this terminates the recursion).

Perform this for each root node in the set of root nodes.
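The preprocessing and recursion described above could be sketched like this in plain Java (no Neo4j API; the class and field names are just illustrative, and the comments mark where the find/create/link calls would go):

```java
import java.util.*;

// Sketch of the suggested preprocessing: collapse a list of paths into
// a set of roots plus a parent -> children map, then walk the map
// recursively so each node value is visited only once.
public class PathPreprocessor {

    public static Set<String> roots = new LinkedHashSet<>();
    public static Map<String, Set<String>> children = new HashMap<>();
    public static List<String> visitOrder = new ArrayList<>();

    public static void preprocess(List<List<String>> paths) {
        roots.clear();
        children.clear();
        for (List<String> path : paths) {
            roots.add(path.get(0));
            for (int i = 0; i < path.size() - 1; i++) {
                children.computeIfAbsent(path.get(i), k -> new LinkedHashSet<>())
                        .add(path.get(i + 1));
            }
        }
    }

    public static void walk(String node) {
        visitOrder.add(node); // find-or-create 'node' here (tx.findNode / tx.createNode)
        for (String child : children.getOrDefault(node, Set.of())) {
            walk(child);      // find/create 'child', link parent -> child, then recurse
        }
        // recursion terminates when a node has no children
    }

    public static void walkAll() {
        visitOrder.clear();
        for (String root : roots) {
            walk(root);
        }
    }
}
```

Note that a shared path prefix such as "A"->"1" is now stored once in the map, so the corresponding nodes are only looked up once, no matter how many input paths contain it.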

Also, when you are searching for the child nodes by iterating through the relationships, don't look for them one at a time, as you will iterate through the same list several times. Instead, look for any of the needed children in a single pass over a parent node's relationships, and keep track of your progress so you can break out once all are found.
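That single-pass search could be sketched as follows (plain Java, with a list of labels standing in for the relationship iterator; the names are illustrative):

```java
import java.util.*;

// Sketch: given one pass over a parent's outgoing relationships
// (modeled here as a list of child 'l' values), collect every child we
// still need and stop as soon as all of them have been found.
public class ChildScanner {

    public static Map<String, Integer> findChildren(List<String> related, Set<String> wanted) {
        Map<String, Integer> found = new HashMap<>(); // label -> position found at
        Set<String> remaining = new HashSet<>(wanted);
        for (int i = 0; i < related.size() && !remaining.isEmpty(); i++) {
            String label = related.get(i);
            if (remaining.remove(label)) {
                found.put(label, i); // in the real code, keep the Node itself
            }
        }
        return found; // any wanted label absent from the map must be created
    }
}
```

The loop condition is the important part: it terminates as soon as `remaining` is empty, rather than always walking the full relationship list once per child.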

Just some thoughts.

I created a procedure that utilizes two maps:

  • nodes is a Map<Node, Map<String, Node>> with a Node key and a Map<String, Node> value, where each node name maps to its reference.
  • rootNodes is a simple Map<String, Node> used to quickly find the root node and start the iteration.

The first thing in each iteration is finding the map of the previous node; this may have been filled previously and already contain the node I'm looking for. If this map doesn't exist, I fill it up with every neighbor.

I thought this would be sufficient, but it's not. Do you see room for improvement here?

Also, I wanted to ask a couple of questions about how this works:

  • When this procedure finishes, does it commit the results or not?
  • Does this get committed once when it finishes, or are there 4 individual commits?
  • Also, when I call it for the second time I'm getting this error:
    • Failed to invoke procedure updatePathIter_V2: Caused by: org.neo4j.graphdb.NotInTransactionException: The transaction has been closed.
UNWIND [[['a1', 'b1', 'c1'], 10],
        [['a2', 'b2', 'c2'], 20],
        [['a3', 'b3', 'c3'], 30],
        [['a1', 'b1', 'c1'], 40]] AS list
WITH list[0] AS path, list[1] AS value
CALL updatePathIter_V2(path, value)

I'm looking for a way to improve it, and I was wondering whether it gets committed at all.
I'm afraid that it's not getting committed and that this is hindering my performance.

Current implementation

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.procedure.*;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UpdatePathIter_V2 {

    private enum RelTypes implements RelationshipType {
        NEXT
    }
    @Context
    public Transaction tx;

    private static Map<Node, Map<String, Node>> nodes = new HashMap<>();
    private static Map<String, Node> rootNodes = new HashMap<>();

    @Procedure(name = "updatePathIter_V2", mode = Mode.WRITE)
    public void upsertPath(@Name("path") List<String> path, @Name("value") Long value) {
        Node previousNode = getOrCreateNode(path.get(0), "D0");

        for (int i = 1; i < path.size(); i++) {
            String propertyValue = path.get(i);
            Map<String, Node> nodeMap = nodes.computeIfAbsent(previousNode, k -> new HashMap<>());
            Node nextNode = null;

            if (!nodeMap.isEmpty()){
                nextNode = nodeMap.get(propertyValue);
            } else {
                List<Node> neighbors = previousNode.getRelationships(Direction.OUTGOING, RelTypes.NEXT).stream().map(Relationship::getEndNode).toList();
                for (Node n : neighbors) {
                    String l = (String) n.getProperty("l");
                    nodeMap.put(l, n);
                    if (l.equals(propertyValue)) {
                        nextNode = n;
                    }
                }
            }

            if (nextNode == null) {
                nextNode = tx.createNode(Label.label("D" + i));
                nextNode.setProperty("l", propertyValue);
                previousNode.createRelationshipTo(nextNode, RelTypes.NEXT);
                nodeMap.put(propertyValue, nextNode);
            }
            previousNode = nextNode;
        }

        insertValue(previousNode, value);
    }
    private void insertValue(Node node, long value) {
        long[] values = (long[]) node.getProperty("v", null);
        if (values == null) {
            node.setProperty("v", new long[]{value});
        } else {
            long[] newValues = Arrays.copyOf(values, values.length + 1);
            newValues[values.length] = value;
            node.setProperty("v", newValues);
        }
    }

    private Node getOrCreateNode(String propertyValue, String label) {
        Node node = rootNodes.get(propertyValue);
        if (node == null) {
            node = tx.findNode(Label.label(label), "l", propertyValue);
            if (node == null) {
                node = tx.createNode(Label.label(label));
                node.setProperty("l", propertyValue);
                rootNodes.put(propertyValue, node);
            }
        }
        return node;
    }
}

Answers to questions:

  • When this procedure finishes, does it commit the results or not?

The procedure runs within the transaction of the Cypher query; that is what the custom procedure is being injected with (@Context Transaction tx). If you are running this within Neo4j Browser, the transaction will be auto-committed when the query ends (unless there was an error). If you are using a driver to execute the query, it depends on how you are executing it. If you are using session.run() (which you should not do in production), the transaction is auto-committed when the query ends. If you are using a transaction function, it commits when you return from the function, unless an exception was thrown or you roll back the transaction.

  • Does this get committed once when it finishes, or are there 4 individual commits?

The example you show with the 'unwind' is one cypher query, which calls the procedure four times. It is one transaction, so one commit.

  • Also when I call it for the second time I'm getting this error

I suspect your error is due to the static maps you defined with Nodes in them. The Node values you find/create within a custom procedure are bound to the transaction. Once the transaction closes, those nodes are no longer accessible; you would have to search for them again within the new transaction. Notice how the find and create methods are called on the transaction tx. What I believe is happening is that your first call finds nodes and adds them to the maps. The maps are static, so the same map with the nodes is used the next time you call this method in a new Cypher query. The nodes are no longer available, since it is a new transaction, which would explain the error. You cannot use static data structures like this in a custom procedure. You could cache things not related to a transaction, but this is still a poor design: how are you going to expire items from these hash maps over time? They will only ever grow in size. You would need a real cache with expiration/eviction policies, such as Caffeine, Google's Guava, or Ehcache, to name a few.
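Just to illustrate what an eviction policy buys you, even the JDK alone can approximate an LRU cache via LinkedHashMap's removeEldestEntry hook (the libraries named above add time-based expiry, weighing, and thread safety; and as noted, none of this helps with transaction-bound Node objects):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a bounded LRU map. Unlike the unbounded static maps in the
// procedure, the least-recently-used entry is dropped once the capacity
// is exceeded. (It still must not hold transaction-bound Node objects.)
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public BoundedCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true makes iteration LRU-ordered
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least-recently-used entry
    }
}
```

For example, with a capacity of 2, touching an entry protects it, and the untouched entry is the one evicted on the next insert.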

Review of code:
Review of code:
I do not recommend you try to save state and share it across calls to your procedure. In my suggestion, I was referring to sending the "list of lists" to the procedure and having it process all the paths in one call. You can then share the data between paths to make it more efficient. You could refactor your existing code to move the two static maps into the procedure method itself, so they are method variables. You can then pass the list of lists in one call, instead of the example where you call the procedure multiple times, once per list. You can then refactor your code to do what you have for each path, but with hash maps that remain accessible while processing each path. This will eliminate the exception.

Your approach is a little different than I had suggested, but that is ok. I was suggesting you process the paths as lists of lists of strings within the procedure and build data structures that organize the data into a hierarchical tree structure you could then recursively navigate to find the nodes and update the graph. Your approach differs in that you start navigating the paths immediately and cache the nodes so you don't have to find them again. That is ok too.

In your approach, you are proactively getting all nodes related to the current "previous" node and caching them. In some situations this could be inefficient. Let's say your collection of paths only extends one of the related nodes, and this node is returned at the beginning of the list of relationships. The one you need is found early, but you keep iterating over all the remaining related nodes and adding them to your cache. This is not so bad if the number of related nodes is small, but you mentioned there could be thousands.

The main difference with my suggested approach is that it navigates all the paths as strings and builds a data structure that holds all the parent/child relationships. As such, when you navigate it to update your graph, you know exactly which nodes you need from a parent node, so you can search for them and break out once you find the small set you need.

Anyways, there are always multiple ways of approaching a problem. Choose the path that works for you.

  • Currently I had this implementation:
    driver.session(classOf[AsyncSession]).runAsync(queries)
  • But from what you are saying, the optimal approach would look like this?
    driver.session(classOf[AsyncSession]).executeWriteAsync(tx => tx.runAsync(query))

Aren't those equivalent? What is the difference?

The query looks like this, where pathList represents the paths and valueList the values.

val query = s"CALL updatePathIter_V3([$pathList],[$valueList])"
  • Well, that's one problem this has, but the node may also not be found until the very end, so these should balance out.
  • I was thinking of storing just the nodes I've needed to iterate over, but that poses a problem: I could have stored a node with only a few of its neighbors, and when I go through it again it won't have the neighbor I'm looking for, so I would need to iterate over the array again, which would be stupid.
  • I acknowledge that the cache is poorly designed; thanks for the tips. I'll try to use a real cache, because creating this type of cache for just one transaction is a waste, and its true potential lies in being reused.

Your approach:

I would really like to test your approach but I'm yet to understand it.

The 3rd point says that I need to find the child node if it exists, and if not I should create it and link it.

What if I get these lists to update:

[["A", "B", "1"], 10],
[["C", "B", "2"], 10]

Let's assume I'm iterating over the second list now.
Imho it would break on the 3rd point of the algorithm, because I would try to find the node based on:

  • the 'l' property + the D<number> label.

It would find node B, which is wrong because it's part of a different tree.

Findings:

  • I was wondering what was taking my implementation so long to process quite a few records, and it seems that it takes ages until my data are processed on the Neo4j server.

I added logs to my custom procedure, and it's quite fast, but the communication takes way too long.

This is how long it takes Neo4j to process the lists.

This is how long it takes in my app

The session.run() is an auto-commit, or unmanaged, transaction. The two differences I am aware of are: 1) it can only execute one Cypher query, and 2) it does not have retry logic. The executeWrite transaction function can run multiple Cypher queries and has retry logic.

You will not be able to cache the nodes even if you use a "real" cache, as the nodes are not accessible once the transaction is closed. As such, you will not be able to utilize the cache on subsequent calls.

I will respond to the rest of the items tomorrow.

Regarding your response times: are you returning data in your request? If so, that may be causing the large discrepancy between the execution time of the custom procedure and the execution time seen by your application.