Batch Update Getting Slower

Hi guys,

I'm doing a batch update through the Java API, following the popular article https://dzone.com/articles/tips-for-fast-batch-updates-of-graph-structures-wi. My query looks like this:

UNWIND $props as row MERGE (n:Entity{eid:row.eid}) ON MATCH set n += row.properties

I have a data set of 1 million records and I'm using a batch size of 10K: each time, I send 10K records as a list of maps in my parameters. This approach worked fine at the beginning, but it became much slower, or even got stuck, after 2 or 3 batches.
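To make the batching step concrete, here is a minimal sketch of splitting the full record list into fixed-size batches, each of which would then be sent as the `$props` parameter in its own transaction. The class and method names (`BatchPartitioner`, `partition`) are hypothetical and not taken from the post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list of rows into consecutive batches.
final class BatchPartitioner {
    private BatchPartitioner() {}

    // Returns batches of at most batchSize elements, in the original order.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }
}
```

With 1 million rows and a batch size of 10K this yields 100 batches. Timing each batch separately would show exactly where the slowdown starts.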

I have an index on Entity and eid. I tested my query in the browser, and the plan shows the index being used.

For newly added nodes, when I match them by the label Entity and the property eid, the index is also used. So I believe my problem is not caused by the index.

I tried a smaller batch size of 1K, but the same problem occurred: after around 50 batches, the update got really slow.

What can I do to solve this problem? Any ideas would be much appreciated.

It looks like you have a unique index on :Entity(eid), but it might be good to share the Cypher for the index, to have a second set of eyes on it, just in case.
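For illustration, "the Cypher for the index" would be a statement along these lines, shown here as Java string constants so they could be run through the driver. This is a sketch assuming the Neo4j 3.x syntax that matches the 1.x Java driver; newer Neo4j versions use a different syntax for the same operations. The class and constant names are hypothetical.

```java
// Hypothetical holder for the schema statements under discussion (Neo4j 3.x syntax).
final class EntitySchemaCypher {
    private EntitySchemaCypher() {}

    // Uniqueness constraint on eid; this also creates a backing index.
    static final String UNIQUE_CONSTRAINT =
            "CREATE CONSTRAINT ON (n:Entity) ASSERT n.eid IS UNIQUE";

    // Plain (non-unique) index on the same label and property.
    static final String PLAIN_INDEX = "CREATE INDEX ON :Entity(eid)";

    // Built-in procedures that list the existing indexes and constraints.
    static final String LIST_INDEXES = "CALL db.indexes()";
    static final String LIST_CONSTRAINTS = "CALL db.constraints()";
}
```

Only one of the two creation statements is needed. Sharing the output of the listing procedures would show which one is actually in place.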

Beyond that, we may need to understand more about your data and graph model (and maybe your Neo4j server) in order to provide suggestions; the challenge may well be something specific to your use case...

Hi Joel,

Thanks for replying! I'm a bit confused by your wording "share the cypher for the index", could you please explain more?

Also, the following is an example of the data I'm using:
535317769 泛娱乐;音乐;专辑 [{"ATTRIBUTE":"ID","VALUE":"12978970","VALUE_TYPE":"0","PROPERTY_ID":"1"},{"ATTRIBUTE":"name","VALUE":"为爱而爱","VALUE_TYPE":"0","PROPERTY_ID":"2"},{"ATTRIBUTE":"out_degree","VALUE":"1","VALUE_TYPE":"1","PROPERTY_ID":"3335"},{"ATTRIBUTE":"MID","VALUE":"003O6Mow2G8KMz","VALUE_TYPE":"0","PROPERTY_ID":"1094"},{"ATTRIBUTE":"rank","VALUE":"0.4701","VALUE_TYPE":"2","PROPERTY_ID":"3334"},{"ATTRIBUTE":"in_degree","VALUE":"1","VALUE_TYPE":"1","PROPERTY_ID":"3336"},{"ATTRIBUTE":"type_ids","VALUE":"4_33_103","VALUE_TYPE":"0","PROPERTY_ID":"7483"},{"ATTRIBUTE":"icon","VALUE":"http://y.gtimg.cn/music/photo_new/T002R300x300M000003O6Mow2G8KMz.jpg","VALUE_TYPE":"0","PROPERTY_ID":"1486"},{"ATTRIBUTE":"权威度","VALUE":"1.0","VALUE_TYPE":"0","PROPERTY_ID":"13"},{"ATTRIBUTE":"演唱者","VALUE":["王思颖"],"VALUE_TYPE":"0","PROPERTY_ID":"1115"},{"ATTRIBUTE":"资源ID","VALUE":["{"资源来源":"QQ音乐","ID":"003O6Mow2G8KMz","URL":"https://y.qq.com/n/yqq/album/003O6Mow2G8KMz.html"}"],"VALUE_TYPE":"0","PROPERTY_ID":"7585"},{"ATTRIBUTE":"categories","VALUE":"泛娱乐_音乐_专辑","VALUE_TYPE":"0","PROPERTY_ID":"-1"},{"ATTRIBUTE":"发行时间","VALUE":"2020年6月13日","VALUE_TYPE":"3","PROPERTY_ID":"1100"},{"ATTRIBUTE":"发行时间戳","VALUE":"1591977600","VALUE_TYPE":"1","PROPERTY_ID":"3439"},{"ATTRIBUTE":"名称","VALUE":"为爱而爱","VALUE_TYPE":"0","PROPERTY_ID":"2"},{"ATTRIBUTE":"重要度","VALUE":"0","VALUE_TYPE":"2","PROPERTY_ID":"15"}]

The first column is the eid, and the last column holds the properties I want to write to my server. I'm using Java to process this data into lists of maps (sized according to the batch size) and passing them as parameters. The following is my code:

public void batchUpdate( final List<Map<String,Object>> maps )
{
    try ( Session session = driver.session() )
    {
        String signal = session.writeTransaction( new TransactionWork<String>()
        {
            @Override
            public String execute( Transaction tx )
            {
                // Pass the whole batch as the single $props parameter.
                Map<String,Object> params = new HashMap<>();
                params.put( "props", maps );

                // One UNWIND statement per batch, run inside one write transaction.
                tx.run( "UNWIND $props as row MERGE (n:Entity{eid:row.eid}) ON MATCH set n += row.properties", params );
                return "done";
            }
        } );
        System.out.println( "one transaction " + signal );
    }
}
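The query reads `row.eid` and `row.properties`, so each map in the list passed to batchUpdate needs those two keys. Below is a minimal sketch of building one such row from already-parsed ATTRIBUTE/VALUE entries like those in the sample record. The JSON parsing step is not shown, and the names `RowBuilder` and `toRow` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: shapes one input record into the row map the query expects.
final class RowBuilder {
    private RowBuilder() {}

    // attributes: parsed entries, each holding at least "ATTRIBUTE" and "VALUE".
    static Map<String, Object> toRow(Object eid, List<Map<String, Object>> attributes) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Map<String, Object> attr : attributes) {
            Object name = attr.get("ATTRIBUTE");
            Object value = attr.get("VALUE");
            // Skip incomplete entries; a later duplicate name overwrites an earlier one.
            if (name != null && value != null) {
                properties.put(name.toString(), value);
            }
        }
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("eid", eid);               // read by the query as row.eid
        row.put("properties", properties); // read by the query as row.properties
        return row;
    }
}
```

One thing worth checking with this shape is the type of eid. Cypher treats the string "535317769" and the number 535317769 as different values, so if the stored eid and the parameter differ in type, the MERGE would not match the existing node and would create a new one instead.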