Batch Update Getting Slower

yunzhoun
Node Link

Hi guys,

I'm following the popular article https://dzone.com/articles/tips-for-fast-batch-updates-of-graph-structures-wi for my batch update. I'm using the Java API, and my query looks like:

UNWIND $props as row MERGE (n:Entity{eid:row.eid}) ON MATCH set n += row.properties

I have a data set of 1 million records and I'm using a batch size of 10K: each time, I send 10K records as a list of maps in my parameters. This approach worked fine at the beginning, but it got much slower, or even got stuck, after 2 or 3 batches.
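
A minimal sketch of the batching described above, for readers who want to reproduce the setup. It assumes the records are already in memory as a list of maps; the class name `BatchSplitter` and the method `split` are hypothetical and not part of the Neo4j driver, and the rows built in `main` are placeholder data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Neo4j driver.
final class BatchSplitter {

    // Splits rows into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> split(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Placeholder rows shaped like the query expects: row.eid and row.properties.
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("eid", String.valueOf(i));
            row.put("properties", new HashMap<String, Object>());
            rows.add(row);
        }
        for (List<Map<String, Object>> batch : split(rows, 10_000)) {
            // In the real program each batch would be sent as the $props parameter.
            System.out.println("batch of " + batch.size());
        }
    }
}
```

Each sublist would then be sent as the `$props` parameter of one transaction, so a batch size of 10K over 1 million records gives 100 transactions.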

I have an index on label Entity and property eid. I used the browser to test my query, and it looks like:

For the newly added nodes, when I tried to match them by label Entity and property eid, the index was also used. So I believe my problem is not due to the index.

I also tried a smaller batch size of 1K, but the same problem occurred: after around 50 batches, the update got really slow.

What can I do to solve this problem? Any ideas would be much appreciated.

2 REPLIES

Joel
Ninja

It looks like you have a Unique index on :Entity(eid), but it might be good to share the cypher for the index, to have a second set of eyes on it, just in case.

Beyond that, we may need to understand more about your data and graph model (and maybe your Neo4j server) in order to provide suggestions; the challenge may well be something specific to your use case...

Hi Joel,

Thanks for replying! I'm a bit confused by your wording "share the cypher for the index". Could you please explain more?

Also, the following is an example of the data I'm using:
535317769 泛娱乐;音乐;专辑 [{"ATTRIBUTE":"ID","VALUE":"12978970","VALUE_TYPE":"0","PROPERTY_ID":"1"},{"ATTRIBUTE":"name","VALUE":"为爱而爱","VALUE_TYPE":"0","PROPERTY_ID":"2"},{"ATTRIBUTE":"out_degree","VALUE":"1","VALUE_TYPE":"1","PROPERTY_ID":"3335"},{"ATTRIBUTE":"MID","VALUE":"003O6Mow2G8KMz","VALUE_TYPE":"0","PROPERTY_ID":"1094"},{"ATTRIBUTE":"rank","VALUE":"0.4701","VALUE_TYPE":"2","PROPERTY_ID":"3334"},{"ATTRIBUTE":"in_degree","VALUE":"1","VALUE_TYPE":"1","PROPERTY_ID":"3336"},{"ATTRIBUTE":"type_ids","VALUE":"4_33_103","VALUE_TYPE":"0","PROPERTY_ID":"7483"},{"ATTRIBUTE":"icon","VALUE":"http://y.gtimg.cn/music/photo_new/T002R300x300M000003O6Mow2G8KMz.jpg","VALUE_TYPE":"0","PROPERTY_ID"..."}]

The first column is the eid, and the last column holds the properties I want to write to my server. I'm using Java to process the data into lists of maps (one list per batch) and pass them as parameters. Here is my code:

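    // Sends one batch of rows to the server in a single write transaction.
    // Each map is expected to hold the keys "eid" and "properties" used by the query.
    // Note: StatementResult and TransactionWork suggest the 1.x Java driver
    // (package org.neo4j.driver.v1); that version is an assumption.
    public void batchUpdate( final List<Map<String,Object>> maps )
    {
        try ( Session session = driver.session() )
        {
            String signal = session.writeTransaction( new TransactionWork<String>()
            {
                @Override
                public String execute( Transaction tx )
                {
                    // The whole batch is passed as the single parameter $props.
                    Map<String,Object> params = new HashMap<>();
                    params.put( "props", maps );

                    StatementResult result = tx.run("UNWIND $props as row MERGE (n:Entity{eid:row.eid}) ON MATCH set n += row.properties", params);
                    // Consume the result so the statement has fully run before returning.
                    result.consume();
                    return "done";
                }
            } );
            System.out.println( "one transaction " + signal );
        }
    }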
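
The step that turns one record into a map with the `eid` and `properties` keys is not shown above. The following is only a rough sketch of what it might look like, assuming the JSON column has already been parsed into a list of string maps. The class `RowBuilder` and its methods are hypothetical. The mapping of `VALUE_TYPE` ("1" as integer, "2" as decimal, anything else as string) is a guess from the sample record, and keeping `eid` as a string is also an assumption; whatever type is used must match the type already stored on the nodes, otherwise MERGE will not match them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; class and method names are illustrative only.
final class RowBuilder {

    // Converts one textual VALUE according to its VALUE_TYPE.
    // ASSUMPTION inferred from the sample record: "1" = integer, "2" = decimal,
    // anything else (including "0") = string.
    static Object convert(String value, String valueType) {
        switch (valueType) {
            case "1":
                return Long.parseLong(value);
            case "2":
                return Double.parseDouble(value);
            default:
                return value;
        }
    }

    // Builds the map read by the query as row.eid and row.properties.
    // ASSUMPTION: eid is kept as a string.
    static Map<String, Object> toRow(String eid, List<Map<String, String>> attributes) {
        Map<String, Object> properties = new HashMap<>();
        for (Map<String, String> attr : attributes) {
            properties.put(attr.get("ATTRIBUTE"),
                    convert(attr.get("VALUE"), attr.get("VALUE_TYPE")));
        }
        Map<String, Object> row = new HashMap<>();
        row.put("eid", eid);
        row.put("properties", properties);
        return row;
    }

    // Builds one already-parsed attribute entry, shaped like the sample data.
    static Map<String, String> attribute(String name, String value, String valueType) {
        Map<String, String> attr = new HashMap<>();
        attr.put("ATTRIBUTE", name);
        attr.put("VALUE", value);
        attr.put("VALUE_TYPE", valueType);
        return attr;
    }

    public static void main(String[] args) {
        // Three entries taken from the sample record.
        List<Map<String, String>> attrs = new ArrayList<>();
        attrs.add(attribute("name", "为爱而爱", "0"));
        attrs.add(attribute("out_degree", "1", "1"));
        attrs.add(attribute("rank", "0.4701", "2"));
        System.out.println(toRow("535317769", attrs));
    }
}
```

A list of such rows is what `batchUpdate` receives as `maps`.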