Data not inserting into neo4j


(Kunal Goyal) #1

Hello,

I am trying to import data into Neo4j.
I have about 1 million records, and I am inserting this data using a jar.
The jar calls a method that inserts the data into Neo4j, using a MERGE statement.
But after inserting about 50k records it gets stuck.
I am not getting any error, but the data is also not being inserted.
What could be wrong here?
Could you all please suggest something?


(Andrew Bowman) #2

You may want to look at ways to batch your transactions; 10k entries per batch tends to work well.

Are you using LOAD CSV (and if so, are you USING PERIODIC COMMIT)? Or are you using some other means? An example of your load code/query would help.
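The batching idea can be sketched in plain code as well. Below is a minimal, hypothetical Python sketch (not the poster's actual jar code; `run_batch` is a placeholder for whatever call actually submits one transaction): it splits the record list into 10k-entry chunks so each chunk can be committed in its own transaction instead of one query per record.

```python
def batches(records, size=10_000):
    """Yield successive fixed-size chunks of the record list."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

def run_batch(batch):
    # Placeholder: in real code this would submit one transaction,
    # e.g. passing the batch as a query parameter via a Neo4j driver.
    pass

records = [{"id": i} for i in range(1_000_000)]  # 1 million dummy records
chunks = list(batches(records))
for chunk in chunks:
    run_batch(chunk)

print(len(chunks), len(chunks[0]))  # 100 batches of 10000 records each
```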


(Kunal Goyal) #3

Hi Andrew,

Thank you for your reply.
Actually, I am not doing batched transactions, since there is a direct-connectivity issue between Impala and Neo4j.
Right now I am using a somewhat improvised approach: I fetch the full data from Impala, parse the results one by one, and insert each record into Neo4j.
When I started the process it was running fine, but after 200,000 records the insertion speed is now about 10 nodes per second.
Should I change some hardware configuration for this?
If yes, could you please let me know which configuration? I am using a separate cluster for Neo4j.


(Andrew Bowman) #4

The most common cause for that kind of symptom, slowing down as data is entered, is that you're missing an index on the label/property of the MERGEd nodes from your insertion queries.

It would help to supply your insertion query and an EXPLAIN of the query (with all elements expanded), which should show how your match/merges are operating and if indexes are being used for lookup. That would also help us see if there are any other things to fix.

As for the insertion, you say you're getting the full data first. If so, this is a good opportunity to do batch insertion. A separate query per insert isn't going to be performant if this is meant to be a data load. See this blog entry for optimal approaches to batching modifications/insertions into the graph.


(Kunal Goyal) #5

Yeah, thank you Andrew. I saw your blog, but the problem in my case is that I can't create a CSV file on the file system from this data; I can only hold it at runtime (e.g. in a variable).
I need one suggestion from you: if I store my data in the format below, can I then insert the data in batches?

Data =  [{name:"Alice",age:32},{name:"Bob",age:42}]

And one more query: I have indexes on node properties, but can we create an index on relationships? I cannot find any way to do this.


(Andrew Bowman) #6

Yes, providing a list of maps like this as a parameter is the right approach for batching, see the blog entry I linked to earlier.

Even if you have an index present, you should EXPLAIN your cypher query in the browser so you can make certain your query will use that index. If the query plan doesn't show an index being used you should provide the query and plan so we can help figure out why.

Our schema indexes are currently for label/property combinations only, and don't apply to relationships. However, with the recent Neo4j 3.5 release we introduced support for full-text indexing (automatically updated as your graph data changes), which can be used to index on relationship type + property.


(Kunal Goyal) #7

Okay :slight_smile:
Then if my table has 10 million records and I store them in a list of maps, will that work?


(Kunal Goyal) #8

@andrew.bowman Sorry, I can't post the data since I am working with client data, but I can give you an overview of the query and the index info.

I am trying to import some entity data and create relationships with properties between the entities.

My data contains columns like id, name, code, display_name, active, services, product.
I am creating an index on id here, and my import query looks like the one below:

Merge(n:Entity{id:'id'})
Set n.name= 'name',
n.code = 'code',
n.display_name = 'display_name',
n.active = 'active'

Could you please suggest what I am doing wrong with the data import?


(Andrew Bowman) #9

Batching 10k records at a time is usually our recommendation, so try to break up your batches accordingly.

As for your import query, make sure you have an index on :Entity(id) so your MERGEs are quick.

If you want to only set the properties when the MERGE results in a CREATE, then use ON CREATE SET instead of SET.


(Kunal Goyal) #10

If I use ON CREATE SET, then later when I want to update the properties, will it update them, or will it create another node or property?


(Andrew Bowman) #11

Remember that ON CREATE SET and ON MATCH SET are both clauses you can only use after a MERGE. MERGE guarantees the node will be there, so no matter if the node was created or matched to an existing node, it is now in the graph. You can use SET on it if you want. These two just allow you to do different things depending on if MERGE resulted in node creation or simply matched to an existing node.

Please review the official MERGE documentation, and then this knowledge base article on using MERGE, to get better clarity on what it's doing and how to work with it.
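To make the distinction concrete outside of Cypher: here is a small, hypothetical Python analogy (a dict standing in for the graph, `upsert` playing the role of MERGE). Only one of the two callbacks runs per call, mirroring ON CREATE SET versus ON MATCH SET, and the node exists afterwards either way.

```python
graph = {}  # node id -> properties, standing in for the graph

def upsert(node_id, on_create=None, on_match=None):
    """MERGE-like behavior: the node exists afterwards either way;
    exactly one of the two callbacks runs, depending on what happened."""
    if node_id not in graph:
        graph[node_id] = {"id": node_id}
        if on_create:
            on_create(graph[node_id])   # like ON CREATE SET
    else:
        if on_match:
            on_match(graph[node_id])    # like ON MATCH SET

# First call creates the node, so only the ON CREATE branch runs.
upsert(1, on_create=lambda n: n.update(name="Alice"),
          on_match=lambda n: n.update(updated=True))
# Second call matches the existing node, so only the ON MATCH branch runs;
# the ON CREATE value is NOT overwritten.
upsert(1, on_create=lambda n: n.update(name="Overwritten"),
          on_match=lambda n: n.update(updated=True))

print(graph[1])  # {'id': 1, 'name': 'Alice', 'updated': True}
```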


(Kunal Goyal) #12

Hi @andrew.bowman, thank you for all your help.

I need one piece of help with the batch import.
Using Java code, I now have my JSON data in a "MY_DATA" variable.
e.g.
MY_DATA =[{ tk_locon=0, repble=0, design=1, id=7196979,version=1, type_name=SALE, conc=0, name=SEQU,security=0, location=0},{ tk_locon=0, repble=0, design=1, id=1222,version=1, type_name=SALE, conc=0, name=qwe,security=0, location=1},--------100 Records]

Now what would be the query to import this data?


(Andrew Bowman) #13

You would pass the list as one of the parameters to the query, UNWIND the parameter list within the query and then start working with the properties of each record. Something like this, assuming the list is available under the parameter MY_DATA:

UNWIND $MY_DATA AS data
MERGE (n:Entity {id: data.id})
SET n.name = data.name,
    n.version = data.version
...

(Kunal Goyal) #14

'You would pass the list as one of the parameters'
Sorry, I am very confused about this.
My main concern is exactly this: how would I pass my variable as a parameter? :frowning:


(Andrew Bowman) #15

If you're using the Neo4j Browser, you can use :help param to see the syntax for :param. You would set the parameter before executing your query, not as part of the query itself.

If you're using a Neo4j driver, you should consult the language guides on using the appropriate driver for your language, and how to submit a query with parameters. Typically when you execute the query, the parameter map is submitted as an additional parameter to the execution call.
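As a concrete illustration of the shape of such a call (hypothetical sketch; the exact method name depends on your driver, and the commented driver lines are assumptions, not runnable here): the query text only references the parameter as `$MY_DATA`, while the actual list travels alongside the query in a separate parameter map.

```python
# The query text refers to the parameter; it never embeds the data itself.
query = """
UNWIND $MY_DATA AS data
MERGE (n:Entity {id: data.id})
SET n.name = data.name, n.version = data.version
"""

# The data travels alongside the query as a parameter map.
my_data = [{"id": 7196979, "name": "SEQU", "version": 1},
           {"id": 1222,    "name": "qwe",  "version": 1}]
params = {"MY_DATA": my_data}

# With a Python driver the execution call would look roughly like:
#   session.run(query, MY_DATA=my_data)
# and a Java driver similarly takes a Map<String, Object> of parameters.
print("$MY_DATA" in query, len(params["MY_DATA"]))  # True 2
```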


(Kunal Goyal) #16

Hi @andrew.bowman

I am getting my data in the format below, shown together with my query:

UNWIND '[{ id=12345, project_ids=50, has=ABC10}, {entity_id=859685, project_ids=50, has=DCV12}]' as row  MERGE (c:TEst{ID: row.id}) ON CREATE SET c.PROJECT_IDS=row.project_ids,c.HAS=row.has

But when I run the query I face an issue with the '=' that is in the data, and it expects string values; it gives this error for has: Variable ABC10 not defined.
Do we have any way to handle this on the query side only?


(Andrew Bowman) #17

Keep in mind that's not valid JSON format. You need to replace your = with :.

And if this is a string as opposed to an actual JSON object (lists and maps), then you need to transform it into a format Neo4j can use. Try apoc.convert.fromJsonList() from APOC Procedures.
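If the string cannot be fixed at its source, the cleanup can also happen before the query is sent. Here is a hypothetical Python sketch (`fix_pseudo_json` is an illustrative helper, not a library function) that turns the `key=value` pseudo-JSON from the post above into a proper list of maps, under the assumption that records are flat `key=value` pairs with no nested braces or commas inside values:

```python
import re

def fix_pseudo_json(text):
    """Convert '{ id=12345, has=ABC10}'-style records into Python dicts.
    Assumes flat 'key=value' pairs with no nested structures."""
    records = []
    for chunk in re.findall(r"\{(.*?)\}", text):
        record = {}
        for pair in chunk.split(","):
            key, _, value = pair.partition("=")
            value = value.strip()
            # Keep integers as numbers, everything else as a string.
            record[key.strip()] = int(value) if value.isdigit() else value
        records.append(record)
    return records

raw = "[{ id=12345, project_ids=50, has=ABC10}, {entity_id=859685, project_ids=50, has=DCV12}]"
rows = fix_pseudo_json(raw)
print(rows[0])  # {'id': 12345, 'project_ids': 50, 'has': 'ABC10'}
```

The resulting list of dicts can then be passed as a query parameter directly, with no string parsing needed on the Cypher side.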


(Kunal Goyal) #18

Hi @andrew.bowman
Thank you for all your help. I am following your suggestions and making good progress with this :slight_smile:
Now I am using batch import (creating maps of 20k records). I tried it with 500 records and it works fine.
But when I pass around 200,000 records it gets stuck, or maybe it is just taking a very long time.
Do I need to increase my cluster configuration?
In the future I will need to insert 8 million records into Neo4j, so what is the ideal system configuration for Neo4j to handle that much data?


(Andrew Bowman) #19

20k should certainly be doable for your batch sizes. You may want to do an EXPLAIN on your query and make sure it's using index lookups for your starting nodes, not label scans or all-node scans. If it isn't, that indicates you need to create an index on the relevant label/property combinations you're merging on.

For large inserts batching is a must. We usually recommend 10k-50k for batch sizes per transaction.


(Kunal Goyal) #20

@andrew.bowman

As I mentioned above, I am using the query below for inserting data:

UNWIND '[{ id=12345, project_ids=50, has=ABC10}, {entity_id=859685, project_ids=50, has=DCV12}]' as row  MERGE (c:Test{ID: row.id}) ON CREATE SET c.PROJECT_IDS=row.project_ids,c.HAS=row.has

Here I have an index on id, and I have tried with 10k batches as well, but with the same result: it takes a very long time. I even waited for 10 minutes, but no records were inserted.
Then, for testing, I tried with 1,000 records and they imported fine.

That's why I asked about the system configuration.

Please suggest.