OGM save node performance

(Mail4mattharrison) #1

Hi, I'm very new to using the Neo4j database and OGM, but have got it basically working for what I'm doing (reading through XML documents and feeding that structure along with associated metadata into Neo4j). I'm using OGM 3.1.7 with the Bolt driver.
The question I have at the moment is about the performance of inserts. In the specific case I'm looking into, the session.save call is taking roughly 5 seconds to complete; this is for inserting around 70 nodes in a descending tree structure.
Does this seem like a normal amount of time for this operation? This is all running on my i7 laptop with an SSD. Is there any logging I can turn on to track what is happening?
I've seen a couple of posts on the internet suggesting it's better to insert all the nodes separately and then do another insert to put in the relationships, or alternatively to use the Java API rather than OGM for speed, but I wasn't sure if this is more related to older OGM versions?

Not sure if this is a bit too vague, but if anyone has any suggestions about what I should be expecting performance-wise, or thoughts on how to dig a little deeper, that would be great.

Thanks in advance,

Matt


(Radke) #2

A few questions come to mind:
How much data do you already have in your database?
Does your save() operation include merges?
Do you have indexes on the attributes you use in the merges?
How much memory have you assigned to Neo4j (in relation to database size)?


(Mail4mattharrison) #3

Hi, thanks for getting back to me. To answer your questions:

  • About 103k nodes, and about 102k relationships
  • I think it's quite likely there are merge operations. I am pulling in a few objects which I'm not modifying; will these get merged back in? Is there any kind of updated-object tracking in OGM (i.e. unit-of-work tracking)?
  • I haven't added any indexes at all at the moment. Is it beneficial to add them, or will it make the inserts slower?
  • I haven't made any configuration changes to the server (Community 3.5.2), so it looks to be using just over 2.5 GB of memory at the moment, with the database saying it's 124 MB (although 84 MB of that is the logical log - which I think is the transaction log?)

I will try cutting down on the data I'm loading; hopefully that will remove a lot of the possible merges. Is it possible to get OGM to log the statements it's issuing?

Also, I guess the meta question around this is: do I carry on with OGM, or look at doing it more with the native Java API? I don't mind too much either way, I just want to head in the right/best direction.

Thanks,

Matt


(Mail4mattharrison) #4

Hmm, OK, I got the logging working. I found the note in the docs, which just says OGM uses logback and then links to logback; this is possibly a bit brief. I then found another post on the blog with an example logback file (it might be worth putting an example in the docs), so I got it working in the end. It was a bit trickier for me because this is embedded in a Dropwizard project, so the configuration was a bit different in my case.
Anyway. Looking at the logs, there is a command issued:
UNWIND {rows} as row MERGE (n:Element:LabeledParagraph{id: row.props.id})
SET n=row.props RETURN row.nodeRef as ref, ID(n) as id, {type} as type with params
{type=node, rows=[{nodeRef=-34, props={label=d, id=d8301edf-3d3c-4269-bb70-8f17947f824a}},
{nodeRef=-66, props={label=aa, id=dd53e725-abe0-4b00-917b-4fc478762631}},
{nodeRef=-36, props={label=e, id=b2a15783-7152-41da-93b5-e4b2ff286693}},
{nodeRef=-38, props={label=f, id=f56c7ddc-6caa-45b1-9329-540618b99087}},
{nodeRef=-102, props={label=iv, id=9eb86c36-3d45-4266-9f65-ccb94b0e911e}},
{nodeRef=-104, props={label=A, id=e568aa5f-7a10-4bdb-b0d0-182bf6cfb9c0}},
{nodeRef=-42, props={label=a, id=73c80757-4fea-42e1-bc8e-5097733721bb}},
{nodeRef=-74, props={label=ia, id=e0525c0f-e399-4c4d-ac10-97b5ed6dbf8f}},
{nodeRef=-106, props={label=B, id=c2dbbae9-8e00-42fc-b2ef-bb8c1a5ff2e0}},
{nodeRef=-44, props={label=b, id=716e6c2a-55d5-4fdd-8a68-54d2b674c880}},
{nodeRef=-46, props={label=c, id=94b35511-020f-49f9-b03e-c15708e8de9a}},
{nodeRef=-48, props={label=d, id=0bbcaa1e-a867-4b45-a399-2741b6bfdf64}},
{nodeRef=-50, props={label=e, id=29262d29-8308-4902-960b-445bab39646e}},
{nodeRef=-52, props={label=f, id=de351745-9b7d-4fd9-9db9-59b4a20a1f5c}},
{nodeRef=-22, props={label=a, id=9e6aaec4-f0a7-49f1-80e7-a669f977ed4b}},
{nodeRef=-54, props={label=g, id=05919bb0-f9e0-44c7-b88e-b03aa6e686e2}},
{nodeRef=-24, props={label=b, id=73f307b2-33ad-41c0-af81-79ceeee526fa}},
{nodeRef=-56, props={label=h, id=8f1c8ba7-4ad6-4b7e-b2f6-e7c5db0783c9}},
{nodeRef=-28, props={label=a, id=09e2975f-d26d-4042-b5e7-8a25c279aa07}},
{nodeRef=-30, props={label=b, id=d6bf4cb4-80c4-44f3-92b2-7a0b00ce3cac}},
{nodeRef=-32, props={label=c, id=f13ba92e-08b0-4360-9125-d78b1c51fd6c}}]}

This statement seems to take almost all of the time for the save. As part of loading the document, my current setup is to issue a Cypher command to remove all of the nodes under the document node, and then load all of the newly collected nodes and connect them to the existing document (basically it's a replace of the structure under a document). Could this be what is causing this request? Is there a better way to deal with loading the nodes?
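
One way to check whether the MERGE itself is the slow part is to ask Neo4j for its query plan. The following is a minimal sketch rather than something taken from the OGM output: it reuses the labels and one id value from the logged statement above, reduced to a single row, and it uses EXPLAIN so that nothing is actually written.

```cypher
// Show the plan for a single-row version of the MERGE that OGM issued.
// EXPLAIN only plans the query; it does not execute it or change any data.
EXPLAIN
MERGE (n:Element:LabeledParagraph {id: 'd8301edf-3d3c-4269-bb70-8f17947f824a'})
RETURN id(n)
```

If the plan contains a NodeByLabelScan followed by a Filter, every row of the UNWIND has to scan all nodes with that label to find a matching id, which would be consistent with the slow save. With an index or uniqueness constraint on the id property, the plan should show an index seek operator instead (for example NodeIndexSeek or NodeUniqueIndexSeek).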

Thanks in advance for any thoughts or suggestions.

Matt


(Mail4mattharrison) #5

Ah, I had a look at that, and thought about your mention of indexes. I see that by default indexes are off (is this correct?), so I wondered if the issue was that it was looking up elements by id with no index to use. I've just tried setting the configuration with 'autoIndex("update")', and this looks to have taken the save operation down to around 200 milliseconds, which looks fine.
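
For reference, the effect of the auto-index setting can be approximated by hand in Cypher. This is a sketch under assumptions: it assumes the id property belongs to the Element label (the logged MERGE uses both Element and LabeledParagraph, so the label may differ in the real model), and it uses the Neo4j 3.5 schema syntax. Only one of the first two statements should be run, since they are alternatives.

```cypher
// Option 1: a plain index on the property used by the MERGE.
CREATE INDEX ON :Element(id);

// Option 2 (instead of option 1): a uniqueness constraint, which is also
// backed by an index and additionally rejects duplicate id values.
CREATE CONSTRAINT ON (n:Element) ASSERT n.id IS UNIQUE;

// List the indexes and their state to confirm the new one is ONLINE.
CALL db.indexes();
```

Writes become slightly more expensive because the index has to be maintained, but for a MERGE keyed on id the saving on the lookup side is usually much larger.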
Does what I've deduced and done sound rational/reasonable/correct?

Thanks,

Matt