What's the best way to incrementally add content to a Neo4j database?

lingvisa
Graph Fellow

I have an initial Neo4j database and need to update it periodically in several ways:

  1. Add new nodes
  2. Add new attributes on existing nodes
  3. Add new relationships on existing nodes
  4. Delete or modify any existing nodes, attributes or relationships.

For example, for 'Add new nodes', how do I know a node is new? Do I have to search and compare it with all existing nodes? The same goes for attributes and relationships. Can the 'merge' function take care of all the internal complexities, so that users just prepare data in a normal CSV file and a merge statement adds anything that appears new? By "appears new", I mean that if an exactly identical node already exists, nothing is added.

Is there any use case or blog article on incrementally adding content to a Neo4j database?

14 REPLIES

You'll want to identify which properties on nodes of a certain label denote a node as unique. After you've figured out which of those properties indicate uniqueness, you'll want either an index or a node key constraint on them, and when you MERGE only MERGE on that set of properties. All other properties can be set after the MERGE operation.

So for example, if for :Person nodes, firstName and lastName indicate uniqueness, then you'll want to create an index on :Person(firstName, lastName), and use a query like

...
MATCH (p:Person {firstName:$firstName, lastName:$lastName})
SET p.hobbies = $hobbies, p.favoriteColor = $favColor
...
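
Following the advice above to MERGE only on the identifying properties, the MERGE form of this pattern might look like the sketch below. It assumes the index syntax of recent Neo4j versions (4.x or later); the index name `person_name` is made up for illustration, and the parameters are the ones from the snippet above.

```cypher
// One-time setup: composite index on the properties that identify a :Person.
// The index name person_name is hypothetical.
CREATE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.firstName, p.lastName);

// Periodic update: MERGE only on the identifying properties,
// then SET everything else, whether the node was matched or just created.
MERGE (p:Person {firstName: $firstName, lastName: $lastName})
SET p.hobbies = $hobbies, p.favoriteColor = $favColor;
```

Keeping the non-identifying properties out of the MERGE pattern matters: if hobbies were part of the pattern, a person whose hobbies changed would no longer match and a second node would be created.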

Hi, Andrew, so you mean when I initially create the database, I need to identify unique properties and create an index or constraints on them. When I update the database, I shall use "merge" first and then use the 'Match ... set' statements.

"when you MERGE only MERGE on that set of properties", can you be a little more explicit on this in combining it with match?

Also, when creating Neo4j databases, do you suggest using the import tool to load CSV files without writing code, or do you suggest using Cypher queries created by graph creator?

For example from the documentation:


LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
WITH row WHERE row.Company IS NOT NULL
MERGE (c:Company {companyId: row.Id})

This statement says: for each row in the CSV file, MERGE takes the current row's Id as the node's identifier and tries to match it against a Company node in the graph. If a Company node with companyId = row.Id exists, it doesn't insert a new node; otherwise it inserts a new Company node. For this to work, the companyId property has to be indexed first. Is that right?

Yes, for best performance an index (or unique constraint, if appropriate) is needed.

Indexes aid MATCH and MERGE operations provided the label and property (or properties, for compound indexes and node keys) are present in the pattern.

While technically these will still work without an index, they may become increasingly expensive depending on the number of :Company nodes in the database (since without an index each MERGE has to check every single :Company node to see if the node already exists, as opposed to a much quicker index lookup).
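
As a rough sketch of that setup, assuming companyId is meant to be unique per company and that the `Company` column of the CSV holds the company name (neither is confirmed in this thread), the constraint and a re-runnable load could look like this. The constraint name is made up, and the `REQUIRE` form is the syntax of newer Neo4j releases; older versions use `ASSERT` instead.

```cypher
// One-time setup: a uniqueness constraint is backed by an index,
// so MERGE on companyId becomes an index lookup.
// The constraint name company_id_unique is hypothetical.
CREATE CONSTRAINT company_id_unique IF NOT EXISTS
FOR (c:Company) REQUIRE c.companyId IS UNIQUE;

// Re-runnable load: existing companies are matched, new ones are created.
// Rows without an Id are skipped because MERGE cannot use a null property value.
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
WITH row WHERE row.Id IS NOT NULL
MERGE (c:Company {companyId: row.Id})
SET c.name = row.Company;  // assumes the Company column is the name
```

Running the same file twice should leave the node count unchanged, which is what makes this usable for periodic updates.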

So this 'MERGE (c:Company {companyId: row.Id})' will automatically take care of both new nodes and existing nodes, right? I don't have to search for an existing node myself. In this example, if no such companyId exists in the graph, does the 'merge' do nothing?

On the other hand, if I want to modify properties or relationships on an existing node, do I need to explicitly MATCH the existing node first and then update its properties, without using a MERGE operation?

A MERGE is like a MATCH, and if no such node exists, then a CREATE. So by the time the MERGE is done, a node with that label and properties will exist in the graph (whether it existed before or needed to be created).

A MATCH will just match to the node if the node exists, it won't create it otherwise.
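
To make the difference concrete, here is a small sketch covering the kinds of update asked about at the top of the thread. The `WORKS_AT` relationship type, the `createdAt`/`updatedAt` properties, and the `$companyId` and `$name` parameters are placeholders invented for illustration.

```cypher
// MERGE: match-or-create. ON CREATE / ON MATCH run only in the respective case.
MERGE (c:Company {companyId: $companyId})
ON CREATE SET c.createdAt = timestamp()
ON MATCH SET c.updatedAt = timestamp();

// MATCH + SET: updates only a node that already exists; creates nothing.
MATCH (c:Company {companyId: $companyId})
SET c.name = $name;

// Relationship between two existing nodes: MATCH both ends, then MERGE
// only the relationship so repeated runs do not duplicate it.
MATCH (p:Person {firstName: $firstName, lastName: $lastName})
MATCH (c:Company {companyId: $companyId})
MERGE (p)-[:WORKS_AT]->(c);

// Deleting a node together with its relationships.
MATCH (c:Company {companyId: $companyId})
DETACH DELETE c;
```

If the MATCH in the second or third statement finds nothing, the rest of that statement simply does not run, so nothing is created or changed.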

'MERGE (c:Company {companyId: row.Id})' Does this fail if no such companyId exists? Or does it mean that if there is no such companyId, it will create a new node with that Id?

If no such node exists, it will create it.

Please review the documentation for MERGE, as well as one of our knowledge base articles that does a deeper dive into how MERGE works:

https://neo4j.com/docs/cypher-manual/current/clauses/merge/
and

Thanks for the info!

Really interesting thread. It helps me understand update/add much better.

ofer_bar
Node Link

Hi, I'm not sure exactly what your scenario is, but it sounds like you may want to use Liquigraph for this task. It's an open source "update" tool for Neo4j.
It is relevant if you have a Spring application that serves data from Neo4j, and it keeps track of changesets in your database.
You will still need to write incremental Cypher statements, but the tool will create "metadata" to keep track of the changes already made.

Thanks ofer. I am not using spring.

You can still use an external Java Spring application to update the database. It doesn't have to be part of the main application.