Create node with specific internal id using LOAD CSV

For the initial data import (step 1) into a Neo4j database I'm using the neo4j-admin import tool. There you can specify the internal id of a node with :ID in the header.

I would also like to use the LOAD CSV command to create more nodes (step 2) in the already existing database (with the data from the previous step). I can't find out how to specify the internal id of a node with this command.

Why is this not possible, when it is possible at initial import? In the second step I have CSV files similar to those in the first step: a CSV file of nodes whose first column is the node id, and a CSV file of relationships between them with the columns start_id,end_id,relationshipType.

Thanks, Petr M

You may have misread this. There is no way to manually set the internal id used when nodes are imported whether it's by neo4j-admin import or LOAD CSV.

What you are referring to is likely how we define ids that are used when connecting up nodes and creating relationships, but that doesn't mean the id will continue to be used as the graph id. If you observed this during the import, then that is nice, but not guaranteed. After all, you may have nodes of different types with the same id assigned via your neo4j-admin import usage, but the internal graph id is globally unique, so there would be no way those would be the ids used.

The internal graph id of the node is actually an index offset into the node store file, so you see we cannot manually set it. You can however set your own id property to whatever you like.
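As a rough sketch of what setting your own id property with LOAD CSV could look like: the file name nodes.csv, its header row (id,name), the label Person and the property name personId are all made up for illustration, so adjust them to your data.

```cypher
// Hypothetical input: nodes.csv in the import directory, with header "id,name".
// The label Person and the property personId are illustrative names only.
LOAD CSV WITH HEADERS FROM 'file:///nodes.csv' AS row
// MERGE on your own id property so that re-running the load does not duplicate nodes.
MERGE (p:Person {personId: toInteger(row.id)})
SET p.name = row.name;
```

The value from the CSV ends up in the personId property; the internal graph id is still assigned by the database.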

And would you advise creating an index on this "my own id property"?

If you plan to regularly lookup (MATCH or MERGE) nodes of that type by that id property, then yes, create an index (or unique constraint if it's meant to be unique).
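For example, assuming a hypothetical label Person with a hypothetical id property personId, and a reasonably recent Neo4j version (the syntax has changed between releases, so check the manual for yours), it might look something like this:

```cypher
// Unique constraint (Neo4j 4.4 and later syntax); this also creates a backing index.
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.personId IS UNIQUE;

// Older releases use the earlier form instead:
// CREATE CONSTRAINT ON (p:Person) ASSERT p.personId IS UNIQUE;

// If the property is not meant to be unique, a plain index would be the alternative
// (create one or the other, not both):
// CREATE INDEX person_id_index IF NOT EXISTS FOR (p:Person) ON (p.personId);
```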

What purpose does the id(n) function serve in the graph, then? Is it just a quick reference to a certain node that I can use later in the query?

Does this mean that if I used the neo4j-admin import tool and, say, node X had its :ID column set to 12345, then when I find this node (by filtering on its other properties, so I'm certain I'm querying just for node X) and return id(X), the number I get could be 12345 or a completely different number (depending on how ids are assigned in the graph)? Do I understand this correctly?

Thanks for all the responses, really appreciate it!

Yes, id(n) gets the node's internal graph id, or allows lookup of a node by its internal graph id, and it is impossible to set it yourself, since it is actually an offset into the node store to where the node data is stored, like a pointer.

It can be useful in the short term when passing ids from one query to another, or for use within the same query (though a node variable will work just as well, since during query execution, before a return, a node variable is just a lightweight structure with the node's graph id).
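A small sketch of that short-term usage, again with the hypothetical Person label and personId property, and a query parameter I've named $internalId for illustration:

```cypher
// Query 1: find the node by your own id property and return its internal graph id.
MATCH (p:Person {personId: 12345})
RETURN id(p) AS internalId;

// Query 2, run shortly afterwards: look the node up directly by the internal id,
// passed in as the parameter $internalId (a made-up parameter name).
MATCH (n)
WHERE id(n) = $internalId
RETURN n;
```

The value returned by the first query may or may not be 12345. It is also not something to store long term, since internal ids can be reused after a node is deleted. In Neo4j 5, id() is deprecated in favor of elementId(), so the exact function may differ depending on your version.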

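Any ids you set via import will be properties on the node (with neo4j-admin import, the :ID value is stored as a property only if the header gives it a property name, for example personId:ID; otherwise it is used only to connect nodes during the import), and you need an index (and the label present in the pattern) in order to do a fast lookup of a node by that property.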
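For the relationships file from the question (columns start_id,end_id,relationshipType), a LOAD CSV sketch could look like the following. The file name relationships.csv, the label Person, the property personId and the relationship type KNOWS are assumptions for illustration only.

```cypher
// Hypothetical input: relationships.csv with header "start_id,end_id,relationshipType".
LOAD CSV WITH HEADERS FROM 'file:///relationships.csv' AS row
// Plain Cypher needs a fixed relationship type here, so handle one type per pass.
WITH row WHERE row.relationshipType = 'KNOWS'
// Including the label lets each lookup use the index on personId.
MATCH (a:Person {personId: toInteger(row.start_id)})
MATCH (b:Person {personId: toInteger(row.end_id)})
MERGE (a)-[:KNOWS]->(b);
```

Without the label in the MATCH patterns, or without an index on the id property, each row would likely trigger a scan over all nodes. If the file contains many different relationship types, you could run one pass per type, or look at the APOC procedure apoc.create.relationship for creating types dynamically.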
