How to auto increment to get unique value for a new node's field

How to auto increment to get unique value for a new node's field?

One way would be to use CREATE (n:Node {field: timestamp(), alternatively: datetime()}), which gives two different values that increase with time. Both are only as unique as the clock allows, so nodes created in the same query or the same millisecond can share a value. If you want the first node to be 1 and the second to be 2, you would need something like MATCH (n:Node) WITH count(n) AS total CREATE (newNode:Node {number: total + 1}), but this only works if you never delete anything :)
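
To see why count-based numbering breaks once anything is deleted, here is a minimal in-memory sketch in Python. It does not talk to Neo4j: the `nodes` list and the `create_node` helper are hypothetical stand-ins for the count-then-create query, using count + 1 so that the first node gets 1.

```python
# In-memory stand-in for "number = count of existing nodes + 1".
nodes = []  # each entry is the `number` stored on one node


def create_node(store):
    """Assign the next number from the current node count."""
    number = len(store) + 1
    store.append(number)
    return number


first = create_node(nodes)
second = create_node(nodes)
third = create_node(nodes)

nodes.remove(second)  # delete the node numbered 2

# The count has dropped back to 2, so the new node gets 3 again.
fourth = create_node(nodes)
print(first, second, third, fourth)
print("duplicate assigned:", fourth == third)
```

The duplicate appears because the count goes down on delete while the numbers already handed out do not.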

Jacob offered some good tips. Personally, I'm a fan of UUIDs:

RETURN apoc.create.uuid() AS uuid;

https://neo4j.com/docs/labs/apoc/current/graph-updates/uuid/

Sorry to necro, but I see UUIDs look like 5530953d-b85e-4939-b37f-a79d54b770a3.
Say you have a list of users and they can change their name. You usually use an auto-incremented ID for that. It appears in the URL. It's clean and stable, and a lot of websites, if not most, do that. Even if it's not about users (because I know this trend of having a user-chosen handle plus a display name you can change as much as you want, like Twitter), it can be a list of anything with unstable names, say a list of movies on IMDb where the title can be updated.
But are you telling me you can't do something as simple as generating a unique, auto-incremented, non-reusable ID with Neo4j, and I will have to make do with www.domain.com/user/5530953d-b85e-4939-b37f-a79d54b770a3 or manage it from the outside (meaning you have to manage a counter or something for every object like that)?
That's weird, honestly.
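
For reference, the "manage a counter" option mentioned above is small in practice. A minimal sketch in Python, assuming one counter per kind of object; the `IdAllocator` class is hypothetical and lives in memory only, so a real setup would still need to persist the counter and update it atomically.

```python
from collections import defaultdict
from itertools import count


class IdAllocator:
    """Hands out increasing integer IDs per label and never reuses one."""

    def __init__(self):
        # One independent counter per label, each starting at 1.
        self._counters = defaultdict(lambda: count(1))

    def next_id(self, label):
        return next(self._counters[label])


allocator = IdAllocator()
users = {}

for name in ("alice", "bob", "carol"):
    users[allocator.next_id("User")] = name

del users[2]  # deleting bob does not free ID 2
users[allocator.next_id("User")] = "dave"

movie_id = allocator.next_id("Movie")  # separate sequence for another label
print(sorted(users), movie_id)
```

Because the counter only ever moves forward, deleting a record never makes its ID available again.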

I really think you're getting hung up on the data type: Int vs. UUID string. Yes, an Int takes up fewer characters in a URL, but the benefits pretty much end there. There are many more cons to the auto-ID than anything gained.

An auto-ID Int is also very predictable, and someone could spam your site gathering all sorts of data because they know how to increment the Int in the URL. That is much more difficult if it's a UUID in the URL. With an Int, it wouldn't take much effort to loop through a million API calls to your URL and scrape and pilfer all your user info, and then your competitor would know your app's entire user base.
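
A rough illustration of the predictability point, as a Python sketch. The base URL is the placeholder domain used earlier in this thread, and nothing here makes a network request; it only builds the URL strings.

```python
import uuid

BASE_URL = "https://www.domain.com/user/"  # placeholder domain from the thread

# With sequential integer IDs, every candidate URL can be listed in advance.
sequential_urls = [f"{BASE_URL}{user_id}" for user_id in range(1, 6)]

# With random (version 4) UUIDs there is nothing to count through:
# 122 of the 128 bits are chosen at random.
uuid_urls = [f"{BASE_URL}{uuid.uuid4()}" for _ in range(5)]
random_bits = 122

print(sequential_urls[-1])
print(uuid_urls[-1])
print(f"possible v4 UUIDs: 2**{random_bits} = {2 ** random_bits:.3e}")
```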

Then, as you mentioned, yes, you could put something else in the URL, such as the username. This has the disadvantage that if the username changes, any links using the old name break. This is how Facebook works: there's an underlying ID behind your account, and if you change your name they can propagate that change throughout their systems, but any outside links are broken because those are out of their control. Depending on how important it is to you to maintain deep-linking functionality, you could build a history of previous names. Honestly, you can bookmark any link in your browser now, and the site could remove that page and your bookmark is toast. It happens all the time: someone blogs and deletes a post, or a user makes a tweet and deletes it. Your website should gracefully handle a broken link with a redirect anyway.

Another reason why I'm not a fan of auto-IDs is that they're only unique within the system that generates them. If you have a database that is sharded across the globe, Asia vs. Europe, it's a lot of work to ensure that you're not getting auto-ID collisions between the two shards if you ever try to reconcile the database into a single data source. UUIDs can be generated by any application or any shard of a DB, and you're very unlikely to get an ID collision.
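
The shard scenario can be sketched in Python with two independent counters standing in for two regions. The variable names are hypothetical and no real database is involved.

```python
import uuid
from itertools import count

# Two independent shards, each with its own auto-increment counter.
asia_counter = count(1)
europe_counter = count(1)

asia_int_ids = {next(asia_counter) for _ in range(1000)}
europe_int_ids = {next(europe_counter) for _ in range(1000)}

# Every integer ID exists on both shards, so a merge needs remapping.
int_collisions = asia_int_ids & europe_int_ids

# Each shard generates UUIDs locally, with no coordination between them.
asia_uuids = {uuid.uuid4() for _ in range(1000)}
europe_uuids = {uuid.uuid4() for _ in range(1000)}
uuid_collisions = asia_uuids & europe_uuids

print(len(int_collisions), len(uuid_collisions))
```

All 1000 integer IDs overlap, while an overlap between the two UUID sets is possible in theory but vanishingly unlikely.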

I can understand why Neo4j doesn't have an auto-increment feature. In a SQL DB, auto-IDs are defined on a per-table basis. There are no tables in a graph. There are just nodes. Any auto-ID would have to be unique across the graph regardless of the label(s) that a node has. If you were trying to make an ID unique throughout an entire SQL database, you'd still probably end up making a dummy ID table that all other tables would have to reference to petition for a new ID. And think of this: when each table has its own seed for an auto-ID, the ID value 123 in table A could also exist in table B. A user could write a join and it would succeed because the values match, but the results would be nonsensical. If you used UUIDs, the join would never succeed because no two tables would ever have the same ID in their data.

A UUID is unique across all tables, all DBs, and the whole globe, distinct from anyone else's in the known universe. All you have to do is call a function already available in nearly every language.
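
As one example of such a function, Python ships a `uuid` module in its standard library. A minimal sketch:

```python
import uuid

new_id = uuid.uuid4()   # random (version 4) UUID
as_text = str(new_id)   # canonical 36-character form with hyphens
compact = new_id.hex    # same value as 32 hex characters, no hyphens

print(as_text)
print(compact)
# The text form round-trips back to an equal UUID object.
print(uuid.UUID(as_text) == new_id)
```

The compact form is four characters shorter, which helps a little if the value ends up in a URL.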

I can't think of a single website that uses a UUID in its URL. It looks like junk and will most likely be URL-rewritten.
No offense, but the (two) reasons you wrote are weak, maybe because I am not business-centered.
System-centric: well, not everyone has a database sharded across the globe... Scraping: plenty of websites won't be saved from scraping by a UUID (or even care). They just have to have a member list. An auto-ID can be useful because it is predictable: you can watch the progression of added objects and play with the ID directly. It is also human-readable and "elegant". If you fear scraping, you will take measures and not use an auto-ID.

I am not saying we should all use auto-IDs for everything, even less for every label. I find it weird that the option was removed when it is really useful for a number of reasons, and not just because it's the norm with SQL. I know I need to think graphs, not tables, but you could link an auto-ID to a label and add a constraint that prevents labels with an auto-ID from being combined on one node. You put an auto-ID on :Person and :Movie, and you are forbidden to have a node with :Person:Movie. That simple.

@discourse,

It is not that simple. While your suggestion would work for the simplest use cases (data structures which could just as well live in a relational DB), it breaks many graph patterns and rules out many of the advantages of having a graph DB at all. Labels are not tables; they are logical groupings of nodes. Nodes are meant to have multiple labels, so that they may be queried and indexed across a node's many relevant taxonomies. For example, a :Person may also be a :User and an :Admin. While you could create separate nodes for each representation of a person, you could also simply add the labels.

Database engines must prioritize performance and stability first, flexibility second, and convenience third. Your solution puts convenience first.

What are UUIDs?

There are many standards, but UUID stands for "Universally Unique Identifier." Why would you choose to use IDs which are only internally consistent, when instead you could have a primary key which can be used for merging data across systems and structures? Convenience alone? To keep reusing old patterns and habits learned from designing relational DBs?

Auto Increment INT

+ Simpler
+ Requires no code
- Fragile
    - Potential ID reuse
    - Only internally consistent
- Useless externally
- Vulnerable (can be abused) 

UUID

+ Unique
    + Can be used across systems and software (import/export csv)
+ Stable
    + As it is *not* auto-generated by the db, you have complete control over when a new UUID is created, and a near-zero risk of reuse.
+ Resistant to abuse
- You have to write a line of code somewhere to generate the UUID when you create a new record.