We evaluated UUIDv7 and determined that it's unwise to use it as a primary key.
We have applications where we control the creation of the primary key, and where the primary key will be exposed to end users, such as when using a typical web app framework built with Rails, Phoenix, Loco, Laravel, etc. For these applications, UUIDv7 time is too problematic for security, so we prefer binary-stored UUIDv4 even though it's less efficient.
We also have applications where we control the creation of the primary key, and where we can ensure the primary key is never shown to users. For these applications, UUIDv7 is slower at inserts and joins, so we prefer BIGSERIAL for primary key, and binary-stored UUIDv4 for showing to users such as in URLs.
Why would exposing any primary key be bad for security? If your system's security *in any way* depends on the randomness of a database private key, you have other problems. It's not the job of a primary key to add to security. Not to mention that UUIDv7 has 6 random bytes, which, for the vast majority of web applications, even finance, is more than enough randomness. Just imagine how many requests an attacker would need to make to guess even one UUID (281 trillion possible combinations for 6 random bytes, and he also would need to guess the unix timestamp in ms correctly). The only scenario I can think of is that you use the primary as a sort of API key.
> system's security in any way depends on the randomness of a database private key
Unlisted URLs, like YouTube videos are a popular example used by a reputable tech company.
> UUIDv7 has 6 random bytes
Careful. The spec allows 74 bits to be filled randomly. However you are allowed to exchange up to 12 bits for a more accurate timestamp and a counter of up to 42 bits. If you can get a fix on the timestamp and counter, the random portion only provides 20 bits (1M possiblities).
Python 3.14rc introduces a UUIDv7 implementation that has only 32 random bits, for example.
Basically, you need to see what your implementation does.
only 32bits, so 4 billion guesses per microsecond... Even if youtube has 1 million videos per microsecond you would never guess them before rate limits.
Example: if user IDs are not random but eg Bigserial (autoincremented) and they're exposed through some API, then API clients can infer the creation time of said users in the system. Now if my system is storing eg health data for a large population, then it'll be easy to guess the age of the user. Etc. This is not a security problem, this is an information governance problem. But it's a problem. Now if you say that I should not expose these IDs - fine, but then whatever I expose is essentially an ID anyway.
One of the big things here is de-anonymization and account correlation. Say you have an application where users'/products' affiliation with certain B2B accounts is considered sensitive; perhaps they need to interact with each other anonymously for bid fairness, perhaps people might be scraping for "how many users does account X have onboarded" as metadata for a financial edge.
If users/products are onboarded in bulk/during B2B account signup, then, leaking the creation times of each of them with any search that returns their UUIDs, becomes metadata that can be used to correlate users with each other, if imperfectly.
Often, the benefits of a UUID with natural ordering outweigh this. But it's something to weigh before deciding to switch to UUIDv7.
Because anything that knows the primary key now knows the timestamp. The UUID itself leaks information. It's not that it's not adding security. It's that it's actually subtracting security.
It would have to leak sensitive information to be "subtracting security", which implies you're relying on timestamp secrecy to ensure your security. This would be one of the "other problems" the gp mentioned.
Pretty much any information can be used for something. You're ignoring everything they say about how something not critical to application security may still not be desirable to be leaked for other reasons. Example: Target and Walmart may not depend on satellites being unable to image their parking lots from the perspective of loss prevention or corporate security. But it still leaks information they may not want financial analysts to know about their performance.
Sam Walton used to fly investors in his plane over Walmart stores and ask them to count the cars in the parking lot, then he would fly them over competitors stores and ask the same. Just a fun fact about how this is a very real scenario!
Deploying UUIDv7 certainly requires more thought about the implications. In many cases leaking the creation time of a key is completely fine, in some cases it isn't
An interesting compromise is transforming the UUIDv7 to a UUIDv4 at the API boundary, like e.g. UUIDv47 [1]. On the other hand if you are doing that you can also go with u64 primary keys and transform those
I wonder if the issue is with exposing internal IDs to end users. I'm sure the experts here have already thought of this, but could someone explain why using encryption or even an HMAC for external views of a primary key doesn't make sense? Maybe because the extra processing is more expensive than just using UUIDv4? Using a KDF such as argon2id on the random bits of a UUIDv7 seems like it might work well for external IDs.
(And why the heck are different types or variants of UUIDs called "versions"?)
Because now, for the rest of eternity, every single person who writes any code that moves data from this table to somewhere else, for any purpose, has to remember that the primary key gives away the creation time of something, which can potentially be linked to something else. A lot of people won't notice that, and a lot of people who do notice it will get the remediation wrong. And you can now forget using a simple view on the database to give any information to any person or program that shouldn't get the creation times.
the question was why not use encryption (sqids/hashids/etc) to secure publicly exposed surrogate keys, I don't think this reply is on point .. surrogate keys ideally are never exposed (for a slew of reasons beyond just leaking information) so securing them is a perfectly reasonable thing to do (as seen everywhere on the internet). otoh using any form of uuid as surrogate key is an awful thing to do to your db engine (making its job significantly harder for no benefit)
> You've embrittled your system.
this is the main argument for keeping surrogate keys internal - they really should be thought of like pointers, dangling pointers outside of your control are brittle. ideally anything exposed to the wild that points back to a surrogate key decodes with extra information you can use to invalidate it (like a safe-pointer!)
- Knowing sequential IDs leaks the rate of creation and amount of said entity which can translate in number of customers or rate of sales.
This implies the existence of an endpoint that returns a list of items, which could by itself be used to determine customers or rate of sales. This also means you have a broken security model that leaks a list of customers or list of sales, that you should probably not have access to begin with.
- Knowing timed IDs leaks activity patterns. This gets worse as you cross reference data.
Again if you can list items freely you can do this anyway, capture what exists now and do diffs to determine update times and creation times.
Yeah I am trying to imagine a universe where having the creation time of an item breaks your security model and every path I go down is that the system has terrible security.
I know that the person I'm stalking created a pseudonymous account on service X around time Y. Based on other information, I have a limited number of suspect accounts. The creation time leaks to me, either via a bug which would otherwise have been harmless, or because somebody writing code "can't imagine a universe where having the creation time of an item breaks your security". I use the creation time to figure out which of my candidates is actually the target.
It took you 15 seconds because its a terrible example, _around time Y_ is doing insane lifting of this concept. Then "based on other information" okay so some other information is enabling this.
It turns out that in reality, I usually know both "around time Y" and "other information". You're going to narrow me down from 10 accounts to 1, or from 100 to 10.
Currently evaluating UUIDv7 as primary key for some inventory origin. I think it should be ok to use it for such use case since it will indicate the time of creation? Any thoughts?
You have to ask what problems exactly are you solving? Unless there is a compelling reason to use them, sticking with auto increment IDs is much simpler.
And I say this as someone who recently has to convert some tables from auto increment IDs to uuid. In that instance, they were sharded tables that were relying on the IDs to be globally unique, and made heavy use of the IDs to scan records in time order. So uuids solved both those problems.
This is why the UUID versions should have been labeled by letter rather than number. Each UUID version doesn't replace the last. They do different things. The numbered versioning gives the impression that "higher numbers = better" and that's neither the case nor the intention.
One of the main points of using a UUID as a record identifier was to make it so people couldn't guess adjacent records by incrementing or decrementing the ID. If you add sequential ordering, won't that defeat the purpose?
Seems like it would be wise to add caveats around using this form in external facing applications or APIs.
There's still 62 bits of random data, even if you know the EXACT milisecond the row was created (you likely won't), you still need to do 1 billion guesses per second for 73 years.
Ideally you have some sort of rate limit on your APIs...
UUIDv7 has a 48 bit timestamp, 12 bits that either provide sub-millisecond precision or are random (in pg they provide precision) and another 62 bits that are chosen at random.
The A UUIDv7 leaks to the outside when it was created, but guessing the next timestamp value is still completely unfeasible. 62 bits is plenty of security if each attempt requires an API request
... and the next person working on the system thinks "well, this thing is unpredictable, so it's OK if I leak an unsalted hash of it". If they think at all, which is far from certain.
Why does everybody want to find excuses to leave footguns around?
Does anyone know if there are any sorts of optimizations (either internally or available to the user) for a table with a UIIDv7 PK and a `date_created` column?
various engines have what's called "fast key" optimization specifically for integer sequences - if you're testing performance between an int/serial pk and a uuid the impact is profound to digusting depending on the engine.
We evaluated UUIDv7 and determined that it's unwise to use it as a primary key.
We have applications where we control the creation of the primary key, and where the primary key will be exposed to end users, such as when using a typical web app framework built with Rails, Phoenix, Loco, Laravel, etc. For these applications, UUIDv7 time is too problematic for security, so we prefer binary-stored UUIDv4 even though it's less efficient.
We also have applications where we control the creation of the primary key, and where we can ensure the primary key is never shown to users. For these applications, UUIDv7 is slower at inserts and joins, so we prefer BIGSERIAL for primary key, and binary-stored UUIDv4 for showing to users such as in URLs.
Recently someone shared a method for encrypting the timestamp portion as well:
https://news.ycombinator.com/item?id=45275973
Why would exposing any primary key be bad for security? If your system's security *in any way* depends on the randomness of a database private key, you have other problems. It's not the job of a primary key to add to security. Not to mention that UUIDv7 has 6 random bytes, which, for the vast majority of web applications, even finance, is more than enough randomness. Just imagine how many requests an attacker would need to make to guess even one UUID (281 trillion possible combinations for 6 random bytes, and he also would need to guess the unix timestamp in ms correctly). The only scenario I can think of is that you use the primary as a sort of API key.
> system's security in any way depends on the randomness of a database private key
Unlisted URLs, like YouTube videos are a popular example used by a reputable tech company.
> UUIDv7 has 6 random bytes
Careful. The spec allows 74 bits to be filled randomly. However you are allowed to exchange up to 12 bits for a more accurate timestamp and a counter of up to 42 bits. If you can get a fix on the timestamp and counter, the random portion only provides 20 bits (1M possiblities).
Python 3.14rc introduces a UUIDv7 implementation that has only 32 random bits, for example.
Basically, you need to see what your implementation does.
only 32bits, so 4 billion guesses per microsecond... Even if youtube has 1 million videos per microsecond you would never guess them before rate limits.
Example: if user IDs are not random but eg Bigserial (autoincremented) and they're exposed through some API, then API clients can infer the creation time of said users in the system. Now if my system is storing eg health data for a large population, then it'll be easy to guess the age of the user. Etc. This is not a security problem, this is an information governance problem. But it's a problem. Now if you say that I should not expose these IDs - fine, but then whatever I expose is essentially an ID anyway.
One of the big things here is de-anonymization and account correlation. Say you have an application where users'/products' affiliation with certain B2B accounts is considered sensitive; perhaps they need to interact with each other anonymously for bid fairness, perhaps people might be scraping for "how many users does account X have onboarded" as metadata for a financial edge.
If users/products are onboarded in bulk/during B2B account signup, then, leaking the creation times of each of them with any search that returns their UUIDs, becomes metadata that can be used to correlate users with each other, if imperfectly.
Often, the benefits of a UUID with natural ordering outweigh this. But it's something to weigh before deciding to switch to UUIDv7.
Because anything that knows the primary key now knows the timestamp. The UUID itself leaks information. It's not that it's not adding security. It's that it's actually subtracting security.
> leaks information
It would have to leak sensitive information to be "subtracting security", which implies you're relying on timestamp secrecy to ensure your security. This would be one of the "other problems" the gp mentioned.
Pretty much any information can be used for something. You're ignoring everything they say about how something not critical to application security may still not be desirable to be leaked for other reasons. Example: Target and Walmart may not depend on satellites being unable to image their parking lots from the perspective of loss prevention or corporate security. But it still leaks information they may not want financial analysts to know about their performance.
Sam Walton used to fly investors in his plane over Walmart stores and ask them to count the cars in the parking lot, then he would fly them over competitors stores and ask the same. Just a fun fact about how this is a very real scenario!
Deploying UUIDv7 certainly requires more thought about the implications. In many cases leaking the creation time of a key is completely fine, in some cases it isn't
An interesting compromise is transforming the UUIDv7 to a UUIDv4 at the API boundary, like e.g. UUIDv47 [1]. On the other hand if you are doing that you can also go with u64 primary keys and transform those
1: https://github.com/stateless-me/uuidv47
I wonder if the issue is with exposing internal IDs to end users. I'm sure the experts here have already thought of this, but could someone explain why using encryption or even an HMAC for external views of a primary key doesn't make sense? Maybe because the extra processing is more expensive than just using UUIDv4? Using a KDF such as argon2id on the random bits of a UUIDv7 seems like it might work well for external IDs.
(And why the heck are different types or variants of UUIDs called "versions"?)
It does make all kinds of sense and is a majorly underutilized tool.
Because now, for the rest of eternity, every single person who writes any code that moves data from this table to somewhere else, for any purpose, has to remember that the primary key gives away the creation time of something, which can potentially be linked to something else. A lot of people won't notice that, and a lot of people who do notice it will get the remediation wrong. And you can now forget using a simple view on the database to give any information to any person or program that shouldn't get the creation times.
You've embrittled your system.
the question was why not use encryption (sqids/hashids/etc) to secure publicly exposed surrogate keys, I don't think this reply is on point .. surrogate keys ideally are never exposed (for a slew of reasons beyond just leaking information) so securing them is a perfectly reasonable thing to do (as seen everywhere on the internet). otoh using any form of uuid as surrogate key is an awful thing to do to your db engine (making its job significantly harder for no benefit)
> You've embrittled your system.
this is the main argument for keeping surrogate keys internal - they really should be thought of like pointers, dangling pointers outside of your control are brittle. ideally anything exposed to the wild that points back to a surrogate key decodes with extra information you can use to invalidate it (like a safe-pointer!)
> but could someone explain why using encryption or even an HMAC for external views of a primary key doesn't make sense?
it does make sense and it's what you should do instead of using a UUID as PK for this purpose.
If knowing IDs has a negative impact on security, then application system design is probably a trash.
The actual concern is privacy.
Privacy wise,
- Knowing sequential IDs leaks the rate of creation and amount of said entity which can translate in number of customers or rate of sales.
- Knowing timed IDs leaks activity patterns. This gets worse as you cross reference data.
- Random IDs reveal nothing.
---
Security wise,
- Sequential IDs can be guessed.
Performance wise,
- Sequential IDs may result in self-inflicted hotspots.
- Random IDs lends themselves to sharding, but make indexing, column-compression, and maintaining order after inserts hard.Why leak your primary keys? They are for the DBMS, not your end users.
- Knowing sequential IDs leaks the rate of creation and amount of said entity which can translate in number of customers or rate of sales.
This implies the existence of an endpoint that returns a list of items, which could by itself be used to determine customers or rate of sales. This also means you have a broken security model that leaks a list of customers or list of sales, that you should probably not have access to begin with.
- Knowing timed IDs leaks activity patterns. This gets worse as you cross reference data.
Again if you can list items freely you can do this anyway, capture what exists now and do diffs to determine update times and creation times.
Yeah I am trying to imagine a universe where having the creation time of an item breaks your security model and every path I go down is that the system has terrible security.
I know that the person I'm stalking created a pseudonymous account on service X around time Y. Based on other information, I have a limited number of suspect accounts. The creation time leaks to me, either via a bug which would otherwise have been harmless, or because somebody writing code "can't imagine a universe where having the creation time of an item breaks your security". I use the creation time to figure out which of my candidates is actually the target.
It took me under 15 seconds to come up with that.
It took you 15 seconds because its a terrible example, _around time Y_ is doing insane lifting of this concept. Then "based on other information" okay so some other information is enabling this.
It turns out that in reality, I usually know both "around time Y" and "other information". You're going to narrow me down from 10 accounts to 1, or from 100 to 10.
Currently evaluating UUIDv7 as primary key for some inventory origin. I think it should be ok to use it for such use case since it will indicate the time of creation? Any thoughts?
You have to ask what problems exactly are you solving? Unless there is a compelling reason to use them, sticking with auto increment IDs is much simpler.
And I say this as someone who recently has to convert some tables from auto increment IDs to uuid. In that instance, they were sharded tables that were relying on the IDs to be globally unique, and made heavy use of the IDs to scan records in time order. So uuids solved both those problems.
This is why the UUID versions should have been labeled by letter rather than number. Each UUID version doesn't replace the last. They do different things. The numbered versioning gives the impression that "higher numbers = better" and that's neither the case nor the intention.
Interesting comment from a previous thread on UUIDv7 in Postgres: https://news.ycombinator.com/item?id=39262286
One of the main points of using a UUID as a record identifier was to make it so people couldn't guess adjacent records by incrementing or decrementing the ID. If you add sequential ordering, won't that defeat the purpose?
Seems like it would be wise to add caveats around using this form in external facing applications or APIs.
There's still 62 bits of random data, even if you know the EXACT milisecond the row was created (you likely won't), you still need to do 1 billion guesses per second for 73 years.
Ideally you have some sort of rate limit on your APIs...
UUIDv7 has a 48 bit timestamp, 12 bits that either provide sub-millisecond precision or are random (in pg they provide precision) and another 62 bits that are chosen at random.
The A UUIDv7 leaks to the outside when it was created, but guessing the next timestamp value is still completely unfeasible. 62 bits is plenty of security if each attempt requires an API request
... and the next person working on the system thinks "well, this thing is unpredictable, so it's OK if I leak an unsalted hash of it". If they think at all, which is far from certain.
Why does everybody want to find excuses to leave footguns around?
The next ID can't be found just by adding 1, can it? How would you guess the next value?
Does anyone know if there are any sorts of optimizations (either internally or available to the user) for a table with a UIIDv7 PK and a `date_created` column?
various engines have what's called "fast key" optimization specifically for integer sequences - if you're testing performance between an int/serial pk and a uuid the impact is profound to digusting depending on the engine.