Astro/Solid - Hacker News

$jph 3 hours ago

We evaluated UUIDv7 and determined that it's unwise to use it as a primary key.

We have applications where we control the creation of the primary key, and where the primary key will be exposed to end users, such as when using a typical web app framework built with Rails, Phoenix, Loco, Laravel, etc. For these applications, UUIDv7 time is too problematic for security, so we prefer binary-stored UUIDv4 even though it's less efficient.

We also have applications where we control the creation of the primary key, and where we can ensure the primary key is never shown to users. For these applications, UUIDv7 is slower at inserts and joins, so we prefer BIGSERIAL for primary key, and binary-stored UUIDv4 for showing to users such as in URLs.
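
For anyone who wants to see the second layout concretely, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Postgres, so `INTEGER PRIMARY KEY` plays the role of BIGSERIAL. The table, column, and function names are made up for illustration and are not from the comment above.

```python
import sqlite3
import uuid

# Hypothetical schema: an internal auto-incrementing key for joins, plus a
# separate binary-stored UUIDv4 that is the only identifier shown to users.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        id        INTEGER PRIMARY KEY,   -- internal only
        public_id BLOB NOT NULL UNIQUE,  -- 16 raw bytes of a UUIDv4
        name      TEXT NOT NULL
    )
    """
)

def create_account(name: str) -> uuid.UUID:
    """Insert a row and return the identifier that is safe to expose."""
    public_id = uuid.uuid4()
    conn.execute(
        "INSERT INTO accounts (public_id, name) VALUES (?, ?)",
        (public_id.bytes, name),
    )
    return public_id

def find_account(public_id: uuid.UUID):
    """Resolve a public identifier; the internal id stays server-side."""
    return conn.execute(
        "SELECT id, name FROM accounts WHERE public_id = ?",
        (public_id.bytes,),
    ).fetchone()

alice = create_account("alice")
print(f"/accounts/{alice}")  # the URL carries only the random UUIDv4
```

The extra unique index on `public_id` is the cost of this design: inserts still pay for one random-order index, but joins and foreign keys use the small sequential key.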

$nighthawk454 2 minutes ago

Recently someone shared a method for encrypting the timestamp portion as well:

https://news.ycombinator.com/item?id=45275973

$laughing_snyder 2 hours ago

Why would exposing any primary key be bad for security? If your system's security *in any way* depends on the randomness of a database primary key, you have other problems. It's not the job of a primary key to add to security. Not to mention that UUIDv7 has 6 random bytes, which, for the vast majority of web applications, even finance, is more than enough randomness. Just imagine how many requests an attacker would need to make to guess even one UUID (281 trillion possible combinations for 6 random bytes, and he would also need to guess the Unix timestamp in ms correctly). The only scenario I can think of is that you use the primary key as a sort of API key.

$8organicbits 30 minutes ago

> system's security in any way depends on the randomness of a database primary key

Unlisted URLs, like YouTube videos, are a popular example used by a reputable tech company.

> UUIDv7 has 6 random bytes

Careful. The spec allows 74 bits to be filled randomly. However, you are allowed to exchange up to 12 bits for a more accurate timestamp and a counter of up to 42 bits. If you can get a fix on the timestamp and counter, the random portion only provides 20 bits (1M possibilities).

Python 3.14rc introduces a UUIDv7 implementation that has only 32 random bits, for example.

Basically, you need to see what your implementation does.
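
The bit budget above can be written out as a small calculation. This is only a sketch of the arithmetic in this comment (74 free bits, up to 12 traded for timestamp precision, up to 42 for a counter); the function name and parameters are hypothetical.

```python
# A UUID is 128 bits. UUIDv7 fixes 48 (millisecond timestamp), 4 (version)
# and 2 (variant), which leaves 74 bits that may be random.
FREE_BITS = 128 - 48 - 4 - 2

def random_bits(extra_timestamp_bits: int = 0, counter_bits: int = 0) -> int:
    """Bits that remain random after an implementation spends some of the
    free bits on sub-millisecond precision and/or a monotonic counter.
    The upper limits are the ones quoted in the comment above."""
    if not 0 <= extra_timestamp_bits <= 12:
        raise ValueError("extra timestamp precision is limited to 12 bits")
    if not 0 <= counter_bits <= 42:
        raise ValueError("counter is limited to 42 bits")
    return FREE_BITS - extra_timestamp_bits - counter_bits

print(random_bits())             # all free bits random
print(random_bits(12, 42))       # worst case described above
print(2 ** random_bits(12, 42))  # number of possibilities in that case
```

A 42-bit counter with no extra precision leaves 32 bits, which would be consistent with the Python figure quoted above.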

$bearjaws 25 minutes ago

Only 32 bits, so 4 billion guesses per microsecond... Even if YouTube has 1 million videos per microsecond, you would never guess them before rate limits.

$sz4kerto an hour ago

Example: if user IDs are not random but e.g. BIGSERIAL (auto-incremented) and they're exposed through some API, then API clients can infer the creation time of said users in the system. Now if my system is storing e.g. health data for a large population, then it'll be easy to guess the age of the user. Etc. This is not a security problem, this is an information governance problem. But it's a problem. Now if you say that I should not expose these IDs - fine, but then whatever I expose is essentially an ID anyway.

$btown an hour ago

One of the big things here is de-anonymization and account correlation. Say you have an application where users'/products' affiliation with certain B2B accounts is considered sensitive; perhaps they need to interact with each other anonymously for bid fairness, perhaps people might be scraping for "how many users does account X have onboarded" as metadata for a financial edge.

If users/products are onboarded in bulk or during B2B account signup, then leaking the creation time of each of them, with any search that returns their UUIDs, becomes metadata that can be used to correlate users with each other, if imperfectly.

Often, the benefits of a UUID with natural ordering outweigh this. But it's something to weigh before deciding to switch to UUIDv7.

$Hizonner an hour ago

Because anything that knows the primary key now knows the timestamp. The UUID itself leaks information. It's not that it's not adding security. It's that it's actually subtracting security.
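
To make the leak concrete: recovering the creation time needs nothing beyond the ID itself. A minimal sketch, assuming the standard UUIDv7 layout where the top 48 bits are Unix time in milliseconds; the function name and the sample value are illustrative.

```python
import uuid
from datetime import datetime, timezone

def uuid7_created_at(value: uuid.UUID) -> datetime:
    """Read the creation time embedded in a UUIDv7."""
    if value.version != 7:
        raise ValueError("not a UUIDv7")
    millis = value.int >> 80  # drop the low 80 bits, keep the 48-bit timestamp
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

# Sample value whose top 48 bits encode 1645557742000 ms since the epoch.
sample = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(uuid7_created_at(sample).isoformat())
```

Implementations that spend the next 12 bits on extra precision leak a finer timestamp than this sketch reads out.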

$lucideer an hour ago

> leaks information

It would have to leak sensitive information to be "subtracting security", which implies you're relying on timestamp secrecy to ensure your security. This would be one of the "other problems" the gp mentioned.

$atomicnumber3 an hour ago

Pretty much any information can be used for something. You're ignoring everything they say about how something not critical to application security may still not be desirable to be leaked for other reasons. Example: Target and Walmart may not depend on satellites being unable to image their parking lots from the perspective of loss prevention or corporate security. But it still leaks information they may not want financial analysts to know about their performance.

$limagnolia 43 minutes ago

Sam Walton used to fly investors in his plane over Walmart stores and ask them to count the cars in the parking lot, then he would fly them over competitors stores and ask the same. Just a fun fact about how this is a very real scenario!

$wongarsu 2 hours ago

Deploying UUIDv7 certainly requires more thought about the implications. In many cases leaking the creation time of a key is completely fine; in some cases it isn't.

An interesting compromise is transforming the UUIDv7 to a UUIDv4 at the API boundary, as e.g. UUIDv47 [1] does. On the other hand, if you are doing that you can also go with u64 primary keys and transform those.

1: https://github.com/stateless-me/uuidv47
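
A rough sketch of what such a boundary transform could look like. This is not the linked project's code: it swaps in HMAC-SHA256 from the standard library as the keyed function, and the key and function names are hypothetical. The idea is to XOR the 48-bit timestamp with a keyed mask derived from the bits that stay unchanged, and flip the version nibble between 7 and 4.

```python
import hashlib
import hmac
import uuid

LOW_76 = (1 << 76) - 1  # rand_a (12) + variant (2) + rand_b (62) bits

def _mask(key: bytes, value_int: int) -> int:
    # Derive a 48-bit mask from the bits that are identical in both forms,
    # so the same mask can be recomputed when reversing the transform.
    stable = (value_int & LOW_76).to_bytes(10, "big")
    digest = hmac.new(key, stable, hashlib.sha256).digest()
    return int.from_bytes(digest[:6], "big")

def _swap(key: bytes, value: uuid.UUID, new_version: int) -> uuid.UUID:
    masked_ts = (value.int >> 80) ^ _mask(key, value.int)
    new_int = (masked_ts << 80) | (new_version << 76) | (value.int & LOW_76)
    return uuid.UUID(int=new_int)

def to_public(key: bytes, internal: uuid.UUID) -> uuid.UUID:
    """UUIDv7 stored in the database -> v4-looking ID for the API."""
    if internal.version != 7:
        raise ValueError("expected a UUIDv7")
    return _swap(key, internal, 4)

def to_internal(key: bytes, public: uuid.UUID) -> uuid.UUID:
    """Reverse of to_public, applied to IDs arriving from clients."""
    return _swap(key, public, 7)

KEY = b"demo-key-not-for-production"  # assumption: a server-side secret
internal = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
public = to_public(KEY, internal)
print(public, to_internal(KEY, public) == internal)
```

Because XOR is its own inverse, one helper serves both directions; losing or rotating the key breaks every ID already handed out.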

$wpollock 2 hours ago

I wonder if the issue is with exposing internal IDs to end users. I'm sure the experts here have already thought of this, but could someone explain why using encryption or even an HMAC for external views of a primary key doesn't make sense? Maybe because the extra processing is more expensive than just using UUIDv4? Using a KDF such as argon2id on the random bits of a UUIDv7 seems like it might work well for external IDs.

(And why the heck are different types or variants of UUIDs called "versions"?)

$bri3d 5 minutes ago

It does make all kinds of sense and is a majorly underutilized tool.

$Hizonner an hour ago

Because now, for the rest of eternity, every single person who writes any code that moves data from this table to somewhere else, for any purpose, has to remember that the primary key gives away the creation time of something, which can potentially be linked to something else. A lot of people won't notice that, and a lot of people who do notice it will get the remediation wrong. And you can now forget using a simple view on the database to give any information to any person or program that shouldn't get the creation times.

You've embrittled your system.

$gfody 11 minutes ago

the question was why not use encryption (sqids/hashids/etc) to secure publicly exposed surrogate keys, I don't think this reply is on point .. surrogate keys ideally are never exposed (for a slew of reasons beyond just leaking information) so securing them is a perfectly reasonable thing to do (as seen everywhere on the internet). otoh using any form of uuid as surrogate key is an awful thing to do to your db engine (making its job significantly harder for no benefit)

> You've embrittled your system.

this is the main argument for keeping surrogate keys internal - they really should be thought of like pointers, dangling pointers outside of your control are brittle. ideally anything exposed to the wild that points back to a surrogate key decodes with extra information you can use to invalidate it (like a safe-pointer!)

$gfody an hour ago

> but could someone explain why using encryption or even an HMAC for external views of a primary key doesn't make sense?

it does make sense and it's what you should do instead of using a UUID as PK for this purpose.
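
For illustration, one way to get a reversible, keyed external form of a 64-bit surrogate key with only the standard library is a small Feistel network with an HMAC round function. This is a sketch, not a vetted cipher and not how sqids/hashids work; the key, round count, and function names are assumptions.

```python
import hashlib
import hmac

HALF_MASK = 0xFFFFFFFF  # each half of the 64-bit ID is 32 bits

def _round(key: bytes, round_no: int, half: int) -> int:
    # Keyed round function: HMAC over the round number and one 32-bit half.
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")

def encode_id(key: bytes, internal_id: int, rounds: int = 4) -> int:
    """Map an internal 64-bit ID to an external one (a keyed permutation)."""
    if not 0 <= internal_id < 1 << 64:
        raise ValueError("id must fit in 64 bits")
    left, right = internal_id >> 32, internal_id & HALF_MASK
    for r in range(rounds):
        left, right = right, left ^ _round(key, r, right)
    return (left << 32) | right

def decode_id(key: bytes, external_id: int, rounds: int = 4) -> int:
    """Invert encode_id by running the rounds backwards."""
    left, right = external_id >> 32, external_id & HALF_MASK
    for r in reversed(range(rounds)):
        left, right = right ^ _round(key, r, left), left
    return (left << 32) | right

KEY = b"demo-key-not-for-production"  # assumption: a server-side secret
for n in (1, 2, 3):
    print(n, hex(encode_id(KEY, n)))
```

A Feistel structure is invertible whatever the round function is, so distinct internal IDs always map to distinct external ones. A plain HMAC of the ID would also hide it, but it is one-way, so it needs a stored lookup column.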

$bricss 2 hours ago

If knowing IDs has a negative impact on security, then the application's system design is probably trash.

$dietr1ch an hour ago

The actual concern is privacy.

Privacy wise,

- Knowing sequential IDs leaks the rate of creation and the amount of said entity, which can translate into number of customers or rate of sales.

- Knowing timed IDs leaks activity patterns. This gets worse as you cross reference data.

- Random IDs reveal nothing.

---

Security wise,

- Sequential IDs can be guessed.

Performance wise,

- Sequential IDs may result in self-inflicted hotspots.

   - Spanner doesn't like writing rows keyed first by timestamps, https://cloud.google.com/spanner/docs/schema-design#primary-key-prevent-hotspots.

- Random IDs lend themselves to sharding, but make indexing, column compression, and maintaining order after inserts hard.

$jrockway 2 minutes ago

Why leak your primary keys? They are for the DBMS, not your end users.

$bearjaws 16 minutes ago

> - Knowing sequential IDs leaks the rate of creation and the amount of said entity, which can translate into number of customers or rate of sales.

This implies the existence of an endpoint that returns a list of items, which could by itself be used to determine customers or rate of sales. This also means you have a broken security model that leaks a list of customers or list of sales, that you should probably not have access to begin with.

> - Knowing timed IDs leaks activity patterns. This gets worse as you cross reference data.

Again if you can list items freely you can do this anyway, capture what exists now and do diffs to determine update times and creation times.

$bearjaws 2 hours ago

Yeah I am trying to imagine a universe where having the creation time of an item breaks your security model and every path I go down is that the system has terrible security.

$Hizonner an hour ago

I know that the person I'm stalking created a pseudonymous account on service X around time Y. Based on other information, I have a limited number of suspect accounts. The creation time leaks to me, either via a bug which would otherwise have been harmless, or because somebody writing code "can't imagine a universe where having the creation time of an item breaks your security". I use the creation time to figure out which of my candidates is actually the target.

It took me under 15 seconds to come up with that.

$bearjaws 28 minutes ago

It took you 15 seconds because it's a terrible example. _around time Y_ is doing insane lifting of this concept. Then "based on other information": okay, so some other information is enabling this.

$Hizonner 14 minutes ago

It turns out that in reality, I usually know both "around time Y" and "other information". You're going to narrow me down from 10 accounts to 1, or from 100 to 10.
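
The narrowing step described here is mechanically trivial once the IDs are UUIDv7, which is the point being argued. A toy sketch with fabricated candidate IDs; the helper names, dates, and time window are all made up for illustration.

```python
import os
import uuid
from datetime import datetime, timedelta, timezone

def fake_uuid7(created: datetime) -> uuid.UUID:
    # Fixture only: timestamp in the top 48 bits, version 7, RFC variant,
    # and 56 random bits standing in for the random tail.
    millis = int(created.timestamp()) * 1000
    tail = int.from_bytes(os.urandom(7), "big")
    return uuid.UUID(int=(millis << 80) | (7 << 76) | (0b10 << 62) | tail)

def created_at(value: uuid.UUID) -> datetime:
    return datetime.fromtimestamp((value.int >> 80) / 1000, tz=timezone.utc)

def narrow(candidates, around: datetime, slack: timedelta):
    """Keep only the candidate IDs created within `slack` of `around`."""
    return [c for c in candidates if abs(created_at(c) - around) <= slack]

# Ten suspect accounts created on ten consecutive days.
base = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
candidates = [fake_uuid7(base + timedelta(days=d)) for d in range(10)]

# "Around time Y": the attacker knows the day, give or take twelve hours.
suspects = narrow(candidates, base + timedelta(days=3), timedelta(hours=12))
print(len(candidates), "->", len(suspects))
```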

$kasperset 2 hours ago

Currently evaluating UUIDv7 as primary key for some inventory origin. I think it should be OK to use it for such a use case, since it will indicate the time of creation? Any thoughts?

$gtowey 2 hours ago

You have to ask what problems exactly are you solving? Unless there is a compelling reason to use them, sticking with auto increment IDs is much simpler.

And I say this as someone who recently had to convert some tables from auto-increment IDs to UUIDs. In that instance, they were sharded tables that were relying on the IDs to be globally unique, and made heavy use of the IDs to scan records in time order. So UUIDs solved both those problems.

$moron4hire 2 hours ago

This is why the UUID versions should have been labeled by letter rather than number. Each UUID version doesn't replace the last. They do different things. The numbered versioning gives the impression that "higher numbers = better" and that's neither the case nor the intention.

$Recursing 3 hours ago

Interesting comment from a previous thread on UUIDv7 in Postgres: https://news.ycombinator.com/item?id=39262286

$raminf 2 hours ago

One of the main points of using a UUID as a record identifier was to make it so people couldn't guess adjacent records by incrementing or decrementing the ID. If you add sequential ordering, won't that defeat the purpose?

Seems like it would be wise to add caveats around using this form in external facing applications or APIs.

$bearjaws 2 hours ago

There are still 62 bits of random data. Even if you know the EXACT millisecond the row was created (you likely won't), you still need to do 1 billion guesses per second for 73 years.

Ideally you have some sort of rate limit on your APIs...
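
The 73-year figure checks out as the average case, i.e. searching half of the 62-bit space. A quick sketch of the arithmetic; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def expected_years(random_bits: int, guesses_per_second: float) -> float:
    """Average time to hit one specific value by blind guessing.
    On average the target is found after trying half of the space."""
    expected_guesses = 2 ** random_bits / 2
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR

print(f"{expected_years(62, 1e9):.1f} years on average")
print(f"{2 * expected_years(62, 1e9):.1f} years to exhaust the space")
```

The same function shows how quickly the margin shrinks when an implementation keeps fewer random bits, which is why the per-implementation layout matters.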

$wongarsu 2 hours ago

UUIDv7 has a 48 bit timestamp, 12 bits that either provide sub-millisecond precision or are random (in pg they provide precision) and another 62 bits that are chosen at random.

A UUIDv7 leaks to the outside when it was created, but guessing the next ID is still completely infeasible. 62 bits is plenty of security if each attempt requires an API request.
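
The layout described above can be assembled by hand to see why adding 1 does not give you the next ID. A minimal sketch that fills the 12-bit field with random bits (per the comment, Postgres uses it for precision instead); the function name is hypothetical and this is not any library's implementation.

```python
import os
import time
import uuid

def uuid7_sketch(millis=None) -> uuid.UUID:
    """Assemble a UUIDv7-shaped value: 48-bit timestamp, 4-bit version,
    12 bits (random here), 2-bit variant, 62 random bits."""
    if millis is None:
        millis = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 bits
    value = (
        ((millis & ((1 << 48) - 1)) << 80)
        | (7 << 76)       # version
        | (rand_a << 64)
        | (0b10 << 62)    # RFC variant
        | rand_b
    )
    return uuid.UUID(int=value)

# Two IDs minted in the same millisecond share a prefix and nothing else.
a = uuid7_sketch(1645557742000)
b = uuid7_sketch(1645557742000)
print(a)
print(b)
```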

$Hizonner 44 minutes ago

... and the next person working on the system thinks "well, this thing is unpredictable, so it's OK if I leak an unsalted hash of it". If they think at all, which is far from certain.

Why does everybody want to find excuses to leave footguns around?

$Demiurge 2 hours ago

The next ID can't be found just by adding 1, can it? How would you guess the next value?

$gm678 2 hours ago

Does anyone know if there are any sorts of optimizations (either internally or available to the user) for a table with a UUIDv7 PK and a `date_created` column?

$gfody an hour ago

various engines have what's called "fast key" optimization specifically for integer sequences - if you're testing performance between an int/serial pk and a uuid the impact is profound to disgusting depending on the engine.

UUIDv7 in Postgres 18. With time extraction