It made me happy to see the pg_get_acl() function that I was involved in adding, is appreciated by users. I think there is still much improvement in the space of querying privileges. I think most users would probably struggle to come up with the query from the article:
postgres=# SELECT
(pg_identify_object(s.classid,s.objid,s.objsubid)).*,
pg_catalog.pg_get_acl(s.classid,s.objid,s.objsubid) AS acl
FROM pg_catalog.pg_shdepend AS s
JOIN pg_catalog.pg_database AS d
ON d.datname = current_database() AND
d.oid = s.dbid
JOIN pg_catalog.pg_authid AS a
ON a.oid = s.refobjid AND
s.refclassid = 'pg_authid'::regclass
WHERE s.deptype = 'a';
-[ RECORD 1 ]-----------------------------------------
type | table
schema | public
name | testtab
identity | public.testtab
acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
What I wanted to really add, was two new system views, pg_ownerships and pg_privileges [1]. The pg_get_acl() was a dependency that we needed to get in place first. In the end, I withdrew the patch trying to add these views. If there is enough interest from users, I might consider picking up the task of trying to work out the remaining obstacles.
Do people here need pg_ownerships and/or pg_privileges?
Absolutely. A lot of data security risk is gauged by who has access to what, and the sad fact is that many teams don’t use row or column level security for ergonomic reasons. Features like this would do a lot to make these features easier to reason about, understand, and verify.
During the past 15 years I have been using Postgres quite a bit. I have also almost exclusively have been accessing databases using an ORM. Love them or hate them, they do provide value for a lot of common cases and speed up development and debugability, or at least they did for me and I know they remain very popular.
This all meant that as databases like Postgres keep adding cool new features they mostly go unused because an ORM just doesn’t let you pierce that layer of abstraction except dropping to pure SQL which is typically seen as a code smell and an annoyance to everyone involved.
So on the one hand I love that Postgres is getting amazing new features, not to mention all the cool extensions. On the other hand I and I suspect many others are essentially locked out of them and since most ORMa try to serve multiple databases they typically only include the most common denominator features. As I get more experienced I both see why RDBMS is the right choice most times AND see the appeal of an object store instead of a row store.
It depends on the ORM, I know SQLAlchemy and Django ORM have some Postgres specific features (e.g. full text search/indexing). Some also let you extend the ORM to add your own features, or at least write raw SQL.
Yeah I mostly use Django ORM and TortoiseORM these days and have used SQLAlchemy in the past. FTS is nice but things like locks, pubsub, stored procedures, triggers, views/materialized views, and many others are not supported out of the box and for some of these I don’t even see where they could be hooked up. And this is before you get into things like replication and multi-server clusters.
One obvious thing I still can’t believe pg doesn’t have is the ability to define triggers at the database or schema level. I must have written code to mass generate DROP/CREATE TRIGGER probably 5 times (yes I know you can reuse the trigger procedure itself). And then you need to remember to re-run whenever tables are added/removed.
I've worked in an environment where 3 applications accessed the same database, one of the applications wasn't really maintained.
Having computed (stored or virtual) columns would've been awesome.
The use case isn't really "multiply a value by 12", but more like "we have a single boolean is_active column, and want to migrate to a more extensive status model" or "migrate from an is_active column to a (begin, end) timestamp tuple" or so.
With a virtual column, you can present a read-only, compatible column to the legacy application, while the other applications can use the more detailed, new columns, without having to keep the legacy column in sync.
I think it /can/ also be to 'multiply a number by 12'. For example lets say I'm a supermarket, who gets their order data out of a 20 year old IT system and as a result I've got a orders table which has a id, user_id and data column with an array of 'sku, price and qty'. If I regularly want to query/sort/filter based on total order value the easiest and most performant solution absolutely is to use a computed column.
If you need to retrieve the values, the fastest would probably be a stored generated column, not a virtual computed on the fly.
In case you only want to filter without returning values, you could also index directly on the expression without needing to add a stored generated column with an index on it
Much faster to do it in the database than in the client, especially in a normal situation where the postgres server is on a different machine that's possibly not even in the same building as the app server or where it allows you to use existing indexes.
I agree that they're often a symptom of bad schema design or data normalisation though. Sometimes that can't be helped however.
Since you can index expressions I wonder if that's because you essentially emhave to store the value in the index anyway and that wouldn't be expected for a virtual column?
I've used them for common transforms such as timezones (e g. data comes in as uutc and we query everything in Chicago time) just to make querying easier.
It made me happy to see the pg_get_acl() function that I was involved in adding, is appreciated by users. I think there is still much improvement in the space of querying privileges. I think most users would probably struggle to come up with the query from the article:
What I wanted to really add, was two new system views, pg_ownerships and pg_privileges [1]. The pg_get_acl() was a dependency that we needed to get in place first. In the end, I withdrew the patch trying to add these views. If there is enough interest from users, I might consider picking up the task of trying to work out the remaining obstacles.Do people here need pg_ownerships and/or pg_privileges?
[1] https://www.postgresql.org/message-id/flat/bbe7d1cb-0435-4ee...
Absolutely. A lot of data security risk is gauged by who has access to what, and the sad fact is that many teams don’t use row or column level security for ergonomic reasons. Features like this would do a lot to make these features easier to reason about, understand, and verify.
Yes. I was looking for something like this. And I've had people ask me about this before too
pg_ownerships and pg_privileges would be incredibly useful.
Yes!
During the past 15 years I have been using Postgres quite a bit. I have also almost exclusively have been accessing databases using an ORM. Love them or hate them, they do provide value for a lot of common cases and speed up development and debugability, or at least they did for me and I know they remain very popular.
This all meant that as databases like Postgres keep adding cool new features they mostly go unused because an ORM just doesn’t let you pierce that layer of abstraction except dropping to pure SQL which is typically seen as a code smell and an annoyance to everyone involved.
So on the one hand I love that Postgres is getting amazing new features, not to mention all the cool extensions. On the other hand I and I suspect many others are essentially locked out of them and since most ORMa try to serve multiple databases they typically only include the most common denominator features. As I get more experienced I both see why RDBMS is the right choice most times AND see the appeal of an object store instead of a row store.
It depends on the ORM, I know SQLAlchemy and Django ORM have some Postgres specific features (e.g. full text search/indexing). Some also let you extend the ORM to add your own features, or at least write raw SQL.
Yeah I mostly use Django ORM and TortoiseORM these days and have used SQLAlchemy in the past. FTS is nice but things like locks, pubsub, stored procedures, triggers, views/materialized views, and many others are not supported out of the box and for some of these I don’t even see where they could be hooked up. And this is before you get into things like replication and multi-server clusters.
One obvious thing I still can’t believe pg doesn’t have is the ability to define triggers at the database or schema level. I must have written code to mass generate DROP/CREATE TRIGGER probably 5 times (yes I know you can reuse the trigger procedure itself). And then you need to remember to re-run whenever tables are added/removed.
I always find it hard to think of a good reason for a (computed) virtual column
Why would you ever force your db to multiply a value by 12 to another column, or parse a json path, if it’s not for filtering?
Move that effort to your clients so you’re not needlessly consuming db resources.
I've worked in an environment where 3 applications accessed the same database, one of the applications wasn't really maintained.
Having computed (stored or virtual) columns would've been awesome.
The use case isn't really "multiply a value by 12", but more like "we have a single boolean is_active column, and want to migrate to a more extensive status model" or "migrate from an is_active column to a (begin, end) timestamp tuple" or so.
With a virtual column, you can present a read-only, compatible column to the legacy application, while the other applications can use the more detailed, new columns, without having to keep the legacy column in sync.
I think it /can/ also be to 'multiply a number by 12'. For example lets say I'm a supermarket, who gets their order data out of a 20 year old IT system and as a result I've got a orders table which has a id, user_id and data column with an array of 'sku, price and qty'. If I regularly want to query/sort/filter based on total order value the easiest and most performant solution absolutely is to use a computed column.
If you need to retrieve the values, the fastest would probably be a stored generated column, not a virtual computed on the fly.
In case you only want to filter without returning values, you could also index directly on the expression without needing to add a stored generated column with an index on it
Similar experience here. I have used it to make several legacy migrations much smoother.
Much faster to do it in the database than in the client, especially in a normal situation where the postgres server is on a different machine that's possibly not even in the same building as the app server or where it allows you to use existing indexes.
I agree that they're often a symptom of bad schema design or data normalisation though. Sometimes that can't be helped however.
Think about schema migrations, and the need to be compatible with the new and old versions of the schema.
Why should each one of your clients reimplement the calculated value, each in a slightly different way?
I'm not a fan of stored procedures but this is lightweight enough that I like how it simplifies by removing responsibility from the code.
I imagine the computed column could be indexed or materialized if needed.
> I imagine the computed column could be indexed or materialized if needed.
The article mentions that "you cannot create indexes on VIRTUAL generated columns".
Oh thanks I missed the distinction between a virtual and regular generated column.
Since you can index expressions I wonder if that's because you essentially emhave to store the value in the index anyway and that wouldn't be expected for a virtual column?
I've used them for common transforms such as timezones (e g. data comes in as uutc and we query everything in Chicago time) just to make querying easier.
For me: computed columns in change stream records. Super nice.
We use them to pull out some fields out of JSON blobs, and until PG18 they were not available for logical replication log consumers.
Some more discussion: https://news.ycombinator.com/item?id=45372283
Thanks! Macro-expanded:
PostgreSQL 18 Released https://news.ycombinator.com/item?id=45372283 - 3 days ago, 21 comments