In order to uniquely reference one piece of code from another it needs to have a unique name/namespace/reference. Whatever organising principle you use for that will tend to become the hierarchy that your code is stored by.
This doesn't stop you from also accessing it in other ways. And with modern IDEs you can search across a fairly chunky codebase near-instantly, which would allow you to treat it as if it's in a database.
You are talking about taxonomy, and specifically multifaceted classification. In practice a module system with unique names is sufficient, like npm for example.
However, it's worth noting that all of the systems that rely on databases to store code (SharePoint, SAP, Power Platform) suck haaaaaaard, mainly due to issues with versioning and configuration management.
Isn't the filesystem also a KV database? Depending on the filesystem, some even have versioning and deduplication. Although I agree a language focused on being stored in a SQL database would offer new capabilities for versioning not available in tools like git or svn/hg. With that, new challenges will also arise that will need to be explored.
Slang code at Goldman Sachs is all stored in a database, which is very useful because if someone is having a problem with part of the infra that you're responsible for, you can access their scripts and dependent libraries (assuming they decide to make them public).
IBM Visual Age for Java was the first product I used that did this. The problem was that all utilities and other processes required access to the source code, which had to first be exported from the database then re-imported.
The code is meant for humans, and in my view, the database is not really for humans.
There are plenty of tools in IDEs that transform code into a 'database,' and in 20 years I've never really needed to use them on a daily basis.
Except for getting familiar with legacy code, and even then to create a mental map, the database seems like overkill to me.
That said, I should note that I've primarily worked on startup or business codebases, not FAANG or equivalent ones.
Edit: Reformulating because I was voted down: why change the storage format if the IDE already manages it? And I'd add: to store code in a database, you have to think about the granularity of your data, and it rapidly becomes the line, if not the character. Working daily with code stored in a database (Salesforce), where the granularity is the class, is really a nightmare from a content-versioning point of view.
1. The biggest advantage of storing on the filesystem is that files are the core primitive on UNIX-based operating systems, and are core enough on Windows. Giving that up would require tremendously good reasons.
2. Everyone organizes projects into folders differently, but in most languages the only reason why you organize things into folders is to make it easier for humans to find things. The computer doesn't care where the files are stored. So you're proposing to give up a feature that exists solely to make it easier for humans to find things; it's extremely difficult to envision this resulting in a more ergonomic world.
3. The hierarchy is only one way of thinking about how you can browse a filesystem. There is nothing stopping editors from indexing files in different ways, allowing you to browse by, for example, files tagged with some comment at the top, or files which contain classes versus interfaces. In fact, more comprehensive IDEs like JetBrains already do this for some languages. You don't have to change the storage substrate to get 100% of the benefits you propose, with the extremely small cost of some indexing process when a project is first opened.
4. There are twenty billion programming languages out there, like four of them do what you're suggesting, yet no one uses any of them for anything meaningful. There is nothing, period, stopping any new language from doing something like this. Golang could have been designed like this; but it wasn't.
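To make point 3 concrete, here is a minimal sketch of an editor-style index built over ordinary files, keyed by tags read from a comment on the first line. The `# tags:` convention, the file names, and the helper names are all hypothetical, chosen only for illustration; a real IDE index works differently and does far more.

```python
from collections import defaultdict

# Hypothetical file contents, standing in for files read from disk.
FILES = {
    "widgets/cart.py": "# tags: frontend, checkout\nprint('cart')\n",
    "services/tax.py": "# tags: backend, checkout\nprint('tax')\n",
    "README.md": "No tags here\n",
}


def read_tags(text: str) -> set[str]:
    """Return the tags declared on the first line, or an empty set."""
    lines = text.splitlines()
    first = lines[0] if lines else ""
    prefix = "# tags:"  # assumed convention, not a standard
    if not first.startswith(prefix):
        return set()
    return {tag.strip() for tag in first[len(prefix):].split(",") if tag.strip()}


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each tag to the set of paths that declare it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in files.items():
        for tag in read_tags(text):
            index[tag].add(path)
    return index


index = build_index(FILES)
# Browse by one tag, or intersect two tags, without moving any file.
checkout_files = sorted(index["checkout"])
backend_checkout = index["backend"] & index["checkout"]
```

The files stay where they are; only the view changes, which is the trade the comment argues for.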
Currently working with Salesforce, where code is stored in the database... To put it simply, it's not a really good idea, because they never decided to go full DB or full files. So you have to manage files in git and deploy to the database. That means multiple sources of truth: the SF DB, your disk, your git, other devs' git, and the central repo.
Welcome to 1990 when you work as a team, i.e. ask your colleagues whether they are already working on a component before you change it...
> Golang could have been designed like this; but it wasn't.
Funny choice to use as an example. Go was designed by the same people who designed Plan 9. If no other language out there used files, Go would have still chosen to use files.
"ENVY/Manager augments this model by providing configuration management and version control facilities. All code is stored in a central database rather than in files associated with a particular image. Developers are continuously connected to this database; therefore changes are immediately visible to all developers."
https://www.google.com/books/edition/Mastering_ENVY_Develope...
~
pdf 1992 Product Review: Object Technology’s ENVY Developer
http://archive.esug.org/HistoricalDocuments/TheSmalltalkRepo...
You don't have to store things in a database to do this. Code is almost always read from disk into some kind of in-memory data structure that is amenable to such analysis, maybe even more so than a generic database. It doesn't matter if you use VS Code or vim; most developers have some kind of tool that does semantic analysis and affords navigation and organization of code.
It's just that the main way code editors present navigation follows the path hierarchy, partly because it's often intimately tied to how programming languages shape modules. Most editors have at least some alternative navigation, however, and most people use at least some of it: outlining by declaration symbols, search, changes, unit tests, open files, bookmarks, etc.
So in a way, this is already how it is done, except the 'database' part is tied to the code editor and its storage component is nicely decoupled (in the end, databases are usually also just a bunch of files).
I think any real improvements on this model can only come from a new programming language design, and as others have pointed out, this hasn't caught on in the past. The reason for this is probably not that file oriented modularity is the best thing there is, but rather the escape velocity needed to get out of the vast ecosystem of tooling around files, like the OS, git and existing code editors and whatnot.
Hmmm... Smalltalk, a pure object-oriented language, stores everything in an image, and has tons of different browsers to inspect its "object soup". Install a Squeak Smalltalk if you're curious :-)
UserLand Frontier was a wonderful scripting environment born on the Mac and ported to Windows. It was a mix of an object database, storing code and data, an extensible scripting language called UserTalk, and very powerful inter-application capabilities, based on Apple's Open Scripting Architecture. Dave Winer, its author, worked on the XML-RPC standard afterwards.
Smalltalk stores:
memory snapshot "image"
AND "change log" text file
AND "sources" text file.
https://cuis-smalltalk.github.io/TheCuisBook/Code-Management...
If the "sources" file is missing the byte code will be decompiled to show class and method definitions, but the original names will be unknown.
It is only an implementation detail. Storing sources directly in methods only requires changing a few methods. I tried that once.
It's all only "an implementation detail".
Some of them are documented and expected.
SAP did it - they store the code in the database https://www.reddit.com/r/SAP/comments/jsgb1c/where_and_how_i...
In the Unison language, code is stored in a database, with a hash code of its content as the key. Quoting https://www.unison-lang.org :
A new approach to Storing code. Other tools try to recover structure from text; Unison stores code in a database. This eliminates builds, provides for instant nonbreaking renames, type-based search, and lots more.
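A rough illustration of why hashing content makes renames non-breaking, as a toy sketch and not Unison's actual implementation: definitions are keyed by a hash of their parsed structure, and names live in a separate table that points at hashes. `CodeStore`, `content_hash`, and the example definition are hypothetical names invented here.

```python
import ast
import hashlib


def content_hash(source: str) -> str:
    """Hash the parsed structure, so formatting and comments do not change the key."""
    tree = ast.parse(source)
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


class CodeStore:
    """Toy store: definitions keyed by content hash, names kept separately."""

    def __init__(self) -> None:
        self.definitions: dict[str, str] = {}  # hash -> source text
        self.names: dict[str, str] = {}        # human-readable name -> hash

    def add(self, name: str, source: str) -> str:
        key = content_hash(source)
        self.definitions[key] = source
        self.names[name] = key
        return key

    def rename(self, old: str, new: str) -> None:
        # Only the name table changes; the definition and its hash are untouched,
        # so anything that refers to the hash keeps working.
        self.names[new] = self.names.pop(old)

    def lookup(self, name: str) -> str:
        return self.definitions[self.names[name]]


store = CodeStore()
key = store.add("increment", "lambda x: x + 1")
store.rename("increment", "succ")
```

Because references would point at `key` rather than at the name, the rename touches one table entry and nothing else.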
I don't think this is unique to code, but a limitation of filesystems in general. You could make the same argument for photos: I want them sorted by date, by tag, by person in the image, by location.
I can do this in Lightroom or my "Photo" app, but then you are always reliant on some third-party tool. It would be nice if there was some native way for files to not have to commit to a single hierarchy, but be able to switch views on the fly (without it being insanely slow for larger numbers of files).
We did this for a long time for our CMS - although we did simulate a filesystem structure. We also set up a git-like system to store versioning information and set up WebDav to mount it all and allow direct source code editing. It worked pretty well for years.
We eventually stopped because we were relying much more on external tools (e.g. npm, webpack) which had all sorts of issues over WebDAV mounts. Maintaining all this code management infrastructure in parallel wasn't worth it in the end, and we moved the code back to disk, switched to git, etc.
And Photoshop silently ignoring WebDAV I/O errors when saving designs didn't help either.
You already have tagging by type on the filesystem - the file extension. That allows you to limit file searches. Add extra metadata to extensions if the same extensions have different roles (.backend.ts, .frontend.ts, .html.template, .text.template)
These days I prefer to structure for easy removal of code: everything for e.g. a widget (frontend, backend, CSS) goes into a folder, and I only need to remove that folder when the widget is retired. Linting/validation will show me the few remaining path references I need to clean up.
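The compound-extension idea can be sketched in a few lines. This is an illustration only: the paths and the helper names `roles_and_type` and `find_by_role` are hypothetical, and the split relies on `pathlib`'s `suffixes`, which treats every dot-separated part after the stem as a suffix.

```python
from pathlib import PurePosixPath

# Hypothetical paths following the naming scheme described above.
PATHS = [
    "widget/list.backend.ts",
    "widget/list.frontend.ts",
    "mail/welcome.html.template",
    "mail/welcome.text.template",
]


def roles_and_type(path: str) -> tuple[list[str], str]:
    """Split 'list.backend.ts' into (['backend'], 'ts')."""
    suffixes = [s.lstrip(".") for s in PurePosixPath(path).suffixes]
    if not suffixes:
        return [], ""
    return suffixes[:-1], suffixes[-1]


def find_by_role(paths: list[str], role: str) -> list[str]:
    """Return the paths whose extra extension parts include the given role."""
    return [p for p in paths if role in roles_and_type(p)[0]]


backend_files = find_by_role(PATHS, "backend")
```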
I do store all my code in a database. It's got time-travel functionality, the ability to switch into parallel universes, and a nice hierarchical view that lets me find things easily if I don't want to use my language-specific indexes.
Yes, that's git, a filesystem, and an IDE -- and the physical layout of the code isn't the way I normally navigate it. It's useful structure for the tooling, though.
It's definitely true that "using git" or "putting our code on the filesystem" aren't ends in themselves, they are means to an end. If we found a way to meet our requirements that has fewer trade-offs than git then I'm sure we'd jump. Git and filesystems are possibly the worst options for organising code and history, except for all the other options out there :P.
That's basically what an LSP is. It's true that it's built on top of the file system, and most IDE users will navigate using the folder hierarchy, but it still stores information about the name, type, and connectedness of the codebase, and allows querying. Your idea about arbitrary tags (feature, environment) would be useful but does not seem to be supported by the spec [^1] yet.
[^1]: https://microsoft.github.io/language-server-protocol/specifi...
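As a rough picture of "stores information about the name, type, and connectedness of the codebase, and allows querying", here is a small sketch that indexes symbols from source text into an in-memory SQLite table. It is not the Language Server Protocol and uses none of its messages; the file names, table layout, and `find` helper are assumptions made up for the example.

```python
import ast
import sqlite3

# Hypothetical file contents, standing in for files on disk.
SOURCES = {
    "shop/cart.py": "class Cart:\n    def total(self):\n        return 0\n",
    "shop/tax.py": "def vat(amount):\n    return amount * 0.2\n",
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE symbols (name TEXT, kind TEXT, path TEXT, line INTEGER)")

for path, text in SOURCES.items():
    for node in ast.walk(ast.parse(text)):
        if isinstance(node, ast.ClassDef):
            kind = "class"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        else:
            continue  # only definitions are indexed in this sketch
        db.execute(
            "INSERT INTO symbols VALUES (?, ?, ?, ?)",
            (node.name, kind, path, node.lineno),
        )


def find(kind: str) -> list[tuple[str, str, int]]:
    """Query the index by symbol kind, independent of folder layout."""
    rows = db.execute(
        "SELECT name, path, line FROM symbols WHERE kind = ? ORDER BY name",
        (kind,),
    )
    return rows.fetchall()
```

Arbitrary tags such as feature or environment could be one more column in a table like this, which is roughly the extension the comment wishes the spec had.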
Lotus Notes did that. The database held the code, the data, the UX, the security. There was a standard UX for accessing different types of code, design elements, and the data.
On the positive side, DevOps was a breeze - push a DB to a server and everything just worked. Pushing new code to all the DBs was a breeze. Any dev could immediately jump into an app and have a sense of where they would find elements of the app. All apps ran the same way, so it was realistic for small shops to deliver large products.
On the downside, source control was sub-optimal. That was a weakness in the platform even 25 years ago when it was modern, and never quite improved... although there are ways to import/export the code to make it work with modern source control like git. It also made each app heavier than it needed to be - instead of sharing centralized code, each app had its own copy. Your infrastructure footprint got big, fast.
For a modern take on it, I think other comments are hitting the key point: you might want to have fuzzier definitions of what a database and a file system are. At the end of the day, they are both ways of storing data to disk with different access methods. But it sounds like you are more concerned about DX. To get to your vision, I'd focus more on an IDE that lets you navigate code how you desire, while leaving the actual code storage as a DevOps exercise where they can focus on whatever solution optimizes delivery and reliability.
I looked into four interesting incarnations of this over the years:
1. During the peak phase of CouchDB as an application server (2006 - 2009) it was common to store not just the data but all the app assets and code in the database and replicate everything together. Plenty of the community tried to take this to the extreme, with every function being stored as a versioned document (I see it as a precursor to FaaS) and the whole application being editable with an integrated IDE. Also, functions in my incarnation of this system were not loaded by filename but with a content-addressed manifest. You would reference functions by name, but the name would be resolved with a hash manifest.
2. There were several systems built on Erlang/BEAM that took hot code replacement to the extreme in a similar way, storing code in, I believe, Mnesia.
3. I think Bloomberg (I cannot find the HN post to confirm it was them; if someone has the link that would be great) has/had a bespoke code database with custom version control and a fully integrated IDE. They leveraged this for some pretty interesting workflows.
4. Probably not exactly what you mean, as it does not include the runtime integration, but Google and Sourcegraph are building code databases with indices on symbols and semantic understanding of references and more. I hear great things from people who worked with it especially
I can think of an argument for justifying the status quo.
The folder structure reflects the subdivision of code into modules. Each module may have submodules, and each module decides the visibility of its children to other modules at the same level as itself, and to its own supermodule. This is a naturally hierarchical structure, which file systems lend themselves well to. A code database would have to replicate this structure within it somehow anyway.
A non-hierarchical tag system would help model situations where you have multiple orthogonal axes along which to organise the code (as you point out). But in these cases, which axis gets the top-level hierarchy just doesn't matter. Pick one, maybe loosely informed by organisational factors or by your problem conceptualisation.
On the flipside, in situations where a stricter hierarchy would improve modularity, the tag system might _discourage_ clean crystallisation, and cause responsibilities to bleed into each other. IMO, it's more important for there to be modules at all than for their boundaries to be perfect.
what problem should it solve? you can store anything in a db, fetch it and run it: binaries, whatnot, parts of them, web components.
i'd ask, is fetching code really a bottleneck? maybe in some systems or types of execution environments it could be worth it. really don't know.
i'd assume data is stored in databases because it needs to be viewed from different angles (join statements etc.) and it has different performance and layout requirements.
most code is 'read only' too, so there's no need to do stuff like synchronization / locking on writes and ordering stuff.
then again, there might be systems that don't have this aspect, and somehow have very high load on fetching code, and maybe code is writable too, and could have queries to extract certain parts of code, or combine code from various files/tables.
i think, though, that this difference between how code and data are fetched and used is the main reason why in the general case it works like it works. it hasn't needed to work differently, so no one looked for a solution. no big problems in the space. (my guess)
Not only should code structure not be modeled in a database, even textual code shouldn't be stored in a database. We had version control systems that used databases, and they all sucked. A lot of that suckage had to do with the database.
What benefits do you expect from this approach compared to the huge number of tools that work very well with folders and text files?
Imagine reviewing changes where you changed the name of a function and the name of a field.
Current Text based approach. 187 files changed.
Structured Data approach. Function Fobar changed to FooBar. Struct Baz field vargle changed to bargle.
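A minimal sketch of how such a structured report could be derived, assuming the tool compares parsed definitions rather than lines of text. It only handles top-level function renames where the arguments and body are unchanged, and it ignores call sites; `function_bodies` and `detect_renames` are hypothetical helper names.

```python
import ast


def function_bodies(source: str) -> dict[str, str]:
    """Map each top-level function name to a dump of its arguments and body (name excluded)."""
    result: dict[str, str] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            body = ast.dump(node.args) + "".join(ast.dump(stmt) for stmt in node.body)
            result[node.name] = body
    return result


def detect_renames(before: str, after: str) -> list[tuple[str, str]]:
    """Pair a removed name with an added name when their definitions are identical."""
    old, new = function_bodies(before), function_bodies(after)
    removed = {body: name for name, body in old.items() if name not in new}
    return [
        (removed[body], name)
        for name, body in new.items()
        if name not in old and body in removed
    ]


before = "def Fobar(x):\n    return x * 2\n"
after = "def FooBar(x):\n    return x * 2\n"

for old_name, new_name in detect_renames(before, after):
    print(f"Function {old_name} changed to {new_name}")
```

A text diff of the same change would show one removed line and one added line per affected file, with no indication that they are the same function.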
I have been playing with the exact opposite: representing a database as a file structure where databases show up as top-level folders, tables are subfolders, and each row appears as a form-like file automatically generated from the schema. You can see a screenshot of such a form in [1], which you can edit and save back, effectively enabling anyone familiar with Dropbox to edit data in a database, as it just looks like a form to fill in.
The project is OSS [2] and the storage connector is "mysql". It even handles foreign keys by creating links to another folder with a search query to find the table row it's associated with.
[1] https://i.imgur.com/OBJGIeg.png
[2] https://github.com/mickael-kerjean/filestash
How easily can you break data consistency?
From two directions:
There are programming languages that store code in some kind of non-hierarchical format. For example, Unison (https://www.unison-lang.org/) stores code in a database just as you suggest, and projects it down to text for editing. A more established example is probably Smalltalk, which stores the code as part of an image that is edited live in the Smalltalk environment.
On the other side, you can have filesystems that are not hierarchical, for example semantic filesystems like Tagsistant for Linux — these can be used for more flexible relationships between any kind of file, not just code.
I implemented something similar for Common Lisp: https://github.com/marcecoll/rekishi
The idea was that you don't have files, just functions that you can bring in and out of scope while editing. You have branches per-function. This all worked more or less transparently to the user using the normal emacs Sly Common Lisp flow.
It was implemented overriding the +DEFUN+ macro, so function re-definitions automatically serialized and created a new entry in the DB.
The proof of concept used SQLite, but I also envisioned a Postgres-backed version for jamming on programs with your friends in real time.
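For readers who don't write Lisp, the same idea can be sketched in Python: every (re)definition goes through one entry point that both compiles the function and appends its source to a SQLite table. This is an analogy under my own assumptions, not the rekishi implementation; `define`, `history`, and the `versions` table are hypothetical.

```python
import ast
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE versions (id INTEGER PRIMARY KEY, name TEXT, source TEXT)")


def define(source: str):
    """Compile one function from source text and record the text as a new version."""
    name = ast.parse(source).body[0].name  # assumes the text is a single `def`
    namespace: dict = {}
    exec(source, namespace)  # plays the role of the overridden definition macro
    db.execute("INSERT INTO versions (name, source) VALUES (?, ?)", (name, source))
    return namespace[name]


def history(name: str) -> list[str]:
    """Return every recorded definition of a function, oldest first."""
    rows = db.execute("SELECT source FROM versions WHERE name = ? ORDER BY id", (name,))
    return [row[0] for row in rows]


greet = define("def greet():\n    return 'hello'")
greet = define("def greet():\n    return 'hello again'")
```

After the second call, `greet()` runs the new definition while `history("greet")` still holds both versions, which is the per-function history the comment describes.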
I bet you could do something similar with Prolog
> To me it seems obvious that code should be stored in a database
Where are you storing code if not in a database?
> rather than a hierarchical, text-based format.
Okay, so you mean not a hierarchical database, but rather a... Relational database, I guess?
> The main way we navigate and organize code is by folder hierarchies.
Organize I can buy, I suppose. But I navigate by AST representation (as provided by an LSP in this day and age). It turns out code is a database too!
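As a small illustration of treating code as something you can query, the sketch below parses a snippet with Python's standard `ast` module and loads function definitions and direct calls into an in-memory SQLite database. The `functions` and `calls` tables and the sample `SOURCE` are invented for the example; real language servers keep far richer indexes than this.

```python
import ast
import sqlite3

# Hypothetical sample code to index.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

def index_source(conn, source):
    """Store function definitions and direct name calls as queryable rows."""
    conn.execute("CREATE TABLE functions (name TEXT, line INTEGER, n_args INTEGER)")
    conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            conn.execute(
                "INSERT INTO functions VALUES (?, ?, ?)",
                (node.name, node.lineno, len(node.args.args)),
            )
            # Only calls of the form name(...) are recorded; method calls
            # such as text.split() are skipped in this sketch.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    conn.execute(
                        "INSERT INTO calls VALUES (?, ?)",
                        (node.name, inner.func.id),
                    )

conn = sqlite3.connect(":memory:")
index_source(conn, SOURCE)

# "Who calls parse?" becomes an ordinary query.
callers_of_parse = [
    row[0] for row in conn.execute("SELECT caller FROM calls WHERE callee = 'parse'")
]
```

The text files stay the storage format here; the database is just a derived index that can be rebuilt at any time.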
> Rather than folders and file names, everything could just be tagged in different ways.
So you are looking for WinFS? While it suffered from many technical issues, its biggest problem is that users really didn't gain much from it.
At first I couldn't see the point of it. But perhaps one could visualise a quite granular approach to code with rows corresponding to short blocks or lines of code. Worth considering, but would need a revolution in coding practice!
There's CodeQL, but it seems to be mostly limited to security research (code review automation to find vulns). See: https://codeql.github.com/
Yes.
And the keyword here, `database`, need not mean the typical one. In fact, most tools (git, IDEs, parsers, etc.) are databases over the code. POOR ONES, and probably in the way of 'any program is a poorly implemented half of Lisp', but intentionally creating a database interface with a relational (enhanced!) view and intentional CRUD + queries makes a lot of sense.
the key idea is to admit/realize that a program is a graph, and that a flat string of characters is not an ideal way to store such complex structures.
once that leap is made, a whole lot of the complexities of namespaces, modules, source control, and parsing become much simpler/better. this comes at the cost of more complexity in the editor/infrastructure, but that is a singular place while in return it is simplifying every program written.
In order to uniquely reference one piece of code from another it needs to have a unique name/namespace/reference. Whatever organising principle you use for that will tend to become the hierarchy that your code is stored by.
This doesn't stop you from also accessing it in other ways. And with modern IDEs you can search across a fairly chunky codebase near-instantly, which would allow you to treat it as if it's in a database.
You are talking about taxonomy, and specifically multifaceted classification. In practice a module system with unique names is sufficient, like npm for example.
However, it's worth noting that all of the systems that rely on databases to store code (SharePoint, SAP, Power Platform) suck haaaaaaard, mainly due to issues with versioning and configuration management.
Isn't the filesystem also a KV database? Depending on the filesystem, some even have versioning and deduplication. Although I agree a language focused on being stored in a SQL database would offer new capabilities for versioning not available in stuff like git or svn/hg. With that, new challenges will also arise that will need to be explored.
there's this interesting proof of concept written in common lisp:
https://github.com/projectured/projectured
the depth work is almost done. IIRC there are only a couple of nontrivial issues left, but it's been abandoned.
Slang code at Goldman Sachs is all stored in a database, which is very useful 'cos if someone is having a problem with part of the infra that you're responsible for, you can access their scripts and dependent libraries (assuming they decide to make them public).
IBM Visual Age for Java was the first product I used that did this. The problem was that all utilities and other processes required access to the source code, which had to first be exported from the database then re-imported.
> To me it seems obvious that code should be stored in a database rather than a hierarchical, text-based format.
So, granted that a filesystem does exhibit CRUD and hierarchical relations, it's already a relational database.
I take it you are arguing about the utility of a text-based format?
The code is meant for humans, and in my view, the database is not really for humans. There are plenty of tools in IDEs that transform code into a 'database,' and in 20 years I've never really needed to use them on a daily basis. Except for getting familiar with legacy code, and even then to create a mental map, the database seems like overkill to me. That said, I should note that I've primarily worked on startup or business codebases, not FAANG or equivalent ones.
Edit: Reformulation because I was voted down: why change the storage format if the IDE already manages it? And I'll add: to store code in a database, you have to think about the granularity of your data, and it rapidly becomes the line, if not the character. Working daily with code stored in a database (Salesforce), where the granularity is the class, is really a nightmare from a content versioning point of view.
https://en.wikipedia.org/wiki/Source_Code_in_Database
1. The biggest advantage of storing on the filesystem is that files are the core primitive on UNIX-based operating systems, and are core enough on Windows. Giving that up would require tremendously good reasons.
2. Everyone organizes projects into folders differently, but in most languages the only reason why you organize things into folders is to make it easier for humans to find things. The computer doesn't care where the files are stored. So, you're proposing: give up a feature that exists solely to make it easier for humans to find things. It's extremely difficult to envision a world where this results in a more ergonomic experience.
3. The hierarchy is only one way of thinking about how you can browse a filesystem. There is nothing stopping editors from indexing files in different ways, allowing you to browse by, for example, files tagged with some comment at the top, or files which contain classes versus interfaces. In fact, more comprehensive IDEs like JetBrains already do this for some languages. You don't have to change the storage substrate to get 100% of the benefits you propose, with the extremely small cost of some indexing process when a project is first opened.
4. There are twenty billion programming languages out there, like four of them do what you're suggesting, yet no one uses any of them for anything meaningful. There is nothing, period, stopping any new language from doing something like this. Golang could have been designed like this; but it wasn't.
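As a sketch of point 3, an index by "files tagged with some comment at the top" can be built over ordinary files in a few lines. The `tags:` comment convention, the `build_tag_index` helper and the demo files below are invented for illustration; this is not how any particular IDE does its indexing.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical convention: a leading comment such as "# tags: http, io"
# or "// tags: parsing, io" near the top of the file.
TAG_LINE = re.compile(r"^\s*(?:#|//)\s*tags:\s*(.+)$", re.IGNORECASE)

def build_tag_index(root, header_lines=5):
    """Map each tag to the files whose leading comment declares it."""
    index = defaultdict(set)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="ignore") as handle:
            # Only look at the first few lines of each file.
            for _, line in zip(range(header_lines), handle):
                match = TAG_LINE.match(line)
                if match:
                    for tag in match.group(1).split(","):
                        index[tag.strip()].add(path.relative_to(root).as_posix())
    return index

# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    Path(root, "net").mkdir()
    Path(root, "net", "client.py").write_text("# tags: http, io\nimport socket\n")
    Path(root, "parser.go").write_text("// tags: parsing, io\npackage parser\n")
    index = build_tag_index(root)
    io_files = sorted(index["io"])
```

Here the folder hierarchy stays as it is, and "browse by tag" is just another view computed when the project is opened.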
Currently working with Salesforce, where code is stored in the database ... to put it simply, it's not a really good idea, because they never decided to go full DB or full files. So you have to manage files in git and deploy to the database. So there are multiple sources of truth: the SF DB, your disk, your git, the git of other devs, and the central repo. Welcome to 1990 when you work as a team, i.e. ask your colleagues if they are already working on a component before you change it ...
> Golang could have been designed like this; but it wasn't.
Funny choice to use as an example. Go was designed by the same people who designed Plan 9. If no other language out there used files, Go would have still chosen to use files.
>text-based format
I'm sorry, I don't read binary files.
Don't need to put the code in a database to do that. You can do that entirely with specially formatted comments and a projectional editor.
Best solution to date?
Maybe I’m being pedantic, but isn’t a database ultimately a bunch of files on disk? Unless you’re using a pure in-memory DB?
You are.