We found yaml to be a great exchange format for electronic exam data. It allows us to put student submitted answers and source code into a yaml file and there is no weird escaping. It's very readable with a text editor. And then we just add notes and a score as a list below and then there's the next submission.
For readability of large blocks of texts that may or may not contain various special characters and newlines the only other alternative we have seen was XML, but that is very verbose.
So what the author finds as a negative, the many string formats, are exactly what drew us to yaml in the first place.
Somebody in these discussions always correctly points out that s-expressions are as expressive as XML but without the excess line noise, so it might as well be me.
It's really interesting that after all these years we still don't have a document format that just works. They all suck in their own sweet ways and we still have culture wars over them.
Not many know that the inventor of the YAML specification built a fully working pendulum clock as a teenager. With Lego bricks. YAML is a good standard for simple settings files. For more complex data structures, use JSON.
This would be a massive breaking change for Kubernetes. There are piles and piles of YAML all around the open-source world that would need updating. It would be very hard to adopt.
Also, quoting strings 100% of the time just looks ugly in my opinion. Not a big deal with autogenerated YAML, or YAML that I do not maintain, but for anything handwritten it's annoying.
This one is amazing, I almost pissed myself laughing reading it. So true about YAML. Another caveat is using --- as a section separator in the file. It starts a new document inside your existing file.
More seriously: this is a good overview of the reasons I dislike YAML as a web configuration language. There's too much overlap between the "friendly" auto-type-determination in YAML and the symbols used in web tech, from colons to Norway having a TLD. It wouldn't be so bad if yaml parsers could use expected type of each value as a hint, but that's not a feature in any parser I've met, so I'd rather just not use yaml for anything that's going to end up describing a web service.
So, at what point does YAML needing magic incantations, wrapping everything in quotes, avoiding any form of templating, etc. stop being less verbose (oops, meant noisy), and "annoying?"
Reality is, clunky XML is badly designed, or simply has no schema attached.
Almost all of this is solved by basically putting quotes around strings.
Yaml has its use cases where you want things json doesn't do, like recursion or anchors/aliases/tags. Or at least it has had - perhaps cue/dhall/hcl solve things better. Jsonnet is another. I haven't tried them enough to test how much better they are.
I feel like these two tenets - (1) yaml should require quotes & (2) the value in yaml is in recursion/anchors - are fundamentally the opposite of why yaml exists & why people use it.
The distinguishing draw of yaml is largely the "easiness" of not having explicit opening or - more importantly - closing delimiters. This is done using a combination of white-space delimiting for structure, & heuristic parsing for values. The latter is fundamentally flawed, but yaml fans think the flaws are a worthwhile trade-off. If you're going to bring delimiters in as a requirement, imho yaml loses its raison d'être.
Recursion/anchors/etc. on the other hand are optional extras that few use & some parsers don't even support. If they were the driving value of yaml they'd be more ubiquitous.
Disclaimer: I hate yaml & wish it didn't exist, but I do understand why it does & I frankly don't have a great suggestion for alternatives that would fill those needs. Toml is also flawed.
Genuinely curious - What major flaws does TOML have? I've used it before and it seems like a simple no-nonsense config language. Plenty of blog articles about the flaws behind YAML, I don't really see complaints about TOML!
I see where you are coming from but YAML anchors are definitely a great and powerful feature that deserves more attention. The other day I was refactoring a broken [1] k8s deployment based on a 3rd-party Helm chart and since I didn't have the time to migrate to a better chart, YAML anchors permitted me to easily reduce YAML duplication, with everything else (Helm, Kustomize, Flux, Kubernetes) completely unaware of anything. Just a standard YAML pattern.
[1] the broken part was due to an ex-coworker that cheated his way out of GitOps and left basically "fake code" committed, and modified by hand (with Lens) the deployment to make it work
> The distinguishing draw of yaml is largely the "easiness" of not having explicit opening or - more importantly - closing delimiters.
Along with a coworker, I wrote the package manager for Dart, which uses YAML for its main manifest file (pubspec.yaml). The lack of delimiters is kind of nice but wasn't instrumental in the choice to use YAML.
It's because JSON doesn't have comments.
If there was a JSON+comments that was specified and widely compatible, we would have used that. YAML really is a brittle nightmare, and the lack of delimiters causes problems as often as it solves them. We wrote a YAML parser from scratch and I still get the indentation on lists wrong sometimes.
But YAML lets you actually, you know, comment out a line of text in it temporarily, and that's really fucking handy. I think if Crockford had left comments in JSON, YAML would be dead.
JSONC is JSON with comments (and trailing commas) and it's fairly widely supported, namely because VS Code ships with support built in and they use it for all their config files. I've seen libraries for a number of languages.
VS Code defaults to complaining about trailing commas though (the warnings can be turned off though (it feels like a hack and they didn't properly document it though (it is an officially sanctioned procedure though))).
> It's because JSON doesn't have comments.
This is a big plus but JSON5 has pretty widespread language library support - probably equal to that of YAML tbh (e.g. Swift has native JSON5 support, I don't know that anyone natively supports YAML). Any reason not to opt for it here?
Most protocols defined in RFCs require the use of regular JSON. You don’t have a choice.
from the article:
>Many of the problems with yaml are caused by unquoted things that look like strings but behave differently. This is easy to avoid: always quote all strings.
> Almost all of this is solved by basically putting quotes around strings.
Yeah, that was my first thought as well. I personally don't mind YAML, but I've also made a habit out of quoting strings. And, I mean, you're quoting both keys and strings in JSON, so you're still saving approx. 2 double quotes per key/value pair in YAML if that's a metric that's important to you.
As the article points out with the `on` example, you really have to quote yaml keys as well, if you want the defense to work...
The argument was that most of the mentioned problems could be solved by quoting the values. I don't have a problem with avoiding "on" as a key, and I apparently haven't used it ever, because I've never run into this particular problem in my 15+ years using YAML.
So, sure, if you want to play it super safe, quote keys as well. But I'm personally fine with the trade-off in not quoting keys.
JSON doesn’t do them as part of the spec, but there’s nothing stopping you from doing them as post-processing. Eg OpenAPI does it by using a special $ref key where the post processor swaps in the value referenced there.
That’s effectively what jsonnet/cue/hcl do, though as a preprocessor instead of a postprocessor.
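For anyone who hasn't seen the `$ref` trick, here is a minimal sketch of that kind of post-processing in Python. The config contents and the `expand_refs` / `resolve_pointer` helpers are made up for illustration. It only handles local references of the form `#/a/b`, with no cycle detection and no JSON Pointer escaping, so it is not how OpenAPI tooling actually implements it.

```python
import json

def resolve_pointer(root, ref):
    """Look up a local reference like '#/defaults/resources' inside root."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are handled in this sketch: {ref!r}")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node

def expand_refs(node, root):
    """Return a copy of node with every {'$ref': ...} object replaced by its target."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return expand_refs(resolve_pointer(root, node["$ref"]), root)
        return {key: expand_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_refs(item, root) for item in node]
    return node

# Hypothetical config: two services sharing one block of defaults.
raw = json.loads("""
{
  "defaults": {"resources": {"cpu": "500m", "memory": "256Mi"}},
  "web":    {"resources": {"$ref": "#/defaults/resources"}},
  "worker": {"resources": {"$ref": "#/defaults/resources"}}
}
""")
config = expand_refs(raw, raw)
print(config["worker"]["resources"])
```

The JSON parser itself never learns about references. The file stays plain JSON and the sharing happens in a pass over the parsed tree.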
Yeah and this is enforced by default in yamllint.
It's very fair to cry "why the hell do I need a linter for my trivial config file format", and these footguns are a valid reason to avoid YAML.
But overall YAML's sketchiness is a pretty easy problem to solve and if you have a good reason to keep/choose YAML, and a context where adding a linter is viable, it's not really a big deal IMO.
And as hinted in the post, there's really no well-established universal alternative. TOML is a good default but it's only usable for pretty straightforward stuff. I'm personally a fan of the "just use Nix" approach but you can't put a Nix interpreter everywhere. And Cue is way overpowered for most usecases.
I guess the tldr is that the takeaway isn't "don't use YAML" but just "beware of YAML footguns, know the alternatives".
Jsonnet is pretty nice but the library support isn't quite as good. There are some nice libraries for yaml that do round trip processing for example so you can modify a yaml programmatically and keep comments. Yaml certainly has some warts (and a few things that are just frankly moronic) but it deserves some credit for hitting the sweet spot in a bunch of ways.
It's very counter-intuitive to me that 22:22 would need to be a quoted string, since functionally it's a K-V-pair. YAML itself even uses : in the Dict syntax!
It's a key-value pair only to whatever reads the YAML and then assigns some meaning to that string. In YAML itself you need to put a space between the colon and the value.
The n, no, off thing is just sad. It's a 100% avoidable issue. But whoever put that into spec was just so clever that they overflew and became stupid.
Whoever thought supporting sexagesimal numbers was a good idea needs to spend some extended time away from their computer to reflect on what they’ve done
Presumably that was to support time values.
That makes sense, but I think the vast majority of tools that need time values would actually expect users to just input a string and parse that themselves.
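To spell out what the sexagesimal rule does to something like 22:22, here is a hand-rolled sketch in Python. It is not a YAML parser. The pattern is the base-60 integer form as I recall it from the YAML 1.1 integer type description, and `resolve_sexagesimal` is a made-up helper name.

```python
import re

# Base-60 integer pattern, as recalled from the YAML 1.1 integer type description.
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")

def resolve_sexagesimal(text):
    """Return the base-60 integer for text, or the text unchanged if it does not match."""
    if not SEXAGESIMAL_INT.fullmatch(text):
        return text
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-").replace("_", "")
    value = 0
    for part in digits.split(":"):
        value = value * 60 + int(part)
    return sign * value

print(resolve_sexagesimal("22:22"))    # 22 * 60 + 22 = 1342
print(resolve_sexagesimal("1:30:00"))  # (1 * 60 + 30) * 60 + 0 = 5400
print(resolve_sexagesimal("08:30"))    # leading zero does not match, stays a string
```

Note the last case: under this pattern a value with a leading zero is left as a string while its neighbour becomes an integer, which is exactly the kind of surprise people complain about.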
IMO anything other than the basic types supported by JSON (number, true, false, null) ought to be parsed as a string. Or if you really insist, some kind of special syntax to make it clear it's not a string would probably be acceptable.
We wanted a file format that's easy to read and less verbose than xml and all we got was something that is so full of pitfalls that it would be easier just not to use it.
This is basically every problem in YAML. Someone couldn't resist adding more stuff and either didn't realise or didn't care about the ambiguities it created.
It basically feels like overfitting. They saw some use case so they added it. But they didn't think about how this would generalize and now this nice use case is disproportionately supported at the cost of surprising everyone who doesn't need time-of-day fields in their file.
Too clever by half
Discussion from 3 years ago, when this was originally posted:
https://news.ycombinator.com/item?id=34351503 , 566 points, 358 comments
I think this article gets posted about every quarter.
The Norway problem drives me a bit nuts.
In a lot of the Ansible documentation, yes/no are used instead of true/false. When I saw this in the official docs, I used it, figuring this was the preferred convention in Ansible. These days it throws warnings or lint errors, so I’m updating it all over the place as I find it. Yet the Ansible documentation still commonly uses it.
Ansible isn't a gold standard for docs. The docs are updated and maintained, but the underlying interfaces aren't consistent and that leaks to the docs. One can only wonder why, maybe different developers with different ideas for conventions without a style guide.
Ansible is a wonderful tool though, if you can excuse these idiosyncrasies.
> Ansible is a wonderful tool though, if you can excuse these idiosyncrasies.
The only advantage Ansible has is how easy it is to start with it - you don't need to deploy agents or even understand a lot about how it works.
Trouble is, it doesn't really scale. It's pretty slow when running against a bunch of machines, and large configurations get unwieldy quickly (be it because of YAML, where in large documents it's impossible to orient yourself and know what is where and at what level, or because of the structure of playbooks vs roles vs whatever, or because templating a whitespace-as-logic-"language" is just hell). It's also fun to debug "missing X at line A, but the error can be somewhere else". Cool, thanks for the tip.
So it's pretty great to get started with, or at a home lab. Big organisations struggling with it is a bit weird.
I found job slicing speeds up jobs dramatically. In a test I did recently it dropped the time from nearly 4 hours, down to 17 minutes, for an inventory of about 4500 hosts.
Seems like the right answer is "bootstrap your daemon installs with Ansible and then use something that scales better that runs on those daemons."
What are the best practices along these lines? What's the "something better"?
Curious about this myself!
It depends on how they parse/decode/unmarshal the file. If they use a "generic" yaml parser, no will be translated to false. But if the parser knows the types of the data structure, or can be instructed not to replace certain strings, or has hooks, it can treat no as a string. So it might be that the linter doesn't operate like the parser.
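A rough sketch of the difference, in Python. The word lists are the boolean spellings from the YAML 1.1 bool type. The two functions are hypothetical and only stand in for "resolve by spelling" versus "resolve by the type of the destination field"; no real library is being modelled here.

```python
# Boolean spellings from the YAML 1.1 bool type; the functions are illustrative only.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

def resolve_generic(text):
    """Guess a type from the spelling alone, like a schema-less YAML 1.1 loader."""
    if text in YAML11_TRUE:
        return True
    if text in YAML11_FALSE:
        return False
    return text

def resolve_typed(text, expected_type):
    """Use the destination field's type as the hint instead of the spelling."""
    if expected_type is str:
        return text
    if expected_type is bool:
        value = resolve_generic(text)
        if not isinstance(value, bool):
            raise ValueError(f"not a boolean: {text!r}")
        return value
    raise TypeError(f"unsupported type in this sketch: {expected_type!r}")

print(resolve_generic("no"))      # the generic loader turns the country code into False
print(resolve_typed("no", str))   # a field declared as a string keeps the text
print(resolve_typed("no", bool))  # a field declared as a boolean still accepts it
```

If the linter works like `resolve_generic` and the application works like `resolve_typed`, the two will disagree about the same file.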
Halloween isn't for a few more weeks, but this framework for creating bespoke YAML dialects that can only be parsed by a specific implementation and with the correct type annotations will scare the pants off of your devops colleagues around the campfire.
(In case I haven't succeeded in hitting the right tone, this is intended to be good-natured jest and not snark.)
Well, JSON cannot represent dates (nor Sets, Maps, NaN, etc.), so quite a few applications with a JSON parser have their own conversion (e.g. seconds since epoch, string parsing, object with date fields). Is that a bespoke JSON dialect that scares the pants off?
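As an illustration of such an application-level conversion, here is a small Python sketch using the standard `json` module's `default` and `object_hook` hooks. The `$date` and `$set` wrapper keys are a convention invented for this example, not part of JSON or of any particular tool.

```python
import json
from datetime import datetime, timezone

def encode_extra(value):
    """json.dumps hook: called only for objects json cannot serialise itself."""
    if isinstance(value, datetime):
        return {"$date": value.isoformat()}
    if isinstance(value, (set, frozenset)):
        return {"$set": sorted(value)}
    raise TypeError(f"cannot serialise {type(value).__name__}")

def decode_extra(obj):
    """json.loads object_hook: undo the wrappers produced by encode_extra."""
    if set(obj) == {"$date"}:
        return datetime.fromisoformat(obj["$date"])
    if set(obj) == {"$set"}:
        return set(obj["$set"])
    return obj

record = {
    "created": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "tags": {"b", "a"},
}
text = json.dumps(record, default=encode_extra)
print(text)
print(json.loads(text, object_hook=decode_extra) == record)
```

Any other JSON parser reads `text` without complaint, but only a reader that knows the convention gets the date and the set back.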
Now, JSON is more suited for machine-to-machine, but YAML works fairly well for humans. It's a pity, but a few domain-specific dialects don't really hurt, since you can't copy some bit of YAML and paste it in an entirely different config anyway.
PS campfire story? "When we were still working in the old building, deep down in the cellar, there was a colleague who had been there since the early days. Nobody saw him arrive at work or leave. It was as if he was always there. One of the things he had written was a custom parser ... FOR YAML!"
Has this really been a problem in the last ten years? Version 1.2 of the spec (if I recall) fixed it in 2009.
I find it remarkable that YAML has become our go-to for configuration when it is riddled with parsing traps and inconsistent behaviour that catches out even experienced developers.
And furthermore I find it remarkable how much people like the visual format where you indent nested things with whitespace. I'm pretty sure it's the main reason Python took off as well.
It's because other config formats aren't as expressive.
> It's because other config formats aren't as expressive.
Oh yeah it is literally the best of a bad bunch in my opinion
I'm hopeful of languages like CUE https://cuelang.org/
See starlark, dhall, jsonnet, cuelang, toml, etc.
IMO, JSON, YAML, and TOML should all interpret all keys as strings, and only enforce quotes when syntactically necessary.
So, `key1` is a string and doesn't need to be quoted. `12345` as a key is interpreted as a string (because keys are strings) and doesn't need to be quoted. `"key 1"` has a space, so it needs to be quoted.
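For what it's worth, JSON already has the first half of this rule (keys are always strings); it just makes the quotes mandatory. A quick check with Python's standard `json` module, using made-up keys:

```python
import json

# Hypothetical config with a numeric-looking key and a key containing a space.
parsed = json.loads('{"12345": "numeric-looking key", "key 1": "has a space"}')
print([type(key).__name__ for key in parsed])

# Going the other way, the json module coerces a non-string key to a string.
dumped = json.dumps({12345: "x"})
print(dumped)
```

So the proposal amounts to keeping that data model and relaxing only the syntax.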
What does IMO configuration look like
IMO means "in my opinion", or if you were being sarcastic, putting /s helps.
We'd have to change the spec and then all the core libs. Big task.
Use more quotes, use yamllint.
Like bash, more quotes and shellcheck.
Specs change from time to time. It requires effort. Nothing new here. It's necessary sometimes. Dealing with annoyances and footguns also takes effort.
I have always thought that there is a place for YAML but I do tend to avoid it when I can. I will say while working with terraform I have absolutely fallen in love with HCL. It makes a lot of sense to me and there is a lot of validation you can do along the way, leading to much more confidence in larger setups. IaC in my case at least.
Not only is YAML a pain, but JSON has a native parser in major languages, while YAML does not. I find it crazy that some people are still actively choosing it over JSON (or alternatives).
This is a case of the right tool for the right job. YAML is far easier to read and parse as a human than JSON.
If you're passing data between processes, and you still want the data to be human readable, then JSON is a good choice.
If you're writing a configuration file that's going to be edited by a human, then YAML is easier to look at and understand.
So... what are the good alternatives to yaml?
For quite some time I thought toml, but the way you can spread e.g. lists all over the document can also cause some headaches.
Dhall is exactly my kind of type fest but you can hit a hard brick wall because the type system is not as strong as you think.
I wish I had a good answer for you. I've been dissatisfied with Dhall, Nickel, Cue, and possibly others. Dhall's type system is both too strong (you have to plumb type variables by hand if you want to do any kind of routine FP idioms) and too weak (you can't really _do_ much with record types - it's really hard to swizzle and rearrange deeply nested records).
On top of that, the grammar is quite difficult to parse. You need a parser that can keep several candidate parses running in parallel (like the classic `Parser a = Parser (String -> [(a, String)])` type) to disambiguate some of the gnarlier constructs (maybe around file paths, URLs, and record accesses? I forget). The problem with this is that it makes the parse errors downright inscrutable, because it's hard to know when the parse you actually intended was rejected by the parser when the only error you get was "Unexpected ','".
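For readers who haven't met that parser type, here is a tiny "list of successes" sketch in Python. A parser is a function from input text to a list of `(value, remaining_text)` pairs, one per candidate parse. The combinator names and the toy grammar are made up, and this says nothing about how any Dhall implementation is actually written.

```python
# Python rendering of `Parser a = Parser (String -> [(a, String)])`:
# a parser maps input text to a list of (value, remaining_text) candidates.

def literal(expected):
    """Parser that matches a fixed string."""
    def parse(text):
        return [(expected, text[len(expected):])] if text.startswith(expected) else []
    return parse

def either(first, second):
    """Keep the results of both alternatives instead of committing to one."""
    def parse(text):
        return first(text) + second(text)
    return parse

def then(first, second):
    """Run second on every remainder that first produced."""
    def parse(text):
        return [((a, b), rest2) for a, rest1 in first(text) for b, rest2 in second(rest1)]
    return parse

# Hypothetical ambiguous grammar: "a" or "ab", followed by "b" or "bc".
grammar = then(either(literal("a"), literal("ab")), either(literal("b"), literal("bc")))

print(grammar("abc"))  # two candidates survive; only one consumes all the input
print(grammar("acx"))  # empty list: every candidate died, with no record of where
```

The empty list in the failing case is the root of the error-message problem: all branches were abandoned, and nothing records which one the user meant.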
Oh, and you can't multiply integers together, only naturals.
Maybe Nix in pure eval mode, absurd as that sounds?
I think the best thing for tools to do is to take and return JSON (possible exception: tools whose format is simple enough for old-school UNIX-style stdin/stdout file formats). Someone will come up with a good functional abstraction over JSON eventually, and until then you can make do with Dhall, YAML, or whatever else.
> Maybe Nix in pure eval mode, absurd as that sounds?
It doesn’t sound absurd, it’s pretty nice. What do you think about https://rcl-lang.org?
Just been reading the docs, I like it :)
Gonna have to set aside some time to play with it compared to HCL where I spend a lot of time.
What about KDL (https://kdl.dev/) or Pkl (https://pkl-lang.org/)?
For configuration I dislike the XML object model KDL is built around. It needlessly complicates things to have two different incompatible ways (properties and children) of nesting configuration keys under an element.
Pkl seems syntactically beautiful and powerful, but having types and functions and loops makes it a lot more complicated than the dead-simple JSON data model that YAML is based on.
In JSON I often end up recreating an equivalent of XML attributes for metadata fields and using custom prefixes to differentiate those fields from actual data. I find the data/metadata separation at the language level nice.
Can you give an example of metadata you would put in a config file that isn't configuration and isn't a comment?
KDL is really, really nice. And lightweight.
No one mentioned HashiCorp HCL so far, though it's really a shame that it didn't get much traction...
How about textproto? And the proto definition gives the schema.
The article mentions
> A simple subset of yaml
Which already exists and is called StrictYAML. It's just strings, lists and dicts. No numbers. No booleans. No _countries_. No anchors. No JSON-compatible blocks. So, essentially it's what most of us think of as being proper YAML, without all the stupid/bad/overcomplicated stuff. Just bring your own schema and types where required.
https://hitchdev.com/strictyaml/
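A rough sketch of the "bring your own schema" idea in plain Python. This is not the StrictYAML API. The `parsed` dict stands in for what a strings-only parser might hand back, and `SCHEMA`, `to_bool` and `apply_schema` are hypothetical names.

```python
# Stand-in for the output of a strings-only parser: every scalar is a string.
parsed = {"country": "NO", "port": "8080", "debug": "false", "tags": ["a", "1"]}

def to_bool(text: str) -> bool:
    """Accept exactly two spellings, so nothing is guessed."""
    if text not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {text!r}")
    return text == "true"

# The schema, not the parser, decides which strings become other types.
SCHEMA = {
    "country": str,
    "port": int,
    "debug": to_bool,
    "tags": lambda items: [str(item) for item in items],
}

def apply_schema(data: dict, schema: dict) -> dict:
    """Convert each value with its declared converter; reject unknown keys."""
    unknown = set(data) - set(schema)
    if unknown:
        raise KeyError(f"unexpected keys: {sorted(unknown)}")
    return {key: schema[key](value) for key, value in data.items()}

typed = apply_schema(parsed, SCHEMA)
print(typed)
```

Here "NO" stays a string because the schema says so, and "8080" only becomes a number because the schema asked for one.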
We found yaml to be a great exchange format for electronic exam data. It allows us to put student submitted answers and source code into a yaml file and there is no weird escaping. It's very readable with a text editor. And then we just add notes and a score as a list below and then there's the next submission.
For readability of large blocks of text that may or may not contain various special characters and newlines the only other alternative we have seen was XML, but that is very verbose.
So what the author finds as a negative, the many string formats, are exactly what drew us to yaml in the first place.
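For comparison, a small sketch of what the same kind of content looks like once it goes through JSON, using Python's standard json module. The `submission` text is a made-up example. A YAML literal block scalar (`|`) would keep those lines readable as written, apart from indentation.

```python
import json

# Hypothetical student submission with quotes, a backslash and real newlines.
submission = 'def greet(name):\n    print("hi\\n" + name)\n'

encoded = json.dumps({"answer": submission})

# The whole answer is squeezed onto one line, with every quote, backslash
# and newline escaped.
print(encoded)

# Nothing is lost, it is just hard to read and edit by hand.
roundtrip = json.loads(encoded)["answer"]
```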
Somebody in these discussions always correctly points out that s-expressions are as expressive as XML but without the excess line noise, so it might as well be me.
What is so verbose about a CDATA section? Everybody complains about XML being verbose, but I've never once heard complaints about HTML being too verbose.
I came to regard YAML as a kind of a syntactic HFC syrup, a bearable idea that was taken too far.
Alas, YAML is just about everywhere, so the chances for a replacement that'll be both better behaved and as ubiquitous are unfortunately slim.
I’m amazed how sane the “document from hell” looks.
The author didn’t even get into the weird stuff GitLab does with YAML too!
I wonder if you could make a new standard based on YAML where every value is prefixed by a type, so there is no ambiguity.
We'd need a "YAML, the good parts".
It's called StrictYAML.
Obligatory https://xkcd.com/927/
Yup, author made RCL
It's really interesting that after all these years we still don't have a document format that just works. They all suck in their own sweet ways and we still have culture wars over them.
Yaml is an interesting case study that we can learn (and have learned) a lot from. Mistakes to avoid. :)
I never really understood why nobody ever just forked YAML and took out the ugly bits. It’s not a very complicated parser.
In the meantime, I’m very much enjoying KDL.
TOML
Not many know that the inventor of the YAML specification built a fully working pendulum clock as a teenager. With Lego bricks. YAML is a good standard for simple settings files. For more complex data structures, use JSON.
Wow, I wasn't aware there were so many magic and arcane features in yaml. Great post. Thanks.
The problem is that YAML came from geeked-out devops employees who used bash, whereas JSON came from JavaScript.
Stupid question: why don't they announce a newer version of YAML that is not backwards compatible and allows only quoted strings in its parser?
> that is not backwards compatible
This would be a massive breaking change for Kubernetes. There are piles and piles of YAML all around the open-source world that would need updating. It would be very hard to adopt.
Also, quoting strings 100% of the time just looks ugly in my opinion. Not a big deal with autogenerated YAML, or YAML that I do not maintain, but for anything handwritten it's annoying.
how is it annoying...? it's literally like that in almost every single language out there. IMO seeing unquoted strings in YAML feels weird.
As I said, it's subjective. I like unquoted values more than quoted ones. That's all. Not sure about quoting keys though.
The Norway problem is well known.
This one is amazing, I almost pissed myself laughing reading it. So true about YAML. Another caveat is using --- as a section separator in the file. It will start a new document inside your existing file.
Still love it.
I despise yaml. On top of the points from the article, I never know where to indent and how whitespace is handled on multiline fields.
Just a yucky standard all-around
Whitespace gets weird with indenting code.
I use block scalars constantly now, with liberal use of the trimming dashes all over the place.
Any time I need to preserve some indentation in my result, I always hate the formatting I’m left with, especially if there is logic involved.
Perfectly normal YAML document detected.
More seriously: this is a good overview of the reasons I dislike YAML as a web configuration language. There's too much overlap between the "friendly" auto-type-determination in YAML and the symbols used in web tech, from colons to Norway having a TLD. It wouldn't be so bad if yaml parsers could use expected type of each value as a hint, but that's not a feature in any parser I've met, so I'd rather just not use yaml for anything that's going to end up describing a web service.
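A toy sketch of the difference, in plain Python rather than any real YAML library. The pattern lists the boolean spellings from the YAML 1.1 bool type (real parsers vary in which of them they honor), and `resolve_with_hint` is a hypothetical API showing what "use the expected type as a hint" could look like.

```python
import re

# Plain (unquoted) scalars that the YAML 1.1 bool type resolves to booleans.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
TRUTHY = {"y", "yes", "true", "on"}

def resolve_implicit(scalar: str):
    """Guess a type from the text alone, as YAML 1.1 does for plain scalars."""
    if YAML11_BOOL.fullmatch(scalar):
        return scalar.lower() in TRUTHY
    return scalar

def resolve_with_hint(scalar: str, expected: type):
    """Hypothetical alternative: the caller's expected type wins over the guess."""
    if expected is str:
        return scalar
    if expected is bool:
        value = resolve_implicit(scalar)
        if isinstance(value, bool):
            return value
        raise ValueError(f"{scalar!r} is not a boolean")
    return expected(scalar)

countries = ["DK", "NO", "SE"]
guessed = [resolve_implicit(code) for code in countries]      # Norway turns into False
hinted = [resolve_with_hint(code, str) for code in countries]  # all stay strings
print(guessed)
print(hinted)
```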
Can't take this seriously if XML isn't listed as an alternative.
FTA: Xml is noisy and annoying to write by hand
So, at what point does YAML needing magic incantations, wrapping everything in quotes, avoiding any form of templating, etc. stop being less verbose (oops, meant noisy), and "annoying?"
Reality is, clunky XML is badly designed, or simply has no schema attached.
It's honestly absurd how prevalent YAML is. It's clearly dumb.