Can YAML go away entirely and instead allow pipelines to be defined with an actual language? What benefits does the runner-interpreted yaml-defined pipeline paradigm actually achieve? Especially with runners that can't be executed and tested locally, working with them is a nightmare.
I agree somewhat with the proposition that YAML is annoying for configuring something like a workflow engine (CI systems) or Kubernetes. But having it defined in YAML is actually preferable in an enterprise context. It makes it trivial to run something like OPA policy against the configuration so that enterprise standards and governance can be enforced.
When something is written in a real programming language (that doesn't just compile down to YAML or some other data format), this becomes much more challenging. What should you do in that case? Attempt to parse the configuration into an AST and operate over the AST? But in many programming languages, the AST can become arbitrarily complex. Behavior can be implemented in such a way as to make it difficult to discover or introspect.
Of course, YAML can become difficult to parse too. If the system consuming the YAML supports in-band signalling -- i.e. proprietary non-YAML directives -- then you would need to first normalize the YAML using that system to interpret and expand those signals. But in principle, that's still at least more tractable than trying to parse an AST.
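For illustration, a minimal sketch of the kind of policy gate described above, written in plain Python with PyYAML rather than OPA/Rego; the two rules (an explicit permissions block, no write-all) are stand-ins for whatever enterprise standards actually apply, not anything GitHub or OPA mandates.

# Sketch only: the sort of machine-checkable policy that a data-format config enables.
import glob
import sys
import yaml

failures = []
for path in glob.glob(".github/workflows/*.yml") + glob.glob(".github/workflows/*.yaml"):
    with open(path) as f:
        doc = yaml.safe_load(f) or {}
    perms = doc.get("permissions")
    if perms is None:
        failures.append(f"{path}: no top-level 'permissions' block")
    elif perms == "write-all":
        failures.append(f"{path}: 'permissions: write-all' is forbidden")

if failures:
    print("\n".join(failures))
    sys.exit(1)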
> If the system consuming the YAML supports in-band signalling -- i.e. proprietary non-YAML directives -- then you would need to first normalize the YAML using that system to interpret and expand those signals.
cough CloudFormation cough
Why do we think an arbitrary language is easier to reason about? If it was so easy you could just do it now. The yaml could be extremely simple and just call into your app, but most don't bother.
I'm certainly willing to believe that yaml is not the ideal answer but unless we're comparing it to a concrete alternative, I feel like this is just a "grass is always greener" type take.
Is it actually possible to just have the YAML that calls into your app today, without losing the granularity or other important features?
I am not sure you can do this whilst having the granular job reporting (i.e. either you need one YAML block per job or you have all your jobs in one single 'status' item?) Is it actually doable?
You don't have to reason about them.
You write a compiler that enforces stronger invariants above and beyond "everything is an array/string/list/number/pointer".
Good general-purpose programming languages provide type systems that do just this. It is criminal that the industry simply ignores this and chooses to use blobs of YAML/JSON/XML with disastrous results: creating ad-hoc programming languages without a type system in their chosen poison.
The real issue isn't the code part. You can just call into whatever arbitrary thing you want for the actual script part.
YAML is used for the declarative part of structuring the job graph. The host (in this case, GitHub) would need to call into your code to build the job graph. Which means it would need to compile your code, which means it needs its own build step. This means it would need to run on a build machine that burns billable minutes, because GitHub is not going to just run arbitrary code for free.
There's no guarantee that your arbitrary language is thread safe or idempotent so it can't really run in parallel like how a declarative file could be used.
So now you're in a situation where you add another spin-up and tear-down step even if your actual graph gen call is zero cost.
There's a reason it works the way it does.
There is https://github.com/SchemaStore/schemastore which is effectively a type system for yaml/json.
> You write a compiler that enforces stronger invariants above and beyond everything is an array/string/list/number/pointer.
https://www.reddit.com/r/funny/comments/eccj2/how_to_draw_an...
I've done exactly this a few times... ensure my scripting host is present then use scripts for everything. I can use the same scripts locally without issue and they work the same on self-hosted runners.
Note: mostly using Deno these days for this, though I will use .net/grate for db projects.
> If it was so easy you could just do it now.
Some do just that: dagger.io. It is not all roses but debugging is certainly easier.
There is a battle-tested example of YAML vs programming languages in CloudFormation templates vs CDK.
I don't think anybody serious has any argument in favor of CloudFormation templates.
GitHub Actions originally supported HCL (Hashicorp Configuration Language) instead of YAML. But the YAML force was too strong: https://github.blog/changelog/2019-09-17-github-actions-will....
HCL is the same s**, different smell. Equally hamstrung. It's the reason HashiCorp came out with an actually programmable version of the HCL semantics: CDKTF.
If you have worked with HCL in any serious capacity, you'll be happy they didn't go that route.
Here are some fun examples to see why HCL sucks:
- Create an if/elseif/else statement
- Do anything remotely complex with a for loop (tip: you're probably going to have to use `flatten` a lot)
Stuff like HCL and Ansible YAML makes me want to require mandatory training in Ant contrib tasks for developers creating them:
https://ant-contrib.sourceforge.net/tasks/tasks/if.html
<if>
<equals arg1="${foo}" arg2="bar" />
<then>
<echo message="The value of property foo is 'bar'" />
</then>
<elseif>
<equals arg1="${foo}" arg2="foo" />
<then>
<echo message="The value of property foo is 'foo'" />
</then>
</elseif>
<else>
<echo message="The value of property foo is not 'foo' or 'bar'" />
</else>
</if>
https://ant-contrib.sourceforge.net/tasks/tasks/for.html
Yes, programming with them was as fun as you're imagining.
A custom language in GHA would be worse. You'd be limited by whatever language they supported, and any problems with it would have to go through their support team. It adds more burden on GHA (they'd spend more time/money on support) without creating value (new features you want).
You already don't have to use YAML. Use whatever language you want to define the configuration, and then dump it as YAML. By using your own language and outputting YAML, you get to implement any solution you want, and GitHub gets to spend more cycles building features.
Simple example:
1. Create a couple inherited Python classes
2. Write class functions to enable/disable GHA features and validate them
3. Have the functions store data in the class object
4. Use a library to output the class as YAML
5. Now craft your GHA config by simply calling a Python object
6. Run code, save output file, apply to your repo
I don't know why nobody has made this yet, but it wouldn't be hard. Read GHA docs, write Python classes to match, output as YAML. If you want more than GHA feature support [via configuration], use the GHA API (https://docs.github.com/en/rest/actions) or the scripted workflows feature (https://github.com/actions/github-script).
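A minimal sketch of steps 1-6 above, assuming PyYAML for the final dump; the class names, methods, and the example workflow are illustrative, not an existing library.

# Plain Python classes that validate a little, then dump the YAML GitHub Actions expects.
import yaml

class Job:
    def __init__(self, runs_on="ubuntu-latest"):
        self.runs_on = runs_on
        self.steps = []

    def run(self, name, command):
        self.steps.append({"name": name, "run": command})
        return self

    def to_dict(self):
        if not self.steps:
            raise ValueError("a job needs at least one step")
        return {"runs-on": self.runs_on, "steps": self.steps}

class Workflow:
    def __init__(self, name, on):
        self.name = name
        self.on = on
        self.jobs = {}

    def job(self, job_id, **kwargs):
        self.jobs[job_id] = job = Job(**kwargs)
        return job

    def to_yaml(self):
        doc = {"name": self.name, "on": self.on,
               "jobs": {k: j.to_dict() for k, j in self.jobs.items()}}
        # PyYAML emits the 'on' key quoted, which is still valid YAML.
        return yaml.safe_dump(doc, sort_keys=False)

wf = Workflow("ci", on={"push": {"branches": ["main"]}})
wf.job("test").run("Install", "pip install -e .[test]").run("Test", "pytest")
print(wf.to_yaml())  # save the output as .github/workflows/ci.yml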
Hey. I'm currently making Typeflows to solve this (amongst a few other pain points), and am planning to make it available in JVM (this exists now)/TS and Python at least.
There are existing solutions around, but they do miss a bunch of things that are blatantly missing in the space:
- workflow visualisations (this is already working - you can see an example of workflow relationship and breakdowns on a non-trivial example at https://github.com/http4k/http4k/tree/master/.github/typeflo...);
- running workflows through an event simulator so you can tell cause and effect when it comes to what triggers what. Testing workflows anyone? :)
- security testing on workflows - to avoid the many footguns that there are in GHA around secrets etc;
- compliance tests around permitted Action versions;
- publishing of reusable repository files as binary dependencies that can be upgraded and compiled into your projects - including not just GHA actions and workflows but also things like version files, composable Copilot/Claude/Cursor instruction files;
- GitLab, CircleCI, Bitbucket, Azure DevOps support using the same approach and in multiple languages;
Early days yet, but am planning to make it free for OSS and paid for commercial users. I'm also dogfooding it on one of my other open source projects to make sure that it can handle non-trivial cases. Lots to do - and hopefully it will be valuable enough for commercial companies to pay for!
Wish me luck!
https://typeflows.io/
Have you seen https://typeflows.io/
It takes your programming language version and turns it into github actions yaml, so you don't need to do any of that sort of thing.
It has pricing tiers!? That's crazy, just use https://www.npmjs.com/package/github-actions-workflow-ts
Hey - Typeflows maintainer here. We know that there are other similar libraries out there that do some of the same things as Typeflows, but am hoping to go much much further than anything out there to help out teams struggling with their pipelines. Examples of things on the roadmap:
- workflow visualisations (this is already working - you can see an example of workflow relationship and breakdowns on a non-trivial example at https://github.com/http4k/http4k/tree/master/.github/typeflo...);
- running workflows through an event simulator so you can tell cause and effect when it comes to what triggers what;
- security testing on workflows - to avoid the many footguns that there are in GHA around secrets etc;
- compliance tests around permitted Action versions;
- publishing of reusable repository files as binary dependencies that can be upgraded and compiled into your projects - including not just GHA actions and workflows but also things like version files, composable Copilot/Claude/Cursor instruction files;
- GitLab, CircleCI, Bitbucket, Azure DevOps support using the same approach and in multiple languages;
Lots to do - and hopefully it will be valuable enough for commercial companies to pay for!
Its "pricing tiers" are "always free for OSS" and "TBD" for commercial use.
I like the things I depend on to actually have a funding model, so that's actually more appealing to me than something fully free.
Website barely loads on my old phone and I can't see any examples of the syntax.
Hey - maintainer here. Sorry about your bad experience and thanks for mentioning it! The Core Web Vitals test did come back ok - but evidently there's more to do, so will get that sorted. (Web design is not a strong point!) The code examples should be showing on smaller screens when in landscape on mobile (they looked awful in portrait) - but will also look at that as well!
Could I possibly ask you to reply with the model of your phone so I can make sure it works ok after I've fixed it?
I generate all my GH YAML files via Python. The thought of writing them by hand makes me want to vomit; one of the best design choices I ever made.
I agree. I like YAML for a lot of things, but this is very much not one of them. CI pipelines are sufficiently complex that you will very quickly exceed the capabilities of "it's just a simple plain text markup". You need a real programming language.
I couldn't agree more. I think we should just write our pipelines in languages our teams are familiar with and prioritise being able to run them locally.
> prioritise being able to run them locally.
That is the key function any serious CI platform needs to tackle to get me interested. FORCE me to write something that can run locally. I'll accept using containers, or maybe even VMs, but make sure that whatever I build for your server ALSO runs on my machine.
I absolutely detest working on GitHub Actions because all too often it ends up requiring that I create a new repo where I can commit to master (because for some reason everybody loves writing actions that only work on master). Which means I have to move all the fucking secrets too.
Solve that for me PLEASE. Don't give me more YAML features.
+1
Working with ADO pipelines is painful.
- Make change locally
- Push change
- Run pipeline
- Wait forever because ADO is slow
- Debug the error caused by some syntax issue in their bastardized version of yaml
- Repeat
Gitlab does this locally and I assume in their cloud.
Yes! Hopefully a language that supports code as data (homoiconicity).
Notable mentions are Zig build system and nob: https://github.com/tsoding/nob.h.
Jenkins supports Groovy DSL jobs. I would not say using it made anything easier.
Well, Groovy is a bit of a basket case programming language, so that doesn't help.
I say this as someone that built entire Jenkins Groovy frameworks for automating large Jenkins setups (think hundreds of nodes, thousands of Jenkins jobs, stuff like that).
You could make a builder to do this for you. It could build your actions in a pre-commit hook or whatever.
Although, I think it is generally an accepted practice to prefer declarative configuration over imperative configuration? In part, maybe that's what the article is getting at?
YAML is neither declarative nor imperative. It's just a tree (or graph, with references) serialization to text.
Basically what we ended up doing at work is creating some kind of YAML generator.
We write Bash or Python, and our tool will produce the YAML pipeline reflecting it.
So we don't need to maintain YAML with an over-complicated format.
The resulting YAML is not meant to be read by an actual human since it's absolute garbage, but the code we want to run is running when we want, without having to maintain the YAML.
And we can easily test it locally.
I work on a monorepo that does this using Typescript, for type checking. It's a mess. Huge learning curve for some type checking that very often will build perfectly fine but fail a type-check in CI.
Honestly, just having a linter should be enough. Ideally, anything complicated in your build should just be put into a script anyways - it minimizes the amount of lines in that massive YAML file and the potential for merge conflicts when making small changes.
I'm surprised by this take. I love YAML for this use case. Easy to write and read by hand, while also being easy to write and read with code in just about every language.
YAML is a serialization format. I like YAML as much as I like base64, that is I don't care about it unless you make me write it by hand, then I care very much.
GitHub Actions have a lot of rules, logic and multiple sublanguages in lots of places (e.g. conditions, shell scripts, etc.). YAML is completely superficial; XML would be an improvement due to less whitespace sensitivity alone.
Sure, easy to read, but quite difficult to /reason/ about in your head, let alone have proper language server/compiler support given the abstraction over provider events and runner state. I have never written a CI pipeline correctly without multiple iterations of pushing updates to the pipeline definition, and I don't think I'm alone on that.
Easy to write and read until it gets about a page or two long. Then you have to figure out stuff like "Oh gee, I'm on nesting layer 18, so that's... The object.... That is.... The array of.... The objects of....."
Plus it has exactly enough convenience-feature-related sharp edges to be risky to hand to a newbie, while wearing the dress of something that should be too bog-simple to have that problem. I, too, enjoy languages that arbitrarily decide the Norwegian TLD is actually a Boolean "false."
This is why I've become a fan of StrictYAML [0]. Of course it is not supported by many projects, but at least you are given the option to dispense with all the unnecessary features and their associated pitfalls in the context of your own projects.
Most notably it only offers three base types (scalar string, array, object) and moves the work of parsing values to stronger types (such as int8 or boolean) to your codebase where you tend to wrap values parsed from YAML into other types anyway.
Fewer surprises and headaches, but very niche, unfortunately.
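For the curious, a tiny sketch of what that looks like, assuming strictyaml's documented Map/Str/Int schema objects; the keys are made up for the example.

# StrictYAML approach: everything is a string unless your schema says otherwise,
# so "no" stays a string instead of becoming a boolean.
from strictyaml import load, Map, Str, Int

schema = Map({"country": Str(), "replicas": Int()})
doc = load("country: no\nreplicas: 3\n", schema)
print(doc.data)  # {'country': 'no', 'replicas': 3}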
That only matters if you're parsing the same yaml file with different parsers, which GitHub doesn't (and I doubt most people do - it's mostly used for config files)
This so much this.
VS Code has a very good syntax check for GitHub Actions YAML, so it's not YAML that's the problem.
It's the workflow for developing pipelines that's the problem. If I had something I could run locally - even in a debug, dry-run-only form - that would go a long way toward debugging job dependencies, testing that failure cases flow through conditional logic in the expected manner, etc.
I'm not convinced there should be anything to define at all versus basically just some extremely broad but bare platform and a slot to stick an executable in.
Yes. Most of my custom pipeline stuff is a thin wrapper around a normal-ass scripting-language because the yaml/macro stuff is so hard to check and debug.
Agreed. YAML is not a great format to begin with, but using it for anything slightly more sophisticated (looking at you Ansible, k8s, etc.) is an exercise in frustration.
I really enjoyed working with the Earthfile format[1] used for Earthly CI, which unfortunately seems like a dead end now. It's a mix of Dockerfile and Makefile, which made it very familiar to read and write. Best of all, it allowed running the pipeline locally exactly as it would run remotely, which made development and troubleshooting so much easier. The fact GH Actions doesn't have something equivalent is awful UX[2].
Honestly, I wish the industry hadn't settled on GitHub and GH Actions. We need better tooling and better stewards of open source than a giant corporation who has historically been hostile to open source.
I think YAML anchors in GitHub Actions are very welcome, for example for DRYing the push/pull_request 'paths:' filters [1].
Now if only they supported the paths filter for the `workflow_call` [2] event in addition to push/pull_request, my life would be a lot easier. Nontrivial repos have an unfortunate habit of building some sort of broken version of change detection themselves.
The limit of 20 unique workflow calls is quite low too, but searching the docs for a source, maybe they have removed it? It used to say
> You can call a maximum of 20 unique reusable workflows from a single workflow file.
but now it's max of 4 nested workflows without loops, which gives a lot of power for the more complex repos [3]. Ooh. Need to go test this.
This. So true. YAML has always been an overly complicated format, with weird quirks (like Norway becoming false in a list of country codes).
I find it an absolute shame that languages like Dhall did not become more popular earlier. Now everything in devops is YAML, and I think many developers pick YAML configs not for good reasons but because they treat its ubiquity as reason enough.
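The Norway quirk mentioned above is easy to reproduce with PyYAML, which still follows the YAML 1.1 boolean rules:

# The "Norway problem" in two lines.
import yaml

print(yaml.safe_load("countries: [de, no, se]"))    # {'countries': ['de', False, 'se']}
print(yaml.safe_load("countries: [de, 'no', se]"))  # {'countries': ['de', 'no', 'se']}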
If your data format is so complicated that all commonly used implementations are not compliant with your spec, maybe it's a problem with the data-format.
Every single implementation people actually use seems to be a messy mix of yaml 1.1 and 1.2....
Maybe if the yaml project wants to consider this fixed, they should have written some correct reference parsers themselves for any languages in need, and encouraged their use.
I noted this in reply to the comment above, but: the YAML 1.2 spec doesn't actually mandate that parsers use the Core Schema. They left it as a recommendation. So I don't consider it to be "fixed" at all.
I would not say it "fixed" the problem. It removed the _recommendation_ for parser implementations to use the regex `y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF` for parsing scalars as bools, it changed the _canonical_ presentation of bools from `y|n` to `true|false`, and it introduced the "schema" concept. It also introduced and recommended the use of the Core Schema, which uses `true|True|TRUE|false|False|FALSE` as the regex for inferring bools from scalars. But unsurprisingly, since using this schema is only a recommendation and not a requirement, many implementations elected to keep their backwards-compatible implementations that use the original regex.
The recommendation was what caused the Norway problem. It now strongly recommends not to do this, and it says that a YAML parser should use the core schema unless instructed otherwise. Going against the recommendation while saying that you're YAML 1.2 compliant feels like an issue that should be raised with the parser, to me. I've never run into this issue in practice though.
Is there a parser that says it's YAML 1.2 compliant that uses that regex? I don't know of one.
Why introduce an all-new language, like Dhall, just for configuration? This seems like a total waste of time. And you still need to use a "real" language (or bash) to write glue to connect to github configuration.
The config generators are very simple, and should be written in whatever language your developers already know - which likely means Python or Javascript or Go.
Having used CI systems and application frameworks that support YAML anchors for configuration, adding in a programming language would be a massive amount of complexity for very little gain. We're not talking about dozens of locations with hundreds of lines of shared code.
Asking the team to add a new build dependency, learn a new language, and add a new build step would create considerably more problems, not fewer. Used sparingly and as needed, YAML anchors are quite easy to read. A good editor will even allow you to jump to the source definition just as it would any other variable.
Being self-contained without any additional dependencies is a huge advantage, particularly for open source projects, IMHO. I'd wager very few people are going to learn Dhall in order to fix an issue with an open source project's CI.
Your team doesn't know YAML, it knows github actions. There's zero transferable knowledge when switching from github actions to kubernetes deployments, as there is precisely the same zero correlation between kubernetes and ansible configs. 'It's all YAML' is a lie and I'm continuously surprised so many people are falling for it for so long. YAML is the code-as-data, but the interpreter determines what it all means.
Oh, for goodness' sake. We know YAML syntax and that's the only part that's relevant here. Pointing out that different software uses different keys for their configuration or even takes different actions for keys that happen to share the same name isn't particularly insightful. We haven't been bamboozled.
I don’t understand even more now. If you freely admit they’re different languages, the only reason to keep using the stupid deficient syntax is momentum, and while it isn’t a bad reason, it is costing you and everyone else in the long run.
Huh? I'm using YAML because that's the language used to configure GitHub Actions. You may not like YAML, and that's fine. But if we collectively had to learn the unique way each project generates their GitHub Actions config, that would be a massive waste of time.
YAML isn't that hard. Most GitHub Actions configs I see are well under 500 lines; they're not crumbling under the weight of complexity.
Assembly isn't hard either and yet almost nobody is writing it anymore, for a reason, just as nobody (to an epsilon) is writing jvm opcodes directly. Somehow the industry decided assembly is actually fine when doing workflows.
I'm saying GHA should use a proper programming language instead of assembly.
Use the language you are already working in? Most languages have good YAML serialization and I think in most languages a function call taking a couple parameters that vary to produce slightly different but related objects is going to be as readable or more readable than YAML anchors.
That would be better, but it's an option I already have available to me and it's just not attractive. AFAIK, GitHub Actions requires the config files to be committed. So, now I need to guard against someone making local modifications to a generated file. It's doable of course, but by the time I've set all this up, it would have been much easier for everyone to copy and paste the six lines of code in the three places they're needed. YAML anchors solve that problem without really creating any new ones.
If generating your GitHub Actions config from a programming language works for you, fantastic. I'm just happy we now have another (IMHO, attractive) option.
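A sketch of the drift guard described above, run as a single CI step; build_workflow() is a hypothetical stand-in for however you actually generate the config, and the file path is just an example.

# Fail the build if the committed workflow no longer matches what the generator produces.
import pathlib
import sys
import yaml

def build_workflow():
    return {"name": "ci", "on": {"push": {"branches": ["main"]}},
            "jobs": {"test": {"runs-on": "ubuntu-latest",
                              "steps": [{"run": "pytest"}]}}}

expected = yaml.safe_dump(build_workflow(), sort_keys=False)
committed = pathlib.Path(".github/workflows/ci.yml").read_text()
if committed != expected:
    print("ci.yml is out of date; re-run the generator and commit the result")
    sys.exit(1)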
Most of the debate here is that a lot of us don't find YAML anchors attractive. It can be one of the papercuts of using YAML.
I mostly agree with the article that with GitHub Actions specifically, I try to refactor things to the top-level "workflow" level first, and then yeah resort to copy and paste in most other cases.
I'm a little less adamant that GitHub should remove anchor support again than the original poster, but I do sympathize greatly, having had to debug some CircleCI YAML and Helm charts making heavy use of YAML anchors. CircleCI's YAML is so bad I have explored options to build it with a build process. Yeah, it does create new problems and none of those explorations got far enough to really improve the process, but one of the pushes to explore them was certainly that YAML anchors are a mess to debug, especially when you've got some other tool concatenating YAML files together and can result in anchor conflicts (and also other parts of the same YAML that depend on a particular form of how anchor conflicts overwrite each other, oof). I don't see GitHub Actions necessarily getting that bad just by enabling anchors, but I have seen enough of where anchors become a crutch and a problem.
That's fair. And I'm not arguing that YAML anchors can never be a problem. I am saying that layering in a whole custom build system to handle a 250 line ci.yml file is not the trade-off I'd make. What I'd hazard to say most teams do in that situation is duplicate config, which is not without its own problems. I think YAML anchors is a fine solution for these cases and don't think they'll lead to total chaos. Alas, not all config options can be hoisted to a higher level and I'm trusting a team has explored that option when it's available.
If you're dealing with 10s of files that are 1000s of lines long, then YAML anchors may very well not be the ideal option. Having the choice lets each team find what works best for them.
Ouch. That sounds terrible with or without YAML anchors. GitHub Actions has overall been a great addition, allowing projects to integrate CI directly into their PR process. But, I never understood why it didn't have simpler paths for the very common use cases of CI and CD. Virtually any other dedicated CI product is considerably easier to bootstrap.
Or just use composite actions, it's not 2020 anymore.
Templating GitHub Actions is very powerful (I've worked with such a setup) but it has its own headaches and if you don't _need_ custom tooling better to not have it.
I can wish for improvements on the native setup without reaching out for the sledgehammer.
I think most of the pain with GitHub Actions goes away if you use actionlint, action-validator, prettier/yamlfmt in a single ci job to validate your setup. Can even add them as git hooks that automatically stage changes and give quick feedback when iterating.
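A rough sketch of wiring those linters into one gate, usable as a CI job or a git hook. Only actionlint is invoked here, with its default behavior of checking .github/workflows/; add action-validator or yamlfmt to the list with whatever flags your versions of those tools actually support.

# Single lint gate: run each configured tool if it is installed, fail if any fails.
import shutil
import subprocess
import sys

tools = [["actionlint"]]  # extend with your other linters/formatters as appropriate
failed = False
for cmd in tools:
    if shutil.which(cmd[0]) is None:
        print(f"skipping {cmd[0]}: not installed")
        continue
    if subprocess.run(cmd).returncode != 0:
        failed = True
sys.exit(1 if failed else 0)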
Just introduce a templating language and the CI to generate the yaml as part of your pipeline as a way to make it simpler?
Above a certain level of complexity, sure. But having nothing in between is an annoying state of affairs. I use anchors in Gitlab pipelines and I hardly curse their names.
I don't think this is a fair characterization: it's not that I don't have a use for it, but that I think the uses are redundant with existing functionality while also making static and human analysis of workflows harder.
The flattening out is the problem: most (all?) widely used YAML parsers represent the YAML document using the JSON object model, which means that there's no model or element representation of the anchors themselves.
That in turn means that there's no way to construct a source span back to the anchor itself, because the parsed representation doesn't know where the anchor came from (only that it was flattened).
This is something that a custom parser library could figure out, no? The same as how you have format-preserving TOML libraries, for instance.
I think it makes way more sense for GitHub to support YAML anchors given they are after all part of the YAML spec. Otherwise, don't call it YAML! (This was a criticism of mine for many years, I'm very glad they finally saw the light and rectified this bug)
> This is something that a custom parser library could figure out, no? The same as how you have format-preserving TOML libraries, for instance.
Yes, it's just difficult. The point made in the post isn't that it's impossible, but that it significantly changes the amount of "ground work" that static analysis tools have to do to produce useful results for GitHub Actions.
> I think it makes way more sense for GitHub to support YAML anchors given they are after all part of the YAML spec. Otherwise, don't call it YAML! (This was a criticism of mine for many years, I'm very glad they finally saw the light and rectified this bug)
It's worth noting that GitHub doesn't support other parts of the YAML spec: they intentionally use their own bespoke YAML parser, and they don't have the "Norway" problem because they intentionally don't apply the boolean value rules from YAML.
All in all, I think conformance with YAML is a red herring here: GitHub Actions is already its own thing, and that thing should be easy to analyze. Adding anchors makes it harder to analyze.
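To make the "ground work" point concrete: the anchor information isn't gone at the syntax level, but you have to drop below the usual safe_load to get it. A sketch using PyYAML's event API (yaml.parse), which does expose anchors, aliases, and their source marks; the document is a made-up example.

# Below the object model, the event stream still knows about anchors and aliases
# and where they appear - it's the loaded dict that forgets them.
import yaml

doc = """\
defaults: &defaults
  retries: 2
job:
  settings: *defaults
"""

for event in yaml.parse(doc):
    anchor = getattr(event, "anchor", None)
    if anchor:
        kind = "alias" if isinstance(event, yaml.events.AliasEvent) else "anchor"
        mark = event.start_mark
        print(f"{kind} '{anchor}' at line {mark.line + 1}, column {mark.column + 1}")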
maybe, but not entirely sure. 'Two wrongs don't make a right' kind of thinking on my side here.
But if they call it GFY and do what they want, then that would probably be better for everyone involved.
> they don't have the "Norway" problem because they intentionally don't apply the boolean value rules from YAML.
I think this is YAML 1.2. I have not done or seen a breakdown to see if GitHub is aiming for YAML 1.2 or not but they appear to think that way, given the discussion around merge keys
--
(though it's still not clear why flattening the YAML would not be sufficient for a static analysis tool. If the error report references a key that was actually merged out, I think users would still understand the report; it's not clear to me that's a bad thing actually)
> But if they call it GFY and do what they want, then that would probably be better for everyone involved.
Yes, agreed.
> I think this is YAML 1.2. I have not done or seen a breakdown to see if GitHub is aiming for YAML 1.2 or not but they appear to think that way, given the discussion around merge keys
I think GitHub has been pretty ambiguous about this: it's not clear to me at all that they intend to support either version of the spec explicitly. Part of the larger problem here is that programming language ecosystems as a whole don't consistently support either 1.1 or 1.2, so GitHub is (I expect) attempting to strike a happy balance between their own engineering goals and what common language implementations of YAML actually parse (and how they parse it). None of this makes for a great conformance story :-)
> (though it's still not clear why flattening the YAML would not be sufficient for a static analysis tool. If the error report references a key that was actually merged out, I think users would still understand the report; it's not clear to me that's a bad thing actually)
The error report includes source spans, so the tool needs to map back to the original location of the anchor rather than its unrolled document position.
(This is table stakes for integration with formats like SARIF, which expect static analysis results to have physical source locations. It's not good enough to just say "there's a bug in this element and you need to find out where that's introduced," unfortunately.)
I think the main reason you see overwhelming support for anchors is that the existing Actions functionality is typically so cumbersome to implement and often makes it harder to understand a workflow. Anchor syntax is a little esoteric, but otherwise very simple and grokable.
To be clear, I understand why people want to use anchors. The argument isn't that they aren't useful: it's that the juice is not worth the squeeze, and that GitHub's decision to support them reflects a lack of design discretion.
Or in other words: if your problem is DRYness, GitHub should be fixing or enhancing the ~dozen other ways in which the components of a workflow shadow and scope with each other. Adding a new cross-cutting form of interaction between components makes the overall experience of using GitHub Actions less consistent (and less secure, per points about static analysis challenges) at the benefit of a small amount of deduplication.
No; GitHub shouldn't support YAML anchors because it's a deviation from the status quo, and the argument is specifically that the actions ecosystem doesn't need to make analysis any harder than it already is.
(As the post notes, neither I nor GitHub appears to see full compliance with YAML 1.1 to be an important goal: they still don't support merge keys, and I'm sure they don't support all kinds of minutiae like non-primitive keys that make YAML uniquely annoying to analyze. Conforming to a complex specification is not inherently a good thing; sometimes good engineering taste dictates that only a subset should be implemented.)
> No; GitHub shouldn't support YAML anchors because it's a deviation from the status quo, and the argument is specifically that the actions ecosystem doesn't need to make analysis any harder than it already is.
>
> (As the post notes, neither I nor GitHub appears to see full compliance with YAML 1.1 to be an important goal: they still don't support merge keys, and I'm sure they don't support all kinds of minutiae like non-primitive keys that make YAML uniquely annoying to analyze. Conforming to a complex specification is not inherently a good thing; sometimes good engineering taste dictates that only a subset should be implemented.)
"Because I don't like it" makes it sound like I don't have a technical argument here, which I do. Do you think it's polite or charitable to reduce peoples' technical arguments into "yuck or yum" statements like this?
> Conforming to a complex specification is not inherently a good thing
Kind of a hard disagree here; if you don't want to conform to a specification, don't claim that you're accepting documents from that specification. Call it github-flavored YAML (GFY) or something and accept a different file extension.
> YAML 1.1 to be an important goal: they still don't support merge keys
right, they don't do merge keys because it's not in YAML 1.2 anymore. Anchors are, however. They haven't said that noncompliance with YAML 1.2 spec is intentional
> Call it github-flavored YAML (GFY) or something and accept a different file extension.
Sure, I wouldn't be upset if they did this.
To be clear: there aren't many fully conforming YAML 1.1 and 1.2 parsers out there: virtually all YAML parsers accept some subset of one or the other (sometimes a subset of both), and virtually all of them emit the JSON object model instead of the internal YAML one.
is your criticism leveled at yaml anchors or github? in my anecdotal experience, yaml anchors were a huge help (and really, really not hard to grasp at a conceptual level) in maintaining uniform build processes across environments.
It is specifically leveled at YAML anchors in GitHub. I don't have a super strong opinion of YAML anchors in other contexts.
(This post is written from my perspective as a static analysis tool author. It's my opinion from that perspective that the benefits of anchors are not worth their costs in the specific context of GitHub Actions, for the reasons mentioned in the post.)
"YAML" should mean something. When I saw GitHub Actions supported "YAML", I thought "OK, certainly not my favorite, but I can deal with that", and so I read the YAML specification, saw anchors, and then had to realize the hard way that they didn't work on GitHub Actions, leaving me unsure what even would or wouldn't work going forward. Is this even the only way it differs? I don't know, as they apparently don't use YAML :/.
This also means that, if you use an off-the-shelf implementation to parse these files, you're "doing it wrong", as you are introducing a parser differential: I can put code in one of these files that one tool uses and another tool ignores. (Hopefully, the file just gets entirely rejected if I use the feature, but I do not remember what the experience I had was when I tried using the feature myself; but, even that is a security issue.)
> Except: GitHub Actions doesn’t support merge keys! They appear to be using their own internal YAML parser that already had some degree of support for anchors and references, but not for merge keys.
Well, hopefully they also prioritize fixing that? Doing what GitHub did, is apparently still doing, and what you are wanting them to keep doing (just only in your specific way) is not actually using "YAML": it is making a new bespoke syntax that looks a bit like YAML and then insisting on calling it "YAML" even though it isn't actually YAML and you can neither read the YAML documentation nor use off-the-shelf YAML libraries.
Regardless, it sounds like your tool already supports YAML anchors, as your off-the-shelf implementation of YAML (correctly) supports YAML anchors. You are upset that this implementation doesn't provide you source map attribution: that was also a problem with C preprocessors for a long time, but that can and should be fixed inside of the parser, not by deciding the language feature shouldn't exist because of library limitations.
but there isn't a single YAML spec, there are at least 2 in common use: yaml 1.1, and 1.2, which have discrete specs and feature-sets. re: anchor stuff specifically, 1.1 supports merge keys whereas 1.2 explicitly does not, so that's one thing
and github actions does not actually specify which yaml spec/version it uses when parsing workflow yaml files
it's unfortunately just not the case that "YAML means something" that is well-defined in the sense that you mean here
Sure, agreed. Another comment notes that GitHub probably should call this their own proprietary subset of YAML, and I wouldn't object to that.
> Well, hopefully they also prioritize fixing that?
I expect they won't, since it's not clear what version of YAML they even aim to be compatible with.
However, I don't understand why engineers who wouldn't jump off of a bridge because someone told them to would follow a spec to the dot just because it exists. Specifications are sometimes complicated and bad, and implementing a subset is sometimes the right thing to do!
GitHub Actions, for example, doesn't make use of the fact that YAML is actually a multi-document format, and most YAML libraries don't gracefully handle multiple documents in a single YAML stream. Should GitHub Actions support this? It's entirely unclear to me that there would be any value in them doing so; subsets are frequently the right engineering choice.
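For readers who haven't seen the multi-document feature in the wild, a quick sketch with PyYAML; the content is invented for the example.

# YAML streams can hold several documents separated by '---'.
import yaml

stream = """\
name: first
---
name: second
"""

print([d["name"] for d in yaml.safe_load_all(stream)])  # ['first', 'second']
# yaml.safe_load(stream) would raise here: it expects a single document.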
The argument that this is a security issue isn't very well fleshed out either. As far as I can tell, it boils down to his opinion that this makes YAML harder to read and thus less secure. But, the reality is we have to copy & paste config today and that's a process I've seen fail when a change needs to be made and isn't properly carried forward to all locations. I suppose I could argue that's a security concern as well.
Half the argument against supporting YAML anchors appears to boil down to some level of tool breakage. While you can rely on simplifying assumptions, you take a risk that your software breaks when that assumption is invalidated. I don't think that's a reason to stop evolving software.
I've never seen a project use any of the tools the author listed, but I have seen duplicated config. That's not to say the tools have no value, but rather I don't want to be artificially restricted to better support tools I don't use. I'll grant that the inability to merge keys isn't ideal but, I'll take what I can get.
Common Lisp has this (and other dialects that imitate the circle notation; I think Scheme has it now, Emacs Lisp, TXR Lisp):
E.g.
(#1=(a b) c d e #1#)
encodes
((a b) c d e (a b))
where the two (a b) occurrences are one object. It can express circular structures:
#1=(a b c . #1#)
encodes an infinite circular list
(a b c a b c a b c ...)
The object to be duplicated is prefixed with #<decimal-integer>=. This associates the object with the integer. The integer is later referenced as #<decimal-integer># to replicate it.
The thing is, you don't see a lot of this in human-written files, whether they are source code or data.
This is not the primary way that Lisp systems use for specifying replicated data in configurations, let alone code.
Substructure sharing occurs whether you use the notation or not due to interned symbols. (Plus compilers can deduplicate strings and such.) In (a a a) there is only one object a, a symbol.
If you feed the implementation circular source code though, ANSI CL says the behavior is undefined. Some interpreters can handle it under the right circumstances. In particular ones that don't try to do a full macro-expanding code walk before running the code. Compilers, not so much.
YAML anchors may be a sharp tool but no one is forced to use them. I have written many verbose Github workflows that would have benefited from using anchors, and I am relieved to learn I can clean those up now.
I disagree. Instead of anchors we had to rely on third party file change actions to only trigger jobs on certain file path changes, instead of using the built in mechanism, because each job required the list, and the list was long.
Using anchors would have improved the security of this, as well as the maintenance. The examples cited don't remotely demonstrate the cases where anchors would have been useful in GA.
I agree that YAML is a poor choice of format regardless but still, anchor support would have benefitted a number of projects ages ago.
I don't think anchors' primary function is to allow global definitions (of variables or whatever); rather, it's more like arbitrary templates/snippets to be reused throughout the YAML file.
In GitLab, where YAML anchors have been supported for years, I personally find them very useful - it's the only way of "code" reuse, really. In GitLab there's a special editor just for .gitlab-ci.yml, which shows the original view and the combined read-only view (with all anchors expanded).
I agree that it's hard to point to the specific line of the source code, but it's enough — in case of an error — to output an action name, action property name, and actual property value that caused an error. Based on these three things, a developer can easily find the correct line.
Wrote a new yaml grepping tool this past weekend and just realized thanks to this that I have a whole new can of worms to keep in mind. Ugh.
Turns out it does report values at their targets (which is desirable) but doesn't know or indicate that they're anchors (undesirable).
Also tested something with yq - if you tell it to edit a node that is actually from a yaml anchor, it updates the original anchor without warning you that that's what you're doing. Yikes.
Anchors will be exceptionally useful for a few workflows for me. We have what is essentially the same setup/teardown for three jobs within one workflow. I’d love to be able to factor that stuff out without introducing yet another yaml file to the repo, this will be a big help.
A very pedantic point, but merge keys are not part of the YAML spec [1]! Merge keys are a custom type [2], which may optionally be applied during the construction phase of loading. I definitely wouldn't say that merge keys are integral to anchors.
(Also, as a personal bias, merge keys are really bad because they are ambiguous, and I haven't implemented them in my C++ yaml library (yaml-cpp) because of that.)
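For readers who haven't met merge keys, a small sketch of what one does, at least as PyYAML's loader happens to interpret it (explicitly written keys win over the merged ones - exactly the kind of override semantics the parent considers ambiguous); the document is a made-up example.

# '<<' pulls in the anchored mapping; keys spelled out locally take precedence.
import yaml

doc = """\
base: &base
  image: ubuntu-22.04
  retries: 2
job:
  <<: *base
  retries: 5
"""

print(yaml.safe_load(doc)["job"])  # {'image': 'ubuntu-22.04', 'retries': 5}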
Yeah, I find the situation here very confusing: I agree that merge keys are not part of YAML 1.2, but they are part of YAML 1.1. The reason they don't appear to be in the "main" 1.1 spec itself is because they were added to 1.1 after 1.1 was already deprecated[1].
I have a theory that in such cases one might as well just give up and write a Turing-complete language in the first place, as the sort of Turing-complete languages we get from these "slowly but surely backed against the wall" situations are way worse than just starting from scratch.
I hypothesize a Turing-complete language for something like CSS that included deep tracking under the hood for where values are coming from and where they are going would be very useful, i.e., you would have the ability to point at a particular part of the final output and the language runtime could give you a complete accounting of where it came from and what went into making the decisions. That could end up giving us the auditability that we really want from the "declarative" languages while giving us the full power of the programming languages we clearly want. However I don't have the time to try to manifest such a thing myself, and I don't know of any existing language that does what I'm thinking of. Some of the more powerful languages could theoretically do it as a library. It's not entirely unlike the auditing monad I mention towards the end of https://www.jerf.org/iri/post/2958/ . It's not something I'd expect a general-purpose language to do by default since it would have bad general-purpose performance, but I think for specialized cases of a Turing-complete configuration language it could have value, and one could always run it as a debugging option and have an optimized code path that didn't track the sources of everything.
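A toy sketch of the value-provenance idea described above - nothing like a real implementation, and all names are invented: every value carries where it was defined, so the final output can explain itself.

# Each value remembers its source and what it was combined from.
class Traced:
    def __init__(self, value, source, parents=()):
        self.value, self.source, self.parents = value, source, parents

    def combine(self, other, op, how):
        return Traced(op(self.value, other.value), how, (self, other))

    def explain(self, indent=0):
        print("  " * indent + f"{self.value!r}  <- {self.source}")
        for parent in self.parents:
            parent.explain(indent + 1)

base = Traced("ubuntu", "defaults.yml:3")
tag = Traced("22.04", "ci.yml:12")
image = base.combine(tag, lambda a, b: f"{a}-{b}", "concat in job 'test'")
image.explain()
# 'ubuntu-22.04'  <- concat in job 'test'
#   'ubuntu'  <- defaults.yml:3
#   '22.04'  <- ci.yml:12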
That's my thought as well. I predict we'll be seeing SDK's for generating github workflows by mid-2026. Maybe pulumi will get an extension for it. (I'm well aware that codegen yaml has been a thing for a long time, but I'm thinking about something targeting github workflows _specifically_.)
TBH it's getting a bit exhausting watching us go through this hamster wheel again and again and again.
Anchors are so, so useful. Buildkite (which has its own CI pipelines syntax) is a good example. Let's say I want every pipeline step to run on my custom agents (on my self-hosted infra). I could either copy/paste an identical "agents" property across however many hundreds or thousands of CI steps I have, or define it once as an anchor and reference it from every step.
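What that reuse looks like in practice, loaded with PyYAML to show both steps picking up the shared block; the queue/agents keys are just illustrative Buildkite-flavored names.

# The anchor version of the copy/paste described above.
import yaml

pipeline = """\
agents: &agents
  queue: self-hosted-linux
steps:
  - label: test
    command: make test
    agents: *agents
  - label: deploy
    command: make deploy
    agents: *agents
"""

loaded = yaml.safe_load(pipeline)
print([step["agents"] for step in loaded["steps"]])
# [{'queue': 'self-hosted-linux'}, {'queue': 'self-hosted-linux'}]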
Author's replacement for anchors is to use global syntax, like a top-level "env:" block.
This is terrible advice from a security standpoint - given that env variables are often used for secrets, you really _don't_ want to set them at the top level. Secrets should be scoped as narrowly as possible!
For example, if you have a few jobs, and some of them need to download some data in the first step (which needs a secret), then your choices are (a) copy-paste the "env" block 3 times, once per job, (b) use the new YAML anchors, or (c) set the secret at top-level scope. It is pretty clear to me that (c) is the worst idea, security-wise - it makes the secret available to every step in the workflow, making it much easier for malware to exfiltrate.
I agree. OP’s statement ”the need to template environment variables across a subset of jobs suggests an architectural error in the workflow design” does not ring true for cases where you want developers to be able to quickly deploy a separate environment for each development branch, especially if said branch needs to connect to a matching backend/API/other service.
First, he can just not use the feature, not advocate for its removal.
Second, his example alternative is wrong: it would set variables for all steps, not just those 2; he didn't think of a scenario where there are 3 steps and you need common envs in just 2 of them.
> First, he can just not use the feature, not advocate for its removal.
I maintain a tool that ~thousands of projects use to analyze their workflows and actions. I can avoid using anchors, but I can't avoid downstreams using them. That's why the post focuses on static analysis challenges.
> Second, his example alternative is wrong: it would set variables for all steps, not just those 2, he didn't think of a scenario where there are 3 steps and you need to have common envs in just 2 of them.
This is explicitly addressed immediately below the example.
Just because it is expressed in YAML doesn't make YAML the party to blame here. I would say one of the main concerns I have with anything in GitHub Actions related to the word "merge" has to do with identifying the last commit for a merge, not merging of objects in YAML.
If you have two workflows... one to handle a PR creation/update and another to address the merge operation, it is like pulling teeth to get the final commit properly identified so you can grab any uploaded artifacts from the PR workflow.
Ok, now make the 'redundancy' argument with anything other than `env` or `permissions`.
I think they should be supported because it's surprising and confusing if you start saying 'actually, it's a proprietary subset of YAML', no more reason needed than that.
Obviously they are very useful. I still don't think they should exist in this usage of yaml.
Once you allow setting and reading of variables in a configuration file, you lose the safety that makes the format useful. You might as well be using a bash script at that point.
You already can set and read variables. The `matrix` section is often used to test against multiple versions of software. Environment variables can be referenced. And the project configuration supports both secrets and variables configured at the project level.
Honestly, everything about GH actions/AzDO pipelines is infuriating. The poor tooling with poor write-time assertions are just so frustrating.
Give me a proper platform that I can run locally on my development machine.
But what if those anchors are a blessed standard YAML feature that YAML tools will provide real assertions about, unlike the ${{ }} stuff, where you're basically doing commit-push-run-wait without any proper debug tools besides prints?
Can YAML go away entirely and instead allow pipelines to be defined with an actual language? What benefits does the runner-interpreted yaml-defined pipeline paradigm actually achieve? Especially with runners that can't be executed and tested locally, working with them is a nightmare.
I agree somewhat with the proposition that YAML is annoying for configuring something like a workflow engine (CI systems) or Kubernetes. But having it defined in YAML is actually preferable in an enterprise context. It makes it trivial to run something like OPA policy against the configuration so that enterprise standards and governance can be enforced.
When something is written in a real programming language (that doesn't just compile down to YAML or some other data format), this becomes much more challenging. What should you do in that case? Attempt to parse the configuration into an AST and operate over the AST? But in many programming languages, the AST can become arbitrarily complex. Behavior can be implemented in such a way as to make it difficult to discover or introspect.
Of course, YAML can also become difficult to parse too. If the system consuming the YAML supports in-band signalling -- i.e. proprietary non-YAML directives -- then you would need to first normalize the YAML using that system to interpret and expand those signals. But in principal, that's still at least more tractable than trying to parse an AST.
> If the system consuming the YAML supports in-band signalling -- i.e. proprietary non-YAML directives -- then you would need to first normalize the YAML using that system to interpret and expand those signals.
cough CloudFormation cough
Why do we think an arbitrary language is easier to reason about? If it was so easy you could just do it now. The yaml could be extremely simple and just call into your app, but most don't bother.
I'm certainly willing to believe that yaml is not the ideal answer but unless we're comparing it to a concrete alternative, I feel like this is just a "grass is always greener" type take.
Is it actually possible to just have the YAML that calls into your app today, without losing the granularity or other important features?
I am not sure you can do this whilst having the granular job reporting (i.e. either you need one YAML block per job or you have all your jobs in one single 'status' item?) Is it actually doable?
You don't have to reason about them.
You write a compiler that enforces stronger invariants above and beyond everything is an array/string/list/number/pointer.
Good general-purpose programming languages provide type systems that do just this. It is criminal that the industry simply ignores this and chooses to use blobs of YAML/JSON/XML with disastrous results---creating ad-hoc programming languages without a typesystem in their chosen poison.
The real issue isn't the code part. You can just call into whatever arbitrary thing you want for the actual script part.
YAML is used for the declarative part of structuring the job graph. The host (in this case, GitHub) would need to call into your code to build the job graph. Which means it would need to compile your code, which means it needs its own build step. This means it would need to run on a build machine that uses minutes because GitHub is not going to just run arbitrary code for free.
There's no guarantee that your arbitrary language is thread safe or idempotent so it can't really run in parallel like how a declarative file could be used.
So now you're in a situation where you add another spin up and tear down step even if your actual graph gen call is zero cost.
There's a reason it works the way it does.
There is https://github.com/SchemaStore/schemastore which is effectively a type system for yaml/json.
> You write a compiler that enforces stronger invariants above and beyond everything is an array/string/list/number/pointer.
https://www.reddit.com/r/funny/comments/eccj2/how_to_draw_an...
I've done exactly this a few times... ensure my scripting host is present then use scripts for everything. I can use the same scripts locally without issue and they work the same on self-hosted runners.
Note: mostly using Deno these days for this, though I will use .net/grate for db projects.
> If it was so easy you could just do it now.
Some do just that: dagger.io. It is not all roses but debugging is certainly easier.
There is a battle tested example of YAML vs programming languages in CloudFormation templates vs CDK.
I don't think anybody serious has any argument in favor of CloudFormation templates.
GitHub Actions originally supported HCL (Hashicorp Configuration Language) instead of YAML. But the YAML force was too strong: https://github.blog/changelog/2019-09-17-github-actions-will....
HCL is same s**, different smell. Equally hamstrung. It’s the reason hashicorp came out with an actually programmable version of the hcl semantics: CDKTF.
If you have worked with HCL in any serious capacity, you'll be happy they didn't go that route.
Here's some fun examples to see why HCL sucks:
- Create an if/elseif/else statement
- Do anything remotely complex with a for loop (tip: you're probably going to have to use `flatten` a lot)
Stuff like HCL and Ansible YAML makes me want to require mandatory training in Ant contrib tasks for developers creating them:
https://ant-contrib.sourceforge.net/tasks/tasks/if.html
</if>https://ant-contrib.sourceforge.net/tasks/tasks/for.html
Yes, programming with them was as fun as you're imagining.A custom language in GHA would be worse. You'd be limited by whatever language they supported, and any problems with it would have to go through their support team. It adds more burden on GHA (they spending more time/money on support) without creating value (new features you want).
You already don't have to use YAML. Use whatever language you want to define the configuration, and then dump it as YAML. By using your own language and outputting YAML, you get to implement any solution you want, and GitHub gets to spend more cycles building features.
Simple example:
I don't know why nobody has made this yet, but it wouldn't be hard. Read GHA docs, write Python classes to match, output as YAML.If you want more than GHA features support [via configuration], use the GHA API (https://docs.github.com/en/rest/actions) or scripted workflows feature (https://github.com/actions/github-script).
Hey. I'm currently making Typeflows to solve this (amongst) another few pain points, and am planning to make it available in JVM (this exists now)/TS and Python at least.
There are existing solutions around, but do miss out a bunch of things that are blatantly missing in the space:
- workflow visualisations (this is already working - you can see an example of workflow relationship and breakdowns on a non-trivial example at https://github.com/http4k/http4k/tree/master/.github/typeflo...);
- running workflows through an event simulator so you can tell cause and effect when it comes to what triggers what. Testing workflows anyone? :)
- security testing on workflows - to avoid the many footguns that there are in GHA around secrets etc;
- compliance tests around permitted Action versions;
- publishing of reusable repository files as binary dependencies that can be upgraded and compiled into your projects - including not just GHA actions and workflows but also things like version files, composable Copilot/Claude/Cursor instruction files;
- GitLab, CircleCI, Bitbucket, Azure DevOps support using the same approach and in multiple languages;
Early days yet, but am planning to make it free for OSS and paid for commercial users. I'm also dogfooding it on one of my other open source projects so to make sure that it can handle non-trivial cases. Lots to do - and hopefully it will be valuable enough for commercial companies to pay for!
Wish me luck!
https://typeflows.io/
Have you seen https://typeflows.io/
It takes your programming language version and turns it into github actions yaml, so you dont need to do any of that sort of thing.
It has pricing tiers!? That's crazy, just use https://www.npmjs.com/package/github-actions-workflow-ts
Hey - Typeflows maintainer here. We know that there are other similar libraries out there that do some of the same thing as Typeflows, but am hoping to go much much further than anything out there to help out teams struggling with their pipelines. Examples of things on the roadmap:
- workflow visualisations (this is already working - you can see an example of workflow relationship and breakdowns on a non-trivial example at https://github.com/http4k/http4k/tree/master/.github/typeflo...);
- running workflows through an event simulator so you can tell cause and effect when it comes to what triggers what; - security testing on workflows - to avoid the many footguns that there are in GHA around secrets etc;
- compliance tests around permitted Action versions;
- publishing of reusable repository files as binary dependencies that can be upgraded and compiled into your projects - including not just GHA actions and workflows but also things like version files, composable Copilot/Claude/Cursor instruction files;
- GitLab, CircleCI, Bitbucket, Azure DevOps support using the same approach and in multiple languages;
Lots to do - and hopefully it will be valuable enough for commercial companies to pay for!
:)
Its "pricing tiers" are "always free for OSS" and "TBD" for commercial use.
I like the things I depend on to actually have a funding model, so that's actually more appealing to me than something fully free.
Website barely loads on my old phone and I can't see any examples of the syntax.
Hey - maintainer here. Sorry about your bad experience and thanks for mentioning it! The Core Web Vitals test did come back ok - but evidently there's more to do so will get that sorted. (Web design not a strong point! ). The code examples should be showing on smaller screens when in landscape on mobile (they looked awful in portrait) - but will also look at that as well!
Could I possibly ask you to reply with the model of your phone so can make sure it works ok after have fixed?
I generate all my GH YAML files via Python. The thought of writing them by hand makes me want to vomit; it was one of the best design choices I ever made.
I agree. I like YAML for a lot of things, but this is very much not one of them. CI pipelines are sufficiently complex that you will very quickly exceed the capabilities of "it's just a simple plain text markup". You need a real programming language.
I couldn’t agree more. I think we should just write our pipelines in languages our teams are familiar with and prioritise being able to run them locally.
> prioritise being able to run them locally.
That is the key function any serious CI platform needs to tackle to get me interested. FORCE me to write something that can run locally. I'll accept using containers, or maybe even VMs, but make sure that whatever I build for your server ALSO runs on my machine.
I absolutely detest working on GitHub Actions because all too often it ends up requiring that I create a new repo where I can commit to master (because for some reason everybody loves writing actions that only work on master). Which means I have to move all the fucking secrets too.
Solve that for me PLEASE. Don't give me more YAML features.
+1
Working with ADO pipelines is painful.
- Make change locally
- Push change
- Run pipeline
- Wait forever because ADO is slow
- Debug the error caused by some syntax issue in their bastardized version of yaml
- Repeat
GitLab does this locally, and I assume in their cloud too.
Yes! Hopefully a language that supports code as data (homoiconicity).
Notable mentions are the Zig build system and nob: https://github.com/tsoding/nob.h.
jenkins supports groovy dsl jobs. I would not say using it made anything easier
Well, Groovy is a bit of a basket case programming language, so that doesn't help.
I say this as someone that built entire Jenkins Groovy frameworks for automating large Jenkins setups (think hundreds of nodes, thousands of Jenkins jobs, stuff like that).
You could make a builder to do this for you. It could build your actions in a pre-commit hook or whatever.
Although, I think it is generally accepted practice to prefer declarative configuration over imperative configuration? Maybe that's part of what the article is getting at?
YAML is neither declarative nor imperative. It's just a tree (or graph, with references) serialization to text.
Basically what we ended up doing at work is creating some kind of YAML generator.
We write Bash or Python, and our tool will produce the YAML pipeline reflecting it.
So we don't need to maintain YAML in an over-complicated format.
The resulting YAML is not meant to be read by an actual human, since it's absolute garbage, but the code we want to run runs when we want, without us having to maintain the YAML.
And we can easily test it locally.
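The generator itself can be tiny. A rough sketch of the idea with PyYAML (the job names and script paths here are made up for illustration, not our actual setup):

    # generate_ci.py -- regenerate .github/workflows/ci.yml; never edit the YAML by hand.
    import yaml

    def run_script(name: str, script: str) -> dict:
        # One job: check out the repo, then run a script that also works locally.
        return {
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v4"},
                {"name": name, "run": f"./ci/{script}"},
            ],
        }

    workflow = {
        "name": "ci",
        # PyYAML quotes the "on" key (because plain "on" is a 1.1 boolean); GitHub accepts it.
        "on": {"push": {"branches": ["main"]}, "pull_request": {"branches": ["main"]}},
        "jobs": {
            "lint": run_script("Lint", "lint.sh"),
            "test": run_script("Test", "test.sh"),
        },
    }

    with open(".github/workflows/ci.yml", "w") as f:
        yaml.safe_dump(workflow, f, sort_keys=False)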
I work on a monorepo that does this using Typescript, for type checking. It's a mess. Huge learning curve for some type checking that very often will build perfectly fine but fail a type-check in CI.
Honestly, just having a linter should be enough. Ideally, anything complicated in your build should just be put into a script anyways - it minimizes the amount of lines in that massive YAML file and the potential for merge conflicts when making small changes.
I'm surprised by this take. I love YAML for this use case. Easy to write and read by hand, while also being easy to write and read with code in just about every language.
YAML is a serialization format. I like YAML as much as I like base64, that is I don't care about it unless you make me write it by hand, then I care very much.
GitHub Actions has a lot of rules, logic, and multiple sublanguages in lots of places (e.g. conditions, shell scripts, etc.). YAML is completely superficial; XML would be an improvement due to less whitespace sensitivity alone.
Sure, easy to read, but quite difficult to /reason/ about in your head, let alone have proper language server/compiler support given the abstraction over provider events and runner state. I have never written a CI pipeline correctly without multiple iterations of pushing updates to the pipeline definition, and I don't think I'm alone on that.
Easy to write and read until it gets about a page or two long. Then you have to figure out stuff like "Oh gee, I'm on nesting layer 18, so that's... The object.... That is.... The array of.... The objects of....."
Plus it has exactly enough convenience-feature-related sharp edges to be risky to hand to a newbie, while wearing the dress of something that should be too bog-simple to have that problem. I, too, enjoy languages that arbitrarily decide the Norwegian TLD is actually a Boolean "false."
> Easy to write and read by hand, while also being easy to write and read with code in just about every language
Language implementations for yaml vary _wildly_.
What does the following parse as:
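    some_map:
      no: cap
      key: value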
If I google "yaml online" and paste it in, one gives me:
{'some_map': {False: 'cap', 'key': 'value'}}
The other gives me:
{'some_map': {'false': 'cap', 'key': 'value'}}
... and neither gives what a human probably intended, huh?
This is why I've become a fan of StrictYAML [0]. Of course it is not supported by many projects, but at least you are given the option to dispense with all the unnecessary features and their associated pitfalls in the context of your own projects.
Most notably it only offers three base types (scalar string, array, object) and moves the work of parsing values to stronger types (such as int8 or boolean) to your codebase where you tend to wrap values parsed from YAML into other types anyway.
Fewer surprises and headaches, but very niche, unfortunately.
[0] https://hitchdev.com/strictyaml/
That only matters if you're parsing the same yaml file with different parsers, which GitHub doesn't (and I doubt most people do - it's mostly used for config files)
“The meaning of YAML is implementation-defined” is a big reason I stay far away whenever I can.
The classic Norway bug
It's less about YAML itself than the MS yaml-based API for interacting with build-servers. It's just so hard to check and test and debug.
This, so much this. VS Code has a very good syntax checker for GitHub Actions YAML, so it's not YAML that's the problem.
It's the workflow for developing pipelines that's the problem. If I had something I could run locally - even in a debug, dry-run-only form - that would go a long way toward debugging job dependencies, testing that failure cases flow through the conditional logic in the expected manner, etc.
I'm not convinced there should be anything to define at all versus basically just some extremely broad but bare platform and a slot to stick an executable in.
Yes. Most of my custom pipeline stuff is a thin wrapper around a normal-ass scripting-language because the yaml/macro stuff is so hard to check and debug.
Being hard to test locally is the point. Lock-in.
Agreed. YAML is not a great format to begin with, but using it for anything slightly more sophisticated (looking at you Ansible, k8s, etc.) is an exercise in frustration.
I really enjoyed working with the Earthfile format[1] used for Earthly CI, which unfortunately seems like a dead end now. It's a mix of Dockerfile and Makefile, which made it very familiar to read and write. Best of all, it allowed running the pipeline locally exactly as it would run remotely, which made development and troubleshooting so much easier. The fact GH Actions doesn't have something equivalent is awful UX[2].
Honestly, I wish the industry hadn't settled on GitHub and GH Actions. We need better tooling and better stewards of open source than a giant corporation who has historically been hostile to open source.
[1]: https://earthly.dev/earthfile
[2]: Yes, I'm aware of `act`, but I've had nothing but issues with it.
I use CUE and generate the YAML; I don't care what a giant unreadable slop it is anymore.
I use CUE to read yamhell too
Wouldn't Terraform solve this? You can have all your infrastructure as code in a git repo.
I think YAML anchors in GitHub Actions are very welcome, for example for DRYing the push/pull_request 'paths:' filters [1].
Now if only they supported paths filters for the `workflow_call` [2] event in addition to push/pull_request, my life would be a lot easier. Nontrivial repos have an unfortunate habit of building some sort of broken version of change detection themselves.
The limit of 20 unique workflow calls is quite low too, but searching the docs for a source, maybe they have removed it? It used to say
> You can call a maximum of 20 unique reusable workflows from a single workflow file.
but now it's a max of 4 nested workflows without loops, which gives a lot of power to the more complex repos [3]. Ooh. Need to go test this.
[1] https://docs.github.com/en/actions/reference/workflows-and-a...
[2] https://docs.github.com/en/actions/reference/workflows-and-a...
[3] https://docs.github.com/en/actions/how-tos/reuse-automations...
Wanna DRY out your github actions yaml?
Generate it from Dhall, or cue, or python, or some real language that supports actual abstractions.
If your problem is you want to DRY out yaml, and you use more yaml features to do it, you now have more problems, not fewer.
This. So true. YAML has always been an overly complicated format, with weird quirks (like Norway becoming false in a list of country codes).
I find it an absolute shame that languages like Dhall did not become more popular earlier. Now everything in devops is YAML, and I think many developers pick YAML configs not for good reasons but by default, treating its ubiquity as reason enough.
>norway
yaml 1.2 was released in 2009, and it fixed this problem. this is an implementation issue.
https://yaml.org/spec/1.2.2/#12-yaml-history
If your data format is so complicated that all commonly used implementations are not compliant with your spec, maybe it's a problem with the data-format.
Every single implementation people actually use seems to be a messy mix of yaml 1.1 and 1.2....
Maybe if the yaml project wants to consider this fixed, they should have written some correct reference parsers themselves for any languages in need, and encouraged their use.
I noted this in reply to the comment above, but: the YAML 1.2 spec doesn't actually mandate that parsers use the Core Schema. They left it as a recommendation. So I don't consider it to be "fixed" at all.
I would not say it "fixed" the problem. It removed the _recommendation_ for parser implementations to use the regex `y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF` for parsing scalars as bools, it changed the _canonical_ presentation of bools from `y|n` to `true|false`, and it introduced the "schema" concept. It also introduced and recommended the use of the Core Schema, which uses `true|True|TRUE|false|False|FALSE` as the regex for inferring bools from scalars. But unsurprisingly, since using this schema is only a recommendation and not a requirement, many implementations elected to keep their backwards-compatible implementations that use the original regex.
So the Norway problem persists.
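For instance, PyYAML (probably the most widely used Python implementation) still resolves the 1.1-style booleans by default:

    import yaml

    # PyYAML's default resolver still treats yes/no/on/off as booleans,
    # so an unquoted country code like NO comes back as False.
    print(yaml.safe_load("countries: [GB, IE, FR, NO]"))
    # -> {'countries': ['GB', 'IE', 'FR', False]}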
the recommendation was what caused the norway problem. it now strongly recommends not to do this, and it says that a Yaml parser should use the core schema unless instructed otherwise. going against the recommendation while saying that you're yaml 1.2 compliant feels like an issue that should be raised with the parser to me. I've never run into this issue in practice though.
is there a parser that says that it's Yaml 1.2 compliant that uses that regex? I don't know of one.
Why introduce an all-new language, like Dhall, just for configuration? This seems like a total waste of time. And you still need to use a "real" language (or bash) to write glue to connect to the GitHub configuration.
The config generators are very simple, and should be written in whatever language your developers already know - which likely means Python or Javascript or Go.
Having used CI systems and application frameworks that support YAML anchors for configuration, adding in a programming language would be a massive amount of complexity for very little gain. We're not talking about dozens of locations with hundreds of lines of shared code.
Asking the team to add a new build dependency, learn a new language, and add a new build step would create considerably more problems, not fewer. Used sparingly and as needed, YAML anchors are quite easy to read. A good editor will even allow you to jump to the source definition just as it would any other variable.
Being self-contained without any additional dependencies is a huge advantage, particularly for open source projects, IMHO. I'd wager very few people are going to learn Dhall in order to fix an issue with an open source project's CI.
Your team doesn't know YAML, it knows github actions. There's zero transferable knowledge when switching from github actions to kubernetes deployments, as there is precisely the same zero correlation between kubernetes and ansible configs. 'It's all YAML' is a lie and I'm continuously surprised so many people are falling for it for so long. YAML is the code-as-data, but the interpreter determines what it all means.
Oh, for goodness' sake. We know YAML syntax and that's the only part that's relevant here. Pointing out that different software uses different keys for their configuration or even takes different actions for keys that happen to share the same name isn't particularly insightful. We haven't been bamboozled.
Now I understand even less. If you freely admit they're different languages, the only reason to keep using this stupid, deficient syntax is momentum, and while that isn't a bad reason, it is costing you and everyone else in the long run.
Huh? I'm using YAML because that's the language used to configure GitHub Actions. You may not like YAML, and that's fine. But if we collectively had to learn the unique way each project generates their GitHub Actions config, that would be a massive waste of time.
YAML isn't that hard. Most GitHub Actions configs I see are well under 500 lines; they're not crumbling under the weight of complexity.
Assembly isn't hard either and yet almost nobody is writing it anymore, for a reason, just as nobody (to an epsilon) is writing jvm opcodes directly. Somehow the industry decided assembly is actually fine when doing workflows.
I'm saying GHA should use a proper programming language instead of assembly.
Use the language you are already working in? Most languages have good YAML serialization and I think in most languages a function call taking a couple parameters that vary to produce slightly different but related objects is going to be as readable or more readable than YAML anchors.
That would be better, but it's an option I already have available to me and it's just not attractive. AFAIK, GitHub Actions requires the config files to be committed. So, now I need to guard against someone making local modifications to a generated file. It's doable of course, but by the time I've set all this up, it would have been much easier for everyone to copy and paste the six lines of code in the three places they're needed. YAML anchors solve that problem without really creating any new ones.
If generating your GitHub Actions config from a programming language works for you, fantastic. I'm just happy we now have another (IMHO, attractive) option.
Most of the debate here is that a lot of us don't find YAML anchors attractive. It can be one of the papercuts of using YAML.
I mostly agree with the article that with GitHub Actions specifically, I try to refactor things to the top-level "workflow" level first, and then yeah resort to copy and paste in most other cases.
I'm a little less adamant that GitHub should remove anchor support again than the original poster, but I do sympathize greatly, having had to debug some CircleCI YAML and Helm charts making heavy use of YAML anchors. CircleCI's YAML is so bad I have explored options to build it with a build process. Yeah, it does create new problems and none of those explorations got far enough to really improve the process, but one of the pushes to explore them was certainly that YAML anchors are a mess to debug, especially when you've got some other tool concatenating YAML files together and can result in anchor conflicts (and also other parts of the same YAML that depend on a particular form of how anchor conflicts overwrite each other, oof). I don't see GitHub Actions necessarily getting that bad just by enabling anchors, but I have seen enough of where anchors become a crutch and a problem.
That's fair. And I'm not arguing that YAML anchors can never be a problem. I am saying that layering in a whole custom build system to handle a 250 line ci.yml file is not the trade-off I'd make. What I'd hazard to say most teams do in that situation is duplicate config, which is not without its own problems. I think YAML anchors is a fine solution for these cases and don't think they'll lead to total chaos. Alas, not all config options can be hoisted to a higher level and I'm trusting a team has explored that option when it's available.
If you're dealing with 10s of files that are 1000s of lines long, then YAML anchors may very well not be the ideal option. Having the choice lets each team find what works best for them.
> not talking about dozens of locations with hundreds of lines of shared code.
:) :) :)
.github/workflows in my current project: 33 files, 3913 lines total, 1588 lines unique.
(and this was _after_ we moved all we could into custom actions and sub-workflows)
Ouch. That sounds terrible with or without YAML anchors. GitHub Actions has overall been a great addition, allowing projects to integrate CI directly into their PR process. But, I never understood why it didn't have simpler paths for the very common use cases of CI and CD. Virtually any other dedicated CI product is considerably easier to bootstrap.
Or just use composite actions, it's not 2020 anymore.
Templating GitHub Actions is very powerful (I've worked with such a setup) but it has its own headaches and if you don't _need_ custom tooling better to not have it.
I can wish for improvements on the native setup without reaching out for the sledgehammer.
I think most of the pain with GitHub Actions goes away if you use actionlint, action-validator, prettier/yamlfmt in a single ci job to validate your setup. Can even add them as git hooks that automatically stage changes and give quick feedback when iterating.
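For example, a small hook script along these lines works (assuming actionlint - and whatever other tools you use - is already installed and on PATH):

    #!/usr/bin/env python3
    # Run workflow linters locally before pushing; extend CHECKS with
    # action-validator, yamlfmt, etc. if you use them.
    import subprocess
    import sys

    CHECKS = [
        ["actionlint"],  # with no arguments, actionlint finds .github/workflows itself
    ]

    failed = False
    for cmd in CHECKS:
        print("$", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failed = True

    sys.exit(1 if failed else 0)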
Seconded. I've had huge success with generating workflows with CUE. Would definitely recommend it to anyone struggling with YAML.
If we're going there let's just stay there and not translate it back to YAML, which is absolutely inappropriate for specifying CI pipelines.
Just introduce a templating language and the CI to generate the YAML as part of your pipeline, as a way to make it simpler?
Above a certain level of complexity, sure. But having nothing in between is an annoying state of affairs. I use anchors in Gitlab pipelines and I hardly curse their names.
YAML anchors are a welcome feature and will allow us to DRY some of our uglier workflows that currently have a lot of redundancy/duplication.
OP's main argument seems to be "I don't have a use for it and find it hard to read, so it should be removed".
(I'm the author.)
I don't think this is a fair characterization: it's not that I don't have a use for it, but that I think the uses are redundant with existing functionality while also making static and human analysis of workflows harder.
The counterpoint is that I don’t want to be an expert in GitHub actions to not repeat myself.
YAML anchors are a standard that I can learn and use in a lot of places.
The idiosyncrasies of GitHub actions aren’t really useful for me to learn.
Just my $0.02
in what ways do they make static analysis harder? Don't they flatten out trivially after parsing?
The flattening out is the problem: most (all?) widely used YAML parsers represent the YAML document using the JSON object model, which means that there's no model or element representation of the anchors themselves.
That in turn means that there's no way to construct a source span back to the anchor itself, because the parsed representation doesn't know where the anchor came from (only that it was flattened).
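A quick way to see the flattening with PyYAML (the workflow-flavoured keys are just for illustration):

    import yaml

    doc = """
    defaults: &defaults
      runs-on: ubuntu-latest
      timeout-minutes: 10
    job_a: *defaults
    job_b: *defaults
    """

    data = yaml.safe_load(doc)
    # The alias resolves to the very same dict object, but nothing in the parsed
    # result records that an anchor existed or which line it was defined on.
    print(data["job_b"])                   # {'runs-on': 'ubuntu-latest', 'timeout-minutes': 10}
    print(data["job_a"] is data["job_b"])  # True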
This is something that a custom parser library could figure out, no? The same as how you have format-preserving TOML libraries, for instance.
I think it makes way more sense for GitHub to support YAML anchors given they are after all part of the YAML spec. Otherwise, don't call it YAML! (This was a criticism of mine for many years, I'm very glad they finally saw the light and rectified this bug)
> This is something that a custom parser library could figure out, no? The same as how you have format-preserving TOML libraries, for instance.
Yes, it's just difficult. The point made in the post isn't that it's impossible, but that it significantly changes the amount of "ground work" that static analysis tools have to do to produce useful results for GitHub Actions.
> I think it makes way more sense for GitHub to support YAML anchors given they are after all part of the YAML spec. Otherwise, don't call it YAML! (This was a criticism of mine for many years, I'm very glad they finally saw the light and rectified this bug)
It's worth noting that GitHub doesn't support other parts of the YAML spec: they intentionally use their own bespoke YAML parser, and they don't have the "Norway" problem because they intentionally don't apply the boolean value rules from YAML.
All in all, I think conformance with YAML is a red herring here: GitHub Actions is already its own thing, and that thing should be easy to analyze. Adding anchors makes it harder to analyze.
> conformance with YAML
maybe, but not entirely sure. 'Two wrongs don't make a right' kind of thinking on my side here.
But if they call it GFY and do what they want, then that would probably be better for everyone involved.
> they don't have the "Norway" problem because they intentionally don't apply the boolean value rules from YAML.
I think this is YAML 1.2. I have not done or seen a breakdown to see if GitHub is aiming for YAML 1.2 or not but they appear to think that way, given the discussion around merge keys
--
(though it's still not clear why flattening the YAML would not be sufficient for a static analysis tool. If the error report references a key that was actually merged out, I think users would still understand the report; it's not clear to me that's a bad thing actually)
> But if they call it GFY and do what they want, then that would probably be better for everyone involved.
Yes, agreed.
> I think this is YAML 1.2. I have not done or seen a breakdown to see if GitHub is aiming for YAML 1.2 or not but they appear to think that way, given the discussion around merge keys
I think GitHub has been pretty ambiguous about this: it's not clear to me at all that they intend to support either version of the spec explicitly. Part of the larger problem here is that programming language ecosystems as a whole don't consistently support either 1.1 or 1.2, so GitHub is (I expect) attempting to strike a happy balance between their own engineering goals and what common language implementations of YAML actually parse (and how they parse it). None of this makes for a great conformance story :-)
> (though it's still not clear why flattening the YAML would not be sufficient for a static analysis tool. If the error report references a key that was actually merged out, I think users would still understand the report; it's not clear to me that's a bad thing actually)
The error report includes source spans, so the tool needs to map back to the original location of the anchor rather than its unrolled document position.
(This is table stakes for integration with formats like SARIF, which expect static analysis results to have physical source locations. It's not good enough to just say "there's a bug in this element and you need to find out where that's introduced," unfortunately.)
I think the main reason you see overwhelming support for anchors is that the existing Actions functionality is typically so cumbersome to implement and often makes it harder to understand a workflow. Anchor syntax is a little esoteric, but otherwise very simple and grokable.
To be clear, I understand why people want to use anchors. The argument isn't that they aren't useful: it's that the juice is not worth the squeeze, and that GitHub's decision to support them reflects a lack of design discretion.
Or in other words: if your problem is DRYness, GitHub should be fixing or enhancing the ~dozen other ways in which the components of a workflow shadow and scope with each other. Adding a new cross-cutting form of interaction between components makes the overall experience of using GitHub Actions less consistent (and less secure, per the points about static analysis challenges) for the benefit of a small amount of deduplication.
> GitHub's decision to ... reflects a lack of design discretion.
So true, not the first time, not the last time.
I'm at the point of exploring Gerrit as an alternative
So GitHub shouldn't implement the spec because you personally don't like that the spec solves a problem you can optionally solve at another layer?
No; GitHub shouldn't support YAML anchors because it's a deviation from the status quo, and the argument is specifically that the actions ecosystem doesn't need to make analysis any harder than it already is.
(As the post notes, neither I nor GitHub appears to see full compliance with YAML 1.1 to be an important goal: they still don't support merge keys, and I'm sure they don't support all kinds of minutiae like non-primitive keys that make YAML uniquely annoying to analyze. Conforming to a complex specification is not inherently a good thing; sometimes good engineering taste dictates that only a subset should be implemented.)
> No; GitHub shouldn't support YAML anchors because it's a deviation from the status quo, and the argument is specifically that the actions ecosystem doesn't need to make analysis any harder than it already is.
> (As the post notes, neither I nor GitHub appears to see full compliance with YAML 1.1 to be an important goal: they still don't support merge keys, and I'm sure they don't support all kinds of minutiae like non-primitive keys that make YAML uniquely annoying to analyze. Conforming to a complex specification is not inherently a good thing; sometimes good engineering taste dictates that only a subset should be implemented.)
That's a long way to say "yes, actually"
> That's a long way to say "yes, actually"
"Because I don't like it" makes it sound like I don't have a technical argument here, which I do. Do you think it's polite or charitable to reduce peoples' technical arguments into "yuck or yum" statements like this?
> Conforming to a complex specification is not inherently a good thing
Kind of a hard disagree here; if you don't want to conform to a specification, don't claim that you're accepting documents from that specification. Call it github-flavored YAML (GFY) or something and accept a different file extension.
https://github.com/actions/runner/issues/1182
> YAML 1.1 to be an important goal: they still don't support merge keys
right, they don't do merge keys because it's not in YAML 1.2 anymore. Anchors are, however. They haven't said that noncompliance with YAML 1.2 spec is intentional
> Call it github-flavored YAML (GFY) or something and accept a different file extension.
Sure, I wouldn't be upset if they did this.
To be clear: there aren't many fully conforming YAML 1.1 and 1.2 parsers out there: virtually all YAML parsers accept some subset of one or the other (sometimes a subset of both), and virtually all of them emit the JSON object model instead of the internal YAML one.
is your criticism leveled at yaml anchors or github? in my anecdotal experience, yaml anchors were a huge help (and really, really not hard to grasp at a conceptual level) in maintaining uniform build processes across environments.
It is specifically leveled at YAML anchors in GitHub. I don't have a super strong opinion of YAML anchors in other contexts.
(This post is written from my perspective as a static analysis tool author. It's my opinion from that perspective that the benefits of anchors are not worth their costs in the specific context of GitHub Actions, for the reasons mentioned in the post.)
"YAML" should mean something. When I saw GitHub Actions supported "YAML", I thought "OK, certainly not my favorite, but I can deal with that", and so I read the YAML specification, saw anchors, and then had to realize the hard way that they didn't work on GitHub Actions, leaving me unsure what even would or wouldn't work going forward. Is this even the only way it differs? I don't know, as they apparently don't use YAML :/.
This also means that, if you use an off-the-shelf implementation to parse these files, you're "doing it wrong", as you are introducing a parser differential: I can put code in one of these files that one tool uses and another tool ignores. (Hopefully the file just gets entirely rejected if I use the feature, but I do not remember what happened when I tried using the feature myself; even that, though, is a security issue.)
> Except: GitHub Actions doesn’t support merge keys! They appear to be using their own internal YAML parser that already had some degree of support for anchors and references, but not for merge keys.
Well, hopefully they also prioritize fixing that? Doing what GitHub did, is apparently still doing, and what you are wanting them to keep doing (just only in your specific way) is not actually using "YAML": it is making a new bespoke syntax that looks a bit like YAML and then insisting on calling it "YAML" even though it isn't actually YAML and you can neither read the YAML documentation nor use off-the-shelf YAML libraries.
Regardless, it sounds like your tool already supports YAML anchors, as your off-the-shelf implementation of YAML (correctly) supports YAML anchors. You are upset that this implementation doesn't provide you source map attribution: that was also a problem with C preprocessors for a long time, but that can and should be fixed inside of the parser, not by deciding the language feature shouldn't exist because of library limitations.
> and so I read the YAML specification
but there isn't a single YAML spec, there are at least 2 in common use: yaml 1.1, and 1.2, which have discrete specs and feature-sets. re: anchor stuff specifically, 1.1 supports merge keys whereas 1.2 explicitly does not, so that's one thing
and github actions does not actually specify which yaml spec/version it uses when parsing workflow yaml files
it's unfortunately just not the case that "YAML means something" that is well-defined in the sense that you mean here
> "YAML" should mean something.
Sure, agreed. Another comment notes that GitHub probably should call this their own proprietary subset of YAML, and I wouldn't object to that.
> Well, hopefully they also prioritize fixing that?
I expect they won't, since it's not clear what version of YAML they even aim to be compatible with.
However, I don't understand why engineers who wouldn't jump off of a bridge because someone told them to would follow a spec to the letter just because it exists. Specifications are sometimes complicated and bad, and implementing a subset is sometimes the right thing to do!
GitHub Actions, for example, doesn't make use of the fact that YAML is actually a multi-document format, and most YAML libraries don't gracefully handle multiple documents in a single YAML stream. Should GitHub Actions support this? It's entirely unclear to me that there would be any value in them doing so; subsets are frequently the right engineering choice.
The argument that this is a security issue isn't very well fleshed out either. As far as I can tell, it boils down to his opinion that this makes YAML harder to read and thus less secure. But, the reality is we have to copy & paste config today and that's a process I've seen fail when a change needs to be made and isn't properly carried forward to all locations. I suppose I could argue that's a security concern as well.
Half the argument against supporting YAML anchors appears to boil down to some level of tool breakage. While you can rely on simplifying assumptions, you take a risk that your software breaks when that assumption is invalidated. I don't think that's a reason to stop evolving software.
I've never seen a project use any of the tools the author listed, but I have seen duplicated config. That's not to say the tools have no value, but rather I don't want to be artificially restricted to better support tools I don't use. I'll grant that the inability to merge keys isn't ideal but, I'll take what I can get.
As someone who has written massive GitLab pipelines for monoliths and super tiny ones, and onboarded teams, YAML anchors are amazing.
Agreed. Anchors are YAML's real feature. Without them, YAML is just JSON + comments.
We support anchors in our CI/CD syntax at RWX, but only in a specific `aliases` section. I think this is a nice compromise.
https://www.rwx.com/docs/mint/aliases
Common Lisp has this (and other dialects imitate the Circle Notation; I think Scheme has it now, also Emacs Lisp and TXR Lisp):
E.g.
    (#1=(a b) #1#)
encodes a structure where the two (a b) occurrences are one object. It can express circular structures:
    #1=(a . #1#)
encodes an infinite circular list. The object to be duplicated is prefixed with #<decimal-integer>=. This associates the object with the integer. The integer is later referenced as #<decimal-integer># to replicate it.
The thing is, you don't see a lot of this in human-written files, whether they are source code or data.
This is not the primary way that Lisp systems use for specifying replicated data in configurations, let alone code.
Substructure sharing occurs whether you use the notation or not due to interned symbols. (Plus compilers can deduplicate strings and such.) In (a a a) there is only one object a, a symbol.
If you feed the implementation circular source code though, ANSI CL says the behavior is undefined. Some interpreters can handle it under the right circumstances. In particular ones that don't try to do a full macro-expanding code walk before running the code. Compilers, not so much.
For context:
- The complaint is Github using a non-standard, custom fork of yaml
- This makes it harder to develop linters/security tools (as those have to explicitly deal with all features available)
- The author of this blogpost is also the author of zizmor, the most well-known Github Actions security linter (also the only one I'm aware of)
YAML anchors may be a sharp tool but no one is forced to use them. I have written many verbose Github workflows that would have benefited from using anchors, and I am relieved to learn I can clean those up now.
I disagree. Without anchors, we had to rely on third-party file-change actions to only trigger jobs on certain file path changes, instead of using the built-in mechanism, because each job required the list, and the list was long.
Using anchors would have improved the security of this, as well as the maintenance. The examples cited don't remotely demonstrate the cases where anchors would have been useful in GA.
I agree that YAML is a poor choice of format regardless but still, anchor support would have benefitted a number of projects ages ago.
I don't think anchors' primary function is to allow global definitions (of variables or whatever), rather it's more like arbitrary templates/snippets to be reused through the YAML file.
In GitLab, where YAML anchors have been supported for years, I personally find them very useful - it's the only way of "code" reuse, really. In GitLab there's a special editor just for .gitlab-ci.yml, which shows the original view and the combined read-only view (with all anchors expanded).
I agree that it's hard to point to the specific line of the source code, but it's enough — in case of an error — to output an action name, action property name, and actual property value that caused an error. Based on these three things, a developer can easily find the correct line.
>it's the only way of "code" reuse, really.
Not really. You can also use the include/extends pattern. If that is not enough, there is the dynamic pipeline generation feature.
> include
Interesting, although to me it looks more like a way to split one file into several (which is rather useful).
> extends
What's the difference from anchors? It looks the same, except it works with include (and doesn't work with any other YAML tool).
> dynamic pipeline generation
Which is even harder to reason about compared to anchors, although certainly powerful.
Wrote a new yaml grepping tool this past weekend and just realized thanks to this that I have a whole new can of worms to keep in mind. Ugh.
Turns out it does report values at their targets (which is desirable) but doesn't know or indicate that they're anchors (undesirable).
Also tested something with yq - if you tell it to edit a node that is actually from a yaml anchor, it updates the original anchor without warning you that that's what you're doing. Yikes.
(For anyone who wants to test it: https://pypi.org/project/yamlgrep/)
Anchors will be exceptionally useful for a few workflows for me. We have what is essentially the same setup/teardown for three jobs within one workflow. I’d love to be able to factor that stuff out without introducing yet another yaml file to the repo, this will be a big help.
A very pedantic point, but merge keys are not part of the YAML spec [1]! Merge keys are a custom type [2], which may optionally be applied during the construction phase of loading. I definitely wouldn't say that merge keys are integral to anchors.
(Also, as a personal bias, merge keys are really bad because they are ambiguous, and I haven't implemented them in my C++ yaml library (yaml-cpp) because of that.)
[1]: https://yaml.org/spec/1.2.2/
[2]: https://yaml.org/type/merge.html
Yeah, I find the situation here very confusing: I agree that merge keys are not part of YAML 1.2, but they are part of YAML 1.1. The reason they don't appear to be in the "main" 1.1 spec itself is because they were added to 1.1 after 1.1 was already deprecated[1].
[1]: https://ktomk.github.io/writing/yaml-anchor-alias-and-merge-...
'Dear GitHub: no YAML, please' would be a much better title (content optional)
It seems that any data-oriented approach will inevitably evolve into a programming language given enough time.
One day we might even see a for-loop in CSS...
I have a theory that in such cases one might as well just give up and write a Turing-complete language in the first place, as the sort of Turing-complete languages we get in these "slowly but surely backed against the wall" situations are way worse than just starting from scratch.
I hypothesize that a Turing-complete language for something like CSS that included deep tracking under the hood for where values are coming from and where they are going would be very useful, i.e., you would have the ability to point at a particular part of the final output and the language runtime could give you a complete accounting of where it came from and what went into making the decisions. That could end up giving us the auditability that we really want from the "declarative" languages while giving us the full power of the programming languages we clearly want. However, I don't have the time to try to manifest such a thing myself, and I don't know of any existing language that does what I'm thinking of. Some of the more powerful languages could theoretically do it as a library. It's not entirely unlike the auditing monad I mention towards the end of https://www.jerf.org/iri/post/2958/ . It's not something I'd expect a general-purpose language to do by default, since it would have bad general-purpose performance, but I think for specialized cases of a Turing-complete configuration language it could have value, and one could always run it as a debugging option and have an optimized code path that didn't track the sources of everything.
There are languages specifically for writing configs. Like dhall https://dhall-lang.org/
That's my thought as well. I predict we'll be seeing SDKs for generating GitHub workflows by mid-2026. Maybe pulumi will get an extension for it. (I'm well aware that codegen'd YAML has been a thing for a long time, but I'm thinking about something targeting GitHub workflows _specifically_.)
TBH it's getting a bit exhausting watching us go through this hamster wheel again and again and again.
Anchors are so, so useful. Buildkite (which has its own CI pipelines syntax) is a good example. Let’s say I want every pipeline step to run on my custom agents (on my self-hosted infra). I could either copy/paste an identical “agents” property across however many hundreds or thousands of CI steps I have.
Or I could use a YAML anchor.
The author's replacement for anchors is to use global syntax, like a top-level "env:" block.
This is terrible advice from a security standpoint - given that env variables are often used for secrets, you really _don't_ want to set them at the top level. Secrets should be scoped as narrowly as possible!
For example, if you have a few jobs, and some of them need to download some data in their first step (which needs a secret), then your choices are (a) copy-paste the "env" block into each of those jobs, (b) use the new YAML anchors, or (c) set the secret at top-level scope. It is pretty clear to me that (c) is the worst idea security-wise - it makes the secret available to every step in the workflow, making it much easier for malware to exfiltrate.
I agree. OP's statement "the need to template environment variables across a subset of jobs suggests an architectural error in the workflow design" does not ring true for cases where you want developers to be able to quickly deploy a separate environment for each development branch, especially if said branch needs to connect to a matching backend/API/other service.
General YAML anchors as implemented by the standard: good (not great, just good).
Custom YAML anchors with custom support and surprise corner cases: bad.
I think the author is nuts.
First, he can just not use the feature, not advocate for its removal.
Second, his example alternative is wrong: it would set variables for all steps, not just those two; he didn't think of a scenario where there are three steps and you need common envs in just two of them.
The author does not think he's nuts :-)
> First, he can just not use the feature, not advocate for its removal.
I maintain a tool that ~thousands of projects use to analyze their workflows and actions. I can avoid using anchors, but I can't avoid downstreams using them. That's why the post focuses on static analysis challenges.
> Second, his example alternative is wrong: it would set variables for all steps, not just those 2, he didn't think of a scenario where there are 3 steps and you need to have common envs in just 2 of them.
This is explicitly addressed immediately below the example.
Just because it is expressed in YAML doesn't make YAML the party to blame here. I would say one of the main concerns I have with anything in GitHub Actions related to the word "merge" has to do with identifying the last commit for a merge, not merging of objects in YAML.
If you have two workflows... one to handle a PR creation/update and another to address the merge operation, it is like pulling teeth to get the final commit properly identified so you can grab any uploaded artifacts from the PR workflow.
Ok, now make the 'redundancy' argument with anything other than `env` or `permissions`.
I think they should be supported because it's surprising and confusing if you start saying 'actually, it's a proprietary subset of YAML', no more reason needed than that.
Obviously they are very useful. I still don't think they should exist in this usage of yaml.
Once you allow setting and reading of variables in a configuration file, you lose the safety that makes the format useful. You might as well be using a bash script at that point.
You already can set and read variables. The `matrix` section is often used to test against multiple versions of software. Environment variables can be referenced. And the project configuration supports both secrets and variables configured at the project level.
Sorry, I think my comment was not clear.
I think allowing both setting and reading of variables directly in the configuration file is a problem.
Not reading variables that have been set outside of the configuration file alone.
Honestly, everything about GH Actions/AzDO pipelines is infuriating. The poor tooling and poor write-time assertions are just so frustrating.
Give me a proper platform that I can run locally on my development machine.
But if those anchors are a blessed, standard YAML feature that YAML tools will provide real assertions about - unlike the ${{}} stuff, where you're basically doing commit-push-run-wait without any proper debug tools besides prints?
Then yes, they should use them.