Outdated, modern solution is baked in now
https://en.cppreference.com/w/c/preprocessor/embed
That's good to know, but I've noticed it was added in C++26 and seems to be supported in GCC 15 and Clang 19, but not MSVC.
I think in a few (3-4?) years it will be safe to use, but in any case not now.
Still, good to know that it exists.
I would assume that this is easy enough to implement that it will likely appear in a minor update to the upcoming Visual Studio version. MS kept updating the compiler since VS 2022, too.
I certainly hope so, but we'll see. To give an example, std::chrono::current_zone (C++20) still doesn't work on Android even to this day.
So as long as #embed isn't supported by all the 3 major compilers, I am sticking with my current embedding setup. I guess that's what I was thinking of.
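One way to bridge that gap, sketched here as an assumption rather than anything proposed in the thread: probe for the directive with `__has_embed` and fall back to a generated byte array where it is missing. The names `HAVE_EMBED`, `blob`, and `blob_size` and the fallback bytes are hypothetical, and the example embeds its own source file via `__FILE__` only so that it needs no extra input file.

```c
#include <stddef.h>

/* Probe in two steps: a preprocessor without __has_embed would choke on
   the call syntax if both checks sat on one #if line. */
#ifdef __has_embed
#  if __has_embed(__FILE__) == __STDC_EMBED_FOUND__
#    define HAVE_EMBED 1
#  endif
#endif

static const unsigned char blob[] = {
#ifdef HAVE_EMBED
#  embed __FILE__
#else
    /* Fallback: bytes that a build step (xxd -i, a bin2c-style tool, ...)
       would have generated. Here they spell "hello\n". */
    0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a
#endif
};

static const size_t blob_size = sizeof blob;
```

In a real project the `#else` branch would presumably `#include` the header produced by the existing embedding setup, so the same source builds on compilers with and without `#embed`.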
It will be at least a decade before I can rely on that in systems software that needs to be portable.
There are good reasons to stick to C89.
let me know when my embedded target's compiler is C23 compliant (i mean, i wish. we may be getting C11 or even C17 sometime next year but i'm not holding my breath)
What are you targeting? For instance, all ESP32 now support GCC 15, which has support for #embed. AVR has also had GCC 15 toolchains for months, as has ARM, which also allows you to target STM32 and Nordic nRF stuff.
cake[0] might interest you. Basically transpiles C into C89.
[0]: https://github.com/thradams/cake
What current embedded target in $this_year doesn't have a C11 compiler? I'll send you $5 if you can name one.
easy: microchip.
The thing that always irks me about c++ is this sort of thing:
> Explanation 1) Searches for the resource identified by h-char-sequence in implementation-defined manner.
Okay, so now I have to make assumptions that the implementation is reasonable, and won't go and "search" by asking an LLM or accidentally revealing my credit card details to a third party, right?
And even if the implementation _is_ reasonable the only way I know what "search" means in this context is by looking at an example, and the example says "it's basically a filename".
So now I think to myself: if I want to remain portable, I'll just write a python script to do a damn substitution to embed my file, which is guaranteed to work under _any_ implementation and I don't have to worry about it as soon as I have my source file.
Does anyone else feel this way or is it just me?
You're not the only one who feels that way, but IMHO it's not a valid complaint.
The C++ standard says implementation defined because the weeds get very thick very quickly:
- Are paths formed with forward slash or backslash?
- Case sensitive?
- NT style drive letter or Posix style mounts?
- For relative paths, what is it relative to? When there are multiple matches, what is the algorithm to determine priority?
- What about symlinks and hard links?
- Are http and ftp URIs supported (e.g. an online IDE like godbolt). If so, which versions of those protocols? TLS 1.3+ only? Are you going to accept SHA-1?
- Should the file read be transactional?
People already complain that the C++ standard is overly complicated. So instead of adding even more complexity by redefining the OS semantics of your build platform in a language spec, they use "implementation defined" as a shorthand for "your compiler will call fopen" plus some implementation wiggle room like command line options for specifying search paths and the strategy for long paths on Windows.
"What if #embed steals my credit card data" is a pointless strawman. If a malicious compiler dev wanted to steal your credit card data, they'd just inject the malicious code; not act like a genie, searching the C++ spec with a fine-tooth comb for a place where they could execute malicious code while still *technically* being standards conformant. You know that, I know that, we all know that. So why are we wasting words discussing it?
The real reason why this stuff is underspecified in the spec is that some mainframe operating systems don't have file systems in the common modern sense, but support C++. Those vendors push back a lot against narrowed definitions as far as I know.
Including files also opens up some potential security issues that the standards committee just didn't want to prescribe solutions to. Compiler explorer hides easter eggs around the virtual filesystem, for example:
https://godbolt.org/z/KcqTM5bTr
#include also searches for the file you give it in an "implementation-defined manner", so if you have this complaint about #embed, you ought to also consider #include equally problematic.
> So now I think to myself: if I want to remain portable, I'll just write a python script
How can you know that your Python implementation won't send your credit card details to an LLM when it runs your script? It does not follow an ISO standard that says it can't do that. You're not making assumptions about its behavior, are you?
This doesn't sound like the kind of portability anyone is really worried about. I get that the docs on the linked site are written in standards-ese and are complicated by macro replacement, but I don't think sending your credit card details away is gonna be an outcome. If it was, an uncharitable implementation with access to your card details would be free to do that any time you gave it input invoking undefined behaviour (which is of course not uncommon, especially in incorrect code).
Which makes me consider an interesting distinction: undefined behavior refers to the behavior of the compiler output. Does the C standard "allow" compilers to do compile-time code execution with undefined behavior? Is the runtime behavior of the compiler even in scope for the standard in general?
If you want to remain portable, write your code in the intersection of the big 3 - GCC, Clang and MSVC - and you’ll be good enough. Other implementations will either be weird enough that many things you’d expect to work won’t or are forced to copy what those 3 do anyway.
This is what I have been doing for years. Works well for me.
Sometimes it is annoying but realistically it is a good strategy.
...what? What are you talking about? In what world would a compiler implement a preprocessor directive to ever use an LLM, the internet, or your credit card details (from where would it get those)??? There are always implementation defined things in every language, for example, UB. Do you get worried that someone will steal your bitcoin every time you hit a use-after-free? Of course not! Even in Python when you OOM -- at least in CPython -- you crash with undefined behavior.
Sorry for being so aggressive. I suppose I'm just very confused at where you're coming from.
this take is basically equivalent to "don't write software unless you write the stack from scratch."
You can also do it using ld - it's something like ld -r --format binary -o out.o <file>, although you do want some build system assistance to generate header files allowing you to access the thing (somewhat similar to the assembly example here). It's a bit of a performance but I strongly prefer it to generating header files in the earlier options - those header files can end up being _very_ large (they generally multiply up the size of the embedded file by 2-4x) and slow to compile.
All a bit less relevant now since recent C++ versions have this built in by default. Generally something languages have been IMO too slow on (e.g. Go picked this up four or so years ago, after a bunch of less nice home-grown alternatives), it's actually just really useful to make things work in the real world, especially for languages that you can distribute as single-file binaries (which IMO should be all of them, but sadly it's not always).
The special ld argument is a GNU thing; it's not portable, and at least lld doesn't support it.
https://github.com/jcalvinowens/ircam-viewer/commit/17b3533b...
It's also not reliable for most architectures.
you could also use the linker to link basically anything into the file wherever u like.
it might be a bit of an 'arcane' way to do it idk... but to me it always seemed the logical way.. u can also define symbols etc around it and use extern in ur c/cpp program to reference those to access the data in light of dynamic linking / ASLR etc.
here is some resource on it with some examples: https://wiki.osdev.org/Linker_Scripts
u can include any file. another executable, images, etc. etc. no need for weird stuff in the c sources?
on the flipside, is there a benefit of doing it inside the source code?? (apart from not having to roll ur own linker script and learn that dragon?)
I use https://github.com/graphitemaster/incbin . It mostly works (had to make some mods when I needed more section magic to happen). Nothing I'm on supports #embed yet but here's to hoping.
My very first open source project[1] aimed to solve the same problem. Nice to see it still has quite a few weekly downloads.
[1] https://sourceforge.net/projects/bin2c/
My current workaround until it arrives in all C++ compilers
```
inline constexpr auto bootstrap =
#include "bootstrap.lua"
;

// ... later
lua.script(bootstrap, "@bootstrap");
```
The lua code
```
R"(
-- your code here
)";
```
Surely the preprocessor method doesn't work in the general case, since the data can contain commas or parentheses.
Regardless, all of the methods suggested are terrible. If you don't have access to #embed, just write a trivial python script.
You can apply `#` to __VA_ARGS__, which won’t preserve the exact whitespace, but for many languages it’s good enough. biggest issue is you can’t have `#` in the text.
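A minimal sketch of that stringification trick, assuming a hypothetical macro name `EMBED_TEXT` and a made-up shader snippet as the embedded text. It illustrates both caveats: line breaks and runs of spaces collapse to single spaces, and the text must not contain `#` (or unbalanced parentheses).

```c
#include <string.h>

/* Stringify everything between the parentheses, commas included.
   Runs of whitespace (newlines too) collapse to single spaces. */
#define EMBED_TEXT(...) #__VA_ARGS__

/* Hypothetical payload: a small shader body with no '#' lines in it. */
static const char shader_src[] = EMBED_TEXT(
    uniform vec4 tint;
    void main() {
        gl_FragColor = tint * vec4(1.0, 0.5, 0.25, 1.0);
    }
);
```

The resulting string is a single line, so this only suits languages where newlines are not significant.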
How is `xxd -i` terrible?
It's still lacking content that goes before/after the output.
Just write a Python script that does the whole thing.
Don't know what you mean, it works fine here. Python is too large and unreliable a dependency for something so trivial (which can be accomplished using standard POSIX utilities if need be).
Indeed, even writing this utility in C is trivial and adds zero extra dependencies for a pure C/C++ project. Avoiding #embed also removes the dependency on a C23- or C++26-capable compiler, which might not be available in uncommon scenarios.
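To give a sense of how small such a helper can be, here is a sketch of its core. The function name `emit_c_array`, the 12-bytes-per-line layout, and the `<name>_len` constant are all assumptions made for illustration, not the output format of any existing tool.

```c
#include <stdio.h>

/* Write the bytes of `in` to `out` as a C array named `name`, followed by
   a `<name>_len` constant. Returns the byte count, or -1 on I/O error. */
static long emit_c_array(FILE *in, FILE *out, const char *name)
{
    long count = 0;
    int byte;

    fprintf(out, "const unsigned char %s[] = {", name);
    while ((byte = fgetc(in)) != EOF) {
        /* Start a new line every 12 bytes to keep the output readable. */
        fprintf(out, "%s0x%02x,",
                (count % 12 == 0) ? "\n    " : " ", (unsigned)byte);
        count++;
    }
    fprintf(out, "\n};\nconst unsigned long %s_len = %ld;\n", name, count);

    return (ferror(in) || ferror(out)) ? -1 : count;
}
```

A complete tool would wrap this in a `main` that opens the input in binary mode and takes the array name from the command line. An empty input produces an empty initializer list, which compilers older than C23 may reject, so a real version would likely special-case that.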
Python is pretty much mandatory on Linux systems nowadays; unless you're dealing with something really minimalist or trying to be very portable, it's safe to rely on.
> it's safe to rely on
Is there any guarantee they won't break backwards compatibility again?
I wrote this almost ten years ago and it still works fine: https://github.com/jcalvinowens/diveutils/blob/master/consta...
Arguably it could be a little C helper, but I wanted this particular piece of the project to be more accessible so I used a scripting language.
I know people who won't even write perl and will instead write shell scripts, because perl might not be available. And I don't think I've seen a unix-like system without perl.
Perl is obsolete and was replaced by Python for all usages in practice.
Right... except for the use case I pointed out. That being that perl is available on virtually every unix-like operating system, not just Linux, and python isn't.
Also, perl has other uses. Perl is much more competent at awk/sed/grep like tasks than python, and it can also be much faster. More people should be writing perl, imo. There's a ton of programs written in C that should just be perl.
> a unix-like system without perl
QNX was one. Don't know if they started to include it in newer versions than 6.5 though
Why don't you #embed?
Because the linked article is from 2013.