> Spans and slice-like structures are the future of safe memory operations in modern programming languages. Embrace them.
I've been using Span<T> very aggressively and it makes a massive difference in cases where you need logical views into the same physical memory. All of my code has been rewritten to operate in terms of spans instead of arrays where possible.
It can be easy to overlook ToArray() or (more likely) code that implies its use in a large codebase. Even small, occasional allocations are all it takes to move your working set out of the happy place and get the GC cranking. The difference in performance can be unreasonable in some cases.
You can even do things like:
Span<byte> arena = stackalloc byte[1024]; // stack-allocated buffer viewed through spans
var segment0 = arena.Slice(10);
var segment1 = arena.Slice(10, 200);
...
The above will incur no GC pressure/activity at all. Everything happens on the stack.

Too many stack accesses kill performance also, because the stack is always relative. For any variable access you need an offset, whilst for a malloced access you don't.
Just a few days ago I changed stack access to heap access in a larger generated perfect hash and got 100x better performance. It really hit the stack too hard, and the compiler didn't do the obvious optimizations.
Not trying to mock here, but did you program assembly in the 80s/90s and/or look at compilers targeting CPUs of comparable complexity to the 68000 series or older?
Yes, relative addressing on older machines like that does hurt a ton (I was coding a jam game for the Game Boy recently and had to re-orient some of my code away from stack usage, since I've gotten a tad "lazy" over the years and didn't care to program 100% assembly).
On modern machines (roughly since Pentium-generation CPUs) that can do one-cycle multiplications, memory-offset accesses are often irrelevant to the performance of access operations.
More importantly, at about 500 MHz-1 GHz the sheer latency of main-memory accesses vs. cached accesses starts to become your main worry, since cache misses approach _hundreds of cycles_ (this is why the entire data-oriented-design philosophy keeps growing), while on the other hand modern CPUs can pipeline enough instructions to make even the impact of bounds checks more or less insignificant compared to cache misses.
That looks more like the effect of a bad compiler.
On most modern CPUs, including Intel/AMD and ARM-based, memory accesses with relative addressing into the stack have at least the same performance as, if not much better than, memory accesses through pointers to the heap. The stack is also normally cached automatically by the memory prefetcher.
So whatever caused your performance increase, it was not an inefficiency of the stack accesses, but some other difference in the code generated by the compiler. Such an unexpected and big performance difference could normally be caused only by a compiler bug.
A special case where stack-allocated variables can lead to low performance is when a huge number of variables or many arrays are allocated, they are only sparsely used, and the initial stack is much smaller. Then any new allocation of an array or a big batch of variables may exceed the current stack size, which causes a page fault, so the operating system grows the stack by one memory page.
This kind of transparent memory allocation by page faults can be much slower than the explicit memory allocation done by malloc or new. This is why big arrays should normally be allocated either statically or in the heap.
I do not understand this comment. A stack access should never be more expensive than access to malloced data. This is easy to see: malloc gives you a pointer. An address computation to find a variable on the stack also gives you a pointer, but is much cheaper than malloc. If the compiler recomputes the address of a stack variable, that must be because it is deemed cheaper than wasting a register to cache the pointer. Nowadays compilers can transform malloc into stack allocations in certain cases.
> Just a few days ago I changed stack access to heap access in a larger generated perfect hash and got 100x better performance
Link me the code so that I can show what you did wrong
https://github.com/rurban/cmph/commit/4bb99882cc21fd44e86602...
I did nothing wrong :)
And I remember better now: I changed stack access to immediate global access to const arrays. What was insane was the compilation time, not so much the runtime. -O2 timed out, and if it did compile, it was 10x slower at runtime. Now compilation is immediate.
This compiles large perfect hashes to C code. Like gperf, just better and for huge arrays.
Thank you for linking.
In this case, as I understand, you switched from a local array which is re-initialized with values upon each function invocation to a global array that is pre-initialized before even the program starts. So my understanding is you're measuring the speed between repeated re-initialization of your local array and no initialization at all — not between "relative" stack access and "absolute" heap or global memory access.
As I understand it, to do it the way you wanted initially, your local array must be `static const` instead of just const (so that there's a single copy of it, essentially making it a global variable as well). To be honest, I am not sure why the C compiler doesn't optimize it this way automatically; likely I don't know the C spec well enough to know the reasons.
Super interesting! Was this C# or something? Is there a write-up/mini-blogpost about this somewhere?
This is why Virgil has support for ranges in the language, which are better than slices. They are value types that represent a subset of a larger array. They can also be off-heap, which allows a Range<byte> to safely refer to memory-mapped buffers.
https://github.com/titzer/virgil/blob/master/doc/tutorial/Ra...
For working with array elements without bounds checks, this is the modern alternative to pointers, without object pinning for the GC and without the "unsafe" keyword: MemoryMarshal.GetArrayDataReference<T>(T[]). This is still totally unsafe, but it is "modern safer unsafe" that works with `ref`s and plays nicely with System.Runtime.CompilerServices.Unsafe.
Funny point: the verbosity of this method and the SRCS.Unsafe ones makes them look slower than pointers at a subconscious level for me, but they are just as fast, if not faster, for juggling knives in C#.
The `fixed` keyword is mostly for fast transient pinning of data. Raw pointers from `fixed` remain handy in some cases, e.g. for alignment when working with AVX, but even this can be done with `ref`s, which can reference an already pinned array from the Pinned Object Heap or native memory. Most APIs accept `ref`s, and the GC continues tracking the underlying objects.
See the subtle difference here for common misuse of fixed to get array data pointer: https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcDMACR2DC2A3ut...
Spans are great, but sometimes raw `ref`s are a better fit for a task, to get the last bits of performance.
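To make the above concrete, here is a minimal sketch of the pattern (assuming .NET 5+, where GetArrayDataReference exists; the Sum helper is made up for illustration):

using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class RefSum
{
    // Sums an int[] by walking a managed ref instead of indexing the array,
    // skipping per-element bounds checks. No pinning, no `unsafe` blocks,
    // and the GC keeps tracking `data`.
    public static int Sum(int[] data)
    {
        ref int first = ref MemoryMarshal.GetArrayDataReference(data);
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
        {
            // Unsafe.Add does unchecked pointer-style arithmetic on the ref.
            sum += Unsafe.Add(ref first, i);
        }
        return sum;
    }
}

As with raw pointers, nothing stops you from reading past the end, so the bounds have to be guaranteed by the caller.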
I truly appreciate articles like this. I am using the Umbraco CMS, and have written code to run on lower-than-recommended requirements while keeping the entire system running. While I don't see a use for Span<T> yet, I could definitely see it being useful for a website with an enormous amount of content.
I am currently looking into making use of "public readonly record struct" for the models that I create for my views. Of course, I need to performance profile the code versus using standard classes with readonly properties where appropriate, but since most of my code is short-lived for pulling from the CMS to hydrate classes for the views, I'm not sure how much of a benefit I will get. Luckily I'm in a position to work on squeezing as much performance as possible between major projects.
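For concreteness, the two shapes being compared would look roughly like this (a hypothetical view model, not Umbraco's types):

using System;

// Class with read-only properties: reference semantics, one heap allocation per instance.
public sealed class ArticleViewModel
{
    public ArticleViewModel(string title, DateTime published) =>
        (Title, Published) = (title, published);

    public string Title { get; }
    public DateTime Published { get; }
}

// readonly record struct: value semantics, no per-instance heap allocation,
// value-based equality and a concise positional declaration.
public readonly record struct ArticleViewModelStruct(string Title, DateTime Published);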
I'm curious if anyone has found any serious performance benefit from using a Span<T> or a "public readonly record struct" in a .NET CMS, where the pages are usually fire and forget? I have spent years (since 2013) trying to squeeze every ounce of performance from the code, as I work with quite a few smaller businesses, and even the rest of my team are starting to look into Wix or Squarespace, since it doesn't require a "me" to be involved to get a site up and running.
To my credit and/or surprise, I haven't dealt with a breach to my knowledge, and I read logs and am constantly reviewing code as it is my passion (at least working within the confines of the Umbraco CMS, although it isn't my only place of knowledge). I used to work with PHP and CodeIgniter pre-2013 (then Kohana a bit while making the jump from PHP to .NET). I enjoy C#, and feel like I am able to gain quite a bit of performance from it, but if anyone has any ideas for me on how to create even more value from this, I would be extremely interested.
Like others have mentioned, in a CMS-like project your bottlenecks are more likely in the database and/or caching.
Span<T>, stackalloc and value structs will matter more when writing heavy data/number-crunching scenarios like image processing, games, "AI"/vector queries or things like _implementing_ database engines (see yesterday's discussion on the guys announcing they're using C++, where Rust, Go, Erlang, Java and C# were discussed for comparison: https://news.ycombinator.com/item?id=45389744 ).
I often spend my days writing applications reminiscent of CMS workloads, and while I sometimes do structs, I've not really brought out my low-level optimization skills more than a few times in the past 6 years; 95% of the time it's bad usage of DBs.
For a CMS I'd usually suspect the major bottlenecks to be in the DB queries. Especially when the language is already pretty fast by default like C#.
You really need to measure before going to low level optimizations like this. Odds are in this case that the overhead is in the framework/CMS, and you gain the most by understanding how it works and how to use it better.
Span<T> is really more of an optimization you should pay attention to when you write lower level library code.
Usually CMS performance problems are related to the database, or how rendering components are being used, or wrongly cached.
In the two .NET CMSes I have experience with, Sitecore and Optimizely, something like Span would hardly bring any improvement. Rather, check the way their ORM is being used, do some direct SQL, cache some renderings in a different way, and cross-check whether the CMS APIs are being used correctly.
> I'm curious if anyone has found any serious performance benefit from using a Span<T> or a "public readonly record struct" in a .NET CMS
This response is not directly answering the "in a .NET CMS" part of your question. I'm just trying to say how to think about when to worry about optimizations.
These sorts of micro optimizations are best considered when you are trying to solve a particular performance problem, particularly when you are dealing with a site that is not getting a lot of hits. I've used small-business ecommerce websites where each page load takes 5 seconds and given up trying to buy something. In that case, profiling the site and figuring out the problem is very worthwhile.
When you have a site getting a lot of hits, these sorts of performance optimizations can help you save cost. If your service takes 100 servers to run and you can find some performance tweaks to get down to 75 servers, that may be worth the engineering effort.
My recommendation is to use a profiler of some type, either on your application in aggregate to identify hot spots, or in search of the source of a particular performance problem. Once you identify a hot spot, construct a micro benchmark of the problem in BenchmarkDotNet and try to use tools like Span<T> to fix it.
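A minimal sketch of what such a benchmark could look like (assumes the BenchmarkDotNet NuGet package; the log-line format here is made up):

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // report allocations alongside timings
public class SliceBenchmarks
{
    private readonly string _line = "2024-01-01T00:00:00Z|INFO|some message payload";

    [Benchmark(Baseline = true)]
    public int WithSubstring()
    {
        string level = _line.Substring(21, 4); // allocates a new string on every call
        return level.Length;
    }

    [Benchmark]
    public int WithSpan()
    {
        ReadOnlySpan<char> level = _line.AsSpan(21, 4); // same slice, no allocation
        return level.Length;
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<SliceBenchmarks>();
}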
For a CMS or any similar situation, you can get huge performance improvements from changes at a higher level than Span<T>. Using the HTTP Cache-Control headers correctly in conjunction with a CDN can provide an order-of-magnitude improvement. Simply sending less HTML/CSS/JS by using a more efficient layout template can similarly have a multiplier effect on the entire site.
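As a rough sketch of the Cache-Control idea in an ASP.NET Core minimal API (the route and values are illustrative, not from any particular CMS):

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapGet("/pages/{slug}", (string slug, HttpContext ctx) =>
{
    // Mark the response as cacheable by browsers and any CDN in front of the app
    // for one hour; repeat hits for this page then never reach the CMS at all.
    ctx.Response.Headers.CacheControl = "public, max-age=3600";
    return Results.Text($"<html><body>Rendered page: {slug}</body></html>", "text/html");
});

app.Run();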
In my experience, the biggest wins by far were achieved by using the network tab of the browser F12 tools. The next biggest was Azure Application Insights profiler running in production. Look at the top ten most expensive database queries and tune them to death.
The use of Span<T> and the like is much more important for the authors of shared libraries than for "end users" writing a web app. Speaking of which, you can increase your usage of it by simply updating your NuGet package versions, your .NET version to 9 or 10, etc. This will provide thousands of such micro optimisations for very little effort!
I'm a C# dev main and love spans.
I understand it's not in the same realm as Rust, but how comparable is this to some of the power that Rust gives you?
I think the main differentiator is this:
C# is GC'd: the runtime protects memory that is in use, and while it allows things like Span<>, Memory<>, etc., there are some constraints due to sloppier lifetime reasoning. In general, though, usage is "easy" since you do not need to care about lifetimes.
Rust has lifetime semantics built into the core of the language; the compiler knows much better what's safe, but will also forbid you early from things that are safe but not provable (they're improving the checkers based on experience, though). Thanks to that knowledge it is better at handling allocations more exactly.
Personally, as someone with an assembly, C, C++, etc. background: I see the allure of Rust, I do see it as a plus if less experienced devs who really need perf go for Rust for critical components, and I'm thinking of trying some project in Rust myself...
I've so far not seen a project where the slight performance improvement of Rust would outweigh the "productivity" gain from the slightly "sloppy" connections that C# allows for.
This should help:
“A comparison of Rust’s borrow checker to the one in C#”
https://em-tg.github.io/csborrow/
It's a taste. C# allows the more common patterns, but Rust allows much more.
For example, Rust allows the equivalent of storing `Span<T>` (called slice in Rust) everywhere (including on the heap, although this is rare).
C# has a separate heap-storable span equivalent called Memory<T>.
AFAIK it's less efficient though.
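Roughly, the difference looks like this (an illustrative sketch; the class name is made up):

using System;

class Tail
{
    // Memory<T> is an ordinary struct, so it can live in a heap-allocated object.
    private readonly Memory<int> _view;
    // private readonly Span<int> _bad;  // would not compile: Span<T> is a ref struct

    public Tail(int[] data) => _view = data.AsMemory(1); // view that skips the first element

    public int Sum()
    {
        Span<int> span = _view.Span; // materialize a Span only while actually using it
        int sum = 0;
        foreach (int x in span) sum += x;
        return sum;
    }
}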
The restriction in C# comes from its ability to reference stack allocated memory. I'm not familiar with Rust but it probably figures it out based on the lifetime of T.
It's not the lifetime of T, but the lifetime of whatever is storing T.
This is a good example of learning how to use the tools a programming language offers; just saying a programming language has a GC and is therefore bad is meaningless without understanding what is actually available.
Regarding,
> Spans and slice-like structures are the future of safe memory operations in modern programming languages.
It is sad how long stuff takes to reach mainstream technology. In Oberon the equivalent declaration to partition would be,
PROCEDURE partition(span: ARRAY OF INTEGER): INTEGER
And if the type is the special case of ARRAY OF BYTE (you need to import SYSTEM for that), then any type representation can be mapped into a span of bytes.
You will find similar capabilities in Cedar, Modula-2+, Modula-3, among several others.
Modern memory-safe languages are finally catching up with the 1990s research; pity it always takes this long for cool ideas to be adopted.
Having said this, I feel modern .NET has all the features that made me like Modula-3 back in the day, even if some are a bit convoluted like inline arrays in structs.
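For readers who haven't seen the feature, a minimal sketch of an inline array in a struct (C# 12 / .NET 8; the Buffer8 name is made up):

using System;
using System.Runtime.CompilerServices;

[InlineArray(8)]
public struct Buffer8
{
    private int _element0; // the compiler replicates this field 8 times, inline in the struct
}

public static class InlineArrayDemo
{
    public static void Main()
    {
        var buf = new Buffer8();
        for (int i = 0; i < 8; i++) buf[i] = i * i; // indexable like an array

        Span<int> view = buf;       // can be viewed as a span over the struct's storage
        Console.WriteLine(view[3]); // prints 9
    }
}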
Rather the contrary: once you introduce GC usage into languages without direct support, you're basically always running a substandard GC, because so many advances (enabled by read/write barriers) aren't really available without language support.
Then people go around shouting that GCs are bad because they used them with a language that made them use some of the worst ones out there.
For the OP, awesome article. Quick question: what's the definition of your `swap(array, i, pivotIndex)` function? Am I missing something? Or does it just assume the standard set temp to a, set a to b, and set b to temp?
I recall a specific project involving a network appliance that generated large log streams. Our bottleneck was the log parser, which was aggressively using string.Substring() to isolate fields. This approach continuously allocated new string objects on the heap, which led to excessive pressure on the GC.
The transition to using ReadOnlySpan<char> immediately addressed the allocation issue. We were able to represent slices of the incoming buffer without any heap allocations and the parser logic was simplified significantly.
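A minimal sketch of that kind of allocation-free parsing (the pipe-delimited format and field names are made up, not the actual appliance format):

using System;

public readonly ref struct LogFields
{
    public readonly ReadOnlySpan<char> Timestamp;
    public readonly ReadOnlySpan<char> Level;
    public readonly ReadOnlySpan<char> Message;

    // Slices fields out of a pipe-delimited line; no substrings are allocated.
    public LogFields(ReadOnlySpan<char> line)
    {
        int first = line.IndexOf('|');
        int second = first + 1 + line.Slice(first + 1).IndexOf('|');

        Timestamp = line.Slice(0, first);
        Level = line.Slice(first + 1, second - first - 1);
        Message = line.Slice(second + 1);
    }
}

public static class LogParserDemo
{
    public static void Main()
    {
        var fields = new LogFields("2024-01-01T00:00:00Z|WARN|disk nearly full".AsSpan());
        Console.WriteLine(fields.Level.SequenceEqual("WARN")); // True, with zero heap allocations for the fields
    }
}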
Also works in C: https://uecker.codeberg.page/2025-07-02.html
I don't think that's quite a 1-to-1 match for what's described in the article. Both C#'s Span<T> and your span type are type- and bounds-safe, but the former has additional restrictions placed on its usage thanks to the `ref` keyword that guarantee that it will be free of lifetime errors as well without needing to involve the runtime.
True, that part is different and not safe in C. Lifetime is something that will need new language extensions / annotations.
Anecdote: 9 years ago I was at MSFT. Hands forced by long GC pauses, many teams eventually turned to hand-rolling their own flavor of string_view in C#. It was literally xkcd.com/927 back then when you tried to interface with another team's packages and each side had the same-but-different string_view classes. Glad to see that finally enjoying language and stdlib support.
(ReadOnly)Span<T> has been available for 8 years now, and even before that, in the legacy Framework, there were common readonly-ref string slicers.
ArraySegment for example.
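A tiny sketch of the two, for comparison (illustrative only):

using System;

public static class SegmentDemo
{
    public static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5, 6 };

        // Legacy Framework style: an (array, offset, count) view, no copying.
        var segment = new ArraySegment<int>(data, offset: 2, count: 3); // views 3, 4, 5
        foreach (int x in segment) Console.Write($"{x} ");

        // The modern equivalent of the same view, as a span.
        Span<int> span = data.AsSpan(2, 3);
        Console.WriteLine(span.Length); // 3
    }
}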