You can get another 56% speedup on the reserve()+emplace_back() version just by replacing (Widget widget) with (const Widget& widget) in the function declaration [0], more of a difference than any of the loops. As well as a further 39% speedup by moving transformed_data into repsonse.data instead of copying it [1]. My takeaway is that C++ makes it extremely easy to copy data around unnecessarily. Also, you should always profile your code for easy wins before making drastic changes.
This also isn't the way C++20 wants us to work with ranges and views, either -- this is basically just std::transform as you'd write it 20 years ago, except with a container as the argument instead of your begin / end iterators (so, basically absl::c_transform).
The difference is in your file you still have the `Widget widget` parameter, whereas your faster version has the `const Widget& widget` parameter. That's what matters.
ETA: there is a slight advantage for your function body using GNU libstdc++, but there is no advantage when using LLVM libc++. Under libc++ all three are exactly as fast, and faster across the board than any of the GNU results.
The author's pull requests sound exhausting! Opening a PRs to accommodate a "pet peeve" is a mark of inexperience. Does it solve a problem or fix a bug? Doesn't look like it. It doesn't even obviously improve performance. I just see a gratuitously new abstraction that will decrease readability for the rest of the team.
As a team you're going to drown in C++ if you let everyone add their favourite coding conventions, standards or libraries to the code - it's the worst language for that sort of sprawl.
Yes, this is the sort of thing that I'd reject. Why are we churning code (assuming it's working) to potentially introduce stylistic inconsistencies without some existing motivation, especially as a new person on the team who's apparently attempting to have a code style conversation via PR?
It's not even that I disagree with his premise, in the abstract, and if he were including this as part of a larger change in that area of code, I could see it being reasonable. But as a "I don't like the way this code looks so I'm going to rewrite it" PR, no.
Motivation: new people can read the code ane figure out what the loop is doing. I've seen too many loops that seemed to be some standard cs101 algorthym but something was subtilly differet and so it took a long time to figure it out. I was sometimes the person who wrote the original - 10+ years ago.
plus I'm now getting old enough to realize early retirement might be possible. That means I need to make sure new people understand my code so I'm not called back inside from my (whatever my next life is)
just rewriting it because isn't good. However rewriting it in a more expressive way is better.
I had a boss call me a tumbleweed dev (when I was much younger). I would clean up bits, but this would mean the QA had to re-test code that was already tested and in production. I never forgot that term, and now if it an't broke, I don't touch it. (even if I don't like reading it)
In isolation, sure writing code more expressively isn't a bad thing, and might be fine as an exercise for becoming more familiar with some part of the codebase, but I don't think it's a good use of time for the author or reviewers to go and do this without some other motivation or if you're touching the code already.
I had a job interview where I said almost exactly this.. I was shown a bit of code and asked how I'd improve it. I said I wouldn't touch it. When asked why not I answered. "I'd assume the dev who wrote it did so for a reason, and that it's been through QA. If they picked this approach I'd leave it alone. Unless there was a bug." I got the job.
My personal style has to take a back seat to the style of the team (even if I'm the team lead) It is better to get used to not using a turnery, if that's how the team works. Changing a standard on an older code base is silly because then you'll have a different standard in different parts of the code.
I let people do this (and mildly encourage them to) all the time, if their pet peeve makes sense to enough of the (small) team. Often, hard-won experience manifests as getting annoyed by trivial-seeming things!
I think I agree that, in projects dominated by C++ or similar languages, I would be less positive about it, just because of how complicated they can get.
It isnft clear what they are but if he replaces a loop with an algorithm (that implements the loop) it makes it easier to understand the code because the name of the algorithm is clearer than the loop. Normally the speed is the same.
The first code block (with the raw loop) also has an unnecessary copy in the form of `response.data = transformed_data;` -- that should be std::move(transformed_data), or else you're copying every element in the vector.
That further eliminates the copies outside of the loop. At that point, you basically have created every element just once, with no moves or copies.
Meanwhile, the ranges approach incurs a move followed by a copy for every element. It's not obvious to me how to eliminate those.
So what exactly is the difference? Hard to parse the assembly with so many C++-isms, but I'm gonna guess that while the raw loop version is able to std::move transformed_data into response.data, the views version must traverse the view and copy the dumb way. There are some delete calls in the latter version that don't appear in the first one. I think a fairer comparison would be to not store a std::vector<ToData> in Response in the view version, but store the actual view. After all, the only reason there is a vector there to begin with is that the raw-loop version needed somewhere to store the result.
> I think a fairer comparison would be to not store a std::vector<ToData> in Response in the view version, but store the actual view.
That would be catastrophic, since the view is only storing a bunch of references to the (Widget widget) argument that gets deleted once the function returns. In general, you have to be wary regarding how long references are valid for.
Anyway, I'm not entirely sure what the purpose of the transformed_data vector is. If you're already going to put it into response.data, just clear it out and start adding elements directly. Or if that's infeasible, at least move it into response.data instead of copying it.
That widget argument should really be const Widget&. Then my statement above stands. I was going under the assumption that the entire response is some function of the input, and the input outlives everything else.
> I'm not entirely sure what the purpose of the transformed_data vector is.
I suppose that in the real application, they want something like 'response = f(g(h(...(input)...))', where f . g . h is some non-trivial composition of a bunch of view transforms. So then it kinda makes sense. Basically, construct a lazy list, then eval everything at the end and store the result somewhere.
If so, then that 'f . g . h' composition might compile to something faster than the equivalent of 3 loops and vectors for intermediate results, since the compiler can fuse the whole thing into a tight loop. They might have made the benchmark misleading by constructing a too-minimal example.
You can get another 56% speedup on the reserve()+emplace_back() version just by replacing (Widget widget) with (const Widget& widget) in the function declaration [0], more of a difference than any of the loops. As well as a further 39% speedup by moving transformed_data into repsonse.data instead of copying it [1]. My takeaway is that C++ makes it extremely easy to copy data around unnecessarily. Also, you should always profile your code for easy wins before making drastic changes.
[0] https://quick-bench.com/q/1Lcfho3hZW2DVgse3xkn82hZsFg
[1] https://quick-bench.com/q/r0DxLEPK68VhBPect3qAhkNIVnA
In this degenerate example, the completely straightforward way is tied for fastest as well. It's clear this has nothing to do with ranges.
> the completely straightforward way is tied for fastest as well.
Is it, though? back_inserter should replicate push_back(), which incurs a move at the very minimum. The approach with emplace_back() does not.
https://quick-bench.com/q/1zwemArsS0gtKwveR9FZsuRwT34
This also isn't the way C++20 wants us to work with ranges and views, either -- this is basically just std::transform as you'd write it 20 years ago, except with a container as the argument instead of your begin / end iterators (so, basically absl::c_transform).
The difference is in your file you still have the `Widget widget` parameter, whereas your faster version has the `const Widget& widget` parameter. That's what matters.
ETA: there is a slight advantage for your function body using GNU libstdc++, but there is no advantage when using LLVM libc++. Under libc++ all three are exactly as fast, and faster across the board than any of the GNU results.
https://quick-bench.com/q/Ot1AK-q7D0CNVDeHrKCRtJ-dmA0
The author's pull requests sound exhausting! Opening a PRs to accommodate a "pet peeve" is a mark of inexperience. Does it solve a problem or fix a bug? Doesn't look like it. It doesn't even obviously improve performance. I just see a gratuitously new abstraction that will decrease readability for the rest of the team.
As a team you're going to drown in C++ if you let everyone add their favourite coding conventions, standards or libraries to the code - it's the worst language for that sort of sprawl.
Yes, this is the sort of thing that I'd reject. Why are we churning code (assuming it's working) to potentially introduce stylistic inconsistencies without some existing motivation, especially as a new person on the team who's apparently attempting to have a code style conversation via PR?
It's not even that I disagree with his premise, in the abstract, and if he were including this as part of a larger change in that area of code, I could see it being reasonable. But as a "I don't like the way this code looks so I'm going to rewrite it" PR, no.
Motivation: new people can read the code ane figure out what the loop is doing. I've seen too many loops that seemed to be some standard cs101 algorthym but something was subtilly differet and so it took a long time to figure it out. I was sometimes the person who wrote the original - 10+ years ago.
plus I'm now getting old enough to realize early retirement might be possible. That means I need to make sure new people understand my code so I'm not called back inside from my (whatever my next life is)
just rewriting it because isn't good. However rewriting it in a more expressive way is better.
I had a boss call me a tumbleweed dev (when I was much younger). I would clean up bits, but this would mean the QA had to re-test code that was already tested and in production. I never forgot that term, and now if it an't broke, I don't touch it. (even if I don't like reading it)
In isolation, sure writing code more expressively isn't a bad thing, and might be fine as an exercise for becoming more familiar with some part of the codebase, but I don't think it's a good use of time for the author or reviewers to go and do this without some other motivation or if you're touching the code already.
I had a job interview where I said almost exactly this.. I was shown a bit of code and asked how I'd improve it. I said I wouldn't touch it. When asked why not I answered. "I'd assume the dev who wrote it did so for a reason, and that it's been through QA. If they picked this approach I'd leave it alone. Unless there was a bug." I got the job. My personal style has to take a back seat to the style of the team (even if I'm the team lead) It is better to get used to not using a turnery, if that's how the team works. Changing a standard on an older code base is silly because then you'll have a different standard in different parts of the code.
I let people do this (and mildly encourage them to) all the time, if their pet peeve makes sense to enough of the (small) team. Often, hard-won experience manifests as getting annoyed by trivial-seeming things!
I think I agree that, in projects dominated by C++ or similar languages, I would be less positive about it, just because of how complicated they can get.
It isnft clear what they are but if he replaces a loop with an algorithm (that implements the loop) it makes it easier to understand the code because the name of the algorithm is clearer than the loop. Normally the speed is the same.
The first code block (with the raw loop) also has an unnecessary copy in the form of `response.data = transformed_data;` -- that should be std::move(transformed_data), or else you're copying every element in the vector.
That further eliminates the copies outside of the loop. At that point, you basically have created every element just once, with no moves or copies.
Meanwhile, the ranges approach incurs a move followed by a copy for every element. It's not obvious to me how to eliminate those.
So what exactly is the difference? Hard to parse the assembly with so many C++-isms, but I'm gonna guess that while the raw loop version is able to std::move transformed_data into response.data, the views version must traverse the view and copy the dumb way. There are some delete calls in the latter version that don't appear in the first one. I think a fairer comparison would be to not store a std::vector<ToData> in Response in the view version, but store the actual view. After all, the only reason there is a vector there to begin with is that the raw-loop version needed somewhere to store the result.
> I think a fairer comparison would be to not store a std::vector<ToData> in Response in the view version, but store the actual view.
That would be catastrophic, since the view is only storing a bunch of references to the (Widget widget) argument that gets deleted once the function returns. In general, you have to be wary regarding how long references are valid for.
Anyway, I'm not entirely sure what the purpose of the transformed_data vector is. If you're already going to put it into response.data, just clear it out and start adding elements directly. Or if that's infeasible, at least move it into response.data instead of copying it.
That widget argument should really be const Widget&. Then my statement above stands. I was going under the assumption that the entire response is some function of the input, and the input outlives everything else.
> I'm not entirely sure what the purpose of the transformed_data vector is.
I suppose that in the real application, they want something like 'response = f(g(h(...(input)...))', where f . g . h is some non-trivial composition of a bunch of view transforms. So then it kinda makes sense. Basically, construct a lazy list, then eval everything at the end and store the result somewhere.
If so, then that 'f . g . h' composition might compile to something faster than the equivalent of 3 loops and vectors for intermediate results, since the compiler can fuse the whole thing into a tight loop. They might have made the benchmark misleading by constructing a too-minimal example.