Benchmarks should be like a scientific paper: they should describe all the choices made and why for the configurations. At least that will show if the people doing it really understand what they’re comparing.
Have you ever read a paper? You can consider yourself lucky if they have error bars and repeated their measurements more than once. The quality of “benchmarking papers” is comically bad (on average).
What’s the error bar on that statement of yours?
I don’t know, and I won’t pretend it has any statistical significance. I will just say that I have read dozens of papers and, anecdotally, the results were questionable in almost all cases. And not because they might have missed something subtle, but because of basic shortcomings. Some don’t even state how often they repeated their experiments, which software versions they used, whether they accounted for caching effects, (system) temperature, hardware characteristics, you name it.
That’s why I wouldn’t call papers a prime example of clean benchmarking. The quality at YT news outlets like Gamers Nexus or Hardware Unboxed is far higher than in most of them.
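To make the basics listed above concrete, here is a rough Python sketch of a harness that repeats a measurement, does a few untimed warm-up runs to reduce caching effects, and reports the spread together with the software and hardware it ran on. The `benchmark` function, its parameters, and the report keys are made up for illustration. It does not deal with temperature or thermal throttling at all, so treat it as a starting point rather than a proper methodology.

```python
import platform
import statistics
import sys
import time


def benchmark(func, repeats=10, warmup=2):
    """Time func() `repeats` times after `warmup` untimed runs.

    Hypothetical helper for illustration only.
    """
    if repeats < 2:
        raise ValueError("need at least 2 repeats to compute a standard deviation")

    # Warm-up runs are not timed; they let caches (and any JIT) settle.
    for _ in range(warmup):
        func()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)

    # Report the spread and the environment, not just a single number.
    return {
        "repeats": repeats,
        "warmup": warmup,
        "mean_s": statistics.mean(samples),
        "stdev_s": statistics.stdev(samples),  # sample standard deviation
        "min_s": min(samples),
        "python": sys.version.split()[0],
        "machine": platform.machine(),
        "system": platform.platform(),
    }


if __name__ == "__main__":
    report = benchmark(lambda: sum(range(100_000)))
    for key, value in report.items():
        print(f"{key}: {value}")
```

Even a report this small already answers the "how often did you repeat it, and on what?" questions that many write-ups leave open.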
Yeah I have to second UnfortunateShort. Benchmarking papers are on average very bad, often because they’re trying to push a particular idea or product and are very biased, or because they’re like “my first benchmark” and done by people who don’t know what they’re doing.
A classic one that gets referenced a lot is “Energy Efficiency Across Programming Languages”, in which the authors seriously benchmarked programs from the very heavily gamed Computer Language Benchmarks Game, and concluded among other things that JavaScript is much more energy efficient than TypeScript.
The only realistic way to benchmark different languages is to take implementations that weren’t written to be fast in a benchmark. For example Rosetta Code, or maybe leetcode.com solutions.
Or to do it yourself. But that requires you to be experienced in many languages.
Difficult for obvious reasons.