As we have already reported, there have been massive changes in Passmark's single-thread benchmark charts, which have led to a complete change at the top of the benchmarks away from AMD toward Intel (see: Passmark: CPU-Benchmarks zugunsten von Intel manipuliert?).
We asked Passmark how these major upheavals came about and have since received feedback.
Passmark had already published a statement on the changes in its own forum on March 7, 2020, which was sent to us. By the looks of it, Passmark is well aware of the problem and may make further adjustments. The current results could therefore change again over the next few days, because the entire benchmark has been updated. This concerns not only the benchmark itself, but also the switch to a different compiler and libraries, which likewise influence the results.
We released a new version of PerformanceTest a few days ago, version 10.
Yesterday we started to switch the graphs on the web site over to results from PerformanceTest V10 (PT10).
For the single-threaded result there were huge differences, but we are going to fix that up in a couple of days.
For the CPUMark result there is more of a dilemma.
We collected millions of benchmark results (baselines) that people sent in over the last few years from PerformanceTest V9 & V8. With millions of results we were able to get a pretty accurate average for each CPU model.
But for PerformanceTest V10 we made really major changes to the CPU test algorithms. These changes included:
- Using new CPU instructions (e.g. AVX512) only available in modern CPUs (see the sketch after this list).
- Use a more up to date compiler (Visual Studio 2019 instead of 2013) which also brings some code optimization.
- Have better support for out of order execution, which is a feature of newer CPUs.
- Updated the 3rd party libraries we use for some of the tests (including more modern versions of GZip, Crypto++ and Bullet Physics).
- Fixed up a bunch of bugs that hurt performance (like some variable alignment issues and compiler optimization flags).
- Completely rewrote some of the tests, e.g. removed TwoFish encryption and replaced it with the more common elliptic curve encryption.
- Improving the algorithms to push more data through the CPU also results in more load on the cache and memory subsystem. So older CPUs, and those with inadequate cache or memory bandwidth, are expected not to perform as well with PT10.
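To give a feel for the first point in the list: code built with AVX-512 processes many values per instruction but will simply not run on CPUs lacking the extension. The following is a minimal illustrative sketch in C (not Passmark's actual test code, which is not public):

```c
#include <immintrin.h>  /* AVX-512 intrinsics (build with e.g. -mavx512f) */
#include <stddef.h>

/* Scalar baseline: one float per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* AVX-512 version: 16 floats per iteration, but it raises an
   illegal-instruction fault on CPUs without AVX-512 support. */
void add_avx512(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
    for (; i < n; i++)  /* scalar tail for the remainder */
        out[i] = a[i] + b[i];
}
```

A real benchmark would presumably check CPUID at runtime and dispatch to the fastest supported code path, so the same binary runs on both old and new CPUs.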
So the new individual PT10 results can't, at least on the surface, be compared to the PT9 results. They are really different, probably the biggest algorithm overhaul in 20 years.
BUT for the CPUMark value, which is a combination result derived from the results of all the individual tests, we scaled it back to PT9 levels. So the PT10 CPUMark is somewhat comparable to the PT9 CPUMark.
Obviously we want to start using the PT10 results on our graphs. But if we wait until we have a million PT10 results, that might take a year. And in the meantime we would have no results for any new CPUs on PT9, as nobody will be using PT9 anymore.
So the solution we selected (the least bad solution from a collection of bad solutions) was to take all the average CPUMark values from PT9 (one value per CPU model) and then start averaging that with all the new PT10 results as they come in. What this means, especially for the first few weeks, is a lot of volatility as the graphs slowly move to reflect more of the PT10 results and less of the PT9 results. Initial PT10 results have a big impact, but each additional PT10 result has less impact as a new average is found.
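Passmark does not spell out the exact weighting, but the behaviour described (early PT10 results move the average a lot, later ones less and less) matches a plain running mean seeded with the PT9 value. A minimal sketch, assuming the PT9 average is weighted like a single sample (our assumption, not a documented detail):

```c
#include <stdio.h>

/* Hypothetical blend: seed a running mean with the PT9 average,
   then fold in PT10 results one by one. The seed weight of one
   sample is an assumption, not a documented Passmark value. */
double blended_cpumark(double pt9_avg, const double *pt10, int n) {
    double avg = pt9_avg;                  /* count starts at 1 (the seed) */
    for (int i = 0; i < n; i++)
        avg += (pt10[i] - avg) / (i + 2);  /* incremental mean update */
    return avg;
}

int main(void) {
    double pt10[] = {12000.0, 11800.0, 12100.0};  /* made-up results */
    for (int n = 1; n <= 3; n++)
        printf("after %d PT10 results: %.1f\n", n,
               blended_cpumark(10000.0, pt10, n));
    return 0;
}
```

With a made-up PT9 average of 10000, the first PT10 result pulls the blend to 11000.0, the second to 11266.7 and the third to 11475.0: exactly the shrinking per-result impact the statement describes.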
If anyone notices any really extreme moves, let us know, we can manually fix them up until the average gets better.
Also, if you want to look at the old V9 results, you can find them here:
https://www.cpubenchmark.net/pt9_cpu_list.php
Another effect of a new database being used for PerformanceTest V10 is that the percentile figures are going to change.
We had V8 and V9 running and collecting results for many years. So a new, modern machine will compare very well against the average machine from the last 5 years.
PerformanceTest V10 has only been collecting results for a week. So even a brand new machine will look kind of average against all the other relatively new machines that have been submitted in the last week.
Here is a screen shot from the same machine running PerformanceTest V9 and V10.
You can see that the percentile figures are down across the board, as with PT10 you are now comparing your machine against a newer, more powerful group of machines.
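The percentile shift itself is simple arithmetic: the machine's score does not change, only the population it is ranked against. A small illustrative sketch with made-up CPUMark values:

```c
#include <stdio.h>

/* Percentage of results in the population that this score beats. */
static double percentile(double score, const double *pop, int n) {
    int below = 0;
    for (int i = 0; i < n; i++)
        if (pop[i] < score) below++;
    return 100.0 * below / n;
}

int main(void) {
    /* Made-up populations, not real Passmark data. */
    double v9_pop[]  = {3000, 4500, 6000, 7500, 9500};    /* 5 years of mixed machines */
    double v10_pop[] = {9000, 9500, 10000, 10500, 11000}; /* one week of new machines  */
    double my_score  = 9200;

    printf("vs. V9 population:  %.0fth percentile\n",
           percentile(my_score, v9_pop, 5));
    printf("vs. V10 population: %.0fth percentile\n",
           percentile(my_score, v10_pop, 5));
    return 0;
}
```

The same 9200-point machine ranks at the 80th percentile against the broad, older V9 population but only at the 20th against the young V10 one.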