Many cores, high performance: AMD's Threadripper in the first tests
For the original German article, see here.
With the Threadripper models, AMD extends the Ryzen line to extremely demanding users. With 12 and 16 cores respectively, simultaneous multithreading, and a 32 MByte L3 cache, the 1920X and 1950X models are not aimed at typical PC users and certainly not at video gamers.
As so-called HEDT (high-end desktop) processors, they target buyers such as programmers, image and video processing professionals, and analysts working with medium-sized data sets. The basic requirement for putting a Threadripper CPU to meaningful use is software that can run in parallel to a high degree, which is not necessarily a given in video games.
The competition matches the intended use and price: in a direct comparison, the Threadripper models are measured against Skylake-X, in particular against the Intel Core i9-7900X, which costs 1000 Euros (~$1180) and which we have already tested.
Threadripper 1950X in Detail
In this article, we look in particular at the top model, the 1950X. A Threadripper is essentially built from two Ryzen 7 dies and two dummy dies, which results in a total of 16 active cores in the case of the 1950X. Each core has a 512-KByte L2 cache. The L3 cache is 16 MByte per die, so 32 MByte altogether.
The base clock speed of the Ryzen Threadripper 1950X is 3.4 GHz, and the maximum clock speed is 4.2 GHz (with XFR). If more than 4 cores are fully loaded, the Turbo clock speed is 3.7 GHz, provided that power delivery and cooling can keep up.
Effectively doubling a Ryzen 7 CPU also brings an extremely high number of PCIe 3.0 lanes. Of the 64 lanes, 60 are available to the user, which offers great flexibility. For example, triple-GPU setups are possible and can additionally be combined with multiple NVMe SSDs or RAID controllers. For comparison, the Core i9-7900X offers 44 lanes, the Core i7-7740X only 16.
One particularity: the new TR4 socket is very large, which, among other things, means that few fitting cooling systems are available so far. At least according to tests, the processor can also be cooled when the heat spreader is only partially covered, and it can even be overclocked. To avoid damaging any of the 4094 pins, the CPU is mounted with the help of a rail, and a torque-limited screwdriver is included, which we recommend using.
Threadripper Characteristics: Latency, NUMA and UMA
Connecting two more or less independent dies is neither trivial nor free of disadvantages, and it can hurt the communication latency between cores. This is also why the Threadripper models benefit from fast RAM, as the PC Perspective benchmarks confirm. Four latency tiers can be distinguished: communication within a core, within a CCX (each CCX consists of 4 cores), within one die, and finally between the two dies. At 250 nanoseconds, the latency between two dies is more than 10 times as high as within a core (thread to thread), which puts it on the level of dual-socket systems. Communication between two cores in different CCXs takes about 143 nanoseconds. The competing Skylake-X Core i9-7900X needs barely more than 100 nanoseconds in the worst case.
As mentioned, PC Perspective was able to confirm that faster RAM significantly improves the latency. With a RAM frequency of 3200 MHz, it improves by 14% (CCX to CCX) and 23% (die to die) respectively. Although the Threadripper officially supports a maximum of DDR4-2666, a higher clock speed should rarely present a problem in practice. However, for applications that depend heavily on low latency (such as video games), the structure of the Threadripper can indeed lead to performance losses.
AMD is apparently aware of the problem and offers two different operating modes plus a compatibility mode. Users of multi-socket systems will be familiar with the terms UMA and NUMA. Put simply, NUMA (Non-Uniform Memory Access) tries to keep the memory latency as low as possible, while UMA (Uniform Memory Access) aims for the highest possible overall bandwidth. In typical HEDT applications, UMA is probably the more sensible choice. According to initial tests, however, choosing the right mode for non-HEDT applications (mostly games) involves some trial and error. As a last resort, AMD also offers the so-called Legacy Compatibility Mode, which simply deactivates half of the cores. According to AMD, this should make sense in some (particular) games. According to Computerbase, it can increase performance by up to 12%. The settings can be adjusted via the Ryzen Master desktop software, so no elaborate configuration is necessary.
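To make the effect of faster RAM more tangible, the following minimal sketch applies the reported improvements to the latencies quoted above. It assumes that the 143 ns and 250 ns figures are the baseline the percentages refer to, which the cited tests do not state explicitly; the names are purely illustrative.

```python
# Latencies quoted in the article, in nanoseconds.
# Assumption: these are the baselines that the DDR4-3200 gains refer to.
BASELINE_NS = {"ccx_to_ccx": 143.0, "die_to_die": 250.0}

# Relative latency improvement reported with DDR4-3200.
IMPROVEMENT = {"ccx_to_ccx": 0.14, "die_to_die": 0.23}

# "Barely more than 100 ns" for the Core i9-7900X in the worst case.
SKYLAKE_X_WORST_NS = 100.0


def improved_latency(baseline_ns: float, improvement: float) -> float:
    """Return the latency after a relative improvement (0.14 means 14% lower)."""
    return baseline_ns * (1.0 - improvement)


estimates = {
    path: improved_latency(ns, IMPROVEMENT[path])
    for path, ns in BASELINE_NS.items()
}

for path, ns in estimates.items():
    ratio = ns / SKYLAKE_X_WORST_NS
    print(f"{path}: {BASELINE_NS[path]:.0f} ns -> {ns:.1f} ns "
          f"({ratio:.2f}x the Skylake-X worst case)")
```

Under this assumption, even with fast RAM both paths would remain slower than the worst case quoted for the Core i9-7900X.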
Before analyzing the benchmarks, we would like to point out again that gaming benchmarks are ultimately a poor fit for an HEDT platform, since even modern video games are rarely well optimized for many cores. Moreover, the current AMD and Intel models offer more than enough performance to drive every game, at a realistic combination of resolution and detail level, reliably up to the limit of the graphics card(s), even in multi-GPU systems. Professional buyers, the Threadripper's target group, should rely above all on applications optimized for parallel operation, such as virtualization.
Synthetic benchmarks allow a quick appraisal of the performance in various application scenarios. The results of a good benchmark represent a kind of upper limit that real applications only reach under optimal conditions.
Without a doubt, one of the most well-known benchmark programs is CineBench, which quantifies the performance capabilities by rendering more than 300,000 polygons with one or all threads. According to PC Perspective, the 1950X only manages 161 points in the single-thread benchmark, remaining clearly behind the Intel competition. Even a Core i5-7600K is significantly faster at 179 points, and the Core i9-7900X finally achieves almost 200 points, being 25% faster.
As expected, the picture changes completely when all cores are used. With a score of 3008, the 1950X outclasses all Intel desktop CPUs and, by these scores, is about 38% faster than the Core i9-7900X (2186 points). Compared to the Ryzen 7 1800X, it reaches 186% of the performance. Since the Turbo clock speeds on all cores are similar, the (purely mathematical) shortfall of 14 percentage points against a perfect doubling can, as a first approximation, be attributed to the architecture.
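The scaling argument against the Ryzen 7 1800X can be retraced with a few lines of arithmetic. This is only a sketch: the 1800X score is not quoted here, so it is derived from the 186% figure, and the 8-core and 16-core counts are the only other inputs.

```python
SCORE_1950X = 3008       # all-core CineBench score quoted in the article
RATIO_VS_1800X = 1.86    # the 1950X reaches 186% of the Ryzen 7 1800X
CORES_1950X = 16
CORES_1800X = 8

# Ideal speedup if performance scaled linearly with the core count.
ideal_ratio = CORES_1950X / CORES_1800X

# Score of the 1800X implied by the 186% figure (derived, not measured here).
implied_1800x_score = SCORE_1950X / RATIO_VS_1800X

# Share of the ideal doubling that is actually reached.
scaling_efficiency = RATIO_VS_1800X / ideal_ratio

# Gap to a perfect doubling, in percentage points.
shortfall_points = (ideal_ratio - RATIO_VS_1800X) * 100

print(f"Implied 1800X score: {implied_1800x_score:.0f}")
print(f"Scaling efficiency:  {scaling_efficiency:.0%}")
print(f"Shortfall:           {shortfall_points:.0f} percentage points")
```

Read this way, the 1950X reaches about 93% of what two perfectly scaling 1800X chips would deliver.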
Encryption and PDF Performance
Decryption and encryption are scenarios that also matter to the average user in practice. Here, possible disadvantages caused by a less than ideal software implementation already become visible, so these could be called semi-synthetic benchmarks. When using the AES algorithm (Rijndael), the Threadripper 1950X achieves 19.1 GByte/s according to AnandTech. This is 6.8 GByte/s more than the Core i9-7900X. Compared to the Ryzen 7 1800X, the 1950X encrypts around 93% faster, so it scales excellently.
In the practically relevant task of opening a (very large) PDF document with Adobe Reader DC, the Ryzen models do quite badly. The Core i7-7740X fares best, needing only 2.2 seconds to open it. The 1950X needs almost 3 seconds, which makes it even (slightly) slower than the Ryzen 7 1800X. The direct Threadripper competitor, the Core i9-7900X, is somewhat faster at 2.5 seconds.
In practice, encoding does not only occur during professional video processing, but also during packing and unpacking of compressed archives, which can become a game of patience with a weak CPU. Naturally, it depends on the quality of the software used in this realistic benchmark as well, but in general the speed of encoding benefits from a fast clock speed to a high degree.
In its test, AnandTech uses 7-Zip as well as WinRAR, and overall the Threadripper 1950X fares best. Although the Core i9-7900X packs a little faster in 7-Zip, it only achieves 66% of the Threadripper's performance when unpacking. When packing with WinRAR 5.40, the 1950X fares only slightly better than the competition, which might be due to the program's poor multi-threading support.
When converting video files via Handbrake, the picture is more nuanced and the concrete results are very dependent on the selected settings. For example, during the LQ conversion of a video (H264, 640x266 pixels), the 1950X only shows mediocre results and is surpassed by the Core i9-7900X by about 16%. On the other hand, with high detail settings, the 1950X is slightly faster, and when using the HEVC instead of the H264 codec, the results of the Threadripper and the i9-7900X are practically identical.
As mentioned earlier, the Threadripper is simply not a gaming CPU and is not aimed at video gamers, for whom the higher latencies are more of a disadvantage and for whom a very high number of threads can, in extreme cases, even reduce the overall performance. According to AMD and media reports, there are also some rare examples of games that refuse to run at all on the large Threadripper, for example "F1 2016" and "Far Cry Primal." In such cases, at least the Legacy Compatibility Mode can be a remedy.
As a result, the Threadripper has considerable trouble merely keeping up with the Intel competition in the gaming benchmarks. For example, according to Tom's Hardware, the gap in "GTA V" is already considerable with factory settings. While the Core i9-7900X delivers 92.8 frames per second on average, the 1950X manages only 83.1 frames per second, a deficit of about 10%. The frame rate is also slightly less stable. On average, the performance of the 1950X remains very similar to that of its basis, the Ryzen 7 1800X, as long as the correct settings are selected and technical problems do not cause atypical performance losses. We point out again that simultaneous multithreading (Hyper-Threading on Intel) can also have a negative influence on gaming performance.
In addition to the demanding gaming benchmarks, Tom's Hardware also uses professional workloads, which should come closest to the actual usage areas of the Threadripper. For example, Solidworks 2015 is a widely used CAD tool that is also used to construct 3D models and to simulate particular processes, such as a crash test. This relies in particular on solving complex differential equations. The benchmark shows that Solidworks, for example, scales poorly with a high thread count. Even a Core i7-7700K clearly beats the 1950X.
The example of Blender, a free 3D graphics suite, shows that other software can be optimized differently or better. Here the 1950X is more than two and a half times as fast as the i7-7700K, and its lead over the Core i9-7900X is a respectable 26%. Other benchmarks also confirm that the Threadripper 1950X not only has potentially high performance reserves, but can also use them with the right software.
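The "GTA V" figures can also be expressed as frame times, which some readers find easier to judge than frames per second. The sketch below only converts the two averages quoted above; the helper name is illustrative.

```python
FPS_I9_7900X = 92.8   # average frames per second quoted in the article
FPS_1950X = 83.1


def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Deficit of the 1950X relative to the Core i9-7900X, in percent.
deficit_percent = (FPS_I9_7900X - FPS_1950X) / FPS_I9_7900X * 100

# Additional time the 1950X needs per frame on average.
extra_ms_per_frame = frame_time_ms(FPS_1950X) - frame_time_ms(FPS_I9_7900X)

print(f"Deficit: {deficit_percent:.1f}%")
print(f"Extra frame time: {extra_ms_per_frame:.2f} ms")
```

On these averages, the 1950X needs a little more than one additional millisecond per frame.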
In contrast to the Core i9-7900X, which is plagued by thermal problems in particular situations due to the poor heat transfer from the actual CPU die to the heat spreader, AMD generally solders all the Ryzen derivatives, including the Threadripper. Nonetheless, the Threadripper is not free of teething problems in terms of cooling either. A cooling system that fits completely is not (yet) available for the new processor. Although AMD currently includes an adapter for All-in-One water-cooling systems by Asetek, these only partially cover the heat spreader. Air-cooling systems cannot be mounted at all yet.
Astonishingly, this does not create any problems for the testers: even under full load at the standard clock speed, the core temperature consistently remains clearly below 70 °C (158 °F). This leaves considerable room for overclocking, and even with the makeshift AiO cooling solution, Computerbase already achieved 4 GHz on all 16 cores. However, the power consumption of the whole system then jumped from 265 to 400 watts. Compared to the Core i9-7900X, the efficiency of the 1950X is considerably higher, at least when looking at CineBench and applications that are optimized for parallel processing.
By the way, like Ryzen, the Threadripper also works with a thermal offset, in this case 27 Kelvin. To put it positively, AMD apparently has a strong interest in efficient cooling and a long life for its own product. To put it negatively, the company uses a somewhat dirty trick to improve the cooling situation. However, this is not really a problem for either Ryzen or the Threadripper, since the fan speed curve can generally be adjusted.
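A quick calculation shows how expensive the overclock is in terms of power. This sketch assumes that the 265 W figure was measured at the 3.7 GHz all-core Turbo mentioned earlier, which the cited test does not state explicitly here, and it uses whole-system power, so the CPU-only ratio would differ.

```python
POWER_STOCK_W = 265.0   # whole-system power under full load at stock
POWER_OC_W = 400.0      # whole-system power at 4 GHz on all 16 cores
CLOCK_STOCK_GHZ = 3.7   # assumption: all-core Turbo during the stock run
CLOCK_OC_GHZ = 4.0


def percent_increase(new: float, old: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new / old - 1.0) * 100


power_increase = percent_increase(POWER_OC_W, POWER_STOCK_W)
clock_increase = percent_increase(CLOCK_OC_GHZ, CLOCK_STOCK_GHZ)

# How many percent of extra system power each percent of extra clock costs.
power_per_clock = power_increase / clock_increase

print(f"Power: +{power_increase:.0f}%")
print(f"Clock: +{clock_increase:.1f}%")
print(f"Ratio: {power_per_clock:.1f}x")
```

Under these assumptions, about 8% more clock speed costs roughly 51% more system power, which illustrates why the efficiency advantage applies mainly at stock settings.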
With the Threadripper 1950X, AMD has finally shown that the Ryzen architecture scales up to the HEDT segment. Although the two combined Ryzen 7 dies have certain disadvantages in terms of latency, at the end of the day they offer better performance and therefore also a much better price-performance ratio for applications that are optimized for many threads. Compared to the Core i9-7900X at practically the same price, the CineBench scores (3008 versus 2186 points) show an advantage of roughly 38%, and well-optimized applications can come close to that as well, as the 26% lead in Blender shows.
Professional users will probably buy a new system with concrete ideas about the programs they need. If the program handles many threads well, the recommendation can only be the Threadripper. In this case, the higher efficiency is a positive side effect. ECC RAM support can be a must-have for some particular applications. With Intel, this is reserved for Xeon processors.
In our opinion, the Threadripper models are simply too much of a good thing for end users. Even half a Threadripper in the form of a Ryzen 7 offers more than enough performance at a far more affordable price, in particular since the performance of the 1950X can at times be lower than that of consumer processors when unfavorable memory settings (UMA/NUMA) are selected.