SoC Shootout: x86 vs. ARM
For the original German article, see here.
x86 = performance, ARM = low power? Not anymore: While x86 peak performance has seen little increase lately, ARM-based SoCs have gained steadily and quickly over the past few years, closing the gap. Four years ago, single-core CPUs clocked at 1.0 GHz were the absolute maximum. Today's CPUs come with four cores, clock speeds of around 2.0 GHz and a drastically improved microarchitecture, yielding ten times the performance. At the same time, however, power consumption has risen as well, making throttling issues in smartphones and tablets increasingly common.
While ARM chips keep gaining performance, ultra-mobile x86 CPUs have taken the opposite path. Starting with the original Intel Atom back in 2008, Intel and AMD have enabled ever thinner and longer-lasting laptops thanks to higher integration densities and ever lower power consumption. Thus, x86 chips have finally become viable for compact tablets and smartphones.
Where this development leads is not hard to see: Instead of coexisting as before, ARM and x86 will increasingly compete for the same markets. More than enough reason for us to analyze the current state of affairs in this article and even take a glance at possible future trends.
Low-power architectures under the microscope
Both the x86 and the ARM camps offer a number of different architectures aimed at different market segments, differing in crucial criteria such as performance, power consumption and price. IP-core specialist ARM currently offers its customers three main designs (omitting the extreme low-end solution, the Cortex-A5):
The Cortex-A7 is essentially a reworked and improved successor to the older Cortex-A8, which was used in Apple's iPhone 4 (Apple A4 SoC) among others. Typically, the Cortex-A7 is offered as a dual- or quad-core chip with clock speeds of 1.0 to 1.5 GHz and counts, performance-wise, as an entry-level design. The simple pipeline (just 8 stages, partly dual-issue superscalar) follows the in-order execution principle, handling all instructions strictly one after the other, which results in a rather low per-MHz performance. The main advantages of the Cortex-A7 are its small die area (just 0.5 mm² per core in a 28 nm process) and its low power requirements. This is why the A7 is also used in ARM's big.LITTLE concept (for example in the Samsung Exynos 5410), where it handles simple tasks and background processes while being combined with much faster, but also much more power-hungry Cortex-A15 cores.
One to two years ago, the Cortex-A9 was an absolute high-end design. Nowadays, it can only be found in entry-level or mid-tier SoCs. Despite being much larger and capable of out-of-order execution, the A9 is usually not much faster than the cheaper Cortex-A7 (at the same clock speed). ARM has recognized this problem and released much-improved revisions such as the A9r4, which is to be used in Nvidia's upcoming Tegra 4i. Thanks to clock speeds of up to 2.3 GHz and many detail enhancements, these versions of the A9 may well become popular again.
Currently, the Cortex-A15 is one of the most powerful ARM implementations. The fast, triple-issue superscalar out-of-order design comes with improved branch prediction and powerful multi-level caches, yielding a performance boost of up to 40% compared to a Cortex-A9 at a similar clock speed. Its main disadvantage: very high power consumption, which makes it difficult to implement the A15 in compact smartphones. Important A15-based chips are Nvidia's Tegra 4 as well as the aforementioned Samsung Exynos 5410.
Manufacturers of ARM SoCs do not necessarily have to license an entire core. It is also possible to develop an ARM-compatible design on one's own (currently targeting the ARMv7 ISA), a route taken by both Qualcomm (with the older Scorpion architecture, somewhere between the Cortex-A8 and the Cortex-A9, as well as the more recent Krait architecture, somewhere between the Cortex-A9 and the Cortex-A15) and Apple (with the Swift architecture, likewise between the Cortex-A9 and the Cortex-A15). These custom designs come with a number of advantages: finding one's own sweet spot between performance and power consumption, adding extra features, (in some cases) a faster time to market and lower licensing fees.
Only two x86-based low-power architectures exist at the time of writing: the Intel Atom and AMD's Jaguar (the latter not really suited for very compact devices). The Intel Atom with its IPC-weak in-order design has barely advanced since its introduction in 2008 and is far less powerful than AMD's competition in benchmarks. However, the Atom is extremely economical - no other modern x86 chip can actually be used in smartphones.
We have chosen to compare 7 different SoCs which have been introduced to the market within the last two years. Some of these are entry-level systems, others are better-suited for delivering high-end performance. Similarly, usage scenarios (smartphone / tablet / laptop) and power consumption vary wildly, something which we will of course consider in our analysis.
With a TDP of 15 watts, the A4-5000 quad-core (28 nm, ~107 mm²) chip clocked at 1.5 GHz is far more power-hungry than the rest of the competition. Still, we want to know how the Jaguar architecture fares when compared to the alternatives made by ARM. Are its performance levels as high as the power consumption levels suggest? How well does the Radeon HD 8330 work? Our test device is a Reference Design Laptop from AMD.
More or less comparable to the ARM competition in terms of TDP is the Intel Atom Z2760. The dual-core chip (32 nm, 65 mm²) with Hyper-Threading and a clock speed of 1.8 GHz comes with a PowerVR SGX545 GPU; we tested it under Windows 8 (as with the AMD chip) on the Acer Iconia W3-810 as well as the Lenovo IdeaTab Lynx.
Finally, it has arrived: later than expected, the Nvidia Tegra 4 (28 nm, ~80 mm²) has been introduced to the market. The ARM SoC comes with four Cortex-A15 cores clocked at 1.8 GHz, an economical companion core and a massively improved GeForce ULP GPU. So how does the Toshiba Excite Pro fare against the x86 competition?
The Samsung Galaxy S4 with its S600 quad-core (28 nm, ~80 mm²) clocked at 1.9 GHz and an Adreno-320 GPU is one of the fastest smartphones currently available - although an even faster SoC will soon be introduced: The Snapdragon 800.
As a former high-end chip, the Cortex-A9 quad-core Tegra 3 (40 nm, ~80 mm²) still offers solid performance levels, but it is not fighting for the crown anymore. The SoC used in the Asus Transformer Pad TF300T is clocked at 1.2 to 1.3 GHz (Tegra 3 T30L) while the fastest variant (Tegra 3 T33) reaches up to 1.6 to 1.7 GHz.
Sales of affordable tablets such as the Asus Memo Pad HD 7 are sky-rocketing, not least due to attractive Cortex-A7 chips such as the MT8125 (28 nm, size unknown) made by Mediatek. What performance can be expected from this supposedly weak 1.2 GHz quad-core and its PowerVR SGX544 GPU? The duel with Nvidia's Tegra 3 in particular could become very exciting.
Although just a little more than two years old, the HTC-built Evo 3D feels ancient when compared to the other devices here. The Scorpion dual-core CPU and the Adreno-220 GPU of the Snapdragon S3 (45 nm, size unknown) clocked at 1.2 GHz are included as a reference point for the fast development during the last few years.
One more thing before we get to the actual measurements: benchmarks across different devices (and device classes) as well as different operating systems have to be taken with a huge grain of salt, as the potential for significant errors and deviations is high even when extreme caution is exercised (e.g. due to different display resolutions, but chiefly due to different compilers and optimizations of the cross-platform benchmarks we used). To ensure maximum comparability, we performed all browser benchmarks with the most recent version of Google Chrome for Android and Windows respectively. Especially for the more power-hungry SoCs, we allowed for long periods of rest between benchmarks in order to minimize the risk of throttling skewing the results.
We start by taking a look at CPU performance in two synthetic cross-platform benchmarks. The Geekbench 2 benchmark in particular yields rather theoretical results, so its importance shouldn't be overstated. Still, the results are very interesting: both the Tegra 4 and the Snapdragon 600 manage to beat the A4-5000, mainly due to better results in the floating-point tests, where the two ARM SoCs lead the A4-5000 by 57 and 32% respectively. However, in many real-life applications integer performance matters more - and here, AMD's x86 APU takes the lead.
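For readers wondering how such percentage leads are derived from raw scores, here is a minimal sketch. The scores below are hypothetical placeholders chosen to illustrate the arithmetic, not our measured results.

```python
# Sketch: computing a percentage lead from two raw benchmark scores.
# The numbers are illustrative placeholders, not actual measurements.

def percent_lead(score: float, baseline: float) -> float:
    """How many percent 'score' is ahead of 'baseline'."""
    return (score / baseline - 1.0) * 100.0

# Hypothetical floating-point scores:
a4_5000_fp = 1000.0
tegra4_fp = 1570.0      # would correspond to a 57% lead

print(round(percent_lead(tegra4_fp, a4_5000_fp)))  # 57
```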
The Intel Atom Z2760 and the two Cortex-A9 and Cortex-A7 SoCs fall a little bit behind. The most surprising result here is the tie between the two Cortex chips, as the Cortex-A9 design should be strongly superior in theory.
The physics test of the current 3DMark benchmark yields more or less the same results - with the main exception that the AMD A4-5000 now takes the lead (which feels more realistic than the Geekbench results). Again, the MT8125 offers excellent performance, way ahead of Nvidia's Tegra 3. This is probably thanks to its low power consumption, which allows the MT8125 to run all four cores at their maximum clock speed even under full load.
In contrast to the Geekbench benchmark and the 3DMark physics test, most browser benchmarks only use one or two CPU cores, which helps the dual-core Atom Z2760 win three out of five tests (ahead of the Snapdragon 600).
It is impressive how far the Cortex-A15-based Tegra 4 stays ahead of the rest of the ARM SoC crowd. In some tests, it even comes close to the performance of an A4-5000. Even after accounting for its higher clock speed (a 20% difference), this still amounts to an impressive per-MHz performance for an ARM design. It shouldn't be forgotten, though, that the Cortex-A15 is not that far from AMD's Jaguar cores in terms of power consumption: as our review of the Toshiba Excite Pro shows, the Tegra 4 quickly starts to throttle under full load - there is a reason for the active cooling system in Nvidia's Shield handheld console (which sports an even higher-clocked variant of this chip).
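The clock-speed adjustment mentioned above boils down to normalizing each score by its chip's clock. A minimal sketch, with placeholder scores standing in for real results:

```python
# Sketch: normalizing benchmark scores by clock speed to compare
# per-MHz performance across designs. The scores are illustrative
# placeholders, not our measured values.

def per_mhz(score: float, clock_ghz: float) -> float:
    """Benchmark points per MHz of clock speed."""
    return score / (clock_ghz * 1000.0)

# Hypothetical case: chip B scores 10% higher than chip A, but is
# clocked 20% higher - so its per-MHz efficiency is actually lower.
score_a, clock_a = 1000.0, 1.5   # e.g. a 1.5 GHz design
score_b, clock_b = 1100.0, 1.8   # e.g. a 1.8 GHz design

print(per_mhz(score_a, clock_a) > per_mhz(score_b, clock_b))  # True
```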
Smartphones and tablets are used more and more often as mobile games consoles, resulting in ever increasing hardware requirements in order to render those detailed 3D graphics. The differences in performance between various SoCs are gigantic: During GLBenchmark and 3DMark (2013) the Qualcomm S600 with its Adreno-320 GPU is 4 to 6 times faster than Mediatek's low-cost MT8125 with a PowerVR-SGX544 GPU. Even more surprising might be the fact that the Tegra 3 and the Intel Atom Z2760 don't manage to beat the MT8125 by much, thus remaining more or less on par with the Adreno 220 of the old Snapdragon S3. With such performance levels, demanding Android games such as Real Racing 3 can only be rendered smoothly with low details.
We also weren't overly impressed with the Tegra 4's performance. While it reaches the same frame rates as the Snapdragon 600, the latter sits in a mobile phone rather than a tablet (where one would expect a lead given the tablet's far higher power budget). Meanwhile, the Radeon HD 8330 of the AMD A4-5000 beats all of them by a factor of at least ~2. An ultrabook with a Haswell CPU and an HD Graphics 4400 would increase this factor to 3 or 4.
Addition (08/14/2013): Our colleagues from mobilegeeks.de have gotten their hands on a Sony Xperia Z Ultra with Qualcomm's new Snapdragon 800. According to the benchmarks performed by them, the strongly-improved Adreno 330 GPU seems to pull ahead of both its predecessor and the Tegra 4 tablet by a long shot.
We have already talked a little bit about power consumption - which is important, as performance levels can only be classified when put into context with the associated power requirements. Unfortunately, exact TDP values are hard to come by for ARM SoCs (and measurements of the entire system are unreliable due to the large differences between various devices and even classes of devices).
Luckily, our colleagues from Anandtech have managed to determine pretty exact power consumption values for the CPU and GPU parts of a number of SoCs. Using these numbers as well as our own observations, we are trying to establish a few rough classifications.
The first limiting factor in almost every mobile device is the cooling system. Depending on the ambient temperature and the construction of the device, a passively cooled tablet can dissipate around 4 watts continuously, while a smartphone manages 2.5 to 3 watts (depending on its size). Still, any SoC can consume considerably more power, at least for short periods. This is apparently what happens with Nvidia's Tegra 4: under full load, the maximum CPU clock speed of 1.8 GHz is sustained for a few minutes at most, meaning the threshold of around 4 watts must have been exceeded. Without throttling (and with the GPU under full load too), we expect the chip to require at least twice as much power. A similar thing happens, to a lesser extent, to the Tegra 3 and the Snapdragon 600, forcing them to downclock as well. We estimate their theoretical maximum power consumption without throttling at approximately 6 and 4 watts respectively.
The Intel Atom Z2760 (between 1.7 and 3 watts of TDP, according to different sources) and the Mediatek MT8125 are much less power-hungry. For the MT8125, we estimate a TDP of around 1.5 watts (its four Cortex-A7 cores should need approximately as much power as a single Cortex-A15 core, and the GPU is no power hog either) - just one tenth of the TDP of the fastest SoC here, AMD's A4-5000 at 15 watts. Another crucial factor: many ARM SoCs integrate a number of additional features (such as a modem, Wi-Fi connectivity, a camera ISP...) while the x86 SoCs usually don't.
Since real-life applications and even games rarely require all cores to operate at peak performance, actual power consumption should be far lower during everyday use, mitigating these throttling issues. In addition, a number of power-saving features come into play: Nvidia's companion core (used since the days of the Tegra 3), Qualcomm's per-core clocking (which allows each core's voltage to be adjusted individually), and the big.LITTLE approach developed by ARM and adopted by Samsung. Especially the latter seems to hold a lot of potential, at least once its teething troubles are dealt with.
Summary and Outlook
It is truly impressive to see how far mobile ARM SoCs have come in the last few years. Not long ago, just opening a more or less complex website was a painful experience on any mobile phone - a task that poses no problem for modern smartphones and tablets, and not only the expensive ones: affordable SoCs based on the Cortex-A7 architecture, such as Mediatek's chips, seem to find the sweet spot between price, performance and power consumption and are deservedly successful. The MT8125 quad-core clocked at 1.2 GHz, for example, delivers such good CPU and GPU performance that it easily stays on par with Nvidia's older and much more power-hungry Tegra 3, even coming close to the x86-based Atom Z2760 when all four cores are under full load.
Talking about the Intel Atom: the underlying architecture is five years old - and it is starting to show. Thanks to a number of improvements, the use of an integrated chipset and the migration to a 32 nm process, its energy efficiency has stood the test of time and remains highly competitive, but its absolute performance doesn't come close to that of the fastest ARM chips. Intel seems to be working hard on closing this gap: in a few weeks, the new Silvermont cores are expected on the market, promising a significant performance boost thanks to their out-of-order design and higher clock speeds. A year from now, the successor to Silvermont will arrive, manufactured on a 14 nm process - in a market dominated by the performance-per-watt ratio, this should prove an extremely valuable advantage, as AMD's Mullins and Beema chips (the successors to Temash and Kabini, also expected in 2014) will probably still be manufactured on a 28 nm process.
Most likely, Intel's strongest competitor will be Qualcomm. The company, relatively unknown until recently, is already reporting record profits in the billions and has reached a higher market capitalization than Intel, thanks to its dominance in the mid-tier and high-end segments of the booming smartphone market. And while Intel's market share there is tiny, the next blow for the inventors of the Pentium lies right ahead: the Qualcomm Snapdragon 800, already available in South Korea, is the direct successor to the Snapdragon 600. Clocked at 2.3 GHz, it promises both an impressive increase in CPU and GPU performance and improved energy efficiency, making it viable for smartphones, while Nvidia's Tegra 4 might suffer serious throttling issues despite reaching similar performance levels. Nvidia's answer to these problems might be the upcoming Tegra 4i, which also comes with an integrated LTE modem.
Still, all of this is just a snapshot of an extremely fast-moving market. Many other SoCs are expected over the next few months, among them not only the Qualcomm Snapdragon 800, but also the new Samsung Exynos 5420, Mediatek's octa-core MT6592 as well as the Tegra 5 (Logan), which has already been announced for the beginning of 2014. Possible game-changers might be the first chips based on the 64-bit ARMv8 architecture (the Cortex-A53 and Cortex-A57, successors to the Cortex-A7 and Cortex-A15), which will likely be released in 2014 or 2015. These could open up a whole range of new opportunities: AMD has already announced that it wants to offer ARM-based server CPUs as well. How long it takes until ARM chips are commonplace in notebooks or desktop computers might be only a question of a few years - and of the operating system: right now, the lack of success of Windows RT is probably one of the major hindrances to further expansion into this segment, although this might change with alternatives such as Google's Chrome OS.