Improving The Exynos 9810 Galaxy S9: Part 1

Last week we published our Galaxy S9 and S9+ review, and the biggest story (besides the device) was the difference between the processors used inside the two variants: the Snapdragon 845 SoC and Exynos 9810 SoC. Based on earlier announcements, we had large expectations from the Exynos 9810 as it promised to be the first “very large core” Android SoC. However, our initial testing put a lot of questions on the table. Unfortunately the synthetic performance of the Exynos 9810 did not translate well at all into real life performance. As part of the testing, I had discovered that the device’s scheduler and DVFS (dynamic voltage-frequency scaling) configurations were atrociously tuned.

For the full review I had opted not to go modify the device, as that initial review was aimed as a consumer oriented article, and it would serve little purpose to readers (that and time constraints). Still I promised I would follow up in the coming weeks and this is the first part of what’s hopefully a series where I try to extract as much as possible out of the Exynos 9810 and alleviate its driver situation.

After rooting the device and getting a custom kernel up and running, one of the first things I did was to play around with the hotplugging mechanism that I explained in the review article was what is in my opinion the most damaging to performance. 

One of the benefits the custom kernel is pushing the frequencies, and so after a bit of fun playing around a bit I achieved GeekBench multi-core records at 4x 2496MHz on the M3 cores. Trying to run the M3 cores at 2704 MHz consistently crashed the phone, so I decided it’s better to start at the bottom and work our way up the performance scale.

Configuration Overview And SPEC

Samsung Galaxy S9 (E9810)
Kernel Comparison and Changelog
VersionChanges and Notes
Official FirmwareAs Shipped- Stock setup and behavior
- Single Core M3 at 2704 MHz
- Dual Core M3 at 2314 MHz
- Quad Core M3 at 1794 MHz
'CPU Limited
Mode'
- Optional Samsung-defined CPU Mode in Settings
- CPU limited to 1469 MHz
- Memory controller at half-speed
- Conservative Scheduler
Custom Config 1- Start with 'As Shipped' Firmware
- Remove hotplugging mechanism
- Limit M3 frequency peak to 1794MHz at any loading

For the first round of testing, I judged the hotplugging mechanism as a whole is better off to be completely disabled, as it just doesn’t manage to serve its purpose. The frequency for the M3 cores was set fixed at 1794 MHz, the same frequency as the shipping kernel gives when all four M3 cores are in play. Technically, based on frequency alone, this should theoretically result in the same multi-threaded performance, and worse single core performance.

And here’s where the interesting things start. At 1794MHz the Exynos M3 performed extremely well and ended up above my expectations in synthetic benchmarks. Starting off with SPEC:

 

In the Galaxy S9 review article I had made the hypothesis that for the equivalent peak performance of the Snapdragon 845, the Exynos 9810 would lose in efficiency. In the review, I extrapolated the energy increase with as the frequency rises using the CPU Limiter figures at 1469MHz. Based on those first results, I suggested that the Exynos 9810 would need to be at 2.1-2.3GHz to match the Snapdragon 845 in SPEC.

To my surprise, the results at 1794MHz closely matched the Snapdragon 845’s peak figures within a couple of percentage points. Not only did the 9810 match the 845 in performance, but the efficiency metrics were also beating expectations. In SPECint, the Exynos now matches the energy usage of the Snapdragon. In SPECfp, the Exynos even now retains a 15% efficiency lead.

This time I included back a second graph for SPEC – I’m still conflicted to show the data here this way, as the metric is 'work per time per energy' or 'performance per energy' and I’m still struggling to come up with an immediately intuitive explanation as what is being showcased, I ended up calling it 'Performance Efficiency' as the most fitting term. Readers here need to carefully disambiguate the term “efficiency” between total energy usage (representative of battery usage) and this “performance efficiency” metric which is more ethereal in its meaning. 

How this difference between the three setups is happening is something that I at the moment attribute to three factors: the removal of the hotplugging mechanism removes any ST performance noise from other threads, the M3 cores are being better utilized at the lower frequencies, and we’re just seeing some gains in thermal limited workloads such as libquantum.

The second reason is I think the most important here as the M3 cores are better served by the conservative memory controller DVFS. This is extremely noticeable in the SPECfp results, as the 1794 MHz scores end up with a 'SPECspeed per GHz' ratio (a measure of performance efficiency) of 12.37 - this is compared to a ratio of 9.94 for the stock kernel runs. This translates to a 24% increase in measured IPC efficiency. The SPECint results also improve that performance/GHz ratio, rising from 8.05 to 9.53, correlating to an 18% increase. 

In the earlier SPECspeed/Joules (Performance Efficiency) graph, we see how the 1794MHz results fare better than the stock and CPU limiter results. This difference isn’t displayed in the total energy or perf/W metrics so there is credit to showcasing the results this way. 

Overall, with these current best-case results, the M3 cores offer a 51% IPC advantage over the Kryo 385 Gold cores in the Snapdragon. This is a lot nearer to what we expected from the core based on the initial announcements and microarchitecture.

System Performance

I argued in the Galaxy S9 review that the real-world performance of the Exynos variant was hampered by software and that workloads never really got to make use of the raw performance. Let’s quickly revisit the benchmarks and see what the results of the first changes in the kernel running at 1794 MHz.

PCMark Work 2.0 - Web Browsing 2.0

PCMark’s Web Browsing 2.0 test sees a significant 22% boost in performance over the stock configuration. Again I argued that the hotplugging mechanism was in many cases extremely detrimental to real-world performance as its coarse nature simply did not manage to efficiently control the finer-grained behaviour of threads in many workloads. The result here speaks for itself, as even though the new custom kernel significantly limits raw single threaded CPU performance compared to stock, the web test just does so much better.

PCMark Work 2.0 - Video Editing

The video editing test saw a small performance degradation here, I will try to investigate this more later on.

PCMark Work 2.0 - Photo Editing 2.0

Photo editing test saw an increase in performance.

PCMark Work 2.0 - Writing 2.0

PCMark Work 2.0 - Data Manipulation

The Writing 2.0 and Data Manipulation tests saw degradations, with the latter one seeing the biggest downside of the frequency limiting. I’m extremely confident that I’ll be able to recoup these degradations with further tuning as I haven’t yet touched the scheduler and DVFS behaviours of the phone.

Speedometer 2.0 - OS WebView WebXPRT 3 - OS WebView

Finally, in the web tests the performance of this first custom configuration again provides a better score than the stock behaviour, which is ironic, as web workloads are meant to be the one scenario where the ST performance is meant to manifest itself the most. I’ll refer back to my “The Mobile CPU Core-Count Debate: Analyzing The Real World” from a few years back where we saw that this actually might not be the case and today’s browsers are actually more multi-threaded as one might think.

The issue again with the Exynos 9810’s current mechanism in the S9 is that it’s trying too hard, and biasing too much towards ST performance at the cost of MT performance. As a result, as our results show, the whole thing falls down like a house of cards.

Battery Life Regained

Finally, the most important metric is to see how the S9 fares in terms of battery life under these settings.

Web Browsing Battery Life 2016 (WiFi)

The Exynos S9 was able to improve by 20% and regain 1h23m in our web browsing test which is a good improvement given the low effort. Given the fact that the changes made with the this first custom kernel were, in the vast majority of scenarios, favourable for web browsing performance, this feels to me like another nail to the coffin towards Samsung’s mechanisms. The higher frequencies on the M3 CPU as provided in the shipping kernel simply aren’t beneficial in real-world scenarios, as the software just doesn’t (and likely just can’t) make proper use of them.

Today’s results are just meant as a quick preview of the Exynos S9’s behaviour when removing what I believed the most offending parts of Samsung’s mechanisms. The results are very promising as we are essentially getting real-world performance boosts while at the same time seeing efficiency improvements and battery life improvements. 

There is still a lot to be done - I haven’t really touched the scheduler or DVFS scaling logic, but I’m confident that it’s possible to improve things further. It’s unlikely that we’ll end up matching the Snapdragon 845 variant in battery life, as there are efficiency curve considerations at lower performance points that simply cannot be changed through software. But closing the gap as much as possible seems to be an attainable short-term goal.

We have not yet had official commentary from Samsung on the situation yet, but the likelihood that we’ll see significant changes introduced through firmware updates seem slim to me. For now, unless you’re one of the very few enthusiast people into modifying your mobile devices, then the above results are relatively meaningless for a consumer and should not be used for purchasing guidance, for which I refer back to my conclusions in the S9 and S9+ review.

Original carousel image credit to Terry King

Related Reading

ncG1vNJzZmivp6x7orrAp5utnZOde6S7zGiqoaenZH5zgpBuZqKloKe8t7XNoGSesKmjvLR5mHFoaWWXlrmixNhmqnJloJa%2FtXmQ