Mondozai - Wednesday, January 6, 2016 - link
I was wondering why we didn't see anything from ImgTec late last year. They have usually released their newest uarch in November or so each year, so now we know why: an iterative expansion. My guess is that the 7 series is plenty powerful for most mobiles going forward and Apple still has room to experiment - or perhaps they've taken the leap altogether, finally built their own in-house GPU, and are ready to show it off with the iPhone 7.
The S820 benchmarks showed the Adreno 530 to be more or less on equal standing with the 7 series, so that gives them some time to refine even more for the 8 series, which will probably be a significant jump, later this year.
lucam - Wednesday, January 6, 2016 - link
The Adreno 530 is just a bit quicker than the GT7600 in the iPhone 6s, but still far from the unknown GPU (still waiting for the Anand review) inside the iPad Pro.
osxandwindows - Wednesday, January 6, 2016 - link
This is probably not the GPU for the iPhone 7.
tipoo - Wednesday, January 6, 2016 - link
The non-S releases haven't brought major architecture changes in a while. The iPhone 7 will have new externals and upclocked, tweaked internals. The S releases are where the major architecture changes come.
bodonnell - Wednesday, January 6, 2016 - link
I suspect we will probably see a GT7600 Plus in the Apple A10, not unlike the A7 -> A8 transition, which switched from the G6430 to the GX6450 (both 4-cluster designs). Since the focus for the iPhone 7 will be a new design, if past non-S-cycle iPhones are any indication the internals will mostly be conservative tweaks on what is found in the 6s, which means an A10 that likely offers only modest improvements over the A9.
GC2:CS - Wednesday, January 6, 2016 - link
While I do agree that the iPhone S models are much more aggressive with chip upgrades, the iPhone 6 was limited by the so-called "poor" 20nm process, while the next-gen iPhone will supposedly go from dual-sourcing 1st-gen FinFETs to a single-source 16nm FF+ process, which could leave quite a bit of room to optimize.
bodonnell - Wednesday, January 6, 2016 - link
Good points, there are probably some additional gains to be made from not having to optimize for two processes, as well as from working with a more mature 2nd-gen process from TSMC. That said, I still think performance improvements will be modest (closer to the A7 to A8 improvement) rather than aggressive (like the A8 to A9, or the A6 to A7 improvement). I've been wrong before though!
name99 - Saturday, January 30, 2016 - link
But there are lots of ways to improve without requiring substantial GPU architecture changes. Apple could go to 8 cores for the A10 and 16 cores for the A10X. And/or they could clock the GPU higher on a more optimized process (especially for 2D compositing, now that they have a more optimized pipe for that).
It's also possible that, as they move to fully HSA-compliant hardware, we'll see them announce an interesting framework for exploiting that --- something more modern and easier to use than OpenCL. (Or maybe just OpenCL will become a lot more practical on smaller kernels if the cost of moving data between CPU and GPU is a lot less.)
But yeah, I agree that we should expect (on CPU and GPU side) an A7 to A8 level of transition --- a nice boost, but not performance doubling.
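To make the data-movement point concrete, here is a minimal OpenCL 1.x host-code sketch (an assumed illustration, not Apple's framework or any shipping Apple API): the first buffer forces an explicit copy to the GPU, while the second wraps the same host allocation so a shared-memory GPU can read it in place. The spec doesn't guarantee zero-copy, but that is the usual behavior on shared-memory SoCs; error handling is trimmed for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>   /* on OS X: #include <OpenCL/opencl.h> */

int main(void) {
    cl_int err;
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

    const size_t n = 1 << 20;                       /* 1M floats, 4 MB */
    float *host_data = malloc(n * sizeof(float));
    for (size_t i = 0; i < n; i++) host_data[i] = (float)i;

    /* Discrete-GPU style: allocate device memory and copy across the bus. */
    cl_mem copied = clCreateBuffer(ctx, CL_MEM_READ_ONLY,
                                   n * sizeof(float), NULL, &err);
    clEnqueueWriteBuffer(q, copied, CL_TRUE, 0, n * sizeof(float),
                         host_data, 0, NULL, NULL);

    /* Shared-memory style: wrap the existing host pages, no explicit copy. */
    cl_mem shared = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                                   n * sizeof(float), host_data, &err);

    printf("copy-path buffer %p, zero-copy-path buffer %p\n",
           (void *)copied, (void *)shared);

    clReleaseMemObject(copied);
    clReleaseMemObject(shared);
    clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    free(host_data);
    return 0;
}
```

The cheaper the second path gets, the more it makes sense to dispatch even small kernels to the GPU instead of keeping them on the CPU.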
iwod - Wednesday, January 6, 2016 - link
Still waiting for this to arrive on the desktop.
blaktron - Wednesday, January 6, 2016 - link
Why? It doesn't support the latest tech, nor is it fast enough to replace the GPUs in modern CPUs. At best it does the same work for a bit less power, which is negligible on a desktop. More likely it is providing worse performance for a bit less power than a desktop architecture. Not sure why you would put a smartphone GPU in a desktop add-in card, but you do you, dude.
iwod - Thursday, January 7, 2016 - link
We have reached the point where no more features are required for AAA games, unlike the DirectX 7 - 10 era. Mobile OpenGL ES cherry-picks the features we have on desktop for performance/watt/quality reasons.
GPUs today are limited by 3 things: memory bandwidth, process node and driver quality.
Memory bandwidth is the easiest one: you can put in a 512-bit memory controller and call it a day with GDDR5, and you still have the option of GDDR5X. It is merely a cost issue. Now that we have HBM and the coming HBM2, it is unlikely memory bandwidth will be an issue for a few more years down the road.
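As a rough back-of-the-envelope check on that claim (assumed per-pin rates, not figures from the article), peak bandwidth is roughly the bus width in bytes times the per-pin data rate:

```c
#include <stdio.h>

/* Peak bandwidth in GB/s = (bus width in bits / 8) * per-pin rate in Gbps. */
static double peak_gbs(double bus_bits, double gbps_per_pin) {
    return bus_bits / 8.0 * gbps_per_pin;
}

int main(void) {
    printf("512-bit GDDR5   @ 7 Gbps  : %6.1f GB/s\n", peak_gbs(512.0, 7.0));
    printf("512-bit GDDR5X  @ 10 Gbps : %6.1f GB/s\n", peak_gbs(512.0, 10.0));
    printf("4096-bit HBM    @ 1 Gbps  : %6.1f GB/s\n", peak_gbs(4096.0, 1.0));
    printf("64-bit LPDDR4   @ 3.2 Gbps: %6.1f GB/s\n", peak_gbs(64.0, 3.2));
    return 0;
}
```

That is roughly 448 GB/s and 640 GB/s for the desktop configurations versus about 25.6 GB/s for a typical phone memory subsystem, which is the gap a scaled-up mobile design would have to be fed across.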
The only reason AMD Polaris has a jump in performance/watt is purely because of the jump from 28nm to 14nm FinFET. You can mix, match and optimize GCN only so much.
Then there is the biggest and hardest problem in town: drivers. GPU drivers have gotten so goddamn complex that even the GPU makers sometimes fail to understand how they got to where they are. Optimizations and shortcuts are placed everywhere, trying to squeeze every ounce of performance out of special paths or specific engines. It is the sole reason we got Metal, Mantle, Vulkan and even next-gen DirectX.
There is no reason why a top-end smartphone GPU architecture like the Series 7XT Plus here can't work in a laptop, given the same memory bandwidth allowance and die space. GPUs are hugely parallel units; you would need a specific design for 32, 64 or even 128 clusters, and they would perform on par with those from AMD and Nvidia.
tuxRoller - Thursday, January 7, 2016 - link
Do you have a link to the Verilog files for the Polaris GPU? Obviously you have access to them, since you're able to state, definitively, where all their efficiency gains are coming from.
djgandy - Thursday, January 7, 2016 - link
You can't just slap GDDR5 on a mobile chip and suddenly have desktop grade graphics. The entire chip architecture is designed around a low bandwidth low power use case. With desktop GPU memory bandwidth there would have to be an architecture overhaul.
BurntMyBacon - Thursday, January 7, 2016 - link
@djgandy: "You can't just slap GDDR5 on a mobile chip and suddenly have desktop grade graphics. The entire chip architecture is designed around a low bandwidth low power use case."
Not only that, but it is more challenging to keep the GPU busy as the chip gets larger. Significant die space has to go into making sure the correct signals arrive at the correct time on chips approaching 600mm2. Just distributing clocks across the chip is a challenging undertaking. All this extra circuitry takes power and reduces efficiency compared to a smaller chip (everything else being equal), but that drop in efficiency is necessary if performance is a priority. Some architectural decisions are made based on the effective routing delay. For instance, it makes more sense to put more logic between registers, increasing IPC but reducing the clock rate, when you have larger routing delays to cover up (as often found in larger chips). If all routing delays are small, it may be better to reduce the logic between registers, lowering IPC but allowing clock rates to ramp up or voltages to be dropped at the same clock rate.
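A toy calculation of that register-to-register timing trade-off (illustrative numbers only, not measured from any real chip): maximum clock is roughly 1 / (logic delay + routing delay + flop/skew overhead), so when routing delay dominates, packing more logic into each pipeline stage costs proportionally less clock speed.

```c
#include <stdio.h>

/* f_max in GHz from per-stage delays given in picoseconds. */
static double fmax_ghz(double logic_ps, double routing_ps, double overhead_ps) {
    return 1000.0 / (logic_ps + routing_ps + overhead_ps);
}

int main(void) {
    const double overhead = 50.0;   /* flop setup + clock-skew budget, ps */

    /* Small chip, short routes: doubling logic per stage costs ~38% of f_max. */
    printf("small chip, 1x logic: %.2f GHz\n", fmax_ghz(150.0,  50.0, overhead));
    printf("small chip, 2x logic: %.2f GHz\n", fmax_ghz(300.0,  50.0, overhead));

    /* Large chip, long routes: the same doubling costs only ~20% of f_max,
       so spending it on more work per cycle is the better trade. */
    printf("large chip, 1x logic: %.2f GHz\n", fmax_ghz(150.0, 400.0, overhead));
    printf("large chip, 2x logic: %.2f GHz\n", fmax_ghz(300.0, 400.0, overhead));
    return 0;
}
```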
lucam - Saturday, January 9, 2016 - link
I agree. Desktop-class GPUs are quite complex and require a significant investment, with the possibility of low profit or total failure. That's why IMG has focused their products on the mobile space.
I also think that the PowerVR architecture becomes meaningless when you apply GDDR memory or HBM: they just lose their bandwidth-efficiency advantage (due to the tile-based approach) compared to competitors that use high-bandwidth memory.
name99 - Saturday, January 30, 2016 - link
Of course, high-bandwidth lower-power RAM like HMC and HBM changes the equation... No one is talking about doing this in a GDDR world.
BurntMyBacon - Thursday, January 7, 2016 - link
@iwod: "GPUs today are limited by 3 things: memory bandwidth, process node and driver quality."
I'm going to add in size, power, and thermal constraints. Even if you are on a last-gen process node, you can achieve better performance if your application allows for a chip that is four times the size, 10 times the thermal envelope, and 100 times the power draw. I'm going to assume you considered this and just didn't type it out.
@iwod: "We have reached the point where no more features are required for AAA games, unlike the DirectX 7 - 10 era. Mobile OpenGL ES cherry-picks the features we have on desktop for performance/watt/quality reasons."
OpenGL ES is a subset of OpenGL. For simplicity, we will assume OpenGL is equal, but different to DirectX. DirectX is the baseline for most PC and XBox games. Your comment suggests that either nobody who makes AAA games uses the features that are not in common with OpenGL ES, or that use of these features is not beneficial. The first is quantifiably false. I'm not a game dev, but I'm skeptical that the difference in detail and graphical immersion between phones/tablets and console/PCs is entirely due to framerate capabilities.
@iwod: "The only reason AMD Polaris has a jump in performance/watt is purely because of the jump from 28nm to 14nm FinFET."
Clearly there is nothing to be gained from architectural tweaks. It all comes down to the process node. After all, past history shows that all 28nm processors had the same performance per watt. No improvement at all in the last three generations of GPUs. Certainly no difference between vendors to suggest that there are architectural improvements to be had that might improve efficiency. Naturally, phones and tablets are just as inefficient as laptop GPUs on the same node, which are themselves no better than their desktop GPU counterparts. WAIT WHAT?!?
babadivad - Wednesday, January 6, 2016 - link
You forgot the S/
domboy - Thursday, January 7, 2016 - link
A resurrection of the PowerVR Kyro series perhaps??
Alexvrb - Friday, January 8, 2016 - link
As a complete GPU, taking on Nvidia and AMD directly? I kind of doubt it. I wouldn't mind seeing them try, however. I remember my Kyro cards fondly.
But it would be neat to see them build a ray-tracing accelerator, if it was cheap enough and drew very little power. Unfortunately then you've got the chicken-and-egg problem... engine support vs userbase.
name99 - Saturday, January 30, 2016 - link
When the mythical ARM Mac arrives ...
(Maybe at 10nm we'll see an A11P [three cores, 3GHz] and an A11D [6 cores, 3.5GHz] for running Macs :-) One day it will happen. Why not in 2017 or 2018? :-) )
ToTTenTranz - Wednesday, January 6, 2016 - link
7th paragraph: "Though it should also be noted that Rouge has far fewer integer ALUs than FP ALUs"
"Rouge" would indeed be a far classier name for the architecture, but you probably meant "Rogue".
hyno111 - Wednesday, January 6, 2016 - link
I suddenly forgot which company is using high-end ImgTec GPUs on Android...
lucam - Wednesday, January 6, 2016 - link
Good question!!
extide - Wednesday, January 6, 2016 - link
I don't think anyone is. Intel uses ImgTec in the phone-targeted Atoms, but pretty much everyone else uses Mali, and then there is Qualcomm using their own Adreno.
extide - Wednesday, January 6, 2016 - link
I forgot to mention, the ImgTec designs that Intel uses in the phone Atoms are typically not large implementations either, which is kind of a shame.
lucam - Wednesday, January 6, 2016 - link
Totally agree! Frankly I would like to see more high-end solutions from IMG implemented in Android phones/tablets this year...
LiverpoolFC5903 - Thursday, January 7, 2016 - link
The G6430 used in the Atom Z3580 is plenty fast for anything out there in the Android universe. It matches the Adreno 330 in most metrics and even outperforms it.
The 330 is the 'gold standard' for smartphone GPUs and is still good enough to handle anything and everything.
BurntMyBacon - Thursday, January 7, 2016 - link
@LiverpoolFC5903: "The G6430 used in the Atom Z3580 is plenty fast for anything out there in the Android universe."
Doesn't mean I don't want faster. Frankly speaking, nothing I've played on Android has been all that impressive to me. Also, the Z3580 is not limited to just Android.
"The 330 is the 'gold standard' for smartphone GPUs and is still good enough to handle anything and everything. "
I could argue the merit of the statement, but I'll simply ask how long you want it to remain your "gold standard". Put a different way, how long do you want smartphone applications to be limited to the performance levels of the 330?
LiverpoolFC5903 - Friday, January 8, 2016 - link
That is because Android developers have to be cognizant of the fact that 90%-plus of Android devices are low or lower-mid range, with low-end processors/GPUs. In order to maximize sales/downloads, they have to develop for the lowest common denominator or face minuscule adoption levels.
I would love smartphone games to be as good as their console and PC cousins, but the fact of the matter is very few users (relatively speaking) have access to cutting-edge hardware. The Adreno 320 rev2 in the SD 600 came out about 3 years ago, but can run almost every game even now (about 85 GFLOPS).
lucam - Thursday, January 7, 2016 - link
Life is moving on... we can't stick with a G6430 forever (although it is still a good GPU); we would like to see some 7XT in the Android market this year :)
LiverpoolFC5903 - Friday, January 8, 2016 - link
I wonder what the reason is for the relatively low adoption of Imagination Tech GPUs among Android smartphone vendors. Difficulties in integration perhaps? Cost?
I think they are better options than the standard Mali XXX used in most non-Qualcomm chipsets. My wife's Zenfone 2 is a potent gaming machine, even though it has a GPU which is at least 2 years old.
lucam - Saturday, January 9, 2016 - link
My bet (and speculation) is that since Apple is one of IMG's major shareholders, it has priority on and exclusivity over the high-spec PowerVR GPUs, and IMG cannot license the same products to the Android market. Or at least not in the same year.
So, for example, if Apple incorporates Series 7XT this year in the iPhones, we will see the same in Android phones one or two years later.
owan - Thursday, January 7, 2016 - link
"Pragmatically speaking I’m not sure how much computer vision work that phone SoCs will actually be subjected to – it’s still a field looking for its killer apps"
That's a pretty narrow view to take on the subject. ARM SoCs are in millions of disparate devices, from phones to industrial controllers, and people designing IP blocks like this don't strictly design them for one application. Obviously the phone GPU is one application, but machine vision is huge in anything where automation is involved. One simply has to look at the developing field of autonomous cars to see why improvements to machine vision (and compute) are important.
name99 - Saturday, January 30, 2016 - link
"Pragmatically speaking I’m not sure how much computer vision work that phone SoCs will actually be subjected to – it’s still a field looking for its killer apps"
Part of the problem is a simple design issue --- and I'm curious to see who will be first to fix it.
Right now, using your phone for vision (or vision-like stuff, like AR) requires that you hold it vertically so that the camera can see, and this is neither comfortable nor easy to use. You can make it look cool in demos, but try it personally, and it sucks!
One answer would be to add a third camera at the very top of the phone (i.e. the strip at the top where the iPhone 5 used to have its power button), perhaps using a fisheye lens. This would allow you to use the phone with its face mostly parallel to the ground (i.e. standard usage orientation) while still looking in front of you.
Things like this are less sexy than designing the GPUs and algorithms for computer vision, but are just as essential for success.