NVIDIA has disclosed some architectural details of Project Denver, its first custom-designed CPU. Earlier this year, NVIDIA announced two versions of its Tegra K1 processor, both with the 192-core Kepler-based GPU: one with a 32-bit quad-core CPU plus a battery saver core, and one with the custom-designed, 64-bit, dual-core "Project Denver" CPU, which is fully compatible with the ARMv8 architecture. NVIDIA describes the 64-bit Tegra K1 as the world's first 64-bit ARM processor for Android and says it outpaces other ARM-based mobile processors. Here's what NVIDIA has disclosed:
"Denver is designed for the highest single-core CPU throughput, and also delivers industry-leading dual-core performance. Each of the two Denver cores implements a 7-way superscalar microarchitecture (up to 7 concurrent micro-ops can be executed per clock), and includes a 128KB 4-way L1 instruction cache, a 64KB 4-way L1 data cache, and a 2MB 16-way L2 cache, which services both cores.
Denver implements an innovative process called Dynamic Code Optimization, which optimizes frequently used software routines at runtime into dense, highly tuned microcode-equivalent routines. These are stored in a dedicated, 128MB main-memory-based optimization cache. After being read into the instruction cache, the optimized micro-ops are executed, re-fetched and executed from the instruction cache as long as needed and capacity allows.
Effectively, this reduces the need to re-optimize the software routines. Instead of using hardware to extract the instruction-level parallelism (ILP) inherent in the code, Denver extracts the ILP once via software techniques, and then executes those routines repeatedly, thus amortizing the cost of ILP extraction over the many execution instances.
As part of the Dynamic Code Optimization process, Denver looks across a window of hundreds of instructions and unrolls loops, renames registers, removes unused instructions, and reorders the code in various ways for optimal speed. This effectively doubles the performance of the base-level hardware through the conversion of ARM code to highly optimized microcode routines and increases the execution energy efficiency.
The slight overhead of the dynamic optimization process is outweighed by the performance gains of already having optimized code ready to execute. In cases where code may not be frequently reused, Denver can process those ARM instructions directly without going through the dynamic optimization process, delivering the best of both worlds!
Dynamic Code Optimization works with all standard ARM-based applications, requiring no customization from developers, and without added power consumption versus other ARM mobile processors. That's because the 7-wide superscalar design allows faster throughput than would otherwise be possible at the same clock speed.
Denver's remarkable design delivers great performance for both single- and multi-threaded applications, as well as multitasking scenarios. The dual-CPU cores can attain significantly higher performance than existing four- to eight-core mobile CPUs on most mobile workloads.
Denver also features new low latency power-state transitions, in addition to extensive power-gating and dynamic voltage and clock scaling based on workloads. Combining Dynamic Code Optimization, 7-way superscalar design and efficient power usage, Denver's performance will rival some mainstream PC-class CPUs at significantly reduced power consumption.
This means that future mobile devices using our 64-bit Tegra K1 chip can offer PC-class performance for standard apps, extended battery life and the best web browsing experience - all while opening new possibilities for gaming, content creation and enterprise apps.
Look forward later this year to some amazing mobile devices based on the 64-bit Tegra K1 from our partners. And for hard-core Android fans, take note that we're already developing the next version of Android - "L" - on the 64-bit Tegra K1."
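To make the amortization argument in NVIDIA's description a little more concrete, here is a toy Python model of the idea: routines run directly until they look "hot", are optimized once, and then run from an optimization cache at lower cost. This is only a sketch. The threshold, the cycle costs, and every name in it (`ToyCore`, `HOT_THRESHOLD`, `total_cycles`, and so on) are invented for illustration; NVIDIA has not published how Denver decides what to optimize or what the real costs are.

```python
from dataclasses import dataclass, field

# All numbers below are illustrative assumptions, not Denver figures.
HOT_THRESHOLD = 3     # direct executions before a routine counts as "hot"
DIRECT_COST = 10      # cycles to run a routine by decoding ARM code directly
OPTIMIZED_COST = 5    # cycles to run the already-optimized version
OPTIMIZE_COST = 40    # one-time cycles spent optimizing a hot routine


@dataclass
class ToyCore:
    """Hypothetical core that tracks hot routines and a software optimization cache."""
    exec_counts: dict = field(default_factory=dict)
    optimization_cache: set = field(default_factory=set)
    cycles: int = 0

    def run(self, routine: str) -> None:
        # Already optimized: pay only the cheaper execution cost.
        if routine in self.optimization_cache:
            self.cycles += OPTIMIZED_COST
            return

        # Otherwise execute the ARM code directly and count the execution.
        self.cycles += DIRECT_COST
        count = self.exec_counts.get(routine, 0) + 1
        self.exec_counts[routine] = count

        # Once the routine is hot, pay the optimization cost a single time.
        if count >= HOT_THRESHOLD:
            self.cycles += OPTIMIZE_COST
            self.optimization_cache.add(routine)


def total_cycles(trace) -> int:
    """Run a sequence of routine names on a fresh ToyCore and return the cycles spent."""
    core = ToyCore()
    for routine in trace:
        core.run(routine)
    return core.cycles
```

With these made-up numbers, calling `total_cycles(["loop"] * 20)` gives 155 cycles, against 200 if every execution were decoded directly, while a trace of three routines that each run once is never optimized and pays no overhead. The one-time cost only pays off for code that is reused often enough, which is the trade-off the quoted description points to.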