Raspberry Pi 2 significantly improves on original model
The Raspberry Pi 2 significantly increases performance when compared to the original Raspberry Pi. It corrects deficiencies in the design of the original SoC used inside the Raspberry Pi by integrating four more modern and faster Cortex-A7 ARMv7 CPU cores in a quad-core configuration, as opposed to the single ARM11 core in the original SoC, all within the constraints of a similar 40 nm manufacturing process. Whereas the CPU inside the BCM2835 processor of the original Raspberry Pi effectively ran without an L2 cache (which was tied to the GPU), the new Broadcom BCM2836 SoC contains a dedicated 512 KB CPU cache, improving memory performance and performance in general. The amount of RAM has also doubled to 1 GB. Other changes include more USB ports and a MicroSD card slot for storage instead of SD.
Compatibility with Raspbian
Apart from these changes, the new SoC and the device itself have been engineered to maintain hardware and software compatibility with the original Raspberry Pi, while running considerably faster. With Raspbian, an ARM11-compatible, Debian-based armhf distribution maintained specifically for the Raspberry Pi, only the kernel is specific to the Raspberry Pi 2; the entire userland is 100% compatible. Although this misses out on some of the advantages of the newer ARMv7 instruction set (such as the reduced code size of Thumb2 instructions, which are used in ARMv7 Debian), applications that can take advantage of, for example, NEON SIMD instructions usually detect them at run time (as they do in ARMv7 Debian), so the most critical gains from the new instruction set can in theory still be realized in Raspbian.
Nevertheless, the new device can run an OS specifically configured for ARMv7, such as Debian armhf and derived distributions such as Ubuntu, which take advantage of the reduced-size Thumb2 instruction set. An example of such a distribution that has been applied to the Raspberry Pi 2 is Ubuntu Snappy Core.
Components of Raspberry Pi 2 SoC clocked conservatively out of the box
The maximum CPU clock of the Cortex-A7 cores in the Raspberry Pi 2 is 900 MHz, while the L2 cache appears to be clocked at only 250 MHz by default, inheriting the clock rate of the original Pi's GPU cache. SDRAM is clocked at 450 MHz by default. The GPU is clocked at 250 MHz, similar to the original Raspberry Pi.
The configured speed of the L2 cache is particularly low, as we will see, since speeds up to 600 MHz seem to be stable when overclocking, resulting in a large performance increase. The CPU clock speed can also be bumped up somewhat.
The raspi-config utility in Raspbian at the time of writing contains just one overclocking option for the Raspberry Pi 2, which clocks the CPU at 1000 MHz, doubles L2 cache speed to 500 MHz and clocks SDRAM also at 500 MHz. Unfortunately, this setting turned out to be unstable on my device. This appears to be due to the SDRAM clock speed being set too high and causing problems. Bumping the SDRAM speed down to 483 MHz results in a stable system.
Overclocking test set-up
I have performed a number of overclocking tests with different clock configurations. The test set-up was as follows.
To prevent corruption of the root file system, I modified /etc/fstab to mount the root filesystem read-only at boot by adding "ro" to the mount flags. To remount with read-write capability when necessary after boot (on a stable system), I ran "sudo mount -o remount,rw /dev/mmcblk0p2 /".
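Before starting a risky test run it can be useful to confirm that the root filesystem really is mounted read-only. The following is a minimal sketch; the helper name root_mount_mode is hypothetical, and it assumes the usual /proc/mounts layout (device, mount point, type, options).

```sh
# root_mount_mode [FILE]: print "ro" or "rw" for the root filesystem.
# FILE defaults to /proc/mounts. The last "/" entry wins, because later
# mounts shadow earlier ones. (Hypothetical helper, not from the article.)
root_mount_mode() {
    awk '$2 == "/" {
             n = split($4, opts, ",")
             for (i = 1; i <= n; i++) {
                 if (opts[i] == "ro" || opts[i] == "rw") { mode = opts[i] }
             }
         }
         END { print mode }' "${1:-/proc/mounts}"
}
```

Calling root_mount_mode without arguments inspects the running system; passing a file name makes it easy to try on a saved copy of /proc/mounts.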
The main stability test was performed using the single-threaded memtester package (available in Raspbian and Debian) using the command line "memtester 16M 10" (16 MB memory region, 10 loops). In several cases four of these commands were run in parallel to fully occupy the CPU and provide reliable stability information. In unstable configurations, this test almost always shows errors.
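Running several memtester processes side by side can be scripted. The sketch below is one possible way to do it, not necessarily how the tests above were launched; run_parallel is a hypothetical helper, and it assumes that the command being run (memtester here) exits with a non-zero status when it detects an error.

```sh
# run_parallel N CMD [ARGS...]: start N copies of CMD in the background,
# wait for all of them, and return non-zero if any copy failed.
# (Hypothetical helper; the function body runs in a subshell.)
run_parallel() (
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    exit "$status"
)

# Example (needs memtester installed): four concurrent 16 MB, 10-loop runs.
# run_parallel 4 memtester 16M 10
```

The output of the four processes will be interleaved on the terminal, so redirecting each run to its own log file may be preferable for longer tests.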
Memory performance was tested using a slightly modified version of the fastarm package (https://www.github.com/hglm/fastarm) with the command line "for x in 0 1 2 3 4 5 6 7 8 9; do ./benchmark --duration 1 --repeat 1 --memcpy e --test 0; done". Because of result variation due to cache allocation effects, I took the best result out of ten. Tests number 0 (memcpy of varying size, aligned, depends on CPU as well as memory) and 43 (4K page-aligned memcpy, a more pure memory subsystem test) were used.
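Picking the best of ten runs is easy to automate once the throughput figures have been collected. The output format of the benchmark is not reproduced here, so this sketch simply assumes one numeric result per line on standard input; best_of is a hypothetical helper name.

```sh
# best_of: read one numeric result per line on stdin and print the largest.
# Blank lines are skipped. (Hypothetical helper; input format is assumed.)
best_of() {
    awk 'NF { v = $1 + 0; if (!seen || v > max) { max = v; seen = 1 } }
         END { if (seen) print max }'
}
```

For example, feeding it the lines 716, 778 and 750 prints 778.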
For a real-world CPU performance indication I used the command line "time zcat bullet3-Bullet-2.83-alpha.tar.gz >/dev/null" performed multiple times, which is effectively gzip decompression of a large file out of buffer cache memory.
Table with stability testing results
The following table shows stability testing results for a large number of CPU clock, core clock (L2 cache clock), and SDRAM clock configurations. Also included are some benchmark scores, including memory performance and CPU performance.
The stable configurations show "OK (multi-test)" in the stability column, meaning they were stable during a test with multiple memtester processes running concurrently. Most unstable configurations have an SDRAM clock speed of 500 MHz or higher, or a CPU speed higher than 1100 MHz.
CPU   +Volt  Core  SDRAM  +Volt    Stability         Memcpy perf.
                          p  i  c  (memtester)       Varied  4K    zcat

Default:
900   ?      250   450    0  0  0  OK (slow)         716     1015  2.388s

Standard overclock (raspi-config "Pi 2" option):
1000  2      500   500    0  0  0  Fail

Other settings:
900   0      450   450    0  0  0  OK                778     1270  2.380s
900   0      600   467    0  0  0  Almost            804     1431  2.379s
900   2      600   467    0  0  0  OK (multi-test)
1000  0      467   467    0  0  0  OK (multi-test)   867     1410  2.146s
1000  0      500   483    0  0  0  OK (multi-test)   880     1502  2.146s
1000  0      500   483    2  0  0  OK (multi-test)   878     1502  2.169s
1000  2      500   500    0  0  0  Almost
1000  4      500   500    0  0  0  Almost
1000  0      500   500    2  2  0  Almost
1000  0      500   500    4  4  0  Almost?
1000  0      500   500    4  0  0  Fail              886     1415  2.143s
1000  2      500   500    4  0  0  Fail
1000  4      500   500    4  4  0  Fail (multi)
1000  0      500   500    6  6  6  ?
1000  2      600   467    0  0  0  OK (multi-test)   885     1518  2.145s
1000  2      600   500    4  0  0  OK (multi-test)   890     1553  2.142s
1000  2      667   500    4  0  0  Fail (freeze)
1000  6      667   500    6  0  0  Fail (freeze)
1050  0      466   466    4  4  4  OK
1050  0      466   533    4  4  4  Fail
1050  0      466   533    6  6  6  Fail (bitspr.)
1050  4      600   450    0  0  0  OK (multi-test)   916     1528  2.045s
1050  4      600   483    2  0  0  OK (multi-test)   924     1571  2.041s
1067  6      533   533    6  6  6  Fail
1067  4      533   533    8  8  0  Fail (bitflip)
1067  6      533   533    8  8  0  Fail (bitflip)
1067  6      533   500    4  4  0  Almost
1067  4      533   466    0  0  0  OK (multi-test)   925     1521  2.010s
1100  0      466   466    0  0  0  Fail (boot)
1100  4      466   466    0  0  0  OK?
1100  4      600   467    0  0  0  Fail
1100  4      500   500    6  6  6  OK?
1100  4      500   500    6  6  0  OK?
1100  4      500   500    4  0  0  Almost
1100  4      500   500    6  0  0  OK?               950     1532  1.950s
1100  6      500   500    6  0  0  Almost
1100  4      533   533    6  0  4  Fail              962     1593  1.948s
1100  4      550   483    0  0  0  OK (multi-test)   944     1549  1.951s
1133  4      567   466    0  0  0  Almost            974     1578  1.893s
1133  4      567   467    4  0  0  Almost
1133  5      567   453    0  0  0  Almost            971     1571  1.896s
1133  8      567   453    0  0  0  Fail
1166  4      466   466    0  0  0  Almost            960     1451  1.841s
1167  4      466   466    2  2  4  Fail
1166  6      466   466    0  0  0  Fail              962     1451  1.841s
1167  8      500   500    4  0  0  Fail                            1.839s
1167  8      500   500    8  8  8  Fail
1200  8      600   450    4  0  0  Fail
CPU frequency corresponds with the "arm_freq=" setting in /boot/config.txt. The CPU/main SoC voltage is set with the over_voltage setting. The core clock (the L2 cache speed on the Raspberry Pi 2) is set with core_freq. The SDRAM frequency is set with sdram_freq, while voltage settings for the SDRAM physical layer, I/O and controller are set using over_voltage_sdram_p, over_voltage_sdram_i and over_voltage_sdram_c, of which the physical layer voltage seems to be the most relevant to overclocking. An example of the relevant lines in /boot/config.txt for a particular overclocking configuration (1000 MHz CPU, with stable 483 MHz SDRAM, as well as 256 MB memory reserved for GPU) follows.
arm_freq=1000
over_voltage=0
core_freq=500
sdram_freq=483
over_voltage_sdram_p=0
over_voltage_sdram_i=0
over_voltage_sdram_c=0
gpu_mem=256

See the official documentation for more details.
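When switching between many configurations it helps to list which clock and voltage settings are currently active in the file. A minimal sketch follows; overclock_settings is a hypothetical helper, and it only looks at the option names mentioned above.

```sh
# overclock_settings [FILE]: print the active (uncommented) clock and
# voltage lines from a config.txt-style file, default /boot/config.txt.
# (Hypothetical helper; only matches the options discussed in the text.)
overclock_settings() {
    grep -E '^(arm_freq|core_freq|sdram_freq|over_voltage(_sdram(_[pic])?)?)=' \
        "${1:-/boot/config.txt}"
}
```

After a reboot, the firmware tool vcgencmd (for example "vcgencmd measure_clock arm") can be used to check the clock that is actually running, which may be lower than the configured maximum when the CPU is idle.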
Observations based on stability testing
The following is apparent from testing my device:
- The core_freq setting seems to be directly correlated with the L2 CPU cache in the new SoC, which has a large effect on performance. Depending on other frequencies, core_freq frequencies up to 600 MHz seem to be stable, giving a significant performance boost over the default configuration of 250 MHz.
- When increasing CPU speed beyond roughly 1000 MHz, the CPU core voltage has to be bumped up.
- Increasing SDRAM speed beyond about 483 MHz seems to cause instability on my device. Bumping up the SDRAM voltage (in particular the physical layer voltage, but not the I/O voltage or SDRAM controller voltage) may help a little for potential stability. However, SDRAM speeds of 500 MHz and higher tend to cause stability problems regardless of voltages on my device.
- Certain divisor relationships between CPU clock and core (L2 cache) clock (such as 2:1) seem to enhance stability and performance.
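To put the benchmark columns in perspective, the relative change between two configurations can be computed directly. The helper below is a small sketch (pct_gain is a hypothetical name); the figures in the usage note are taken from the stability table.

```sh
# pct_gain BASE NEW: percentage change from BASE to NEW, one decimal place.
# (Hypothetical helper; uses awk for floating-point arithmetic.)
pct_gain() {
    awk -v base="$1" -v val="$2" \
        'BEGIN { printf "%.1f\n", (val - base) / base * 100 }'
}
```

For instance, "pct_gain 1015 1502" compares the 4K memcpy score of the default configuration with that of the 1000/500/483 MHz configuration and prints 48.0, while "pct_gain 2.388 2.146" prints -10.1 for the corresponding zcat run time.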
CPU overclocking conclusions
- The default Raspberry Pi 2 core_freq (L2 CPU cache) setting of 250 MHz appears to be extremely conservative. At the default CPU frequency of 900 MHz, 450 MHz (which has a nice divisor of two) appears to be very stable and even 600 MHz can be stable.
- Unfortunately, the standard Raspberry Pi 2 overclocking setting available in raspi-config at the time of writing (1000 MHz CPU, 500 MHz core clock, 500 MHz SDRAM) appears to be unstable on my device due to an SDRAM clock speed that is slightly too high. Instead of bumping the CPU voltage as performed by this setting, increasing the SDRAM voltage (primarily the physical layer voltage) may improve stability, but clocking the SDRAM slightly lower at 483 or 467 MHz seems to be the best solution.
- It seems likely that certain SDRAM parameters (CAS delay, etc) are set to fixed values by the kernel and that higher SDRAM speeds will be possible when these parameters are configurable or appropriately adjusted by the kernel for higher SDRAM clock speeds. However, the actual RAM chip used is an Elpida/Micron EDB8132B4PB-8D-F LPDDR2-800 chip specified for 400 MHz clock frequency, so the overclocking headroom may not be that high.
Table with stable high-performance clock configurations
The following table shows stable high-performance clock configurations tested on my device and their clock frequency ratios:
CPU    Over-  Core   Base                 SDRAM  SDRAM
clock  volt   clock  clock  CPU : Core    clock  overvolt
1067   +4     533    533    2 : 1         467
1050   +4     600    150    7 : 4         483    +2
1000   +2     600    100    5 : 3         500    +4
1000          500    500    2 : 1         483    +2
900    +2     600    133    3 : 2         467
900           450    450    2 : 1         450

However, I may have to retest the configuration with an SDRAM frequency of 500 MHz because other configurations show such a setting to be unstable after extensive testing. Additionally, the 1100 MHz CPU frequency setting turned out not to be completely stable.
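The CPU : core ratio of a candidate configuration can be worked out with a few lines of shell. This sketch assumes that the "base clock" is the greatest common divisor of the two frequencies, which is an interpretation rather than something stated above; clock_ratio is a hypothetical helper name. It only reduces cleanly for exact integer values, so rounded settings such as 1067 and 533 MHz will not produce 2:1.

```sh
# clock_ratio CPU CORE: print the reduced CPU:core ratio and the common
# base clock (greatest common divisor), all in MHz.
# (Hypothetical helper; "base clock" as GCD is an assumption.)
clock_ratio() (
    a=$1
    b=$2
    # Euclid's algorithm: after the loop, a holds the GCD.
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$(($1 / a)):$(($2 / a)) base $a"
)
```

For example, "clock_ratio 1050 600" prints "7:4 base 150" and "clock_ratio 1000 500" prints "2:1 base 500".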
Overclocking the GPU
By default, the Raspberry Pi as well as the Raspberry Pi 2 will use dynamic clocking, whereby the CPU speed, "core_freq" speed and SDRAM frequency are dynamically adjusted based on CPU load. Any GPU frequency settings, as governed by the "v3d_freq", "h264_freq" and "isp_freq" settings in config.txt, are ignored by default.
Using "force_turbo=1" allows overclocking of the GPU using the "v3d_freq", "h264_freq" and "isp_freq" options. "v3d_freq" corresponds to the frequency of the 3D block (the most relevant for overclocking), while "h264_freq" is the H.264 video block and "isp_freq" governs the camera interface. However, "force_turbo=1" also disables dynamic clocking, locking the CPU, core and SDRAM speeds to fixed maximum values, which is highly undesirable. Also note that using "force_turbo=1" may void the warranty of the device.
There is another setting, "avoid_pwm_pll=1", that allows "core_freq" to be set independently from that of the GPU on the original Raspberry Pi, at the cost of slightly reducing analog audio output quality. However, "force_turbo=1" is still required to be able to modify the GPU clock frequencies.
Because the Raspberry Pi 2 has an independent GPU with its own L2 cache separate from the L2 cache of the CPU, some of these limitations may have become unnecessary (in particular the requirement that the CPU is locked at a high speed with "force_turbo=1" in order to be able to overclock the GPU), and if that is the case these restrictions will hopefully be removed in the future.
When running 3D benchmarks, the following CPU and SDRAM settings were used (note that when using "force_turbo=1" to overclock the GPU, these frequencies are locked and do not scale down when the CPU is idle):

arm_freq=900
over_voltage=0
core_freq=450
sdram_freq=483

When running 3D GPU benchmarks without overclocking the GPU (force_turbo=0), the CPU and L2 cache frequencies appear to be scaled down quickly because the CPU load is relatively low. This creates a CPU bottleneck that reduces the throughput of the 3D benchmarks: the frame rate peaks initially and then drops to a lower base. To avoid this, we modify the sampling_down_factor of the ondemand cpufreq governor from 50 to 1000:

sudo sh -c "echo 1000 >/sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor"

The following settings overclock the 3D block (V3D) of the GPU from 250 MHz to 300 MHz:

force_turbo=1
avoid_pwm_pll=1
v3d_freq=300

These are the results of benchmark testing with different V3D clock speeds:

v3d_freq  demo1   demo1    demo2   demo2    demo2    demo5    demo9  game
                  lights           lights   shadows
default   81.1    20.5     26.1    8.87     0.98     50.5     46.4   112
300       95.3             28.4    9.88     1.12     56.7     49.3   130
350       109*    27.4     29.9    10.9     1.24     62.3     51.6   148
400       120*    30.6     31.4    11.7     1.35     40-52*   53.5   108*
450       80*     33.7     20.2*   12.3     1.45     40-56*   55.0   111*

Although the clock frequency of either the CPU or the 3D block seemed to be scaled down in some cases at higher V3D speeds (presumably due to temperature measurements or voltage readings resulting in throttling), there were never any signs of stability issues when overclocking the GPU, up to the maximum tested speed of 450 MHz.
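The sampling_down_factor change described above is written to sysfs, so it is lost on reboot and has to be reapplied before each benchmarking session. A small sketch of a helper that applies the value and reports what it replaced follows; set_down_factor is a hypothetical name, and the optional directory argument exists only so the function can be tried against a scratch directory.

```sh
# set_down_factor VALUE [DIR]: write VALUE to the ondemand governor's
# sampling_down_factor and print the old and new values.
# DIR defaults to the sysfs path used in the text; writing there needs root.
# (Hypothetical helper, not part of any package.)
set_down_factor() (
    file="${2:-/sys/devices/system/cpu/cpufreq/ondemand}/sampling_down_factor"
    old=$(cat "$file") || exit 1
    echo "$1" > "$file" || exit 1
    echo "sampling_down_factor: $old -> $(cat "$file")"
)
```

Run as root, "set_down_factor 1000" applies the value used for the benchmarks, and "set_down_factor 50" restores the previous value mentioned in the text.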
Regular dynamic downclocking of the CPU can occur due to USB power supply/cable issue
Initially, downclocking by the Raspberry Pi 2 kernel's under-voltage monitor seemed to be triggered a lot more frequently than it is on the original Raspberry Pi. This results in a rainbow-colored icon being displayed in the top-right corner of the screen. This even happens briefly during boot. On such occasions, presumably the CPU and other components are downclocked in order to ensure stability.
The rainbow-colored square suggests a power supply issue since it indicates a voltage that is too low. As it turns out, replacing the USB power cable I was using with a shorter one that is better insulated eliminates the under-voltage warnings, with the same 5V/2A power supply.
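Besides watching for the on-screen icon, more recent Raspberry Pi firmware than the version used for these tests can report under-voltage events through "vcgencmd get_throttled", which prints a bitmask such as "throttled=0x50005". The sketch below decodes the two under-voltage bits; undervoltage_status is a hypothetical helper, and the availability of the command on a given firmware is an assumption.

```sh
# undervoltage_status VALUE: interpret the bitmask printed by
# "vcgencmd get_throttled" (for example "throttled=0x50005").
# Bit 0: under-voltage right now; bit 16: under-voltage since boot.
# (Hypothetical helper; requires firmware that provides get_throttled.)
undervoltage_status() (
    v=${1#throttled=}
    now=$(( ($v & 0x1) != 0 ))
    past=$(( ($v & 0x10000) != 0 ))
    echo "under-voltage now: $now, since boot: $past"
)

# Example on a device with suitable firmware:
# undervoltage_status "$(vcgencmd get_throttled)"
```

A result of "since boot: 1" after swapping cables would suggest that the supply voltage still dips under load.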
Updated 1 March 2015 (update explanation for CPU speed throttling).
Updated 25 March 2015 (update with USB power cable findings).