| FFmpeg & evaluating performance on the PowerPC Architecture HOWTO |
| |
| (c) 2003-2004 Romain Dolbeau <romain@dolbeau.org> |
| |
| |
| |
| I - Introduction |
| |
| The PowerPC architecture and its SIMD extension AltiVec offer some |
| interesting tools to evaluate performance and improve the code. |
| This document tries to explain how to use those tools with FFmpeg. |
| |
| The architecture itself offers two ways to evaluate the performance of |
| a given piece of code: |
| |
| 1) The Time Base Registers (TBL) |
| 2) The Performance Monitor Counter Registers (PMC) |
| |
| The first ones are always available, always active, but they're not very |
| accurate: the registers increment by one every four *bus* cycles. On |
| my 667 Mhz tiBook (ppc7450), this means once every twenty *processor* |
| cycles. So we won't use that. |
| |
| The PMC are much more useful: not only can they report cycle-accurate |
| timing, but they can also be used to monitor many other parameters, |
| such as the number of AltiVec stalls for every kind of instruction, |
| or instruction cache misses. The downside is that not all processors |
| support the PMC (all G3, all G4 and the 970 do support them), and |
| they're inactive by default - you need to activate them with a |
| dedicated tool. Also, the number of available PMC depends on the |
| procesor: the various 604 have 2, the various 75x (aka. G3) have 4, |
| and the various 74xx (aka G4) have 6. |
| |
| *WARNING*: The PowerPC 970 is not very well documented, and its PMC |
| registers are 64 bits wide. To properly notify the code, you *must* |
| tune for the 970 (using --tune=970), or the code will assume 32 bit |
| registers. |
| |
| |
| II - Enabling FFmpeg PowerPC performance support |
| |
| This needs to be done by hand. First, you need to configure FFmpeg as |
| usual, but add the "--powerpc-perf-enable" option. For instance: |
| |
| ##### |
| ./configure --prefix=/usr/local/ffmpeg-svn --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable |
| ##### |
| |
| This will configure FFmpeg to install inside /usr/local/ffmpeg-svn, |
| compiling with gcc-3.3 (you should try to use this one or a newer |
| gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of |
| thumb, those at 550Mhz and more). It will also enable the PMC. |
| |
| You may also edit the file "config.h" to enable the following line: |
| |
| ##### |
| // #define ALTIVEC_USE_REFERENCE_C_CODE 1 |
| ##### |
| |
| If you enable this line, then the code will not make use of AltiVec, |
| but will use the reference C code instead. This is useful to compare |
| performance between two versions of the code. |
| |
| Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h": |
| |
| ##### |
| #define POWERPC_NUM_PMC_ENABLED 4 |
| ##### |
| |
| If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more |
| PMC than available on your CPU! |
| |
| Then, simply compile FFmpeg as usual (make && make install). |
| |
| |
| |
| III - Using FFmpeg PowerPC performance support |
| |
| This FFmeg can be used exactly as usual. But before exiting, FFmpeg |
| will dump a per-function report that looks like this: |
| |
| ##### |
| PowerPC performance report |
| Values are from the PMC registers, and represent whatever the |
| registers are set to record. |
| Function "gmc1_altivec" (pmc1): |
| min: 231 |
| max: 1339867 |
| avg: 558.25 (255302) |
| Function "gmc1_altivec" (pmc2): |
| min: 93 |
| max: 2164 |
| avg: 267.31 (255302) |
| Function "gmc1_altivec" (pmc3): |
| min: 72 |
| max: 1987 |
| avg: 276.20 (255302) |
| (...) |
| ##### |
| |
| In this example, PMC1 was set to record CPU cycles, PMC2 was set to |
| record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec |
| Issue Stalls. |
| |
| The function "gmc1_altivec" was monitored 255302 times, and the |
| minimum execution time was 231 processor cycles. The max and average |
| aren't much use, as it's very likely the OS interrupted execution for |
| reasons of its own :-( |
| |
| With the exact same settings and source file, but using the reference C |
| code we get: |
| |
| ##### |
| PowerPC performance report |
| Values are from the PMC registers, and represent whatever the |
| registers are set to record. |
| Function "gmc1_altivec" (pmc1): |
| min: 592 |
| max: 2532235 |
| avg: 962.88 (255302) |
| Function "gmc1_altivec" (pmc2): |
| min: 0 |
| max: 33 |
| avg: 0.00 (255302) |
| Function "gmc1_altivec" (pmc3): |
| min: 0 |
| max: 350 |
| avg: 0.03 (255302) |
| (...) |
| ##### |
| |
| 592 cycles, so the fastest AltiVec execution is about 2.5x faster than |
| the fastest C execution in this example. It's not perfect but it's not |
| bad (well I wrote this function so I can't say otherwise :-). |
| |
| Once you have that kind of report, you can try to improve things by |
| finding what goes wrong and fixing it; in the example above, one |
| should try to diminish the number of AltiVec stalls, as this *may* |
| improve performance. |
| |
| |
| |
| IV) Enabling the PMC in Mac OS X |
| |
| This is easy. Use "Monster" and "monster". Those tools come from |
| Apple's CHUD package, and can be found hidden in the developer web |
| site & FTP site. "MONster" is the graphical application, use it to |
| generate a config file specifying what each register should |
| monitor. Then use the command-line application "monster" to use that |
| config file, and enjoy the results. |
| |
| Note that "MONster" can be used for many other things, but it's |
| documented by Apple, it's not my subject. |
| |
| If you are using CHUD 4.4.2 or later, you'll notice that MONster is |
| no longer available. It's been superseeded by Shark, where |
| configuration of PMCs is available as a plugin. |
| |
| |
| |
| V) Enabling the PMC on Linux |
| |
| On linux you may use oprofile from http://oprofile.sf.net, depending on the |
| version and the cpu you may need to apply a patch[1] to access a set of the |
| possibile counters from the userspace application. You can always define them |
| using the kernel interface /dev/oprofile/* . |
| |
| [1] http://dev.gentoo.org/~lu_zero/development/oprofile-g4-20060423.patch |
| |
| -- |
| Romain Dolbeau <romain@dolbeau.org> |
| Luca Barbato <lu_zero@gentoo.org> |