Akaros Profiling
===========================

Contents:

 (*) Perf
     - Setup
     - Basic Example
     - More Complicated Examples
     - Differences From Linux

 (*) mpstat


===========================
PERF
===========================
Akaros has limited support for perf_events.  perf is a tool which utilizes CPU
performance counters for performance monitoring and troubleshooting.

Akaros has its own version of perf, similar in spirit to Linux's perf, that
produces PERFILE2 ABI compliant perf.data files (if not, file a bug!).  The
kernel generates traces, under the direction of perf.  You then copy the traces
to a Linux host and process them using Linux's perf.


SETUP
--------------------
To build Akaros's perf directly:

(linux) $ cd tools/dev-libs/elfutils ; make install; cd -
(linux) $ cd tools/dev-util/perf ; make install; cd -

Or to build it along with all apps:

(linux) $ make apps-install

You will also need a suitably recent Linux perf to report the data (something
that understands the PERFILE2 format).  Unpatched Linux 4.5 perf did the
trick.  You'll also want libelf and maybe other libraries on your Linux
machine.

First, install libelf according to your distro.  On Ubuntu:

(linux) $ sudo apt-get install libelf-dev

Then try to just install perf using your Linux distro, and install any needed
dependencies.  On Ubuntu, you can install linux-tools-common and whatever else
it asks for (something particular to your host kernel).

Linux perf changes a lot.  Newer versions are usually nicer.  I recommend
building one of them.  Download the Linux source, then:

(linux) $ cd tools/perf/
(linux) $ make

Then use your new perf binary.  All of this is just installing a recent perf -
it has little to do with Akaros at this point.  If you run into
incompatibilities between our perf.data format and the latest Linux perf, file
a bug.


BASIC EXAMPLE
--------------------
Perf on Akaros supports record, stat, and a few custom options.

You should be able to do the following:

/ $ perf record ls

Then scp perf.data to your Linux machine and run the report:

(linux) $ scp AKAROS_MACHINE:perf.data .
(linux) $ perf report --kallsyms=obj/kern/ksyms.map --symfs=kern/kfs/

By default, perf looks on your host machine for the kernel symbol table and for
binaries.  We pass --kallsyms and --symfs to override those settings.

It can be a hassle to type out the kallsyms and symfs, so we have a script that
will automate that.  Use scripts/perf in any place that you'd normally use
perf.  Set your $AKAROS_ROOT (default is ".") and optionally override $PERF_CMD
(default is "perf").  For most people, this will just be:

(linux) $ ./scripts/perf report

The perf.data file is implied, so the above command is equivalent to:

(linux) $ ./scripts/perf report -i perf.data
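
As a rough illustration of what such a wrapper has to do, the sketch below
assembles the full command line from $AKAROS_ROOT and $PERF_CMD.  This is not
the contents of scripts/perf: the function name akaros_perf_cmd is made up for
this example, it only prints the command instead of running it, and appending
the overrides only makes sense for subcommands such as report that accept them.

```shell
# Hypothetical helper, not the real scripts/perf: print the perf command
# line with the Akaros symbol overrides appended.
akaros_perf_cmd() {
    root="${AKAROS_ROOT:-.}"    # same default as described above
    cmd="${PERF_CMD:-perf}"     # same default as described above
    echo "$cmd $* --kallsyms=$root/obj/kern/ksyms.map --symfs=$root/kern/kfs/"
}

akaros_perf_cmd report -i perf.data
```

With both variables unset, the paths come out relative to the current
directory, which is why the wrapper is normally run from the top of the tree.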


MORE COMPLICATED EXAMPLES
--------------------
First, try perf --help for usage.  Then check out
https://perf.wiki.kernel.org/index.php/Tutorial.  We strive to be mostly
compatible with the usage of Linux perf.

perf stat runs a command and reports the count of events during the run of the
command.  perf record runs a command and outputs perf.data, which contains
backtrace samples from when the event counters overflowed.  For those familiar
with other perfmon systems, perf stat is like PAPI and perf record is like
Oprofile.

perf record and stat both track a set of events with the -e flag.  -e takes a
comma-separated list of events.  Events can be expressed in one of three forms:

- Generic events (called "pre-defined" events on Linux)
- Libpfm events
- Raw events

Linux's perf only takes Generic and Raw events, so libpfm4 support is an added
bonus.

Generic events consist of strings like "cycles" or "cache-misses".  Raw events
are simple strings of the form "rXXX", where the X's are hex nibbles.  The hex
codes are passed directly to the PMU.  You can actually have 2-4 Xs on Akaros.

Libpfm events are strings that correspond to events specific to your machine.
Libpfm knows about PMU events for a given machine.  It figures out what machine
perf is running on and selects events that should be available.  Check out
http://perfmon2.sourceforge.net/ for more info.

To see the list of events available, use `perf list [regex]`, supplying an
optional search regex.  For example, on a Haswell:

/ $ perf list unhalted_reference_cycles
#-----------------------------
IDX      : 37748738
PMU name : ix86arch (Intel X86 architectural PMU)
Name     : UNHALTED_REFERENCE_CYCLES
Equiv    : None
Flags    : None
Desc     : count reference clock cycles while the clock signal on the specific core is running. The reference clock operates at a fixed frequency, irrespective of core frequency changes due to performance state transitions
Code     : 0x13c
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
#-----------------------------
IDX      : 322961409
PMU name : hsw_ep (Intel Haswell EP)
Name     : UNHALTED_REFERENCE_CYCLES
Equiv    : None
Flags    : None
Desc     : Unhalted reference cycles
Code     : 0x300
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x05 : PMU : [t] : measure any thread (boolean)

There are two different events for UNHALTED_REFERENCE_CYCLES (case
insensitive).  libpfm will select the most appropriate one.  You can override
this selection by specifying a PMU:

/ $ perf stat -e ix86arch::UNHALTED_REFERENCE_CYCLES ls

Here's how to specify multiple events:

/ $ perf record -e cycles,instructions ls

Events also take a set of modifiers.  For instance, you can specify running
counters only in kernel mode or user mode.  Modifiers are separated by a ':'.

This will track only user cycles (default is user and kernel):

/ $ perf record -e cycles:u ls

To use a raw event, you need to know the event number.  You can either look in
your favorite copy of the SDM, or you can ask libpfm.  Though if you ask
libpfm, you might as well just use its string processing.  For example:

/ $ perf list FLUSH
#-----------------------------
IDX      : 322961462
PMU name : hsw_ep (Intel Haswell EP)
Name     : TLB_FLUSH
Equiv    : None
Flags    : None
Desc     : TLB flushes
Code     : 0xbd
Umask-00 : 0x01 : PMU : [DTLB_THREAD] : None : Count number of DTLB flushes of thread-specific entries
Umask-01 : 0x20 : PMU : [STLB_ANY] : None : Count number of any STLB flushes
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)

The raw code is 0xbd.  So the following are equivalent (but slightly buggy!):

/ $ perf stat -e TLB_FLUSH ls
/ $ perf stat -e rbd ls

If you actually run those, rbd will have zero hits, and TLB_FLUSH will give you
the error "Failed to parse event string TLB_FLUSH".

Some events are rather particular about their Umasks, and TLB_FLUSH is one of
them.  TLB_FLUSH wants a Umask.  Umasks are selectors for specific sub-types of
events.  In the case of TLB_FLUSH, we can choose between DTLB_THREAD and
STLB_ANY.  Umasks are not always required - they just happen to be on my
Haswell for TLB_FLUSH.  That being said, we can ask for the event like so:

/ $ perf stat -e TLB_FLUSH:STLB_ANY ls
/ $ perf stat -e r20bd ls

Note that the Umask is placed before the Code.  These 16 bits are passed
directly to the PMU, and on Intel the format is "umask:event".
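
The layout described above can be checked with a little shell arithmetic.  The
helper below (raw_event is a name made up for this example) shifts the Umask
into the upper byte, ORs in the event code, and prints the resulting raw event
string.  It only builds the string; it does not check that the PMU knows the
event.  It should run in any POSIX shell.

```shell
# Hypothetical helper: build a raw event string from a Umask ($1) and an
# event code ($2), with the Umask in the upper byte ("umask:event").
raw_event() {
    printf 'r%x\n' "$(( ($1 << 8) | $2 ))"
}

raw_event 0x20 0xbd    # TLB_FLUSH:STLB_ANY
raw_event 0x01 0xbd    # TLB_FLUSH:DTLB_THREAD
```

The first call prints r20bd, matching the example above; the second prints
r1bd.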

perf record is based on recording samples when event counters overflow.  The
number of events required to trigger a sample is referred to as the
sample_period.  You can set it with -c, e.g.

/ $ perf record -c 10000 ls


DIFFERENCES FROM LINUX
--------------------
For the most part, Akaros perf is similar to Linux.  A few things are
different.

The biggest difference is that our perf does not follow processes around.  We
count events for cores, not processes.  You can specify certain cores, but not
certain processes.  Any options related to tracking specific processes are
unsupported.

The -F option (frequency) is loosely supported.  The kernel cannot adjust the
sampling count dynamically to meet a certain frequency.  Instead, we guess
that -F is used with cycles, and pick a sample period that will generate
samples at the desired frequency if the core is unhalted.  YMMV.
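
To get a feel for the numbers involved, here is a back-of-the-envelope
version of that guess.  The formula is an assumption for illustration, not
taken from the Akaros source: with the cycles event on a core that never
halts, the sample period is roughly the core clock divided by the requested
frequency.  The function name and the 3 GHz clock are made up.

```shell
# Assumed approximation (not taken from the Akaros source): for the cycles
# event on an unhalted core, period = cycles per second / samples per second.
period_for_freq() {
    # $1: core clock in Hz, $2: desired samples per second
    echo "$(( $1 / $2 ))"
}

period_for_freq 3000000000 1000    # 3 GHz core, -F 1000
```

This prints 3000000, so on such a machine -F 1000 would behave roughly like
-c 3000000.  A core that halts part of the time produces proportionally fewer
samples.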

Akaros currently supports only PMU events.  In the future, we may add events
like context-switches.


===========================
mpstat
===========================
Akaros has basic support for mpstat.  mpstat gives a high-level glance at where
each core is spending its time.

For starters, bind kprof somewhere.  The basic ifconfig script binds it to
/prof.

To see the CPU usage, cat mpstat:

/ $ cat /prof/mpstat
 CPU:             irq             kern              user                 idle
   0: 1.707136 (  0%), 24.978659 (  0%), 0.162845 (  0%), 13856.233909 ( 99%)

To reset the count:

/ $ echo reset > /prof/mpstat

To see the output for a particular command:

/ $ echo reset > /prof/mpstat ; COMMAND ; cat /prof/mpstat
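
If you only care about busy versus idle time, the per-core lines can be
reduced with a bit of awk.  This is a sketch that assumes the output format
shown above (four time columns, each followed by a parenthesized percentage)
and that awk is available wherever you run it; mpstat_busy is a made-up name.

```shell
# Hypothetical helper: read mpstat output on stdin and print, per core, the
# sum of the irq, kern, and user columns next to the idle column.
mpstat_busy() {
    awk '/^[ \t]*[0-9]+:/ {
        gsub(/\([^)]*\),?/, "")    # drop the parenthesized percentages
        printf "%s busy %.6f idle %.6f\n", $1, $2 + $3 + $4, $5
    }'
}

# With kprof bound at /prof this would be: cat /prof/mpstat | mpstat_busy
echo '   0: 1.707136 (  0%), 24.978659 (  0%), 0.162845 (  0%), 13856.233909 ( 99%)' | mpstat_busy
```

For the sample line above this prints "0: busy 26.848640 idle 13856.233909".
The header line is skipped because it does not start with a core number.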