Implemented the new profiler
author Davide Libenzi <dlibenzi@google.com>
Wed, 21 Oct 2015 23:39:04 +0000 (16:39 -0700)
committer Barret Rhoden <brho@cs.berkeley.edu>
Wed, 18 Nov 2015 17:56:34 +0000 (09:56 -0800)
Implemented the new profiler format and added a simple userspace
stack trace (waiting for copy_from_user()).

Signed-off-by: Davide Libenzi <dlibenzi@google.com>
[checkpatch touchups]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
15 files changed:
Documentation/profiling.txt
kern/arch/x86/kdebug.c
kern/arch/x86/uaccess.h
kern/drivers/dev/kprof.c
kern/include/kdebug.h
kern/include/kprof.h [new file with mode: 0644]
kern/include/profiler.h
kern/include/ros/profiler_records.h [new file with mode: 0644]
kern/include/stdio.h
kern/include/string.h
kern/src/mm.c
kern/src/process.c
kern/src/profiler.c
kern/src/string.c
kern/src/syscall.c

index 0216597..92f1171 100644
@@ -4,12 +4,16 @@ Akaros Profiling
 
 Contents
 ---------------------------
-"Oprofile"
+"Kprof"
 
-"Oprofile"
+"Kprof"
 ---------------------------
 Akaros has a very basic sampling profiler, similar to oprofile.  The kernel
-generates traces, which you copy off the machine and process on Linux.
+generates traces, which you copy off the machine and process on Linux with
+the Linux perf tool.
+First build the Akaros kernel and its apps, on your dev box:
+
+(linux) $ make && make xcc-headers-install && make apps-install
 
 To get started, make sure #K is mounted.  The basic ifconfig script will do
 this, as will:
@@ -19,25 +23,34 @@ this, as will:
 You control the profiler with the kpctl file.  The general style is to start
 the events that trigger a sample, such as a timer tick, then you start and stop
 the profiling.  The distinction between the two steps is that one actually
-fires the events (e.g. the timer IRQ), and the other enables *collection* of profiling info when those events occur.
+fires the events (e.g. the timer IRQ), and the other enables *collection*
+of profiling info when those events occur.
 
-The optimer command takes the core id (or "all"), followed by "on" or "off".
+The timer command takes the core id (or "all"), followed by "on" or "off".
 As with all good devices, if you echo garbage in, you should get the usage as
 an errstr.  That'll be kept up to date more than documentation.
+The profiler accepts a few configuration options.
+There is a queue size limit, 64MB by default, used as a circular buffer,
+so once it fills the oldest data is dropped.
+To change the limit:
+
+/ $ echo prof_qlimit SIZE_KB > /prof/kpctl
 
-/ $ echo garbage > /prof/kpctl
-echo failed: Unspecified, startclr|start|stop|clear|opstart|opstop|optimer
+This should be run before starting the profiler.
+There is also a limit on the maximum call trace depth, 16 by default.
+To change it:
 
-/ $ echo optimer garbage > /prof/kpctl
-echo failed: Unspecified, optimer [<0|1|..|n|all> <on|off>] [period USEC]
+/ $ echo prof_btdepth DEPTH > /prof/kpctl
 
-Let's set up the timer on core 0:
+This should be run before starting the profiler as well.
+The timer period can also be configured; it defaults to 1000us, and it is
+not recommended to stray far from that default:
 
-/ $ echo optimer 0 on > /prof/kpctl
+/ $ echo timer period 1000 > /prof/kpctl
 
-And then start oprofile system-wide.
+And then start the Akaros profiler system-wide.
 
-/ $ echo opstart > /prof/kpctl
+/ $ echo start > /prof/kpctl
 Enable tracing on 0
 Enable tracing on 1
 Enable tracing on 2
@@ -50,48 +63,44 @@ Enable tracing on 7
 Run whatever command you want, then stop the profiler.
 
 / $ foo
-/ $ echo opstop > /prof/kpctl
-Core 0 has data
-After qibwrite in oprofile_cpubuf_flushone, opq len 303080
+/ $ echo stop > /prof/kpctl
+
+The trace will then be available in the /prof/kpdata file.
+The data remains available until the next start of the profiler.
+Then copy it to your dev box.
+The easiest way is via 9p:
 
-Might as well turn off the timers:
-/ $ echo optimer all off > /prof/kpctl
+/ $ cp /prof/kpdata /mnt/
 
-Now we need to extract the trace.  The easiest way is via 9p.
-/ $ cat /prof/kpoprofile > trace
-/ $ cp trace /mnt/
+Or use the simple netcat (snc) utility.
+On your dev box:
 
-Once the trace has been read from kpoprofile, it cannot be read again.  The
-read drains the kernel's trace buffer.
+(linux) $ nc -l PORT > kpdata.data
 
-The trace that the kernel generates is in an Akaros-specific format.  There is
-a go program at tools/profile/op2.go that translates from the Akaros format to
-pprof format.  You could run this on Akaros, since we support Go programs, but
-since we don't have a port of pprof, it's easier to do it all in Linux.
+On Akaros:
 
-So now we're in linux, and say our 9p ufs server is rooted at mnt/netroot/.  Run op2:
+/ $ snc -s DEVBOX_IP -p PORT -i /prof/kpdata
 
-(linux) $ op2 < mnt/netroot/trace > trace-pp
+To process the Akaros kprof file, you need to convert it to the Linux
+perf format.
+You can do that on your dev box with:
 
-To get a sense for what the trace holds, you might want to start with looking at the raw addresses to distinguish between the kernel and the user.
+(linux) $ ./tools/profile/kprof2perf/kprof2perf-linux -k `pwd`/obj/kern/akaros-kernel-64b -i kpdata.data -o perf.data
 
-(linux) $ pprof --addresses trace-pp
-PPROF> top
-       (shows some addresses)
+You then need to build the Akaros-specific Linux perf binary.
+First install libelf-dev, if you have not already:
 
-Say the majority of the addresses are user addresses:
+(linux) $ sudo apt-get install libelf-dev
 
-(linux) $ pprof obj/tests/foo trace-pp
-PPROF> top
-       (shows some functions)
+Then pull the Linux kernel source tree closest to the kernel version
+running on your dev box, and patch it:
 
-Or you can visualize things:
-(linux) $ pprof --evince obj/tests/foo trace-pp
+(linux) $ cd linux
+(linux) $ patch -p 1 < $AKAROS/tools/profile/kprof2perf/perf_patches/perf_patch.diff
+(linux) $ cd tools/perf
+(linux) $ make
 
-The visualization is not of much user for user programs, since the kernel does
-not record backtraces for userspace by default.  It's a little dangerous at the
-moment.  In the future, we may have an op option to control whether or not the
-kernel attempts a backtrace.
+You should then be able to run the Linux perf data analysis commands on it.
+Example:
 
-For more info on pprof, check out:
-http://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html
+(linux) $ /PATH_TO/perf --root-dir $AKAROS/kern/kfs/ report -g -i perf.data
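
For reference, the kpdata file can also be drained with a few lines of C run
inside Akaros.  A minimal sketch, assuming the usual POSIX I/O calls and a
writable destination path (both hypothetical details, not part of this patch):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t n;
	int in = open("/prof/kpdata", O_RDONLY);
	int out = open("/kpdata.data", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (in < 0 || out < 0) {
		perror("open");
		return 1;
	}
	/* The data stays valid until the next "start", so one pass suffices. */
	while ((n = read(in, buf, sizeof(buf))) > 0)
		write(out, buf, n);
	close(in);
	close(out);
	return 0;
}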
index 869ee01..3ee432c 100644
@@ -5,6 +5,7 @@
 #include <pmap.h>
 #include <process.h>
 #include <kmalloc.h>
+#include <arch/uaccess.h>
 
 #include <ros/memlayout.h>
 
@@ -365,6 +366,27 @@ size_t backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
        return nr_pcs;
 }
 
+size_t user_backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
+                                                  size_t nr_slots)
+{
+       int error;
+       size_t nr_pcs = 0;
+       uintptr_t frame[2];
+
+       for (;;) {
+               error = copy_from_user(frame, (const void *) fp, 2 * sizeof(uintptr_t));
+               if (unlikely(error) || unlikely(nr_pcs >= nr_slots))
+                       break;
+
+               /* frame[0] holds the caller's saved FP, frame[1] its return PC. */
+               pcs[nr_pcs++] = pc;
+               pc = frame[1];
+               fp = frame[0];
+       }
+
+       return nr_pcs;
+}
+
 void backtrace_frame(uintptr_t eip, uintptr_t ebp)
 {
        gen_backtrace_frame(eip, ebp, &printk_func, NULL);
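
user_backtrace_list() mirrors backtrace_list(), but fetches each frame through
copy_from_user(), so a bad or unmapped frame pointer simply ends the walk
instead of faulting the kernel.  A hedged sketch of how a sampler might use it
on a user trapframe; get_hwtf_pc() exists elsewhere in the tree, while
get_hwtf_fp() is assumed here:

/* Illustrative only: capture and print a user backtrace from IRQ context. */
static void sample_user_stack(struct hw_trapframe *hw_tf)
{
	uintptr_t pcs[16];	/* 16 matches the default prof_btdepth */
	size_t i, n;

	n = user_backtrace_list(get_hwtf_pc(hw_tf), get_hwtf_fp(hw_tf),
				pcs, ARRAY_SIZE(pcs));
	for (i = 0; i < n; i++)
		printk("user pc[%d] = %p\n", (int) i, (void *) pcs[i]);
}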
index f707b30..17d284f 100644
@@ -75,7 +75,8 @@ struct extable_ip_fixup {
                                 ".previous\n"                                                                                  \
                                 _ASM_EXTABLE(1b, 3b)                                                                   \
                                 : "=r"(err)                                                                                    \
-                                : "D" (dst), "S" (src), "c" (count), "i" (errret), "0" (err))
+                                : "D" (dst), "S" (src), "c" (count), "i" (errret), "0" (err) \
+                                : "memory")
 
 static inline int __put_user(void *dst, const void *src, unsigned int count)
 {
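
The new "memory" clobber tells GCC that the asm body reads and writes buffers
the constraints do not name, so it cannot cache user or kernel memory across
the copy.  Callers then see a fault as an error return rather than a crash; a
minimal sketch of the calling pattern (fetch_user_word() is hypothetical),
relying on copy_from_user() returning nonzero on fault, as user_backtrace_list()
above does:

/* Sketch: fetch one user-supplied word, tolerating bad pointers. */
static int fetch_user_word(const void *uptr, uintptr_t *out)
{
	uintptr_t val;

	if (copy_from_user(&val, uptr, sizeof(val)))
		return -1;	/* faulted; the extable fixup path ran */
	*out = val;
	return 0;
}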
index 69b05b1..5fe7952 100644
@@ -7,96 +7,90 @@
  * in the LICENSE file.
  */
 
-// get_fn_name is slowing down the kprocread
-//     have an array of translated fns
-//     or a "next" iterator, since we're walking in order
-//
-// irqsave locks
-//
-// kprof struct should be a ptr, have them per core
-//             we'll probably need to track the length still, so userspace knows how
-//             big it is
-//
-//             will also want more files in the kprof dir for each cpu or something
-//
-// maybe don't use slot 0 and 1 as total and 'not kernel' ticks
-//
-// fix the failed assert XXX
-
 #include <vfs.h>
-#include <kfs.h>
 #include <slab.h>
 #include <kmalloc.h>
 #include <kref.h>
+#include <atomic.h>
+#include <kthread.h>
 #include <string.h>
 #include <stdio.h>
 #include <assert.h>
 #include <error.h>
-#include <cpio.h>
 #include <pmap.h>
 #include <smp.h>
-#include <ip.h>
+#include <circular_buffer.h>
+#include <umem.h>
 #include <profiler.h>
+#include <kprof.h>
 
-struct dev kprofdevtab;
-
-static char *devname(void)
-{
-       return kprofdevtab.name;
-}
-
-#define LRES   3               /* log of PC resolution */
-#define CELLSIZE       8       /* sizeof of count cell */
-
-struct kprof
-{
-       uintptr_t       minpc;
-       uintptr_t       maxpc;
-       int     nbuf;
-       int     time;
-       uint64_t        *buf;   /* keep in sync with cellsize */
-       size_t          buf_sz;
-       spinlock_t lock;
-       struct queue *systrace;
-       bool            mpstat_ipi;
-};
-struct kprof kprof;
+#define KTRACE_BUFFER_SIZE (128 * 1024)
+#define TRACE_PRINTK_BUFFER_SIZE (8 * 1024)
 
-/* output format. Nice fixed size. That makes it seekable.
- * small subtle bit here. You have to convert offset FROM FORMATSIZE units
- * to CELLSIZE units in a few places.
- */
-char *outformat = "%016llx %29.29s %016llx\n";
-#define FORMATSIZE 64
-enum{
+enum {
        Kprofdirqid = 0,
        Kprofdataqid,
        Kprofctlqid,
-       Kprofoprofileqid,
        Kptraceqid,
        Kprintxqid,
        Kmpstatqid,
        Kmpstatrawqid,
 };
 
-struct dirtab kproftab[]={
-       {".",           {Kprofdirqid, 0, QTDIR},0,      DMDIR|0550},
-       {"kpdata",      {Kprofdataqid},         0,      0600},
-       {"kpctl",       {Kprofctlqid},          0,      0600},
-       {"kpoprofile",  {Kprofoprofileqid},     0,      0600},
-       {"kptrace",     {Kptraceqid},           0,      0600},
-       {"kprintx",     {Kprintxqid},           0,      0600},
-       {"mpstat",      {Kmpstatqid},           0,      0600},
-       {"mpstat-raw",  {Kmpstatrawqid},                0,      0600},
+struct trace_printk_buffer {
+       int in_use;
+       char buffer[TRACE_PRINTK_BUFFER_SIZE];
 };
 
-static size_t mpstatraw_len(void);
-static size_t mpstat_len(void);
+struct kprof {
+       struct semaphore lock;
+       struct alarm_waiter *alarms;
+       bool mpstat_ipi;
+       bool profiling;
+       char *pdata;
+       size_t psize;
+};
+
+struct dev kprofdevtab;
+struct dirtab kproftab[] = {
+       {".",                   {Kprofdirqid,           0, QTDIR}, 0,   DMDIR|0550},
+       {"kpdata",              {Kprofdataqid},         0,      0600},
+       {"kpctl",               {Kprofctlqid},          0,      0600},
+       {"kptrace",             {Kptraceqid},           0,      0600},
+       {"kprintx",             {Kprintxqid},           0,      0600},
+       {"mpstat",              {Kmpstatqid},           0,      0600},
+       {"mpstat-raw",  {Kmpstatrawqid},        0,      0600},
+};
+
+extern int booting;
+static struct kprof kprof;
+static bool ktrace_init_done = FALSE;
+static spinlock_t ktrace_lock = SPINLOCK_INITIALIZER_IRQSAVE;
+static struct circular_buffer ktrace_data;
+static char ktrace_buffer[KTRACE_BUFFER_SIZE];
+static int oprof_timer_period = 1000;
+
+static size_t mpstat_len(void)
+{
+       size_t each_row = 7 + NR_CPU_STATES * 26;
 
-static struct alarm_waiter *oprof_alarms;
-static unsigned int oprof_timer_period = 1000;
+       return each_row * (num_cores + 1) + 1;
+}
 
-static void oprof_alarm_handler(struct alarm_waiter *waiter,
+static size_t mpstatraw_len(void)
+{
+       size_t header_row = 27 + NR_CPU_STATES * 7 + 1;
+       size_t cpu_row = 7 + NR_CPU_STATES * 17;
+
+       return header_row + cpu_row * num_cores + 1;
+}
+
+static char *devname(void)
+{
+       return kprofdevtab.name;
+}
+
+static void kprof_alarm_handler(struct alarm_waiter *waiter,
                                 struct hw_trapframe *hw_tf)
 {
        int coreid = core_id();
@@ -106,135 +100,194 @@ static void oprof_alarm_handler(struct alarm_waiter *waiter,
        reset_alarm_rel(tchain, waiter, oprof_timer_period);
 }
 
-static struct chan*
-kprofattach(char *spec)
+static struct chan *kprof_attach(char *spec)
 {
-       // Did we initialise completely?
-       if ( !(oprof_alarms && kprof.buf && kprof.systrace) )
+       if (!kprof.alarms)
                error(ENOMEM, NULL);
 
        return devattach(devname(), spec);
 }
 
-static void
-kproftimer(uintptr_t pc)
+static void kprof_enable_timer(int coreid, int on_off)
 {
-       if(kprof.time == 0)
-               return;
+       struct timer_chain *tchain = &per_cpu_info[coreid].tchain;
+       struct alarm_waiter *waiter = &kprof.alarms[coreid];
 
-       /*
-        * if the pc corresponds to the idle loop, don't consider it.
+       if (on_off) {
+               /* Per CPU waiters already inited.  Will set/reset each time (1 ms
+                * default). */
+               reset_alarm_rel(tchain, waiter, oprof_timer_period);
+       } else {
+               /* Since the alarm handler runs and gets reset within IRQ context, then
+                * we should never fail to cancel the alarm if it was already running
+                * (tchain locks synchronize us).  But it might not be set at all, which
+                * is fine. */
+               unset_alarm(tchain, waiter);
+       }
+}
 
-       if(m->inidle)
-               return;
-        */
-       /*
-        *  if the pc is coming out of spllo or splx,
-        *  use the pc saved when we went splhi.
+static void kprof_profdata_clear(void)
+{
+       kfree(kprof.pdata);
+       kprof.pdata = NULL;
+       kprof.psize = 0;
+}
 
-       if(pc>=PTR2UINT(spllo) && pc<=PTR2UINT(spldone))
-               pc = m->splpc;
-        */
+static void kprof_start_profiler(void)
+{
+       ERRSTACK(2);
 
-//     ilock(&kprof);
-       /* this is weird. What we do is assume that all the time since the last
-        * measurement went into this PC. It's the best
-        * we can do I suppose. And we are sampling at 1 ms. for now.
-        * better ideas welcome.
-        */
-       kprof.buf[0] += 1; //Total count of ticks.
-       if(kprof.minpc<=pc && pc<kprof.maxpc){
-               pc -= kprof.minpc;
-               pc >>= LRES;
-               kprof.buf[pc] += 1;
-       }else
-               kprof.buf[1] += 1; // Why?
-//     iunlock(&kprof);
+       sem_down(&kprof.lock);
+       if (waserror()) {
+               sem_up(&kprof.lock);
+               nexterror();
+       }
+       if (!kprof.profiling) {
+               profiler_init();
+               if (waserror()) {
+                       profiler_cleanup();
+                       nexterror();
+               }
+
+               profiler_control_trace(1);
+
+               for (int i = 0; i < num_cores; i++)
+                       kprof_enable_timer(i, 1);
+
+               kprof.profiling = TRUE;
+
+               kprof_profdata_clear();
+       }
+       poperror();
+       poperror();
+       sem_up(&kprof.lock);
 }
 
-static void setup_timers(void)
+static void kprof_fetch_profiler_data(void)
 {
-       void kprof_alarm(struct alarm_waiter *waiter, struct hw_trapframe *hw_tf)
-       {
-               struct timer_chain *tchain = &per_cpu_info[core_id()].tchain;
-               kproftimer(get_hwtf_pc(hw_tf));
-               set_awaiter_rel(waiter, 1000);
-               set_alarm(tchain, waiter);
+       size_t psize = profiler_size();
+
+       kprof.pdata = kmalloc(psize, KMALLOC_WAIT);
+       if (!kprof.pdata)
+               error(ENOMEM, NULL);
+       kprof.psize = 0;
+       while (kprof.psize < psize) {
+               size_t csize = profiler_read(kprof.pdata + kprof.psize,
+                                                                        psize - kprof.psize);
+
+               if (csize == 0)
+                       break;
+               kprof.psize += csize;
        }
-       struct timer_chain *tchain = &per_cpu_info[core_id()].tchain;
-       struct alarm_waiter *waiter = kmalloc(sizeof(struct alarm_waiter), 0);
-       init_awaiter_irq(waiter, kprof_alarm);
-       set_awaiter_rel(waiter, 1000);
-       set_alarm(tchain, waiter);
 }
 
-static void kprofinit(void)
+static void kprof_stop_profiler(void)
 {
-       uint32_t n;
-
-       static_assert(CELLSIZE == sizeof kprof.buf[0]); // kprof size
-
-       /* allocate when first used */
-       kprof.minpc = KERN_LOAD_ADDR;
-       kprof.maxpc = (uintptr_t) &etext;
-       kprof.nbuf = (kprof.maxpc-kprof.minpc) >> LRES;
-       n = kprof.nbuf*CELLSIZE;
-       kprof.buf = kzmalloc(n, KMALLOC_WAIT);
-       if (kprof.buf)
-               kprof.buf_sz = n;
-
-       /* no, i'm not sure how we should do this yet. */
-       profiler_init();
-       oprof_alarms = kzmalloc(sizeof(struct alarm_waiter) * num_cores,
-                               KMALLOC_WAIT);
-       if (!oprof_alarms)
-               error(ENOMEM, NULL);
+       ERRSTACK(1);
 
-       for (int i = 0; i < num_cores; i++)
-               init_awaiter_irq(&oprof_alarms[i], oprof_alarm_handler);
+       sem_down(&kprof.lock);
+       if (waserror()) {
+               sem_up(&kprof.lock);
+               nexterror();
+       }
+       if (kprof.profiling) {
+               for (int i = 0; i < num_cores; i++)
+                       kprof_enable_timer(i, 0);
+               profiler_control_trace(0);
+               kprof_fetch_profiler_data();
+               profiler_cleanup();
+
+               kprof.profiling = FALSE;
+       }
+       poperror();
+       sem_up(&kprof.lock);
+}
 
-       kprof.systrace = qopen(2 << 20, 0, 0, 0);
-       if (!kprof.systrace) {
-               printk("systrace allocate failed. No system call tracing\n");
+static void kprof_init(void)
+{
+       int i;
+       ERRSTACK(1);
+
+       sem_init(&kprof.lock, 1);
+       kprof.profiling = FALSE;
+       kprof.pdata = NULL;
+       kprof.psize = 0;
+
+       kprof.alarms = kzmalloc(sizeof(struct alarm_waiter) * num_cores,
+                                                       KMALLOC_WAIT);
+       if (!kprof.alarms)
+               error(ENOMEM, NULL);
+       if (waserror()) {
+               kfree(kprof.alarms);
+               kprof.alarms = NULL;
+               nexterror();
        }
-       kprof.mpstat_ipi = TRUE;
+       for (i = 0; i < num_cores; i++)
+               init_awaiter_irq(&kprof.alarms[i], kprof_alarm_handler);
+
+       for (i = 0; i < ARRAY_SIZE(kproftab); i++)
+               kproftab[i].length = 0;
 
-       kproftab[Kprofdataqid].length = kprof.nbuf * FORMATSIZE;
+       kprof.mpstat_ipi = TRUE;
        kproftab[Kmpstatqid].length = mpstat_len();
        kproftab[Kmpstatrawqid].length = mpstatraw_len();
+
+       poperror();
 }
 
-static void kprofshutdown(void)
+static void kprof_shutdown(void)
 {
-       kfree(oprof_alarms); oprof_alarms = NULL;
-       kfree(kprof.buf); kprof.buf = NULL;
-       qfree(kprof.systrace); kprof.systrace = NULL;
-       profiler_cleanup();
+       kprof_stop_profiler();
+       kprof_profdata_clear();
+
+       kfree(kprof.alarms);
+       kprof.alarms = NULL;
 }
 
-static struct walkqid*
-kprofwalk(struct chan *c, struct chan *nc, char **name, int nname)
+static void kprofclear(void)
+{
+       sem_down(&kprof.lock);
+       kprof_profdata_clear();
+       sem_up(&kprof.lock);
+}
+
+static struct walkqid *kprof_walk(struct chan *c, struct chan *nc, char **name,
+                                                                int nname)
 {
        return devwalk(c, nc, name, nname, kproftab, ARRAY_SIZE(kproftab), devgen);
 }
 
-static int
-kprofstat(struct chan *c, uint8_t *db, int n)
+static size_t kprof_profdata_size(void)
 {
-       kproftab[Kprofoprofileqid].length = profiler_size();
-       if (kprof.systrace)
-               kproftab[Kptraceqid].length = qlen(kprof.systrace);
-       else
-               kproftab[Kptraceqid].length = 0;
+       return kprof.pdata != NULL ? kprof.psize : profiler_size();
+}
+
+static long kprof_profdata_read(void *dest, long size, int64_t off)
+{
+       sem_down(&kprof.lock);
+       if (kprof.pdata && off < kprof.psize) {
+               size = MIN(kprof.psize - off, size);
+               memcpy(dest, kprof.pdata + off, size);
+       } else {
+               size = 0;
+       }
+       sem_up(&kprof.lock);
+
+       return size;
+}
+
+static int kprof_stat(struct chan *c, uint8_t *db, int n)
+{
+       kproftab[Kprofdataqid].length = kprof_profdata_size();
+       kproftab[Kptraceqid].length = kprof_tracedata_size();
 
        return devstat(c, db, n, kproftab, ARRAY_SIZE(kproftab), devgen);
 }
 
-static struct chan*
-kprofopen(struct chan *c, int omode)
+static struct chan *kprof_open(struct chan *c, int omode)
 {
-       if(c->qid.type & QTDIR){
-               if(openmode(omode) != O_READ)
+       if (c->qid.type & QTDIR) {
+               if (openmode(omode) != O_READ)
                        error(EPERM, NULL);
        }
        c->mode = openmode(omode);
@@ -243,15 +296,8 @@ kprofopen(struct chan *c, int omode)
        return c;
 }
 
-static void
-kprofclose(struct chan*unused)
-{
-}
-
-static size_t mpstat_len(void)
+static void kprof_close(struct chan *c)
 {
-       size_t each_row = 7 + NR_CPU_STATES * 26;
-       return each_row * (num_cores + 1) + 1;
 }
 
 static long mpstat_read(void *va, long n, int64_t off)
@@ -292,13 +338,6 @@ static long mpstat_read(void *va, long n, int64_t off)
        return n;
 }
 
-static size_t mpstatraw_len(void)
-{
-       size_t header_row = 27 + NR_CPU_STATES * 7 + 1;
-       size_t cpu_row = 7 + NR_CPU_STATES * 17;
-       return header_row + cpu_row * num_cores + 1;
-}
-
 static long mpstatraw_read(void *va, long n, int64_t off)
 {
        size_t bufsz = mpstatraw_len();
@@ -330,90 +369,21 @@ static long mpstatraw_read(void *va, long n, int64_t off)
        return n;
 }
 
-static long
-kprofread(struct chan *c, void *va, long n, int64_t off)
+static long kprof_read(struct chan *c, void *va, long n, int64_t off)
 {
        uint64_t w, *bp;
        char *a, *ea;
        uintptr_t offset = off;
        uint64_t pc;
-       int snp_ret, ret = 0;
 
-       switch((int)c->qid.path){
+       switch ((int) c->qid.path) {
        case Kprofdirqid:
                return devdirread(c, va, n, kproftab, ARRAY_SIZE(kproftab), devgen);
-
        case Kprofdataqid:
-
-               if (n < FORMATSIZE){
-                       n = 0;
-                       break;
-               }
-               a = va;
-               ea = a + n;
-
-               /* we check offset later before deref bp.  offset / FORMATSIZE is how
-                * many entries we're skipping/offsetting. */
-               bp = kprof.buf + offset/FORMATSIZE;
-               pc = kprof.minpc + ((offset/FORMATSIZE)<<LRES);
-               while((a < ea) && (n >= FORMATSIZE)){
-                       /* what a pain. We need to manage the
-                        * fact that the *prints all make room for
-                        * \0
-                        */
-                       char print[FORMATSIZE+1];
-                       char *name;
-                       int amt_read;
-
-                       if (pc >= kprof.maxpc)
-                               break;
-                       /* pc is also our exit for bp.  should be in lockstep */
-                       // XXX this assert fails, fix it!
-                       //assert(bp < kprof.buf + kprof.nbuf);
-                       /* do not attempt to filter these results based on w < threshold.
-                        * earlier, we computed bp/pc based on assuming a full-sized file,
-                        * and skipping entries will result in read() calls thinking they
-                        * received earlier entries when they really received later ones.
-                        * imagine a case where there are 1000 skipped items, and read()
-                        * asks for chunks of 32.  it'll get chunks of the next 32 valid
-                        * items, over and over (1000/32 times). */
-                       w = *bp++;
-
-                       if (pc == kprof.minpc)
-                               name = "Total";
-                       else if (pc == kprof.minpc + 8)
-                               name = "User";
-                       else
-                               name = get_fn_name(pc);
-
-                       snp_ret = snprintf(print, sizeof(print), outformat, pc, name, w);
-                       assert(snp_ret == FORMATSIZE);
-                       if ((pc != kprof.minpc) && (pc != kprof.minpc + 8))
-                               kfree(name);
-
-                       amt_read = readmem(offset % FORMATSIZE, a, n, print, FORMATSIZE);
-                       offset = 0;     /* future loops have no offset */
-
-                       a += amt_read;
-                       n -= amt_read;
-                       ret += amt_read;
-
-                       pc += (1 << LRES);
-               }
-               n = ret;
-               break;
-       case Kprofoprofileqid:
-               n = profiler_read(va, n);
+               n = kprof_profdata_read(va, n, off);
                break;
        case Kptraceqid:
-               if (kprof.systrace) {
-                       printd("Kptraceqid: kprof.systrace %p len %p\n", kprof.systrace, qlen(kprof.systrace));
-                       if (qlen(kprof.systrace) > 0)
-                               n = qread(kprof.systrace, va, n);
-                       else
-                               n = 0;
-               } else
-                       error(EFAIL, "no systrace queue");
+               n = kprof_tracedata_read(va, n, off);
                break;
        case Kprintxqid:
                n = readstr(offset, va, n, printx_on ? "on" : "off");
@@ -431,111 +401,101 @@ kprofread(struct chan *c, void *va, long n, int64_t off)
        return n;
 }
 
-static void kprof_clear(struct kprof *kp)
+static void kprof_manage_timer(int coreid, struct cmdbuf *cb)
 {
-       spin_lock(&kp->lock);
-       memset(kp->buf, 0, kp->buf_sz);
-       spin_unlock(&kp->lock);
-}
-
-static void manage_oprof_timer(int coreid, struct cmdbuf *cb)
-{
-       struct timer_chain *tchain = &per_cpu_info[coreid].tchain;
-       struct alarm_waiter *waiter = &oprof_alarms[coreid];
        if (!strcmp(cb->f[2], "on")) {
-               /* pcpu waiters already inited.  will set/reset each time (1 ms
-                * default). */
-               reset_alarm_rel(tchain, waiter, oprof_timer_period);
+               kprof_enable_timer(coreid, 1);
        } else if (!strcmp(cb->f[2], "off")) {
-               /* since the alarm handler runs and gets reset within IRQ context, then
-                * we should never fail to cancel the alarm if it was already running
-                * (tchain locks synchronize us).  but it might not be set at all, which
-                * is fine. */
-               unset_alarm(tchain, waiter);
+               kprof_enable_timer(coreid, 0);
        } else {
-               error(EFAIL, "optimer needs on|off");
+               error(EFAIL, "timer needs on|off");
+       }
+}
+
+static void kprof_usage_fail(void)
+{
+       static const char *ctlstring = "clear|start|stop|timer";
+       const char * const *cmds = profiler_configure_cmds();
+       char msgbuf[128];
+
+       strlcpy(msgbuf, ctlstring, sizeof(msgbuf));
+       for (int i = 0; cmds[i]; i++) {
+               strlcat(msgbuf, "|", sizeof(msgbuf));
+               strlcat(msgbuf, cmds[i], sizeof(msgbuf));
        }
+
+       error(EFAIL, msgbuf);
 }
 
-static long
-kprofwrite(struct chan *c, void *a, long n, int64_t unused)
+static long kprof_write(struct chan *c, void *a, long n, int64_t unused)
 {
        ERRSTACK(1);
-       uintptr_t pc;
-       struct cmdbuf *cb;
-       char *ctlstring = "startclr|start|stop|clear|opstart|opstop|optimer";
-       cb = parsecmd(a, n);
+       struct cmdbuf *cb = parsecmd(a, n);
 
        if (waserror()) {
                kfree(cb);
                nexterror();
        }
-
-       switch((int)(c->qid.path)){
+       switch ((int) c->qid.path) {
        case Kprofctlqid:
                if (cb->nf < 1)
-                       error(EFAIL, ctlstring);
-
-               /* Kprof: a "which kaddr are we at when the timer goes off".  not used
-                * much anymore */
-               if (!strcmp(cb->f[0], "startclr")) {
-                       kprof_clear(&kprof);
-                       kprof.time = 1;
-               } else if (!strcmp(cb->f[0], "start")) {
-                       kprof.time = 1;
-                       /* this sets up the timer on the *calling* core! */
-                       setup_timers();
-               } else if (!strcmp(cb->f[0], "stop")) {
-                       /* TODO: stop the timers! */
-                       kprof.time = 0;
-               } else if (!strcmp(cb->f[0], "clear")) {
-                       kprof_clear(&kprof);
-
-               /* oprof: samples and traces using oprofile */
-               } else if (!strcmp(cb->f[0], "optimer")) {
+                       kprof_usage_fail();
+               if (profiler_configure(cb))
+                       break;
+               if (!strcmp(cb->f[0], "clear")) {
+                       kprofclear();
+               } else if (!strcmp(cb->f[0], "timer")) {
                        if (cb->nf < 3)
-                               error(EFAIL, "optimer [<0|1|..|n|all> <on|off>] [period USEC]");
+                               error(EFAIL, "timer {{all, N} {on, off}, period USEC}");
                        if (!strcmp(cb->f[1], "period")) {
                                oprof_timer_period = strtoul(cb->f[2], 0, 10);
                        } else if (!strcmp(cb->f[1], "all")) {
                                for (int i = 0; i < num_cores; i++)
-                                       manage_oprof_timer(i, cb);
+                                       kprof_manage_timer(i, cb);
                        } else {
                                int pcoreid = strtoul(cb->f[1], 0, 10);
+
                                if (pcoreid >= num_cores)
-                                       error(EFAIL, "no such coreid %d", pcoreid);
-                               manage_oprof_timer(pcoreid, cb);
+                                       error(EFAIL, "No such coreid %d", pcoreid);
+                               kprof_manage_timer(pcoreid, cb);
                        }
-               } else if (!strcmp(cb->f[0], "opstart")) {
-                       profiler_control_trace(1);
-               } else if (!strcmp(cb->f[0], "opstop")) {
-                       profiler_control_trace(0);
+               } else if (!strcmp(cb->f[0], "start")) {
+                       kprof_start_profiler();
+               } else if (!strcmp(cb->f[0], "stop")) {
+                       kprof_stop_profiler();
                } else {
-                       error(EFAIL, ctlstring);
+                       kprof_usage_fail();
                }
                break;
+       case Kprofdataqid:
+               profiler_add_trace((uintptr_t) strtoul(a, 0, 0));
+               break;
+       case Kptraceqid:
+               if (a && (n > 0)) {
+                       char *uptr = user_strdup_errno(current, a, n);
 
-               /* The format is a long as text. We strtoul, and jam it into the
-                * trace buffer.
-                */
-       case Kprofoprofileqid:
-               pc = strtoul(a, 0, 0);
-               profiler_add_trace(pc);
+                       if (uptr) {
+                               trace_printk(false, "%s", uptr);
+                               user_memdup_free(current, uptr);
+                       } else {
+                               n = -1;
+                       }
+               }
                break;
        case Kprintxqid:
                if (!strncmp(a, "on", 2))
                        set_printx(1);
                else if (!strncmp(a, "off", 3))
                        set_printx(0);
-               else if (!strncmp(a, "toggle", 6))      /* why not. */
+               else if (!strncmp(a, "toggle", 6))
                        set_printx(2);
                else
-                       error(EFAIL, "invalid option to Kprintx %s\n", a);
+                       error(EFAIL, "Invalid option to Kprintx %s\n", a);
                break;
        case Kmpstatqid:
        case Kmpstatrawqid:
                if (cb->nf < 1)
-                       error(EFAIL, "mpstat bad option (reset|ipi|on|off)");
+                       error(EFAIL, "Bad mpstat option (reset|ipi|on|off)");
                if (!strcmp(cb->f[0], "reset")) {
                        for (int i = 0; i < num_cores; i++)
                                reset_cpu_state_ticks(i);
@@ -545,7 +505,7 @@ kprofwrite(struct chan *c, void *a, long n, int64_t unused)
                        /* TODO: disable the ticks */ ;
                } else if (!strcmp(cb->f[0], "ipi")) {
                        if (cb->nf < 2)
-                               error(EFAIL, "need another arg: ipi [on|off]");
+                               error(EFAIL, "Need another arg: ipi [on|off]");
                        if (!strcmp(cb->f[1], "on"))
                                kprof.mpstat_ipi = TRUE;
                        else if (!strcmp(cb->f[1], "off"))
@@ -553,7 +513,7 @@ kprofwrite(struct chan *c, void *a, long n, int64_t unused)
                        else
                                error(EFAIL, "ipi [on|off]");
                } else {
-                       error(EFAIL, "mpstat bad option (reset|ipi|on|off)");
+                       error(EFAIL, "Bad mpstat option (reset|ipi|on|off)");
                }
                break;
        default:
@@ -564,63 +524,155 @@ kprofwrite(struct chan *c, void *a, long n, int64_t unused)
        return n;
 }
 
-void kprof_write_sysrecord(char *pretty_buf, size_t len)
+size_t kprof_tracedata_size(void)
+{
+       return circular_buffer_size(&ktrace_data);
+}
+
+size_t kprof_tracedata_read(void *data, size_t size, size_t offset)
 {
-       int wrote;
-       if (kprof.systrace) {
-               /* TODO: need qio work so we can simply add the buf as extra data */
-               wrote = qiwrite(kprof.systrace, pretty_buf, len);
-               /* based on the current queue settings, we only drop when we're running
-                * out of memory.  odds are, we won't make it this far. */
-               if (wrote != len)
-                       printk("DROPPED %s", pretty_buf);
+       spin_lock_irqsave(&ktrace_lock);
+       if (likely(ktrace_init_done))
+               size = circular_buffer_read(&ktrace_data, data, size, offset);
+       else
+               size = 0;
+       spin_unlock_irqsave(&ktrace_lock);
+
+       return size;
+}
+
+void kprof_tracedata_write(const char *pretty_buf, size_t len)
+{
+       spin_lock_irqsave(&ktrace_lock);
+       if (unlikely(!ktrace_init_done)) {
+               circular_buffer_init(&ktrace_data, sizeof(ktrace_buffer),
+                                                        ktrace_buffer);
+               ktrace_init_done = TRUE;
+       }
+       circular_buffer_write(&ktrace_data, pretty_buf, len);
+       spin_unlock_irqsave(&ktrace_lock);
+}
+
+static struct trace_printk_buffer *kprof_get_printk_buffer(void)
+{
+       static struct trace_printk_buffer boot_tpb;
+       static struct trace_printk_buffer *cpu_tpbs;
+
+       if (unlikely(booting))
+               return &boot_tpb;
+       if (unlikely(!cpu_tpbs)) {
+               /* Poor man's per-CPU data structure.  I really do not like
+                * littering global data structures with module-specific data.
+                */
+               spin_lock_irqsave(&ktrace_lock);
+               if (!cpu_tpbs)
+                       cpu_tpbs = kzmalloc(num_cores * sizeof(struct trace_printk_buffer),
+                                                               0);
+               spin_unlock_irqsave(&ktrace_lock);
        }
+
+       return cpu_tpbs + core_id();
 }
 
-void trace_printk(const char *fmt, ...)
+void trace_vprintk(bool btrace, const char *fmt, va_list args)
 {
-       va_list ap;
-       struct timespec ts_now;
-       size_t bufsz = 160;     /* 2x terminal width */
-       size_t len = 0;
-       char *buf = kmalloc(bufsz, 0);
+       struct print_buf {
+               char *ptr;
+               char *top;
+       };
+
+       void emit_print_buf_str(struct print_buf *pb, const char *str, ssize_t size)
+       {
+               if (size < 0) {
+                       for (; *str && (pb->ptr < pb->top); str++)
+                               *(pb->ptr++) = *str;
+               } else {
+                       for (; (size > 0) && (pb->ptr < pb->top); str++, size--)
+                               *(pb->ptr++) = *str;
+               }
+       }
 
-       if (!buf)
+       void bt_print(void *opaque, const char *str)
+       {
+               struct print_buf *pb = (struct print_buf *) opaque;
+
+               emit_print_buf_str(pb, "\t", 1);
+               emit_print_buf_str(pb, str, -1);
+       }
+
+       static const size_t bufsz = TRACE_PRINTK_BUFFER_SIZE;
+       static const size_t usr_bufsz = (3 * bufsz) / 8;
+       static const size_t kp_bufsz = bufsz - usr_bufsz;
+       struct trace_printk_buffer *tpb = kprof_get_printk_buffer();
+       struct timespec ts_now = { 0, 0 };
+       struct print_buf pb;
+       char *usrbuf = tpb->buffer, *kpbuf = tpb->buffer + usr_bufsz;
+       const char *utop, *uptr;
+       char hdr[64];
+
+       if (tpb->in_use)
                return;
-       tsc2timespec(read_tsc(), &ts_now);
-       len += snprintf(buf + len, bufsz - len, "[%7d.%09d] /* ", ts_now.tv_sec,
-                       ts_now.tv_nsec);
-       va_start(ap, fmt);
-       len += vsnprintf(buf + len, bufsz - len, fmt, ap);
-       va_start(ap, fmt);
-       va_end(ap);
-       len += snprintf(buf + len, bufsz - len, " */\n");
-       va_start(ap, fmt);
+       tpb->in_use++;
+       if (likely(!booting))
+               tsc2timespec(read_tsc(), &ts_now);
+       snprintf(hdr, sizeof(hdr), "[%lu.%09lu]:cpu%d: ", ts_now.tv_sec,
+                        ts_now.tv_nsec, core_id());
+
+       pb.ptr = usrbuf + MIN(vsnprintf(usrbuf, usr_bufsz, fmt, args), usr_bufsz);
+       pb.top = usrbuf + usr_bufsz;
+
+       if (pb.ptr[-1] != '\n')
+               emit_print_buf_str(&pb, "\n", 1);
+       if (btrace) {
+               emit_print_buf_str(&pb, "\tBacktrace:\n", -1);
+               gen_backtrace(bt_print, &pb);
+       }
        /* snprintf null terminates the buffer, and does not count that as part of
-        * the len.  if we maxed out the buffer, let's make sure it has a \n */
-       if (len == bufsz - 1) {
-               assert(buf[bufsz - 1] == '\0');
-               buf[bufsz - 2] = '\n';
+        * the len.  If we maxed out the buffer, let's make sure it has a \n.
+        */
+       if (pb.ptr == pb.top)
+               pb.ptr[-1] = '\n';
+       utop = pb.ptr;
+
+       pb.ptr = kpbuf;
+       pb.top = kpbuf + kp_bufsz;
+       for (uptr = usrbuf; uptr < utop;) {
+               const char *nlptr = memchr(uptr, '\n', utop - uptr);
+
+               if (nlptr == NULL)
+                       nlptr = utop;
+               emit_print_buf_str(&pb, hdr, -1);
+               emit_print_buf_str(&pb, uptr, (nlptr - uptr) + 1);
+               uptr = nlptr + 1;
        }
-       kprof_write_sysrecord(buf, len);
-       kfree(buf);
+       kprof_tracedata_write(kpbuf, pb.ptr - kpbuf);
+       tpb->in_use--;
+}
+
+void trace_printk(bool btrace, const char *fmt, ...)
+{
+       va_list args;
+
+       va_start(args, fmt);
+       trace_vprintk(btrace, fmt, args);
+       va_end(args);
 }
 
 struct dev kprofdevtab __devtab = {
        .name = "kprof",
 
        .reset = devreset,
-       .init = kprofinit,
-       .shutdown = kprofshutdown,
-       .attach = kprofattach,
-       .walk = kprofwalk,
-       .stat = kprofstat,
-       .open = kprofopen,
+       .init = kprof_init,
+       .shutdown = kprof_shutdown,
+       .attach = kprof_attach,
+       .walk = kprof_walk,
+       .stat = kprof_stat,
+       .open = kprof_open,
        .create = devcreate,
-       .close = kprofclose,
-       .read = kprofread,
+       .close = kprof_close,
+       .read = kprof_read,
        .bread = devbread,
-       .write = kprofwrite,
+       .write = kprof_write,
        .bwrite = devbwrite,
        .remove = devremove,
        .wstat = devwstat,
index 0d1faa5..6b7afd7 100644
@@ -10,7 +10,7 @@ struct symtab_entry {
        uintptr_t addr;
 };
 
-#define TRACEME() oprofile_add_backtrace(read_pc(), read_bp())
+#define TRACEME() trace_printk(TRUE, "%s(%d)", __FILE__, __LINE__)
 
 void backtrace(void);
 void gen_backtrace_frame(uintptr_t eip, uintptr_t ebp,
@@ -19,6 +19,8 @@ void gen_backtrace(void (*pfunc)(void *, const char *), void *opaque);
 void backtrace_frame(uintptr_t pc, uintptr_t fp);
 size_t backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
                       size_t nr_slots);
+size_t user_backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
+                                                  size_t nr_slots);
 void backtrace_kframe(struct hw_trapframe *hw_tf);
 /* for includes */ struct proc;
 void backtrace_user_ctx(struct proc *p, struct user_context *ctx);
@@ -44,8 +46,16 @@ int printdump(char *buf, int buflen, uint8_t *data);
 
 extern bool printx_on;
 void set_printx(int mode);
-#define printx(args...) if (printx_on) printk(args)
-#define trace_printx(args...) if (printx_on) trace_printk(args)
+#define printx(args...)                                                        \
+       do {                                                                                    \
+               if (printx_on)                                                          \
+                       printk(args);                                                   \
+       } while (0)
+#define trace_printx(args...)                                          \
+       do {                                                                                    \
+               if (printx_on)                                                          \
+                       trace_printk(TRUE, args);                               \
+       } while (0)
 
 void debug_addr_proc(struct proc *p, unsigned long addr);
 void debug_addr_pid(int pid, unsigned long addr);
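
The do { } while (0) wrappers make printx() and trace_printx() behave as
single statements.  A short sketch (report_core() is hypothetical) of the
dangling-else case the old one-line form mis-parsed:

/* With the old "#define printx(args...) if (printx_on) printk(args)", the
 * else below would have paired with the macro's hidden if, so the second
 * message could never print for a valid core id. */
static void report_core(int pcoreid)
{
	if (pcoreid >= num_cores)
		printx("bad coreid %d\n", pcoreid);
	else
		printx("coreid %d ok\n", pcoreid);
}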
diff --git a/kern/include/kprof.h b/kern/include/kprof.h
new file mode 100644
index 0000000..3879486
--- /dev/null
@@ -0,0 +1,15 @@
+/* Copyright (c) 2015 Google Inc
+ * Davide Libenzi <dlibenzi@google.com>
+ * See LICENSE for details.
+ */
+
+#pragma once
+
+#include <stdio.h>
+#include <stdarg.h>
+
+size_t kprof_tracedata_size(void);
+size_t kprof_tracedata_read(void *data, size_t size, size_t offset);
+void kprof_tracedata_write(const char *pretty_buf, size_t len);
+void trace_vprintk(bool btrace, const char *fmt, va_list args);
+void trace_printk(bool btrace, const char *fmt, ...);
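
trace_printk() stamps each line with a timestamp and core id and appends it to
the trace buffer behind /prof/kptrace; with btrace set it also emits a kernel
backtrace, which is what the new TRACEME() macro uses.  A small usage sketch
(note_checkpoint() is hypothetical):

static void note_checkpoint(int step)
{
	/* Lands in the ktrace buffer as "[sec.nsec]:cpuN: ..." lines, plus a
	 * backtrace; read it back with: cat /prof/kptrace */
	trace_printk(TRUE, "reached step %d on core %d", step, core_id());
}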
index 4d13ebc..862c87f 100644
@@ -1,18 +1,30 @@
+/* Copyright (c) 2015 Google Inc
+ * Davide Libenzi <dlibenzi@google.com>
+ * See LICENSE for details.
+ */
 
-#ifndef ROS_KERN_INC_PROFILER_H
-#define ROS_KERN_INC_PROFILER_H
+#pragma once
 
-#include <sys/types.h>
-#include <trap.h>
+#include <stdio.h>
+#include <ros/profiler_records.h>
 
-int profiler_init(void);
+struct hw_trapframe;
+struct proc;
+struct file;
+struct cmdbuf;
+
+int profiler_configure(struct cmdbuf *cb);
+const char * const *profiler_configure_cmds(void);
+void profiler_init(void);
+void profiler_setup(void);
 void profiler_cleanup(void);
-void profiler_add_backtrace(uintptr_t pc, uintptr_t fp);
-void profiler_add_userpc(uintptr_t pc);
-void profiler_add_trace(uintptr_t eip);
+void profiler_add_kernel_backtrace(uintptr_t pc, uintptr_t fp);
+void profiler_add_user_backtrace(uintptr_t pc, uintptr_t fp);
+void profiler_add_trace(uintptr_t pc);
 void profiler_control_trace(int onoff);
 void profiler_add_hw_sample(struct hw_trapframe *hw_tf);
-int profiler_read(void *va, int);
 int profiler_size(void);
-
-#endif /* ROS_KERN_INC_PROFILER_H */
+int profiler_read(void *va, int n);
+void profiler_notify_mmap(struct proc *p, uintptr_t addr, size_t size, int prot,
+                                                 int flags, struct file *f, size_t offset);
+void profiler_notify_new_process(struct proc *p);
diff --git a/kern/include/ros/profiler_records.h b/kern/include/ros/profiler_records.h
new file mode 100644
index 0000000..73309a6
--- /dev/null
@@ -0,0 +1,46 @@
+/* Copyright (c) 2015 Google Inc
+ * Davide Libenzi <dlibenzi@google.com>
+ * See LICENSE for details.
+ */
+
+#pragma once
+
+#include <sys/types.h>
+
+#define PROFTYPE_KERN_TRACE64  1
+
+struct proftype_kern_trace64 {
+       uint64_t tstamp;
+       uint16_t cpu;
+       uint16_t num_traces;
+       uint64_t trace[0];
+} __attribute__((packed));
+
+#define PROFTYPE_USER_TRACE64  2
+
+struct proftype_user_trace64 {
+       uint64_t tstamp;
+       uint32_t pid;
+       uint16_t cpu;
+       uint16_t num_traces;
+       uint64_t trace[0];
+} __attribute__((packed));
+
+#define PROFTYPE_PID_MMAP64            3
+
+struct proftype_pid_mmap64 {
+       uint64_t tstamp;
+       uint64_t addr;
+       uint64_t size;
+       uint64_t offset;
+       uint32_t pid;
+       uint8_t path[0];
+} __attribute__((packed));
+
+#define PROFTYPE_NEW_PROCESS   4
+
+struct proftype_new_process {
+       uint64_t tstamp;
+       uint32_t pid;
+       uint8_t path[0];
+} __attribute__((packed));
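
Each record in the kpdata stream is enveloped as two variable-byte integers,
the record type and then the payload size, followed by the packed payload (see
vb_encode_uint64() in kern/src/profiler.c below).  A hedged host-side sketch of
walking the stream:

#include <stdint.h>
#include <stdio.h>

/* Decode one variable-byte integer: 7 bits per byte, bit 7 set on every
 * byte except the last (the mirror of vb_encode_uint64()). */
static const uint8_t *vb_decode_uint64(const uint8_t *data,
				       const uint8_t *end, uint64_t *pval)
{
	uint64_t val = 0;
	int shift = 0;

	while (data < end) {
		uint8_t byte = *data++;

		val |= (uint64_t) (byte & 0x7f) << shift;
		shift += 7;
		if (!(byte & 0x80)) {
			*pval = val;
			return data;
		}
	}
	return NULL;	/* truncated stream */
}

static void walk_records(const uint8_t *data, size_t size)
{
	const uint8_t *end = data + size;

	while (data && data < end) {
		uint64_t type, rsize;

		data = vb_decode_uint64(data, end, &type);
		if (!data)
			break;
		data = vb_decode_uint64(data, end, &rsize);
		if (!data || rsize > (uint64_t) (end - data))
			break;
		printf("record type %llu, %llu payload bytes\n",
		       (unsigned long long) type, (unsigned long long) rsize);
		/* The payload parses as the matching struct above, e.g.
		 * struct proftype_kern_trace64 for PROFTYPE_KERN_TRACE64. */
		data += rsize;
	}
}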
index aa880e6..1bb7ed9 100644
@@ -54,7 +54,7 @@ void printipmask(void (*putch)(int, void**), void **putdat, uint8_t *ip);
 void printipv4(void (*putch)(int, void**), void **putdat, uint8_t *ip);
 
 /* #K */
-void trace_printk(const char *fmt, ...);
+void trace_printk(bool btrace, const char *fmt, ...);
 
 /* vsprintf.c (linux) */
 int vsscanf(const char *buf, const char *fmt, va_list args);
index 25b556f..e280389 100644
@@ -24,7 +24,7 @@ void *memset(void* p, int what, size_t sz);
 int   memcmp(const void* s1, const void* s2, size_t sz);
 void *memcpy(void* dst, const void* src, size_t sz);
 void *memmove(void *dst, const void* src, size_t sz);
-void *memchr(void* mem, int chr, int len);
+void *memchr(const void *mem, int chr, int len);
 
 void *memfind(const void *s, int c, size_t len);
 
index 022f5ad..c1ff7c0 100644
@@ -24,6 +24,7 @@
 #include <kmalloc.h>
 #include <vfs.h>
 #include <smp.h>
+#include <profiler.h>
 
 struct kmem_cache *vmr_kcache;
 
@@ -692,6 +693,9 @@ void *do_mmap(struct proc *p, uintptr_t addr, size_t len, int prot, int flags,
                }
        }
        spin_unlock(&p->vmr_lock);
+
+       profiler_notify_mmap(p, addr, len, prot, flags, file, offset);
+
        return (void*)addr;
 }
 
index 1f54bb1..948b8d6 100644
@@ -330,6 +330,8 @@ error_t proc_alloc(struct proc **pp, struct proc *parent, int flags)
                kmem_cache_free(proc_cache, p);
                return -ENOFREEPID;
        }
+       if (parent && parent->binary_path)
+               kstrdup(&p->binary_path, parent->binary_path);
        /* Set the basic status variables. */
        spinlock_init(&p->proc_lock);
        p->exitcode = 1337;     /* so we can see processes killed by the kernel */
index 772390f..2cc8b85 100644
@@ -1,36 +1,46 @@
+/* Copyright (c) 2015 Google Inc
+ * Davide Libenzi <dlibenzi@google.com>
+ * See LICENSE for details.
+ */
 
 #include <ros/common.h>
+#include <ros/mman.h>
+#include <sys/types.h>
 #include <smp.h>
 #include <trap.h>
 #include <kthread.h>
+#include <env.h>
+#include <process.h>
+#include <mm.h>
+#include <vfs.h>
 #include <kmalloc.h>
+#include <pmap.h>
+#include <kref.h>
 #include <atomic.h>
-#include <sys/types.h>
+#include <umem.h>
+#include <elf.h>
+#include <ns.h>
+#include <err.h>
+#include <string.h>
 #include "profiler.h"
 
-struct op_sample {
-       uint64_t hdr;
-       uint64_t event;
-       uint64_t data[0];
-};
+#define PROFILER_MAX_PRG_PATH  256
+#define PROFILER_BT_DEPTH 16
 
-struct op_entry {
-       struct op_sample *sample;
-       size_t size;
-       uint64_t *data;
-};
+#define VBE_MAX_SIZE(t) ((8 * sizeof(t) + 6) / 7)
 
 struct profiler_cpu_context {
-       spinlock_t lock;
-       int tracing;
        struct block *block;
+       int cpu;
+       int tracing;
+       size_t dropped_data_size;
 };
 
-static int profiler_queue_limit = 1024;
+static int profiler_queue_limit = 64 * 1024 * 1024;
 static size_t profiler_cpu_buffer_size = 65536;
-static size_t profiler_backtrace_depth = 16;
-static struct semaphore mtx = SEMAPHORE_INITIALIZER(mtx, 1);
-static int profiler_users = 0;
+static qlock_t profiler_mtx = QLOCK_INITIALIZER(profiler_mtx);
+static int tracing;
+static struct kref profiler_kref;
 static struct profiler_cpu_context *profiler_percpu_ctx;
 static struct queue *profiler_queue;
 
@@ -39,260 +49,467 @@ static inline struct profiler_cpu_context *profiler_get_cpu_ctx(int cpu)
        return profiler_percpu_ctx + cpu;
 }
 
-static inline uint64_t profiler_create_header(int cpu, size_t nbt)
+static inline char *vb_encode_uint64(char *data, uint64_t n)
+{
+       /* Classical variable-byte encoding: 7 bits at a time, with bit 7 of
+        * each byte set while more bytes follow and clear on the last one.
+        */
+       for (; n >= 0x80; n >>= 7)
+               *data++ = (char) (n | 0x80);
+       *data++ = (char) n;
+
+       return data;
+}
+
+static struct block *profiler_buffer_write(struct profiler_cpu_context *cpu_buf,
+                                                                                  struct block *b)
 {
-       return (((uint64_t) 0xee01) << 48) | ((uint64_t) cpu << 16) |
-               (uint64_t) nbt;
+       if (b) {
+               qibwrite(profiler_queue, b);
+
+               if (qlen(profiler_queue) > profiler_queue_limit) {
+                       b = qget(profiler_queue);
+                       if (likely(b)) {
+                               cpu_buf->dropped_data_size += BLEN(b);
+                               freeb(b);
+                       }
+               }
+       }
+
+       return iallocb(profiler_cpu_buffer_size);
 }
 
-static inline size_t profiler_cpu_buffer_add_data(struct op_entry *entry,
-                                                                                                 const uintptr_t *values,
-                                                                                                 size_t count)
+static char *profiler_cpu_buffer_write_reserve(
+       struct profiler_cpu_context *cpu_buf, size_t size, struct block **pb)
 {
-       size_t i;
+       struct block *b = cpu_buf->block;
 
-       if (unlikely(count > entry->size))
-               count = entry->size;
-       for (i = 0; i < count; i++)
-               entry->data[i] = (uint64_t) values[i];
-       entry->size -= count;
-       entry->data += count;
+       if (unlikely((!b) || (b->lim - b->wp) < size)) {
+               cpu_buf->block = b = profiler_buffer_write(cpu_buf, b);
+               if (unlikely(!b))
+                       return NULL;
+       }
+       *pb = b;
 
-       return entry->size;
+       return (char *) b->wp;
 }
 
-static void free_cpu_buffers(void)
+static inline void profiler_cpu_buffer_write_commit(
+       struct profiler_cpu_context *cpu_buf, struct block *b, size_t size)
 {
-       kfree(profiler_percpu_ctx);
-       profiler_percpu_ctx = NULL;
+       b->wp += size;
+}
 
-       qclose(profiler_queue);
-       profiler_queue = NULL;
+static inline size_t profiler_max_envelope_size(void)
+{
+       return 2 * VBE_MAX_SIZE(uint64_t);
 }
 
-static int alloc_cpu_buffers(void)
+static void profiler_push_kernel_trace64(struct profiler_cpu_context *cpu_buf,
+                                                                                const uintptr_t *trace, size_t count)
 {
-       int i;
+       size_t i, size = sizeof(struct proftype_kern_trace64) +
+               count * sizeof(uint64_t);
+       struct block *b;
+       char *resptr = profiler_cpu_buffer_write_reserve(
+               cpu_buf, size + profiler_max_envelope_size(), &b);
+       char *ptr = resptr;
 
-       profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
-       if (!profiler_queue)
-               return -ENOMEM;
+       if (likely(ptr)) {
+               struct proftype_kern_trace64 *record;
 
-       qdropoverflow(profiler_queue, 1);
-       qnonblock(profiler_queue, 1);
+               ptr = vb_encode_uint64(ptr, PROFTYPE_KERN_TRACE64);
+               ptr = vb_encode_uint64(ptr, size);
 
-       profiler_percpu_ctx =
-               kzmalloc(sizeof(*profiler_percpu_ctx) * num_cores, KMALLOC_WAIT);
-       if (!profiler_percpu_ctx)
-               goto fail;
+               record = (struct proftype_kern_trace64 *) ptr;
+               ptr += size;
 
-       for (i = 0; i < num_cores; i++) {
-               struct profiler_cpu_context *b = &profiler_percpu_ctx[i];
+               record->tstamp = nsec();
+               record->cpu = cpu_buf->cpu;
+               record->num_traces = count;
+               for (i = 0; i < count; i++)
+                       record->trace[i] = (uint64_t) trace[i];
+
+               profiler_cpu_buffer_write_commit(cpu_buf, b, ptr - resptr);
+       }
+}
+
+static void profiler_push_user_trace64(struct profiler_cpu_context *cpu_buf,
+                                                                          struct proc *p, const uintptr_t *trace,
+                                                                          size_t count)
+{
+       size_t i, size = sizeof(struct proftype_user_trace64) +
+               count * sizeof(uint64_t);
+       struct block *b;
+       char *resptr = profiler_cpu_buffer_write_reserve(
+               cpu_buf, size + profiler_max_envelope_size(), &b);
+       char *ptr = resptr;
 
-               b->tracing = 0;
-               spinlock_init_irqsave(&b->lock);
+       if (likely(ptr)) {
+               struct proftype_user_trace64 *record;
+
+               ptr = vb_encode_uint64(ptr, PROFTYPE_USER_TRACE64);
+               ptr = vb_encode_uint64(ptr, size);
+
+               record = (struct proftype_user_trace64 *) ptr;
+               ptr += size;
+
+               record->tstamp = nsec();
+               record->pid = p->pid;
+               record->cpu = cpu_buf->cpu;
+               record->num_traces = count;
+               for (i = 0; i < count; i++)
+                       record->trace[i] = (uint64_t) trace[i];
+
+               profiler_cpu_buffer_write_commit(cpu_buf, b, ptr - resptr);
        }
+}
+
+static void profiler_push_pid_mmap(struct proc *p, uintptr_t addr, size_t msize,
+                                                                  size_t offset, const char *path)
+{
+       size_t plen = strlen(path) + 1,
+               size = sizeof(struct proftype_pid_mmap64) + plen;
+       char *resptr = kmalloc(size + profiler_max_envelope_size(), 0);
+
+       if (likely(resptr)) {
+               char *ptr = resptr;
+               struct proftype_pid_mmap64 *record;
+
+               ptr = vb_encode_uint64(ptr, PROFTYPE_PID_MMAP64);
+               ptr = vb_encode_uint64(ptr, size);
 
-       return 0;
+               record = (struct proftype_pid_mmap64 *) ptr;
+               ptr += size;
 
-fail:
-       qclose(profiler_queue);
-       profiler_queue = NULL;
-       return -ENOMEM;
+               record->tstamp = nsec();
+               record->pid = p->pid;
+               record->addr = addr;
+               record->size = msize;
+               record->offset = offset;
+               memcpy(record->path, path, plen);
+
+               qiwrite(profiler_queue, resptr, (int) (ptr - resptr));
+
+               kfree(resptr);
+       }
 }
 
-int profiler_init(void)
+static void profiler_push_new_process(struct proc *p)
 {
-       int error = 0;
+       size_t plen = strlen(p->binary_path) + 1,
+               size = sizeof(struct proftype_new_process) + plen;
+       char *resptr = kmalloc(size + profiler_max_envelope_size(), 0);
 
-       sem_down(&mtx);
-       if (!profiler_queue)
-               error = alloc_cpu_buffers();
-       profiler_users++;
-       sem_up(&mtx);
+       if (likely(resptr)) {
+               char *ptr = resptr;
+               struct proftype_new_process *record;
+
+               ptr = vb_encode_uint64(ptr, PROFTYPE_NEW_PROCESS);
+               ptr = vb_encode_uint64(ptr, size);
 
-       return error;
+               record = (struct proftype_new_process *) ptr;
+               ptr += size;
+
+               record->tstamp = nsec();
+               record->pid = p->pid;
+               memcpy(record->path, p->binary_path, plen);
+
+               qiwrite(profiler_queue, resptr, (int) (ptr - resptr));
+
+               kfree(resptr);
+       }
 }
 
-void profiler_cleanup(void)
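+/* Replay the mmaps of all existing processes into the trace, so that PCs of
+ * processes started before profiling began can still be resolved.
+ */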
+static void profiler_emit_current_system_status(void)
 {
-       sem_down(&mtx);
-       profiler_users--;
-       if (profiler_users == 0)
-               free_cpu_buffers();
-       sem_up(&mtx);
+       void enum_proc(struct vm_region *vmr, void *opaque)
+       {
+               struct proc *p = (struct proc *) opaque;
+
+               profiler_notify_mmap(p, vmr->vm_base, vmr->vm_end - vmr->vm_base,
+                                                        vmr->vm_prot, vmr->vm_flags, vmr->vm_file,
+                                                        vmr->vm_foff);
+       }
+
+       ERRSTACK(1);
+       struct process_set pset;
+
+       proc_get_set(&pset);
+       if (waserror()) {
+               proc_free_set(&pset);
+               nexterror();
+       }
+
+       for (size_t i = 0; i < pset.num_processes; i++)
+               enumerate_vmrs(pset.procs[i], enum_proc, pset.procs[i]);
+
+       poperror();
+       proc_free_set(&pset);
 }
 
-static struct block *profiler_cpu_buffer_write_reserve(
-       struct profiler_cpu_context *cpu_buf, struct op_entry *entry, size_t size)
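+/* tracing is tri-state: 1 = on, 0 = off, -1 = disabled but this core's block
+ * still needs flushing.  The flush has to happen on the owning core, so
+ * profiler_control_trace() only marks the buffer and the flush completes
+ * here, on the next sample taken on this core.
+ */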
+static inline bool profiler_is_tracing(struct profiler_cpu_context *cpu_buf)
 {
-       struct block *b = cpu_buf->block;
-    size_t totalsize = sizeof(struct op_sample) +
-               size * sizeof(entry->sample->data[0]);
-
-       if (unlikely((!b) || (b->lim - b->wp) < totalsize)) {
-               if (b)
-                       qibwrite(profiler_queue, b);
-               /* For now. Later, we will grab a block off the
-                * emptyblock queue.
-                */
-               cpu_buf->block = b = iallocb(profiler_cpu_buffer_size);
-        if (unlikely(!b)) {
-                       printk("%s: fail\n", __func__);
-                       return NULL;
+       if (unlikely(cpu_buf->tracing < 0)) {
+               if (cpu_buf->block) {
+                       qibwrite(profiler_queue, cpu_buf->block);
+
+                       cpu_buf->block = NULL;
                }
+
+               cpu_buf->tracing = 0;
        }
-       entry->sample = (struct op_sample *) b->wp;
-       entry->size = size;
-       entry->data = entry->sample->data;
 
-       b->wp += totalsize;
+       return (cpu_buf->tracing != 0) ? TRUE : FALSE;
+}
 
-       return b;
+static void free_cpu_buffers(void)
+{
+       kfree(profiler_percpu_ctx);
+       profiler_percpu_ctx = NULL;
+
+       if (profiler_queue) {
+               qclose(profiler_queue);
+               profiler_queue = NULL;
+       }
 }
 
-static inline int profiler_add_sample(struct profiler_cpu_context *cpu_buf,
-                                                                         uintptr_t pc, unsigned long event)
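+/* Allocate the shared output queue and the per-CPU contexts.  The queue drops
+ * on overflow and never blocks, so a slow reader loses samples instead of
+ * stalling the producers.
+ */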
+static void alloc_cpu_buffers(void)
 {
        ERRSTACK(1);
-       struct op_entry entry;
-       struct block *b;
+       int i;
 
+       profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
+       if (!profiler_queue)
+               error(ENOMEM, NULL);
        if (waserror()) {
-               poperror();
-               printk("%s: failed\n", __func__);
-               return 1;
+               free_cpu_buffers();
+               nexterror();
        }
 
-       b = profiler_cpu_buffer_write_reserve(cpu_buf, &entry, 0);
-       if (likely(b)) {
-               entry.sample->hdr = profiler_create_header(core_id(), 1);
-               entry.sample->event = (uint64_t) event;
-               profiler_cpu_buffer_add_data(&entry, &pc, 1);
+       qdropoverflow(profiler_queue, TRUE);
+       qnonblock(profiler_queue, TRUE);
+
+       profiler_percpu_ctx =
+               kzmalloc(sizeof(*profiler_percpu_ctx) * num_cores, KMALLOC_WAIT);
+
+       for (i = 0; i < num_cores; i++) {
+               struct profiler_cpu_context *b = &profiler_percpu_ctx[i];
+
+               b->cpu = i;
        }
-       poperror();
+}
+
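+/* Parse a numeric string, scale it by k, and enforce the [minval, maxval]
+ * range.
+ */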
+static long profiler_get_checked_value(const char *value, long k, long minval,
+                                                                          long maxval)
+{
+       long lvalue = strtol(value, NULL, 0) * k;
 
-       return b == NULL;
+       if (lvalue < minval)
+               error(EFAIL, "Value should be at least %ld", minval);
+       if (lvalue > maxval)
+               error(EFAIL, "Value should be at most %ld", maxval);
+
+       return lvalue;
 }
 
-static inline void profiler_begin_trace(struct profiler_cpu_context *cpu_buf)
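+/* Handle the profiler-specific kpctl commands.  Returns 1 if the command was
+ * consumed, 0 if the caller should try its own handlers.
+ */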
+int profiler_configure(struct cmdbuf *cb)
 {
-       cpu_buf->tracing = 1;
+       if (!strcmp(cb->f[0], "prof_qlimit")) {
+               if (cb->nf < 2)
+                       error(EFAIL, "prof_qlimit KB");
+               if (kref_refcnt(&profiler_kref) > 0)
+                       error(EFAIL, "Profiler already running");
+               profiler_queue_limit = (int) profiler_get_checked_value(
+                       cb->f[1], 1024, 1024 * 1024, max_pmem / 32);
+       } else if (!strcmp(cb->f[0], "prof_cpubufsz")) {
+               if (cb->nf < 2)
+                       error(EFAIL, "prof_cpubufsz KB");
+               profiler_cpu_buffer_size = (size_t) profiler_get_checked_value(
+                       cb->f[1], 1024, 16 * 1024, 1024 * 1024);
+       } else {
+               return 0;
+       }
+
+       return 1;
 }
 
-static inline void profiler_end_trace(struct profiler_cpu_context *cpu_buf)
+const char * const *profiler_configure_cmds(void)
 {
-       cpu_buf->tracing = 0;
+       static const char * const cmds[] = {
+               "prof_qlimit", "prof_cpubufsz",
+               NULL
+       };
+
+       return cmds;
 }
 
-static void profiler_cpubuf_flushone(int core, int newbuf)
+static void profiler_release(struct kref *kref)
 {
-       struct profiler_cpu_context *cpu_buf = profiler_get_cpu_ctx(core);
-
-       spin_lock_irqsave(&cpu_buf->lock);
-       if (cpu_buf->block) {
-               printk("Core %d has data\n", core);
-               qibwrite(profiler_queue, cpu_buf->block);
-               printk("After qibwrite in %s, profiler_queue len %d\n",
-                          __func__, qlen(profiler_queue));
-       }
-       if (newbuf)
-               cpu_buf->block = iallocb(profiler_cpu_buffer_size);
+       bool got_reference = FALSE;
+
+       assert(kref == &profiler_kref);
+       qlock(&profiler_mtx);
+       /* Make sure we did not race with a profiler_setup() that grabbed the
+        * profiler_mtx just before us and re-initialized the profiler for a
+        * new user.
+        * If we race here with another profiler_release() (the user did a
+        * profiler_setup() immediately followed by a profiler_cleanup()), we
+        * are fine, because free_cpu_buffers() can be called multiple times.
+        */
+       if (!kref_get_not_zero(kref, 1))
+               free_cpu_buffers();
        else
-               cpu_buf->block = NULL;
-       spin_unlock_irqsave(&cpu_buf->lock);
+               got_reference = TRUE;
+       qunlock(&profiler_mtx);
+       /* We cannot call kref_put() while holding the profiler_mtx, as such a
+        * call might trigger another call to profiler_release().
+        */
+       if (got_reference)
+               kref_put(kref);
+}
+
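+/* The profiler lifetime is tracked by profiler_kref: profiler_setup() takes a
+ * reference, profiler_cleanup() drops it, and profiler_release() frees the
+ * buffers once the count reaches zero.
+ */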
+void profiler_init(void)
+{
+       assert(kref_refcnt(&profiler_kref) == 0);
+       kref_init(&profiler_kref, profiler_release, 0);
+}
+
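+/* Prepare the profiler for a new user: allocate buffers on first use and
+ * snapshot the mmaps of the already-running processes.
+ */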
+void profiler_setup(void)
+{
+       ERRSTACK(1);
+
+       qlock(&profiler_mtx);
+       if (waserror()) {
+               qunlock(&profiler_mtx);
+               nexterror();
+       }
+       if (!profiler_queue)
+               alloc_cpu_buffers();
+
+       profiler_emit_current_system_status();
+
+       /* Do this only when everything is initialized (as the last init
+        * operation).
+        */
+       __kref_get(&profiler_kref, 1);
+
+       poperror();
+       qunlock(&profiler_mtx);
+}
+
+void profiler_cleanup(void)
+{
+       kref_put(&profiler_kref);
 }
 
 void profiler_control_trace(int onoff)
 {
        int core;
 
+       tracing = onoff;
        for (core = 0; core < num_cores; core++) {
                struct profiler_cpu_context *cpu_buf = profiler_get_cpu_ctx(core);
 
-               cpu_buf->tracing = onoff;
-               if (onoff) {
+               /*
+                * We cannot directly access other CPUs' buffers from here in
+                * order to issue a flush.  So, when disabling, we set tracing
+                * to -1 and let profiler_is_tracing() perform the flush at the
+                * next timer tick.
+                */
+               cpu_buf->tracing = onoff ? 1 : -1;
+               if (onoff)
                        printk("Enable tracing on %d\n", core);
-               } else {
+               else
                        printk("Disable tracing on %d\n", core);
-                       profiler_cpubuf_flushone(core, 0);
-               }
        }
 }
 
 void profiler_add_trace(uintptr_t pc)
 {
-       struct profiler_cpu_context *cpu_buf = profiler_get_cpu_ctx(core_id());
-
-       if (profiler_percpu_ctx && cpu_buf->tracing)
-               profiler_add_sample(cpu_buf, pc, nsec());
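+       /* Attribute the PC to user or kernel space based on its address. */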
+       if (is_user_raddr((void *) pc, 1))
+               profiler_add_user_backtrace(pc, 0);
+       else
+               profiler_add_kernel_backtrace(pc, 0);
 }
 
-/* Format for samples:
- * first word:
- * high 8 bits is ee, which is an invalid address on amd64.
- * next 8 bits is protocol version
- * next 16 bits is unused, MBZ. Later, we can make it a packet type.
- * next 16 bits is core id
- * next 8 bits is unused
- * next 8 bits is # PCs following. This should be at least 1, for one EIP.
- *
- * second word is time in ns.
- *
- * Third and following words are PCs, there must be at least one of them.
- */
-void profiler_add_backtrace(uintptr_t pc, uintptr_t fp)
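+/* Record a kernel backtrace rooted at pc/fp.  The kref get/put pair keeps the
+ * profiler from being torn down while a sample is in flight.
+ */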
+void profiler_add_kernel_backtrace(uintptr_t pc, uintptr_t fp)
 {
-       int cpu = core_id();
-       struct profiler_cpu_context *cpu_buf = profiler_get_cpu_ctx(cpu);
-
-       if (profiler_percpu_ctx && cpu_buf->tracing) {
-               struct op_entry entry;
-               struct block *b;
-               uintptr_t bt_pcs[profiler_backtrace_depth];
-               size_t n = backtrace_list(pc, fp, bt_pcs, profiler_backtrace_depth);
-
-               b = profiler_cpu_buffer_write_reserve(cpu_buf, &entry, n);
-               if (likely(b)) {
-                       entry.sample->hdr = profiler_create_header(cpu, n);
-                       entry.sample->event = nsec();
-                       profiler_cpu_buffer_add_data(&entry, bt_pcs, n);
+       if (kref_get_not_zero(&profiler_kref, 1)) {
+               struct profiler_cpu_context *cpu_buf = profiler_get_cpu_ctx(core_id());
+
+               if (profiler_percpu_ctx && profiler_is_tracing(cpu_buf)) {
+                       uintptr_t trace[PROFILER_BT_DEPTH];
+                       size_t n = 1;
+
+                       trace[0] = pc;
+                       if (likely(fp))
+                               n = backtrace_list(pc, fp, trace + 1,
+                                                                  PROFILER_BT_DEPTH - 1) + 1;
+
+                       profiler_push_kernel_trace64(cpu_buf, trace, n);
                }
+               kref_put(&profiler_kref);
        }
 }
 
-void profiler_add_userpc(uintptr_t pc)
+void profiler_add_user_backtrace(uintptr_t pc, uintptr_t fp)
 {
-       int cpu = core_id();
-       struct profiler_cpu_context *cpu_buf = profiler_get_cpu_ctx(cpu);
-
-       if (profiler_percpu_ctx && cpu_buf->tracing) {
-               struct op_entry entry;
-               struct block *b = profiler_cpu_buffer_write_reserve(cpu_buf,
-                                                                                                                       &entry, 1);
-
-               if (likely(b)) {
-                       entry.sample->hdr = profiler_create_header(cpu, 1);
-                       entry.sample->event = nsec();
-                       profiler_cpu_buffer_add_data(&entry, &pc, 1);
+       if (kref_get_not_zero(&profiler_kref, 1)) {
+               struct proc *p = current;
+               struct profiler_cpu_context *cpu_buf = profiler_get_cpu_ctx(core_id());
+
+               if (p && profiler_percpu_ctx && profiler_is_tracing(cpu_buf)) {
+                       uintptr_t trace[PROFILER_BT_DEPTH];
+                       size_t n = 1;
+
+                       trace[0] = pc;
+                       if (likely(fp))
+                               n = user_backtrace_list(pc, fp, trace + 1,
+                                                                               PROFILER_BT_DEPTH - 1) + 1;
+
+                       profiler_push_user_trace64(cpu_buf, p, trace, n);
                }
+               kref_put(&profiler_kref);
        }
 }
 
 void profiler_add_hw_sample(struct hw_trapframe *hw_tf)
 {
        if (in_kernel(hw_tf))
-               profiler_add_backtrace(get_hwtf_pc(hw_tf), get_hwtf_fp(hw_tf));
+               profiler_add_kernel_backtrace(get_hwtf_pc(hw_tf), get_hwtf_fp(hw_tf));
        else
-               profiler_add_userpc(get_hwtf_pc(hw_tf));
+               profiler_add_user_backtrace(get_hwtf_pc(hw_tf), get_hwtf_fp(hw_tf));
 }
 
 int profiler_size(void)
 {
-       return qlen(profiler_queue);
+       return profiler_queue ? qlen(profiler_queue) : 0;
 }
 
 int profiler_read(void *va, int n)
 {
-       return qread(profiler_queue, va, n);
+       return profiler_queue ? qread(profiler_queue, va, n) : 0;
+}
+
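+/* Called when a process maps a file.  Only executable mappings are recorded,
+ * since those are what the post-processor needs to resolve user PCs.
+ */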
+void profiler_notify_mmap(struct proc *p, uintptr_t addr, size_t size, int prot,
+                                                 int flags, struct file *f, size_t offset)
+{
+       if (kref_get_not_zero(&profiler_kref, 1)) {
+               if (f && (prot & PROT_EXEC) && profiler_percpu_ctx && tracing) {
+                       char path_buf[PROFILER_MAX_PRG_PATH];
+                       char *path = file_abs_path(f, path_buf, sizeof(path_buf));
+
+                       if (likely(path))
+                               profiler_push_pid_mmap(p, addr, size, offset, path);
+               }
+               kref_put(&profiler_kref);
+       }
+}
+
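+/* Called when a new process is created, so the trace can associate the new
+ * PID with its binary.
+ */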
+void profiler_notify_new_process(struct proc *p)
+{
+       if (kref_get_not_zero(&profiler_kref, 1)) {
+               if (profiler_percpu_ctx && tracing && p->binary_path)
+                       profiler_push_new_process(p);
+               kref_put(&profiler_kref);
+       }
 }
index 9d9bf6d..483417e 100644 (file)
@@ -141,13 +141,13 @@ strrchr(const char *s, char c)
        return lastc;
 }
 
-void *
-memchr(void* mem, int chr, int len)
+void *memchr(const void *mem, int chr, int len)
 {
-       char* s = (char*)mem;
-       for(int i = 0; i < len; i++)
-               if(s[i] == (char)chr)
-                       return s+i;
+       char *s = (char *) mem;
+
+       for (int i = 0; i < len; i++)
+               if (s[i] == (char) chr)
+                       return s + i;
        return NULL;
 }
 
index 2f97695..31d6f21 100644 (file)
@@ -21,6 +21,7 @@
 #include <trap.h>
 #include <syscall.h>
 #include <kmalloc.h>
+#include <profiler.h>
 #include <stdio.h>
 #include <frontend.h>
 #include <colored_caches.h>
@@ -31,6 +32,7 @@
 #include <smp.h>
 #include <arsc_server.h>
 #include <event.h>
+#include <kprof.h>
 #include <termios.h>
 #include <manager.h>
 
@@ -41,9 +43,6 @@ uint32_t systrace_bufidx = 0;
 size_t systrace_bufsize = 0;
 spinlock_t systrace_lock = SPINLOCK_INITIALIZER_IRQSAVE;
 
-// for now, only want this visible here.
-void kprof_write_sysrecord(char *pretty_buf, size_t len);
-
 static bool __trace_this_proc(struct proc *p)
 {
        return (systrace_flags & SYSTRACE_ON) &&
@@ -139,7 +138,7 @@ static void systrace_finish_trace(struct kthread *kthread, long retval)
                trace->retval = retval;
                kthread->trace = 0;
                pretty_len = systrace_fill_pretty_buf(trace);
-               kprof_write_sysrecord(trace->pretty_buf, pretty_len);
+               kprof_tracedata_write(trace->pretty_buf, pretty_len);
                if (systrace_flags & SYSTRACE_LOUD)
                        printk("EXIT %s", trace->pretty_buf);
                kfree(trace);
@@ -572,6 +571,7 @@ static int sys_proc_create(struct proc *p, char *path, size_t path_l,
        user_memdup_free(p, kargenv);
        __proc_ready(new_p);
        pid = new_p->pid;
+       profiler_notify_new_process(new_p);
        proc_decref(new_p);     /* give up the reference created in proc_create() */
        return pid;
 error_load_elf:
@@ -728,6 +728,7 @@ static ssize_t sys_fork(env_t* e)
 
        printd("[PID %d] fork PID %d\n", e->pid, env->pid);
        ret = env->pid;
+       profiler_notify_new_process(env);
        proc_decref(env);       /* give up the reference created in proc_alloc() */
        return ret;
 }
@@ -743,7 +744,7 @@ static int sys_exec(struct proc *p, char *path, size_t path_l,
                     char *argenv, size_t argenv_l)
 {
        int ret = -1;
-       char *t_path;
+       char *t_path = NULL;
        struct file *program;
        struct per_cpu_info *pcpui = &per_cpu_info[core_id()];
        int8_t state = 0;
@@ -760,10 +761,6 @@ static int sys_exec(struct proc *p, char *path, size_t path_l,
                set_errno(EINVAL);
                return -1;
        }
-       t_path = copy_in_path(p, path, path_l);
-       if (!t_path)
-               return -1;
-       proc_replace_binary_path(p, t_path);
 
        disable_irqsave(&state);        /* protect cur_ctx */
        /* Can't exec if we don't have a current_ctx to restart (if we fail).  This
@@ -805,10 +802,14 @@ static int sys_exec(struct proc *p, char *path, size_t path_l,
                set_error(EINVAL, "Failed to unpack the args");
                return -1;
        }
-
+       t_path = copy_in_path(p, path, path_l);
+       if (!t_path) {
+               user_memdup_free(p, kargenv);
+               return -1;
+       }
        /* This could block: */
        /* TODO: 9ns support */
-       program = do_file_open(p->binary_path, O_READ, 0);
+       program = do_file_open(t_path, O_READ, 0);
        if (!program)
                goto early_error;
        if (!is_valid_elf(program)) {
@@ -817,6 +818,7 @@ static int sys_exec(struct proc *p, char *path, size_t path_l,
        }
        /* This is the point of no return for the process. */
        /* progname is argv0, which accounts for symlinks */
+       proc_replace_binary_path(p, t_path);
        proc_set_progname(p, argc ? argv[0] : NULL);
        proc_init_procdata(p);
        p->procinfo->heap_bottom = 0;
@@ -847,6 +849,7 @@ mid_error:
         * error value (errno is already set). */
        kref_put(&program->f_kref);
 early_error:
+       free_path(p, t_path);
        finish_current_sysc(-1);
        systrace_finish_trace(pcpui->cur_kthread, -1);
 success: