Barret Rhoden [Mon, 28 Mar 2016 16:45:38 +0000 (12:45 -0400)]
Stop calling qremove() outside qio.c
That function was meant to be called with the queue locked. #mnt doesn't
lock it - nor does it or any other file have a mechanism to do so. So it's
always a bad idea to call qremove. It looks like #mnt wants qget().
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 25 Mar 2016 22:11:36 +0000 (18:11 -0400)]
qio: Consolidate producer functions
There are a bunch of qio functions for adding to a queue:
- qbwrite (append a single block)
- qibwrite (append a single block from IRQ ctx)
- qwrite (wrapper that calls qbwrite)
- qiwrite (mostly the same wrapper, calls qibwrite)
- qpass (append a string of blocks)
- qpassnolim (same, but with no limit)
Anyway, all of these functions do very similar things, but with a few
options. Now all of those functions call the same underlying function
(with the same front-wrapper for qwrite/qiwrite), subject to a few flags.
There are some subtle changes. qpass didn't call kick or bypass before.
Although I could control that with a QIO flag, it seems like if someone
wanted a bypass, then they should always get it.
Of course, kick and bypass seem rather special purpose, and might just need
to be overhauled at some point.
Part of the motivation for this, other than understandability and ease of
maintenance, was that I'll be adding more QIO functions to control whether
or not we block on an individual call (think chan->flag & O_NONBLOCK).
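As a rough illustration of the consolidation described above, here is a toy sketch of one underlying producer with thin wrappers that differ only in flags. Everything named toy_* or TOY_* is made up for the example, the flag assignment per wrapper is an assumption, and the real qio code tracks blocks rather than byte counts.

```c
#include <stddef.h>

/* Hypothetical flags; the real QIO flag names are not given in this log. */
#define TOY_QIO_LIMIT		(1 << 0)	/* respect the queue limit */
#define TOY_QIO_CAN_BLOCK	(1 << 1)	/* caller may sleep */

struct toy_queue {
	size_t len;	/* bytes currently queued */
	size_t limit;	/* flow-control limit */
	int kicks;	/* stand-in for calling the queue's kick method */
};

/* The single underlying producer: every variant funnels through here. */
long toy_qwrite_common(struct toy_queue *q, size_t nbytes, int flags)
{
	if ((flags & TOY_QIO_LIMIT) && q->len + nbytes > q->limit) {
		/* A real queue would sleep here when TOY_QIO_CAN_BLOCK is
		 * set; this sketch just reports failure either way. */
		return -1;
	}
	q->len += nbytes;
	/* Kick on every path, so no producer variant silently skips it. */
	q->kicks++;
	return (long)nbytes;
}

/* Thin wrappers that differ only in the flags they pass. */
long toy_write(struct toy_queue *q, size_t n)
{
	return toy_qwrite_common(q, n, TOY_QIO_LIMIT | TOY_QIO_CAN_BLOCK);
}

long toy_write_irq(struct toy_queue *q, size_t n)
{
	return toy_qwrite_common(q, n, TOY_QIO_LIMIT);
}

long toy_pass_nolim(struct toy_queue *q, size_t n)
{
	return toy_qwrite_common(q, n, 0);
}
```

A per-call "don't block" option then becomes one more flag on the common function instead of another copy of the producer.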
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 25 Mar 2016 21:42:22 +0000 (17:42 -0400)]
Stop setting a kick for TCP's RQ
We never call it, since the RQ is written to by qpassnolim(), which doesn't
call kick internally. I verified (out-of-tree) with a fake kick method and
netperf that the kick wasn't called, at least in normal operation.
It caused a problem when I tried to clean up qio.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 25 Mar 2016 20:18:54 +0000 (16:18 -0400)]
qio: remove qproduce()
If we ever need it, we can resurrect it.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 25 Mar 2016 15:26:25 +0000 (11:26 -0400)]
qio: Clean up locking
Locking was a bit of a mess, including qlocks and spinlocks.
For the most part, the ordering was qlock->spinlock, but most everything
was protected by the spinlock and the qlock didn't do much.
There are a few issues:
The rlock/wlock qlocks seem mostly to serialize rendez sleepers. Plan 9
needed this; we do not. There might be a 'thundering herd' wakeup effect
if you have a large number of sleepers (wake up just to see the condition,
vs sleeping on the qlock that doesn't get released until at least one
thread woke up). If that's an issue, we can fix it later; maybe in rendez
or at least closer to the rendez. The qlocks might also be 'protecting'
the spinlock, but I don't see much value in that.
The mess with 'should_free' can be ripped out too - we just free the block
in a couple cases, and now we're explicit about it. That was nasty.
qwait() was doing weird things with locks too. It internally unlocks for
you. Surprise! Instead I'll just have it lock for you when it returns -
no mystery.
The rlock might have been protecting things related to bl2mem, where we
briefly let go of the spinlock. The qlock might have been preserving some
invariant (who knows?!). I decided to just hold the spinlock across
bl2mem. I avoided doing that in the past since that function could PF.
But we're busted regardless of whether we had a qlock or a spinlock; either
would deadlock if the PF handler tries to use the device to resolve the
fault.
Plus we have a host of memcpys and memmoves that touch user memory in qio.
We need to fix PFs overall - this patch should have made things no worse in
that regard (and hopefully much clearer).
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 25 Mar 2016 16:45:25 +0000 (12:45 -0400)]
Make all block allocations use the same func [2/2]
Other than ns.h and allocb.c, this was done with
scripts/spatch/malloc.cocci.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 25 Mar 2016 16:01:15 +0000 (12:01 -0400)]
Make iallocb just an _allocb(x, 0) [1/2]
Part 1 of making block allocation controlled by a flag, like all other
memory allocation.
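A minimal sketch of what "controlled by a flag" could look like, assuming a userspace malloc() in place of the kernel allocator. The toy_* names, TOY_MEM_ATOMIC, and the flag values are invented for the example; only the idea that iallocb is the common call with flags of 0 comes from the subject line.

```c
#include <stddef.h>
#include <stdlib.h>

/* Flag values are made up for this sketch. */
#define TOY_MEM_ATOMIC	0		/* never sleep; may return NULL */
#define TOY_MEM_WAIT	(1 << 0)	/* caller may sleep for memory */

struct toy_block {
	size_t size;
	unsigned char data[];
};

/* One allocator; the flag decides what happens on failure. */
struct toy_block *toy_block_alloc(size_t size, int mem_flags)
{
	struct toy_block *b = malloc(sizeof(*b) + size);

	if (!b) {
		/* A kernel would sleep and retry for the WAIT case; this
		 * sketch has nothing to wait on, so it gives up. */
		if (mem_flags & TOY_MEM_WAIT)
			abort();
		return NULL;
	}
	b->size = size;
	return b;
}

/* The interrupt-safe variant is just the common call with no flags. */
struct toy_block *toy_iallocb(size_t size)
{
	return toy_block_alloc(size, TOY_MEM_ATOMIC);
}
```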
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 25 Mar 2016 16:28:55 +0000 (12:28 -0400)]
Rename KMALLOC_* -> MEM_* [2/2]
The rename was done with this:
@@
@@
-KMALLOC_WAIT
+MEM_WAIT
@@
@@
-KMALLOC_ERROR
+MEM_ERROR
except for uses of the names in comments.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 25 Mar 2016 16:13:31 +0000 (12:13 -0400)]
Rework memory allocation flags [1/2]
We might need something for an atomic allocation. We've just been writing
'0' all over the place, which is a little hard to keep track of.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 25 Mar 2016 14:36:58 +0000 (10:36 -0400)]
Remove the O_NONBLOCK fcntl() intercept (XCC)
We need to change how nonblocking works, and make it a flag on the chan.
It's too much of a pain to have special casing for every device type -
let's just support fcntl and have O_NONBLOCK be one of its uses.
This will temporarily break non-blocking #ip conversations from glibc.
Rebuild glibc.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Thu, 31 Mar 2016 19:10:23 +0000 (15:10 -0400)]
Add a chan_ctl devop; support fcntl on chans
fcntl() works on chans. A device can intercept the operation, do whatever
it needs to do, and optionally error out. I imagine #devmnt will need to
send a new 9p message for this.
If there are no errors, the chan flags get updated (the CEXTERNAL_FLAGS).
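The flow above (device gets first look, may error out, then the flags are updated) can be sketched roughly as follows. The struct layouts, toy_* names, and flag values are assumptions for illustration; only the chan_ctl hook name and the idea of a CEXTERNAL_FLAGS mask come from this message.

```c
#include <stddef.h>

/* Hypothetical values for the sketch. */
#define TOY_O_APPEND		0x0400
#define TOY_O_NONBLOCK		0x0800
#define TOY_CEXTERNAL_FLAGS	(TOY_O_APPEND | TOY_O_NONBLOCK)

struct toy_chan;

struct toy_dev {
	/* Optional hook: return 0 to accept, nonzero to reject. */
	int (*chan_ctl)(struct toy_chan *c, int new_flags);
};

struct toy_chan {
	int flags;
	const struct toy_dev *dev;
};

/* Stand-in for the F_SETFL path of fcntl(). */
int toy_setfl(struct toy_chan *c, int flags)
{
	int ret;

	flags &= TOY_CEXTERNAL_FLAGS;
	if (c->dev->chan_ctl) {
		ret = c->dev->chan_ctl(c, flags);
		if (ret)
			return ret;	/* device vetoed; flags untouched */
	}
	c->flags = (c->flags & ~TOY_CEXTERNAL_FLAGS) | flags;
	return 0;
}

/* Example device that refuses non-blocking mode. */
int toy_reject_nonblock(struct toy_chan *c, int new_flags)
{
	(void)c;
	return (new_flags & TOY_O_NONBLOCK) ? -1 : 0;
}
```

Devices that don't care leave the hook NULL and get the default behavior.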
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Thu, 31 Mar 2016 19:03:20 +0000 (15:03 -0400)]
Fix chan ref leak in fd_setfl()
We could have thrown an error and leaked a ref to the chan.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Thu, 24 Mar 2016 16:46:59 +0000 (12:46 -0400)]
Intercept vfprintf() instead of printf() (XCC)
This will protect vcore context from much more of the family of the printf
functions. We still need akaros_printf(), due to the 'multiple libcs'
problem.
Rebuild glibc.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Thu, 24 Mar 2016 16:45:16 +0000 (12:45 -0400)]
Make akaros_vfprintf() take a stream (XCC)
We were implying stdout, but the caller could use something else.
Rebuild glibc.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Thu, 24 Mar 2016 16:39:05 +0000 (12:39 -0400)]
Properly align vcore stacks on x86
Glibc _start expects a 16-byte aligned stack, since it is working in asm.
The vcore entry points were also getting a 16-byte aligned stack, but they
are C functions and need an odd-8-byte aligned stack.
This commit creates arch-dependent vcore entry functions. x86's is in
assembly, where it can restore the odd-8-byte invariant.
Note that we can't have both vcore.S and vcore.c - both will be built as
vcore.o. I renamed the asm ones with _asm.
Incidentally, this happened on a printf with floats/xmms in vcore context,
and was triggered by the "movaps xmm,rbp" due to the variadic function ABI.
It also shows that we weren't touching xmm's in vcore context, since we
would have GP faulted.
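For reference, the alignment arithmetic looks like this. On x86-64, the SysV ABI has a C function see the stack as it is just after a call pushed an 8-byte return address, so rsp % 16 == 8 on entry. The helper names are made up for the sketch; the actual fix lives in the x86 assembly entry point.

```c
#include <stdint.h>

/* Stack pointer for code entered like _start: 16-byte aligned. */
uintptr_t stack_top_for_asm_entry(uintptr_t top)
{
	return top & ~(uintptr_t)0xf;
}

/* Stack pointer for a C function that is jumped to rather than called:
 * mimic the state after a call, i.e. 16-byte aligned minus 8. */
uintptr_t stack_top_for_c_entry(uintptr_t top)
{
	return (top & ~(uintptr_t)0xf) - 8;
}
```

With the second form, compiler-generated aligned accesses such as movaps to stack slots land on 16-byte boundaries again.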
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Thu, 24 Mar 2016 13:14:35 +0000 (09:14 -0400)]
Send SIGCHLD to the parent when a process exits
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Wed, 23 Mar 2016 22:10:27 +0000 (18:10 -0400)]
Remove SYS_cgetc (XCC)
Reinstall your kernel headers.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Wed, 23 Mar 2016 22:06:53 +0000 (18:06 -0400)]
Remove SYS_cputs (XCC)
As Ron said, "Alas, SYS_cputs, we knew you well." It was probably our
first syscall, but we no longer need it.
Reinstall your kernel headers.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Wed, 23 Mar 2016 22:05:51 +0000 (18:05 -0400)]
Use write() in parlib/debug.c
This is the printf that gets called from vcore context. We want those
prints to go to wherever stdout is, instead of always to the kernel
directly.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Wed, 23 Mar 2016 20:16:41 +0000 (16:16 -0400)]
Use the POSIX isatty() (XCC)
This is somewhat bullshit, since our tcgetattr() always returns whatever my
Linux terminal said back in 2009.
Rebuild glibc.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Wed, 23 Mar 2016 19:23:05 +0000 (15:23 -0400)]
Map PTEs for MAP_SHARED | MAP_LOCKED files on fork
If you had a process that forked but did not exec, then the read-only parts
of the binary would not be in its address space. Those parts would be in
the page cache, so long as the parent was still around (which had the VMR
MAP_LOCKED) (or if it left and we flushed the cache).
Then later, the child asks the kernel to perform a syscall on one of its
addresses in the read-only section, e.g. .rodata. The kernel would then
page fault.
Right now, the kernel won't attempt to handle a PF *of its own* by talking
to the page cache. Eventually we'll need to do this. But it's also just
wrong for us to not have MAP_LOCKED VMRs present in a process's page table.
I <3 fork.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Wed, 23 Mar 2016 19:21:08 +0000 (15:21 -0400)]
Fix minor leaks in mm.c
If we ever had an error, we'd bail out but forget to decref and free
memory.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 22 Mar 2016 21:45:49 +0000 (17:45 -0400)]
Remove the double-close() warning
It's legal to attempt to close a bad FD. We had a nasty bug that this
caught a long time ago, but now we have programs that do this all the time
(ssh).
Since I fixed the shutdown() problem, we no longer have any known-buggy
programs that are double-closing.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Thu, 31 Mar 2016 16:19:14 +0000 (09:19 -0700)]
Bump the size of the ancillary state (XCC)
Increases the size of extended_region by 8 bytes to accommodate
the PKRU.
Reinstall your kernel headers and maybe rebuild apps.
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
[rebuild warning]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
GanShun [Tue, 29 Mar 2016 18:26:31 +0000 (11:26 -0700)]
Moved timing parameters into proc_global_info (XCC)
Moved tsc_freq, timing_overhead and bus freq into __proc_global_info.
PIT stuff is now in k/a/x/time.c as a static. timing_overhead has been
renamed to tsc_overhead.
Reinstall your kernel headers.
Signed-off-by: GanShun <ganshun@gmail.com>
[kernel headers warning, removed extraneous comment]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Wed, 30 Mar 2016 20:44:28 +0000 (16:44 -0400)]
Remove unused variable from prep_syscalls().
Signed-off-by: Dan Cross <crossd@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Ronald G. Minnich [Fri, 25 Mar 2016 15:59:42 +0000 (08:59 -0700)]
Add a control file in #cons to support killing children.
This is needed for ssh support for ^C.
The ssh server, when it sees a ^C, opens and writes a command
to #cons/killkids.
Right now the command is ignored, but that might change.
We might at some point decide to implement /proc/self, and this can
move there. It's arguably a bit gross to have it in #cons.
Signed-off-by: Ronald G. Minnich <rminnich@gmail.com>
[ slight touchups ]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Ronald G. Minnich [Thu, 24 Mar 2016 21:19:25 +0000 (14:19 -0700)]
passwd: put in a one line passwd file for Unix programs
This is unfortunate but we'll never see the end of it otherwise.
Signed-off-by: Ronald G. Minnich <rminnich@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
GanShun [Thu, 24 Mar 2016 17:54:05 +0000 (10:54 -0700)]
Apic msr exit handling added with timer thread support
Added emsr_apic in user/vmm/vmxmsr to write all apic msr writes to the
vapic page. Started the timer thread using the vector linux writes to the
timer msr. We ignore the initial count as long as it's not 0 and we just
inject a timer interrupt at 100hz.
Signed-off-by: GanShun <ganshun@gmail.com>
[touched up function declaration formatting]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
GanShun [Wed, 23 Mar 2016 16:59:36 +0000 (09:59 -0700)]
Moved Trap Injection macros to the correct location (XCC)
Reinstall your kernel headers.
Signed-off-by: GanShun <ganshun@gmail.com>
[kernel header warning]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Tue, 22 Mar 2016 21:20:42 +0000 (17:20 -0400)]
Cosmetic: Change tabs to spaces in glibc Versions file.
This is purely a cosmetic change: change tabs to spaces
in glibc-2.19-akaros/sysdeps/akaros/Versions for consistency.
Signed-off-by: Dan Cross <crossd@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Geoff Collyer [Thu, 17 Mar 2016 23:32:43 +0000 (16:32 -0700)]
Correct multicast setup to make ipv6 work reliably.
Signed-off-by: Geoff Collyer <geoff.collyer@gmail.com>
[checkpatch touchups]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Fri, 11 Mar 2016 00:04:50 +0000 (16:04 -0800)]
Upgrade parlib fp state handling, use proc_global_info (XCC)
Rebuild kernel headers and all user apps!
This upgrades parlib so it also has the fp state upgrades
recently made to the Akaros kernel (xsave, xsaveopt, xrstor),
and also makes Akaros use proc_global_info for x86_default_xcr0
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
[ touched up a checkpatch warning ]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
GanShun [Tue, 22 Mar 2016 00:15:21 +0000 (17:15 -0700)]
User library changes to take a guest_thread instead of a vmctl.
Removed vmctls from user/. user libraries just use a guest_thread now.
Signed-off-by: GanShun <ganshun@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Tue, 22 Mar 2016 16:17:46 +0000 (09:17 -0700)]
Modify cpu feat barrier for enabling CR4_OSXSAVE
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Tue, 22 Mar 2016 19:45:56 +0000 (15:45 -0400)]
Implement sched_getcpu() (XCC)
Added a 'sched_getcpu()' routine that returns the
current pcore ID, as an analog to the routine of
the same name in Linux. Rebuild your toolchain.
Signed-off-by: Dan Cross <crossd@gmail.com>
[changed the commit subject from Linux compat changes]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Tue, 22 Mar 2016 16:32:47 +0000 (12:32 -0400)]
Minor changes to build C++ threads in gcc (XCC)
Added a constant and modified our Makefile to enable
C++ thread support when building GCC. Rebuild your
toolchain.
Signed-off-by: Dan Cross <crossd@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 22 Mar 2016 19:43:08 +0000 (15:43 -0400)]
Add debugging info to ipchaninfo()
Reports nonblock status and whether or not the conv is tapped.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 22 Mar 2016 19:39:22 +0000 (15:39 -0400)]
Implement shutdown() (XCC)
We had been just closing the FD, which is clearly wrong (you may have been
getting warnings from the kernel about this). Best case, you just get a
warning. Worst case, you accidentally close another FD in a concurrent
program.
I don't know if the TCP code is right. It sends a FIN. Maybe it doesn't
send it the right way, or maybe we should do things for the other states
too.
It's better than it was before.
Rebuild glibc.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 22 Mar 2016 15:53:38 +0000 (11:53 -0400)]
Fix a bunch of Rock warnings (XCC)
Rebuild glibc.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 22 Mar 2016 15:20:03 +0000 (11:20 -0400)]
Use a fork callback in select()
We need to flush our state in the child on a fork. Otherwise we'll think
we are still tracking the FDs, even though the underlying taps weren't
inherited.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 22 Mar 2016 15:16:09 +0000 (11:16 -0400)]
Add callbacks for fork() (XCC)
If a process forks but does not exec, some user-level subsystems (e.g.
select) will need to run a callback in the child.
Rebuild glibc.
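A minimal sketch of such a callback list, assuming a simple singly linked list that the child walks after fork() returns. All toy_* names are invented for the example and are not the interface this commit adds.

```c
#include <stddef.h>

struct toy_fork_cb {
	struct toy_fork_cb *next;
	void (*func)(void);
};

static struct toy_fork_cb *toy_fork_cbs;

/* Subsystems register once, e.g. from their init routine. */
void toy_register_fork_cb(struct toy_fork_cb *cb)
{
	cb->next = toy_fork_cbs;
	toy_fork_cbs = cb;
}

/* Called in the child right after fork() returns 0. */
void toy_run_fork_cbs(void)
{
	for (struct toy_fork_cb *cb = toy_fork_cbs; cb; cb = cb->next)
		cb->func();
}

/* Example subsystem: forget FDs whose kernel taps were not inherited. */
int toy_nr_tracked_fds = 3;

void toy_select_forked(void)
{
	toy_nr_tracked_fds = 0;
}

struct toy_fork_cb toy_select_cb = { .next = NULL, .func = toy_select_forked };
```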
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 22 Mar 2016 14:02:05 +0000 (10:02 -0400)]
qio: Fire read taps on actual edges
We were only firing when the queue was Qstarved, which means that someone
had to attempt to drain the queue at some point. Thus if the queue was
drained exactly, but no one waited, then the tap wouldn't fire.
Now we fire whenever the queue was empty, not just when it thought someone
had starved it.
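To make the difference concrete, here is a toy model of the two firing rules. The struct and function names are hypothetical and the real code deals with blocks, locks, and tap lists; this only models the condition that decides whether a read tap fires.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_q {
	size_t len;		/* bytes queued */
	bool starved;		/* a reader found the queue empty and slept */
	int taps_fired;
};

/* Old rule: fire only if a reader had starved the queue. */
void toy_produce_old(struct toy_q *q, size_t n)
{
	q->len += n;
	if (q->starved) {
		q->starved = false;
		q->taps_fired++;
	}
}

/* New rule: fire on the actual empty -> non-empty edge. */
void toy_produce_new(struct toy_q *q, size_t n)
{
	bool was_empty = (q->len == 0);

	q->len += n;
	q->starved = false;
	if (was_empty && n > 0)
		q->taps_fired++;
}
```

A queue that was drained exactly, with nobody sleeping on it, has len == 0 and starved == false: the old rule misses the next write, the new rule catches it.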
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 22 Mar 2016 14:01:08 +0000 (10:01 -0400)]
qio: Fire writeable taps immediately
We were only firing when the queue was drained below the flow control
limit.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Wed, 16 Mar 2016 17:40:10 +0000 (13:40 -0400)]
Add git helper scripts and update Doc/Contributing
This updates the Contributing guidelines, providing instructions and
examples for how to submit code to the mailing list, e.g. send-email and
request-pull.
Of the scripts/git files, contributors will be most interested in
git-checkpatch and git-akaros-request-pull. All of the git scripts will
work as git subcommands if you put them in your PATH. e.g.
$ git checkpatch master..my_branch
or
$ git akaros-request-pull master my_repo my_branch
akaros-request-pull is just like the request-pull, but with an added
github URL for those who want to look at the patches on the world wide
web.
The other scripts, such as track-review, are mostly useful for code
reviewers.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar [Thu, 10 Mar 2016 19:28:26 +0000 (11:28 -0800)]
Implement write combining on x86
MLX uses WC to write data payload into HCA buffers. The performance boost
over UC/UC- is significant.
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
[ removed pat_init() and used the new PTE_ flags ]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 11 Mar 2016 19:40:39 +0000 (14:40 -0500)]
Add write-combining memory mapping mode (XCC)
Use PTE_WRITECOMB for PTEs if you're manually building a mapping.
Reinstall your kernel headers.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 11 Mar 2016 19:33:14 +0000 (14:33 -0500)]
x86: Initialize the PAT MSR
This sets up PAT so we can have WB, WC, WT, and UC- memory types via the
__PTE flags. The PCD and PWT bits don't necessarily disable caching or set
write-through anymore; they are indexes into the PAT table. (This was
always happening btw, since anything we run on has PAT support).
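For anyone unfamiliar with how the index works: for a 4 KiB page, the PAT, PCD, and PWT bits of the PTE form a 3-bit index into the eight one-byte entries of the PAT MSR. The sketch below decodes that index against the architectural power-on PAT value; it does not show the layout this commit programs, and the helper names are made up.

```c
#include <stdint.h>

/* x86 PTE bits for a 4 KiB page. */
#define PTE_PWT		(1u << 3)
#define PTE_PCD		(1u << 4)
#define PTE_PAT		(1u << 7)

/* Memory-type encodings used in PAT entries. */
#define PAT_UC		0x00u
#define PAT_WC		0x01u
#define PAT_WT		0x04u
#define PAT_WB		0x06u
#define PAT_UC_MINUS	0x07u

/* Architectural reset value of the PAT MSR. */
#define PAT_RESET_DEFAULT 0x0007040600070406ULL

/* PAT:PCD:PWT form a 3-bit index into the eight PAT entries. */
unsigned int pat_index(uint64_t pte)
{
	return (!!(pte & PTE_PAT) << 2) |
	       (!!(pte & PTE_PCD) << 1) |
	       !!(pte & PTE_PWT);
}

/* Each PAT entry occupies one byte of the 64-bit MSR value. */
unsigned int pat_type(uint64_t pat_msr, uint64_t pte)
{
	return (pat_msr >> (8 * pat_index(pte))) & 0x7;
}
```

With the reset value, PCD alone selects UC- and PCD|PWT selects UC; reprogramming an entry (e.g. to WC) changes what the same PTE bits mean.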
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 11 Mar 2016 19:21:22 +0000 (14:21 -0500)]
Stop using PTE_PCD and PTE_PWT directly (XCC)
If you want caching disabled, use PTE_NOCACHE. I'll be changing the
specific PAT settings shortly so accessing those bits directly will cause
trouble.
Regarding the change to compat.h, there's no difference between PTE_NOCACHE
and just a raw PTE_PCD in Akaros as of right now (PCD == UC- and
PCD|PWT == UC, and the minus only matters if MTRRs set WC, which they
aren't at the moment).
Reinstall your kernel headers.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 11 Mar 2016 15:06:59 +0000 (10:06 -0500)]
Fix implicit declaration in procinfo.h (XCC)
The user-specific part of procinfo has a helper function that makes a
syscall. That was implying __ros_syscall_errno() existed. Ideally, we
wouldn't do that, but doing otherwise causes include loops.
Alternatively, we could just move or remove the functions. Considering
they are for debugging, just externing in the function seems fine.
Reinstall your kernel headers.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Fri, 11 Mar 2016 14:55:21 +0000 (09:55 -0500)]
Remove cpu_feats from kernel-features.h (XCC)
It turns out that glibc doesn't need its own copy of the cpu_feats, and it
can just include parlib's. It may be that some code in glibc won't be
able to include parlib files. If that's the case, and those files need
cpu_feats, then we can revisit this.
This popped up as a problem when a file in glibc included both
kernel-features.h and parlib/cpu_feat.h.
Rebuild glibc if you want.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
GanShun [Thu, 10 Mar 2016 21:39:15 +0000 (13:39 -0800)]
Removing extra run_vmthread calls.
These calls should only be made at the bottom of the while loop, otherwise
we run the risk of missing a vmexit.
Signed-off-by: GanShun <ganshun@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar [Thu, 10 Mar 2016 19:24:57 +0000 (11:24 -0800)]
Return real vendor/part id in query_device
ofed_perftest tool cares about vendor/part id.
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Tue, 8 Mar 2016 19:48:59 +0000 (14:48 -0500)]
Clean up logic in MSR read/write functions.
These code paths could be cleaned up and a level of indentation removed.
Also, remove the use of atomic types as they are unneeded in this case.
Signed-off-by: Dan Cross <crossd@gmail.com>
[minor git-fu]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Sat, 5 Mar 2016 00:27:20 +0000 (16:27 -0800)]
Added comment to note that fninit clears FOP
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Sat, 5 Mar 2016 00:25:44 +0000 (16:25 -0800)]
FP save/restore security patch for AMD processors
AMD processors do not save/restore the FOP/FIP/FDP values from/to the
x87 FPU unless an unmasked FPU exception is pending. This can result in
a state leak between processes during a context switch, and is a
potential security hole.
See CVE-2006-1056 and CVE-2013-2076 on cve.mitre.org.
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Thu, 3 Mar 2016 21:32:05 +0000 (13:32 -0800)]
Extended state AMD backwards compatibility updates (XCC)
Rebuild your universe (kernel headers and user apps)!
These updates allow Akaros to defer to FXSAVE instructions in the event
that the processor does not support the XSAVE instructions. This is
necessary for Akaros to run on older AMD processors (pre bulldozer).
Akaros will still refuse to boot if you do not have support for FXSAVE.
These updates also include additional CPU feature detection,
particularly x86 vendor detection and support for the XSAVE instruction.
Finally, these updates allow the use of XSAVE in the absence of
XSAVEOPT, because it was an easy patch and we don't have to be that
mean.
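The fallback order described above can be sketched as a simple selection function. The struct, enum, and function names are hypothetical; the real code keys off the detected CPU features rather than a caps struct.

```c
#include <stdbool.h>

enum fp_save_method {
	FP_SAVE_NONE,		/* no FXSAVE: the log says we refuse to boot */
	FP_SAVE_FXSAVE,
	FP_SAVE_XSAVE,
	FP_SAVE_XSAVEOPT,
};

/* Hypothetical summary of what feature detection found. */
struct toy_cpu_caps {
	bool fxsr;
	bool xsave;
	bool xsaveopt;
};

/* Prefer the most capable instruction the CPU supports. */
enum fp_save_method pick_fp_save(const struct toy_cpu_caps *c)
{
	if (c->xsave && c->xsaveopt)
		return FP_SAVE_XSAVEOPT;
	if (c->xsave)
		return FP_SAVE_XSAVE;
	if (c->fxsr)
		return FP_SAVE_FXSAVE;
	return FP_SAVE_NONE;
}
```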
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Sat, 27 Feb 2016 00:03:11 +0000 (16:03 -0800)]
Added vmrunkernel option for extending the kernel command line passed to the guest
vmrunkernel now targets the launcher program in our linux fork's initramfs
instead of init (see rminnich/linux and mtaufen/ak-vm-tests)
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Wed, 9 Mar 2016 16:30:10 +0000 (11:30 -0500)]
Clean up IPv6 sources.
I'm diving into IPv6 code to get it working. These are trivial
cleanups that I don't want to obscure potential future changes
that would be more substantive.
Remove redundant or unused headers, whitespace cleanups, etc.
Signed-off-by: Dan Cross <crossd@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Tue, 8 Mar 2016 20:38:15 +0000 (15:38 -0500)]
ARRAY_SIZE is the standard in the kernel.
Trivial change to follow the convention used elsewhere in the
kernel.
Signed-off-by: Dan Cross <crossd@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Tue, 8 Mar 2016 16:54:42 +0000 (11:54 -0500)]
Clean up profiler configure and usage functions.
An incidental cleanup that became evident from the last cleanup;
the 'profiler_configure' function was unnecessarily hard to
follow due to lack of an early return.
Also, there was this odd function to return an array of strings
that could be used to construct an error message, but that were
used nowhere else; this was an encapsulation failure. Change
that to just construct the error message and call it.
Arguably, the configure function should just call 'error()'. Oh
well.
Signed-off-by: Dan Cross <crossd@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Tue, 8 Mar 2016 16:39:46 +0000 (11:39 -0500)]
Clean up profiler variables and formatting.
Remove unused variables, move loop indices to their loop,
use void* instead of char* in several places, clean up
declaration formatting.
Signed-off-by: Dan Cross <crossd@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Dan Cross [Tue, 8 Mar 2016 15:37:27 +0000 (10:37 -0500)]
Fix formatting: leading spaces to tabs, and fix continued-line alignment.
Indent using tabs, not spaces.
In the event that a line must be broken due to length, the coding
standard says to break it so that we use tabs to advance the
continued line to the level of indentation of the broken line,
and then spaces to align to the opening parenthesis.
Signed-off-by: Dan Cross <crossd@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar [Tue, 8 Mar 2016 00:33:32 +0000 (16:33 -0800)]
Add in more uverbs backward compatibility
Add in support for older style extended query device.
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 7 Mar 2016 19:07:17 +0000 (14:07 -0500)]
Add rdmsr and wrmsr utilities
Note that wrmsr writes the same MSR value to *all* cores.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 7 Mar 2016 19:04:04 +0000 (14:04 -0500)]
Add a helper for querying the number of cores
The info is exposed via #vars.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 7 Mar 2016 19:23:11 +0000 (14:23 -0500)]
Remove MAX_VCORES
This was limiting us to 64 vcores. Instead of cranking the number up, I
opted to just remove the #define completely. We should be able to figure
these things out dynamically.
Right now MAX_NUM_CORES is 256 for x86. That was due to the old xAPIC.
One of these days we'll actually want to run on a large-scale SMP machine
and will want to increase that. And then we'll also start worrying about
the size of things that grow O(MAX_NUM_CORES) for every process, e.g.
procdata.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 7 Mar 2016 19:20:12 +0000 (14:20 -0500)]
Remove MCS dissemination barrier
It's a cool thing, but it has a few problems.
- It wants to know statically how many vcores there are (max, at least).
- It doesn't pad its dissem structure properly (it adds 64 bytes extra, not
of padding, but just an array).
- It doesn't handle preemption.
All of these can be fixed, if we actually want the barriers. In that case,
we can bring this code back and fix up the above three things.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 7 Mar 2016 18:58:59 +0000 (13:58 -0500)]
x86: Fix devarch's MSR error handling
In some cases, we weren't even setting errno, just returning -1. Then on
error, we'd get crap from perror() like:
pread: Success
Now we get meaningful errstrs and at least have errno set.
E.g.
(On a machine without IA32_PERF_CTL)
/ $ rdmsr 0x199
pread: Bad address, read_msr() faulted on MSR 0x199
/ $ wrmsr 0x198 88888
pwrite: Operation not permitted, MSR 0x198 not in write whitelist
Most of the other errors would be triggered by a rdmsr or wrmsr bugging
out.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 7 Mar 2016 18:56:37 +0000 (13:56 -0500)]
x86: Properly initialize MSR whitelists
The address ranges need to be initialized so that they are sorted.
Otherwise, whoever adds entries needs to know the actual value of the MSRs
and maintain their ordering manually.
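A small sketch of the idea, assuming the whitelist is an array of inclusive, non-overlapping ranges that is sorted once at init. The struct, the function names, and the use of a binary search for lookups are assumptions for illustration; the MSR numbers in any usage are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct msr_range {
	uint32_t start;
	uint32_t end;	/* inclusive */
};

static int range_cmp(const void *a, const void *b)
{
	const struct msr_range *ra = a, *rb = b;

	return (ra->start > rb->start) - (ra->start < rb->start);
}

/* Sort once at init, so entries can be listed in any order. */
void whitelist_init(struct msr_range *wl, size_t nr)
{
	qsort(wl, nr, sizeof(*wl), range_cmp);
}

/* Binary search over sorted, non-overlapping ranges. */
bool whitelist_allows(const struct msr_range *wl, size_t nr, uint32_t msr)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (msr < wl[mid].start)
			hi = mid;
		else if (msr > wl[mid].end)
			lo = mid + 1;
		else
			return true;
	}
	return false;
}
```

Whoever adds an entry can then use the symbolic MSR name without knowing where its value falls relative to the others.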
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 29 Feb 2016 23:34:45 +0000 (18:34 -0500)]
x86: Use FSGSBASE for TLS changes (XCC)
When the CPU feature is available, userspace and the kernel will use the
instructions (e.g. wrfsbase) to change TLS.
Rebuild glibc.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 29 Feb 2016 23:24:40 +0000 (18:24 -0500)]
x86: use setters/getters for MSR_{FS,GS}_BASE
We need to be a little careful in the kernel with using these before cr4 is
set. We'll eventually set cr4 to enable this usage in arch_pcpu_init. For
the most part, any MSR accesses of this sort will happen after smp_boot,
which is fine.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 29 Feb 2016 20:49:24 +0000 (15:49 -0500)]
x86: Detect XSAVEOPT
This is an example of how the kernel can set and query CPU features. For
the most part, we should do all of the cpu_set_feat() very early during
boot in cpuinfo.
XSAVEOPT implies XSAVE, so we have just CPU_FEAT_X86_XSAVEOPT.
With these changes, both the user and the kernel can check at runtime for
XSAVEOPT and adapt accordingly.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 29 Feb 2016 20:45:13 +0000 (15:45 -0500)]
Add CPU feature detection (XCC)
Userspace, Glibc, and the kernel can now query whether the CPU has certain
features with
bool cpu_has_feat(int feature);
Some CPU features are architecture independent, such as the support for
virtual machines. Most others will be architecture dependent. I added a
few feature bits as an example, though they are not used yet.
To use within the kernel:
#include <cpu_feat.h>
To use within glibc:
#include <kernel-features.h>
To use in generic userspace (e.g. user/*, tests/*, etc):
#include <parlib/cpu_feat.h>
Reinstall your kernel headers to use the features. Rebuild glibc to make
sure I didn't mess anything up.
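One plausible way to back cpu_has_feat() is a plain bitmap, sketched below. The feature numbering, the cpu_set_feat() body, and the storage are assumptions for the example; only the cpu_has_feat() prototype and the name CPU_FEAT_X86_XSAVEOPT appear in this log.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical feature numbering for the sketch. */
enum {
	CPU_FEAT_VMM,
	CPU_FEAT_X86_XSAVEOPT,
	CPU_FEAT_X86_FSGSBASE,
	NR_CPU_FEAT,
};

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define FEAT_WORDS	((NR_CPU_FEAT + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long cpu_feats[FEAT_WORDS];

/* Called early during boot, once per detected feature. */
void cpu_set_feat(int feature)
{
	cpu_feats[feature / BITS_PER_WORD] |=
		1UL << (feature % BITS_PER_WORD);
}

bool cpu_has_feat(int feature)
{
	return cpu_feats[feature / BITS_PER_WORD] &
	       (1UL << (feature % BITS_PER_WORD));
}
```

Callers then branch at runtime, e.g. use one code path if cpu_has_feat(CPU_FEAT_X86_XSAVEOPT) is true and a fallback otherwise.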
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Mon, 29 Feb 2016 18:38:35 +0000 (13:38 -0500)]
Add proc_global_info (XCC)
This is a read-only, shared-memory region mapped into every process's
address space.
Rebuild the world.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Mon, 29 Feb 2016 16:57:53 +0000 (08:57 -0800)]
Fix mxcsr boot time init
The mxcsr register should be initialized to its power on default of 0x1f80.
This masks all SIMD floating point exceptions and clears all SIMD
floating-point exception flags, sets rounding control to round-nearest,
disables flush-to-zero mode, and disables denormals-are-zero mode.
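The bit fields behind that description can be checked mechanically. The macro and function names below are made up for the sketch; the bit positions are the architectural MXCSR layout.

```c
#include <stdint.h>

#define MXCSR_EXC_FLAGS	0x003fu	/* bits 0-5: sticky exception flags */
#define MXCSR_DAZ	0x0040u	/* bit 6: denormals-are-zero */
#define MXCSR_EXC_MASKS	0x1f80u	/* bits 7-12: exception mask bits */
#define MXCSR_RC	0x6000u	/* bits 13-14: rounding control */
#define MXCSR_FTZ	0x8000u	/* bit 15: flush-to-zero */

#define MXCSR_DEFAULT	0x1f80u

/* True if all exceptions are masked, no flags are pending, rounding is
 * round-to-nearest (RC == 0), and DAZ and FTZ are both off. */
int mxcsr_matches_default_policy(uint32_t mxcsr)
{
	return (mxcsr & MXCSR_EXC_MASKS) == MXCSR_EXC_MASKS &&
	       (mxcsr & MXCSR_EXC_FLAGS) == 0 &&
	       (mxcsr & MXCSR_RC) == 0 &&
	       !(mxcsr & MXCSR_DAZ) &&
	       !(mxcsr & MXCSR_FTZ);
}
```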
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
[ removed a couple extra newlines ]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
GanShun [Thu, 17 Dec 2015 22:43:30 +0000 (14:43 -0800)]
Virtualization changes to handle X2APIC mode.
These are changes to the vmm to allow it to handle the new MSR based
accesses. This includes allowing the direct msr access in vmx.c,
otherwise vmexiting will occur.
Signed-off-by: GanShun <ganshun@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
GanShun [Thu, 17 Dec 2015 01:36:39 +0000 (17:36 -0800)]
Enabling X2APIC
Changing all offsets from the old XAPIC mode to the newer X2APIC mode and
removing lapic_wait_to_send. All interaction with the X2APIC is done with
apicrput, apicrget or apicsendipi. Removed memory allocation in pmap64.c
and value check in check_sym_val.
Signed-off-by: GanShun <ganshun@gmail.com>
[ removed some debugging comments, fixed pb_ktest ]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
GanShun [Wed, 16 Dec 2015 20:21:09 +0000 (12:21 -0800)]
Removed lapic_set_id and lapic_set_logid functions
These functions are not used and are no longer allowed once we swap to the
X2APIC. Removing them in preparation for activating the X2APIC.
Signed-off-by: GanShun <ganshun@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Mon, 22 Feb 2016 22:55:52 +0000 (14:55 -0800)]
fp state save, restore, and error handling
save_fp_state and restore_fp_state now use xsaveopt64 and xrstor64, and
restore_fp_state handles faults. In the event of a fault,
restore_fp_state prints an error message and then restores
the fp state to a default that was determined at boot.
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Mon, 22 Feb 2016 22:47:54 +0000 (14:47 -0800)]
vm exit handler for xsetbv
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Mon, 22 Feb 2016 22:42:31 +0000 (14:42 -0800)]
Initialize guest xcr0, save and restore xcr0 between guest and Akaros
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Mon, 22 Feb 2016 22:35:42 +0000 (14:35 -0800)]
Boot time and per-cpu extended state setup
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Mon, 22 Feb 2016 22:27:02 +0000 (14:27 -0800)]
Add load, safe load, read xcr0 functions
void lxcr0(uint64_t xcr0)
int safe_lxcr0(uint64_t xcr0)
uint64_t rxcr0(void)
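The message does not say how safe_lxcr0 decides to fail. As a sketch only,
the check below captures the architectural rules that make a write to xcr0
fault: bit 0 (x87) must be set, AVX state requires SSE state, and no
unsupported bit may be set. The function name and the 'supported' parameter
are assumptions for illustration; the real function may simply catch the
fault instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XCR0_X87 (1ULL << 0)
#define XCR0_SSE (1ULL << 1)
#define XCR0_AVX (1ULL << 2)

static_assert((XCR0_X87 | XCR0_SSE | XCR0_AVX) == 0x7, "bad xcr0 bit layout");

/* Would loading this xcr0 be accepted by the hardware?  'supported' is the
 * set of state components the CPU reports (hypothetical parameter). */
static inline bool xcr0_is_plausible(uint64_t xcr0, uint64_t supported)
{
	if (!(xcr0 & XCR0_X87))
		return false;   /* x87 state can never be disabled */
	if ((xcr0 & XCR0_AVX) && !(xcr0 & XCR0_SSE))
		return false;   /* AVX state depends on SSE state */
	if (xcr0 & ~supported)
		return false;   /* unsupported components requested */
	return true;
}
```

A caller could run a check like this before lxcr0(), or rely on a
fault-catching variant and test its return value.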
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Wed, 24 Feb 2016 23:15:48 +0000 (15:15 -0800)]
Relocated fixup table macros
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Mon, 22 Feb 2016 22:11:37 +0000 (14:11 -0800)]
Extended state data structures (XCC)
Rebuild your kernel headers and rebuild all user apps!
new ancillary_state state components
x86_default_xcr0
xcr0 in guest_pcore
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Thu, 11 Feb 2016 17:50:47 +0000 (09:50 -0800)]
Remove some trailing whitespace.
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar [Wed, 24 Feb 2016 19:08:13 +0000 (14:08 -0500)]
Turn off TSD in slave processors.
Turn off TSD (Time Stamp Disable) on slaves.
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar [Mon, 22 Feb 2016 23:42:20 +0000 (15:42 -0800)]
Add page reference counting to mm hooks.
Add page reference counting logic to some of the user map helper functions.
Expose one of the mlx4 parameters to user space.
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar [Mon, 22 Feb 2016 23:35:17 +0000 (15:35 -0800)]
Make query_port not report port_down always.
Hack existing linux logic to avoid netdev stuff that was reporting port_down
always.
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar' via Akaros [Thu, 18 Feb 2016 22:11:00 +0000 (14:11 -0800)]
Fix couple of problems in compat code.
While trying newer tests, the lack of initialization of SGLs became
apparent. Also, newer tests invoke get_user_pages() without faulting in the
corresponding pages, so we need to automatically allocate the pages.
Cleanup to do reference counting in get_user_pages() etc. will come later.
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 16 Feb 2016 21:15:34 +0000 (16:15 -0500)]
Remove user include hacks
Due to the old style of having user libraries include their own headers as
both <libname/foo.h> and <foo.h>, we had to have a few hacks to force us to
include the 'real' headers that we wanted.
Now that we do things the right way, we don't need to carry those hacks
around.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Tue, 16 Feb 2016 19:33:42 +0000 (14:33 -0500)]
Clean up user library include paths (XCC)
Allowing libraries to search their own include/ for <foo.h> is a huge mess
that results in issues when glibc has foo.h. The fix is to not allow that,
and to insist libraries refer to their own files by their full name
(libname/foo.h).
All user libraries (other than pthread) now have their include directories
arranged as:
user/LIBNAME/include/LIBNAME/FOO.h
With their include path being set to user/LIBNAME/include/, and all
#includes explicitly list the libname.
Due to moving parlib's arch symlink, you'll need to do something like:
$ rm user/parlib/include/arch
$ make mrproper
$ mv .config.old .config
$ make ARCH=x86 oldconfig
$ make userclean
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar' via Akaros [Thu, 11 Feb 2016 01:11:53 +0000 (17:11 -0800)]
Activate kernel bypass logic
Hook in mlx4/ driver to activate kernel bypass logic.
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar' via Akaros [Thu, 11 Feb 2016 01:09:47 +0000 (17:09 -0800)]
Port over linux 4.1.15 infiniband/core logic for kernel bypass NIC access
Port over linux 4.1.15 drivers/infiniband/core logic essential for
kernel bypass NIC access. Slight edits to adapt to Akaros environment
(#if exclusion of non essential code blocks, panic stubs etc), described
in README file.
Most of the interlock logic with core kernel (mm/vfs etc) is captured
in compat.[ch].
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Kanoj Sarcar' via Akaros [Wed, 10 Feb 2016 23:54:30 +0000 (15:54 -0800)]
Port over linux 4.1.15 mlx4 kernel bypass driver
Port over linux 4.1.15 drivers/infiniband/hw/mlx4 logic essential for
kernel bypass NIC access. Slight edits to adapt to Akaros environment
(#if exclusion of non essential code blocks, panic stubs etc), described
in README file.
Signed-off-by: Kanoj Sarcar <kanoj@google.com>
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Michael Taufen [Wed, 10 Feb 2016 17:37:58 +0000 (09:37 -0800)]
Updates from vmm-akaros
Boot params
e820 info
Use copy_vmctl_tovmtf(*) in __build_vm_ctx_cp(*)
Inject GPF on unsupported MSR access
Add linux_bootparam.h
Signed-off-by: Michael Taufen <mtaufen@gmail.com>
[ pragma once, static_assert->parlib_static_assert ]
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Sat, 13 Feb 2016 20:57:08 +0000 (15:57 -0500)]
Remove kernel errno string processing
The kernel doesn't really need to know about the string names for errno
values. We were using that mostly as a hack to not use proper
errstrings.
I kept parse_errno.sh around, since we (theoretically) still use that to
generate error lists in glibc.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Sat, 13 Feb 2016 19:03:57 +0000 (14:03 -0500)]
Remove uses of errno_to_string()
Using errno_to_string() was a hack.
In addition to removing that, this commit cleans up a few nasty things.
In namec(), we just had a static string floating around for some reason.
Good times.
More importantly, in sysfile we were doing a brain-dead strcmp on
ENODATA. Computers should do comparisons on errno. Errstr is for
humans. The danger there is that if someone did:
error(ENODATA, "Actually a useful message that was not NULL")
then the strcmp on errstr would fail, since it's not the "string that
meant ENODATA".
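A small standalone sketch of the difference, assuming a made-up err_state
struct and set_error() helper that stand in for the kernel's errno/errstr
pair (they are not the actual Akaros definitions):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a per-call errno plus errstr. */
struct err_state {
	int errno_val;
	char errstr[128];
};

static inline void set_error(struct err_state *e, int err, const char *msg)
{
	assert(e && msg);
	e->errno_val = err;
	snprintf(e->errstr, sizeof(e->errstr), "%s", msg);
}

/* Fragile: only matches when the errstr is the canned ENODATA text. */
static inline bool is_enodata_by_string(const struct err_state *e)
{
	return strcmp(e->errstr, "No data available") == 0;
}

/* Robust: the machine-readable value is the source of truth. */
static inline bool is_enodata_by_errno(const struct err_state *e)
{
	return e->errno_val == ENODATA;
}
```

After set_error(&e, ENODATA, "Actually a useful message that was not NULL"),
the errno check still matches but the string check does not.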
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
Barret Rhoden [Sat, 13 Feb 2016 19:12:01 +0000 (14:12 -0500)]
Outlaw the setting of NULL errstrs
This will catch them if we try to use them. Otherwise we'll have to rely on
other methods (code review/tools) to find them.
Maybe there's an argument to be made for a simple error(EFOO, 0),
where you just don't want to bother making a string. Then for now you
can use ERROR_FIXME.
Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>