This is a modified version of an assignment that ran first in Fall 2015 at JHU. See http://gaming.jhu.edu/~phf/2015/fall/cs318/assignment-uthreads.shtml and the Changelog below. If you are interested in using this assignment in your course, please contact me for reference implementations and test cases!
The second half of this assignment is Assignments for an xv6 OS Class: Kernel Thread Support.
Overview¶
This assignment is entirely in userspace. We provide for you a compiled, modified xv6 kernel (without sources) for you to run against. Despite that, this is a potentially big assignment!
This is almost entirely taken from Carnegie Mellon’s Operating Systems class. This assignment has been modified from its original version. It has been reformatted to fit this semester.
Words of Wisdom: First, please start early! Second, if you’re working as a pair, please work together and don’t split up the tasks, you’ll get yourself into trouble that way! Third, please make minimal changes to xv6; if you make it hard for us to grade, we’ll dock points.
What Are Threads?¶
Threads generalize the notion of an xv6 process. An xv6 process is, at once, several different things:
A description of an address space (which virtual addresses map to which physical addresses?)
A description of file descriptors (which fds are open? What files are they pointing to?)
A description of a CPU, sometimes called a “register file”: what are the contents of each (user-level) CPU register right now?
A thread is the name we give to just the last of these things. Obviously we need the other things when we’re running instructions on xv6, so threads are said to be in a process: that is, a process may have one or more threads. Every xv6 process you have ever created to date has had exactly one. Let’s change that!
Changes To The xv6 Kernel¶
These changes have been made for you, though you do not have the source of the modified kernel. It will, however, be beneficial for you to think about how the kernel might offer this functionality!
Thread Identifiers¶
Since threads are things managed by the kernel, we need some way of identifying a particular thread, just like we identify a particular open file or a particular process.
For this assignment, we put process and thread identifiers in the same
name-space: the process identifier as returned by fork()
will also
be the thread identifier of the first thread in that process.
In particular, thread identifiers, like process identifiers, are globally unique. That is, every thread on the system will have a unique thread identifier, just as every process on the system has a unique process identifier. The first thread in a process will have equal thread and process identifiers; subsequent threads spawned in a process will not.
Scheduling¶
void yield(int tid)
will invoke the scheduler again with a hint that it would be good fortid
to be on core soon (ideally, now), giving up the remainder of this thread’s timeslice. Note that if there are no other running threads, the kernel may return to this one without context switching. Iftid
is negative, the kernel will simply run the nextRUNNABLE
thread.void desch(int *guard)
will put the current thread to sleep (that is, suspend its execution) if*guard
is nonzero. If the processexit()
-s or is subject tokill()
while this thread remains so suspended, this thread will be destroyed, too.int mkrun(int tid)
should awaken a thread that is sleeping indesch()
. If there is no such thread, it will return a negative number; otherwise, it will return0
.A thread’s invocation of
desch()
is atomic with respect to anymkrun()
targeting that thread. That is, the decision to go to deschedule happens entirely before or entirely after any decision to wake up; dually, the decision to wake up happens entirely before or entirely after any decision to deschedule.
Creating A New Thread¶
We defined a new system call, int tspawn(void *stktop, void (*f)(void *),
void *a)
which, creatively enough, spawns a new thread running the
function f
with argument a
with stack built (downwards, as this is x86, not, for example, some ARM systems) at stktop
. stktop
must point to (or immediately above, at least) previously allocated memory;
the kernel will need two words to build the initial call frame for f
,
but there should, in general, be more space available for the use of the
function f
. tspawn
returns the thread identifier of the spawned
thread, or a negative number if it fails.
Note
As with process identifiers on xv6, thread identifiers are monotonically increasing and we do not concern ourselves with wrap-around. This is something of an embarrassing situation which can be blushingly acknowledged with “educational OS”.
The kernel we ship you supports NTHR = 256
threads, distributed in any
way between the up-to-NPROC
(i.e. 64) processes. Your submission
should, ideally, not depend upon either of these numbers (and especially
not as numbers).
Thread Exits¶
The new system call, void texit(int *where, int what)
, is sort of like
exit()
except that it simply stops the execution of the current thread
after writing what
to where
(with release atomic semantics). If (and
only if) there are no other threads in the process, then texit()
will
behave like exit()
does in xv6: it will result in the process being torn
down and its parent process being notified. If there are other threads,
then only the invoking thread will be terminated. Put another way,
wait()
will only return a process identifier when all threads in that
process have exited. wait()
will never return a thread identifier that
isn’t also a process identifier! (This should make sense if you think about
about fork()
and wait()
are supposed to work together.)
Note that texit()
is permitted to return if where
is not a valid
pointer. This is the only case in which texit()
may return.
If, on the other hand, any thread in a process calls exit()
, all
threads in that process will be terminated (or scheduled for termination
prior to their next return to userland). That is, texit()
is the polite
“I’m done, may I be excused.” while exit()
is the table-flipping “We’re
done here!” ((╯°□°)╯︵ ┻━┻).
Similarly, when a process identifier is kill()
-ed, all threads in
that process will be terminated (or scheduled for termination). It is an
error to call kill()
with a thread identifier that is not also a
process identifier.
Miscellany¶
The new system call
int gettid(void)
will return the invoking thread’s identifier. The existingint getpid(void)
will return the process identifier (i.e. the thread identifier of the first thread that was in this process, even if that thread has subsequently exited) whenever any thread therein asks.File descriptors are shared by all threads within a process, so one could imagine building a multi-threaded application that worked with files.
fork()
will fail if there are multiple threads in the invoking process, as it is not clear what should happen otherwise. (For the very curious, POSIX introduced anatfork()
mechanism to help with these issues.)exec()
will fail if there are multiple threads in the invoking process. While it is clear what to do here, I do not imagine we will need the functionality and so would rather not implement it. ;)
Part 1: Study, Design, Log (20%)¶
There’s not a whole lot of material to study, but do carefully review the
diffs we have made to the user-visible xv6 header files. You can do this
however you want, but git log -p
may be of utility.
I would suggest that you read this entire assignment at least twice before writing any part of it.
Please keep a detailed log in README
, including your initial plans and
what actually happened. :)
Part 2: usem: Better Userspace Synchronization¶
Port Spinlocks¶
Based on the in-kernel spinlocks, please write:
struct uspin { ... }
. Up to you what goes in the...
. This lives inuthrdefs.h
.int uspin_init(struct uspin *sl);
should get a spinlock ready (in its unlocked configuration, of course). Should getting one ready prove impossible, this function should return a negative number; otherwise, it should return0
.void uspin_destroy(struct uspin *sl);
should free any resources required byuspin_init
. You do not need to handle the case when there are threads making use of this spinlock and someone callsuspin_destroy
. By that, I mean, the onus is on the user of the API to make sure that doesn’t happen.void uspin_acquire(struct uspin *sl);
will acquire the lock. That is, upon return, it will have arranged to prevent the return of any other thread’suspin_acquire(sl)
calls until any thread calls…void uspin_release(struct uspin *sl);
drops the lock. After this call, anotheruspin_acquire(sl)
can complete. You do not need to handle the case when a thread callsuspin_release()
on an already-released
spinlock. By that, I mean, the onus is on the user of the API to make sure that doesn’t happen.
Calling uspin_acquire(sl)
on an already acquired spinlock sl
should never return unless some other thread calls uspin_release(sl)
.
Note that it may be a different thread that releases.
uspin_acquire(sl)
must not use while(condition){;}
to burn CPU time,
but should instead use yield();
. You’ll be happier, trust me, and I’ll
be happier too.
All of these functions are given in skeletal form in uspin.c
.
Semaphores¶
In addition to spinlocks, we’ll define a thing called a “counting semaphore”. A counting semaphore is, essentially, a counter that refuses to go negative. If an attempt to decrement it would make it negative, that attempt will wait for the number to be big enough so that the decrement yields a non-negative value. Another way of thinking about this: there are N instances of some resource available; the semaphore’s job is to make sure that there are only ever at most N instances claimed for use.
As before, you will need to fill out a structure definition:
struct usem { ... }
. Your job to fill in the...
; whatever needs to go in there should go in there. This lives inuthrdefs.h
.
The steady-state calls are rather like uspin
:
void usem_acquire(struct usem *sem);
decrements the counter, if possible, or blocks until such time as the decrement is possible. Blocking is to be done bydesch()
rather thanyield()
,sleep()
, or any other mechanism, for full credit.void usem_release(struct usem *sem);
increments the counter and notifies at least one sleeping thread of this condition.
The constructor and destructor functions are:
int usem_init(struct usem *sem, unsigned int initial_count);
should initialize a semaphore into a state so thatinitial_count
-manyusem_acquire(sem)
calls would succeed without waiting and the next would block.usem_init()
should reject any initial count that is greater thanINT_MAX
(i.e.2147483647
, as defined inlimits.h
). If getting a semaphore ready fails, this call should return a negative number; otherwise, it should return0
.void usem_destroy(struct usem *sem);
should destroy the semaphore (that is, release whatever resources were required for its construction). If there are threads sleeping on the semaphore, their fate is up to you, but be prepared to defend your decision. (Extreme policies are fair game!)
The implementation need not be prepared to deal with the internal counter
going above INT_MAX
; that is, it is, again, on the user to avoid that.
Note that we anticipate semaphores using spinlocks internally!
STOP AND CONSIDER: The use of desch()
poses an interesting
challenge: in usem_acquire()
, one may need to atomically unlock and
deschedule. To be concrete: one may be holding a lock to guard the
counter, observe that the counter is in a state where usem_acquire()
must desch()
, and need to drop the lock before desch()
-ing (since
holding the lock while asleep is unlikely to work out well!). But! After
dropping the lock but before usem_acquire()
calls desch()
, another
thread could usem_release()
, invalidating the observation made of the
counter! In other words, “Why does desch()
take an argument, anyway?”
In tabular form, the case to consider is something like
Thread 1 |
Thread 2 |
---|---|
Enter |
|
Lock internals |
|
Observe counter too low |
|
Drop lock |
|
Enter |
|
Lock, increment, unlock |
|
Leave |
|
|
A full-credit implementation of usem
will, in addition to being correct
as per the above API, be fair in a FIFO ordering sense:
If two threads,
T1
andT2
, callusem_acquire()
on ausem
whose counter is currently0
, withT1
acquiring beforeT2
, then the nextusem_release()
will wakeT1
and notT2
. That is, the intended behavior, assuming the semaphore is initialized to0
, is:Thread 1
Thread 2
Thread 3
usem_acquire()
desch
usem_acquire()
desch
usem_release()
usem_acquire()
finishesMoreover, in the above scenario, another thread calling
usem_acquire()
on this spinlock, even if after theusem_release()
call indicated, will only return afterT2
returns. That is, this outcome, again assuming that the semaphore is initialized to0
, is to be avoided:Thread 1
Thread 2
Thread 3
usem_acquire()
desch
usem_release()
Call
usem_acquire()
usem_acquire()
returns
All of these functions are given in skeletal form in usem.c
.
Part 3: uthr: Thread Library Functions¶
uthr
provides functionality akin to (a small subset of) the POSIX
threads library functions
pthread_create
, pthread_exit
, and pthread_join
on a real UNIX
system. It defines the following functions:
int uthr_init(unsigned int stack_size);
initializes the thread library and sets the stack size of all created threads tostack_size
bytes. Unfortunately, despite the symmetry of threads in general, the library as we are designing it is unable to ensure that the main thread (i.e. the caller ofthr_init
) will have a stack ofstack_size
bytes, as its stack comes from the xv6 kernel.void *uthr_malloc(int size);
is a thread-safe wrapper around the existingumalloc
malloc()
function.void uthr_free(void *what);
is a thread-safe wrapper aroundfree()
, by analogy.Because of the way
umalloc
is written, at most one thread per process can safely be manipulating its data structures; it is the job of these wrapper functions to ensure that this holds!int uthr_create(void (*f)(void *), void *arg);
creates a new thread, with its own stack, running the codef(arg)
. Returns to the parent the thread identifier of this thread, or a negative number on error (in which case, no thread was created).Undergraduates: it is OK if you wish to require
f
to calluthr_exit(...)
, just like xv6 requires our program to callexit
.Graduates: your submission must interpret a return from
f
asuthr_exit(0)
(as well as handling anyuthr_exit(...)
calls that it may make!).
void uthr_exit(void *baton);
ends the execution of the current thread, arranging for thebaton
to be passed to whichever thread calls…int uthr_join(int tid, void **hand);
, which waits for the thread identified bytid
to calluthr_exit(baton)
and then arranges forbaton
to be placed in*hand
. You may, at your discretion, drop thebaton
ifhand
is0
.If the thread identified by
tid
has already beenuthr_join()
-ed (regardless of whether or not it has exited and had itsbaton
collected),uthr_join()
returns a negative number and does not mutate*hand
.The effect of
uthr_join(gettid(), ...)
is either to hang forever or to return an error (negative result). The latter is more friendly to the API user, to be sure.
uthr_create()
and uthr_exit()
will need to use the tspawn
and texit
system calls introduced above. Note that there is no
need for kernel-side support for uthr_join()
! Isn’t that interesting…
uthr_exit()
and uthr_join()
must not suffer from “the terminal
irony” whereby a failure to allocate memory could imply that a thread could
not exit or make progress through uthr_join()
(and thereby free up memory).
Because it may be a long while between uthr_join()
and uthr_exit()
,
uthr_join()
must not poll (spin until) a thread exits. The kernel
offers all the functionality required to avoid such inefficiencies. Moreover,
if the target of uthr_join()
has already uthr_exit()
-ed,
uthr_join()
must return promptly.
The main thread is permitted to call uthr_exit()
and must be
uthr_join()
-able. That is, once the main thread has called
uthr_init()
, it is just another thread as far as the library is
concerned. Note that this deviates from some platforms’ interpretation of
threading, whereby the main thread shutting down is grounds to terminate the
entire process.
All of these functions are given in skeletal form in uthr.c
.
It is expected that the userland thread library will depend on the userland
malloc() functionality; there is no need to write your own allocator for any
of this! (Though of course even here invasive data structures are possibly
worth considering!) BIG HINT: You may wish to carefully consider the
interaction of uthr
’s internal use of malloc
and free
, if any,
and the exported uthr_malloc
and uthr_free
wrappers!
Thread Exits And Stacks¶
Note that there is a tricky difficulty with free()
-ing the stack
associated with a thread that is uthr_exit()
-ing. There is a reason
that texit()
takes the parameters it does. As a big hint, you may
find yourself wishing for a uspin_release_and_texit()
or even
usem_release_and_texit()
operation. Feel free to write it/them!
Suggested Plan Of Attack¶
As I said at the beginning, this assignment has a lot of pieces. Here’s how I might (and mostly did) approach this problem:
User-level locking primitives (
uspin
orusem
API) can be designed and written before anything else, given the kernel we give out. You will probably want to look at318_usem1.c
and2
and3
too; look at318_dmr.c
for an example of usingdesch
andmkrun
.Review
tspawn
andtexit
calls. Look at318_ts1.c
for an example of their usage.Write a minimal
uthr_create
aroundtspawn
.318_tc1.c
does not depend on any otheruthr
function, and can help make sure the basics are in a good place. Modifying318_usem3.c
to useuthr_create
will also give you additional test cases (though the continued use oftexit
rather thanuthr_exit
will make them somewhat rude, but that’s OK!).Now implement
uthr_exit()
anduthr_join()
.318_tcj1.c
and318_tcj2.c
depend on the entirety of the user thread library and might be termed “basic” and “advanced” handling tests.318_main_te.c
and318_join_main.c
test the special case of the main (i.e. initial) thread exiting and being joined upon.Write your own tests! Debug! Go back and fix things!
When the deadline hits, or when you decide you’re happy with how things are, make your submission and go enjoy the outside.
In case you’re interested in line counts of my solution (just to estimate
the workload, perhaps); all sloc counts are generated using David A.
Wheeler’s SLOCCount
:
a little library of helper functions I wrote is 33 sloc long (61 literal newlines) (I’ll leave it to your imagination what might be in there.);
uthrdefs.h
is 9 (13);
uspin.c
is 19 (26);
usem.c
is 49 (64);and
uthr.c
is 188 (240).
By comparison, all the tests given add up to 401 sloc (546)!
README, Scoring, …¶
As usual, please include a README
file with discussion of your
submission and its design, including rationale for design decisions and
things you struggled with and all that other good stuff. I would
appreciate, though you do not need to include, opinions of the assignment; I
promise not to dock points if you tell me the assignment stinks. ;)
The usem
locking primitives will be worth about 20% of the grade, about
evenly divided between functionality when subjected to tests and manual
review. The uthr
library will be roughly 60%, again about evenly
divided. The remaining 20% is, of course, for the README.
Good luck!¶
One last bit of advice: if something really isn’t working, get up and stretch. Go for a walk around the building. Relax. Do other work. Come back later and you’ll see things you didn’t see before.
Changelog¶
Unlike the CMU original version, this version uses
tspawn()
instead of a much more rawthread_fork()
which spawns a new thread with an identical register file, requiring the use of assembler in the analog ofuthr_create()
.Updated
texit(void)
totexit(int *where, int what)
to indicate positive possession of a thread by the kernel. This alleviates the possible-but-tricky need for userland to write things likeuspin_release_and_texit()
in assembler. With the new API,uspin_release_and_texit()
can be entirely in C. Given that this is a 1-week assignment, that seems entirely reasonable.TODO: When and if mainline xv6 takes patches to fix the process ID wraparound issue, this assignment can be revised to address the thread ID wraparound. The proper fix, I believe, is to have threads linger until joined, only
texit()
-ing once that happens.Relatedly, there is now some new prose about tids being global.
For The Curious¶
There are, of course, other threading xv6 assignments than this one. For examples,