This is a modified version of an assignment that ran first in Fall 2015 at JHU. See http://gaming.jhu.edu/~phf/2015/fall/cs318/assignment-uthreads.shtml and the Changelog below. If you are interested in using this assignment in your course, please contact me for reference implementations and test cases!

The second half of this assignment is Assignments for an xv6 OS Class: Kernel Thread Support.

Overview ¶

This assignment is entirely in userspace. We provide for you a compiled, modified xv6 kernel (without sources) for you to run against. Despite that, this is a potentially big assignment!

This is almost entirely taken from Carnegie Mellon’s Operating Systems class. This assignment has been modified from its original version. It has been reformatted to fit this semester.

Words of Wisdom: First, please start early! Second, if you’re working as a pair, please work together and don’t split up the tasks, you’ll get yourself into trouble that way! Third, please make minimal changes to xv6; if you make it hard for us to grade, we’ll dock points.

What Are Threads?¶

Threads generalize the notion of an xv6 process. An xv6 process is, at once, several different things:

A description of an address space (which virtual addresses map to which physical addresses?)
A description of file descriptors (which fds are open? What files are they pointing to?)
A description of a CPU, sometimes called a “register file”: what are the contents of each (user-level) CPU register right now?

A thread is the name we give to just the last of these things. Obviously we need the other things when we’re running instructions on xv6, so threads are said to be in a process: that is, a process may have one or more threads. Every xv6 process you have ever created to date has had exactly one. Let’s change that!

Changes To The xv6 Kernel ¶

These changes have been made for you, though you do not have the source of the modified kernel. It will, however, be beneficial for you to think about how the kernel might offer this functionality!

Thread Identifiers ¶

Since threads are things managed by the kernel, we need some way of identifying a particular thread, just like we identify a particular open file or a particular process.

For this assignment, we put process and thread identifiers in the same name-space: the process identifier as returned by fork() will also be the thread identifier of the first thread in that process.

In particular, thread identifiers, like process identifiers, are globally unique. That is, every thread on the system will have a unique thread identifier, just as every process on the system has a unique process identifier. The first thread in a process will have equal thread and process identifiers; subsequent threads spawned in a process will not.

Scheduling ¶

void yield(int tid) will invoke the scheduler again with a hint that it would be good for tid to be on core soon (ideally, now), giving up the remainder of this thread’s timeslice. Note that if there are no other running threads, the kernel may return to this one without context switching. If tid is negative, the kernel will simply run the next RUNNABLE thread.
void desch(int *guard) will put the current thread to sleep (that is, suspend its execution) if *guard is nonzero. If the process exit()-s or is subject to kill() while this thread remains so suspended, this thread will be destroyed, too.
int mkrun(int tid) should awaken a thread that is sleeping in desch(). If there is no such thread, it will return a negative number; otherwise, it will return 0.

A thread’s invocation of desch() is atomic with respect to any mkrun() targeting that thread. That is, the decision to go to deschedule happens entirely before or entirely after any decision to wake up; dually, the decision to wake up happens entirely before or entirely after any decision to deschedule.

Creating A New Thread ¶

We defined a new system call, int tspawn(void *stktop, void (*f)(void *), void *a) which, creatively enough, spawns a new thread running the function f with argument a with stack built (downwards, as this is x86, not, for example, some ARM systems) at stktop. stktop must point to (or immediately above, at least) previously allocated memory; the kernel will need two words to build the initial call frame for f, but there should, in general, be more space available for the use of the function f. tspawn returns the thread identifier of the spawned thread, or a negative number if it fails.

Note

As with process identifiers on xv6, thread identifiers are monotonically increasing and we do not concern ourselves with wrap-around. This is something of an embarrassing situation which can be blushingly acknowledged with “educational OS”.

The kernel we ship you supports NTHR = 256 threads, distributed in any way between the up-to-NPROC (i.e. 64) processes. Your submission should, ideally, not depend upon either of these numbers (and especially not as numbers).

Thread Exits ¶

The new system call, void texit(int *where, int what), is sort of like exit() except that it simply stops the execution of the current thread after writing what to where (with release atomic semantics). If (and only if) there are no other threads in the process, then texit() will behave like exit() does in xv6: it will result in the process being torn down and its parent process being notified. If there are other threads, then only the invoking thread will be terminated. Put another way, wait() will only return a process identifier when all threads in that process have exited. wait() will never return a thread identifier that isn’t also a process identifier! (This should make sense if you think about about fork() and wait() are supposed to work together.)

Note that texit() is permitted to return if where is not a valid pointer. This is the only case in which texit() may return.

If, on the other hand, any thread in a process calls exit(), all threads in that process will be terminated (or scheduled for termination prior to their next return to userland). That is, texit() is the polite “I’m done, may I be excused.” while exit() is the table-flipping “We’re done here!” ((╯°□°)╯︵ ┻━┻).

Similarly, when a process identifier is kill()-ed, all threads in that process will be terminated (or scheduled for termination). It is an error to call kill() with a thread identifier that is not also a process identifier.

Miscellany ¶

The new system call int gettid(void) will return the invoking thread’s identifier. The existing int getpid(void) will return the process identifier (i.e. the thread identifier of the first thread that was in this process, even if that thread has subsequently exited) whenever any thread therein asks.
File descriptors are shared by all threads within a process, so one could imagine building a multi-threaded application that worked with files.
fork() will fail if there are multiple threads in the invoking process, as it is not clear what should happen otherwise. (For the very curious, POSIX introduced an atfork() mechanism to help with these issues.)
exec() will fail if there are multiple threads in the invoking process. While it is clear what to do here, I do not imagine we will need the functionality and so would rather not implement it. ;)

Part 1: Study, Design, Log (20%)¶

There’s not a whole lot of material to study, but do carefully review the diffs we have made to the user-visible xv6 header files. You can do this however you want, but git log -p may be of utility.

I would suggest that you read this entire assignment at least twice before writing any part of it.

Please keep a detailed log in README, including your initial plans and what actually happened. :)

Part 2: usem: Better Userspace Synchronization ¶

Port Spinlocks ¶

Based on the in-kernel spinlocks, please write:

struct uspin { ... }. Up to you what goes in the .... This lives in uthrdefs.h.
int uspin_init(struct uspin *sl); should get a spinlock ready (in its unlocked configuration, of course). Should getting one ready prove impossible, this function should return a negative number; otherwise, it should return 0.
void uspin_destroy(struct uspin *sl); should free any resources required by uspin_init. You do not need to handle the case when there are threads making use of this spinlock and someone calls uspin_destroy. By that, I mean, the onus is on the user of the API to make sure that doesn’t happen.
void uspin_acquire(struct uspin *sl); will acquire the lock. That is, upon return, it will have arranged to prevent the return of any other thread’s uspin_acquire(sl) calls until any thread calls…
void uspin_release(struct uspin *sl); drops the lock. After this call, another uspin_acquire(sl) can complete. You do not need to handle the case when a thread calls uspin_release() on an already-released spinlock. By that, I mean, the onus is on the user of the API to make sure that doesn’t happen.

Calling uspin_acquire(sl) on an already acquired spinlock sl should never return unless some other thread calls uspin_release(sl). Note that it may be a different thread that releases.

uspin_acquire(sl) must not use while(condition){;} to burn CPU time, but should instead use yield();. You’ll be happier, trust me, and I’ll be happier too.

All of these functions are given in skeletal form in uspin.c.

Semaphores ¶

In addition to spinlocks, we’ll define a thing called a “counting semaphore”. A counting semaphore is, essentially, a counter that refuses to go negative. If an attempt to decrement it would make it negative, that attempt will wait for the number to be big enough so that the decrement yields a non-negative value. Another way of thinking about this: there are N instances of some resource available; the semaphore’s job is to make sure that there are only ever at most N instances claimed for use.

As before, you will need to fill out a structure definition:

struct usem { ... }. Your job to fill in the ...; whatever needs to go in there should go in there. This lives in uthrdefs.h.

The steady-state calls are rather like uspin:

void usem_acquire(struct usem *sem); decrements the counter, if possible, or blocks until such time as the decrement is possible. Blocking is to be done by desch() rather than yield(), sleep(), or any other mechanism, for full credit.
void usem_release(struct usem *sem); increments the counter and notifies at least one sleeping thread of this condition.

The constructor and destructor functions are:

int usem_init(struct usem *sem, unsigned int initial_count); should initialize a semaphore into a state so that initial_count-many usem_acquire(sem) calls would succeed without waiting and the next would block. usem_init() should reject any initial count that is greater than INT_MAX (i.e. 2147483647, as defined in limits.h). If getting a semaphore ready fails, this call should return a negative number; otherwise, it should return 0.
void usem_destroy(struct usem *sem); should destroy the semaphore (that is, release whatever resources were required for its construction). If there are threads sleeping on the semaphore, their fate is up to you, but be prepared to defend your decision. (Extreme policies are fair game!)

The implementation need not be prepared to deal with the internal counter going above INT_MAX; that is, it is, again, on the user to avoid that.

Note that we anticipate semaphores using spinlocks internally!

STOP AND CONSIDER: The use of desch() poses an interesting challenge: in usem_acquire(), one may need to atomically unlock and deschedule. To be concrete: one may be holding a lock to guard the counter, observe that the counter is in a state where usem_acquire() must desch(), and need to drop the lock before desch()-ing (since holding the lock while asleep is unlikely to work out well!). But! After dropping the lock but before usem_acquire() calls desch(), another thread could usem_release(), invalidating the observation made of the counter! In other words, “Why does desch() take an argument, anyway?” In tabular form, the case to consider is something like

Thread 1	Thread 2
Enter `usem_acquire()`
Lock internals
Observe counter too low
Drop lock
	Enter `usem_release()`
	Lock, increment, unlock
	Leave `usem_release()`
`desch` anyway? `:(`

A full-credit implementation of usem will, in addition to being correct as per the above API, be fair in a FIFO ordering sense:

If two threads, T1 and T2, call usem_acquire() on a usem whose counter is currently 0, with T1 acquiring before T2, then the next usem_release() will wake T1 and not T2. That is, the intended behavior, assuming the semaphore is initialized to 0, is:

Thread 1

Thread 2

Thread 3

usem_acquire() desch

usem_acquire() desch

usem_release()

usem_acquire() finishes
Moreover, in the above scenario, another thread calling usem_acquire() on this spinlock, even if after the usem_release() call indicated, will only return after T2 returns. That is, this outcome, again assuming that the semaphore is initialized to 0, is to be avoided:

Thread 1

Thread 2

Thread 3

usem_acquire() desch

usem_release()

Call usem_acquire()

usem_acquire() returns

All of these functions are given in skeletal form in usem.c.

Part 3: uthr: Thread Library Functions ¶

uthr provides functionality akin to (a small subset of) the POSIX threads library functions pthread_create, pthread_exit, and pthread_join on a real UNIX system. It defines the following functions:

int uthr_init(unsigned int stack_size); initializes the thread library and sets the stack size of all created threads to stack_size bytes. Unfortunately, despite the symmetry of threads in general, the library as we are designing it is unable to ensure that the main thread (i.e. the caller of thr_init) will have a stack of stack_size bytes, as its stack comes from the xv6 kernel.
void *uthr_malloc(int size); is a thread-safe wrapper around the existing umalloc malloc() function.
void uthr_free(void *what); is a thread-safe wrapper around free(), by analogy.

Because of the way umalloc is written, at most one thread per process can safely be manipulating its data structures; it is the job of these wrapper functions to ensure that this holds!
int uthr_create(void (*f)(void *), void *arg); creates a new thread, with its own stack, running the code f(arg). Returns to the parent the thread identifier of this thread, or a negative number on error (in which case, no thread was created).
- Undergraduates: it is OK if you wish to require f to call uthr_exit(...), just like xv6 requires our program to call exit.
- Graduates: your submission must interpret a return from f as uthr_exit(0) (as well as handling any uthr_exit(...) calls that it may make!).
void uthr_exit(void *baton); ends the execution of the current thread, arranging for the baton to be passed to whichever thread calls…
int uthr_join(int tid, void **hand);, which waits for the thread identified by tid to call uthr_exit(baton) and then arranges for baton to be placed in *hand. You may, at your discretion, drop the baton if hand is 0.

If the thread identified by tid has already been uthr_join()-ed (regardless of whether or not it has exited and had its baton collected), uthr_join() returns a negative number and does not mutate *hand.

The effect of uthr_join(gettid(), ...) is either to hang forever or to return an error (negative result). The latter is more friendly to the API user, to be sure.

uthr_create() and uthr_exit() will need to use the tspawn and texit system calls introduced above. Note that there is no need for kernel-side support for uthr_join()! Isn’t that interesting…

uthr_exit() and uthr_join() must not suffer from “the terminal irony” whereby a failure to allocate memory could imply that a thread could not exit or make progress through uthr_join() (and thereby free up memory).

Because it may be a long while between uthr_join() and uthr_exit(), uthr_join() must not poll (spin until) a thread exits. The kernel offers all the functionality required to avoid such inefficiencies. Moreover, if the target of uthr_join() has already uthr_exit()-ed, uthr_join() must return promptly.

The main thread is permitted to call uthr_exit() and must be uthr_join()-able. That is, once the main thread has called uthr_init(), it is just another thread as far as the library is concerned. Note that this deviates from some platforms’ interpretation of threading, whereby the main thread shutting down is grounds to terminate the entire process.

All of these functions are given in skeletal form in uthr.c.

It is expected that the userland thread library will depend on the userland malloc() functionality; there is no need to write your own allocator for any of this! (Though of course even here invasive data structures are possibly worth considering!) BIG HINT: You may wish to carefully consider the interaction of uthr’s internal use of malloc and free, if any, and the exported uthr_malloc and uthr_free wrappers!

Thread Exits And Stacks ¶

Note that there is a tricky difficulty with free()-ing the stack associated with a thread that is uthr_exit()-ing. There is a reason that texit() takes the parameters it does. As a big hint, you may find yourself wishing for a uspin_release_and_texit() or even usem_release_and_texit() operation. Feel free to write it/them!

Suggested Plan Of Attack ¶

As I said at the beginning, this assignment has a lot of pieces. Here’s how I might (and mostly did) approach this problem:

User-level locking primitives (uspin or usem API) can be designed and written before anything else, given the kernel we give out. You will probably want to look at 318_usem1.c and 2 and 3 too; look at 318_dmr.c for an example of using desch and mkrun.
Review tspawn and texit calls. Look at 318_ts1.c for an example of their usage.
Write a minimal uthr_create around tspawn. 318_tc1.c does not depend on any other uthr function, and can help make sure the basics are in a good place. Modifying 318_usem3.c to use uthr_create will also give you additional test cases (though the continued use of texit rather than uthr_exit will make them somewhat rude, but that’s OK!).
Now implement uthr_exit() and uthr_join(). 318_tcj1.c and 318_tcj2.c depend on the entirety of the user thread library and might be termed “basic” and “advanced” handling tests.
318_main_te.c and 318_join_main.c test the special case of the main (i.e. initial) thread exiting and being joined upon.
Write your own tests! Debug! Go back and fix things!
When the deadline hits, or when you decide you’re happy with how things are, make your submission and go enjoy the outside.

In case you’re interested in line counts of my solution (just to estimate the workload, perhaps); all sloc counts are generated using David A. Wheeler’s SLOCCount:

a little library of helper functions I wrote is 33 sloc long (61 literal newlines) (I’ll leave it to your imagination what might be in there.);

uthrdefs.h is 9 (13);

uspin.c is 19 (26);

usem.c is 49 (64);

and uthr.c is 188 (240).

By comparison, all the tests given add up to 401 sloc (546)!

README, Scoring, …¶

As usual, please include a README file with discussion of your submission and its design, including rationale for design decisions and things you struggled with and all that other good stuff. I would appreciate, though you do not need to include, opinions of the assignment; I promise not to dock points if you tell me the assignment stinks. ;)

The usem locking primitives will be worth about 20% of the grade, about evenly divided between functionality when subjected to tests and manual review. The uthr library will be roughly 60%, again about evenly divided. The remaining 20% is, of course, for the README.

Good luck!¶

One last bit of advice: if something really isn’t working, get up and stretch. Go for a walk around the building. Relax. Do other work. Come back later and you’ll see things you didn’t see before.

Changelog ¶

Unlike the CMU original version, this version uses tspawn() instead of a much more raw thread_fork() which spawns a new thread with an identical register file, requiring the use of assembler in the analog of uthr_create().
Updated texit(void) to texit(int *where, int what) to indicate positive possession of a thread by the kernel. This alleviates the possible-but-tricky need for userland to write things like uspin_release_and_texit() in assembler. With the new API, uspin_release_and_texit() can be entirely in C. Given that this is a 1-week assignment, that seems entirely reasonable.
TODO: When and if mainline xv6 takes patches to fix the process ID wraparound issue, this assignment can be revised to address the thread ID wraparound. The proper fix, I believe, is to have threads linger until joined, only texit()-ing once that happens.

Relatedly, there is now some new prose about tids being global.

For The Curious ¶

There are, of course, other threading xv6 assignments than this one. For examples,

nwf's Ramblings

Assignments for an xv6 OS Class: Miniature pthreads Library

Overview ¶

What Are Threads?¶

Changes To The xv6 Kernel ¶

Thread Identifiers ¶

Scheduling ¶

Creating A New Thread ¶

Thread Exits ¶

Miscellany ¶

Part 1: Study, Design, Log (20%)¶

Part 2: usem: Better Userspace Synchronization ¶

Port Spinlocks ¶

Semaphores ¶

Part 3: uthr: Thread Library Functions ¶

Thread Exits And Stacks ¶

Suggested Plan Of Attack ¶

README, Scoring, …¶

Good luck!¶

Changelog ¶

For The Curious ¶

Thread 1	Thread 2	Thread 3
`usem_acquire()` `desch`
	`usem_acquire()` `desch`
		`usem_release()`
`usem_acquire()` finishes

Thread 1	Thread 2	Thread 3
`usem_acquire()` `desch`
		`usem_release()`
	Call `usem_acquire()`
	`usem_acquire()` returns