From: David Devecsery <ddevec@gatech.edu>
Date: Mon, 18 May 2020 13:54:04 +0000 (-0400)
Subject: Forgot to add instructions...
X-Git-Url: https://git.devinivas.org/?a=commitdiff_plain;h=e98b93431327bbb18cd8c703855737e024b6698f;p=cs3210-lab1.git

Forgot to add instructions...
---

diff --git a/instructions/lab1.md b/instructions/lab1.md
new file mode 100644
index 0000000..70e002a
--- /dev/null
+++ b/instructions/lab1.md
@@ -0,0 +1,403 @@
+# Lab 1 - Getting to know Kernel Development
+
+The goal of this lab is to get you familiar with kernel development and our xv6
+environment.
+
+The lab is composed of three parts:
+ - First, you will use git to check out and build the repository.
+ - Second, you will add some debugging functionality to the xv6 kernel.
+ - Third, you will get familiar with the boot procedure of xv6 by modifying the
+   kernel to support variable memory sizes.
+
+
+## Part 1 - checking out the repository
+
+**FIXME** Much of this is likely to change as I adjust the course
+structure/autograder stuff...
+
+All of the code for this course will be distributed and turned in using the
+[git](https://www.git-scm.com) revision control system.  If you don't know git, we recommend
+looking at the
+[Git User's Manual](https://www.kernel.org/pub/software/scm/git/docs/user-manual.html).
+You may also find [this](https://eagain.net/articles/git-for-computer-scientists/)
+overview helpful.
+
+The xv6 repository we're using for this course is available on Georgia Tech's
+GitHub:
+
+**TODO INSERT LINK**
+
+
+For this lab, we will be using the lab1 branch within git.  You may switch to it
+with:
+
+```bash
+git checkout lab1
+```
+
+Once you've checked out the lab, you'll want to build it.  We're using the [cmake](cmake-homepage)
+build system this semester.  You may find more information about it
+[here](cmake-manual).
+
+For now, to build the lab, we encourage you to build in a separate build
+directory:
+
+```bash
+cd <YourLab1CheckoutDirectory>
+mkdir build
+cd build
+cmake .. -DCMAKE_BUILD_TYPE=Debug
+make
+```
+
+**NOTE: The above code uses a "Debug" build. This disables optimizations and
+adds in debug symbols.  It's much easier to work with than a "Release" build, the
+default CMAKE_BUILD_TYPE.  The autograder (more later) will run your code in
+Release mode.**
+
+
+Once you've built xv6, you may launch your new kernel.  We've provided a
+convenience script to help you launch it:
+
+```bash
+./xv6-qemu
+```
+
+This should take you to a shell prompt, where you can use your basic xv6
+commands (try typing "ls" to see what files exist in your root directory).
+
+You may terminate the qemu instance launched by the script by hitting
+CTRL-A,CTRL-X.
+
+#### What's happening under the hood
+
+Our xv6 kernel is an OS kernel; it's made to run on bare hardware.  However,
+launching it on your PC seems like a bad idea, as it would overwrite your
+existing OS.  Instead, we want to run it in a virtual environment that's much
+more friendly for testing.  We could use a classic Virtual Machine (VM)
+solution (e.g. VMWare or VirtualBox), but those are pretty heavy-weight, take a
+long time to launch, and are hard to configure.  Instead, we use qemu, a machine
+emulator.  Qemu emulates a CPU, so the program running inside it sees what
+looks like its own raw x86 CPU.  This is slow, and you'll notice that xv6
+actually runs really slowly in qemu, but it's much more convenient for debugging,
+as launching and tearing down a qemu instance is very fast!
+
+You can see the exact command used by turning on the verbose option to
+`xv6-qemu` with `-v` or `--verbose`, and all available commands with `-h`.
+
+
+## Part 2 - Modifying The Repository
+
+Now that you have the kernel launching, we're going to make some changes to the
+kernel.  The first change is adding [stack backtrace](https://en.wikipedia.org/wiki/Stack_trace)
+support.  This exercise is aimed at getting you familiar with xv6, and some
+low-level x86 behaviors and registers (including the stack).  Overall the
+quantity of code you write for it will likely be small, but for most, it will
+take a great deal of effort to write.
+
+The second modification you'll be making to the kernel is to add variable memory
+support.  That will give you experience with the xv6 bootup, a little familiarity
+with system-call-like behavior, and some assembly experience.  We'll cover that
+more in Part 3. **FIXME: Link?**
+
+### The Specification
+
+For this part of the lab, you will be modifying the xv6 kernel to add
+stack-trace support through a function named `backtrace`.  Backtrace has the
+following specification:
+
+```c
+/*
+ * Summary:
+ * Prints the stack trace of any functions run within the kernel.
+ *
+ * The format of the data printed is:
+ *    <0xaddress0> [top_of_stack_function_name]+offs1
+ *    <0xaddress1> [next_function_in_stack_name]+offs2
+ *    ...
+ *    <0xaddressN> [last_function_in_stack]+offsN
+ */
+void backtrace(void);
+```
+
+You must create a header `backtrace.h` with the declaration of the `backtrace()`
+function, such that any kernel file including `backtrace.h` may run the
+`backtrace()` function.
+
+##### The format:
+The backtrace function will print N lines, where N is the number of functions in
+the kernel's stack (excluding the backtrace function itself).  Each of those
+lines will have the following information and format (in order from left to right):
+-  Three (3) spaces to start the line
+-  The address of the next instruction to run in the stack, printed in lower-case hex, surrounded on the left by a single `<` and on the right by a single `>`
+-  A single space
+-  The name of the function on the callstack
+-  A single `+` character
+-  A decimal number representing the offset from the start of that function in bytes
+-  A newline character
+
+
+So, if my codebase had a function named `foo`, starting at address `0x200` with
+length `0x100` (i.e. ending at `0x300`), and my backtrace identified the return
+address `0x210` on the stack, I would print the line:
+```
+   <0x210> foo+16
+```
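+
+As a rough illustration of the format rules above, the sketch below builds one
+output line with the standard C `snprintf`.  The helper name
+`format_backtrace_line` and its parameters are hypothetical (they are not part
+of xv6), and inside the kernel you would print with the kernel's own console
+routine rather than the C library.
+
+```c
+#include <stdint.h>
+#include <stdio.h>
+
+/* Hypothetical helper (not part of xv6): formats one backtrace line into buf.
+ * ret_addr is the return address found on the stack, func_start is the
+ * address of the first instruction of the enclosing function.
+ * Returns the value returned by snprintf. */
+static int format_backtrace_line(char *buf, size_t len, uint32_t ret_addr,
+                                 const char *name, uint32_t func_start)
+{
+    /* three spaces, <0x + lower-case hex + >, one space, name, '+',
+     * decimal offset, newline */
+    return snprintf(buf, len, "   <0x%x> %s+%u\n",
+                    (unsigned)ret_addr, name,
+                    (unsigned)(ret_addr - func_start));
+}
+```
+
+Calling it with `ret_addr = 0x210`, `name = "foo"`, and `func_start = 0x200`
+reproduces the example line shown above.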
+
+### Some Guidance:
+
+To create this function, you'll have to handle three primary tasks:
+1.  Find the return addresses on the function stack to identify the call-chain before this function
+2.  Look up the name of the function via its stack location
+3.  Find the starting address of the function, and calculate its offset
+
+The specific details about each of these tasks are as follows:
+
+##### Stack Addresses:
+
+Function calls are managed through the stack abstraction; you should have
+learned about this extensively in CS2200 or equivalent.  If you're rusty on the
+stack, you may find references [here](TODO-stackreference).
+
+For this lab, you'll need to know how x86 handles some of these primitive stack
+operations:
+
+-  The stack pointer is maintained in the stack pointer `sp` register (`esp` in 32-bit mode)
+-  The stack grows downwards (pushing decrements the stack pointer)
+-  The `call` instruction, used to call functions, implicitly pushes the return address of the call (the address of the instruction after the call) onto the stack (i.e. immediately after `call`, `esp` points to the return address used by the function)
+-  The `ret` instruction pops the top of the stack and jumps to that address (basically undoing a call)
+-  The base-pointer register (`bp`, or `ebp` in 32-bit mode) points to the beginning of the stack frame for this function
+-  The stack pointer points to the last pushed value (i.e. `push` decrements the stack pointer, then writes to the location it points to, and `pop` reads the value at `esp`, then increments it)
+-  By convention, when a function is first called, the base-pointer is written to the stack, and the stack-pointer is transferred into the base-pointer.
+
+This forces the stack layout to appear as follows:
+
+```
+---------------------
+|        ...        |
+---------------------
+|  Return Address 0 |  <-  First frame on the stack [0]
+---------------------
+|  Base Pointer 0   |  <- Not defined for first frame
+---------------------
+|        ...        |
+---------------------
+|  Local Variables  |
+---------------------
+|        ...        |
+---------------------
+|  Return Address 1 |  <-  Return address of the 2nd frame
+---------------------
+|  Base Pointer 1   |  <- Points to Base Pointer 0
+---------------------
+|        ...        |
+---------------------
+|  Local Variables  |
+---------------------
+|        ...        |
+|        ...        |
+|        ...        |
+|        ...        |
+---------------------
+|  Return Address N |  <-  Return address of the last frame [N]
+---------------------
+|  Base Pointer N   |  <- Points to Base Pointer N-1 <= $ebp
+---------------------
+|        ...        |
+---------------------
+|  Local Variables  |
+---------------------
+|        ...        |
+--------------------- <= $esp
+
+Current registers:
+
+$esp: Top of current stack
+$ebp: Base Pointer N
+```
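+
+The chain of saved base pointers in the diagram can be followed like a linked
+list.  The sketch below shows that idea on plain C data so it can be tried
+outside the kernel.  The `struct frame` type and `collect_return_addrs` helper
+are hypothetical names, and the code assumes the outermost frame's saved base
+pointer is `NULL`, which the diagram does not guarantee; the `max` bound is
+there as a safety net.  In the kernel, the starting pointer would come from
+the current value of `ebp`.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* The two words the calling convention leaves at the base pointer:
+ * the caller's saved base pointer, with the return address just above it. */
+struct frame {
+    struct frame *saved_bp;  /* word at [ebp]: caller's base pointer */
+    uintptr_t     ret_addr;  /* next word up the stack: return address */
+};
+
+/* Hypothetical helper: follows the saved-base-pointer chain starting at bp
+ * and stores up to max return addresses in out, innermost call first.
+ * ASSUMPTION: the outermost frame's saved base pointer is NULL. */
+static size_t collect_return_addrs(const struct frame *bp,
+                                   uintptr_t *out, size_t max)
+{
+    size_t n = 0;
+    while (bp != NULL && n < max) {
+        out[n++] = bp->ret_addr;
+        bp = bp->saved_bp;   /* step to the caller's frame */
+    }
+    return n;
+}
+```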
+
+#### Symbol Information:
+
+Getting the name of the running function within the kernel is another huge
+challenge.  Once I know the address of the function, how can I determine what
+function was running?  If I were on a normal desktop, I could try to figure it
+out from the binary file, but do I necessarily have a filesystem on all phases
+of kernel boot?
+
+Instead, the kernel has information embedded in its address space giving us some
+information about the running symbols (e.g. the names of the functions that are
+running, and the mapping from address to function name); we just need to parse
+this information out of the address space.
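+
+To make the address-to-name mapping concrete, here is a minimal sketch that
+assumes the symbol information has already been turned into an array sorted by
+start address.  The `struct sym` type and `find_symbol` function are
+hypothetical and are not the interface of the provided STAB library.  The
+enclosing function is the one with the greatest start address that does not
+exceed the address being looked up.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hypothetical symbol record; real data would come from the STAB section. */
+struct sym {
+    uint32_t    start;  /* address of the function's first instruction */
+    const char *name;
+};
+
+/* Returns the symbol with the greatest start address <= addr, or NULL if
+ * addr lies below every symbol.
+ * ASSUMPTION: syms is sorted by ascending start address. */
+static const struct sym *find_symbol(const struct sym *syms, size_t n,
+                                     uint32_t addr)
+{
+    const struct sym *best = NULL;
+    size_t lo = 0, hi = n;
+    while (lo < hi) {                      /* binary search */
+        size_t mid = lo + (hi - lo) / 2;
+        if (syms[mid].start <= addr) {
+            best = &syms[mid];             /* candidate; look further right */
+            lo = mid + 1;
+        } else {
+            hi = mid;
+        }
+    }
+    return best;
+}
+```
+
+The offset printed by `backtrace` is then the looked-up address minus the
+`start` field of the returned symbol.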
+
+##### STAB information
+
+The kernel has embedded in its address space [STAB information](TODO-stab-link).  This information
+is placed at special symbols called `todo-symname` and `todo-other-symname`.
+You can learn more about how we do this by exploring [linker
+scripts](TODO-linker-scripts), and looking at ours in `kernel/kernel.ld`.
+
+Having you write a library that parses the STAB information would be a little
+too tedious for a class project, so we have instead provided you with a STAB
+library in the file `stab-file.c`.  The library includes the following relevant
+functions:
+
+```c
+!!!Function definitions here!!!
+```
+
+
+## Part 3 - Modifying boot.
+
+The final part of this lab will involve modifying the boot system of the kernel
+to detect the amount of available RAM on the machine, then passing and utilizing
+that information within the xv6 kernel.
+
+First, some preliminaries:
+
+### Background Information
+
+Your computer has many essential low-level components, such as your RAM
+controller, power supply, and other peripherals.  These components are fixed on
+the motherboard and generally don't change.  They must also be set up properly,
+in a device specific way before you can run any general purpose code on your
+processor.  Think about it, how will your kernel load itself off the disk before
+the disk driver is set up?
+
+This problem, of bringing up the essential low-level devices on the machine and
+loading your kernel into memory is what we'll call boot-loading.  Let's now
+discuss the three phases of bootloading.
+
+#### BIOS
+
+To help enable the initial configuration and setup of these devices,
+motherboards come with a BIOS (now commonly replaced by the more complex EFI
+protocol), which sets up these devices, loads a single block of data from disk,
+and provides support for additional hardware device queries.  When you first
+launch xv6 (if you launch with gdb attached), you'll notice that the processor
+starts at address 0xfff0, then jumps (if you use `si`, the step-instruction
+command, to advance to the next instruction) to address 0xe05b in the BIOS.
+
+The low-level details of what the BIOS does are hardware specific and not
+relevant for this course (if you're really interested, you can look at xv6's
+[BIOS source](TODO-sea-bios)).  What we care about for this course is the
+handshake the BIOS makes with the actual kernel.
+
+When the BIOS finishes running, it loads a single disk block (the first block on
+disk) into the address 0x7c00, then jumps to and begins executing the code at
+that location.  This is how control is passed from the BIOS to the kernel.
+However, the kernel doesn't fit in a single disk block, so we have to provide
+one more boot layer to help load the kernel.  This layer is aptly called the
+Boot Loader.
+
+We will explore just a couple of details of the BIOS before we move onto the
+boot loader, as they are fundamental to the BIOS construction.  First, the BIOS
+isn't loaded from disk -- how could it be loaded from disk when it hasn't yet
+initialized the disk controllers?  Instead the BIOS lives on its own ROM that
+ships with your motherboard (or an emulated ROM for xv6).  This makes the BIOS
+brittle, as it cannot easily be changed, and consequently we try to move as much
+logic off of the BIOS as possible.  Additionally, the BIOS runs before the
+boot loader (and hence kernel), and therefore must leave the processor in its
+lowest configuration state (as if nothing had run), 16-bit real mode (we'll talk
+more about this later), and any BIOS functions that run must also be run from
+16-bit real mode.
+
+#### Boot Loader (bootblock)
+
+The boot loader is the first easily modifiable code that runs (it runs from the
+first disk block), so it can be more complex and flexible than the BIOS.
+However, the boot loader also operates in a constrained environment.  It starts
+in 16-bit real mode (a mode of the x86 processor with 16-bit registers, and
+physical addressing), with at most 512 bytes (1 disk block) of code.
+
+The goal of the boot block is to set up the CPU for the kernel, then load the
+kernel and pass it control (e.g. jump to the kernel's entry point).  You can
+find the code for our boot loader in the `bootblock` directory.  The entry point
+from the BIOS is `start` found in `bootblock/bootasm.S`.  Once the bootloader
+has switched the processor into 32-bit mode, initialized the disk, and loaded the
+kernel into memory, it jumps into the kernel at the end of `bootmain` in
+`bootblock/bootmain.c`.
+
+Modern bootloaders perform much more complex tasks than our bootblock;
+[GRUB](TODO-grub) is an example of a modern boot loader.
+
+#### Kernel Startup
+
+The kernel goes through a long, arduous boot process, beginning at `entry` in
+`kernel/src/asm/entry.S`, and continuing through much of the kernel's `main` in
+`kernel/src/main.c`.  This process will be relevant throughout the course, and
+we won't cover it in too much detail here.
+
+### The Assignment
+
+Currently, the kernel assumes it has `PHYSTOP` memory (defined in
+`include/memlayout.h` as `0xE000000`).  This is a static memory assumption, so
+regardless of what the attached machine has, the kernel will use exactly
+`PHYSTOP` bytes of RAM.  This could be an issue in two ways: (1) the machine has
+less than `PHYSTOP` memory, and the kernel assumes it has memory that doesn't
+exist, or (2) the machine has much more memory, but the kernel cannot allow
+user-space to utilize it.
+
+In this part of the lab, we're going to call into the BIOS, and have it tell us
+how much memory is available, then we'll pass this information into our kernel
+proper.  However, as you'll see in a moment, we have to call the BIOS from the
+bootblock, in 16-bit real mode, and pass the information to the kernel from
+there.
+
+This part of your assignment has three sub-tasks:
+1.  Get the available RAM from the bootblock
+2.  Make that information available in the kernel
+3.  Have the kernel read that information, and initialize its free memory structure based on the available RAM instead of an arbitrary `PHYSTOP`.
+
+#### The BIOS Call.
+
+As the BIOS is responsible for initializing and configuring the RAM controller,
+it is our definitive source on how much RAM the machine has, and the logical
+place to ask about allocated memory.  However, our bootloader cannot simply call into the
+BIOS (then our bootloader would be BIOS dependent, and we don't want that);
+instead the BIOS exports an interface much like the system-call interface.
+We'll be making a BIOS call by interrupting and passing control into the BIOS
+through the `int` instruction.  Once this is done, the BIOS will pass us the
+information needed, and return.
+
+Recall that the BIOS runs before the boot loader, and must run in 16-bit
+real-mode.  As a result, we must make our BIOS call from code running in 16-bit
+real-mode.  Unfortunately, once the x86 CPU transitions into 32-bit mode, it
+cannot easily revert to real-mode.  This means we must make our BIOS call before the
+CPU switches to 32-bit mode, in the bootloader.
+
+
+As the BIOS call is tedious and complex, we won't require you to actually write
+the assembly for the call itself; we've provided that method, detailed below.
+Your job is to place the call at the appropriate location, and handle putting
+that memory information in a location the kernel can read from.
+
+```asm
+```
+
+#### The Kernel.
+
+Once you have the memory information passed to the kernel, your job within the kernel is
+to make it aware of the amount and location of physical memory, and to enable the
+kernel to use that physical memory from its `kalloc` function.  After you have
+completed this lab, the kernel should be able to allocate every physical page
+that is not statically allocated from the `kalloc` function, and `kalloc` should never
+return a physical page that isn't present on the system.
+
+We encourage you to familiarize yourself with the kernel's `kinit` functions,
+and physical memory allocation functions, as they will likely be useful in
+completing this exercise.
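+
+One detail worth getting right is page alignment: only whole pages that lie
+entirely between the end of the statically allocated region and the top of
+detected memory can be handed out.  The sketch below illustrates just that
+arithmetic.  The `count_free_pages` helper and its parameter names are
+hypothetical, and the page-size macros are defined locally (modeled on the
+usual xv6 4096-byte page) so the example stands on its own.
+
+```c
+#include <stdint.h>
+
+/* Defined locally for the sketch; ASSUMES 4096-byte pages. */
+#define PGSIZE 4096u
+#define PGROUNDUP(a)   (((a) + PGSIZE - 1) & ~(PGSIZE - 1))
+#define PGROUNDDOWN(a) ((a) & ~(PGSIZE - 1))
+
+/* Hypothetical helper: number of whole pages in [first_free, mem_top).
+ * first_free is the first address past the statically allocated region,
+ * mem_top is the first address past the detected RAM. */
+static uint32_t count_free_pages(uint32_t first_free, uint32_t mem_top)
+{
+    uint32_t start = PGROUNDUP(first_free);   /* first fully free page */
+    uint32_t end   = PGROUNDDOWN(mem_top);    /* end of last whole page */
+    return end > start ? (end - start) / PGSIZE : 0;
+}
+```
+
+A loop that frees pages would use the same `start` and `end` bounds, stepping
+by `PGSIZE`, so that no page beyond the detected memory is ever added to the
+free list.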
+
+
+
+## Grading
+
+As with all labs in this course, the lab has an associated autograder.  The
+policies and rules of the autograder may be found on the class
+[syllabus](TODO-syllabus).  You will submit your code to the auto-grader for
+auto-grading.  There will also be a hand graded portion of this lab, worth 15%
+of your lab grade.  Finally, this is our only *individual* lab; you cannot
+collaborate or share code with others (although discussion is allowed).  Your
+code will be checked for cheating, and any detection of shared code, or pulling
+code from the internet will be harshly punished.
+
+TODO: Actual submission instructions...  Once I figure them out.
+