Lab Guide: Real Time Operating System

The RTOS API is fully documented in os.h, so I won’t cover it.  For now this page will stick to the internals in os.c that will be useful to understand when you implement your RTOS in project 3.

The central feature of every RTOS is its task scheduler.  The scheduler is what makes the RTOS real-time.  As such, the RTOS we use (which I refer to as “the RTOS” in this page) is pared down to the point where it’s really only a glorified scheduler, but it’s still useful as an illustrative tool.  When we create a project with the RTOS we define a series of tasks that are given different priorities–system, periodic, and round-robin–as described in os.h.  Each task has its own stack, and statically allocated memory can be shared without restriction.  A separate RTOS kernel uses the global stack.  The memory layout looks something like this (address 0x0000 corresponds to the bottom of the diagram):

[Figure: Memory layout in the RTOS]

N.b. the six task workspaces shown are part of the .bss segment of memory (there might be other .bss stuff above the workspaces).  As always, we turn up our noses at dynamic memory allocation on the heap.

Entering and Exiting the Kernel

The kernel is entered by performing a context switch from the currently-running task’s execution context into the kernel’s execution context.  This happens under two circumstances.  First, the RTOS timer ticks every 5 ms and when it ticks it jumps into the kernel and decides which task should be run.  Second, most of the API calls function by setting a kernel request and switching into the kernel to process the request.
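
To make the “kernel request” idea concrete, here is a minimal host-side sketch in plain C.  It is not the code from os.c: every toy_ name and the request codes are hypothetical, and the context switch is replaced by an ordinary function call.  It only shows the pattern where the tick and an API call (Task_Next is used as the example) both record a request and then hand control to the kernel to process it.

```c
#include <assert.h>

/* Hypothetical request codes; the real ones are defined by the RTOS. */
typedef enum { TOY_REQ_NONE, TOY_REQ_TICK, TOY_REQ_TASK_NEXT } toy_request_t;

toy_request_t toy_pending = TOY_REQ_NONE; /* request waiting for the kernel */
unsigned toy_ticks = 0;                   /* ticks the kernel has seen */
unsigned toy_reschedules = 0;             /* times the scheduler was run */

/* Stand-in for one pass of the kernel main loop: process one request. */
void toy_kernel_handle_request(void)
{
    switch (toy_pending) {
    case TOY_REQ_TICK:
        toy_ticks++;
        toy_reschedules++;
        break;
    case TOY_REQ_TASK_NEXT:
        toy_reschedules++;
        break;
    case TOY_REQ_NONE:
        break;
    }
    toy_pending = TOY_REQ_NONE;
}

/* Models the timer tick: set the request, then enter the kernel. */
void toy_timer_tick(void)
{
    assert(toy_pending == TOY_REQ_NONE);
    toy_pending = TOY_REQ_TICK;
    toy_kernel_handle_request(); /* a real RTOS switches context here */
}

/* Models an API call such as Task_Next. */
void toy_task_next(void)
{
    assert(toy_pending == TOY_REQ_NONE);
    toy_pending = TOY_REQ_TASK_NEXT;
    toy_kernel_handle_request(); /* a real RTOS switches context here */
}
```

In the real RTOS the direct call to the handler is what enter_kernel replaces: the request variable is the only thing the task and the kernel need to agree on.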

API calls switch into the kernel and back using the enter_kernel and exit_kernel functions.  If you look at those function signatures at the top of the os.c file you’ll notice that they’re declared with the attributes noinline and naked.  The noinline attribute tells the compiler that it should always call the function using the call instruction instead of inserting it inline with the caller.  The naked attribute tells the compiler not to insert implicit prologue and epilogue code into the function’s assembly code.  An example of prologue code is the code that the compiler inserts at the beginning of an ISR to push the MCU context (status register, general purpose registers, etc.) onto the stack; the ret and reti return instructions are examples of epilogue code.

The enter_kernel function operates like this:

  1. Push the current task’s context onto its stack
  2. Save the SP register–that is, the current task’s stack pointer–in the current task descriptor
  3. Load the kernel stack pointer from the kernel_sp variable into the SP register
  4. Pop the kernel’s context from the kernel stack
  5. Generate a ret instruction (since the function is naked) to return into the kernel main loop.

The exit_kernel function is pretty much the reverse:

  1. Push the kernel context back onto the kernel stack
  2. Save the SP register to the kernel_sp variable
  3. Load the stack pointer from the currently-selected task into the SP register.  The currently-selected task is the caller task if the kernel was entered via an API call, or perhaps a new task that was selected by the scheduler if the kernel was entered via the tick interrupt (or the Task_Next API call)
  4. Pop the selected task’s context from its stack
  5. Generate a ret instruction to return into the task.

Step 5 indicates why the functions need to have the noinline attribute: the call instruction pushes onto the stack the address of the next instruction after call.  That address is part of the call frame.  The ret instruction pops that address back off the stack and puts it into the program counter (PC).  If the enter_kernel and exit_kernel functions are inlined, then there’s no call frame on the stack.  Let’s zoom out and look at how an API call from a task works:

  1. The API function sets the kernel request and parameters
  2. The API function calls the enter_kernel function.  The address of the instruction that follows the enter_kernel call gets pushed onto the task’s stack and the enter_kernel function switches context to the kernel
  3. When the enter_kernel function returns, the return address that was stored by the last call to exit_kernel gets popped off the kernel stack into the PC and thus the kernel is entered
  4. When the kernel is done processing the request, it calls the exit_kernel function, which pushes the address of the instruction that follows the exit_kernel call onto the kernel stack (whence comes the instruction address popped off the kernel stack into the PC in step 3)
  5. The exit_kernel function switches back to the task context, and when the exit_kernel function returns the address that was previously pushed onto the task’s stack by the task’s last call to enter_kernel (or the tick interrupt) gets popped off into the PC, and the task continues from after its previous enter_kernel function call.

So the functions must be noinline because if there’s no call frame on the stack, the exit_kernel function can’t return into the task.  Got it?  Good!
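
If the ping-pong of return addresses is hard to picture, the following toy model in plain C may help.  It is a sketch under heavy assumptions, not the RTOS code: the stacks hold only return addresses (no registers), addresses are made-up 16-bit numbers, and all of the toy_ names are hypothetical.  Each function models the call instruction pushing a return address onto one stack, followed by ret popping a different return address off the other stack.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DEPTH 8

/* Toy stack that holds only return addresses. */
typedef struct {
    uint16_t slots[TOY_DEPTH];
    int top; /* number of used slots */
} toy_stack_t;

void toy_push(toy_stack_t *s, uint16_t v)
{
    assert(s->top < TOY_DEPTH);
    s->slots[s->top++] = v;
}

uint16_t toy_pop(toy_stack_t *s)
{
    assert(s->top > 0);
    return s->slots[--s->top];
}

/* Models "call enter_kernel" followed by the ret inside enter_kernel.
   return_addr is the address of the instruction after the call.
   Returns the new PC, which is taken from the kernel stack. */
uint16_t toy_enter_kernel(toy_stack_t *task, toy_stack_t *kernel,
                          uint16_t return_addr)
{
    toy_push(task, return_addr); /* call: frame lands on the task stack */
    return toy_pop(kernel);      /* ret: resumes after last exit_kernel */
}

/* Models "call exit_kernel" followed by the ret inside exit_kernel. */
uint16_t toy_exit_kernel(toy_stack_t *kernel, toy_stack_t *task,
                         uint16_t return_addr)
{
    toy_push(kernel, return_addr); /* call: frame lands on kernel stack */
    return toy_pop(task);          /* ret: resumes after enter_kernel */
}
```

If the call were inlined there would be no toy_push step, and the matching toy_pop on the way back would pull some unrelated value off the stack into the PC.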

The Timer Interrupt

The TIMER1_COMPA_vect function is also interesting.  Its prototype is declared at the top of os.c with the signal and naked attributes.  The signal attribute means that the function is an ISR.  It gets mapped into the interrupt vector table, and normally the compiler would generate code to save and restore the context around the function body and would generate a reti (“return from interrupt”) instruction instead of ret when the function returns.  However, the function is also declared as naked, which tells the compiler not to generate any context-saving code or return instruction.  The addition of the naked attribute is why it’s declared using the prototype+implementation style instead of using the ISR macro.  (N.b. avr-libc allows extra attributes to be passed to the ISR macro.  I’m not sure why the RTOS doesn’t use that functionality: it might not have existed when the RTOS was written, the students might not have known about it, or maybe it legitimately doesn’t work.)

The timer interrupt operates much like an API call that’s automatically generated on the RTOS tick.  It stores the interrupted task’s execution context, sets the “tick” kernel request, switches to the kernel context, and then returns into the kernel.  The main difference is that the RTOS implementation has to set the I bit in the SREG value that gets pushed to the task context.  You’ll recall that the I bit is the global interrupt enable bit, so setting it ensures that global interrupts are enabled when the task starts back up.  We can do that because we know that global interrupts were enabled, otherwise the timer ISR wouldn’t have been able to run.  We have to do that because the I bit gets cleared automatically when the interrupt is taken, so when the task context was pushed onto the stack at the start of the interrupt handler the I bit was stored as 0 when it should have been stored as 1.
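
The fix-up itself is a one-line bit operation.  Here is a minimal sketch in plain C, assuming only that the I flag is bit 7 of SREG (which is the case on AVR); the macro and function names are hypothetical, and the real RTOS performs the equivalent step on the saved context during the switch.

```c
#include <stdint.h>

#define TOY_SREG_I_BIT 7u /* global interrupt enable flag position in SREG */

/* Returns the saved SREG value with the I bit forced to 1, leaving the
   other flags (carry, zero, etc.) exactly as they were captured. */
uint8_t toy_sreg_with_interrupts_enabled(uint8_t saved_sreg)
{
    return (uint8_t)(saved_sreg | (1u << TOY_SREG_I_BIT));
}
```

For example, a saved value of 0x03 (carry and zero set, I cleared by the hardware) becomes 0x83, so the task resumes with the same arithmetic flags and with interrupts enabled.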

Finally, the timer interrupt’s naked attribute allows us to use the ret instruction instead of reti when the function returns into the kernel.  The difference between the ret and reti instructions is that the reti instruction sets the I bit when it executes.  We don’t want interrupts to be enabled in the kernel, so we do a regular return that leaves the I bit cleared.

OS_Init

The RTOS initialization function bears some explanation.  It’s mostly straightforward: it sets up the timer that generates the RTOS tick, initializes the task descriptors, does some error checking, and at the end it enters the kernel main loop.  It also creates two tasks: the idle task, and the main task.

The int main() function is implemented in os.c, and simply calls OS_Init.  OS_Init, in turn, creates a system task for the function r_main(), which is the user-implemented startup function.  In r_main() the user will set everything up for the application, do initialization stuff, create the tasks that will run, etc.

You’ll notice that the r_main() function returns, which is contrary to the rule that the main() function should never return.  In this case it’s okay.  Obviously the r_main() function needs to return because it’s run by OS_Init as a system task, and system tasks have the highest priority and cannot be pre-empted.  If r_main() didn’t return then it would starve out all the lower-priority tasks.  The main() function also doesn’t need an infinite loop after the call to OS_Init because OS_Init enters the kernel main loop.  In other words, OS_Init isn’t just an initialization function; the whole program runs inside of OS_Init, and OS_Init will never return to the main() function.  Once the RTOS is running the system can be in three states: in a user task, in the kernel, or in the idle task.

Task Stacks

Each task has its own stack located in its task descriptor.  By default 256 bytes are reserved for the task stack.  The task descriptors are held sequentially in the .bss section of data memory, which means that stack overruns will result in other task descriptors and/or global variables getting clobbered.  Usually stack overruns don’t happen–256 bytes is a lot if you think about it–but there are a few things that you should watch out for:

  • The printf family of functions (including snprintf, vfprintf, etc.) use quite a lot of stack space
  • Floating point emulation uses a lot of stack space (you’re not using floating point numbers, right???)
  • If an interrupt occurs it will push the task context onto the stack, which is an unpleasant surprise if the stack is nearly full

There are ways to avoid the problem.  The obvious one is to make the stack bigger, which you can do as long as there’s enough data memory left for the kernel stack.  Another way is to offload some of the stack space into static memory (the .bss and .data segments) by declaring local variables as static or by using global variables.  I discuss that further in this article.
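
As a minimal sketch of the static-memory approach, the function below keeps its working buffer in static storage instead of on the calling task’s stack.  The function name, the packet layout, and the 64-byte size are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PACKET_SIZE 64u

/* Builds a packet of the form [id, length, payload...].  The buffer is
   declared static, so it is allocated once in static memory rather than
   taking 64 bytes of the task stack on every call. */
const uint8_t *toy_build_packet(uint8_t id, const uint8_t *payload, size_t len)
{
    static uint8_t packet[TOY_PACKET_SIZE];

    assert(len <= TOY_PACKET_SIZE - 2u);
    packet[0] = id;
    packet[1] = (uint8_t)len;
    memcpy(&packet[2], payload, len);
    return packet;
}
```

The trade-off is that every caller shares the same buffer.  If two tasks can call the function, one task can be pre-empted halfway through and find its packet overwritten when it resumes, so this trick is safest for data that is only ever touched by a single task.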

When a task is created, the kernel_create_task function populates the new stack with some data.  At the bottom of the stack the RTOS writes a call frame for the Task_Terminate() function, then a call frame for the task function.  The latter is required because the task function is never actually called, it is returned into from the kernel via exit_kernel.  The former is required in case the task function returns: when the ret instruction executes it will essentially return into the start of Task_Terminate().  Task_Terminate() will generate a kernel request to clean up the dead task descriptor and switch to another task.  Above those two call frames, the task stack is initialized with a default context so that when the kernel exits into the task for the first time the context gets loaded appropriately by exit_kernel.
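
The layout can be sketched in plain C on an ordinary byte array.  This is an illustration rather than the body of kernel_create_task, and it makes several assumptions: the saved context is taken to be 33 bytes (32 general purpose registers plus SREG), the default context is all zeroes, addresses are 2 bytes, and each return address is stored with its high byte at the lower memory address, which is the order the AVR call and ret instructions use.  The toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_STACK_SIZE   256u /* default task stack size from the text */
#define TOY_CONTEXT_SIZE 33u  /* assumption: 32 registers + SREG */

/* Fills a fresh task stack and returns the index of the initial stack
   pointer, i.e. the first free byte below the saved context. */
size_t toy_init_task_stack(uint8_t stack[TOY_STACK_SIZE],
                           uint16_t task_addr, uint16_t terminate_addr)
{
    size_t i = TOY_STACK_SIZE; /* one past the bottom of the stack */

    /* Bottom frame: where ret goes if the task function ever returns. */
    stack[--i] = (uint8_t)(terminate_addr & 0xFFu);
    stack[--i] = (uint8_t)(terminate_addr >> 8);

    /* Next frame: the ret in exit_kernel "returns" into the task. */
    stack[--i] = (uint8_t)(task_addr & 0xFFu);
    stack[--i] = (uint8_t)(task_addr >> 8);

    /* Default context that exit_kernel pops on first entry. */
    i -= TOY_CONTEXT_SIZE;
    memset(&stack[i], 0, TOY_CONTEXT_SIZE);

    assert(i > 0u);
    return i - 1u;
}
```

A real implementation would probably not leave the whole context at zero; the saved SREG, for instance, needs the I bit set if the task is supposed to start with interrupts enabled.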

Extended Addressing

The AVR architecture stores instructions in the program (“text”) section of memory, held in Flash.  Program memory is normally accessed by two-byte words addressed with 16-bit addresses.  Therefore the maximum memory size that can be addressed is 2 * 2^16 bytes (131,072 bytes), or 128 KB of Flash.

Except Atmel sells AVR CPUs with more Flash than that, e.g. the ATmega2560 that is used in the Arduino Mega 2560 and our STK600 dev boards.  What the heck, Atmel???

AVR8 controllers with more than 128 KB of Flash use extended addressing to address data in the upper part of memory.  In order to address 256 KB of program memory the CPU effectively uses 17-bit addresses, with the extra bit stored in the EIND (extended indirect) register.  The compiler calls functions in extended memory using special instructions (e.g. eicall) that concatenate the EIND register with the 16-bit function pointer.  You can find more details on the AVR instruction set here [PDF].
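
The address arithmetic is easy to check in plain C.  The sketch below only models the numbers: it splits a 17-bit word address into the extra bit that would live in EIND and the ordinary 16-bit pointer, then joins them again the way eicall conceptually does.  The type and function names are hypothetical, and nothing here touches real hardware registers.

```c
#include <assert.h>
#include <stdint.h>

/* Flash reachable with 16-bit word addresses: 2 bytes * 2^16 words. */
#define TOY_FLASH_16BIT_BYTES (2ul * 65536ul)

typedef struct {
    uint8_t  eind; /* extra high bit, as the EIND register would hold it */
    uint16_t z;    /* ordinary 16-bit function pointer */
} toy_ext_addr_t;

/* Splits a word address (at most 17 bits) into EIND and pointer parts. */
toy_ext_addr_t toy_split_word_address(uint32_t word_addr)
{
    toy_ext_addr_t out;

    assert(word_addr < (1ul << 17));
    out.eind = (uint8_t)(word_addr >> 16);
    out.z = (uint16_t)(word_addr & 0xFFFFul);
    return out;
}

/* Concatenates EIND with the 16-bit pointer to recover the address. */
uint32_t toy_join_word_address(toy_ext_addr_t a)
{
    return ((uint32_t)a.eind << 16) | a.z;
}
```

Any function that sits in the lower 128 KB ends up with an EIND part of 0, which is why code that never crosses the boundary can ignore the register entirely.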

Typically programmers don’t need to worry about extended addressing.  It’s rare that code will even exceed the 128 KB boundary, notwithstanding large data tables that you might offload from data memory into Flash.  If there is more than 128 KB of code, the compiler will handle everything.  Unless you’re using the RTOS of course, which doesn’t support 17-bit addresses by default.  The RTOS needs to store the EIND register with task contexts, and needs to initialize the call frames in the task contexts to use a 3-byte return address instead of 2 bytes.  A former CSC 460 student modified the RTOS to support extended addressing.  His report is here [PDF], and his code can be found here [ZIP].
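
To illustrate the difference between 2-byte and 3-byte call frames, here is a small plain C sketch that writes a return address onto a byte array the way a push sequence would.  It is not taken from the student’s modified RTOS.  It assumes the stack grows downward and that the least significant byte of the address ends up at the highest memory address, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pushes the low addr_bytes (2 or 3) bytes of word_addr onto a stack
   that grows toward index 0.  sp is the index of the first free byte;
   the return value is the new index of the first free byte. */
size_t toy_push_return_address(uint8_t *stack, size_t sp,
                               uint32_t word_addr, unsigned addr_bytes)
{
    unsigned n;

    assert(addr_bytes == 2u || addr_bytes == 3u);
    assert(sp >= addr_bytes);
    for (n = 0; n < addr_bytes; n++) {
        /* Low byte first, so it lands at the highest address. */
        stack[sp--] = (uint8_t)(word_addr >> (8u * n));
    }
    return sp;
}
```

With 3-byte frames every call frame costs one more byte of stack, and the initial frames written when a task is created have to grow to match, otherwise the first ret pops the wrong bytes into the PC.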