
3. Scheduling

Introduction

  • On a simple single-processor system1:
    • only one process can run at a time; other processes must wait for the processor to become free.
    • A process executes until it has to wait for an I/O operation.
    • the CPU stays idle until the I/O completes and the process resumes execution.
  • On a multiprogramming system1:
    • still only one process can run at a time; other processes must wait for the processor to become free.
    • A process executes until it has to wait for an I/O operation.
    • the OS keeps the processor busy at all times, to maximize CPU utilization.
    • If other processes are ready, the OS picks one to run and puts the waiting process on a queue until its I/O is done, after which it becomes eligible to be rescheduled.

Mechanism: Limited Direct Execution 2

  • time sharing:
    • The technique used to virtualize the CPU (multiprogramming).
    • challenges of time sharing:
      1. Performance: how to add virtualization without excessive overhead?
      2. Control: OS must retain control over every process while running it efficiently.

Direct Execution

  • direct execution (run the program directly on the CPU):
    • OS:
      • OS creates an entry in the process table for the process.
      • OS allocates memory for the process.
      • OS loads the program into memory.
      • OS sets up the stack with argc/argv.
      • OS clears registers.
      • OS executes a call to the program entry point (main function). (control moves to the process).
    • Process or program:
      • run until the main function returns.
      • Execute return from main. (control moves back to the OS).
    • OS again:
      • Free memory of process.
      • remove process from process table.

Direct Execution problems: 1. Restricted operations

  • what to do if the process wants to perform a restricted operation (e.g. an I/O request)?
  • We cannot give the process full control of the hardware to perform restricted operations itself.
  • instead, the process uses system calls to perform those restricted operations.
  • system calls run in kernel mode; once the system call is done, the process runs in user mode again.
  • switching to kernel mode is called a trap, and switching back to user mode is called return-from-trap.
  • Hardware must save enough of the caller’s registers when executing a trap so that the return-from-trap can resume the caller correctly.
  • on x86:
    • the processor will push the program counter, flags, and registers onto a per-process kernel stack before executing a trap.
    • When executing the return-from-trap instruction, the processor will pop the registers and flags from the kernel stack, and restore the program counter, then resume execution in user mode.
  • When executing a system call, the process cannot know the memory address of the system-call handler, since a process cannot access kernel memory.
  • The kernel sets up a trap table at boot time that maps each trap to its handler’s memory location; the layout may differ from one boot to the next.
  • system calls are usually identified by a number, and the trap table is essentially an array of function pointers.
  • user code (the process) cannot specify the exact memory address of the system-call handler, but it can specify a system-call number, which the OS maps to the handler’s address via the trap table.
  • Processes cannot alter the trap table, nor can they access kernel memory.
  • The trap table works to limit the direct execution as follows:

    1. OS during boot (kernel mode):
      • OS initializes the trap table.
    2. Hardware:
      • remembers the memory location of trap syscall handlers.
    3. OS when running a process (kernel mode):
      • creates entry in process table, allocates memory, loads program into memory, sets up stack, clears registers.
      • fills the kernel stack with the registers, flags, and program counter.
      • return from trap to the process.
    4. Hardware again (in between kernel mode and user mode):
      • restores registers from kernel stack.
      • moves to user mode.
      • jumps to the program counter (now at main()).
    5. Process or program (user mode):
      • Run main until it needs to make a syscall.
      • trap into the OS (do a system call).
    6. Hardware again (in between user mode and kernel mode):
      • save regs to kernel stack.
      • move to kernel mode.
      • jump to the syscall handler (trap handler)
    7. OS again (kernel mode):
      • execute the syscall handler.
      • return from trap to the process.
    8. Hardware again (in between kernel mode and user mode):
      • restore regs from kernel stack.
      • move to user mode.
      • jump to the program counter (now at the next instruction after the syscall).
    9. Process or program (user mode):
      • continue running, trapping into the OS again whenever it makes another syscall.
      • return from main via exit() trap.
    10. OS again (kernel mode):
      • free memory of process.
      • remove process from process table.
      • (the process never returns from the exit() trap; the OS moves on to other work).

Direct Execution problems: 2. Switching between processes

  • When a process is running, it occupies the CPU; other processes, and the OS itself, have to wait.
  • While the OS is off the processor, it cannot do anything; it must somehow regain control.
  • ways for OS to gain control back from a process:

    1. Cooperative approach: wait for the process to do a system call:
      • OS trusts that processes will behave correctly and not run forever.
      • long-running processes periodically give up the CPU (by making a system call) so the OS can run and decide what to do next.
      • OSes like this are called cooperative; they have a yield system call, whose only purpose is to transfer control back to the OS.
      • Processes also transfer control to the OS when they do something illegal, like dividing by zero or accessing memory not allocated to them.
      • illegal operations generate a trap.
      • problems: what if a malicious process never calls yield, never does an illegal operation, or is stuck in an infinite loop?
    2. Non-Cooperative approach: OS takes control.
      • a timer interrupt is programmed to occur at regular intervals.
      • timer will raise an interrupt, which causes the process on CPU to halt and traps into a pre-configured interrupt handler.
      • during boot, OS must start the timer.
  • Once the OS has regained control, either cooperatively via a system call or non-cooperatively via a timer interrupt, it must decide whether to keep running the current process or switch to another. This decision is made by the scheduler, a part of the OS.
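The cooperative approach can be sketched with Python generators (a toy analogy, not real OS code): each `yield` plays the role of a trap that returns control to the "OS", and a round-robin loop plays the scheduler. The process names and step counts are hypothetical.

```python
from collections import deque

def process(name, steps):
    # a "process" that does a little work, then cooperatively gives up the CPU
    for i in range(steps):
        yield f"{name}:{i}"    # each yield stands in for a trap into the OS

def cooperative_scheduler(procs):
    ready = deque(procs)
    trace = []                 # which process ran at each step
    while ready:
        p = ready.popleft()
        try:
            trace.append(next(p))   # run the process until it yields...
            ready.append(p)         # ...then put it at the back of the queue
        except StopIteration:
            pass                    # ...or until it finishes
    return trace

print(cooperative_scheduler([process("A", 2), process("B", 1)]))
# ['A:0', 'B:0', 'A:1']
```

A process that loops forever without yielding would hang this scheduler, which is exactly the weakness the timer interrupt fixes.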

  • context switch:
    • conceptually simple: the OS saves the registers of the current process, restores the registers of the next process, then returns from trap into the next process, which resumes from where it left off.
    • There are 2 register saves during a context switch, and they go to different places:
      • when the timer interrupt occurs, the hardware implicitly saves the user registers of the running process onto its kernel stack.
      • when the OS decides to switch, the switch code explicitly saves the kernel registers of the current process into its process structure in memory.
      • the OS then restores the kernel registers of the next process from its process structure and returns from trap, which restores that process’s user registers from its kernel stack.

Interrupts during context switching

  • if an interrupt arrives while another interrupt is being handled (e.g. during a context switch), it can leave the OS in an inconsistent state. OSes usually prevent this by:
    • disabling interrupts during interrupt handling.
    • using locking schemes to protect concurrent access to shared kernel data structures.

Scheduling 3

  • workload: the set of processes (here called jobs) running in the system.
  • To develop a scheduling discipline, we start with unrealistic simplifying assumptions about the workload and relax them one by one until we reach a fully-operational discipline:

    1. each job runs for the same amount of time.
    2. all jobs arrive at the same time.
    3. once started, each job runs to completion.
    4. all jobs only use the CPU (no I/O).
    5. the run-time of each job is known.
  • non-preemptive scheduling: once a process is running, it will continue running until it finishes, or it is blocked by I/O.

  • preemptive scheduling: once a process is running, it may be interrupted by the scheduler, and the scheduler may choose to run another process.

Scheduling Metrics

  • turnaround time: the time from when a job arrives in the system until it finishes execution. A performance metric. T_turnaround = T_finish - T_arrival.
  • fairness: the degree to which the system treats all jobs equally, measured e.g. by Jain’s fairness index.
  • response time: the time from when a job arrives in the system until it first gets to run. A performance metric. T_response = T_firstrun - T_arrival.
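These metrics are straightforward to compute once a workload has run. A minimal sketch, with hypothetical job records and time units; Jain’s fairness index is (Σx)² / (n · Σx²), which equals 1.0 when all jobs get identical treatment.

```python
def turnaround(job):
    # time from arrival to completion
    return job["finish"] - job["arrival"]

def response(job):
    # time from arrival to first getting the CPU
    return job["first_run"] - job["arrival"]

def jains_index(values):
    # Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 = perfectly fair
    n = len(values)
    return sum(values) ** 2 / (n * sum(v * v for v in values))

# hypothetical workload: three jobs arriving at t=0, run back to back
jobs = [
    {"arrival": 0, "first_run": 0,  "finish": 10},
    {"arrival": 0, "first_run": 10, "finish": 20},
    {"arrival": 0, "first_run": 20, "finish": 30},
]

print([turnaround(j) for j in jobs])   # [10, 20, 30]
print([response(j) for j in jobs])     # [0, 10, 20]
print(jains_index([turnaround(j) for j in jobs]))
```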

Scheduling Algorithms: FIFO

  • FIFO (first-in, first-out) is the simplest scheduling algorithm, non-preemptive, sometimes called FCFS (first-come, first-served).
  • simple, easy to implement.
  • average turnaround time (all jobs arriving at t=0): T_avg = (1/n) * Σi (T1 + T2 + ... + Ti), where Ti is the run time of the i-th job to run; with equal run times T (assumption 1) this reduces to (n+1) * T / 2.
  • convoy effect:
    • a number of relatively short jobs (lightweight resource consumers) get queued behind a long job (heavyweight resource consumer), so the short jobs wait for a long time.
    • the average turnaround time suffers badly.
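A minimal FIFO sketch makes the convoy effect concrete. The run times are hypothetical; all jobs arrive at t=0 and run to completion in queue order.

```python
def fifo_avg_turnaround(run_times):
    # all jobs arrive at t=0; each finishes when all jobs before it are done
    t, total = 0, 0
    for rt in run_times:
        t += rt            # finish time of this job
        total += t         # turnaround = finish - arrival(0)
    return total / len(run_times)

# one long job in front of two short jobs: the convoy
print(fifo_avg_turnaround([100, 10, 10]))  # 110.0
# same jobs, short ones first
print(fifo_avg_turnaround([10, 10, 100]))  # 50.0
```

Merely reordering the queue cuts average turnaround by more than half, which is the intuition behind SJF below.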

Scheduling Algorithms: SJF (Shortest Job First)

  • SJF is a non-preemptive scheduling algorithm; it selects the job with the shortest run time to run next.
  • average turnaround time is provably optimal when all jobs arrive at the same time, but that condition rarely holds in practice.
  • problems arise when short jobs arrive while a long job is already running: because SJF is non-preemptive, the newly arrived short jobs must wait for the long job to finish, driving the average turnaround time back up.
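When all jobs arrive at t=0, SJF is just "sort by run time, then run FIFO". A minimal sketch with hypothetical run times:

```python
def avg_turnaround(run_times):
    # FIFO average turnaround: all jobs arrive at t=0, run in list order
    t, total = 0, 0
    for rt in run_times:
        t += rt
        total += t
    return total / len(run_times)

def sjf_avg_turnaround(run_times):
    # SJF = shortest jobs first, then FIFO
    return avg_turnaround(sorted(run_times))

print(avg_turnaround([100, 10, 10]))      # arrival order: 110.0
print(sjf_avg_turnaround([100, 10, 10]))  # SJF order:      50.0
```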

Scheduling Algorithms: STCF (Shortest Time to Completion First)

  • STCF is a preemptive scheduling algorithm, sometimes called Preemptive Shortest Job First (PSJF); it selects the job with the shortest remaining run time to run next.
  • each time a new job enters the system, the STCF scheduler compares its run time against the remaining run time of every job, and runs whichever needs the least time to complete, preempting the current job if necessary.
  • average turnaround time is optimal (given known run times).
  • bad for response time: a job may have to wait for all shorter jobs to finish before it gets to run at all.
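A minimal STCF sketch, assuming run times are known. Jobs are hypothetical (arrival, run_time) pairs; at every arrival or completion, the job with the least remaining time runs.

```python
import heapq

def stcf_turnarounds(jobs):
    # jobs: list of (arrival, run_time); returns {job_index: turnaround_time}
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    ready = []                              # heap of (remaining_time, job index)
    t, i, done = 0, 0, {}
    while i < len(order) or ready:
        if not ready:
            t = max(t, jobs[order[i]][0])   # CPU idle until the next arrival
        while i < len(order) and jobs[order[i]][0] <= t:
            j = order[i]
            heapq.heappush(ready, (jobs[j][1], j))
            i += 1
        remaining, j = heapq.heappop(ready) # shortest remaining time first
        next_arrival = jobs[order[i]][0] if i < len(order) else float("inf")
        run = min(remaining, next_arrival - t)  # run until done or next arrival
        t += run
        if run == remaining:
            done[j] = t - jobs[j][0]        # turnaround = finish - arrival
        else:
            heapq.heappush(ready, (remaining - run, j))  # preempted
    return done

# a long job at t=0, two short jobs arriving at t=10: the long job is preempted
print(stcf_turnarounds([(0, 100), (10, 10), (10, 10)]))
```

The short jobs finish with turnarounds of 10 and 20 instead of waiting behind the long job, while the long job's turnaround grows only by the short jobs' total run time.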

Scheduling Algorithms: Round Robin (RR)

  • Round Robin is a preemptive scheduling algorithm; it cycles through the jobs in a circular fashion.
  • it runs each job for a time slice, sometimes called the scheduling quantum, then switches to the next job in the queue; hence it is sometimes called time-slicing.
  • the time slice must be a multiple of the timer-interrupt interval.
  • RR tends to give equal CPU time to each job, which is fair.
  • RR tends to stretch each job’s completion out as long as possible, since every job keeps being paused for the others.
  • RR is fair and good for response time, but bad for turnaround time.
  • The shorter the time slice, the better the response time; the longer the time slice, the better the turnaround time.
  • The time slice should be long enough to amortize the context-switching overhead, yet short enough to keep the system responsive.
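A minimal RR sketch (hypothetical run times, all jobs arriving at t=0, context-switch cost ignored) shows the response/turnaround trade-off directly:

```python
from collections import deque

def rr(run_times, quantum):
    # returns {job_index: (first_run_time, finish_time)}
    queue = deque(range(len(run_times)))
    remaining = list(run_times)
    first_run = [None] * len(run_times)
    t, stats = 0, {}
    while queue:
        j = queue.popleft()
        if first_run[j] is None:
            first_run[j] = t               # response time (arrivals are at t=0)
        run = min(quantum, remaining[j])   # run one slice, or less if finishing
        t += run
        remaining[j] -= run
        if remaining[j] == 0:
            stats[j] = (first_run[j], t)
        else:
            queue.append(j)                # back of the queue
        # a real scheduler would also pay a context-switch cost here
    return stats

# three 10-unit jobs, quantum 1: everyone starts almost immediately,
# but everyone also finishes near the very end
print(rr([10, 10, 10], 1))
# quantum 10 degenerates to FIFO: worse response, better turnaround
print(rr([10, 10, 10], 10))
```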

I/O

  • nearly all programs perform I/O, and I/O is slow.
  • the current job will not use the CPU while waiting for the I/O to complete, so it is blocked, and the scheduler selects another job to run.
  • When the I/O completes, an interrupt is raised, and the OS moves the job back to the ready queue.

Scheduling: The Multi-Level Feedback Queue (MLFQ) 3

  • MLFQ was first introduced in 1962 by Corbató et al. in a system called CTSS (Compatible Time-Sharing System), and later in Multics.
  • fundamental problems that MLFQ solves:
    • optimizing turnaround time without a priori knowledge of job lengths.
    • keeping the system responsive for interactive users => minimizing response time.
  • Basic Rules of MLFQ:
    1. MLFQ has different queues, each queue has a different priority level.
    2. at any given time, a job that is ready to run is on a single queue.
    3. MLFQ uses priority level to decide the next job to run.
    4. if two or more jobs are in the same queue (have the same priority level), the scheduler uses Round Robin among them to decide which runs next.
  • MLFQ gives priority based on observed behavior of the job:
    • if a job frequently gives up the CPU while waiting for input, it keeps a high priority.
    • if a job uses the CPU for long stretches, its priority is reduced.
  • MLFQ uses the history of the job to decide its priority level and predict its future behavior.
  • Rules in summary, for 2 jobs A and B
    • Rule 1: if Priority(A) > Priority(B), then A will run next.
    • Rule 2: if Priority(A) = Priority(B), then Run RR(A, B).
  • many systems use a form of MLFQ as their base scheduler, including:
    • BSD UNIX derivatives
    • Solaris
    • Windows NT and subsequent Windows operating systems

How Job priority changes over time

  • During the lifetime of a job, its priority level may change. MLFQ considers the current workload when changing the priority level of a job.
  • rules:
    • Rule 3: When a job enters the system, it is placed in the highest priority queue.
    • Rule 4a: if the job uses an entire time slice while running, its priority is reduced (it moves down one queue).
    • Rule 4b: if the job gives up the CPU before its time slice is up, it stays at the same priority level.
    • Rule 4 (revised, replacing 4a and 4b to prevent gaming): once a job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is reduced (it moves down one queue).
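Rules 1-4 can be sketched in a few lines of Python. This is a toy model, not real scheduler code: the queue count, allotment, and job lengths are hypothetical, jobs are measured in whole time slices, and no job ever blocks for I/O.

```python
from collections import deque

NUM_QUEUES = 3      # queue 0 = highest priority
ALLOTMENT = 2       # slices a job may use at one level before demotion (Rule 4)

def mlfq(jobs):
    # jobs: {name: total_slices_needed}; returns the order slices were granted
    queues = [deque() for _ in range(NUM_QUEUES)]
    used = {name: 0 for name in jobs}    # slices used at the current level
    left = dict(jobs)                    # slices still needed per job
    for name in jobs:
        queues[0].append(name)           # Rule 3: new jobs enter at the top
    trace = []
    while any(queues):
        level = next(q for q in range(NUM_QUEUES) if queues[q])  # Rule 1
        name = queues[level].popleft()   # Rule 2: RR within a level
        trace.append(name)
        left[name] -= 1
        used[name] += 1
        if left[name] == 0:
            continue                     # job finished
        if used[name] >= ALLOTMENT and level + 1 < NUM_QUEUES:
            used[name] = 0               # Rule 4: demote after the allotment
            queues[level + 1].append(name)
        else:
            queues[level].append(name)
    return trace

# a 3-slice job and a 1-slice job: B finishes quickly at high priority,
# A is demoted after its allotment and finishes at a lower level
print(mlfq({"A": 3, "B": 1}))  # ['A', 'B', 'A', 'A']
```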

Problems With Our Current MLFQ

  • starvation: a job may be stuck in a low-priority queue forever. If there are too many interactive (short) jobs, they consume all the CPU time, and the lowest-priority jobs may never get a chance to run (they starve).
  • gaming the scheduler: a job can do sneaky things to trick the scheduler into giving it more than its share of the CPU, e.g.:
    • just before the time slice ends, give up the CPU (issue an I/O request); by doing this after using, say, 99% of each slice, the job stays at the same priority under Rules 4a/4b while nearly monopolizing the CPU.
  • program behavior may change over time: a job that used to be long-running may become short, and vice versa; historical data may no longer reflect the job’s current behavior.
  • parameterization:
    • answers questions like:
      • how many queues should we have?
      • how long should the time slice be?
      • how often should we change the priority level of a job?
      • how often should we give a priority boost?
    • Most MLFQs give each queue a different time slice: shorter slices for high-priority queues, longer slices for low-priority queues.
    • Example: the Solaris MLFQ has 60 queues, with time slices ranging from 20 ms (highest priority) to a few hundred ms (lowest priority), and a priority boost roughly every 1 second.
    • Some schedulers reserve the highest priority queues for the OS.
    • Some schedulers allow users to modify the priority of a job, using the nice command for example.

The Priority Boost

  • priority boost:
    • periodically boost the priorities of all jobs in the system.
    • Rule 5: After some time period S, move all the jobs in the system to the topmost queue.
  • voo-doo constants (tunable values with no obviously correct setting):
    • S: the time period between priority boosts.
    • Q: the number of queues in the MLFQ.
    • N: the number of jobs in the system.
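Rule 5 itself is tiny: every S time units, empty every lower queue into the topmost one, which both cures starvation and forgives jobs whose behavior has changed. A minimal sketch over hypothetical queue contents:

```python
from collections import deque

def priority_boost(queues):
    # Rule 5: move every job from every lower queue into queue 0 (the topmost)
    for q in queues[1:]:
        while q:
            queues[0].append(q.popleft())
    return queues

# hypothetical MLFQ state: one job at the top, three stuck at lower levels
queues = [deque(["A"]), deque(["B"]), deque(["C", "D"])]
priority_boost(queues)
print([list(q) for q in queues])  # [['A', 'B', 'C', 'D'], [], []]
```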

References


  1. NESO Academy. (2019). Introduction to CPU Scheduling https://www.youtube.com/watch?v=EWkQl0n0w5M&t=69s 

  2. Arpaci-Dusseau, R. H., & Arpaci-Dusseau, A. C. (2018). Operating systems: three easy pieces (1.01 ed.). Arpaci-Dusseau Books. Retrieved June 16, 2022, from https://pages.cs.wisc.edu/~remzi/OSTEP/. Chapter 6: Mechanism: Limited Direct Execution. 

  3. Arpaci-Dusseau, R. H., & Arpaci-Dusseau, A. C. (2018). Operating systems: three easy pieces (1.01 ed.). Arpaci-Dusseau Books. Retrieved June 16, 2022, from https://pages.cs.wisc.edu/~remzi/OSTEP/. Chapter 7 + 8.