Abstraction and Process¶

Introduction¶

On a single CPU the OS does not truly run many programs at once; it keeps several in a state of suspended animation and activates each one when it is needed. This is achieved through a set of application programming interfaces (APIs), system calls, process states, and some fancy scheduling.
Search this index for definitions of terms and concepts ³.

The Abstraction: The Process ¹¶

Process:
- is a running program.
Time Sharing: the ability of the OS to run several programs seemingly at once by switching between processes rapidly (virtualizing the CPU); each process owns the whole CPU for a fraction of the time.
Space Sharing: the counterpart of time sharing, where a resource is divided in space among its users, e.g., disk space is divided between files.
Virtualization:
- is the process of creating a virtual (rather than actual) version of something.
- to achieve virtualization, the OS uses low-level machinery (mechanisms) and high-level intelligence (policies).
- Mechanisms are low-level methods or protocols that implement a needed piece of functionality, e.g., context switching.
- Policies are algorithms for making some kind of decision within the OS, e.g., scheduling.
- scheduling policy: the algorithm the OS uses to decide which process to run next, based on information such as the history of the programs, workload knowledge, and performance metrics.
Machine state: what a program can read or update when it is running:
- address space: the set of memory locations that a program can access (read or write) for its data and code.
- registers: the state of registers in the CPU, especially:
  - The program counter (PC) or instruction pointer (IP) which tells the location of the next instruction.
  - The stack pointer (SP) and frame pointer (FP), which are used to manage the stack for function parameters, local variables, and return addresses.
- I/O state: the state of the I/O devices that the program is using, and the list of files that the program has open.

Process API¶

The OS must provide functions that allow a program to create and manage processes:
- create: create a new process
- destroy: forcefully terminate a process
- wait: wait for a process to terminate
- misc: e.g., suspend, resume, get/set priority, etc.
- status: get the status of a process

How a Program becomes a Process (create a process)¶

The loading process is done before the process starts running.

loading a process to memory

program initially resides in a file on disk in executable format.
OS loads program code and static data into memory:
- OS reads the bytes of the file from disk and places them in memory; code and static data (initialized variables) are placed into the address space of the process.
- Loading can be:
  - eager: load the entire program into memory at once before running it (old OSes).
  - lazy: load parts of the program into memory only when they are needed (modern OSes), which relies on paging and swapping.
allocate memory for program’s runtime-stack or stack:
- stack is used to store local variables and function parameters and return addresses.
- OS allocates a stack segment in the process’s address space.
- OS will initialize the stack with arguments that are passed to the main() function, like argc and argv.
allocate memory for the program’s heap:
- heap is used to store dynamically allocated data, i.e., data that the program explicitly requests while it runs.
- OS allocates a heap segment in the process’s address space.
- OS will initialize a small heap first; the heap then grows as the program runs and requests memory.
- in C, malloc() allocates memory on the heap and free() releases it. The OS gives the process more heap memory as the program keeps requesting it through malloc().
initialize related I/O tasks and file descriptors:
- file descriptors are used to represent open files.
- OS will initialize the file descriptors for the standard input, output, and error.
start the process:
- OS jumps to the program's entry point, which leads to the main() function.
- by doing so, OS transfers control of the CPU to the first instruction of the newly created process.
- Program starts executing.
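
The three memory regions named above (static data, stack, heap) can be seen side by side in a few lines of C. This is only a sketch: `sum_first_n`, `calls_so_far`, and `call_count` are hypothetical names chosen for illustration, not part of any OS interface.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Static data: loaded from the executable into the data segment. */
static int call_count = 0;

/* Sums 1..n using a heap buffer. Returns -1 if the allocation fails. */
long sum_first_n(size_t n)
{
    assert(n <= (size_t)INT_MAX);              /* every value must fit in an int */

    long total = 0;                            /* local variable: on the stack */
    int *values = malloc(n * sizeof *values);  /* dynamic data: on the heap */

    if (values == NULL && n > 0)
        return -1;
    call_count++;
    for (size_t i = 0; i < n; i++)
        values[i] = (int)(i + 1);
    for (size_t i = 0; i < n; i++)
        total += values[i];
    free(values);                              /* return the memory to the allocator */
    return total;
}

/* Reports how many successful calls were made so far. */
int calls_so_far(void)
{
    return call_count;
}
```

Calling `sum_first_n(4)` allocates room for four integers on the heap, while `total` lives on the stack and `call_count` persists in static data across calls.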

Process States¶

there are 3 states for a process:

running: the process is executing instructions on the processor, i.e., it has control of the CPU at the moment.
ready: the process is ready to run, but the OS has not chosen to run it yet, for example because the CPU is busy with another process.
blocked: the process has performed an operation that makes it unable to continue until some event occurs, such as an I/O completion. For example, when a process requests a file from disk, the request takes time to resolve; the process stays blocked until the data is in memory, and other processes can use the CPU in the meantime.

events that change the process state:

process state transitions

scheduled: from ready to running.
descheduled: from running to ready.
I/O request: from running to blocked.
I/O completion: from blocked to ready.
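
The four transitions listed above form a small state machine. The following sketch encodes them directly; the type and function names (`enum state`, `enum event`, `next_state`) are made up for this example, and an event that does not apply to the current state is assumed to leave it unchanged.

```c
#include <assert.h>

enum state { READY, RUNNING, BLOCKED };
enum event { SCHEDULED, DESCHEDULED, IO_REQUEST, IO_COMPLETION };

/* Returns the state that follows `s` when `ev` occurs. An event that
 * is not valid in the current state leaves the state unchanged. */
enum state next_state(enum state s, enum event ev)
{
    assert(s == READY || s == RUNNING || s == BLOCKED);

    switch (ev) {
    case SCHEDULED:     return s == READY   ? RUNNING : s;
    case DESCHEDULED:   return s == RUNNING ? READY   : s;
    case IO_REQUEST:    return s == RUNNING ? BLOCKED : s;
    case IO_COMPLETION: return s == BLOCKED ? READY   : s;
    }
    return s;
}
```

Note that a blocked process never goes straight back to running: after its I/O completes it becomes ready and must be scheduled again.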

OS Data Structures for Process Management¶

ProcessList: a list of all processes in the system, each stored as a Process Control Block (PCB), aka. process descriptor.
The PCB is similar to the xv6 structures below:

// the registers xv6 will save and restore
// to stop and subsequently restart a process
// register context
struct context { int eip; int esp; int ebx; int ecx; int edx; int esi; int edi; int ebp; };

// the different states a process can be in
enum proc_state { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

// the information xv6 tracks about each process
// including its register context and state
struct proc {
    char *mem; // start of process's memory (location of first location in address space)
    uint sz; // size of process memory (bytes)
    char *kstack; // bottom of kernel stack for this process (end of stack in addr space)
    enum proc_state state; // process status (state)
    int pid; // process ID
    struct proc *parent; // parent process
    void *chan; // if non-zero, sleeping on chan
    int killed; // if non-zero, have been killed
    struct file *ofile[NOFILE]; // open files (file descriptors)
    struct inode *cwd; // current directory
    struct context context; // swtch() here to run process, save registers here when process is not running
    struct trapframe *tf; // trap frame for current syscall, current interrupt, or current exception
};
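
To show how a process list is typically consulted, here is a minimal sketch over a deliberately simplified table. `struct mini_proc`, `find_proc`, and `count_in_state` are hypothetical stand-ins for illustration; they are not xv6 code and keep only the pid and state fields.

```c
#include <assert.h>
#include <stddef.h>

enum mini_state { MINI_UNUSED, MINI_RUNNABLE, MINI_RUNNING, MINI_SLEEPING };

/* A stripped-down PCB: just enough to identify and classify a process. */
struct mini_proc {
    int pid;
    enum mini_state state;
};

/* Linear search of the table for a live entry with the given pid. */
struct mini_proc *find_proc(struct mini_proc *table, size_t n, int pid)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++)
        if (table[i].state != MINI_UNUSED && table[i].pid == pid)
            return &table[i];
    return NULL;
}

/* Counts the entries that are currently in state `st`. */
size_t count_in_state(const struct mini_proc *table, size_t n,
                      enum mini_state st)
{
    assert(table != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (table[i].state == st)
            count++;
    return count;
}
```

A fixed-size array with an "unused" marker keeps the sketch short; a real kernel may use other structures.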

OS Process API (Process Control) ¹¶

fork() System Call:
- creates a new child process that is an almost exact copy of its parent.
- it returns twice: the child's PID in the parent, and 0 in the child.
- it is non-deterministic in the sense that the scheduler decides whether the child or the parent runs first.
- this non-determinism leads to problems in concurrent programs, especially multi-threaded ones.
- alternative: the posix_spawn() function.
- details here: https://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
wait() System Call:
- the parent waits for a child process to terminate.
- a similar system call is waitpid(), which lets the parent wait for a specific child process to terminate.
- wait() makes the output of the program deterministic: even if the parent runs first, it blocks in wait() until the child has exited, and only then continues.
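
The fork() and wait() behavior described above can be sketched as follows, assuming a POSIX system. The helper name `fork_and_wait` is hypothetical; the calls it uses (fork(), waitpid(), _exit(), and the WIFEXITED/WEXITSTATUS macros) are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code`. The parent blocks in waitpid()
 * and returns the child's exit status, or -1 on error. */
int fork_and_wait(int code)
{
    assert(code >= 0 && code <= 255);    /* exit statuses are 8 bits wide */

    pid_t pid = fork();
    if (pid < 0)
        return -1;                       /* fork failed */
    if (pid == 0)
        _exit(code);                     /* child: terminate immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) /* parent: wait for this child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Whichever of the two processes the scheduler runs first, the parent cannot get past waitpid() before the child has exited, which is what makes the result predictable.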
exec() System Call:
- runs a different program within the current process.
- it does not create a new process; it loads the code and static data of the new program over the current ones and re-initializes the stack and heap. Open file descriptors are kept by default.
- exec() does not return if it succeeds.
The separation of fork() and exec() is important because it allows the shell to run code after fork() and before exec(). This code alters the environment of the program that is about to run in the child process.
A shell is just a user program that runs other programs:
- examples: bash, zsh, tcsh, etc.
- the shell prompts the user for a command.
- after you hit enter:
  1. the shell locates the program named by the command you entered.
  2. the shell calls fork() to create a new process.
  3. the child process calls exec() to run the program.
  4. the shell calls wait() to wait for the child process to terminate.
  5. when the child terminates, wait() returns and the shell prompts the user for another command.
Shell example: what happens when you run $ wc p3.c > newFile.txt?
1. the shell locates the program wc in the path.
  - wc is a program that counts the number of lines, words, and characters (bytes) in a file. it typically lives in the /usr/bin directory.
  - wc receives a single argument, p3.c. The > newFile.txt part is a redirection handled by the shell and is never passed to wc.
  - the result is that the counts for p3.c are saved in newFile.txt instead of being printed on the screen.
2. the shell calls fork() to create a new process for the wc program.
3. in the child, before exec(), the shell code closes the standard output file descriptor (fd 1) and opens newFile.txt, which then takes over fd 1. aka. it alters the environment of the child process (of wc) before wc runs.
4. the child calls exec() to run wc; whatever wc writes to standard output lands in newFile.txt.
5. the shell calls wait() to wait for the child process to terminate.
6. when the child terminates, the shell prompts the user for another command.
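
The redirection steps above can be sketched in C, assuming a POSIX system. `run_redirected` is a hypothetical helper, not how any particular shell is implemented. One deliberate difference from the steps above: instead of closing fd 1 and relying on open() to reuse it, the sketch opens the file first and then uses dup2(), which does not depend on which descriptors happen to be free.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with standard output redirected to `outfile`, roughly
 * like `cmd args > outfile`. Returns the command's exit status, or -1. */
int run_redirected(char *const argv[], const char *outfile)
{
    assert(argv != NULL && argv[0] != NULL && outfile != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: rearrange file descriptors after fork(), before exec(). */
        int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(126);
        if (fd != STDOUT_FILENO)
            close(fd);
        execvp(argv[0], argv);           /* returns only on failure */
        _exit(127);
    }

    /* Parent (the "shell"): wait for the command to finish. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because open file descriptors survive exec(), the program that runs in the child writes to the file without knowing that its output was redirected.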
kill() System Call:
- sends a signal to a process.
- signals are used to interrupt a process: pause it, terminate it, or deliver other directives.
- ctrl+c sends the SIGINT (interrupt) signal to the foreground process, which normally terminates it.
- ctrl+z sends the SIGTSTP (stop) signal to the process, which pauses it.
signal() System Call:
- a process uses this function to catch a signal: when the signal is delivered, normal execution is suspended and a handler for the signal runs.
- this is useful for delivering external events to a process or process group.
- the kernel also generates signals itself, e.g., SIGSEGV on an invalid memory access.
- a user can generally send signals only to their own processes.
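
A minimal sketch of catching a signal, assuming a POSIX system: the process installs a handler with signal(), sends itself the signal with kill(), and checks that the handler ran. The names `send_and_catch`, `on_signal`, and `caught` are hypothetical. The sketch uses signal() because that is the call discussed above; portable programs usually prefer sigaction().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Written by the handler, read by normal code. */
static volatile sig_atomic_t caught = 0;

/* Handler: record which signal arrived. */
static void on_signal(int signo)
{
    caught = signo;
}

/* Installs a handler for `signo`, sends the signal to this process,
 * and returns the signal number the handler saw (or -1 on error). */
int send_and_catch(int signo)
{
    assert(signo == SIGUSR1 || signo == SIGUSR2);

    void (*previous)(int) = signal(signo, on_signal);
    if (previous == SIG_ERR)
        return -1;

    caught = 0;
    int rc = kill(getpid(), signo);      /* delivered before kill() returns
                                            in a single-threaded process */
    int seen = caught;
    signal(signo, previous);             /* restore the old disposition */
    return rc == 0 ? seen : -1;
}
```

In a single-threaded process, `send_and_catch(SIGUSR1)` returns `SIGUSR1` because the handler interrupts normal execution and runs before kill() returns.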

CMDs for Process Management¶

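ps: list running processes (by default only those of the current terminal; options such as -e list every process in the system).
top: display the processes in the system along with their CPU and memory usage, refreshed periodically.
kill -9 <pid>: send SIGKILL to the process with the given pid, which terminates it immediately.
killall <process name>: kill all processes with the given name.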

POSIX ²¶

POSIX: Portable Operating System Interface (the X alludes to UNIX):
- a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems.
- software that conforms to POSIX is portable, at the source level, to other POSIX-conforming operating systems.
- for this reason, applications written against POSIX generally behave the same on Linux distributions such as Debian, on macOS, and on other POSIX-compliant operating systems.
Before POSIX, each OS had its own system interface and environment, which made it harder to develop a program that could run on different OSes.
POSIX History:
- POSIX originated in 1988
- POSIX.1 standard was internationally accepted in 1990 as ISO/IEC 9945-1:1990 or IEEE Std 1003.1-1990.
- POSIX.2 standard was internationally accepted in 1992 as IEEE Std 1003.2-1992.
- The family of standards that POSIX refers to is known as IEEE Std 1003.n-yyyy.
- currently more than 20 standards and drafts under the POSIX umbrella.
POSIX.1 defines:
1. application portability.
2. C interface
3. system services behavior in fundamental tasks: process creation and termination, env manipulation, file and directory access and simple I/O.
POSIX.2 defines:
1. the command interpreter.
2. portable shell programming
3. users environment and related utilities.
POSIX Defined Standards:
1. The C API: POSIX uses the C language to define the system interface. So programs are portable to other operating systems at the source code level.
2. General concepts:
  - rules for writing programs, like: safety for the initialization of pointer types and concurrent executions.
  - rules for memory synchronization, like: restricting memory modification when it’s already in use.
  - security protection for file/directory access.
3. File Formats:
  - POSIX defines rules for formatting strings that we use in files, standard output, standard error, and standard input.
  - The format can contain regular characters, escape sequence characters, and conversion specifications.
4. Environment Variables: see Environment Variables below.
5. Locale:
  - A locale defines the language and cultural convention that is used in the user environment.
  - Each locale consists of categories that define the behavior of software components, such as date-time formatting, monetary formatting, and numeric formatting.
  - Locale API defined by POSIX conforms to C locale library.
  - reserved locale env vars are listed under Environment Variables below.
6. character set:
  - A character set is a collection of characters that consists of codes and bit patterns for each character.
  - POSIX recommends that our implementation should contain at least one character set and a portable character set.
  - The first eight entries in the character set shall be control characters.
  - The POSIX locale should include at least 256 characters from both portable and non-portable character sets.
7. Regular Expressions:
  - A regular expression, or RE, is a string of characters that defines a search pattern for finding text.
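  - POSIX specifies the regular expressions implemented by the C library (regcomp(), regexec()), which are also used by tools such as awk, grep, sed, and vi.
  - POSIX defines 2 types of RE:
    - Basic RE (BRE): the older syntax with fewer operators; for example, + and ? are ordinary characters.
    - Extended RE (ERE): adds operators such as +, ?, and |, and uses unescaped ( ) and { } for grouping and intervals.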
8. Directory Structure:
  - Linux distributions conform to the Filesystem Hierarchy Standard (FHS)
9. Utilities:
  - POSIX defines several conventions for programmers about how we should implement our utility programs as:
  - utility_name [-a][-b][-c option_argument][-d|-e][-f[option_argument]][operand...] <parameter name>
  - details are here
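
For the Regular Expressions item in the list above, the difference between the basic and extended syntax can be tried out with the POSIX C library functions regcomp(), regexec(), and regfree(). The wrapper `re_matches` is a hypothetical helper written for this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` matches `pattern`, 0 if it does not, and -1 if
 * the pattern fails to compile. `extended` selects ERE over BRE. */
int re_matches(const char *pattern, const char *text, int extended)
{
    assert(pattern != NULL && text != NULL);

    regex_t re;
    int flags = REG_NOSUB | (extended ? REG_EXTENDED : 0);
    if (regcomp(&re, pattern, flags) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the pattern `ab+c`, the extended syntax treats `+` as "one or more b", so `abbbc` matches; in the basic syntax `+` is an ordinary character, so only the literal text `ab+c` matches.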

Environment Variables¶

An environment variable is a named string in a process's environment; it can be defined in an environment file that the login shell processes upon successful login.
By convention, the variable name should contain only uppercase letters, digits, and underscores, and should not begin with a digit.
The value of an environment variable can only be a string of characters from the portable character set.
Environment variables should avoid overwriting the environment variables used by the standard libraries.
Environment variables are case sensitive.
User-defined variables should not reuse the names of the reserved environment variables:
- COLUMNS: defines the number of columns(width) in the terminal.
- HOME: defines the home directory of the user.
- LOGNAME: defines the login name of the user.
- LINES: defines the number of lines(height) in the terminal.
- PATH: defines the search path for the executable files. aka. binary colon-separated paths for executables.
- PWD: defines the current working directory.
- SHELL: defines the current shell program in use.
- TERM: defines the terminal type.
Environment variables reserved for each Locale category:
- LC_CTYPE: defines the locale for character classification and conversion.
- LC_COLLATE: defines the collation (sort) order for characters.
- LC_MONETARY: defines the locale for monetary formatting.
- LC_NUMERIC: defines the locale for numeric formatting.
- LC_TIME: defines the locale for date and time formatting.
- LC_MESSAGES: defines the locale for message translation.

OSes compatibility with POSIX¶

Linux: mostly POSIX-compliant in practice, although most distributions are not formally certified.
Darwin: complete compatibility. Darwin is the core of macOS, iOS, watchOS, and tvOS and other Apple products.
Windows NT:
- Microsoft Windows doesn’t conform to the standard at all because its whole design is completely different from that of UNIX-like operating systems.
- However, we can set up a POSIX-compliant environment by using the WSL compatibility layer or Cygwin.
Others: operating systems that are POSIX certified:
- AIX
- HP-UX
- Oracle Solaris

Multitasking ⁴¶

Multitasking:
- the ability of a computer to run multiple programs at the same time.
- old OSes partially supported multitasking through time-sharing (a process takes the whole CPU for a specific time, then gives it back to the OS to hand to another process).
- modern operating systems have complete multitasking capability: numerous programs can run concurrently without interfering with one another.
Types of Multitasking:
1. Preemptive multitasking
2. Cooperative multitasking
Preemptive multitasking:
- OS controls and decides how much time each process gets to run.
- Used in desktop OSes such as Unix, Windows NT, and Windows 95.
- OS X uses proactive multitasking, where the OS notifies the process that it is about to be interrupted and that another process will take the processor.
Cooperative multitasking:
- the current process must voluntarily give up the processor.
- it uses the task_yield() function to give up the processor. (the process is not interrupted, it voluntarily gives up the processor).
- when task_yield() is called, the context_switch() starts.
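
Cooperative multitasking can be imitated in plain C by letting each task do one unit of work and then return, which plays the role of the voluntary yield. This is a toy sketch under that assumption: `struct coop_task`, `coop_step`, and `run_cooperative` are hypothetical names, and no real context switch takes place.

```c
#include <assert.h>
#include <stddef.h>

struct coop_task {
    int id;
    int steps_left;   /* units of work the task still has to do */
};

/* One unit of work; returning to the caller is the voluntary yield. */
static void coop_step(struct coop_task *t)
{
    t->steps_left--;
}

/* Round-robins over the tasks until all are finished or `order` is
 * full. Records the id of the task that ran at each step and returns
 * the number of steps recorded. */
size_t run_cooperative(struct coop_task *tasks, size_t n,
                       int *order, size_t cap)
{
    assert(tasks != NULL && order != NULL);

    size_t used = 0;
    int progress = 1;
    while (progress) {
        progress = 0;
        for (size_t i = 0; i < n; i++) {
            if (tasks[i].steps_left <= 0)
                continue;                /* task already finished */
            if (used == cap)
                return used;             /* no room to record more steps */
            order[used++] = tasks[i].id;
            coop_step(&tasks[i]);        /* task runs until it yields */
            progress = 1;
        }
    }
    return used;
}
```

If a task never returned from `coop_step`, no other task would ever run, which is the main weakness of the cooperative approach compared with preemption.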
Advantages of multitasking:
- manage several users simultaneously.
- virtual memory. (each process has its own virtual memory space).
- good reliability. (if one process crashes, the other processes are not affected).
- secured memory. (each process has its own memory space).
- time sharable. (each process has its own time slice).
- background processing. (the process can run in the background while the user is working on another process).
- optimize computer resources.
- run several programs at the same time.
Disadvantages of multitasking:
- processor binding. (the processor is shared between processes, so everything depends on the processor capabilities). programs get slower as the number of processes increases.
- memory binding. (the memory is shared between processes, so everything depends on the memory capabilities). programs get slower as the memory gets full.
- CPU heat up.

Process Scheduling ⁵¶

Process scheduling:
- it is the activity of the process manager that handles the removal of the running process from the CPU and the selection of next process on the basis of a particular strategy.
- it is an essential part of multiprogramming OSes.
Categories of Process Scheduling:
1. Process Scheduling Queues
2. Two-State Process Model

Process Scheduling Queues¶

OS maintains all Process Control Blocks (PCBs) in process scheduling queues.
OS maintains a queue for each state of the process; all processes in the same state are in the same queue.
when a process changes its state, it is moved to the corresponding queue.
some important queues OS maintains:
- Job Queue: contains all processes in the system.
- Ready Queue: contains all processes that are ready to run. new processes are added to the end of this queue.
- Device Queue: contains all processes that are waiting (in blocked state due to I/O request waiting) for a device to become available.
- Run Queue: contains only one process at a time for each processor core. the process that is currently running on the CPU.
different queue policies are used. such as FIFO, Priority, Round Robin, etc.
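
A FIFO ready queue can be sketched as a fixed-size ring buffer of process IDs. The names `struct ready_queue`, `rq_push`, `rq_pop`, and the capacity `READY_CAP` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define READY_CAP 8

/* Ring buffer of pids: `head` is the oldest entry, `count` the size. */
struct ready_queue {
    int pids[READY_CAP];
    size_t head;
    size_t count;
};

/* Appends a pid at the tail. Returns 0, or -1 if the queue is full. */
int rq_push(struct ready_queue *q, int pid)
{
    assert(q != NULL);
    if (q->count == READY_CAP)
        return -1;
    q->pids[(q->head + q->count) % READY_CAP] = pid;
    q->count++;
    return 0;
}

/* Removes and returns the pid at the head, or -1 if the queue is empty. */
int rq_pop(struct ready_queue *q)
{
    assert(q != NULL);
    if (q->count == 0)
        return -1;
    int pid = q->pids[q->head];
    q->head = (q->head + 1) % READY_CAP;
    q->count--;
    return pid;
}
```

Round robin falls out of the same structure: pop the head, let it run for one time slice, and push it back at the tail if it is not finished.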

Two-State Process Model¶

refers to the running and not running states.
there are 2 states, with a queue for the processes that are not running:
- Running: the process that is currently executing on the CPU.
- Not Running Queue:
  - contains all processes that are waiting their turn to execute; new processes are added to the end of this queue.
  - implemented as a linked list.
the dispatcher maintains the queue:
- When the running process is interrupted, it is moved to the end of the not running queue.
- If the process has completed or aborted, it is discarded.
- In either case, the dispatcher then selects the next process from the not running queue and runs it.

Schedulers¶

Schedulers are special system software which handle process scheduling in various ways. Their main task is to select the jobs to be submitted into the system and to decide which process to run next.
there are 3 types of schedulers:
1. Short-term scheduler (CPU scheduler, dispatcher)
2. Medium-term scheduler (Swapper)
3. Long-term scheduler (Job scheduler)

Schedulers

Short-term schedulers - dispatchers¶

Also called CPU schedulers or dispatchers.
Its main objective is to increase system performance in accordance with the chosen set of criteria.
It handles the change from ready state to running state: it selects a process from the ready queue and allocates the CPU to it.
It chooses the process to run next.
It runs much more often than the other schedulers, so it must be faster.

Medium-term schedulers - swappers¶

Also called swapping schedulers.
Moves processes from main memory to secondary memory and vice versa.
Reduces the degree of multiprogramming.
In charge of handling the swapped-out processes.
It moves suspended processes to secondary memory, reducing the number of processes in main memory.
A process usually gets suspended while waiting for an I/O operation, since it cannot move forward until that operation completes.

Long-term schedulers - job schedulers¶

called Job schedulers.
determines which programs are admitted to the system for processing.
it selects processes from the queue and loads them into memory for execution.
Process loads into the memory for CPU scheduling
The primary objective of the job scheduler is to provide a balanced mix of jobs, such as I/O bound and processor bound.
If the degree of multiprogramming is stable, then the average rate of process creation must be equal to the average departure rate of processes leaving the system.
On some systems, the long-term scheduler may be minimal or absent.
Time-sharing operating systems have no long term scheduler.
When a process changes the state from new to ready, then there is use of long-term scheduler.

Context Switching¶

Context switching is the mechanism that stores and restores the state (context) of the CPU in a Process Control Block so that the execution of a process can be resumed from the same point at a later time.
Context switching is an essential feature of a multitasking operating system.
When the scheduler switches the CPU from executing one process to executing another, the state of the currently running process is stored in its process control block.
After this, the state of the process to run next is loaded from its own PCB and used to set the PC, registers, etc.
At that point, the second process can start executing.
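
The save-then-restore sequence can be simulated in ordinary C by treating a small struct as the CPU's registers. This is only a model of the idea: real context switches are written in assembly and run in the kernel, and `struct sim_regs`, `struct sim_pcb`, and `sim_context_switch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* A pretend register set: program counter and stack pointer only. */
struct sim_regs {
    int pc;
    int sp;
};

/* A pretend PCB holding the registers saved for a process. */
struct sim_pcb {
    int pid;
    struct sim_regs saved;
};

/* Saves the CPU state into the outgoing PCB, then loads the CPU state
 * from the incoming PCB. */
void sim_context_switch(struct sim_regs *cpu, struct sim_pcb *from,
                        const struct sim_pcb *to)
{
    assert(cpu != NULL && from != NULL && to != NULL);
    from->saved = *cpu;   /* store the state of the running process */
    *cpu = to->saved;     /* restore the state of the next process  */
}
```

After the call, the simulated CPU continues exactly where the incoming process left off, and the outgoing process can be resumed later from its saved state.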

System Calls ⁶¶

A system call is a way for a user program to interface with the operating system, i.e., a request from software to the OS kernel.
System calls can be invoked from assembly language or from a high-level language like C or Pascal, usually through library wrapper functions.
When an application makes a system call, it executes a special trap (software interrupt) instruction, which pauses the user-level code, switches the CPU to kernel mode, and transfers control to the kernel.
In most operating systems the kernel handles the call in the context of the calling process, so system calls from different processes can be serviced independently.
Types of System Calls:
1. Process Control: create, terminate, suspend, resume, etc.
2. File Management: create, delete, open, close, read, write, etc.
3. Device Management: request, release, read, write, getDeviceAttributes etc.
4. Information Maintenance: get system data, set time or date, get time or date, set system data etc.
5. Communication: create connection, delete connection, send message, receive message, etc.
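
As a small illustration of the categories above, the sketch below uses four POSIX system calls: pipe() creates a communication channel, write() and read() move bytes through it, and close() releases the descriptors. `pipe_roundtrip` is a hypothetical helper, and the 512-byte limit is an assumption chosen so the write cannot block.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends `msg` through a pipe and reads it back into `out`.
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    assert(msg != NULL && out != NULL && cap > 0);

    size_t len = strlen(msg);
    assert(len > 0 && len <= 512);       /* small enough to fit in the pipe */

    int fds[2];                          /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) != 0)
        return -1;

    ssize_t n = -1;
    if (write(fds[1], msg, len) == (ssize_t)len)
        n = read(fds[0], out, cap);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Each of these calls crosses into the kernel and back; the C functions are thin library wrappers around the underlying system calls.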
Windows vs Linux system calls:

Windows vs Linux system calls

References¶

1. Arpaci-Dusseau, R. H., & Arpaci-Dusseau, A. C. (2018). Operating systems: Three easy pieces (1.01 ed.). Arpaci-Dusseau Books. Retrieved June 16, 2022, from https://pages.cs.wisc.edu/~remzi/OSTEP/. Chapter 3 – Dialogue, Chapter 4 – Processes, and Chapter 5 – Process API.
2. Baeldung. (2021, November 9). A guide to POSIX. https://www.baeldung.com/linux/posix
3. The Open Group Base Specifications Issue 6. https://pubs.opengroup.org/onlinepubs/009695399/nfindex.html
4. Multitasking. (n.d.). javaTpoint. https://www.javatpoint.com/multitasking-operating-system
5. Operating system - Process scheduling. (n.d.). Biggest Online Tutorials Library. https://www.tutorialspoint.com/operating_system/os_process_scheduling.htm
6. System calls in operating system. (n.d.). javaTpoint. https://www.javatpoint.com/system-calls-in-operating-system
