14 Feb 1990

Processes in an operating system

A running program is called a process. The process captures everything about the program at a given point of execution: its memory (address space), the contents of all CPU registers, and any open I/O resources such as files. Earlier OSes loaded a process's code and data eagerly, all at once before running it; modern ones load it lazily, bringing pieces in only when they are needed. This involves paging and swapping.

A process is in one of three basic states at any time (real kernels track a few more; xv6, shown below, has six)

Running: Currently executing instructions on a CPU.
Blocked: Waiting for some event, such as I/O on a file descriptor, and unable to run until it completes.
Ready: Ready to execute but waiting for its turn.
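
The legal moves between these three states can be sketched as a small lookup function. This is only an illustration of the textbook three-state model; the names `basic_state` and `can_transition` are made up here and do not come from xv6 or any other kernel.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from any real kernel. */
enum basic_state { ST_READY, ST_RUNNING, ST_BLOCKED };

/* Returns true if a process may move directly from `from` to `to`. */
bool can_transition(enum basic_state from, enum basic_state to)
{
    switch (from) {
    case ST_READY:
        return to == ST_RUNNING;                    /* picked by the scheduler */
    case ST_RUNNING:
        return to == ST_READY || to == ST_BLOCKED;  /* descheduled, or starts waiting */
    case ST_BLOCKED:
        return to == ST_READY;                      /* the awaited event completed */
    }
    return false;
}
```

Note that a blocked process never goes straight back to running: once its I/O completes it becomes ready and has to wait for its turn like everyone else.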

Data structures involved

In xv6, the proc struct stores everything the kernel tracks about a process. First up is the set of saved registers, used when switching between processes.

struct context {
  int eip;
  int esp;
  int ebx;
  int ecx;
  int edx;
  int esi;
  int edi;
  int ebp;
};

Next, we store the state of the process at the time of descheduling.

enum proc_state { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

The actual struct also holds the start and size of the memory region that the process occupies, alongside the kernel stack, the process ID, the parent (if any), the open files, the current working directory, the saved registers, and the trapframe.

struct proc {
  char *mem; // Start of process memory
  uint sz; // Size of process memory
  char *kstack; // Bottom of kernel stack for this process
  enum proc_state state; // Process state
  int pid; // Process ID
  struct proc *parent; // Parent process
  void *chan; // If !zero, sleeping on chan
  int killed; // If !zero, has been killed
  struct file *ofile[NOFILE]; // Open files
  struct inode *cwd; // Current directory
  struct context context; // Switch here to run process
  struct trapframe *tf; // Trap frame for the current interrupt
};
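
Kernels usually keep these entries in a fixed-size table and hand out free slots when a process is created. The following is a simplified user-space sketch of that idea, not xv6's actual code: the struct is trimmed to two fields, and `mini_proc`, `NPROC`, `ptable`, `next_pid`, and `alloc_slot` are names assumed here for illustration.

```c
#include <stddef.h>

enum proc_state { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

/* Trimmed-down stand-in for the proc struct above (illustration only). */
struct mini_proc {
    enum proc_state state;
    int pid;
};

#define NPROC 4  /* assumed table size, kept small for the example */

static struct mini_proc ptable[NPROC];  /* zero-initialized, so every slot is UNUSED */
static int next_pid = 1;

/* Claims the first UNUSED slot, marks it EMBRYO, and assigns a fresh pid.
 * Returns NULL when the table is full. */
struct mini_proc *alloc_slot(void)
{
    for (size_t i = 0; i < NPROC; i++) {
        if (ptable[i].state == UNUSED) {
            ptable[i].state = EMBRYO;
            ptable[i].pid = next_pid++;
            return &ptable[i];
        }
    }
    return NULL;
}
```

A real kernel would also take a lock around the scan and set up the kernel stack before the slot becomes RUNNABLE; both are left out to keep the sketch short.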

Such an entry is also sometimes called a Process Control Block (PCB).

Process related syscalls

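The fork() system call creates a new process in Linux by duplicating the calling process. It returns twice! Once in the parent process (where it returns the pid of the child process) and once in the child process (where it returns 0). If the call fails, it returns -1 in the parent and no child is created. The child resumes from the same point with a copy of the parent's registers; the only one that differs is the return-value register (rax on x86-64), which holds the PID of the child in the parent process and 0 in the child process.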
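
A minimal sketch of the "returns twice" behaviour, assuming a POSIX system. The helper name `fork_demo` is made up for this example; it also reaps the child with waitpid() so no zombie is left behind.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks once and prints what each side sees.
 * Returns the child's pid in the parent, or -1 if fork() failed.
 * (Hypothetical helper, for illustration only.) */
pid_t fork_demo(void)
{
    fflush(stdout);  /* avoid duplicating buffered output in the child */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: fork() returned 0. */
        printf("child:  fork() returned 0, my pid is %d\n", (int)getpid());
        fflush(stdout);
        _exit(0);  /* leave without running the parent's cleanup handlers */
    }

    /* Parent: fork() returned the child's pid. */
    printf("parent: fork() returned %d\n", (int)pid);
    waitpid(pid, NULL, 0);  /* reap the child */
    return pid;
}
```

Calling `fork_demo()` from main prints one line from each process; the order of the two lines depends on the scheduler.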

The wait() system call makes the parent block until one of its child processes has finished execution. This is useful for syncing. If called from a process that has no unwaited-for children, it fails immediately, returning -1 with errno set to ECHILD. There's also waitpid(), which takes the pid of a child process and waits for that one specifically. With two or more children, the order in which wait() reports them is unspecified. Both calls take a pointer to an int in which they store the exit status of the child process. If the pointer is NULL, the status is discarded.
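
To show how the status pointer is used, here is a small sketch, assuming a POSIX system. The helper `child_exit_status` is a name invented for this example; it forks a child that exits with a given code and recovers that code in the parent through waitpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits immediately with `code`, waits for it, and
 * returns the exit status the parent observed (or -1 on any failure).
 * (Hypothetical helper, for illustration only.) */
int child_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);  /* child: terminate with the requested status */

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* The raw status packs several fields; the W* macros decode it. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The int filled in by wait() is not the exit code itself, which is why it goes through WIFEXITED() and WEXITSTATUS() before being used.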

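The exec() family of system calls (execl(), execv(), execve() and so on) is useful when you want to replace the program running in the current process with another one without creating a new process. It replaces the code and data segments of the current process with those of the new binary, resets the registers, and reinitializes the stack and heap; the PID stays the same. The first argument names the binary to run, and the remaining arguments are passed to that binary as its arguments. A successful call to exec() never returns; it returns (with -1) only on failure.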
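
In practice exec() is often paired with fork(), so that the caller survives while the child becomes the new program. A minimal sketch under that assumption, for a POSIX system: `run_program` is a hypothetical helper, and the exit code 127 for a failed exec is borrowed from shell convention rather than required by exec() itself.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up via PATH) in a child process and returns its
 * exit status, or -1 if fork/wait failed or the child died abnormally.
 * (Hypothetical helper, for illustration only.) */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);  /* on success this never returns */
        _exit(127);             /* reached only if the exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing `{ "sh", "-c", "exit 3", NULL }` should yield 3 on a system where `sh` is on the PATH, while a program name that does not exist yields 127 because the line after execvp() gets to run.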

InnocentZero's Treasure Chest
