InnocentZero's Treasure Chest


24 Oct 2025

Virtual memory in RISC-V


We introduce a paging unit between the CPU and physical memory; the mapping it applies is unique to every process.

RAM is split into multiple page frames (usually 4 KiB). Virtual memory is likewise split into chunks of 4 KiB, called pages. Pages are then mapped to frames in some fashion, and this mapping is stored in the process's page table.
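As a toy model (not RISC-V-specific), a page table can be thought of as a mapping from virtual page numbers to physical frame numbers, with the low 12 bits of an address passing through unchanged as the in-page offset:

```python
# Toy model of a per-process page table: virtual page numbers (VPNs)
# map to physical frame numbers (PFNs). With 4 KiB pages, the low
# 12 bits of an address are the in-page offset.
PAGE_SIZE = 4096

def translate(page_table, vaddr):
    """Translate a virtual address using a VPN -> PFN mapping."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = page_table[vpn]          # raises KeyError if unmapped
    return pfn * PAGE_SIZE + offset

# One process's table: virtual page 0 lives in frame 5, page 1 in frame 2.
table_a = {0: 5, 1: 2}
assert translate(table_a, 0x0123) == 5 * PAGE_SIZE + 0x123
```

A second process would have its own, different dictionary, which is what makes the address spaces independent.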

While a program is scheduled, it has an active page table that is maintained throughout its runtime. Pages from multiple programs may be resident in memory simultaneously.

Demand Paging

Virtual memory takes advantage of the fact that not all pages need to be loaded at once for the program to execute. Hence, each page table entry also has a present bit that tells you whether the page is currently in physical memory.

If a page that is not loaded is accessed, the CPU raises a page fault exception. A page fault handler takes care of loading the page and filling in the page table entry, evicting a page frame of some other process if necessary.

There is another bit, the dirty bit, that tracks whether a page needs to be written back to swap because its contents were modified. If the dirty bit is 1, the page is written back to swap on eviction; otherwise the write is skipped, saving memory operations.
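The present-bit and dirty-bit logic above can be sketched as follows. This is a minimal simulation, not RISC-V hardware behavior; the names `Entry`, `access`, and `evict` are hypothetical:

```python
# Minimal demand-paging sketch: each entry tracks a present bit and a
# dirty bit. Touching a non-present page triggers a (simulated) page
# fault that loads a frame; eviction writes back only if dirty.

class Entry:
    def __init__(self):
        self.present = False
        self.dirty = False
        self.frame = None

def access(entry, write, load_page):
    if not entry.present:          # page fault: bring the page in
        entry.frame = load_page()
        entry.present = True
    if write:
        entry.dirty = True         # frame now differs from the swap copy
    return entry.frame

def evict(entry, write_back):
    if entry.dirty:                # only dirty pages cost a swap write
        write_back(entry.frame)
    entry.present = entry.dirty = False
```

A read-only page that gets evicted never calls `write_back`, which is exactly the saved memory operation the dirty bit buys.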

There are also protection bits for each page indicating its permissions. These enforce properties such as code being readable and executable, or the stack being non-executable.

Usually paging is done in two levels: a top-level directory holds the address of a second-level table, and an index into that table, combined with the page offset, is used to obtain the memory location.

Shared memory

If two programs map the same page frame, they physically share that memory, which gives them a common shared region (e.g. shared glibc pages, the vDSO on Linux). You can also alias a page within a single program by pointing two virtual pages at the same frame.

2 level paging

The top 10 bits of the virtual address index the page directory, whose base is held in the page table base register; this gives the address of a second-level page table. The next 10 bits index that page table, and from the resulting entry the upper 22 bits of the physical address are obtained. The last 12 bits (the page offset) are the same in the virtual and the physical address. This scheme supports page sizes of 4 KiB and, via superpages, 4 MiB.
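The 10/10/12 split can be written out directly. A small sketch (the function name is mine):

```python
# Split a 32-bit Sv32 virtual address into its translation fields:
# VPN[1] (bits 31:22) indexes the page directory, VPN[0] (bits 21:12)
# indexes the second-level table, and the low 12 bits are the offset.
def sv32_va_fields(va):
    vpn1 = (va >> 22) & 0x3FF     # top 10 bits
    vpn0 = (va >> 12) & 0x3FF     # next 10 bits
    offset = va & 0xFFF           # low 12 bits, copied into the PA
    return vpn1, vpn0, offset

assert sv32_va_fields(0xFFFFFFFF) == (0x3FF, 0x3FF, 0xFFF)
```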

For the satp register (the page table base register), the MSB indicates whether translation is on. The next 9 bits (the ASID) provide address space separation between processes. The remaining 22 bits hold the physical page number of the first-level page directory; left-shifting them by 12 bits (which happens to be the page size as well) yields the 34-bit address of the base directory.
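Decoding the 32-bit satp layout just described, as a sketch (field masks follow the 1/9/22 split above; the function name is mine):

```python
# Decode a 32-bit satp value: bit 31 = MODE (1 enables Sv32 paging),
# bits 30:22 = ASID, bits 21:0 = PPN of the root page directory.
# The PPN is shifted left by 12 to form the 34-bit physical base.
def satp32_decode(satp):
    mode = satp >> 31
    asid = (satp >> 22) & 0x1FF   # 9-bit address-space ID
    ppn = satp & 0x3FFFFF         # 22-bit physical page number
    return mode, asid, ppn << 12  # base address of the page directory
```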

The page table format is as follows:

PPN[1] 12: physical page number (upper part)
PPN[0] 10: physical page number (lower part)
RSW 2: bits reserved for OS use
D 1: dirty (0 for non-leaf)
A 1: accessed (0 for non-leaf)
G 1: global
U 1: user-accessible
X 1: executable
W 1: writable
R 1: readable
V 1: valid
  • Accessed: the spec allows two schemes for maintaining this bit: hardware may set it on every access, or an access to a page with A clear may raise a page fault so that software updates it.
  • Global: set if the mapping is valid in all virtual address spaces. The OS can use this to optimize, e.g. by not flushing such entries on context switches.
  • Valid: used when caching translations in TLBs. Implementations may legally cache entries either way, but only entries with the valid bit set describe a correct translation.

The PPN[1] + PPN[0] of the first-level entry, again left-shifted by 12 bits, give the address of the second-level table. The PPN[1] + PPN[0] of the second-level entry then form the upper 22 bits of the 34-bit physical address, and the last 12 bits are the offset from the original virtual address. 34 bits mean a total of 16 GiB of physical memory is addressable in 32-bit RISC-V.
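The two-level walk can be sketched over a toy physical memory, modelled here as a dict from physical address to 32-bit PTE (the function name and memory model are mine; superpages and most permission checks are omitted):

```python
# Sketch of the Sv32 two-level walk. Each PTE's PPN field (bits 31:10)
# is shifted left by 12 to get the base of the next level, and at the
# leaf it supplies the upper 22 bits of the 34-bit physical address.
PTE_V, PTE_R = 0b0001, 0b0010

def sv32_walk(mem, root_base, va):
    vpn1 = (va >> 22) & 0x3FF
    vpn0 = (va >> 12) & 0x3FF
    pte1 = mem[root_base + vpn1 * 4]        # level-1 entry (4 bytes each)
    assert pte1 & PTE_V, "page fault"
    table_base = (pte1 >> 10) << 12         # PPN -> second-level base
    pte0 = mem[table_base + vpn0 * 4]       # level-0 (leaf) entry
    assert pte0 & PTE_V and pte0 & PTE_R, "page fault"
    return ((pte0 >> 10) << 12) | (va & 0xFFF)   # 34-bit physical address
```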

For a non-leaf page table entry, all three of the read, write, and execute bits are 0. If a page is writable, it must also be readable.
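Those two rules make classifying an entry from its low bits straightforward. A sketch (the function name is mine):

```python
# Classify a PTE from its low flag bits: V is bit 0, R bit 1, W bit 2,
# X bit 3. RWX = 000 marks a pointer to the next level; W set without
# R is a reserved (illegal) combination.
def pte_kind(pte):
    r, w, x = (pte >> 1) & 1, (pte >> 2) & 1, (pte >> 3) & 1
    if not (pte & 1):
        return "invalid"
    if (r, w, x) == (0, 0, 0):
        return "non-leaf"         # points at the next-level table
    if w and not r:
        return "reserved"         # writable-but-not-readable is illegal
    return "leaf"

assert pte_kind(0b0001) == "non-leaf"   # V=1, RWX=000
assert pte_kind(0b0011) == "leaf"       # V=1, R=1
```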

3 level paging

This is done for 64-bit architectures and comes in two variants (the 39-bit scheme uses three levels of tables; the 48-bit scheme actually uses four):

  • 39 bit addressing
  • 48 bit addressing

The satp register for this scheme has 4 bits for mode specification and 16 bits for address space separation (the ASID). The rest holds the physical page number of the top-level page directory; once again, left-shift it by 12 bits to get the root directory's address. The 4 mode bits are:

  • 0 if virtualization is off.
  • 8 if it is 39 bit scheme.
  • 9 if it is 48 bit scheme.
  • Others are reserved.
  • 39 bit

    This scheme supports page sizes of 4 KiB, 2 MiB, and 1 GiB. The most significant 25 bits of the virtual address are unused (they must be copies of bit 38). There are a total of \(2^{27}\) page table entries. Once again, the last 12 bits are the same.

    The page table entry is as follows:

    Reserved 10: reserved for future use
    PPN[2] 26: physical page number (upper part)
    PPN[1] 9: physical page number (middle part)
    PPN[0] 9: physical page number (lower part)
    RSW 2: bits reserved for OS use
    D 1: dirty (0 for non-leaf)
    A 1: accessed (0 for non-leaf)
    G 1: global
    U 1: user-accessible
    X 1: executable
    W 1: writable
    R 1: readable
    V 1: valid

    Once again, the PPN[2] + PPN[1] + PPN[0] bits of each entry, left-shifted by 12 bits, give the base of the next-level directory; at the leaf they form the physical page address to which the offset from the VA is added.
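The Sv39 address split mirrors the Sv32 one, just with three 9-bit VPN fields. A sketch (the function name is mine):

```python
# Sv39 splits a 39-bit virtual address into three 9-bit VPN fields
# (each indexing a 512-entry table of 8-byte PTEs) plus a 12-bit offset.
def sv39_va_fields(va):
    vpn2 = (va >> 30) & 0x1FF
    vpn1 = (va >> 21) & 0x1FF
    vpn0 = (va >> 12) & 0x1FF
    offset = va & 0xFFF
    return vpn2, vpn1, vpn0, offset

# 9 + 9 + 9 = 27 VPN bits, matching the 2^27 page table entries above.
assert sv39_va_fields((1 << 39) - 1) == (0x1FF, 0x1FF, 0x1FF, 0xFFF)
```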

TLBs

These are special memory banks that cache frequently used virtual-to-physical address mappings.
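The idea can be sketched as a tiny direct-mapped lookup structure (the class and method names are mine; real TLBs are hardware and often set-associative):

```python
# Toy direct-mapped TLB: a small array indexed by the low bits of the
# VPN, each slot holding a (tag, PFN) pair. On a miss the caller would
# walk the page tables and refill the slot.
class TLB:
    def __init__(self, slots=16):
        self.slots = [None] * slots

    def lookup(self, vpn):
        entry = self.slots[vpn % len(self.slots)]
        if entry and entry[0] == vpn:
            return entry[1]       # hit: cached frame number
        return None               # miss: fall back to a page-table walk

    def insert(self, vpn, pfn):
        self.slots[vpn % len(self.slots)] = (vpn, pfn)
```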

There are three types of TLB-cache combinations:

  • Physically indexed physically tagged

    In this case the virtual address is first translated through the TLB. On a TLB miss, the page tables must be walked, which costs separate memory accesses of their own.

    On a TLB hit, the resulting physical address is used to access the cache, which is again a hit or a miss.

    This is slow, as the cache must wait for the TLB to finish its operations before it can begin its own lookup.

    It is still a good fit for lower-level caches, since they are accessed more rarely, and by the time they are reached the address has usually already been translated for the VIPT L1 lookup, so few extra translations are needed.

    PROS:

    • Simple setup, no aliasing problems like VIVT

    CONS:

    • Cache access is slower due to the requirement of translation first.
  • Virtually indexed virtually tagged

    Only virtual addresses are used to calculate both the index and the tag.

    PROS:

    • Cache access is fast since translation isn't needed.

    CONS:

    • Same physical memory might be mapped by different virtual addresses leading to aliasing of the memory in the cache.
  • Virtually indexed physically tagged

    Used for the L1/L2 cache. The TLB computes the physical address to provide the tag for the cache; the index is obtained from the virtual address.

    • Parallel lookup for cache and TLB
    • Since the tag is based on the physical address, it doesn't lead to aliasing problems
    • Since the index is based on the virtual address, the cache lookup need not wait for the TLB

This website by innocentzer0 is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.