-
- Paging requires address translation from virtual to physical addresses
- Accessing the page table in memory for every translation is slow
- TLBs (Translation Lookaside Buffers) are hardware caches that store recently used virtual-to-physical address mappings
-
- For a virtual address, the hardware extracts the virtual page number (VPN)
- It searches the TLB for the VPN translation (TLB hit or miss)
- On a TLB hit, the physical frame number (PFN) is retrieved from the TLB entry and combined with the offset to form the physical address
- On a TLB miss, the hardware walks the page table to fetch the translation, updates the TLB, and retries the instruction
-
- Example of accessing an array in virtual memory, showing TLB hits and misses based on spatial locality
-
- Software-managed TLBs raise an exception on a miss, and the OS handles the page table walk and TLB update
- Hardware-managed TLBs have the hardware walk the page table and update the TLB on a miss
-
- TLB entries contain the VPN, PFN, valid bit, protection bits, and other fields like an address space ID (ASID)
- TLBs are typically fully associative and parallel-searched
-
- On a context switch, the TLB entries from the previous process are not valid for the next process
- Solutions: flush the TLB on context switches, or use an ASID field to distinguish translations from different processes
-
- When installing a new TLB entry, an existing entry needs to be replaced
- Common policies: Least Recently Used (LRU), random replacement
-
- Example of a real TLB entry from the MIPS R4000 architecture
- Fields like VPN, PFN, global bit, ASID, dirty bit, etc.
- Instructions to manage the software-managed TLB (TLBP, TLBR, TLBWI, TLBWR)
-
- TLBs make virtual memory practically as fast as physical memory in the common case (TLB hits)
- Performance can degrade if the program's working set exceeds the TLB coverage (capacity misses)
- Supporting larger page sizes can increase the effective TLB coverage for some workloads
-
- TLB access can be a bottleneck in the CPU pipeline, especially with physically-indexed caches
- Virtually-indexed caches avoid the translation step on hits but introduce other complexities
The key takeaway is that TLBs are critical hardware components that accelerate address translation and enable practical use of virtual memory by caching recent translations and avoiding the overhead of page table walks in the common case.
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>
#include <pthread.h>
#define PAGESIZE (4096)
#define ARRAY_SIZE (4096 * 4096)
void pin_to_cpu0() {
cpu_set_t cpu_set;
CPU_ZERO(&cpu_set);
CPU_SET(0, &cpu_set);
pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpu_set);
}
int main(int argc, char *argv[]) {
if (argc != 3) {
printf("Usage: %s <num_pages> <num_trials>\n", argv[0]);
return 1;
}
int num_pages = atoi(argv[1]);
int num_trials = atoi(argv[2]);
int *array = calloc(ARRAY_SIZE, sizeof(int));
// Initialize the array
for (int i = 0; i < ARRAY_SIZE; i++)
array[i] = 0;
// Pin the thread to CPU 0
pin_to_cpu0();
struct timeval start, end;
long long elapsed_time = 0;
for (int trial = 0; trial < num_trials; trial++) {
int jump = PAGESIZE / sizeof(int);
int i;
long long sum = 0;
gettimeofday(&start, NULL);
for (i = 0; i < num_pages * jump; i += jump) {
array[i] += 1;
sum += array[i];
}
gettimeofday(&end, NULL);
elapsed_time += (end.tv_sec - start.tv_sec) * 1000000LL + (end.tv_usec - start.tv_usec);
}
printf("%d %lld\n", num_pages, elapsed_time / num_trials);
free(array);
return 0;
}