Skip to content

Latest commit

 

History

History
96 lines (82 loc) · 4.18 KB

File metadata and controls

96 lines (82 loc) · 4.18 KB
  1. Address Translation and the Need for TLBs

    • Paging requires address translation from virtual to physical addresses
    • Accessing the page table in memory for every translation is slow
    • TLBs (Translation Lookaside Buffers) are hardware caches that store recently used virtual-to-physical address mappings
  2. TLB Basic Algorithm

    • For a virtual address, the hardware extracts the virtual page number (VPN)
    • It searches the TLB for the VPN translation (TLB hit or miss)
    • On a TLB hit, the physical frame number (PFN) is retrieved from the TLB entry and combined with the offset to form the physical address
    • On a TLB miss, the hardware walks the page table to fetch the translation, updates the TLB, and retries the instruction
  3. TLB Hit/Miss Example

    • Example of accessing an array in virtual memory, showing TLB hits and misses based on spatial locality
  4. Software-managed vs. Hardware-managed TLBs

    • Software-managed TLBs raise an exception on a miss, and the OS handles the page table walk and TLB update
    • Hardware-managed TLBs have the hardware walk the page table and update the TLB on a miss
  5. TLB Contents and Organization

    • TLB entries contain the VPN, PFN, valid bit, protection bits, and other fields like an address space ID (ASID)
    • TLBs are typically fully associative and parallel-searched
  6. Context Switches and TLB Management

    • On a context switch, the TLB entries from the previous process are not valid for the next process
    • Solutions: flush the TLB on context switches, or use an ASID field to distinguish translations from different processes
  7. TLB Replacement Policies

    • When installing a new TLB entry, an existing entry needs to be replaced
    • Common policies: Least Recently Used (LRU), random replacement
  8. Real TLB Example (MIPS R4000)

    • Example of a real TLB entry from the MIPS R4000 architecture
    • Fields like VPN, PFN, global bit, ASID, dirty bit, etc.
    • Instructions to manage the software-managed TLB (TLBP, TLBR, TLBWI, TLBWR)
  9. TLB Performance Considerations

    • TLBs make virtual memory practically as fast as physical memory in the common case (TLB hits)
    • Performance can degrade if the program's working set exceeds the TLB coverage (capacity misses)
    • Supporting larger page sizes can increase the effective TLB coverage for some workloads
  10. Other TLB Issues

    • TLB access can be a bottleneck in the CPU pipeline, especially with physically-indexed caches
    • Virtually-indexed caches avoid the translation step on hits but introduce other complexities

The key takeaway is that TLBs are critical hardware components that accelerate address translation and enable practical use of virtual memory by caching recent translations and avoiding the overhead of page table walks in the common case.

Questions & Answers

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>
#include <pthread.h>

#define PAGESIZE (4096)
#define ARRAY_SIZE (4096 * 4096)

void pin_to_cpu0() {
    cpu_set_t cpu_set;
    CPU_ZERO(&cpu_set);
    CPU_SET(0, &cpu_set);
    pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpu_set);
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        printf("Usage: %s <num_pages> <num_trials>\n", argv[0]);
        return 1;
    }

    int num_pages = atoi(argv[1]);
    int num_trials = atoi(argv[2]);
    int *array = calloc(ARRAY_SIZE, sizeof(int));

    // Initialize the array
    for (int i = 0; i < ARRAY_SIZE; i++)
        array[i] = 0;

    // Pin the thread to CPU 0
    pin_to_cpu0();

    struct timeval start, end;
    long long elapsed_time = 0;

    for (int trial = 0; trial < num_trials; trial++) {
        int jump = PAGESIZE / sizeof(int);
        int i;
        long long sum = 0;

        gettimeofday(&start, NULL);
        for (i = 0; i < num_pages * jump; i += jump) {
            array[i] += 1;
            sum += array[i];
        }
        gettimeofday(&end, NULL);

        elapsed_time += (end.tv_sec - start.tv_sec) * 1000000LL + (end.tv_usec - start.tv_usec);
    }

    printf("%d %lld\n", num_pages, elapsed_time / num_trials);
    free(array);
    return 0;
}