Explore the Foundation of Computer Architecture: System Organization, CPU Cycles & Data Flows
01
Introduction to Computer Architecture
Computer Architecture describes the structure, organization, and behavioral design of computer hardware systems. It outlines how instructions are parsed, how memory is laid out, and how computer components communicate.
Most modern general-purpose computers are designed according to the Von Neumann Architecture, published in 1945 by mathematician John von Neumann.
Von Neumann Bottleneck
Because the Von Neumann architecture shares a single system bus for both instruction fetches and data transfers, the CPU often sits idle while it waits for instructions or data to arrive from memory. This throughput limit is called the **Von Neumann Bottleneck**.
Practice: Bottlenecks
What defines the main drawback of the standard Von Neumann architecture?
Answer: Shared bus bottleneck
Explanation: Because a shared bus is used for fetching both instructions and data, data transfers block instruction fetches, causing CPU idle states.
02
Computer Organization vs Computer Architecture
Although often used interchangeably, Computer Architecture and Computer Organization focus on distinct layers of system design:
Computer Architecture (Abstract): Describes the programmer-visible attributes of a system (the Instruction Set Architecture, or ISA). It defines data formats, registers, and memory addressing modes. *Example:* Deciding that the ISA includes an instruction called `ADD`.
Computer Organization (Physical): Describes how architectural specifications are physically implemented using gate-level circuits, wires, clock cycles, and control signals. *Example:* Deciding whether to implement the `ADD` operation using a Carry-Lookahead Adder or Ripple-Carry Adder.
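To make the split concrete, here is a minimal sketch of one possible organization for `ADD`: a ripple-carry adder modeled with bitwise operators standing in for logic gates. The function names `full_adder` and `ripple_carry_add`, and the default 8-bit width, are illustrative assumptions rather than part of any real ISA.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR, AND, and OR operations."""
    partial = a ^ b
    sum_bit = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return sum_bit, carry_out


def ripple_carry_add(x, y, width=8):
    """Add two unsigned integers bit by bit, passing the carry upward.

    Returns (result truncated to `width` bits, final carry-out).
    """
    result = 0
    carry = 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        sum_bit, carry = full_adder(bit_x, bit_y, carry)
        result |= sum_bit << i
    return result, carry
```

A program that issues `ADD` sees the same result whichever adder design sits underneath. A Carry-Lookahead Adder would replace the loop with logic that computes the carries in parallel, which changes the speed but not the architecture.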
Practice: Arch vs. Org
Is defining the size of a register block an architectural or organizational decision?
Answer: Architectural
Explanation: The number and size of registers are visible to the programmer, because ISA instructions reference those registers directly. That makes it an architectural decision.
03
CPU Architecture
The Central Processing Unit (CPU) is the "brain" of the computer, executing instructions and managing component coordination. It contains three main sub-components:
Arithmetic Logic Unit (ALU): Performs basic mathematical calculations (addition, subtraction) and logical comparisons (AND, OR, NOT).
Control Unit (CU): The supervisor. It decodes instructions fetched from memory and routes control signals directing the ALU, registers, and memory blocks.
Registers: Tiny, ultra-fast memory storage locations located directly inside the CPU core.
Special Purpose Registers
Program Counter (PC): Holds the memory address of the next instruction to fetch.
Instruction Register (IR): Holds the instruction code currently being decoded/executed.
Memory Address Register (MAR): Holds the memory address currently being read from or written to.
Memory Data Register (MDR): Holds the actual data payload read from or written to memory.
Accumulator (ACC): Temporarily holds intermediate mathematical results calculated by the ALU.
Practice: Registers
Which register holds the address of the next instruction to fetch?
Answer: Program Counter (PC)
Explanation: The Program Counter (PC) is incremented after each fetch, so it always points to the next instruction in the program's execution sequence.
04
Memory Hierarchy
In a computer system, memory is organized in a hierarchy to balance **speed** and **cost**. Registers are fast but small and expensive, while hard disk drives and solid-state drives are much slower but far larger and cheaper per byte.
Cache Levels
L1 Cache: Smallest, fastest, and built directly into the individual CPU cores.
L2 Cache: Slightly larger and slower than L1, typically serving a single core.
L3 Cache: Shared across all cores of a CPU chip, larger but slower than L2.
Practice: Cache speed
Which level of cache memory is typically the fastest?
Answer: L1 Cache
Explanation: L1 Cache is located closest to the CPU core registers and operates at the internal clock frequency of the processor core, making it the fastest cache tier.
05
Input and Output Systems
I/O systems manage communication between the CPU/Memory and external peripheral devices (keyboards, monitors, network cards). There are three primary mechanisms for I/O operations:
Programmed I/O (Polling): The CPU repeatedly queries the peripheral device to check if it has data ready. This keeps the CPU busy in a loop, wasting cycles.
Interrupt-driven I/O: The peripheral device sends a hardware signal (Interrupt) to the CPU when it is ready. The CPU suspends its current work, runs an **Interrupt Service Routine (ISR)**, and resumes.
Direct Memory Access (DMA): Used for high-speed transfers. The CPU sets up the transfer, then a dedicated DMA controller copies blocks of data directly between peripheral devices and RAM without further CPU involvement, notifying the CPU only when the transfer completes.
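The difference between polling and interrupts can be sketched with a toy tick-based simulation. Everything here is a simplifying assumption: the function names, the idea that one status check costs one tick, and the `ready_at_tick` parameter that says when the hypothetical device has data.

```python
def polled_io(ready_at_tick):
    """Programmed I/O: the CPU checks the device status on every tick."""
    useful_work = 0
    status_checks = 0
    tick = 0
    while True:
        status_checks += 1          # this tick is spent reading the status flag
        if tick >= ready_at_tick:
            break                   # device ready: the CPU can now read the data
        tick += 1
    return {"useful_work": useful_work, "status_checks": status_checks}


def interrupt_driven_io(ready_at_tick):
    """Interrupt-driven I/O: the CPU works until the device raises an interrupt."""
    useful_work = 0
    isr_runs = 0
    for tick in range(ready_at_tick + 1):
        if tick == ready_at_tick:
            isr_runs += 1           # suspend current work and run the ISR
        else:
            useful_work += 1        # CPU is free to execute other instructions
    return {"useful_work": useful_work, "isr_runs": isr_runs}
```

For a device that becomes ready at tick 5, the polled version spends all six ticks checking status, while the interrupt-driven version completes five ticks of other work before running the ISR once.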
Practice: DMA
Why is Direct Memory Access (DMA) used instead of Programmed I/O for SSD transfers?
Answer: It avoids CPU execution overhead.
Explanation: For high-speed devices, having the CPU copy byte-by-byte would saturate the CPU. DMA handles the transfer autonomously, freeing the CPU to execute other tasks.
06
Data Representation
Computers use electrical switches (on/off states) to represent data. As a result, all characters, numbers, and symbols are stored using the Binary System (Base 2).
Binary and Hexadecimal
Binary: States of `0` and `1`. Each digit represents a **bit**.
Hexadecimal (Base 16): Uses digits `0-9` and letters `A-F`. A single hex character represents exactly 4 binary bits (a nibble), making memory dumps readable.
Negative integers are represented using Two's Complement representation. To convert a positive binary number to negative: invert all bits and add `1`.
Explanation: To convert the binary value `10101100` to hexadecimal, split it into nibbles: `1010` is 10 (hex `A`) and `1100` is 12 (hex `C`). The value is therefore written as hex `AC`, or `0xAC`.
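The nibble-by-nibble conversion and the invert-and-add-one rule can be sketched in a few lines. The helper names `to_hex` and `twos_complement_negate`, and the default 8-bit width, are assumptions made for illustration.

```python
def to_hex(bits):
    """Convert a binary string to hex by translating each 4-bit nibble."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)   # pad to whole nibbles
    nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(nibble, 2), "X") for nibble in nibbles)


def twos_complement_negate(value, width=8):
    """Negate `value` in `width`-bit two's complement: invert bits, add 1."""
    mask = (1 << width) - 1
    inverted = ~value & mask        # flip every bit within the width
    return (inverted + 1) & mask    # add 1, discarding any overflow


print(to_hex("10101100"))                       # AC
print(format(twos_complement_negate(5), "08b")) # 11111011
```

The mask keeps the result inside the chosen width, which mirrors how a fixed-size register discards bits that overflow past its most significant position.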
07
Instruction Execution Cycle
The CPU continuously executes a loop called the **Fetch-Decode-Execute Cycle** to process instructions:
Fetch: The CPU copies the instruction address from the Program Counter (PC) to the MAR, triggers a memory read, loads the instruction data into the MDR, and moves it to the Instruction Register (IR). The PC then increments.
Decode: The Control Unit (CU) parses the instruction code inside the IR to understand what operation (opcode) to run and identifies data operand locations.
Execute: The Control Unit routes control signals, the ALU performs arithmetic or logical operations, and results are written back to registers or RAM.
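The three phases above can be sketched as a toy accumulator machine. The four opcodes (`LOAD`, `ADD`, `STORE`, `HALT`), the tuple encoding of instructions, and the `run` function are all hypothetical; real CPUs encode instructions as binary words and implement these steps in hardware.

```python
def run(memory, max_steps=100):
    """Run a toy accumulator machine until HALT; return the final registers."""
    regs = {"PC": 0, "IR": None, "MAR": 0, "MDR": None, "ACC": 0}
    for _ in range(max_steps):
        # Fetch: PC -> MAR, memory read -> MDR, MDR -> IR, then increment PC.
        regs["MAR"] = regs["PC"]
        regs["MDR"] = memory[regs["MAR"]]
        regs["IR"] = regs["MDR"]
        regs["PC"] += 1

        # Decode: split the instruction into opcode and operand address.
        opcode, operand = regs["IR"]

        # Execute: perform the operation and write back the result.
        if opcode == "LOAD":
            regs["MAR"] = operand
            regs["MDR"] = memory[regs["MAR"]]
            regs["ACC"] = regs["MDR"]
        elif opcode == "ADD":
            regs["MAR"] = operand
            regs["MDR"] = memory[regs["MAR"]]
            regs["ACC"] += regs["MDR"]
        elif opcode == "STORE":
            regs["MAR"] = operand
            regs["MDR"] = regs["ACC"]
            memory[regs["MAR"]] = regs["MDR"]
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return regs


program = [
    ("LOAD", 4),     # address 0: ACC <- memory[4]
    ("ADD", 5),      # address 1: ACC <- ACC + memory[5]
    ("STORE", 6),    # address 2: memory[6] <- ACC
    ("HALT", None),  # address 3: stop
    7,               # address 4: data
    35,              # address 5: data
    0,               # address 6: result is stored here
]
final = run(program)
print(program[6], final["ACC"], final["PC"])  # 42 42 4
```

Instructions and data share the same `memory` list, as in the Von Neumann model, and every memory access goes through the MAR and MDR.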
Practice: Fetch Phase
During the Fetch phase, which register transfers the memory address to the MAR?
Answer: Program Counter (PC)
Explanation: The Program Counter (PC) stores the location of the next instruction. The CPU copies this address to the Memory Address Register (MAR) to request it from RAM.
08
System Buses
A Bus is a physical channel consisting of wires or copper tracks on a motherboard that transmits electronic signals between components. The main system bus consists of three sub-channels:
Data Bus: Transmits actual data bits (bi-directional channel).
Address Bus: Transmits memory locations to read/write (uni-directional channel pointing from CPU outward).
Control Bus: Transmits command signals, synchronization clocks, and write/read indicators (bi-directional).
Practice: Bus Directions
True or False: The Address Bus is bi-directional.
Answer: False
Explanation: The Address Bus is uni-directional. Only the CPU (or DMA controller) generates memory addresses to select locations; memory chips do not generate address coordinates.
Clock Speed (Hz): The frequency of internal clock ticks per second. *Example:* A 3.2 GHz processor ticks 3.2 billion times per second.
CPI (Cycles Per Instruction): The average number of clock cycles required to execute a single instruction.
IPS (Instructions Per Second): Total instructions executed per second. Calculated as `Clock Speed / CPI`.
CPU Time Equation
The time required to run a program is calculated as:
CPU Time = Instruction Count × CPI × Clock Cycle Time
Practice: CPU speed
If a processor executes a program with 1,000 instructions, taking an average CPI of 2 cycles, on a clock speed of 1 GHz (1,000,000,000 Hz), what is the total execution time in microseconds?
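Working the question through the CPU time equation: 1,000 instructions × 2 cycles each is 2,000 cycles, and at 1 GHz each cycle takes 1 nanosecond, so the program takes 2,000 ns, which is 2 microseconds. The short sketch below checks this arithmetic; the function names are illustrative assumptions.

```python
def cpu_time_us(instruction_count, cpi, clock_hz):
    """CPU Time = Instruction Count x CPI x Clock Cycle Time, in microseconds."""
    total_cycles = instruction_count * cpi
    cycles_per_microsecond = clock_hz / 1_000_000
    return total_cycles / cycles_per_microsecond


def instructions_per_second(clock_hz, cpi):
    """IPS = Clock Speed / CPI."""
    return clock_hz / cpi


print(cpu_time_us(1_000, 2, 1_000_000_000))       # 2.0
print(instructions_per_second(1_000_000_000, 2))  # 500000000.0
```

Dividing the cycle count by the number of cycles per microsecond is equivalent to multiplying by the clock cycle time, since the cycle time is the reciprocal of the clock speed.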
Modern computers implement optimization architectures to maximize throughput:
Pipelining: An execution strategy that overlaps instruction execution stages, similar to an assembly line. While instruction 2 is being decoded, instruction 1 is executing, and instruction 3 is being fetched.
RISC (Reduced Instruction Set Computer): Uses a small set of simple instructions, most of which are designed to complete in a single clock cycle, and relies on the compiler to combine them efficiently. *Example:* ARM processors.
CISC (Complex Instruction Set Computer): Uses large, complex instructions that can perform multi-cycle tasks directly (e.g. copying directly from RAM to RAM). *Example:* Intel x86 processors.
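The assembly-line effect of pipelining can be estimated with a simple cycle count. This sketch assumes an ideal pipeline in which every stage takes exactly one cycle and no stalls occur; the function names are hypothetical.

```python
def cycles_without_pipeline(instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_with_pipeline(instructions, stages):
    """The first instruction fills the pipeline; each later one finishes a cycle apart."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


print(cycles_without_pipeline(100, 3))  # 300
print(cycles_with_pipeline(100, 3))     # 102
```

Under these assumptions, throughput for a long instruction stream approaches one instruction per cycle. Real pipelines fall short of this because of stalls.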
Practice: RISC vs CISC
Which design strategy focuses on simple instructions executing in a single clock cycle?
Answer: RISC (Reduced Instruction Set Computer)
Explanation: RISC architectures prioritize hardware simplicity and aim for single-cycle instruction execution, delegating complexity to compiler optimization.
11
Comprehensive Exercises
Work through these architecture analysis questions to consolidate your knowledge:
Exercise 1: Pipeline Hazards
What is a Data Hazard in a CPU execution pipeline, and how does it occur?
Solution: Dependency delay
Explanation: A data hazard occurs when an instruction in the pipeline depends on the result of a previous instruction that has not yet completed execution. This causes pipeline stalls (bubbles) until the data becomes available.
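A minimal sketch of how such dependencies could be spotted in an instruction sequence follows. The `Instr` record, the register names, and the `window` parameter (how many instructions back a result may still be unfinished) are assumptions made for illustration, not a model of any particular pipeline.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str        # register written by the instruction
    sources: tuple   # registers read by the instruction


def find_data_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each case where an
    instruction reads a register written by one of the `window` instructions before it."""
    hazards = []
    for i, consumer in enumerate(program):
        for j in range(max(0, i - window), i):
            producer = program[j]
            if producer.dest in consumer.sources:
                hazards.append((j, i, producer.dest))
    return hazards


program = [
    Instr("ADD", "r1", ("r2", "r3")),
    Instr("SUB", "r4", ("r1", "r5")),  # reads r1 before ADD may have written it
    Instr("AND", "r6", ("r7", "r8")),  # independent of the earlier results
]
print(find_data_hazards(program))  # [(0, 1, 'r1')]
```

Here `SUB` needs `r1` from the `ADD` immediately before it, so a pipeline without other mitigation would insert a stall between them.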
Exercise 2: Cache Misses
Explain the difference between **Spatial Locality** and **Temporal Locality**, and how each leads to cache hits.
Solution: Locality differences
Explanation: **Temporal Locality** assumes that memory accessed once is likely to be accessed again soon (like a loop variable). **Spatial Locality** assumes that memory locations close to a recently accessed location are likely to be accessed soon (like array elements stored sequentially). Caches exploit spatial locality by loading an entire cache line on a miss, so neighboring addresses are already present when they are requested.
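Both kinds of locality can be observed in a small simulation. This sketch assumes a direct-mapped cache with 4 lines of 4 addresses each; the `hit_rate` function and these sizes are hypothetical choices for illustration.

```python
def hit_rate(addresses, num_lines=4, line_size=4):
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    lines = [None] * num_lines      # each slot remembers which block it holds
    hits = 0
    for addr in addresses:
        block = addr // line_size   # memory block containing this address
        index = block % num_lines   # cache slot that block maps to
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block    # miss: load the whole block into the slot
    return hits / len(addresses)


sequential = list(range(16))                      # array walk: spatial locality
repeated = [8] * 8                                # same variable: temporal locality
scattered = [0, 16, 32, 48, 64, 80, 96, 112]      # strided: every access conflicts

print(hit_rate(sequential))  # 0.75
print(hit_rate(repeated))    # 0.875
print(hit_rate(scattered))   # 0.0
```

The sequential walk misses only on the first address of each line, the repeated access misses only once, and the strided pattern maps every address to the same slot and never hits.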
12
Quiz & Knowledge Check
Test your understanding of hardware design and computer organization concepts:
Q1. Which channel carries data between the CPU and memory?
Explanation: The Data Bus is the bi-directional channel that carries the data payload between the processor, memory, and peripherals.
Q2. Which cache level is typically the smallest, the fastest, and the closest to the individual CPU cores?
Explanation: L1 cache is built directly inside the CPU core circuitry and operates at core frequency, offering the fastest access speed.
Q3. Which CPU sub-component decodes instructions and routes control signals?
Explanation: The Control Unit (CU) manages instruction decoding and coordinates control signals across the computer components.
Q4. Which I/O strategy involves the CPU only when a peripheral device signals that it is ready?
Explanation: Interrupt-driven I/O allows devices to signal the CPU when they are ready, avoiding wasting CPU cycles on polling.
Q5. What is the first stage of the standard instruction execution cycle?
Explanation: The CPU must first fetch the instruction from memory before it can decode and execute it.