Learn Computer Architecture – Hardware & Assembly Basics

01

Introduction to Computer Architecture

Computer Architecture describes the structure, organization, and behavioral design of computer hardware systems. It covers how instructions are decoded and executed, how memory is organized, and how computer components communicate.

Most modern general-purpose computers follow the Von Neumann Architecture, described in 1945 by mathematician John von Neumann, in which program instructions and data are stored in the same memory.

Von Neumann Bottleneck

Because the Von Neumann architecture shares a single system bus for both instruction fetches and data transfers, the CPU often sits idle while waiting for instructions or data to arrive from memory. This throughput limit is called the **Von Neumann Bottleneck**.

Practice: Bottlenecks

What defines the main drawback of the standard Von Neumann architecture?

Answer: Shared bus bottleneck

Explanation: Because a shared bus is used for fetching both instructions and data, data transfers block instruction fetches, causing CPU idle states.

02

Computer Organization vs Computer Architecture

Although often used interchangeably, Computer Architecture and Computer Organization focus on distinct layers of system design:

Computer Architecture (Abstract): Describes the programmer-visible attributes of a system (Instruction Set Architecture - ISA). It defines data formats, registers, and memory addressing modes. *Example:* Deciding that the ISA includes an instruction called `ADD` and specifying what it does.
Computer Organization (Physical): Describes how architectural specifications are physically implemented using gate-level circuits, wires, clock cycles, and control signals. *Example:* Deciding whether to implement the `ADD` operation using a Carry-Lookahead Adder or Ripple-Carry Adder.
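
To make the organization side concrete, here is a minimal Python sketch of a Ripple-Carry Adder, one of the two adder designs mentioned above. The function names `full_adder` and `ripple_carry_add` and the 8-bit default width are assumptions chosen for illustration. The architectural contract ("`ADD` returns the sum") stays the same regardless of which adder circuit sits behind it.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder expressed with XOR, AND, and OR gates."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_carry_add(x, y, width=8):
    """Add two unsigned integers one bit at a time.

    The carry "ripples" from the lowest bit to the highest, which is
    why this design is simple but slow for wide operands.
    """
    result = 0
    carry = 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        sum_bit, carry = full_adder(bit_x, bit_y, carry)
        result |= sum_bit << i
    return result, carry  # carry is the final carry-out bit


print(ripple_carry_add(5, 3))      # (8, 0)
print(ripple_carry_add(200, 100))  # (44, 1): 300 does not fit in 8 bits
```

A Carry-Lookahead Adder would produce the same results with a different internal structure, which is exactly what makes the choice an organizational one.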

Practice: Arch vs. Org

Is defining the size of a register block an architectural or organizational decision?

Answer: Architectural

Explanation: The number and size of registers are visible to the programmer (ISA instructions reference these registers directly), so this is an architectural decision.

03

CPU Architecture

The Central Processing Unit (CPU) is the "brain" of the computer, executing instructions and managing component coordination. It contains three main sub-components:

Arithmetic Logic Unit (ALU): Performs basic arithmetic (addition, subtraction), logical operations (AND, OR, NOT), and comparisons.
Control Unit (CU): The supervisor. It decodes instructions fetched from memory and routes control signals directing the ALU, registers, and memory blocks.
Registers: Tiny, ultra-fast memory storage locations located directly inside the CPU core.

Special Purpose Registers

Program Counter (PC): Holds the memory address of the next instruction to fetch.
Instruction Register (IR): Holds the instruction code currently being decoded/executed.
Memory Address Register (MAR): Holds the memory address currently being read from or written to.
Memory Data Register (MDR): Holds the actual data payload read from or written to memory.
Accumulator (ACC): Temporarily holds intermediate mathematical results calculated by the ALU.

Practice: Registers

Which register holds the address of the next instruction to fetch?

Answer: Program Counter (PC)

Explanation: The Program Counter (PC) is incremented after every fetch so that it always points to the next instruction in memory. Jump and branch instructions change the sequence by loading a new address into the PC.

04

Memory Hierarchy

In a computer system, memory is organized in a hierarchy to balance **speed** and **cost**. Registers are fast but small and expensive, while hard disks and solid-state drives are much slower but far larger and cheaper per byte.

Cache Levels

L1 Cache: Smallest, fastest, and built directly into the individual CPU cores.
L2 Cache: Larger and slower than L1, and usually dedicated to a single core.
L3 Cache: Typically shared across all cores of a CPU chip, larger but slower than L2.

Practice: Cache speed

Which level of cache memory is typically the fastest?

Answer: L1 Cache

Explanation: L1 Cache is located closest to the CPU core registers and operates at the internal clock frequency of the processor core, making it the fastest cache tier.

05

Input and Output Systems

I/O systems manage communication between the CPU/Memory and external peripheral devices (keyboards, monitors, network cards). There are three primary mechanisms for I/O operations:

Programmed I/O (Polling): The CPU repeatedly queries the peripheral device to check if it has data ready. This keeps the CPU busy in a loop, wasting cycles.
Interrupt-driven I/O: The peripheral device sends a hardware signal (Interrupt) to the CPU when it is ready. The CPU suspends its current work, runs an **Interrupt Service Routine (ISR)**, and resumes.
Direct Memory Access (DMA): Used for high-speed transfers. A dedicated DMA controller copies blocks of data directly between peripheral devices and RAM. The CPU only sets up the transfer and is notified, usually by an interrupt, when it completes, so it does not move the data itself.
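
The difference between polling and interrupt-driven I/O can be sketched with a toy model in Python. Everything here is hypothetical and exists only for illustration: the `Device` class, the notion of a "tick", and the `polled_read` and `interrupt_read` functions are not a real driver API. The sketch simply counts what the CPU spends its ticks on while the device is not yet ready.

```python
class Device:
    """Hypothetical peripheral that has data ready after a fixed number of ticks."""

    def __init__(self, ready_at_tick, data=42):
        self.ready_at_tick = ready_at_tick
        self.data = data

    def is_ready(self, tick):
        return tick >= self.ready_at_tick


def polled_read(device):
    """Programmed I/O: the CPU checks the device status on every tick."""
    wasted_checks = 0
    tick = 0
    while not device.is_ready(tick):
        wasted_checks += 1  # a tick spent asking "are you ready yet?"
        tick += 1
    return device.data, wasted_checks


def interrupt_read(device, isr):
    """Interrupt-driven I/O: the CPU does other work until the device signals."""
    useful_work = 0
    for _ in range(device.ready_at_tick):
        useful_work += 1  # a tick spent on some other task
    isr(device.data)      # the interrupt fires and the service routine runs once
    return useful_work


received = []
print(polled_read(Device(ready_at_tick=5)))                       # (42, 5)
print(interrupt_read(Device(ready_at_tick=5), received.append))   # 5
print(received)                                                   # [42]
```

In both cases the data arrives at the same time; the difference is whether the waiting ticks were spent on status checks or on other work.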

Practice: DMA

Why is Direct Memory Access (DMA) used instead of Programmed I/O for SSD transfers?

Answer: It avoids CPU execution overhead.

Explanation: For high-speed devices, having the CPU copy byte-by-byte would saturate the CPU. DMA handles the transfer autonomously, freeing the CPU to execute other tasks.

06

Data Representation

Computers use electrical switches (on/off states) to represent data. As a result, all characters, numbers, and symbols are stored using the Binary System (Base 2).

Binary and Hexadecimal

Binary: Uses only the digits `0` and `1`. Each digit is one **bit**.
Hexadecimal (Base 16): Uses digits `0-9` and letters `A-F`. A single hex character represents exactly 4 binary bits (a nibble), making memory dumps readable.

Negative integers are represented using Two's Complement representation. To convert a positive binary number to negative: invert all bits and add `1`.

Binary Conversion Example

Decimal: 5
Binary (8-bit): 00000101

Invert Bits: 11111010
Add 1:        11111011
Result (-5): 11111011
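
The same invert-and-add-one steps can be written as a short Python sketch. The function names `twos_complement` and `to_signed` are illustrative, and an 8-bit width is assumed to match the example above.

```python
def twos_complement(value, bits=8):
    """Negate a value by inverting all bits and adding 1, within a fixed width."""
    mask = (1 << bits) - 1        # 0b11111111 for 8 bits
    inverted = ~value & mask      # step 1: invert every bit
    return (inverted + 1) & mask  # step 2: add 1, discarding any carry-out


def to_signed(pattern, bits=8):
    """Interpret a bit pattern as a signed two's complement integer."""
    if pattern & (1 << (bits - 1)):  # top bit set means negative
        return pattern - (1 << bits)
    return pattern


neg_five = twos_complement(5)
print(format(neg_five, "08b"))  # 11111011
print(to_signed(neg_five))      # -5
```

Applying `twos_complement` twice returns the original value, which is why the same procedure converts in both directions.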

Practice: Hex translation

Convert the binary byte 1010 1100 to Hexadecimal.

Answer: AC

Explanation: Splitting into nibbles: `1010` is 10 (which is hex `A`), and `1100` is 12 (which is hex `C`). Thus, it is represented as hex `AC` or `0xAC`.
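
The nibble-by-nibble method can be sketched in Python as follows. The helper name `binary_to_hex` is an assumption for illustration; Python's built-in `int(..., 2)` and `format(..., "X")` do the per-nibble conversion.

```python
def binary_to_hex(bits):
    """Convert a binary string to hex by translating one 4-bit nibble at a time."""
    digits = bits.replace(" ", "")
    if len(digits) % 4 != 0:
        raise ValueError("expected a whole number of 4-bit nibbles")
    nibbles = [digits[i:i + 4] for i in range(0, len(digits), 4)]
    return "".join(format(int(nibble, 2), "X") for nibble in nibbles)


print(binary_to_hex("1010 1100"))  # AC
```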

07

Instruction Execution Cycle

The CPU continuously executes a loop called the **Fetch-Decode-Execute Cycle** to process instructions:

Fetch: The CPU copies the instruction address from the Program Counter (PC) to the MAR, triggers a memory read, loads the instruction data into the MDR, and moves it to the Instruction Register (IR). The PC then increments.
Decode: The Control Unit (CU) parses the instruction code inside the IR to understand what operation (opcode) to run and identifies data operand locations.
Execute: The Control Unit routes control signals, the ALU performs arithmetic or logical operations, and results are written back to registers or RAM.
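
The three phases above can be sketched as a toy accumulator machine in Python. The instruction set here (`LOAD`, `ADD`, `STORE`, `HALT`) and the tuple encoding of instructions are hypothetical, chosen only to keep the loop readable; real CPUs encode instructions as binary words.

```python
def run(memory):
    """Run a toy program until HALT and return the accumulator value."""
    pc = 0   # Program Counter
    acc = 0  # Accumulator
    while True:
        # Fetch: PC -> MAR, memory read -> MDR -> IR, then the PC increments
        mar = pc
        mdr = memory[mar]
        ir = mdr
        pc += 1

        # Decode: split the instruction into an opcode and an operand address
        opcode, operand = ir

        # Execute: perform the operation and write the result back
        if opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc = acc + memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")


toy_memory = [
    ("LOAD", 4),   # address 0: acc = memory[4]
    ("ADD", 5),    # address 1: acc = acc + memory[5]
    ("STORE", 6),  # address 2: memory[6] = acc
    ("HALT", 0),   # address 3: stop
    7,             # address 4: data
    5,             # address 5: data
    0,             # address 6: result goes here
]
print(run(toy_memory))  # 12
print(toy_memory[6])    # 12
```

Instructions and data live in the same list, mirroring the shared memory of the Von Neumann design.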

Practice: Fetch Phase

During the Fetch phase, which register transfers the memory address to the MAR?

Answer: Program Counter (PC)

Explanation: The Program Counter (PC) stores the location of the next instruction. The CPU copies this address to the Memory Address Register (MAR) to request it from RAM.

08

System Buses

A Bus is a physical channel consisting of wires or copper tracks on a motherboard that transmits electronic signals between components. The main system bus consists of three sub-channels:

Data Bus: Transmits actual data bits (bi-directional channel).
Address Bus: Transmits memory locations to read/write (uni-directional channel pointing from CPU outward).
Control Bus: Transmits command signals, synchronization clocks, and write/read indicators (bi-directional).

Practice: Bus Directions

True or False: The Address Bus is bi-directional.

Answer: False

Explanation: The Address Bus is uni-directional. Only the CPU (or DMA controller) generates memory addresses to select locations; memory chips do not generate address coordinates.

09

Performance Metrics

Processor performance depends on several measurable quantities:

Clock Speed (Hz): The frequency of internal clock ticks per second. *Example:* A 3.2 GHz processor ticks 3.2 billion times per second.
CPI (Cycles Per Instruction): The average number of clock cycles required to execute a single instruction.
IPS (Instructions Per Second): Total instructions executed per second. Calculated as `Clock Speed / CPI`.

CPU Time Equation

The time required to run a program is calculated as:

CPU Time = Instruction Count × CPI × Clock Cycle Time

where Clock Cycle Time = 1 / Clock Speed.

Practice: CPU speed

If a processor executes a program of 1,000 instructions with an average CPI of 2 at a clock speed of 1 GHz (1,000,000,000 Hz), what is the total execution time in microseconds?

Answer: 2 microseconds

Explanation: Cycles = 1,000 * 2 = 2,000 cycles. Time = 2,000 / 1,000,000,000 seconds = 0.000002 seconds = 2 microseconds.
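
The calculation above can be double-checked with a few lines of Python. The function names `cpu_time_seconds` and `instructions_per_second` are illustrative; the formulas are the CPU Time equation and the IPS definition from this section.

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    clock_cycle_time = 1 / clock_hz  # seconds per clock tick
    return instruction_count * cpi * clock_cycle_time


def instructions_per_second(clock_hz, cpi):
    """IPS = Clock Speed / CPI."""
    return clock_hz / cpi


seconds = cpu_time_seconds(1_000, 2, 1_000_000_000)
print(round(seconds * 1e6, 6))                     # 2.0 (microseconds)
print(instructions_per_second(1_000_000_000, 2))   # 500000000.0
```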

10

Modern Computer Systems

Modern computers use several design techniques to maximize throughput:

Pipelining: An execution strategy that overlaps instruction execution stages, similar to an assembly line. While instruction 2 is being decoded, instruction 1 is executing, and instruction 3 is being fetched.
RISC (Reduced Instruction Set Computer): Uses a small set of simple instructions, most of which are designed to complete in a single clock cycle, and relies on the compiler to combine them efficiently. *Example:* ARM processors.
CISC (Complex Instruction Set Computer): Uses large, complex instructions that can perform multi-cycle tasks directly (e.g. copying directly from RAM to RAM). *Example:* Intel x86 processors.
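
The assembly-line overlap described under Pipelining can be visualized with a small Python sketch. It assumes an idealized three-stage pipeline (Fetch, Decode, Execute) with no stalls, and the function name `pipeline_diagram` is illustrative. Each row is one instruction, each column is one clock cycle.

```python
STAGES = ["F", "D", "E"]  # Fetch, Decode, Execute


def pipeline_diagram(num_instructions):
    """Return timing rows and the total cycle count for an ideal pipeline."""
    total_cycles = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        for offset, stage in enumerate(STAGES):
            row[i + offset] = stage  # instruction i enters one cycle after i - 1
        rows.append(f"I{i + 1}: " + " ".join(row))
    return rows, total_cycles


rows, cycles = pipeline_diagram(3)
for row in rows:
    print(row)
# I1: F D E . .
# I2: . F D E .
# I3: . . F D E
print(cycles)  # 5
```

In the third column, instruction 1 is executing, instruction 2 is being decoded, and instruction 3 is being fetched. Without overlap, the same three instructions would need 3 × 3 = 9 cycles instead of 5.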

Practice: RISC vs CISC

Which design strategy focuses on simple instructions that typically execute in a single clock cycle?

Answer: RISC (Reduced Instruction Set Computer)

Explanation: RISC architectures prioritize hardware simplicity and simple instructions that ideally complete in one clock cycle each, delegating complexity to compiler optimization.

11

Comprehensive Exercises

Work through these computer architecture questions to consolidate your knowledge:

Exercise 1: Pipeline Hazards

What is a Data Hazard in a CPU execution pipeline, and how does it occur?

Solution: Dependency delay

Explanation: A data hazard occurs when an instruction in the pipeline depends on the result of a previous instruction that has not yet completed execution. This causes pipeline stalls (bubbles) until the data becomes available.
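
A dependency of this kind can be spotted mechanically. The Python sketch below uses a hypothetical instruction format, a tuple of `(destination, source1, source2)` register names, and only checks each instruction against the one directly before it. A real pipeline would also have to consider instructions further back, depending on its depth.

```python
def find_data_hazards(instructions):
    """Find instructions that read a register written by the previous instruction.

    Each instruction is a (dest, src1, src2) tuple of register names.
    Returns a list of (writer_index, reader_index, register) tuples.
    """
    hazards = []
    for i in range(1, len(instructions)):
        previous_dest = instructions[i - 1][0]
        _, src1, src2 = instructions[i]
        if previous_dest in (src1, src2):
            hazards.append((i - 1, i, previous_dest))
    return hazards


sequence = [
    ("r1", "r2", "r3"),  # r1 = r2 + r3
    ("r4", "r1", "r5"),  # r4 = r1 + r5, needs r1 before it is written back
    ("r6", "r7", "r8"),  # independent of the instructions above
]
print(find_data_hazards(sequence))  # [(0, 1, 'r1')]
```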

Exercise 2: Cache Misses

Explain the difference between **Spatial Locality** and **Temporal Locality**, and how each one leads to cache hits.

Solution: Locality differences

Explanation: **Temporal Locality** means that memory accessed once is likely to be accessed again soon (like a loop variable). **Spatial Locality** means that memory locations close to a recently accessed location are likely to be accessed soon (like array elements stored sequentially). Caches exploit spatial locality by loading a whole cache line of neighboring bytes on each miss, and some also prefetch adjacent lines.
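
Both kinds of locality can be observed with a toy cache model in Python. This is a sketch under stated assumptions: a direct-mapped cache (each memory block maps to exactly one cache slot), a line size of 4 addresses, and 4 lines in total. The function name `count_hits` and these parameters are illustrative and do not describe any specific CPU.

```python
def count_hits(addresses, line_size=4, num_lines=4):
    """Simulate a direct-mapped cache and return the number of hits."""
    cache = [None] * num_lines  # each slot remembers which block it holds
    hits = 0
    for address in addresses:
        block = address // line_size  # neighboring addresses share a block
        index = block % num_lines     # the one slot this block may occupy
        if cache[index] == block:
            hits += 1
        else:
            cache[index] = block      # miss: load the whole line
    return hits


sequential = list(range(16))           # walking an array: spatial locality
repeated = [0] * 16                    # same address again: temporal locality
strided = [i * 16 for i in range(16)]  # far-apart addresses: no locality

print(count_hits(sequential))  # 12
print(count_hits(repeated))    # 15
print(count_hits(strided))     # 0
```

The sequential walk misses once per line and then hits on the three neighbors loaded with it, while the strided pattern never reuses a loaded line.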

