Master the core tenets of database systems with step-by-step theories and SQL sandboxes.
0 / 10 topics completed
01
Introduction to DBMS
A Database Management System (DBMS) is software designed to store, retrieve, define, and manage data in databases. It acts as an interface between the database and its end-users or programs, ensuring that data is consistently organized and easily accessible.
Advantages of using a DBMS
Reduced Data Redundancy: Minimizes duplication of data, saving disk storage.
Data Integrity & Security: Enforces security policies and database constraints (e.g., age cannot be negative).
Concurrent Access: Allows multiple users to read and update data at the same time without conflicts.
Backup & Recovery: Automatically backs up data and recovers it in case of system crashes.
DBMS vs File System
Before DBMS, data was stored in simple operating system files. Here is why we moved to DBMS:
Feature
File System
DBMS
Redundancy
High; same data is repeated across different files.
Low; redundant data is controlled through relations.
Data Sharing
Difficult; requires manual file transfer.
Easy; handled concurrently by the server interface.
Concurrency
No concurrent locking. Updates overwrite each other.
Highly supported; concurrent transactions use locks.
Integrity Constraints
Must be enforced inside application code manually.
Enforced natively by the database schema.
Practice: Introduction to DBMS
Which of the following is NOT a benefit of a DBMS over a traditional File System?
A) Enforcing column-level check constraints automatically.
B) Storing data in flat files that are directly readable by text editors without software.
C) Providing transactional rollbacks during database crashes.
D) Restricting database access to authorized users via roles.
Correct Answer: B
Explanation: Flat file readability is a characteristic of File Systems. A DBMS compiles data into binary formats (pages, blocks) optimized for storage and indexing, requiring DBMS interfaces or SQL engines to query it, rather than direct text editors.
02
Database Architecture
The Three-Level ANSI-SPARC Architecture separates the user's view of data from its physical representation. This abstraction ensures Data Independence (changes to physical storage don't break user-facing queries).
Understanding the Levels
External Level (Views): What the end-users see (e.g., individual screens, customized summaries).
Conceptual Level: The logical structure of the entire database. It defines tables, attributes, primary keys, and relations.
Internal Level (Physical): How data is actually stored (e.g., B+ Trees, sequential files, hashing strategies, clustering).
Schema vs. Instance
Schema: The skeleton structure / blueprint of the database. It is defined once and rarely changes (e.g., table columns and datatypes).
Instance: The snapshot collection of actual data stored in the database at a specific moment. It changes constantly as rows are added, edited, or deleted.
Practice: Schema & Instance
If you execute ALTER TABLE Students ADD COLUMN email VARCHAR(100);, are you modifying the database Schema or the database Instance?
Answer: Schema
Explanation: Running `ALTER TABLE` changes the structural blueprint (the columns available in the table definition). Changes to the actual row records (e.g., adding a student's record) modify the database instance.
03
Types of Databases
Databases are grouped by how they organize data. Choosing the correct type is critical for application performance.
1. Relational Databases (RDBMS)
Data is organized into tabular structures with rows and columns. They enforce strict relations using primary and foreign keys, supporting complex transactions and standardized queries via SQL. Examples: PostgreSQL, MySQL, SQLite.
2. NoSQL Databases (Not Only SQL)
Designed for horizontal scaling, high write velocities, and semi-structured or unstructured data. They avoid rigid tables and use four main models:
Document: Stores data as JSON documents (e.g., MongoDB).
Key-Value: Ultra-fast caches storing binary data pairs (e.g., Redis).
Column-Family: Optimized for scanning large analytical datasets (e.g., Cassandra).
Graph: Represents highly connected nodes and relationships (e.g., Neo4j).
3. Distributed Databases
Data is physically split (partitioned or sharded) or duplicated (replicated) across multiple distinct machines or cloud data centers. This ensures high system availability and disaster tolerance. Examples: Google Spanner, CockroachDB.
Practice: Database Types
Scenario: You are building a real-time chat dashboard showing live active-user counts and message delivery locks. Performance must be sub-millisecond, and the schema will expand constantly. Which database model is best suited?
A) Relational database (PostgreSQL)
B) Document database (MongoDB)
C) Key-Value memory store (Redis)
D) Graph database (Neo4j)
Correct Answer: C
Explanation: For sub-millisecond real-time tracking, an in-memory key-value cache (like Redis) is ideal. Relational and document databases query disks which introduces read latency, whereas Key-Value memory caches operate directly inside RAM.
04
ER Model & ER Diagrams
The Entity-Relationship (ER) Model is a conceptual framework used to design databases. It represents real-world concepts as Entities, Attributes, and Relationships.
Core Concepts of the ER Model
Entities: Objects or concepts (e.g., Student, Course). Weak entities depend on another entity for identity.
Attributes: Attributes are characteristics of entities:
Simple: Atomic values (e.g., Gender).
Composite: Can be divided (e.g., Name -> First Name, Last Name).
Multi-valued: Can have multiple values (e.g., PhoneNumbers).
Derived: Calculated from other fields (e.g., Age derived from BirthDate).
Relationships: Associations between entities (e.g., a Student enrolls in a Course). Relationships have cardinalities: One-to-One (1:1), One-to-Many (1:N), or Many-to-Many (N:M).
Practice: ER Model
If a student can enroll in multiple courses, and each course can contain multiple students, what is the cardinality of the Enrolls relationship?
Answer: Many-to-Many (N:M)
Explanation: Since entities on both sides are associated with multiple instances of the other, it represents an N:M relationship. In relational mapping, this results in a separate junction/association table.
05
Keys in DBMS
Keys are attributes or sets of attributes that uniquely identify rows in a table. They establish and enforce relationships between tables.
Key Type
Definition
Allows NULL?
Super Key
A set of one or more columns that uniquely identifies a row.
Yes
Candidate Key
A minimal Super Key (no redundant attributes).
No (typically)
Primary Key
The candidate key chosen by the database designer to uniquely identify rows.
No
Foreign Key
A column that references a Primary Key in another table, establishing a relationship.
Yes
Composite Key
A Primary Key consisting of two or more columns.
No
💡
A table can have multiple **Candidate Keys**, but only one can be chosen as the **Primary Key**. Any candidate keys not chosen as the primary key are called **Alternate Keys**.
Practice: Keys
Given a table schema: Enrollments(StudentID, CourseID, DateEnrolled, Grade). If a student can enroll in multiple courses, and multiple students can take the same course, which of the following acts as the Primary Key?
A) StudentID
B) CourseID
C) Composite Key of (StudentID, CourseID)
D) DateEnrolled
Correct Answer: C
Explanation: Because StudentID alone repeats (a student takes multiple courses) and CourseID repeats (multiple students enroll), neither can be unique alone. The combination of (StudentID, CourseID) is unique for each enrollment, making it a composite primary key.
06
Normalization
Normalization is the process of structuring relational database tables to reduce redundancy and prevent **data anomalies** (Insertion, Deletion, and Update anomalies).
The Anomalies
Insertion Anomaly: Cannot add data because some unrelated columns are missing.
Deletion Anomaly: Deleting a record unintentionally deletes unrelated crucial information.
Update Anomaly: Modifying a value requires editing multiple redundant rows, risking inconsistencies.
Normal Forms (1NF to BCNF)
1. First Normal Form (1NF)
Rules: All column values must be **atomic** (no multi-valued attributes / lists), and each column must contain values of the same type.
Example: If column Phones contains "555-0192, 555-0143", it must be split into separate rows.
2. Second Normal Form (2NF)
Rules: Must be in **1NF**, and there must be **no partial dependency** (non-prime attributes must depend on the entire candidate primary key, not a subset of a composite primary key).
Example: In a table with key (EmpID, ProjectID), if employee name depends solely on EmpID, it violates 2NF and employee data must be moved to a separate table.
3. Third Normal Form (3NF)
Rules: Must be in **2NF**, and there must be **no transitive dependency** (non-prime attributes cannot depend on other non-prime attributes; they must depend only on the primary key).
Example: In a table with key StudentID, if City depends on ZipCode, which depends on StudentID, it violates 3NF. Move ZipCode -> City to a separate table.
4. Boyce-Codd Normal Form (BCNF)
Rules: Must be in **3NF**, and for every functional dependency X → Y, X must be a **super key**. BCNF is a stronger version of 3NF.
Practice: Normalization
Suppose you have a table: Store(StoreID, ManagerID, ManagerPhone, ZipCode) where StoreID is the Primary Key. If ManagerPhone depends directly on ManagerID (a non-prime attribute), which normal form does this table violate?
A) 1NF
B) 2NF
C) 3NF
D) It is already in BCNF
Correct Answer: C
Explanation: Since ManagerPhone depends on ManagerID, which depends on the primary key StoreID, this forms a transitive dependency (StoreID -> ManagerID -> ManagerPhone). This violates 3NF rules.
07
SQL Basics
SQL (Structured Query Language) is the standard programming language used to interact with relational databases.
+---------+----------+
| name | salary |
+---------+----------+
| Charlie | 80000.00 |
| Alice | 75000.00 |
+---------+----------+
SQL - UPDATE
UPDATEemployeesSETsalary=65000.00WHEREid=2;
SQL - DELETE
DELETE FROMemployeesWHEREid=3;
Practice: SQL Basics
Write an SQL query to retrieve the name and salary of all employees who earn less than 70000.
SQL Solution
SELECTname, salaryFROMemployeesWHEREsalary<70000;
08
Transactions and ACID Properties
A Transaction is a single logical unit of database processing. All database operations within a transaction must be executed as a pack. To preserve database consistency, DBMS engines enforce the four ACID Properties.
The ACID Rules
Atomicity: "All or nothing". If any part of a transaction fails, the entire transaction is rolled back, leaving the database unchanged.
Consistency: Ensures that a transaction moves the database from one valid state to another valid state, maintaining all constraints (e.g., non-negative balances).
Isolation: Transactions executing concurrently must run as if they are executing sequentially. Intermediate states of a transaction are hidden from other transactions.
Durability: Once a transaction commits, its changes are written to persistent storage (disk) and will survive any subsequent system failures.
The Banking Example
Consider transferring $100 from Account A to Account B:
Read balance A
Deduct $100 from A
Read balance B
Add $100 to B
If the system crashes after step 2, Atomicity ensures the $100 is returned to A instead of disappearing into thin air.
Practice: Transactions & ACID
Which ACID property is violated if database transactions executing at the same time read uncommitted variables from each other, resulting in incorrect calculations?
A) Atomicity
B) Consistency
C) Isolation
D) Durability
Correct Answer: C
Explanation: Isolation guarantees that concurrently running transactions do not interfere or read uncommitted states of each other. Violating this rule leads to Isolation anomalies (like Dirty Reads).
09
Concurrency Control Basics
When multiple transactions run concurrently, conflicts arise. Concurrency Control prevents anomalies such as **Dirty Reads**, **Non-Repeatable Reads**, and **Phantom Reads**.
Lock-Based Protocols
Locks prevent multiple transactions from editing the same data simultaneously:
Shared Lock (S): Acquired for reading. Multiple transactions can hold S-locks on the same data item simultaneously.
Exclusive Lock (X): Acquired for writing/updating. Only one transaction can hold an X-lock. S-locks and other X-locks are blocked.
Deadlocks
A Deadlock occurs when Transaction 1 holds a lock on resource A and waits for resource B, while Transaction 2 holds a lock on resource B and waits for resource A. Neither can proceed.
⚠️
To handle deadlocks, DBMS engines use **prevention** algorithms (like Wait-Die or Wound-Wait schemes based on timestamps) or run background **deadlock detection** loops that abort one of the transactions (the victim) to resolve the block.
Isolation Levels
Standard SQL engines define four isolation levels (from least strict to most strict):
Read Uncommitted: Allows dirty reads (reading uncommitted changes from other transactions).
Read Committed: Prevents dirty reads, but allows non-repeatable reads (data changes between two reads in the same transaction).
Repeatable Read: Prevents non-repeatable reads, but allows phantom reads (new rows are inserted by other transactions).
Serializable: Highest isolation level. Full lock protection, simulating sequential execution.
Practice: Concurrency Control
If Transaction T1 holds a Shared Lock (S) on row 10, which lock can Transaction T2 acquire on row 10 immediately?
A) Exclusive Lock (X)
B) Shared Lock (S)
C) Neither
D) Both X and S
Correct Answer: B
Explanation: Shared locks are compatible with each other, meaning multiple transactions can read the same row at the same time. Exclusive locks are incompatible with any locks.
10
Indexing and Query Optimization
Indexes are data structures used to speed up data retrieval. Without an index, the database engine must perform a slow **Full Table Scan** (reading every row on disk).
Clustered vs. Non-Clustered Indexes
Clustered Index: Dictates the physical order of data rows on the disk. Because data can only be sorted in one way, there can be only **one clustered index** per table (usually on the Primary Key).
Non-Clustered Index: Creates a separate structure containing copy of key columns and a pointer to the physical data rows. You can create **multiple non-clustered indexes** on a table.
B-Trees and B+ Trees
Relational databases heavily use B+ Trees for indexing. The advantages are:
Balanced Tree: Guaranteed O(log N) search, insert, and delete times.
Range Queries: Leaf nodes are linked together sequentially (like a linked list), making range queries (e.g., WHERE age BETWEEN 20 AND 30) extremely fast.
Practice: Indexing
Why can a relational database table have only one Clustered Index?
Answer: Physical sorting constraints
Explanation: Since a clustered index dictates the physical sorting and organization of rows on disk blocks, and data rows can physically be sorted in only one sequence (e.g., by ID or Surname), only one clustered index is physically possible. Non-clustered indexes, however, are separate index tables that point back to the raw rows, allowing multiples.