Learn DBMS – Beginner Guide | Algo Infinity Verse

01

Introduction to DBMS

A Database Management System (DBMS) is software designed to store, retrieve, define, and manage data in databases. It acts as an interface between the database and its end-users or programs, ensuring that data is consistently organized and easily accessible.

Advantages of using a DBMS

Reduced Data Redundancy: Minimizes duplication of data, saving disk storage.
Data Integrity & Security: Enforces security policies and database constraints (e.g., age cannot be negative).
Concurrent Access: Allows multiple users to read and update data at the same time without conflicts.
Backup & Recovery: Automatically backs up data and recovers it in case of system crashes.

DBMS vs File System

Before DBMS, data was stored in simple operating system files. Here is why we moved to DBMS:

Feature	File System	DBMS
Redundancy	High; same data is repeated across different files.	Low; redundant data is controlled through relations.
Data Sharing	Difficult; requires manual file transfer.	Easy; handled concurrently by the server interface.
Concurrency	No concurrent locking. Updates overwrite each other.	Highly supported; concurrent transactions use locks.
Integrity Constraints	Must be enforced inside application code manually.	Enforced natively by the database schema.

Practice: Introduction to DBMS

Which of the following is NOT a benefit of a DBMS over a traditional File System?

A) Enforcing column-level check constraints automatically.
B) Storing data in flat files that are directly readable by text editors without software.
C) Providing transactional rollbacks during database crashes.
D) Restricting database access to authorized users via roles.

Correct Answer: B

Explanation: Flat file readability is a characteristic of File Systems. A DBMS compiles data into binary formats (pages, blocks) optimized for storage and indexing, requiring DBMS interfaces or SQL engines to query it, rather than direct text editors.

02

Database Architecture

The Three-Level ANSI-SPARC Architecture separates the user's view of data from its physical representation. This abstraction ensures Data Independence (changes to physical storage don't break user-facing queries).

Understanding the Levels

External Level (Views): What the end-users see (e.g., individual screens, customized summaries).
Conceptual Level: The logical structure of the entire database. It defines tables, attributes, primary keys, and relations.
Internal Level (Physical): How data is actually stored (e.g., B+ Trees, sequential files, hashing strategies, clustering).

Schema vs. Instance

Schema: The skeleton structure / blueprint of the database. It is defined once and rarely changes (e.g., table columns and datatypes).
Instance: The snapshot collection of actual data stored in the database at a specific moment. It changes constantly as rows are added, edited, or deleted.

Practice: Schema & Instance

If you execute ALTER TABLE Students ADD COLUMN email VARCHAR(100);, are you modifying the database Schema or the database Instance?

Answer: Schema

Explanation: Running `ALTER TABLE` changes the structural blueprint (the columns available in the table definition). Changes to the actual row records (e.g., adding a student's record) modify the database instance.

03

Types of Databases

Databases are grouped by how they organize data. Choosing the correct type is critical for application performance.

1. Relational Databases (RDBMS)

Data is organized into tabular structures with rows and columns. They enforce strict relations using primary and foreign keys, supporting complex transactions and standardized queries via SQL. Examples: PostgreSQL, MySQL, SQLite.

2. NoSQL Databases (Not Only SQL)

Designed for horizontal scaling, high write velocities, and semi-structured or unstructured data. They avoid rigid tables and use four main models:

Document: Stores data as JSON documents (e.g., MongoDB).
Key-Value: Ultra-fast caches storing binary data pairs (e.g., Redis).
Column-Family: Optimized for scanning large analytical datasets (e.g., Cassandra).
Graph: Represents highly connected nodes and relationships (e.g., Neo4j).

3. Distributed Databases

Data is physically split (partitioned or sharded) or duplicated (replicated) across multiple distinct machines or cloud data centers. This ensures high system availability and disaster tolerance. Examples: Google Spanner, CockroachDB.

Practice: Database Types

Scenario: You are building a real-time chat dashboard showing live active-user counts and message delivery locks. Performance must be sub-millisecond, and the schema will expand constantly. Which database model is best suited?

A) Relational database (PostgreSQL)
B) Document database (MongoDB)
C) Key-Value memory store (Redis)
D) Graph database (Neo4j)

Correct Answer: C

Explanation: For sub-millisecond real-time tracking, an in-memory key-value cache (like Redis) is ideal. Relational and document databases query disks which introduces read latency, whereas Key-Value memory caches operate directly inside RAM.

04

ER Model & ER Diagrams

The Entity-Relationship (ER) Model is a conceptual framework used to design databases. It represents real-world concepts as Entities, Attributes, and Relationships.

StudentID Enrolls COURSE CourseID N M

Core Concepts of the ER Model

Entities: Objects or concepts (e.g., Student, Course). Weak entities depend on another entity for identity.
Attributes: Attributes are characteristics of entities:
- Simple: Atomic values (e.g., Gender).
- Composite: Can be divided (e.g., Name -> First Name, Last Name).
- Multi-valued: Can have multiple values (e.g., PhoneNumbers).
- Derived: Calculated from other fields (e.g., Age derived from BirthDate).
Relationships: Associations between entities (e.g., a Student enrolls in a Course). Relationships have cardinalities: One-to-One (1:1), One-to-Many (1:N), or Many-to-Many (N:M).

Practice: ER Model

If a student can enroll in multiple courses, and each course can contain multiple students, what is the cardinality of the Enrolls relationship?

Answer: Many-to-Many (N:M)

Explanation: Since entities on both sides are associated with multiple instances of the other, it represents an N:M relationship. In relational mapping, this results in a separate junction/association table.

05

Keys in DBMS

Keys are attributes or sets of attributes that uniquely identify rows in a table. They establish and enforce relationships between tables.

Key Type	Definition	Allows NULL?
Super Key	A set of one or more columns that uniquely identifies a row.	Yes
Candidate Key	A minimal Super Key (no redundant attributes).	No (typically)
Primary Key	The candidate key chosen by the database designer to uniquely identify rows.	No
Foreign Key	A column that references a Primary Key in another table, establishing a relationship.	Yes
Composite Key	A Primary Key consisting of two or more columns.	No

💡

A table can have multiple **Candidate Keys**, but only one can be chosen as the **Primary Key**. Any candidate keys not chosen as the primary key are called **Alternate Keys**.

Practice: Keys

Given a table schema: Enrollments(StudentID, CourseID, DateEnrolled, Grade). If a student can enroll in multiple courses, and multiple students can take the same course, which of the following acts as the Primary Key?

A) StudentID
B) CourseID
C) Composite Key of (StudentID, CourseID)
D) DateEnrolled

Correct Answer: C

Explanation: Because StudentID alone repeats (a student takes multiple courses) and CourseID repeats (multiple students enroll), neither can be unique alone. The combination of (StudentID, CourseID) is unique for each enrollment, making it a composite primary key.

06

Normalization

Normalization is the process of structuring relational database tables to reduce redundancy and prevent **data anomalies** (Insertion, Deletion, and Update anomalies).

The Anomalies

Insertion Anomaly: Cannot add data because some unrelated columns are missing.
Deletion Anomaly: Deleting a record unintentionally deletes unrelated crucial information.
Update Anomaly: Modifying a value requires editing multiple redundant rows, risking inconsistencies.

Normal Forms (1NF to BCNF)

1. First Normal Form (1NF)

Rules: All column values must be **atomic** (no multi-valued attributes / lists), and each column must contain values of the same type.

Example: If column Phones contains "555-0192, 555-0143", it must be split into separate rows.

2. Second Normal Form (2NF)

Rules: Must be in **1NF**, and there must be **no partial dependency** (non-prime attributes must depend on the entire candidate primary key, not a subset of a composite primary key).

Example: In a table with key (EmpID, ProjectID), if employee name depends solely on EmpID, it violates 2NF and employee data must be moved to a separate table.

3. Third Normal Form (3NF)

Rules: Must be in **2NF**, and there must be **no transitive dependency** (non-prime attributes cannot depend on other non-prime attributes; they must depend only on the primary key).

Example: In a table with key StudentID, if City depends on ZipCode, which depends on StudentID, it violates 3NF. Move ZipCode -> City to a separate table.

4. Boyce-Codd Normal Form (BCNF)

Rules: Must be in **3NF**, and for every functional dependency X → Y, X must be a **super key**. BCNF is a stronger version of 3NF.

Practice: Normalization

Suppose you have a table: Store(StoreID, ManagerID, ManagerPhone, ZipCode) where StoreID is the Primary Key. If ManagerPhone depends directly on ManagerID (a non-prime attribute), which normal form does this table violate?

A) 1NF
B) 2NF
C) 3NF
D) It is already in BCNF

Correct Answer: C

Explanation: Since ManagerPhone depends on ManagerID, which depends on the primary key StoreID, this forms a transitive dependency (StoreID -> ManagerID -> ManagerPhone). This violates 3NF rules.

07

SQL Basics

SQL (Structured Query Language) is the standard programming language used to interact with relational databases.

SQL Statements classification

DDL (Data Definition Language): Define schema structures. Commands: CREATE, ALTER, DROP.
DML (Data Manipulation Language): Modify and query database rows. Commands: SELECT, INSERT, UPDATE, DELETE.

Core CRUD Commands with Examples:

SQL - CREATE

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    salary DECIMAL(10, 2)
);

SQL - INSERT

INSERT INTO employees (id, name, salary) VALUES
(1, 'Alice', 75000.00),
(2, 'Bob', 62000.00),
(3, 'Charlie', 80000.00);

SQL - SELECT

SELECT name, salary 
FROM employees 
WHERE salary >= 70000 
ORDER BY salary DESC;

Output Result

+---------+----------+
| name    | salary   |
+---------+----------+
| Charlie | 80000.00 |
| Alice   | 75000.00 |
+---------+----------+

SQL - UPDATE

UPDATE employees 
SET salary = 65000.00 
WHERE id = 2;

SQL - DELETE

DELETE FROM employees WHERE id = 3;

Practice: SQL Basics

Write an SQL query to retrieve the name and salary of all employees who earn less than 70000.

SQL Solution

SELECT name, salary 
FROM employees 
WHERE salary < 70000;

08

Transactions and ACID Properties

A Transaction is a single logical unit of database processing. All database operations within a transaction must be executed as a pack. To preserve database consistency, DBMS engines enforce the four ACID Properties.

The ACID Rules

Atomicity: "All or nothing". If any part of a transaction fails, the entire transaction is rolled back, leaving the database unchanged.
Consistency: Ensures that a transaction moves the database from one valid state to another valid state, maintaining all constraints (e.g., non-negative balances).
Isolation: Transactions executing concurrently must run as if they are executing sequentially. Intermediate states of a transaction are hidden from other transactions.
Durability: Once a transaction commits, its changes are written to persistent storage (disk) and will survive any subsequent system failures.

The Banking Example

Consider transferring $100 from Account A to Account B:

Read balance A
Deduct $100 from A
Read balance B
Add $100 to B

If the system crashes after step 2, Atomicity ensures the $100 is returned to A instead of disappearing into thin air.

Practice: Transactions & ACID

Which ACID property is violated if database transactions executing at the same time read uncommitted variables from each other, resulting in incorrect calculations?

A) Atomicity
B) Consistency
C) Isolation
D) Durability

Correct Answer: C

Explanation: Isolation guarantees that concurrently running transactions do not interfere or read uncommitted states of each other. Violating this rule leads to Isolation anomalies (like Dirty Reads).

09

Concurrency Control Basics

When multiple transactions run concurrently, conflicts arise. Concurrency Control prevents anomalies such as **Dirty Reads**, **Non-Repeatable Reads**, and **Phantom Reads**.

Lock-Based Protocols

Locks prevent multiple transactions from editing the same data simultaneously:

Shared Lock (S): Acquired for reading. Multiple transactions can hold S-locks on the same data item simultaneously.
Exclusive Lock (X): Acquired for writing/updating. Only one transaction can hold an X-lock. S-locks and other X-locks are blocked.

Deadlocks

A Deadlock occurs when Transaction 1 holds a lock on resource A and waits for resource B, while Transaction 2 holds a lock on resource B and waits for resource A. Neither can proceed.

⚠️

To handle deadlocks, DBMS engines use **prevention** algorithms (like Wait-Die or Wound-Wait schemes based on timestamps) or run background **deadlock detection** loops that abort one of the transactions (the victim) to resolve the block.

Isolation Levels

Standard SQL engines define four isolation levels (from least strict to most strict):

Read Uncommitted: Allows dirty reads (reading uncommitted changes from other transactions).
Read Committed: Prevents dirty reads, but allows non-repeatable reads (data changes between two reads in the same transaction).
Repeatable Read: Prevents non-repeatable reads, but allows phantom reads (new rows are inserted by other transactions).
Serializable: Highest isolation level. Full lock protection, simulating sequential execution.

Practice: Concurrency Control

If Transaction T1 holds a Shared Lock (S) on row 10, which lock can Transaction T2 acquire on row 10 immediately?

A) Exclusive Lock (X)
B) Shared Lock (S)
C) Neither
D) Both X and S

Correct Answer: B

Explanation: Shared locks are compatible with each other, meaning multiple transactions can read the same row at the same time. Exclusive locks are incompatible with any locks.

10

Indexing and Query Optimization

Indexes are data structures used to speed up data retrieval. Without an index, the database engine must perform a slow **Full Table Scan** (reading every row on disk).

Clustered vs. Non-Clustered Indexes

Clustered Index: Dictates the physical order of data rows on the disk. Because data can only be sorted in one way, there can be only **one clustered index** per table (usually on the Primary Key).
Non-Clustered Index: Creates a separate structure containing copy of key columns and a pointer to the physical data rows. You can create **multiple non-clustered indexes** on a table.

B-Trees and B+ Trees

Relational databases heavily use B+ Trees for indexing. The advantages are:

Balanced Tree: Guaranteed O(log N) search, insert, and delete times.
Range Queries: Leaf nodes are linked together sequentially (like a linked list), making range queries (e.g., WHERE age BETWEEN 20 AND 30) extremely fast.

Practice: Indexing

Why can a relational database table have only one Clustered Index?

Answer: Physical sorting constraints

Explanation: Since a clustered index dictates the physical sorting and organization of rows on disk blocks, and data rows can physically be sorted in only one sequence (e.g., by ID or Surname), only one clustered index is physically possible. Non-clustered indexes, however, are separate index tables that point back to the raw rows, allowing multiples.

Algo Infinity Verse

Start Your Journey With Database Systems Beginner-Friendly Concepts, SQL & Practice

🗄️ Learn DBMS

Introduction to DBMS

Advantages of using a DBMS

DBMS vs File System

Database Architecture

Understanding the Levels

Schema vs. Instance

Types of Databases

1. Relational Databases (RDBMS)

2. NoSQL Databases (Not Only SQL)

3. Distributed Databases

ER Model & ER Diagrams

Core Concepts of the ER Model

Keys in DBMS

Normalization

The Anomalies

Normal Forms (1NF to BCNF)

1. First Normal Form (1NF)

2. Second Normal Form (2NF)

3. Third Normal Form (3NF)

4. Boyce-Codd Normal Form (BCNF)

SQL Basics

SQL Statements classification

Core CRUD Commands with Examples:

Transactions and ACID Properties

The ACID Rules

The Banking Example

Concurrency Control Basics

Lock-Based Protocols

Deadlocks

Isolation Levels

Indexing and Query Optimization

Clustered vs. Non-Clustered Indexes

B-Trees and B+ Trees