OS Unit 3 - Storage, File Systems, Protection & Distributed OS

📀Chapter 1 — Secondary Storage & RAID

🧠 Secondary Storage & RAID — Mind Map

1.1 Magnetic Disk Structure

📖 What is a Magnetic Disk?
A magnetic hard drive (HDD) is a secondary storage device made of rapidly spinning platters coated with magnetic material. The OS stores all persistent data here — files, programs, the OS itself.

📀 HDD Anatomy — Platters, Tracks, Sectors, Cylinder

Key Components

Component	What it is	Exam Tip
Platter	Spinning magnetic disk (multiple stacked)	More platters = more capacity
Track	Concentric ring on a platter	Outermost = Track 0
Sector	Smallest unit of storage (512 bytes traditionally)	Modern drives: 4096 bytes (4K)
Cylinder	Same track number across ALL platters	Cylinder = 3D stack of tracks
Actuator Arm	Moves R/W heads across platters	All heads move together
RPM	Rotations Per Minute (5400, 7200, 10000, 15000)	Higher RPM = faster access

1.2 Disk Performance & Access Time

ACCESS TIME FORMULA (Most Important!):
Access Time = Seek Time + Rotational Latency + Transfer Rate

Seek Time: Time for arm to move to correct cylinder (typically 3–15ms)
Rotational Latency: Time waiting for sector to spin under head (avg = ½ rotation)
Transfer Rate: Time to actually read/write the data bits

🔢 Numeric Example (PYQ!):
Disk spins at 7200 RPM. Time for 1 full rotation = 60/7200 = 8.33ms
Average Rotational Latency = 8.33 / 2 = 4.165ms

📋 5-Stage Disk I/O Process:

1. Issue Command → CPU sends read/write request to Disk Controller
2. Seek → Actuator arm physically moves to correct cylinder
3. Rotate → Platter spins until correct sector is under head
4. Transfer → Data bits are read/written magnetically
5. DMA → Direct Memory Access moves data to RAM (bypasses CPU)

Disk Scheduling Algorithms (Brief)

Algorithm	Strategy	Advantage
FCFS	Serve in order of arrival	Fair but slow
SSTF	Serve nearest cylinder first	Fast, but starvation possible
SCAN (Elevator)	Move in one direction, then reverse	Good throughput, no starvation
C-SCAN	Circular SCAN, always moves one way	Uniform wait time

1.3 RAID — Redundant Array of Independent Disks

📖 What is RAID?
RAID (Redundant Array of Independent Disks) is a technology that combines multiple physical hard drives into a single logical unit to achieve:

Performance — Parallel read/write across multiple spindles
Redundancy — Survive drive failures without data loss
Capacity — Pool multiple smaller drives into one large volume

⚙️ Two Core RAID Techniques:
Striping: File data is split into blocks and spread across multiple disks simultaneously → improves speed.
Mirroring: Exact duplicate copy written to a second disk → improves redundancy. Parity: XOR-based mathematical checksum stored on disk → allows recovery without 100% duplication.

⚔️ RAID 0 vs RAID 1 mechanics

1.4 RAID Levels — Deep Comparison

RAID Level	Technique	Min Drives	Fault Tolerance	Usable Capacity	Best For
RAID 0	Pure Striping	2	ZERO ❌	N × Drive	Video editing, gaming
RAID 1	Pure Mirroring	2	1 Drive ✅	N/2 × Drive	OS boot drives
RAID 5	Stripe + Distributed Parity	3	1 Drive ✅	(N–1) × Drive	NAS / file servers
RAID 6	Stripe + Double Parity	4	2 Drives ✅✅	(N–2) × Drive	Enterprise storage
RAID 10	Mirror, then Stripe	4	1 per mirror ✅	N/2 × Drive	Databases, high IOPS

🧩 RAID 5 Distributed Parity rotation

⚠️ Critical Pitfalls of RAID (PYQ!):
1. RAID ≠ Backup: If a user deletes a file, RAID deletes it on ALL disks simultaneously. RAID only protects against hardware failure, not human error or ransomware.
2. Write Penalty (RAID 5): Every write requires reading old data + computing new parity + writing both → 4 I/O ops per logical write.
3. Rebuild Risk: When replacing a failed drive, the entire array must be rebuilt under stress — during this period, a second failure destroys all data.

📁Chapter 2 — File Systems & Block Allocation

🧠 File Management — Mind Map

2.1 The File Concept

📖 What is a File?
A file is a named, logical collection of related information recorded on secondary storage. Files can represent programs (source code, object code, executables) or data (text, numeric, binary, multimedia).

File Control Block (FCB / Inode)

The OS tracks every file using a data structure called the File Control Block (FCB). In UNIX/Linux systems, this is called an Inode.

  [ FILE CONTROL BLOCK (FCB) / INODE ]
  ┌──────────────────────────────────┐
  │ File Name        │ report.pdf    │
  │ File Type        │ PDF           │
  │ File Size        │ 2.4 MB        │
  │ Owner            │ ankush (UID)  │
  │ Permissions      │ rwxr--r--     │
  │ Created/Modified │ timestamps    │
  │ Location (Ptr)   │ block# 4921   │ ← Points to disk blocks
  └──────────────────────────────────┘

📄 File Control Block structure

File Access Methods

Method	How it works	Example
Sequential	Read byte-by-byte; must go through beginning to reach middle	Tape drives, log files
Direct (Random)	Jump to any block directly using block number	Database records
Indexed	Use an index file to map keys to block locations	Library catalogue system

2.2 Directory Structures

📖 What is a Directory?
A directory is a symbol table that maps file names to their FCBs/Inodes. It allows the OS to search for, create, delete, rename, and organise files.

🌳 Directory structure evolution & evaluation

4. Acyclic Graph Directory

✅ Acyclic Graph: Extends the tree by allowing shared files and subdirectories — multiple paths lead to the same file (like Shortcuts/Symlinks in Windows/Linux).
Problem: When is a shared file "truly deleted"? Solved by Reference Counting — delete only when count reaches 0.

2.3 File Block Allocation Methods

The OS must decide how to physically lay out file blocks on disk. This is one of the most important design decisions in a file system.

Method 1: Contiguous Allocation

Each file occupies a set of contiguous (sequential) blocks on disk. The directory entry stores only the starting block number and length.

  Disk Blocks: [0][1][2][3][4][5][6][7][8][9]
  File A:       [A][A][A][ ][ ][B][B][B][B][ ]   ← starts at 0, length 3
  File B:                            starts at 5, length 4
  After delete A: [ ][ ][ ][ ][ ][B][B][B][B][ ]
                  ↑ Gap! Can't fit File C (4 blocks) here → External Fragmentation

🧱 Contiguous Allocation — External Fragmentation problem

✅ Advantages: Extremely fast sequential read (head doesn't move). Easy direct access: Block N of file = Start + N.

❌ Disadvantages: External Fragmentation — holes appear between files. Files cannot grow dynamically. Compaction needed (expensive).

Method 2: Linked Allocation

File blocks are stored anywhere on disk. Each block has a pointer to the next block, forming a linked list.

  Directory: File A → starts at block 3
  Block 3: [Data][→ Block 7]
  Block 7: [Data][→ Block 1]
  Block 1: [Data][→ Block 5]
  Block 5: [Data][→ NULL]   ← End of file

🔗 Linked Allocation — chained disk blocks

✅ Advantages: No external fragmentation. Files can grow dynamically to any size.

❌ Disadvantages: Direct access is O(N) — must traverse all pointers. FAT (File Allocation Table) is a variant that moves pointers into a table in RAM for faster access.

Method 3: Indexed Allocation ⭐ (Most Important)

All block pointers for a file are collected into a single Index Block. The directory entry points to this index block.

  Directory: File A → Index Block at 9
  ┌─────── Index Block 9 ────────┐
  │  Ptr[0] → Block 4            │
  │  Ptr[1] → Block 12           │
  │  Ptr[2] → Block 7            │
  │  Ptr[3] → Block 19           │
  └──────────────────────────────┘
  Access block 2 of File A → Index[2] → Block 7  ← O(1)!

📑 Indexed Allocation — fast O(1) direct access

✅ Advantages: O(1) direct access. No external fragmentation. Files can grow. Used in UNIX inodes!

❌ Disadvantages: Index block wastes space for tiny files. For huge files → multi-level indexing needed (indirect blocks).

Method	Direct Access	Fragmentation	File Growth	Used In
Contiguous	O(1) ✅	External ❌	No ❌	CD-ROM, DVDs
Linked	O(N) ❌	None ✅	Yes ✅	FAT16/32
Indexed	O(1) ✅	None ✅	Yes ✅	UNIX ext4, NTFS

2.4 Free Space Management

📌 Why? The OS must track which disk blocks are free so it can allocate them when files are created.

Method	How It Works	Advantage	Disadvantage
Bit Vector	1 bit per block: 1=free, 0=used	Fast to find free blocks	Large vector for big disks
Linked List	Free blocks linked together	No wasted space	Slow traversal
Grouping	First free block stores N next free block addresses	Find many free blocks quickly	Complex implementation
Counting	Store (start block, count) pairs	Efficient for contiguous free space	Less effective for random writes

🛡️Chapter 3 — System Protection & Distributed OS

🧠 Protection & Distributed OS — Mind Map

3.1 Protection vs Security

📖 Two Different Concepts:

Protection: Internal OS mechanism — dictates WHICH process/user can access WHICH internal resource (memory, file, CPU). Think of it as locks on doors inside the building.

Security: Holistic defense against external threats — hackers, viruses, physical theft, social engineering. Think of it as the fence, cameras, and guards around the building.

🔑 Principle of Least Privilege (PoLP) — #1 Security Principle:

Every user, program, and process must be granted only the absolute minimum permissions required to perform its specific job — nothing more.

Real Example: A PDF viewer app should have permission to read files — but NOT to access your network, camera, or microphone. Most malware exploits excessive permissions.

3.2 Access Matrix & Its Implementations

📖 Access Matrix:
A 2D table where rows = Protection Domains (users/processes) and columns = Objects (files, printers, devices). Each cell contains the access rights (Read, Write, Execute, Own) that domain has over that object.

🔐 Access Matrix — rows=domains, cols=objects

Storing this entire matrix is wasteful — most cells are empty. So it's implemented in two efficient ways:

Implementation 1: Access Control Lists (ACLs)

Slice the matrix by COLUMNS (Objects):
Each object maintains a list of (Domain, Rights) pairs describing who can access it.
Analogy: VIP Guest List at a nightclub door — security checks if your name is on the list.
Used in: Windows NTFS, Linux file permissions (chmod)

  File 2's ACL:
  ┌─────────────────────────────┐
  │ User A → Read               │
  │ User B → Read, Write        │
  └─────────────────────────────┘
  To access File 2, OS checks this list. ✅

📜 Access Control List (column-wise)

Implementation 2: Capability Lists

Slice the matrix by ROWS (Domains):
Each user/process holds a list of (Object, Rights) pairs — like a digital ticket book.
Analogy: Movie theater ticket — the theater doesn't know your name, but you hold a cryptographically secure ticket proving your right to enter.
Used in: Distributed systems, Kerberos authentication

Feature	ACL	Capability List
Stored with	Object (file)	Subject (user/process)
Q: Who can access this file?	Easy — just read the ACL	Hard — search all users
Q: What can this user access?	Hard — check all files	Easy — just read capabilities
Revocation	Easy — edit one ACL	Hard — find and revoke all copies

3.3 System Threats

A. Program Threats

🚨 Trojan Horse: Malware disguised as legitimate, useful software. A user willingly runs it (e.g., a "free game" that secretly sends passwords). Named after the hollow wooden horse of Greek mythology.

🚨 Trap Door (Back Door): A secret, undocumented entry point left by the programmer in a program. Allows bypassing authentication. Often inserted by rogue developers.

🚨 Logic Bomb: Malicious code planted inside an organization's own software by a rogue insider. Sleeps silently until a specific trigger condition activates it (e.g., on a specific date, or when a specific user account is deleted).

🚨 Buffer Overflow Attack: Sends a carefully crafted input string longer than the program's buffer. The overflow bytes overwrite the return address on the stack, redirecting execution to the attacker's injected shellcode.

  BUFFER OVERFLOW ATTACK
  Normal:                    Attack:
  ┌───────────────┐         ┌───────────────────────────┐
  │ Buffer[10]    │         │ AAAAAAAAAAAAAAA[SHELLCODE]│
  │ Return Addr   │──exec──►│ Overwritten Return Addr   │──►Hacker Code!
  └───────────────┘         └───────────────────────────┘

💥 Buffer Overflow — stack corruption exploit

B. Network Threats

🦠 Virus: Parasitic code segment embedded in a legitimate program. Requires human action to execute (e.g., opening an infected .exe file). Cannot self-propagate autonomously.

🐛 Worm: Standalone program that self-replicates across networks automatically by exploiting vulnerabilities. No human action needed. Example: WannaCry ransomware worm (2017).

💻 Denial of Service (DoS/DDoS): Attacker floods a server with millions of fake requests, consuming all CPU/RAM/bandwidth, making it unavailable to real users. DDoS = Distributed DoS using a botnet of thousands of infected machines.

🔍 Port Scanning: Probing a server's TCP/UDP ports to find open services (potential vulnerabilities). Used by both ethical hackers (pentesting) and attackers.

Threat	Type	Needs Host?	Self-Propagates?	Human Action?
Virus	Program	Yes (.exe)	No	Yes
Worm	Network	No (standalone)	Yes	No
Trojan	Program	Yes (disguise)	No	Yes
Logic Bomb	Program	Yes (insider)	No	Trigger-based

3.4 Distributed Operating Systems

📖 Distributed OS (DOS):
An OS that manages a collection of independent computers and makes them appear to the user as a single coherent machine (called a Single System Image). The network is completely transparent — users don't know or care which machine is actually running their job.

🌐 Distributed OS — transparency & single system image

✅ Key Features of Distributed OS:
• Transparency: User doesn't know which machine is running their process
• Load Balancing: OS automatically migrates jobs to less-busy machines
• Fault Tolerance: If one machine fails, jobs are automatically moved
• Scalability: Add more machines to increase capacity transparently

Network OS (NOS) — The Alternative

📖 Network OS: Loosely coupled collection of independent machines, each with its own OS. Users know and interact with each machine separately (e.g., SSH into a specific server).

Models:
• Client-Server: Dedicated servers provide services; clients request them (e.g., web browsing)
• Peer-to-Peer (P2P): Every machine is both client and server (e.g., BitTorrent, Blockchain)

Feature	Distributed OS (DOS)	Network OS (NOS)
User Awareness	Transparent — user unaware	User must know machine names
Coupling	Tightly coupled	Loosely coupled
Load Balancing	Automatic	Manual
Single Image	Yes ✅	No ❌
Example	Google's internal cluster	Unix NFS, Windows File Sharing

⚡ Ready for Exam? Sab padh liya? Ab Quick Revision karo — formulas, key points aur exam tricks ek jagah! Quick Revision Karo →

Topic	Key Formula/Fact	Trick
Disk Access Time	Seek + Rot.Lat + Transfer	Seek is slowest!
Avg Rot. Latency	= (60000/RPM) / 2 ms	Half a rotation
RAID 5 Capacity	(N–1) × Drive Size	One drive lost to parity
Indexed Alloc Access	O(1)	Best for direct access
Contiguous Flaw	External Fragmentation	Most asked PYQ topic
Virus vs Worm	Virus needs host; Worm is standalone	Worm = autonomous

📚 Table of Contents

Operating System

1.1 Magnetic Disk Structure

Key Components

1.2 Disk Performance & Access Time

Disk Scheduling Algorithms (Brief)

1.3 RAID — Redundant Array of Independent Disks

1.4 RAID Levels — Deep Comparison

2.1 The File Concept

File Control Block (FCB / Inode)

File Access Methods

2.2 Directory Structures

4. Acyclic Graph Directory

2.3 File Block Allocation Methods

Method 1: Contiguous Allocation

Method 2: Linked Allocation

Method 3: Indexed Allocation ⭐ (Most Important)

2.4 Free Space Management

3.1 Protection vs Security

3.2 Access Matrix & Its Implementations

Implementation 1: Access Control Lists (ACLs)

Implementation 2: Capability Lists

3.3 System Threats

A. Program Threats

B. Network Threats

3.4 Distributed Operating Systems

Network OS (NOS) — The Alternative

📀 Chapter 1 — Secondary Storage & RAID

📁 Chapter 2 — File Management & Allocation

🛡️ Chapter 3 — Protection & Distributed OS

Previous Year Question Paper Not Available Yet

RAID	Technique	Fault Tolerance	Capacity	Speed
RAID 0	Striping	0 ❌	N × Drive	Fastest ✅
RAID 1	Mirroring	1 Drive ✅	N/2 Drive	Normal
RAID 5	Stripe+Parity×1	1 Drive ✅	(N–1) Drive	Good ✅
RAID 6	Stripe+Parity×2	2 Drives ✅✅	(N–2) Drive	Moderate
RAID 10	Mirror+Stripe	1/mirror ✅	N/2 Drive	Fastest ✅

Method	Access Speed	Fragmentation	Dynamic Growth	Real Use
Contiguous	O(1) Fast	External ❌	No ❌	CD-ROM
Linked	O(N) Slow	None ✅	Yes ✅	FAT16/32
Indexed	O(1) Fast	None ✅	Yes ✅	UNIX ext4

Feature	Distributed OS (DOS)	Network OS (NOS)
User Visibility	Transparent — user unaware	User picks machine manually
Single Image	Yes ✅	No ❌
Load Balancing	Automatic ✅	Manual ❌
Coupling	Tightly coupled	Loosely coupled
Examples	Google cluster, AWS Lambda	Windows File Sharing, NFS