Next Previous Contents

8. How does my computer store things on disk?

When you look at a hard disk under Unix, you see a tree of named directories and files. Normally you won't need to look any deeper than that, but it does become useful to know what's going on underneath if you have a disk crash and need to try to salvage files. Unfortunately, there's no good way to describe disk organization from the file level downwards, so I'll have to describe it from the hardware up.

8.1 Low-level disk and file system structure

The surface area of your disk, where it stores data, is divided up something like a dartboard -- into circular tracks which are then pie-sliced into sectors. Because tracks near the outer edge have more area than those close to the spindle at the center of the disk, the outer tracks have more sector slices in them than the inner ones. Each sector (or disk block) has the same size, which under modern Unixes is generally 1 binary K (1024 8-bit words). Each disk block has a unique address or disk block number.

Unix divides the disk into disk partitions. Each partition is a continuous span of blocks that's used separately from any other partition, either as a file system or as swap space. The lowest-numbered partition is often treated specially, as a boot partition where you can put a kernel to be booted.

Each partition is either swap space (used to implement virtual memory or a file system used to hold files. Swap-space partitions are just treated as a linear sequence of blocks. File systems, on the other hand, need a way to map file names to sequences of disk blocks. Because files grow, shrink, and change over time, a file's data blocks will not be a linear sequence but may be scattered all over its partition (from wherever the operating system can find a free block when it needs one).

8.2 File names and directories

Within each file system, the mapping from names to blocks is handled through a structure called an i-node. There's a pool of these things near the ``bottom'' (lowest-numbered blocks) of each file system (the very lowest ones are used for housekeeping and labeling purposes we won't describe here). Each i-node describes one file. File data blocks live above the inodes.

Every i-node contains a list of the disk block numbers in the file it describes. (Actually this is a half-truth, only correct for small files, but the rest of the details aren't important here.) Note that the i-node does not contain the name of the file.

Names of files live in directory structures. A directory structure just maps names to i-node numbers. This is why, in Unix, a file can have multiple true names (or hard links); they're just multiple directory entries that happen to point to the same inode.

8.3 Mount points

In the simplest case, your entire Unix file system lives in just one disk partition. While you'll see this arrangement on some small personal Unix systems, it's unusual. More typical is for it to be spread across several disk partitions, possibly on different physical disks. So, for example, your system may one small partition where the kernel lives, a slightly larger one where OS utilities live, and a much bigger one where user home directories live.

The only partition you'll have access to immediately after system boot is your root partition, which is (almost always) the one you booted from. It holds the root directory of the file system, the top node from which everything else hangs.

The other partitions in the system have to be attached to this root in order for your entire, multiple-partition file system to be accessible. About midway through the boot process, your Unix will make these non-root partitions accessible. It will mount each one onto a directory on the root partition.

For example, if you have a Unix directory called `/usr', it is probably a mount point to a partition that contains many programs installed with your Unix but not required during initial boot.

8.4 How a file gets looked up

Now we can look at the file system from the top down. When you open a file (such as, say, /home/esr/WWW/ldp/fundamentals.sgml) here is what happens:

Your kernel starts at the root of your Unix file system (in the root partition). It looks for a directory there called `home'. Usually `home' is a mount point to a large user partition elsewhere, so it will go there. In the top-level directory structure of that user partition, it will look for a entry called `esr' and extract an inode number. It will go to that i-node, notice it is a directory structure, and look up `WWW'. Extracting that i-node, it will go to the corresponding subdirectory and look up `ldp'. That will take it to yet another directory inode. Opening that one, it will find an i-node number for `fundamentals.sgml'. That inode is not a directory, but instead holds the list of disk blocks associated with the file.

8.5 How things can go wrong

Earlier we hinted that file systems can be fragile things. Now we know that to get to file you have to hopscotch through what may be an arbitrarily long chain of directory and i-node references. Now suppose your hard disk develops a bad spot?

If you're lucky, it will only trash some file data. If you're unlucky, it could corrupt a directory structure or i-node number and leave an entire subtree of your system hanging in limbo -- or, worse, result in a corrupted structure that points multiple ways at the same disk block or inode. Such corruption can be spread by normal file operations, trashing data that was bot in the original bad spot.

Fortunately, this kind of contingency has become quite uncommon as disk hardware has become more reliable. Still, it means that your Unix will want to integrity-check the file system periodically to make sure nothing is amiss. Modern Unixes do a fast integrity check on each partition at boot time, just before mounting it. Every few reboots they'll do a much more thorough check that takes a few minutes longer.

If all of this sounds like Unix is terribly complex and failure-prone, it may be reassuring to know that these boot-time checks typically catch and correct normal problems before they become really disasterous. Other operating systems don't have these facilities, which speeds up booting a bit but can leave you much more seriously screwed when attempting to recover by hand (and that's assuming you have a copy of Norton Utilities or whatever in the first place...).


Next Previous Contents