Reading MBR and GPT partition tables by hand
Walk the partition table of a disk image yourself: MBR's 64 bytes, the EBR chain, GPT headers, CRCs, and the offset bugs that read the wrong sector.
A disk image has no inherent structure. It's a byte stream. Partitions, filesystems, files — all of it is interpretation laid on top by reading specific offsets and trusting the bytes there mean what a spec says. The partition table is the first layer of that interpretation, and it's small enough to read by hand. You should know how, because the day a tool shows you a clean partition tree you have reason to distrust is the day this stops being trivia.
Sector size comes first
Before you read a single partition entry, you need the sector size, because every partition start is stored as a logical block address and an LBA is meaningless without it. Most documentation assumes 512 bytes and a lot of old code hardcodes it. 4Kn drives have shipped for over a decade. If your parser assumes 512 on a 4Kn image, every LBA-to-byte calculation is off by a factor of eight — the MBR parse still succeeds because the MBR is at offset 0 either way, but every partition start points into garbage.
When you don't know the sector size, you can probe it: the GPT header sits at LBA 1, which is byte offset 512 on a 512-byte-sector image and byte offset 4096 on a 4Kn image, so trying both and seeing which yields the EFI PART signature is a legitimate heuristic.
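A minimal sketch of that probe, assuming the image is available as a seekable binary file object. The name probe_sector_size and the candidate list are illustrative, not taken from any particular tool:

```python
import io

GPT_SIGNATURE = b"EFI PART"

def probe_sector_size(image, candidates=(512, 4096)):
    """Return the first candidate sector size whose LBA 1 starts with the
    GPT signature, or None if no candidate matches (e.g. an MBR-only disk)."""
    for size in candidates:
        image.seek(size)  # LBA 1 begins at byte offset 1 * sector size
        if image.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE:
            return size
    return None

# Synthetic 4Kn-style image: all zeros except the signature at byte 4096.
demo = bytearray(3 * 4096)
demo[4096:4104] = GPT_SIGNATURE
print(probe_sector_size(io.BytesIO(bytes(demo))))  # 4096
```

A None result doesn't mean the image is empty, only that this heuristic has nothing to say; fall back to whatever the acquisition metadata claims and record that you did.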
MBR: 64 bytes that still run the world
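The Master Boot Record sits in the first sector. The partition table is 64 bytes starting at offset 0x1BE (four entries of 16 bytes each), and the boot signature, the two bytes 0x55 0xAA, is at offset 0x1FE. Read as a little-endian 16-bit value that signature is 0xAA55, so compare bytes rather than an integer if you want to avoid the endianness mix-up. Each entry: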
Offset  Size  Field
0x00    1     boot indicator (0x80 = active)
0x01    3     starting CHS (ignore)
0x04    1     partition type
0x05    3     ending CHS (ignore)
0x08    4     starting LBA (little-endian)
0x0C    4     size in sectors (little-endian)
The CHS fields are a fossil from when drives didn't lie about their geometry. Never compute offsets from them; read the LBA. The only forensic value in CHS is as a tamper signal — nonsensical values sometimes mark a hand-crafted table.
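Here is one way the entry layout might be decoded, as a sketch rather than a finished parser. MbrEntry and parse_mbr are names made up for this example; the CHS bytes are skipped with struct pad codes instead of being read:

```python
import struct
from typing import NamedTuple

class MbrEntry(NamedTuple):
    boot: int
    type: int
    start_lba: int
    sectors: int

def parse_mbr(sector0):
    """Decode the four primary entries from the first 512 bytes of an image."""
    if len(sector0) < 512:
        raise ValueError("need at least 512 bytes")
    if sector0[0x1FE:0x200] != b"\x55\xAA":
        raise ValueError("missing 0x55 0xAA boot signature")
    entries = []
    for i in range(4):
        # B = boot flag, 3x = CHS start (skipped), B = type,
        # 3x = CHS end (skipped), I I = little-endian start LBA and size.
        fields = struct.unpack_from("<B3xB3xII", sector0, 0x1BE + 16 * i)
        entries.append(MbrEntry(*fields))
    return entries

# Synthetic sector: one active Linux (0x83) entry at LBA 2048, 1000 sectors.
demo = bytearray(512)
demo[0x1BE:0x1CE] = struct.pack("<B3xB3xII", 0x80, 0x83, 2048, 1000)
demo[0x1FE:0x200] = b"\x55\xAA"
for entry in parse_mbr(bytes(demo)):
    print(entry)
```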
Two traps catch everyone who writes their own MBR parser.
The 32-bit multiply that wraps
Start LBA and size are 32-bit. That caps an MBR partition at 2 TiB on 512-byte sectors, which is the entire reason GPT exists. But the bug isn't the cap. It's that the moment you multiply a 32-bit sector number by the sector size to get a byte offset, you can exceed 32 bits. Start sector 4,294,967,000 times 512 is 2,199,023,104,000 bytes: just under 2 TiB, and far past the 4 GiB that a 32-bit value can address. In a language where that arithmetic happens in 32-bit space, it wraps silently and your read succeeds at the wrong place. In JavaScript the trap is sharper: number is a double, exact for integers up to 2^53, but bitwise operators coerce to 32-bit signed. The instant someone writes lba << 9 instead of lba * 512, the value truncates. Multiply, never shift. Carry every offset in 64-bit-safe arithmetic end to end.
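The wrap is easy to see with numbers. Python integers don't overflow, so this sketch emulates the two failure modes with masks: unsigned 32-bit multiplication, and JavaScript's << operator. The helper js_shift_left is an emulation written for this example, not a real API:

```python
SECTOR_SIZE = 512
lba = 4_294_967_000  # fits in an unsigned 32-bit field

# Python integers are arbitrary precision, so this is the true byte offset.
correct = lba * SECTOR_SIZE

# What survives if the same multiplication is done in unsigned 32-bit space.
wrapped = (lba * SECTOR_SIZE) & 0xFFFFFFFF

def js_shift_left(value, bits):
    """Emulate JavaScript's `value << bits`: coerce to 32 bits, shift,
    then reinterpret the low 32 bits as a signed integer."""
    result = ((value & 0xFFFFFFFF) << (bits & 31)) & 0xFFFFFFFF
    return result - 0x1_0000_0000 if result & 0x8000_0000 else result

print(correct)                # 2199023104000
print(wrapped)                # 4294815744
print(js_shift_left(lba, 9))  # -151552
```

Neither wrong value raises anything on its own. The wrapped offset is a perfectly valid place to read from, which is what makes the bug quiet.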
Extended partitions and the relative-offset confusion
MBR has room for four primary entries. The DOS workaround for needing more was the extended partition — type 0x05, 0x0F, or 0x85 — a container holding a singly linked list of logical partitions. Each link is an Extended Boot Record (EBR), a mini-MBR with two used entries: one describing the logical volume, one pointing to the next EBR.
The offsets are relative, and the rules differ between the two entries, which is where the bug lives. The first entry's LBA is relative to the EBR itself. The second entry's LBA — the link to the next EBR — is relative to the start of the extended partition, not the current EBR. Mix those up and your offsets drift down the chain until you either miss logical volumes or follow a pointer into nothing. Guard the walk with a visited-set and a hard iteration cap: a malformed or maliciously looped EBR chain will otherwise run a naive parser until it's out of memory.
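A sketch of the walk with both rules and both guards in place. It assumes a seekable image and that the caller already found the extended partition's starting LBA in the primary table; walk_ebr_chain, make_ebr, and the max_links default are choices made for this example:

```python
import io
import struct

ENTRY = "<B3xB3xII"  # boot, (CHS), type, (CHS), start LBA, sector count

def walk_ebr_chain(image, ext_start_lba, sector_size=512, max_links=256):
    """Yield (absolute_start_lba, sectors, type) for each logical partition."""
    seen = set()
    ebr_lba = ext_start_lba
    for _ in range(max_links):
        if ebr_lba in seen:
            raise ValueError(f"EBR chain loops back to LBA {ebr_lba}")
        seen.add(ebr_lba)
        image.seek(ebr_lba * sector_size)
        sector = image.read(512)  # the table lives in the first 512 bytes
        if len(sector) < 512 or sector[0x1FE:0x200] != b"\x55\xAA":
            return  # truncated image or not an EBR: stop quietly
        _, ptype, rel_start, size = struct.unpack_from(ENTRY, sector, 0x1BE)
        if ptype != 0 and size != 0:
            # Entry 1 is relative to the EBR that contains it.
            yield ebr_lba + rel_start, size, ptype
        _, next_type, next_rel, _ = struct.unpack_from(ENTRY, sector, 0x1CE)
        if next_type == 0:
            return  # empty link entry marks the end of the chain
        # Entry 2 is relative to the start of the extended partition.
        ebr_lba = ext_start_lba + next_rel
    raise ValueError("EBR chain longer than max_links")

def make_ebr(logical, link):
    """Build a synthetic EBR from two (type, start, size) tuples."""
    sector = bytearray(512)
    struct.pack_into(ENTRY, sector, 0x1BE, 0, *logical)
    struct.pack_into(ENTRY, sector, 0x1CE, 0, *link)
    sector[0x1FE:0x200] = b"\x55\xAA"
    return sector

# Extended partition at LBA 10 with EBRs at LBA 10, 18 and 26.
disk = bytearray(30 * 512)
disk[10 * 512:11 * 512] = make_ebr((0x83, 1, 5), (0x05, 8, 8))
disk[18 * 512:19 * 512] = make_ebr((0x83, 1, 4), (0x05, 16, 4))
disk[26 * 512:27 * 512] = make_ebr((0x83, 1, 3), (0, 0, 0))
logicals = list(walk_ebr_chain(io.BytesIO(bytes(disk)), ext_start_lba=10))
print(logicals)  # [(11, 5, 131), (19, 4, 131), (27, 3, 131)]
```

The third link is where the two rules diverge: the pointer value 16 means LBA 26 when measured from the extended partition, and LBA 34 if you wrongly measure it from the current EBR at 18.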
GPT: better, still sharp
GPT lifted the size ceiling and the four-partition limit. Its layout:
- LBA 0 holds a protective MBR: a single entry of type 0xEE spanning the whole disk, there only to stop legacy tools from seeing the disk as empty and offering to initialize it. See 0xEE, switch to GPT. See 0xEE alongside real MBR entries and you have a hybrid MBR, which old Boot Camp setups produced; parse both tables and reconcile.
- LBA 1 holds the GPT header. Signature is the eight ASCII bytes EFI PART. It tells you where the entry array lives (usually LBA 2), how many entries there are, and how big each one is.
- The backup GPT lives at the last sectors of the disk, header in the final LBA.
Do not hardcode 128 entries of 128 bytes, even though that's the near-universal default. Read NumberOfPartitionEntries and SizeOfPartitionEntry and use them. Anything that hardcodes 128×128 misparses images that deviate, and deviation is exactly what someone hiding data arranges.
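As a sketch, the 92-byte header can be unpacked in one call and the entry array located from its own fields. The Python field names below are this example's own snake_case renderings of the spec's names (num_entries for NumberOfPartitionEntries, entry_size for SizeOfPartitionEntry); the byte layout follows the UEFI header definition:

```python
import struct
from typing import NamedTuple

GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")  # 92 bytes on disk

class GptHeader(NamedTuple):
    signature: bytes
    revision: int
    header_size: int
    header_crc32: int
    reserved: int
    my_lba: int
    alternate_lba: int
    first_usable_lba: int
    last_usable_lba: int
    disk_guid: bytes
    partition_entry_lba: int
    num_entries: int
    entry_size: int
    entries_crc32: int

def parse_gpt_header(sector):
    """Decode a GPT header from the start of the sector that holds it."""
    header = GptHeader._make(GPT_HEADER.unpack_from(sector))
    if header.signature != b"EFI PART":
        raise ValueError("not a GPT header")
    return header

def entry_array_extent(header, sector_size):
    """Return (byte_offset, byte_length) of the entry array, computed from
    the header's own fields rather than an assumed 128 x 128."""
    return (header.partition_entry_lba * sector_size,
            header.num_entries * header.entry_size)

# Synthetic header that deviates from the default: 64 entries of 256 bytes.
raw = GPT_HEADER.pack(b"EFI PART", 0x00010000, 92, 0, 0,
                      1, 99, 34, 66, bytes(16), 2, 64, 256, 0)
header = parse_gpt_header(raw.ljust(512, b"\x00"))
print(entry_array_extent(header, 512))  # (1024, 16384)
```

Both values come from untrusted bytes, so a real parser would cap the array length before allocating a buffer for it.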
GPT carries CRC32 checksums over the header and over the entry array. Validate them — not to reject data that fails, but because a mismatch means something edited the table without fixing the CRC, which real operating systems don't do and crude tampering tools often do. Parse anyway; record the mismatch as a finding.
The backup at the tail of the disk is a gift. If the primary header is zeroed, the backup usually survives, and any disagreement between primary and backup is worth flagging loudly. A tool that only reads the primary header calls a recoverable disk empty.
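One possible shape for that fallback, assuming the image covers the whole disk so the backup header really is in its final sector. load_gpt and the SHARED field list are this sketch's own choices; the list covers fields that are expected to be identical in both copies, since the LBA fields that describe each header's own position legitimately differ:

```python
import io
import struct

GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")  # 92-byte on-disk header
FIELDS = ("signature", "revision", "header_size", "header_crc32", "reserved",
          "my_lba", "alternate_lba", "first_usable_lba", "last_usable_lba",
          "disk_guid", "partition_entry_lba", "num_entries", "entry_size",
          "entries_crc32")
# Fields expected to match between primary and backup.
SHARED = ("disk_guid", "first_usable_lba", "last_usable_lba",
          "num_entries", "entry_size", "entries_crc32")

def read_header(image, offset):
    """Return the header at `offset` as a dict, or None if it isn't there."""
    if offset < 0:
        return None
    image.seek(offset)
    raw = image.read(GPT_HEADER.size)
    if len(raw) < GPT_HEADER.size or raw[:8] != b"EFI PART":
        return None
    return dict(zip(FIELDS, GPT_HEADER.unpack(raw)))

def load_gpt(image, image_size, sector_size):
    """Return (header, findings), preferring the primary but falling back."""
    findings = []
    primary = read_header(image, sector_size)
    backup = read_header(image, image_size - sector_size)
    if primary is None:
        findings.append("primary GPT header missing or unsigned")
    if backup is None:
        findings.append("backup GPT header missing or unsigned")
    if primary and backup:
        for name in SHARED:
            if primary[name] != backup[name]:
                findings.append(f"primary and backup disagree on {name}")
    return primary or backup, findings

# Synthetic 8-sector disk with a zeroed primary and a surviving backup.
size = 8 * 512
disk = bytearray(size)
disk[size - 512:size - 512 + 92] = GPT_HEADER.pack(
    b"EFI PART", 0x00010000, 92, 0, 0, 7, 1, 3, 5, bytes(16), 6, 4, 128, 0)
header, findings = load_gpt(io.BytesIO(bytes(disk)), size, 512)
print(header["my_lba"], findings)
```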
The type field is a label, not a fact
Both MBR type bytes and GPT type GUIDs describe intent, not content. Nothing enforces that a 0x07 partition contains NTFS, or that a 0x00 "empty" partition is actually empty. Someone hiding a volume can flag a real filesystem as empty, or flag free space as a known type to waste your time.
So the correct way to identify what's in a partition is to read the start of the partition and look for filesystem signatures, then cross-check against the declared type and note any discrepancy. The type byte tells you what someone wanted you to believe. The superblock tells you what's there — which is the next post in this series.
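A deliberately small sketch of the cross-check. The signature table covers only four filesystems, and the EXPECTED mapping from MBR type bytes to plausible contents is an assumption made for this example, not a complete registry. Note that the ext magic is only two bytes, so a real tool would validate more of the superblock before believing it:

```python
import io

SIGNATURES = [  # (offset within partition, magic bytes, label)
    (3, b"NTFS    ", "ntfs"),
    (3, b"EXFAT   ", "exfat"),
    (0, b"XFSB", "xfs"),
    (1080, b"\x53\xEF", "ext2/3/4"),  # superblock at 1024, magic 0xEF53 at +56
]

# Illustrative mapping only: what each declared type would lead you to expect.
EXPECTED = {0x07: {"ntfs", "exfat"}, 0x83: {"ext2/3/4", "xfs"}}

def sniff_filesystem(image, partition_offset):
    """Return a label for the first matching signature, or None."""
    for rel, magic, label in SIGNATURES:
        image.seek(partition_offset + rel)
        if image.read(len(magic)) == magic:
            return label
    return None

def cross_check(declared_type, found):
    """Return a finding string when the label and the content disagree."""
    expected = EXPECTED.get(declared_type, set())
    if found in expected or (found is None and not expected):
        return None
    return f"type 0x{declared_type:02X} declared, signature says {found}"

# Synthetic case: an entry flagged 0x00 (empty) that holds an ext superblock.
part_offset = 2048 * 512
disk = bytearray(part_offset + 4096)
disk[part_offset + 1080:part_offset + 1082] = b"\x53\xEF"
found = sniff_filesystem(io.BytesIO(bytes(disk)), part_offset)
print(cross_check(0x00, found))  # type 0x00 declared, signature says ext2/3/4
```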
Discipline that keeps you out of trouble
Every read is an (offset, length) pair and every read can be wrong without announcing it. A few habits:
Convert sectors to bytes exactly once, at a single well-defined point in the code. Scatter that conversion and you get one path multiplying by 512, another shifting by 9, a third that forgot the partition base offset.
Carry the partition base explicitly. A partition declared at LBA 2048 with a filesystem inside means the superblock is at 2048 * sector_size + filesystem_offset. Forget the base and everything reads cleanly from the wrong place.
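These two habits fit in a few lines. In this sketch lba_to_bytes is the only place a sector count becomes a byte count, and absolute_offset forces every caller to name the partition base; both function names are hypothetical:

```python
def lba_to_bytes(lba, sector_size):
    """The single place where sectors become bytes. Plain multiplication;
    Python integers do not wrap."""
    if lba < 0 or sector_size <= 0:
        raise ValueError("negative LBA or non-positive sector size")
    return lba * sector_size

def absolute_offset(partition_start_lba, sector_size, offset_in_partition):
    """Byte offset in the image of something located inside a partition."""
    return lba_to_bytes(partition_start_lba, sector_size) + offset_in_partition

# A superblock 1024 bytes into a partition that starts at LBA 2048.
print(absolute_offset(2048, 512, 1024))   # 1049600
print(absolute_offset(2048, 4096, 1024))  # 8389632
```

The same partition entry lands about 7 MiB apart depending on the sector size, which is the factor-of-eight error from earlier showing up as a concrete offset.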
Bounds-check against the real image size before every read. Images get truncated mid-acquisition. A table happily declares a partition running past the end of a truncated image; seek there blindly and you read zero-padding and treat it as data. When a declared extent exceeds the image, that's a finding, not an exception to swallow.
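A sketch of both halves: a read helper that refuses to go past the real image size, and a separate check that turns an over-long declared extent into a finding string. checked_read and extent_finding are names invented here, and image_size is assumed to be the actual byte length of the file on disk:

```python
import io

def checked_read(image, image_size, offset, length):
    """Read exactly `length` bytes at `offset`, or raise instead of
    returning short or padded data."""
    if offset < 0 or length < 0 or offset + length > image_size:
        raise ValueError(
            f"read [{offset}, {offset + length}) is outside image of {image_size} bytes")
    image.seek(offset)
    data = image.read(length)
    if len(data) != length:
        raise ValueError("short read: image is smaller than its reported size")
    return data

def extent_finding(start_lba, sectors, sector_size, image_size):
    """Return a finding string if the declared partition runs past the image."""
    end = (start_lba + sectors) * sector_size
    if end > image_size:
        return f"partition ends at byte {end}, image has only {image_size}"
    return None

# A 1 MiB image whose table declares a partition from LBA 2048 for 2048 sectors.
image_size = 1024 * 1024
image = io.BytesIO(bytes(image_size))
print(extent_finding(2048, 2048, 512, image_size))
# partition ends at byte 2097152, image has only 1048576
```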
And distrust overlap. Legitimate partitions don't overlap. If they do, it's a parse bug on your end, a corrupt table, or deliberate obfuscation — all three worth catching, and the check costs nothing.
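The overlap check can be sketched as a sort followed by a scan. find_overlaps and the (name, start_lba, sectors) tuple shape are assumptions of this example; extents are treated as half-open ranges, so two partitions that merely touch do not count:

```python
def find_overlaps(partitions):
    """partitions: iterable of (name, start_lba, sectors).
    Returns a list of (name_a, name_b) pairs whose extents overlap."""
    ordered = sorted((start, start + count, name)
                     for name, start, count in partitions if count > 0)
    overlaps = []
    for i, (start_a, end_a, name_a) in enumerate(ordered):
        for start_b, end_b, name_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start: nothing later can overlap A either
            overlaps.append((name_a, name_b))
    return overlaps

table = [("p1", 2048, 1000), ("p2", 3000, 1000), ("p3", 4096, 100)]
print(find_overlaps(table))  # [('p1', 'p2')]
```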