Identify NTFS, FAT, exFAT and ext by signature

The partition type byte lies, or at least it's allowed to. Once you accept that, the only trustworthy way to know what's inside a partition is to read the start of it and look for the filesystem's own signature. This also finds the filesystems the partition table never mentions — volumes written directly to a disk with no table, data tucked into the alignment gap before the first partition, leftovers in unallocated space. A parser that only enumerates declared partitions walks straight past all of it.

The magic numbers, and where they actually live

Filesystems announce themselves with a magic value at a known offset relative to the start of the volume. The reliable ones:

NTFS     offset 3      "NTFS    "   (8-byte OEM ID, trailing spaces included)
FAT32    offset 82     "FAT32   "
FAT12/16 offset 54     "FAT16   " / "FAT12   "
exFAT    offset 3      "EXFAT   "
ext2/3/4 offset 1080   0x53 0xEF    (magic 0xEF53, little-endian)
XFS      offset 0      "XFSB"
APFS     offset 32     "NXSB"       (container superblock)
HFS+     offset 1024   "H+" / "HX"

The ext one is worth dwelling on because it trips people. The ext superblock starts at offset 1024 from the volume start, regardless of block size; the first 1024 bytes are reserved for boot code. The magic sits 0x38 into the superblock, putting it at byte 1080 (1024 + 56). The value is 0xEF53 stored little-endian, so the bytes on disk are 53 EF. Scan from offset 0 for that byte pair and you'll match false positives in random data. Anchor it at 1080.

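FAT detection has a version wrinkle: the filesystem-type string is at offset 82 for FAT32 but offset 54 for FAT12/16, because the FAT32 BPB adds fields ahead of it. Check both positions before deciding it isn't FAT. Microsoft's FAT specification treats this string as informational only, so a formatter may leave it blank or wrong; count it as a hint and confirm with the BPB fields.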
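The table above translates fairly directly into an anchored lookup. The following is a minimal sketch, not a complete detector: the names SIGNATURES and match_signatures are made up for illustration, and it assumes the caller has already read at least the first 1082 bytes of the volume (4 KiB is a convenient amount).

```python
# Illustrative table: (name, offset from volume start, magic bytes),
# copied from the signature list in the text.
SIGNATURES = [
    ("NTFS", 3, b"NTFS    "),
    ("exFAT", 3, b"EXFAT   "),
    ("FAT32", 82, b"FAT32   "),
    ("FAT16", 54, b"FAT16   "),
    ("FAT12", 54, b"FAT12   "),
    ("ext2/3/4", 1080, b"\x53\xef"),  # 0xEF53, little-endian on disk
    ("XFS", 0, b"XFSB"),
    ("APFS", 32, b"NXSB"),
    ("HFS+", 1024, b"H+"),
    ("HFS+", 1024, b"HX"),
]


def match_signatures(head: bytes) -> list:
    """Return every filesystem whose magic sits at its anchored offset.

    Only the exact offset is compared, so the same bytes appearing
    elsewhere in the buffer are ignored.
    """
    hits = []
    for name, offset, magic in SIGNATURES:
        if head[offset:offset + len(magic)] == magic and name not in hits:
            hits.append(name)
    return hits
```

A hit here is only the soft check; it says where to look next, not what the volume is.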

A magic number is the weakest possible evidence

Anyone can write the eight bytes "NTFS    " into a sector. If your detection stops at the OEM ID, it's trivially spoofed and it false-positives on any sector that happens to contain those bytes. The fix is to read one or two structural fields that have to be internally consistent for the filesystem to actually function, because consistency is expensive to fake and cheap to check.

For NTFS, the OEM ID at offset 3 is the soft check. The hard confirmation is the BIOS Parameter Block right after it: read bytes-per-sector (must be a sane power of two, usually 512 or 4096), sectors-per-cluster, and the cluster number of $MFT. Then seek to the MFT and confirm the first record starts with FILE — or BAAD, which marks a known-corrupt record and is itself informative. A coherent BPB plus a parseable MFT entry is much harder to forge convincingly than eight ASCII bytes.
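As a rough sketch of that confirmation step, the function below reads the three fields from the boot sector and derives where the MFT should start. The function names are hypothetical. The field offsets (0x0B, 0x0D, 0x30) follow the commonly documented NTFS boot sector layout. It deliberately accepts only plain power-of-two sectors-per-cluster values up to 128, which is a simplification: volumes with very large clusters encode that byte differently and would be rejected here.

```python
import struct


def parse_ntfs_boot_sector(sector: bytes):
    """Return (bytes_per_sector, sectors_per_cluster, mft_byte_offset),
    or None if the sector is not a coherent NTFS boot sector.

    mft_byte_offset is relative to the start of the volume.
    """
    if len(sector) < 512 or sector[3:11] != b"NTFS    ":
        return None
    (bytes_per_sector,) = struct.unpack_from("<H", sector, 0x0B)
    sectors_per_cluster = sector[0x0D]
    (mft_cluster,) = struct.unpack_from("<Q", sector, 0x30)

    if bytes_per_sector not in (256, 512, 1024, 2048, 4096):
        return None
    # Simplified: plain power of two in 1..128 only (assumption, see lead-in).
    spc = sectors_per_cluster
    if spc == 0 or spc > 128 or spc & (spc - 1):
        return None
    if mft_cluster == 0:
        return None
    return bytes_per_sector, spc, mft_cluster * spc * bytes_per_sector


def mft_record_state(record: bytes) -> str:
    """Classify the first four bytes of a candidate MFT record."""
    if record[:4] == b"FILE":
        return "valid"
    if record[:4] == b"BAAD":
        return "corrupt"
    return "not-mft"
```

In use, you would add the partition's absolute start to mft_byte_offset, read the first record there, and pass it to mft_record_state.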

For ext, don't stop at 0xEF53. The superblock also carries a block size (stored as a shift: 1024 << s_log_block_size), inode and block counts, and a 16-byte filesystem UUID. If the block size shift is absurd or the counts don't fit the partition, you found the magic in noise, not a real superblock.
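The same idea for ext can be sketched as below. The function name and the returned dictionary are illustrative, and the cutoff of 6 for the shift (a 64 KiB block) is an assumption chosen as a sanity bound. The sketch also reads only the low 32 bits of the block count, so a filesystem large enough to need the high word would be under-counted rather than rejected.

```python
import struct


def validate_ext_superblock(sb: bytes, partition_size: int):
    """Check an ext2/3/4 superblock for internal consistency.

    sb is the data read at volume offset 1024; partition_size is in bytes.
    Returns a small dict of parsed fields, or None if anything is off.
    """
    if len(sb) < 0x78:
        return None
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != 0xEF53:
        return None
    inodes_count, blocks_count = struct.unpack_from("<II", sb, 0x00)
    (log_block_size,) = struct.unpack_from("<I", sb, 0x18)

    if log_block_size > 6:  # assumed sanity bound: 1024 << 6 = 64 KiB
        return None
    block_size = 1024 << log_block_size
    if inodes_count == 0 or blocks_count == 0:
        return None
    if blocks_count * block_size > partition_size:
        return None  # claims more space than the partition holds
    return {
        "block_size": block_size,
        "blocks": blocks_count,
        "inodes": inodes_count,
        "uuid": sb[0x68:0x78].hex(),
    }
```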

The general rule: identify by content, then validate by consistency. Treat the partition type byte as a hint you cross-check, never as truth, and record any mismatch between the declared type and what you actually found — that discrepancy is sometimes the most interesting thing on the disk.
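One way to make the cross-check concrete is a small lookup from the declared MBR type byte to the filesystems it plausibly covers. This is only a sketch: the mapping is a deliberately partial set of well-known type bytes, and the names EXPECTED_BY_TYPE and cross_check are invented for the example.

```python
# Partial, illustrative mapping of MBR partition type bytes to the
# filesystems a reader would normally expect behind them.
EXPECTED_BY_TYPE = {
    0x06: {"FAT16"},
    0x0E: {"FAT16"},
    0x0B: {"FAT32"},
    0x0C: {"FAT32"},
    0x07: {"NTFS", "exFAT"},
    0x83: {"ext2/3/4", "XFS"},
}


def cross_check(type_byte: int, detected):
    """Compare the declared type with what signature detection found.

    detected is a filesystem name or None. The result is one of
    "match", "mismatch", "nothing-detected" or "unknown-type".
    """
    expected = EXPECTED_BY_TYPE.get(type_byte)
    if expected is None:
        return "unknown-type"
    if detected is None:
        return "nothing-detected"
    return "match" if detected in expected else "mismatch"
```

Anything other than "match" is worth a line in the report, alongside the offsets that were checked.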

What you'll find that the table forgot

Reading by signature instead of by partition table surfaces the cases that matter most in an investigation.

A wiped MBR with intact filesystems behind it. The table is gone but the NTFS or ext volumes are sitting at their usual alignment offsets. Sector 2048 (the 1 MiB alignment modern tools use, assuming 512-byte sectors) is the first place to look for a "lost" first partition.

Nested containers. The outer partition type says "Linux filesystem" but the content is an LVM physical volume (LABELONE signature, normally in the second 512-byte sector) holding logical volumes holding the actual filesystems. Or it's BitLocker (-FVE-FS- at offset 3), or LUKS (the ASCII bytes LUKS followed by 0xBA 0xBE at offset 0). You can't stop at the first layer, and each layer compounds the offset arithmetic, which is one more reason to keep every conversion in wide integers and every relative-to-absolute step explicit.
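A first-layer container check might look like the sketch below. The function name is hypothetical. It assumes the caller passes at least the first 2 KiB of the partition, and it looks for the LVM label at the start of each of the first four 512-byte sectors, which is where LVM2 is documented to place it.

```python
def detect_container(head: bytes):
    """Return the container type found at the start of a partition, or None.

    head should hold at least the first 2048 bytes of the partition.
    """
    # LUKS1 and LUKS2 both begin with "LUKS" followed by 0xBA 0xBE.
    if head[0:6] == b"LUKS\xba\xbe":
        return "LUKS"
    # BitLocker puts its signature where NTFS would put the OEM ID.
    if head[3:11] == b"-FVE-FS-":
        return "BitLocker"
    # LVM2 label: usually the second sector, so check the first four.
    for sector in range(4):
        start = sector * 512
        if head[start:start + 8] == b"LABELONE":
            return "LVM2 physical volume"
    return None
```

A non-None result means the real filesystems sit one layer further in, at offsets that have to be added to this partition's own start.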

Encrypted volumes look like high-entropy noise with a recognizable header. You can't read inside without keys, but you can and should report which volumes are encrypted and with what scheme, because that drives the rest of the investigation.
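To put a number on "high-entropy noise", Shannon entropy over a sample block is a common measure. This is a minimal sketch with a made-up function name. It returns bits per byte on a 0 to 8 scale, and any threshold you compare against is your own assumption. High entropy alone does not separate encrypted data from compressed data, which is why the recognizable header still matters.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy
```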

Doing it without a backend

All of this is range reads. To identify the filesystem in a partition you read a few kilobytes at the partition start, check the magic positions, and parse the BPB or superblock fields. That's a tiny amount of data even for a 500 GB image, which is why it works in a browser tab reading the image locally through the File System Access API — no upload, no server-side copy, nothing leaving the host. You seek, you read 4 KiB, you decide. Confirming the MFT or chasing an LVM volume costs a few more seeks, all of them small.
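The range read itself is small enough to show in full. The sketch below uses Python with any seekable binary file object rather than a browser API, purely to illustrate the arithmetic; read_at is a hypothetical helper. It keeps the relative-to-absolute conversion in one visible place and refuses short reads, so a truncated image fails loudly instead of producing a half-filled buffer.

```python
import io


def read_at(image, partition_start: int, relative_offset: int, length: int) -> bytes:
    """Read length bytes at an offset relative to a partition start.

    image is any seekable binary file object. All offsets are in bytes.
    """
    if partition_start < 0 or relative_offset < 0 or length < 0:
        raise ValueError("offsets and length must be non-negative")
    absolute = partition_start + relative_offset  # the one explicit conversion
    image.seek(absolute)
    data = image.read(length)
    if len(data) != length:
        raise EOFError(
            f"wanted {length} bytes at {absolute}, got {len(data)}"
        )
    return data
```

For example, read_at(image, 2048 * 512, 1024, 1024) would fetch the ext superblock of a partition starting at sector 2048 on a 512-byte-sector disk.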

The payoff for reading the superblock yourself instead of trusting a label: you find the volume someone marked empty, you catch the filesystem the table never declared, and you can say in a report exactly what's on the disk and how you know — not because a tool said so, but because the bytes are internally consistent with a working filesystem. That distinction is the whole job.
