PUZ is a file format commonly used by commercial software for crossword puzzles. There is, to my knowledge, no other documentation of the format available online. This page (and the implementation) is the result of a few weeks' reverse engineering work, based in part on work started by Evan Martin.
As it currently stands, the documentation is far from complete. However, it is complete enough to read and write basic crossword puzzles in the format, such that you may interoperate with the common crossword applications. I have no real financial interest in this; it was just a fun hack, and it allows me to produce crossword puzzles for friends to play.
There is a GPLed implementation in C available: libpuz-latest.tar.gz
I welcome any input you may have after reading this. You can contact me via email at josh at joshisanerd.com. Please send me any new information you figure out, corrections or errors you find, or general comments on the library below.
| Name | Offset | End | Len | Contents |
|---|---|---|---|---|
| Checksum | 0x00 | 0x01 | 0x2 | Little-Endian short of the overall file checksum |
| File Magic | 0x02 | 0x0D | 0xC | NUL-terminated string "ACROSS&DOWN", i.e. 4143 524f 5353 2644 4f57 4e00 |
| CIB Checksum | 0x0E | 0x0F | 0x2 | Little-Endian checksum of the puzzle's CIB |
| Masked Low Checksums | 0x10 | 0x13 | 0x4 | A set of checksums, XOR-masked against a magic string. |
| Masked High Checksums | 0x14 | 0x17 | 0x4 | A set of checksums, XOR-masked against a magic string. |
| Version String(?) | 0x18 | 0x1B | 0x4 | NUL-terminated string, "1.2\0" |
| Reserved1C(?) | 0x1C | 0x1D | 0x2 | In many files, this is uninitialized memory |
| Unknown | 0x1E | 0x1F | 0x2 | In all files, this is set to 0x0000(?) |
| Reserved20(?) | 0x20 | 0x2B | 0xC | In files where Reserved1C is garbage, this is garbage too. |
| BIC | 0x2C | 0x33 | 0x8 | I've named this range the Board Initialization Checksum region. It contains the vitals of the puzzle, and is used to initialize the overall checksum of the file. Note that this is not a "Real" value in the file; it simply spans the five fields listed in the next five rows. The checksum code on this page refers to the same range as CIB. |
| Width | 0x2C | 0x2C | 0x1 | The width of the board as a byte |
| Height | 0x2D | 0x2D | 0x1 | The height of the board as a byte |
| # of Clues | 0x2E | 0x2F | 0x2 | Little-Endian short of the number of clues for this board |
| Unknown Bitmask | 0x30 | 0x31 | 0x2 | Little-Endian short containing a bitmask. Operations unknown. |
| Unknown32 | 0x32 | 0x33 | 0x2 | Unknown short or two bytes... |
| Solution | 0x34 | 0x34+ w×h-1 | w×h | A flat string of bytes, one for each cell in the board. Rasters from the top left corner across the board, then to the second row, etc. Non-playable (i.e. black) cells are denoted by '.' |
| Grid | 0x34+ w×h | 0x34+ 2(w×h)-1 | w×h | A flat string of bytes, one for each cell in the board. Rasters from the top left corner across the board, then to the second row, etc. Non-playable (i.e. black) cells are denoted by '.' If a cell is empty, it gets a '-'; otherwise the player's guess is stored in the cell. |
| Title | 0x34+ 2(w×h) | ?? | Delimited | A NUL-terminated string containing the title of the puzzle. |
| Author | ?? | ?? | Delimited | A NUL-terminated string containing the author of the puzzle. |
| Copyright | ?? | ?? | Delimited | A NUL-terminated string, containing the copyright statement for the puzzle. |
| Clues | ?? | ?? | Delimited | A series of #-of-clues NUL-terminated strings, one right after the other. |
| Notes | ?? | EOF | ?? | Exact format unknown at present. They're there. They are NUL-terminated; if the user has opened the Notes pane without entering any, you get an empty string. This looks like an extra NUL. |
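
The fixed offsets in the table are enough to locate the variable-length parts of a file. The following is a minimal sketch of that arithmetic, not the libpuz implementation: the `puz_layout` struct and the `read_le16` and `puz_parse_layout` names are made up for illustration, and it assumes the whole file has already been read into memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for this sketch; the names are not from libpuz. */
typedef struct {
    uint8_t  width;
    uint8_t  height;
    uint16_t num_clues;
    size_t   solution_off;  /* always 0x34 */
    size_t   grid_off;      /* 0x34 + w*h */
    size_t   title_off;     /* 0x34 + 2*(w*h) */
} puz_layout;

/* Read a little-endian 16-bit value from two bytes. */
uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int puz_parse_layout(const unsigned char *file, size_t len, puz_layout *out) {
    /* 11 characters plus the terminating NUL = 0xC bytes at 0x02-0x0D */
    static const char magic[12] = "ACROSS&DOWN";
    size_t cells;

    if (len < 0x34 || memcmp(file + 0x02, magic, sizeof magic) != 0)
        return -1;

    out->width     = file[0x2C];
    out->height    = file[0x2D];
    out->num_clues = read_le16(file + 0x2E);

    cells = (size_t)out->width * out->height;
    out->solution_off = 0x34;
    out->grid_off     = 0x34 + cells;
    out->title_off    = 0x34 + 2 * cells;

    /* Both w*h blocks must fit inside the buffer. */
    return out->title_off <= len ? 0 : -1;
}
```

From `title_off` onward, the strings have to be walked one NUL at a time, since their offsets are not stored anywhere in the header.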
The checksumming routine used in PUZ is a variant of CRC-16. To checksum a region of memory, the following is used:
unsigned short cksum_region(unsigned char *base, int len, unsigned short cksum) {
    int i;

    for(i = 0; i < len; i++) {
        if(cksum & 0x0001)
            cksum = (cksum >> 1) + 0x8000;
        else
            cksum = cksum >> 1;
        cksum += *(base+i);
    }

    return cksum;
}
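Each step of the routine above rotates the running checksum right by one bit and then adds the next byte. The sketch below restates it with fixed-width types, so the 16-bit wrap-around does not depend on how wide `short` is on a given platform. The name `cksum_region16` is made up here; it is meant to behave identically to `cksum_region` wherever `unsigned short` is 16 bits.

```c
#include <stdint.h>

/* Same algorithm as cksum_region, with explicit 16-bit arithmetic. */
uint16_t cksum_region16(const uint8_t *base, int len, uint16_t cksum) {
    int i;

    for (i = 0; i < len; i++) {
        /* Rotate right by one bit: bit 0 moves up to bit 15. */
        cksum = (uint16_t)((cksum >> 1) | ((cksum & 1) << 15));
        /* Add the next byte, wrapping modulo 2^16. */
        cksum = (uint16_t)(cksum + base[i]);
    }

    return cksum;
}

/*
 * Trace for the two bytes "AB" (0x41, 0x42), starting from 0x0000:
 *   0x0000 -> rotate -> 0x0000, + 0x41 -> 0x0041
 *   0x0041 -> rotate -> 0x8020, + 0x42 -> 0x8062
 */
```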
To calculate the primary checksum, you'll need to do the following:
cksum = cksum_region(CIB, 0x08, 0x0000);
cksum = cksum_region(solution, w*h, cksum);
cksum = cksum_region(grid, w*h, cksum);
cksum = cksum_region(title, strlen(title)+1, cksum);
cksum = cksum_region(author, strlen(author)+1, cksum);
cksum = cksum_region(copyright, strlen(copyright)+1, cksum);
for(i = 0; i < num_of_clues; i++)
    cksum = cksum_region(clue[i], strlen(clue[i]), cksum); /* no +1: the clue's NUL is not included */
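The sequence above can be wrapped up as a single function. This is only a sketch: `struct puz_fields` and `puz_primary_cksum` are hypothetical names invented for illustration (they are not libpuz's), and the struct assumes the puzzle has already been split into its parts in memory. The checksum routine is repeated so the block stands on its own.

```c
#include <string.h>

/* Hypothetical in-memory view of a puzzle, for this sketch only. */
struct puz_fields {
    const unsigned char *cib;       /* the 8 bytes at 0x2C-0x33 */
    const unsigned char *solution;  /* w*h bytes */
    const unsigned char *grid;      /* w*h bytes */
    int w, h;
    const char *title, *author, *copyright;
    const char **clue;
    int num_of_clues;
};

unsigned short cksum_region(const unsigned char *base, int len, unsigned short cksum) {
    int i;

    for (i = 0; i < len; i++) {
        if (cksum & 0x0001)
            cksum = (cksum >> 1) + 0x8000;
        else
            cksum = cksum >> 1;
        cksum += base[i];
    }

    return cksum;
}

unsigned short puz_primary_cksum(const struct puz_fields *p) {
    unsigned short cksum;
    int i;

    cksum = cksum_region(p->cib, 0x08, 0x0000);
    cksum = cksum_region(p->solution, p->w * p->h, cksum);
    cksum = cksum_region(p->grid, p->w * p->h, cksum);

    /* The three header strings are checksummed with their NUL. */
    cksum = cksum_region((const unsigned char *)p->title,
                         (int)strlen(p->title) + 1, cksum);
    cksum = cksum_region((const unsigned char *)p->author,
                         (int)strlen(p->author) + 1, cksum);
    cksum = cksum_region((const unsigned char *)p->copyright,
                         (int)strlen(p->copyright) + 1, cksum);

    /* The clues are checksummed without it, as in the listing above. */
    for (i = 0; i < p->num_of_clues; i++)
        cksum = cksum_region((const unsigned char *)p->clue[i],
                             (int)strlen(p->clue[i]), cksum);

    return cksum;
}
```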
The CIB checksum is simply:
cksum_cib = cksum_region(CIB, 0x08, 0x0000);
The values from 0x10-0x17 are a real pain to generate. They are the result of masking off and XORing four checksums; 0x10-0x13 are the low bytes, while 0x14-0x17 are the high bytes.
To calculate these bytes, we must first calculate four checksums. The code below refers to them as c_cib, c_sol, c_grid and c_part.
Once these four checksums are obtained, they're stuffed into the file thusly:
file[0x10] = 0x49 ^ (c_cib & 0xFF);
file[0x11] = 0x43 ^ (c_sol & 0xFF);
file[0x12] = 0x48 ^ (c_grid & 0xFF);
file[0x13] = 0x45 ^ (c_part & 0xFF);
file[0x14] = 0x41 ^ ((c_cib & 0xFF00) >> 8);
file[0x15] = 0x54 ^ ((c_sol & 0xFF00) >> 8);
file[0x16] = 0x45 ^ ((c_grid & 0xFF00) >> 8);
file[0x17] = 0x44 ^ ((c_part & 0xFF00) >> 8);
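Read as ASCII, the eight mask bytes above (0x49 0x43 0x48 0x45 0x41 0x54 0x45 0x44) spell "ICHEATED", which makes the eight assignments easy to fold into a loop. The sketch below does that. The function name `puz_write_masked` is hypothetical, and the four checksums are taken as parameters, so nothing is assumed here about how each one is computed.

```c
#include <stdint.h>

/* Write the masked low bytes to 0x10-0x13 and the masked high bytes to
   0x14-0x17. The caller supplies the four checksums already computed. */
void puz_write_masked(unsigned char *file, uint16_t c_cib, uint16_t c_sol,
                      uint16_t c_grid, uint16_t c_part) {
    static const unsigned char mask[8] = {
        'I', 'C', 'H', 'E',   /* XORed with the low bytes  */
        'A', 'T', 'E', 'D'    /* XORed with the high bytes */
    };
    const uint16_t c[4] = { c_cib, c_sol, c_grid, c_part };
    int i;

    for (i = 0; i < 4; i++) {
        file[0x10 + i] = (unsigned char)(mask[i]     ^ (c[i] & 0xFF));
        file[0x14 + i] = (unsigned char)(mask[4 + i] ^ (c[i] >> 8));
    }
}
```

Since XOR is its own inverse, a reader can recover each checksum byte by XORing the stored byte with the same mask byte.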
All shorts are in little-endian format. Strings appear to be in Windows-1252 or ISO 8859-1. It's really hard to tell exactly which encoding they use.
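Handling the shorts byte by byte keeps the code independent of the host's own byte order. A small sketch, with helper names (`write_le16`, `read_le16`, `puz_store_cksums`) that are assumptions made for this example rather than anything taken from libpuz:

```c
#include <stdint.h>

/* Store a 16-bit value least-significant byte first. */
void write_le16(unsigned char *p, uint16_t v) {
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)(v >> 8);
}

/* Load a 16-bit value stored least-significant byte first. */
uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: place the two unmasked checksums in the header,
   at the offsets given in the table (0x00 and 0x0E). */
void puz_store_cksums(unsigned char *file, uint16_t cksum, uint16_t cksum_cib) {
    write_le16(file + 0x00, cksum);
    write_le16(file + 0x0E, cksum_cib);
}
```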
A reference .PUZ is helpful if you're working on an implementation. I've created a Reference PUZ File, so we all have common ground to start from.
The following are unknown. I welcome suggestions or contributions!
I've (finally) completed the first draft of a GPL'd library implementing the PUZ file format. It's a plain C library with a sample application. It compiles and works in both Linux and Cygwin; other Unices should work as well.
You can download the latest version at libpuz-latest.tar.gz.