.PUZ File Format Specification

PUZ is a file format commonly used by commercial software for crossword puzzles. There is, to my knowledge, no other documentation of the format available online. This page (and the implementation) is the result of a few weeks' reverse engineering work, based in part on work started by Evan Martin.

As it currently stands, the documentation is far from complete. However, it is complete enough to read and write basic crossword puzzles in the format, so that you can interoperate with the common crossword applications. I have no real financial interest in this; it was just a fun hack, and it allows me to produce crossword puzzles for friends to play.

There is a GPLed implementation in C available: libpuz-latest.tar.gz

I welcome any input you may have after reading this. You can contact me via email at josh at joshisanerd.com. Please send me any new information you figure out, corrections or errors you find, or general comments on the library below.

File Format

Name                    Offset        End               Len        Contents
Checksum                0x00          0x01              0x2        Little-endian short holding the overall file checksum.
File Magic              0x02          0x0D              0xC        NUL-terminated string "ACROSS&DOWN": 41 43 52 4f 53 53 26 44 4f 57 4e 00
CIB Checksum            0x0E          0x0F              0x2        Little-endian checksum of the puzzle's CIB.
Masked Low Checksums    0x10          0x13              0x4        The low bytes of a set of checksums, XOR-masked against a magic string.
Masked High Checksums   0x14          0x17              0x4        The high bytes of the same set of checksums, XOR-masked against a magic string.
Version String(?)       0x18          0x1B              0x4        NUL-terminated string, "1.2\0"
Reserved1C(?)           0x1C          0x1D              0x2        In many files, this is uninitialized memory.
Unknown                 0x1E          0x1F              0x2        In all files, this is set to 0x0000(?)
Reserved20(?)           0x20          0x2B              0xC        In files where Reserved1C is garbage, this is garbage too.
BIC                     0x2C          0x33              0x8        I've named this range the Board Initialization Checksum region (written as CIB elsewhere in this document). It contains the vitals of the puzzle, and is used to initialize the overall checksum of the file. Note that this is not a "real" value in the file; it simply spans the fields listed next.
  Width                 0x2C          0x2C              0x1        The width of the board as a byte.
  Height                0x2D          0x2D              0x1        The height of the board as a byte.
  # of Clues            0x2E          0x2F              0x2        Little-endian short of the number of clues for this board.
  Unknown Bitmask       0x30          0x31              0x2        Little-endian short containing a bitmask. Operations unknown.
  Unknown32             0x32          0x33              0x2        Unknown short or two bytes...
Solution                0x34          0x34+w×h-1        w×h        A flat string of bytes, one for each cell in the board. Rasters from the top left corner across the board, then to the second row, etc. Non-playable (i.e. black) cells are denoted by '.'
Grid                    0x34+w×h      0x34+2(w×h)-1     w×h        A flat string of bytes, one for each cell in the board, rastered the same way as the solution. Non-playable (i.e. black) cells are denoted by '.'. If a cell is empty, it gets a '-'; otherwise the player's guess is stored in the cell.
Title                   0x34+2(w×h)   ??                Delimited  A NUL-terminated string containing the title of the puzzle.
Author                  ??            ??                Delimited  A NUL-terminated string containing the author of the puzzle.
Copyright               ??            ??                Delimited  A NUL-terminated string containing the copyright statement for the puzzle.
Clues                   ??            ??                Delimited  A sequence of #-of-clues NUL-terminated strings, one right after the other.
Notes                   ??            EOF               ??         Exact format unknown at present. They're there. They are NUL-terminated; if the user has opened the Notes pane without entering any, you get an empty string, which looks like an extra NUL.
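
As a rough illustration of the layout above, the following sketch pulls the board dimensions and clue count out of an in-memory copy of a file and works out where the solution, grid, and title begin. It assumes the whole file has already been read into a buffer. The names puz_layout, rd16, and puz_read_layout are made up for this example and are not part of libpuz.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type for this example; not part of libpuz. */
struct puz_layout {
  unsigned width, height, n_clues;
  size_t solution_off, grid_off, title_off;
};

/* Read a little-endian 16-bit value starting at p. */
static unsigned rd16(const unsigned char *p) {
  return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Fill in *out from a file image. Returns 0 on success, or -1 if the
   buffer is too short to hold the header and both w*h regions. */
static int puz_read_layout(const unsigned char *file, size_t len,
                           struct puz_layout *out) {
  size_t cells;

  assert(file != NULL && out != NULL);
  if (len < 0x34)
    return -1;

  out->width = file[0x2C];
  out->height = file[0x2D];
  out->n_clues = rd16(file + 0x2E);

  cells = (size_t)out->width * out->height;
  out->solution_off = 0x34;
  out->grid_off = 0x34 + cells;
  out->title_off = 0x34 + 2 * cells;

  if (len < out->title_off)
    return -1;
  return 0;
}
```

From title_off onward the strings have no fixed offsets, so a reader has to walk them one NUL terminator at a time: title, author, copyright, then n_clues clues.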

Checksumming Routine

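The checksumming routine used in PUZ is loosely a variant of CRC-16, though strictly speaking it is a 16-bit rotate-right-and-add checksum rather than a true CRC. To checksum a region of memory, the following is used: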

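/* Fold len bytes starting at base into the running checksum cksum. */
unsigned short cksum_region(unsigned char *base, int len, unsigned short cksum) {
  int i;

  for(i = 0; i < len; i++) {
    /* Rotate the 16-bit checksum right by one bit... */
    if(cksum & 0x0001)
      cksum = (cksum >> 1) + 0x8000;
    else
      cksum = cksum >> 1;

    /* ...then add in the next byte. */
    cksum += *(base+i);
  }

  return cksum;
}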
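
One portability caveat: the routine above relies on unsigned short being exactly 16 bits wide, which holds on common platforms but is not guaranteed by C. A minimal sketch of the same rotate-and-add loop using fixed-width types follows. The name puz_cksum16 is invented here and does not come from libpuz.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same algorithm as cksum_region, written with fixed-width types.
   puz_cksum16 is a name made up for this sketch. */
static uint16_t puz_cksum16(const uint8_t *base, size_t len, uint16_t cksum) {
  size_t i;

  assert(base != NULL || len == 0);
  for (i = 0; i < len; i++) {
    /* Rotate right by one bit within 16 bits. */
    cksum = (uint16_t)((cksum >> 1) | ((cksum & 1u) << 15));
    /* Add the next byte, wrapping modulo 65536. */
    cksum = (uint16_t)(cksum + base[i]);
  }
  return cksum;
}
```

Working through the three bytes "ABC" by hand with an initial value of 0x0000, the running checksum goes 0x0041, then 0x8062, then 0x4074.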

Calculating primary PUZ checksum

To calculate the primary checksum, you'll need to do the following:

	cksum = cksum_region(CIB, 0x08, 0x0000);
	cksum = cksum_region(solution, w*h, cksum);
	cksum = cksum_region(grid, w*h, cksum);
	cksum = cksum_region(title, strlen(title)+1, cksum);
	cksum = cksum_region(author, strlen(author)+1, cksum);
	cksum = cksum_region(copyright, strlen(copyright)+1, cksum);
	
	/* Note: the clues are checksummed without their terminating NUL. */
	for(i = 0; i < num_of_clues; i++)
	  cksum = cksum_region(clue[i], strlen(clue[i]), cksum);
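
Put together as a single function, the steps above might look like the sketch below. The puz_fields struct and the names str_cksum and puz_primary_cksum are hypothetical and exist only for this example; cksum_region is repeated from the previous section so the block stands alone.

```c
#include <assert.h>
#include <string.h>

/* Repeated from the checksumming section so this block stands alone. */
static unsigned short cksum_region(const unsigned char *base, int len,
                                   unsigned short cksum) {
  int i;
  for (i = 0; i < len; i++) {
    if (cksum & 0x0001)
      cksum = (cksum >> 1) + 0x8000;
    else
      cksum = cksum >> 1;
    cksum += base[i];
  }
  return cksum;
}

/* Hypothetical in-memory view of a puzzle; not the libpuz struct. */
struct puz_fields {
  const unsigned char *cib;      /* the 8 bytes at 0x2C-0x33 */
  const unsigned char *solution; /* w*h bytes */
  const unsigned char *grid;     /* w*h bytes */
  int w, h, n_clues;
  const char *title, *author, *copyright;
  const char *const *clues;      /* n_clues NUL-terminated strings */
};

/* Checksum a C string; with_nul is 1 to include the terminator. */
static unsigned short str_cksum(const char *s, int with_nul,
                                unsigned short c) {
  return cksum_region((const unsigned char *)s, (int)strlen(s) + with_nul, c);
}

static unsigned short puz_primary_cksum(const struct puz_fields *p) {
  unsigned short c;
  int i;

  assert(p != NULL);
  c = cksum_region(p->cib, 0x08, 0x0000);
  c = cksum_region(p->solution, p->w * p->h, c);
  c = cksum_region(p->grid, p->w * p->h, c);
  c = str_cksum(p->title, 1, c);
  c = str_cksum(p->author, 1, c);
  c = str_cksum(p->copyright, 1, c);
  for (i = 0; i < p->n_clues; i++)
    c = str_cksum(p->clues[i], 0, c); /* clues: no NUL, as above */
  return c;
}
```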
      

Calculating CIB checksum

The CIB checksum is simply:

	cksum_cib = cksum_region(CIB, 0x08, 0x0000);
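
When reading a file, the same call can be used to check the value stored at 0x0E. The following is a sketch under the assumption that the whole file is already in memory; puz_cib_ok is a name invented for this example, and cksum_region is repeated so the block stands alone.

```c
#include <assert.h>
#include <stddef.h>

/* Repeated from the checksumming section so this block stands alone. */
static unsigned short cksum_region(const unsigned char *base, int len,
                                   unsigned short cksum) {
  int i;
  for (i = 0; i < len; i++) {
    if (cksum & 0x0001)
      cksum = (cksum >> 1) + 0x8000;
    else
      cksum = cksum >> 1;
    cksum += base[i];
  }
  return cksum;
}

/* Returns 1 if the little-endian checksum stored at 0x0E-0x0F matches
   the one computed over the 8 bytes at 0x2C-0x33, and 0 otherwise
   (including when the buffer is too short to contain the header). */
static int puz_cib_ok(const unsigned char *file, size_t len) {
  unsigned short stored, computed;

  assert(file != NULL);
  if (len < 0x34)
    return 0;

  stored = (unsigned short)(file[0x0E] | (file[0x0F] << 8));
  computed = cksum_region(file + 0x2C, 0x08, 0x0000);
  return stored == computed;
}
```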
      

Calculating the Masked Checksums

The values from 0x10-0x17 are a real pain to generate. They are the result of masking off and XORing four checksums; 0x10-0x13 are the low bytes, while 0x14-0x17 are the high bytes.

To calculate these bytes, we must first calculate four checksums:

  1. CIB Checksum with IV 0x0000:
    c_cib = cksum_region(CIB, 0x08, 0x0000);
  2. Solution Checksum with IV 0x0000:
    c_sol = cksum_region(solution, w*h, 0x0000);
  3. Grid Checksum with IV 0x0000:
    c_grid = cksum_region(grid, w*h, 0x0000);
  4. A partial board checksum with IV 0x0000:
    c_part = cksum_region(title, strlen(title)+1, 0x0000);
    c_part = cksum_region(author, strlen(author)+1, c_part);
    c_part = cksum_region(copyright, strlen(copyright)+1, c_part);
    for(i = 0; i < num_of_clues; i++)
      c_part = cksum_region(clue[i], strlen(clue[i]), c_part);

Once these four checksums are obtained, they're stuffed into the file thusly:

	file[0x10] = 0x49 ^ (c_cib & 0xFF);
	file[0x11] = 0x43 ^ (c_sol & 0xFF);
	file[0x12] = 0x48 ^ (c_grid & 0xFF);
	file[0x13] = 0x45 ^ (c_part & 0xFF);

	file[0x14] = 0x41 ^ ((c_cib & 0xFF00) >> 8);
	file[0x15] = 0x54 ^ ((c_sol & 0xFF00) >> 8);
	file[0x16] = 0x45 ^ ((c_grid & 0xFF00) >> 8);
	file[0x17] = 0x44 ^ ((c_part & 0xFF00) >> 8);
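
The eight mask bytes above are the ASCII codes for the string "ICHEATED". Reading them that way, the eight assignments can be folded into a loop; the sketch below also shows the reverse direction, recovering a checksum from the masked bytes of an existing file. PUZ_MASK, puz_mask_cksums, and puz_unmask_cksum are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* The mask bytes from the assignments above, spelled as characters. */
static const unsigned char PUZ_MASK[8] = {'I', 'C', 'H', 'E', 'A', 'T', 'E', 'D'};

/* Write the masked bytes at 0x10-0x17.
   c[0..3] must hold c_cib, c_sol, c_grid, c_part, in that order. */
static void puz_mask_cksums(unsigned char *file, const unsigned short c[4]) {
  int i;

  assert(file != NULL && c != NULL);
  for (i = 0; i < 4; i++) {
    file[0x10 + i] = (unsigned char)(PUZ_MASK[i] ^ (c[i] & 0xFF));
    file[0x14 + i] = (unsigned char)(PUZ_MASK[4 + i] ^ ((c[i] >> 8) & 0xFF));
  }
}

/* Recover checksum number i (0..3) from the masked bytes. */
static unsigned short puz_unmask_cksum(const unsigned char *file, int i) {
  unsigned lo, hi;

  assert(file != NULL && i >= 0 && i < 4);
  lo = (unsigned)(file[0x10 + i] ^ PUZ_MASK[i]);
  hi = (unsigned)(file[0x14 + i] ^ PUZ_MASK[4 + i]);
  return (unsigned short)(lo | (hi << 8));
}
```

A reader can call puz_unmask_cksum for each of the four indices and compare the results against freshly computed checksums to validate a file.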
      

Encodings

All shorts are in little-endian format. Strings appear to be in Windows-1252 or ISO 8859-1. It's really hard to tell exactly which encoding they use.
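
If the strings need to be shown on a UTF-8 system, one option is to treat them as ISO 8859-1, where every byte maps directly to the Unicode code point of the same value. The sketch below does that. It is only an assumption that a given file uses ISO 8859-1; if it is really Windows-1252, bytes in the range 0x80-0x9F would come out wrong. The name latin1_to_utf8 is invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a NUL-terminated ISO 8859-1 string to UTF-8.
   Returns the number of bytes written (excluding the NUL), or
   (size_t)-1 if dst is too small. The worst case needs
   2 * strlen(src) + 1 bytes. */
static size_t latin1_to_utf8(const unsigned char *src, char *dst,
                             size_t dst_len) {
  size_t n = 0;

  assert(src != NULL && dst != NULL);
  for (; *src; src++) {
    if (*src < 0x80) {
      /* ASCII passes through unchanged. */
      if (n + 1 >= dst_len)
        return (size_t)-1;
      dst[n++] = (char)*src;
    } else {
      /* 0x80-0xFF becomes a two-byte UTF-8 sequence. */
      if (n + 2 >= dst_len)
        return (size_t)-1;
      dst[n++] = (char)(0xC0 | (*src >> 6));
      dst[n++] = (char)(0x80 | (*src & 0x3F));
    }
  }
  if (n >= dst_len)
    return (size_t)-1;
  dst[n] = '\0';
  return n;
}
```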

Reference File

A reference .PUZ is helpful if you're working on an implementation. I've created a Reference PUZ File, so we all have common ground to start from.

Unknowns

The following are unknown. I welcome suggestions or contributions!

Implementation

I've (finally) completed the first draft of a GPL'd library implementing the PUZ file format. It's a plain C library with a sample application. It compiles and works in both Linux and Cygwin; other Unices should work as well.

You can download the latest version at libpuz-latest.tar.gz.


Josh Myer

Thanks to Evan Martin for his feedback on this document.

Last modified: Wed Jan 4 21:00:31 EST 2006