From 9643a1b80ac9d5c2e3b2781e7536de874ee3f2bb Mon Sep 17 00:00:00 2001
From: Patric Stout
Date: Tue, 15 Jun 2021 14:54:10 +0200
Subject: Doc: explain the binary structure of our (new) savegames

---
 docs/savegame_format.md | 175 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 175 insertions(+)
 create mode 100644 docs/savegame_format.md

(limited to 'docs/savegame_format.md')

diff --git a/docs/savegame_format.md b/docs/savegame_format.md
new file mode 100644
index 000000000..e4e2e4000
--- /dev/null
+++ b/docs/savegame_format.md
@@ -0,0 +1,175 @@
+# OpenTTD's Savegame Format
+
+Last updated: 2021-06-15
+
+## Outer container
+
+Savegames for OpenTTD start with an outer container, which holds the compressed data for the rest of the savegame.
+
+`[0..3]` - The first four bytes indicate what compression is used.
+In ASCII, these values are possible:
+
+- `OTTD` - Compressed with LZO (deprecated, only really old savegames would use this).
+- `OTTN` - No compression.
+- `OTTZ` - Compressed with zlib.
+- `OTTX` - Compressed with LZMA.
+
+`[4..5]` - The next two bytes indicate which savegame version is used.
+
+`[6..7]` - The next two bytes can be ignored, and were only used in really old savegames.
+
+`[8..N]` - Next follows a binary blob which is compressed with the indicated compression algorithm.
+
+The rest of this document talks about this decompressed blob of data.
+
+## Data types
+
+The savegame is written in Big Endian, so when we talk about a 16-bit unsigned integer (`uint16`), we mean it is stored in Big Endian.
+
+The following types are valid:
+
+- `1` - `int8` / `SLE_FILE_I8` - 8-bit signed integer
+- `2` - `uint8` / `SLE_FILE_U8` - 8-bit unsigned integer
+- `3` - `int16` / `SLE_FILE_I16` - 16-bit signed integer
+- `4` - `uint16` / `SLE_FILE_U16` - 16-bit unsigned integer
+- `5` - `int32` / `SLE_FILE_I32` - 32-bit signed integer
+- `6` - `uint32` / `SLE_FILE_U32` - 32-bit unsigned integer
+- `7` - `int64` / `SLE_FILE_I64` - 64-bit signed integer
+- `8` - `uint64` / `SLE_FILE_U64` - 64-bit unsigned integer
+- `9` - `StringID` / `SLE_FILE_STRINGID` - a StringID inside OpenTTD's string table
+- `10` - `str` / `SLE_FILE_STRING` - a string (prefixed with a length-field)
+- `11` - `struct` / `SLE_FILE_STRUCT` - a struct
+
+### Gamma value
+
+There is also a field-type called `gamma`.
+This is most often used for length-fields, and uses as few bytes as possible to store an integer.
+For values <= 127, it uses a single byte.
+For values > 127, it uses two bytes and sets the highest bit to high.
+For values > 16383, it uses three bytes and sets the two highest bits to high.
+And this continues till the value fits.
+In a more visual approach:
+```
+  0xxxxxxx
+  10xxxxxx xxxxxxxx
+  110xxxxx xxxxxxxx xxxxxxxx
+  1110xxxx xxxxxxxx xxxxxxxx xxxxxxxx
+  11110--- xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
+```
+
+## Chunks
+
+Savegames for OpenTTD store their data in chunks.
+Each chunk contains data for a certain part of the game, for example "Companies", "Vehicles", etc.
+
+`[0..3]` - Each chunk starts with four bytes to indicate the tag.
+If the tag is `\x00\x00\x00\x00` it means the end of the savegame is reached.
+An example of a valid tag is `PLYR` when looking at it via ASCII, which contains the information of all the companies.
+
+`[4..4]` - Next follows a byte where the lower 4 bits contain the type.
+The possible valid types are:
+
+- `0` - `CH_RIFF` - This chunk is a binary blob.
+- `1` - `CH_ARRAY` - This chunk is a list of items.
+- `2` - `CH_SPARSE_ARRAY` - This chunk is a list of items.
+- `3` - `CH_TABLE` - This chunk is a self-describing list of items.
+- `4` - `CH_SPARSE_TABLE` - This chunk is a self-describing list of items.
+
+Now per type the format is (slightly) different.
+
+### CH_RIFF
+
+(since savegame version 295, this chunk type is only used for MAP-chunks, containing bit-information about each tile on the map)
+
+A `CH_RIFF` starts with a `uint24` which together with the upper bits of the type byte defines the length of the chunk.
+In pseudo-code:
+
+```
+type = read uint8
+if (type & 0x0F) == 0
+    length = read uint24
+    length |= ((type >> 4) << 24)
+```
+
+The next `length` bytes are part of the chunk.
+What those bytes mean depends on the tag of the chunk; further details per chunk can be found in the source-code.
+
+### CH_ARRAY / CH_SPARSE_ARRAY
+
+(this chunk type is deprecated since savegame version 295 and is no longer in use)
+
+`[0..G1]` - A `CH_ARRAY` / `CH_SPARSE_ARRAY` starts with a `gamma`, indicating the size of the next item plus one.
+If this size value is zero, it indicates the end of the list.
+Subtracting one from this value gives the full length of the next item.
+In pseudo-code:
+
+```
+loop
+    size = read gamma - 1
+    if size == -1
+        break loop
+    read size bytes
+```
+
+`[]` - For `CH_ARRAY` there is an implicit index.
+The loop starts at zero, and every iteration adds one to the index.
+For entries in the game that were not allocated, the `size` will be zero.
+
+`[G1+1..G2]` - For `CH_SPARSE_ARRAY` there is an explicit index.
+The `gamma` following the size indicates the index.
+
+The content of the item is a binary blob, and similar to `CH_RIFF`, it depends on the tag of the chunk what it means.
+Please check the source-code for further details.
+
+### CH_TABLE / CH_SPARSE_TABLE
+
+(this chunk type only exists since savegame version 295)
+
+Both `CH_TABLE` and `CH_SPARSE_TABLE` are very similar to `CH_ARRAY` / `CH_SPARSE_ARRAY` respectively.
+The only change is that the chunk starts with a header.
+This header describes the chunk in detail; with the header you know the meaning of each byte in the binary blob that follows.
+
+`[0..G]` - The header starts with a `gamma` to indicate the size of all the headers in this chunk plus one.
+If this size value is zero, it means there is no header, which should never be the case.
+
+Next follows a list of `(type, key)` pairs:
+
+- `[0..0]` - Type of the field.
+- `[1..G]` - `gamma` to indicate length of key.
+- `[G+1..N]` - Key (in UTF-8) of the field.
+
+If at any point `type` is zero, the list stops (and no `key` follows).
+
+The `type`'s lower 4 bits indicate the data-type (see chapter above).
+The `type`'s 5th bit (so `0x10`) indicates whether the field is a list; if so, this field in every record starts with a `gamma` that indicates how many times the `type` is repeated.
+
+If the `type` indicates either a `struct` or `str`, the `0x10` flag is also always set.
+
+As the savegame format allows (list of) structs in structs, if any `struct` type is found, this header will be followed by a header of that struct.
+This nesting of structs is stored depth-first, so given this table:
+
+```
+type   | key
+-----------------
+uint8  | counter
+struct | substruct1
+struct | substruct2
+```
+
+With `substruct1` being like:
+
+```
+type   | key
+-----------------
+uint8  | counter
+struct | substruct3
+```
+
+The headers will be, in order: `table`, `substruct1`, `substruct3`, `substruct2`, each ending with a field whose `type` is zero.
+
+After reading all the fields of all the headers, there is a list of records.
+To read this, see `CH_ARRAY` / `CH_SPARSE_ARRAY` for details.
+
+As each `type` has a well-defined length, you can read the records even without knowing anything about the chunk-tag yourself.
+
+Do remember that if the `type` had the `0x10` flag active, the field in the record first has a `gamma` to indicate how many times that `type` is repeated.
-- 
cgit v1.2.3-70-g09d2
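
As a rough illustration of the `gamma` encoding and the `CH_TABLE` / `CH_SPARSE_TABLE` header layout described in the patch above, the following Python sketch decodes a header into nested `(type, is_list, key, sub_header)` tuples. It is not taken from the OpenTTD source: the helper names (`read_exact`, `read_gamma`, `read_fields`, `read_struct_header`, `parse_table_header`) are hypothetical, and it assumes that the low bits of the first byte are unused in the five-byte `gamma` form and that the leading size `gamma` covers the nested struct headers as well.

```python
import io

STRUCT = 11       # `struct` id from the "Data types" list
LIST_FLAG = 0x10  # 5th bit of the type byte: the field is a list


def read_exact(stream, count):
    """Read exactly `count` bytes or fail loudly on a truncated stream."""
    data = stream.read(count)
    if len(data) != count:
        raise EOFError("unexpected end of data")
    return data


def read_gamma(stream):
    """Decode one gamma: the leading 1-bits of the first byte count the extra bytes."""
    first = read_exact(stream, 1)[0]
    extra, mask = 0, 0x80
    while first & mask:
        extra += 1
        mask >>= 1
    if extra > 4:
        raise ValueError("unsupported gamma prefix")
    # Assumption: in the five-byte form the low bits of the first byte are unused.
    value = 0 if extra == 4 else first & (mask - 1)
    for byte in read_exact(stream, extra):
        value = (value << 8) | byte
    return value


def read_fields(stream):
    """Read (type, key) pairs until a type byte of zero ends the list."""
    fields = []
    while True:
        type_byte = read_exact(stream, 1)[0]
        if type_byte == 0:
            return fields
        key = read_exact(stream, read_gamma(stream)).decode("utf-8")
        fields.append((type_byte & 0x0F, bool(type_byte & LIST_FLAG), key))


def read_struct_header(stream):
    """Read one header, then the headers of its struct fields, depth-first."""
    result = []
    for data_type, is_list, key in read_fields(stream):
        sub_header = read_struct_header(stream) if data_type == STRUCT else None
        result.append((data_type, is_list, key, sub_header))
    return result


def parse_table_header(stream):
    """Read the size gamma (stored plus one), then parse the header bytes."""
    size = read_gamma(stream) - 1
    if size < 0:
        raise ValueError("table chunk without a header")
    return read_struct_header(io.BytesIO(read_exact(stream, size)))
```

Calling `parse_table_header` on a stream positioned just after the chunk's type byte leaves the stream at the first record, which can then be read with the `CH_ARRAY` / `CH_SPARSE_ARRAY` loop. Recursing in field order is what reproduces the depth-first ordering (`table`, `substruct1`, `substruct3`, `substruct2`) from the example tables.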