Notes on how to transport data over fax transmissions

G3 documents are sent as a number of lines. Each line contains codes
representing alternating white/black segments. A line is terminated
by an EOL code: 11 '0' bits followed by a '1'. Six consecutive EOLs
indicate the end of the page.
A line begins with a white segment (of zero length if the first pixel
is black).

In the following, data are shown in transmission order.

Black codes have 2 2-bit and 2 3-bit codes (runs of 1 to 4 pixels),
which can be used to represent 2 bits of information, with an average
of 1.25 bits/bit. The maximum segment length corresponding to these
codes is 4 pixels.

White codes have 6 4-bit, 4 5-bit and 7 6-bit codes (runs of 1 to 17
pixels). Sixteen of them can be used to represent 4 bits of
information, with an average of 1.25 bits/bit (actually, slightly
lower since there is an unused code which can be used for other
purposes). The maximum segment length corresponding to these codes is
16 pixels (17 considering the unused code).

Following the above encoding, a 6-bit string takes up to 20 (21)
pixels. Since a G3 line can have a maximum width of 1728 pixels, we
can encode up to 516 bits/line (86 strings), corresponding to an
average length of about 657 bits (including the EOL), an average of
1.27 bits/bit.
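
The arithmetic above can be checked with a short sketch. The code
words are the T.4 terminating codes for black runs of 1 to 4 pixels
and white runs of 1 to 16 pixels. The mapping of data bits to run
lengths (value + 1) and the order white-then-black inside a 6-bit
string are assumptions made for illustration, not something these
notes prescribe.

```python
# T.4 terminating codes for the shortest runs (run length -> code word).
BLACK = {1: "010", 2: "11", 3: "10", 4: "011"}
WHITE = {
    1: "000111", 2: "0111", 3: "1000", 4: "1011", 5: "1100", 6: "1110",
    7: "1111", 8: "10011", 9: "10100", 10: "00111", 11: "01000",
    12: "001000", 13: "000011", 14: "110100", 15: "110101", 16: "101010",
}

LINE_WIDTH = 1728  # pixels in a G3 line


def encode_symbol(value):
    """Encode a 6-bit value as one white run followed by one black run.

    Assumed mapping: the high 4 bits select a white run of 1..16 pixels,
    the low 2 bits select a black run of 1..4 pixels.
    Returns (code bits as a string, pixels used).
    """
    if not 0 <= value < 64:
        raise ValueError("value must fit in 6 bits")
    white_run = (value >> 2) + 1
    black_run = (value & 0b11) + 1
    return WHITE[white_run] + BLACK[black_run], white_run + black_run


def line_capacity(width=LINE_WIDTH):
    """Data bits that fit in one line whatever the data looks like."""
    worst = max(WHITE) + max(BLACK)  # widest 6-bit string, in pixels
    return 6 * (width // worst)


def average_expansion():
    """Average code bits per data bit over all 64 possible strings."""
    total = sum(len(encode_symbol(v)[0]) for v in range(64))
    return total / (64 * 6)
```

Calling line_capacity() and average_expansion() reproduces the
per-line capacity and the bits/bit ratio discussed above. A real
encoder would presumably also have to fill the rest of the line with
a final white run so that every line covers the full width.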

Common sense suggests keeping lines shorter and making them an
integral number of bytes, e.g. 64 bytes -> 512 bits.

Vertically, a low-resolution fax has 98 lines per inch, resulting in
roughly 1100 lines per A4 page.

Since one or more lines can get lost, some form of interleaving and
redundant encoding must be used to protect the receiver from losses.
A simple encoding could use a Reed-Solomon code where the elements of
each block are sent one per line, and possibly rotated at each line
to achieve better error correction.

Possible structure:
ENCODING:

1) pad data to an integral multiple of 128 bytes;
2) for each block build a (255,128) RS code;
3) interleave the code: the source for each line is made of
   one byte per block, rotated according to the line's number. Each
   line is additionally prefixed with its number (modulo 255);
4) encode the source data according to the G3 codes.
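
Steps 1 to 3 could look roughly like the sketch below. The RS encoder
itself is not implemented: rs_encode is a hypothetical callable,
supplied by the caller, that turns 128 data bytes into a 255-byte
codeword. Rotating the bytes of line i left by i positions is only
one possible reading of "rotated according to the line's number".

```python
K, N = 128, 255  # data bytes and total bytes per (255,128) RS block


def pad(data, k=K):
    """Zero-pad data to a whole number of k-byte blocks (step 1)."""
    return (data + bytes(-len(data) % k)) or bytes(k)


def interleave(codewords):
    """Build the 255 line payloads from a list of 255-byte codewords.

    Line i carries its own number followed by byte i of every block;
    the block bytes are rotated left by i positions (assumed reading).
    """
    lines = []
    for i in range(N):
        column = [cw[i] for cw in codewords]
        shift = i % len(column)
        rotated = column[shift:] + column[:shift]
        lines.append(bytes([i] + rotated))
    return lines


def build_lines(data, rs_encode):
    """Steps 1 to 3; rs_encode maps 128 bytes to a 255-byte codeword."""
    padded = pad(data)
    blocks = [padded[p:p + K] for p in range(0, len(padded), K)]
    return interleave([rs_encode(block) for block in blocks])
```

Each returned line would then go through step 4, the G3 run-length
coding, before being sent.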

The shortest message is made of 255 lines, each containing a line
number and the EOL. Line numbers take on average 10 bits, for a total
of 22 bits/line with the 12-bit EOL (11 '0' bits plus the closing
'1'), or about 5.6 Kbit overhead (roughly 1/2 second). Data are then
expanded by a factor of roughly 2 to 2.5, depending on the encoding
chosen: the (255,128) RS code gives 255/128 * 1.25 = 2.49, while a
(255,160) RS code makes this almost exactly 2.
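
The overhead and expansion figures can be recomputed with a few
helper functions. This is only a sketch: the function names are made
up for illustration, and the 9600 bit/s modem speed is an assumption
chosen because it matches the half-second estimate.

```python
def overhead_bits(eol_bits, lines=255, number_bits=10):
    """Fixed cost of the shortest message: line number plus EOL per line."""
    return lines * (number_bits + eol_bits)


def expansion(n, k, code_ratio=1.25):
    """Transmitted bits per data bit for an (n,k) RS code plus G3 coding."""
    return n / k * code_ratio


def seconds(bits, bit_rate=9600):
    """Transmission time; 9600 bit/s is an assumed modem speed."""
    return bits / bit_rate
```

For example, seconds(overhead_bits(eol_bits=12)) gives the time spent
on line numbers and EOLs alone, and expansion(255, 160) can be
compared with expansion(255, 128) to see the cost of the stronger
code.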

DECODING:

1) fully missing lines are treated as erasures; partially
   received ones can be dealt with as errors/erasures;
2) lines are aligned and interleaving removed;
3) blocks are RS-decoded;
4) if uncorrectable errors remain, then this can only be a fax;
5) if no errors remain, parse the decoded text for a valid
   signature. If not found, this is a fax.
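
A sketch of steps 1, 2, 4 and 5 follows. It assumes the sender
rotated the bytes of line i left by i positions, treats any line of
the wrong length as erased, and leaves RS decoding (step 3) to an
external decoder. The signature b"DATA" is a placeholder, since
these notes do not define one.

```python
N = 255  # lines per interleaved group, one codeword byte per line


def deinterleave(received, num_blocks):
    """Rebuild codewords from the lines that arrived.

    received maps line number -> payload bytes (line number stripped).
    Positions belonging to missing lines stay None, so an
    erasure-capable RS decoder knows where the gaps are.
    """
    blocks = [[None] * N for _ in range(num_blocks)]
    for i, payload in received.items():
        if len(payload) != num_blocks:
            continue  # partially received line: treat it as erased
        shift = i % num_blocks
        # Undo the sender's left rotation by rotating right.
        column = payload[-shift:] + payload[:-shift] if shift else payload
        for b, byte in enumerate(column):
            blocks[b][i] = byte
    return blocks


def classify(decoded, signature=b"DATA"):
    """Steps 4 and 5: decoded is None when RS decoding failed."""
    if decoded is None or not decoded.startswith(signature):
        return "fax"
    return "data"
```

The lists returned by deinterleave would be handed to the RS decoder
one block at a time, with the None entries passed as erasure
positions.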


