EGI compression schemes


This paper discusses the two compression methods that the animation compiler/player EGI uses, plus the image/data transformations that make the source data more "compressible". All these compression and transformation routines are lossless.

EGI uses a combination of different encodings: run-length encoding, zero-order Huffman, Move-To-Front plus the Burrows-Wheeler transform, and a frame-shift transformation. Only the first two actually compress data. Move-To-Front and the Burrows-Wheeler transform merely represent the data differently (at equal, or slightly larger, size) in order to make it more suitable for compression. The frame-shift encoding is a series of "shifting" instructions that make an image frame more similar to its successor, so that a subsequent delta-frame encoding has less to compress.

Run Length Encoding

The FLIC format specifies a simple "Run Length Encoding" (RLE) compression scheme. This is a lossless compression method that produces good results on simple graphics. It does not perform well on digitized photographs or rendered images with gradual colour changes (these occur, for example, in shading).

There are several varieties of RLE: byte-oriented, word-oriented (16-bit), bit-oriented, etc. They share a packet-based approach: a packet consists of a count byte and one or more data bytes. If the count byte is positive, the next byte (or word, or bit) is repeated "count" times (a replicate run). If the count byte is negative, the next -count bytes in the input stream are copied verbatim to the output stream (a literal run). For byte-oriented RLE, it is best to merge a 2-byte replicate run that is preceded and followed by literal runs into a single literal run: the short replicate run gains nothing, and merging the three runs saves one count byte. Word-oriented RLE does not require this special case.

The worst-case behaviour is the amount (or ratio) by which the data expands for the worst imaginable input data. A well-implemented byte-oriented RLE stores at most one extra byte for every 127 input bytes; word-oriented RLE uses at most one extra byte for every 254 input bytes.

Byte oriented RLE and reverse RLE
#include <string.h>   /* for memset() and memcpy() in unrle() */

int rle(unsigned char *output,unsigned char *input,int length)
{
  int count,index,i;
  unsigned char pixel;
  int out;

  out=0;
  count=0;
  while (count<length) {
    index=count;
    pixel=input[index++];
    while (index<length && index-count<127
           && input[index]==pixel)
      index++;
    if (index-count==1) {
      /* Failed to "replicate" the current pixel. See how many to copy.
       * Avoid a replicate run of only 2-pixels after a literal run. There
       * is no gain in this, and there is a risk of loss if the run after
       * the two identical pixels is another literal run. So search for
       * 3 identical pixels.
       */
      while (index<length && index-count<127
             && (input[index]!=input[index-1]
                || index>1 && input[index]!=input[index-2]))
        index++;
      /* Check why this run stopped. If it found two identical pixels, reset
       * the index so we can add a run. Do this twice: the previous run
       * tried to detect a replicate run of at least 3 pixels. So we may be
       * able to back up two pixels if such a replicate run was found.
       */
      while (index<length && input[index]==input[index-1])
        index--;
      output[out++] = (unsigned char)(count-index);
      for (i=count; i<index; i++)
        output[out++] = input[i];
    } else {
      output[out++] = (unsigned char)(index-count);
      output[out++] = pixel;
    } /* if */
    count=index;
  } /* while */
  return out;
}

void unrle(unsigned char *output,unsigned char *input,int length)
{
  signed char count;

  while (length>0) {
    count=(signed char)*input++;
    if (count>0) {
      /* replicate run */
      memset(output,*input++,count);
    } else if (count<0) {
      /* literal run */
      count=(signed char)-count;
      memcpy(output,input,count);
      input+=count;
    } /* if */
    output+=count;
    length-=count;
  } /* while */
}

The FLIC file format uses both byte- and word-oriented RLE. EGI keeps RLE as the first compression pass, because it is fast and has a good worst-case behaviour. Optionally, other compression techniques are then applied to the RLE-compressed data.

Huffman Encoding

This section does not discuss the Huffman encoding and decoding algorithms in general. For a description of the algorithms, see, for example, [Gonzalez, 1987, pp.265–266] and [Stevens, 1991]. (The "Huffman" compression used by EGI is more accurately described as Shannon-Fano coding. I invented it independently in 1990, and I have kept referring to it as "Huffman". To the decompressor, it does not matter how the code table was created.) This section only covers the file format of the table and of the compressed chunks in the FLIC file.

The table of all Huffman codes is stored in a chunk with type F1FCh. The number of codes in the table need not be 256; fewer codes may be defined.

Each code in the table consists of three fields: the Huffman code (a series of bits), the length (in bits) of the code, and the byte value that the code represents. The code field contains "length" significant bits, left-aligned in the 16-bit field. For example, a code with length 4 and binary value "0110" is stored in the table as 6000h.
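As an illustration, a table entry and the left-alignment rule might be sketched as below. The struct layout and the names are hypothetical; the exact on-disk field widths are defined by the FLIC file format paper.

```c
#include <stdint.h>

/* Illustrative layout of one entry in the F1FCh code table */
typedef struct {
  uint16_t code;    /* the code bits, left-aligned in the 16-bit field */
  uint8_t  length;  /* number of significant bits in "code" */
  uint8_t  value;   /* the byte value that the code represents */
} huffcode_t;

/* Left-align a code of "length" bits, as it is stored in the table */
uint16_t align_code(uint16_t bits, int length)
{
  return (uint16_t)(bits << (16 - length));
}
```

With this helper, the example from the text, binary "0110" with length 4, indeed yields 6000h.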

Only three chunk types are Huffman compressed: 7 (DELTA_FLC), 15 (BYTE_RUN) and 35 (KEY_IMAGE). The chunk header (the first six bytes of a chunk that give the size and type of the chunk) is not compressed. In addition, the first two bytes of a DELTA_FLC chunk (which give the number of lines to decode) are not compressed either. The Huffman compressed block starts after the chunk header, or (for a DELTA_FLC chunk) after the line count.

The rest of the chunk is compressed in blocks (of varying size). Each block starts with two two-byte values that give the number of compressed and the number of uncompressed bytes in the block, respectively. The number of compressed bytes is the size of the block (the block header is not included). The number of uncompressed bytes is needed during decoding: keep a count of the bytes that you store in the output stream, and stop decoding when that count reaches the number of "uncompressed bytes".

The decoding of Huffman-compressed blocks continues until the complete chunk (with the size given in the chunk header) has been read and expanded.
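The block loop can be sketched as follows. The `huff_expand()` stand-in and all other names are hypothetical (the stand-in just copies bytes, as if every symbol were stored uncompressed), and the two block sizes are assumed to be little-endian, as is usual in FLIC files.

```c
#include <string.h>

/* Stand-in for the actual Huffman decoder, for illustration only:
 * it simply copies "expanded" bytes to the output. */
static unsigned huff_expand(unsigned char *out, const unsigned char *in,
                            unsigned comp, unsigned expanded)
{
  (void)comp;
  memcpy(out, in, expanded);
  return expanded;
}

/* Sketch of the block-wise decoding loop: read the two 16-bit sizes,
 * expand the block, and continue until the whole chunk is consumed. */
void decode_blocks(unsigned char *out, const unsigned char *in, long chunksize)
{
  while (chunksize > 0) {
    unsigned comp = in[0] | (in[1] << 8);     /* compressed bytes */
    unsigned expanded = in[2] | (in[3] << 8); /* uncompressed bytes */
    in += 4;
    huff_expand(out, in, comp, expanded);
    in += comp;
    out += expanded;
    chunksize -= 4 + comp;  /* block header plus compressed data */
  }
}
```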

Burrows-Wheeler Transform

The Burrows-Wheeler Transform is described in detail in [Nelson, 1996] and [Burrows, 1994]. EGI uses the basic algorithm as described in these papers. Each block of BWT data is Huffman compressed after the transform. The first two bytes of each block are the primary index of the block; the remainder of the block is the "L"-column of the Burrows-Wheeler transform. Only the L-column is Huffman compressed. Another way to look at this is that a plain Huffman-encoded block starts with a four-byte header (the "compressed" and "expanded" number of data bytes), while a block that is transformed with BWT before being compressed starts with a six-byte header (the "compressed" and "expanded" block sizes, plus the "primary index").

Unlike Mark Nelson's implementation, EGI does not store a block length (since each BWT block is separately Huffman compressed, the length of the Huffman block and the BWT block are the same) or a "suffix character index" (because EGI does not use the "suffix sort" optimization).

After sorting, the L-column of the BWT data is encoded using a "Move-To-Front" encoding. This algorithm is also discussed in both documents that cover the Burrows-Wheeler Transform. EGI uses a slightly different algorithm, which appears to perform a little better on complex (dithered) images. Instead of moving each character to the front of the list (index 0 of the character table), it is moved halfway between its current position and the front of the list (so, NewPos = CurPos / 2; using integer arithmetic). I refer to this as the "lazy" Move-To-Front heuristic, whereas the original algorithm is dubbed the "aggressive" Move-To-Front heuristic.
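For illustration, the forward "lazy" Move-To-Front might look as follows. This is a sketch, not EGI's actual encoder; it is the counterpart of the unmtf() routine listed below.

```c
#include <string.h>

/* Forward "lazy" Move-To-Front: each byte is replaced by its current
 * position in the table, and the table entry is then moved halfway to
 * the front (NewPos = CurPos / 2). */
void mtf(unsigned char *buffer, int length)
{
  unsigned char table[256];
  int i, j;

  for (i = 0; i < 256; i++)
    table[i] = (unsigned char)i;

  while (length-- > 0) {
    /* find the current position of the byte in the table */
    for (i = 0; table[i] != *buffer; i++)
      /* nothing */;
    *buffer++ = (unsigned char)i;
    if (i > 0) {
      unsigned char ch = table[i];
      j = i / 2;                          /* the "lazy" heuristic */
      memmove(table + j + 1, table + j, i - j);
      table[j] = ch;
    }
  }
}
```

Note how a repeated byte converges toward (but does not jump to) index 0: the sequence 5, 5, 5 encodes to 5, 2, 1.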

C code for the reverse transformation is given below. This code is not optimized, but it has been tested. Upon comparison of this code with Mark Nelson's implementation, you will see that the decoding is much simplified (and faster) by not using the suffix sort optimization. (The suffix sort is used to speed up the compression algorithm, not the decoding of the compressed data.)

Inverse "Move To Front" and "Burrows-Wheeler Transform"
#include <string.h>   /* for memmove() */

void unmtf(unsigned char *buffer,int length)
{
  unsigned char table[256];
  int i,j;

  /* Initialize */
  for (i=0; i<256; i++)
    table[i]=(unsigned char)i;

  while (length-->0) {
    i=*buffer;
    *buffer=table[i];
    if (i>0) {
      j=i/2;
      memmove(table+j+1, table+j, i-j);
      table[j]=*buffer;
    } /* if */
    buffer++;
  } /* while */
}

#include <stdlib.h>   /* for malloc() and free() */
#include <string.h>   /* for memset() */

#ifndef FAR
  #define FAR         /* FAR is only meaningful on 16-bit, segmented platforms */
#endif

int unbwt(unsigned char *output,unsigned char *input,int length,int index)
{
  unsigned short FAR *P;
  unsigned short C[256];
  unsigned char ch;
  int i,count,sum;

  if ((P=malloc(sizeof(unsigned short)*length))==NULL)
    return -1;
  memset(C,0,sizeof C);

  for (i=0; i<length; i++) {
    ch = input[i];
    count = C[ch];
    C[ch] = count+1;
    P[i] = count;
  } /* for */

  sum=0;
  for (i=0; i<256; i++) {
    count = C[i];
    C[i] = sum;
    sum += count;
  } /* for */

  while (length-->0) {
    ch = input[index];
    *output++ = ch;
    index = P[index] + C[ch];
  } /* while */

  free(P);
  return 0;
}

Frame shifting

Frame-shift compression is not really a compression technique, but a transformation of a source image so that it better resembles the destination image. A delta-frame compression then sees fewer deltas and needs to compress less data. Frame shifting is effective on frames that are panned (in any direction) without zooming or rotation. It also works well for objects that move over a fixed background.

The frame shift data works per scan line: for each scan line there are two shift values, one for horizontal and one for vertical displacement. EGI stores the horizontal and vertical shifts in two separate lists, each of which is compressed with "Run Length Encoding", but the transformation applies both the horizontal and the vertical shifts in a single pass. For the file format, see the paper "The FLIC file format".

As implemented in EGI, the "vertical shift list" is actually a misnomer. Instead of shifting pixel columns up or down, the vertical-shift procedure copies a scan line from the "delta" location. That is, if the fourth value in the vertical shift list is 3, a copy of the seventh (4+3) scan line of the source image is placed at the fourth scan line of the destination image.
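A single vertical "shift" therefore amounts to a scan-line copy. A sketch (the names and the separate source/destination buffers are illustrative; rows are "width" bytes apart):

```c
#include <string.h>

/* Copy the scan line at row y+shift of the source image to row y of
 * the destination image. Rows are numbered from zero here. */
void vshift_line(unsigned char *dest, const unsigned char *src,
                 int width, int y, int shift)
{
  memcpy(dest + (long)y * width, src + (long)(y + shift) * width, width);
}
```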

Horizontal shifting does what its name says: it shifts a scan line to the left (for a negative shift count) or to the right. The gap that the shift opens is filled with the colour of the first pixel on the scan line when shifting to the right, and with the colour of the last pixel when shifting to the left.
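A sketch of shifting one scan line horizontally, with the edge fill described above. This is an illustration, not EGI's actual routine, and it assumes that the magnitude of the shift is smaller than the line width.

```c
#include <string.h>

/* Shift one scan line of "width" pixels in place; "shift" is negative
 * for a left shift. The gap is filled with the nearest edge pixel. */
void shift_line(unsigned char *line, int width, int shift)
{
  if (shift > 0) {          /* shift right, fill with the first pixel */
    memmove(line + shift, line, width - shift);
    memset(line, line[shift], shift);
  } else if (shift < 0) {   /* shift left, fill with the last pixel */
    memmove(line, line - shift, width + shift);
    memset(line + width + shift, line[width + shift - 1], -shift);
  }
}
```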

To minimize memory movement and copying, EGI performs the frame shifting "in place": the source and destination images are actually the same image. In-place transformation has the complication that the routine could overwrite a scan line that it needs later. For example, if the fourth and fifth values in the vertical shift list are both -1, a simplistic algorithm might first copy "source" scan line 3 to "destination" scan line 4 and then copy "source" scan line 4 to "destination" scan line 5. This would work correctly if the source and destination images were separate, but if they are the same, the first copy overwrites scan line 4, whose original contents are needed for the second copy. The EGI animation compiler addresses this by writing a third list after the vertical-shift and horizontal-shift lists. This third list gives the order in which the scan lines should be shifted. In the above example, the problem is solved by doing the two scan-line copies in reverse order: first copy scan line 4 to 5, then scan line 3 to 4.

The third list, the "priority list", gives the offsets to the next scan line to process, relative to the "current" scan line. At the very start of the algorithm, the "current" scan line is set to zero. Each offset is a signed 15-bit value (range -16384 .. 16383), but it is stored in the list in either one or two bytes. If a byte in the list has its highest bit set, it holds the upper 7 bits of a 15-bit value and the next byte holds the remaining 8 bits. If the highest bit is clear, the byte holds a value in the range -64 .. 63, stored as a 7-bit two's complement value (7Fh represents -1, 40h is -64, and 3Fh is 63).
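Decoding one offset from this variable-length format might be sketched as follows (the function name is hypothetical):

```c
/* Read one priority-list offset; *list is advanced past the bytes read. */
int next_offset(const unsigned char **list)
{
  int value = *(*list)++;
  if (value & 0x80) {
    /* two-byte form: low 7 bits are the upper bits of a 15-bit value */
    value = ((value & 0x7f) << 8) | *(*list)++;
    if (value & 0x4000)
      value -= 0x8000;        /* sign-extend the 15-bit value */
  } else {
    /* one-byte form: 7-bit two's complement, range -64 .. 63 */
    if (value & 0x40)
      value -= 0x80;
  }
  return value;
}
```

This reproduces the examples from the text: 7Fh decodes to -1, 40h to -64 and 3Fh to 63.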

Scan lines with a vertical shift of zero must always be processed last. The EGI compiler exploits this simple rule by not adding any scan line with a zero vertical shift to its priority list.

When the frame shift is not performed "in place", the priority list may be ignored. In theory, circular dependencies may occur, for which no single correct scan-line order exists. The EGI compiler ensures that such dependencies are never encoded (at the cost of a, at least theoretically, slightly less efficient encoding).

References

Burrows, M. and D.J. Wheeler; "A Block-sorting Lossless Data Compression Algorithm"; SRC Research Report 124; Digital Equipment Corporation; 1994.
Original paper on the Burrows-Wheeler Transform.
Gonzalez, R.C. and P. Wintz; "Digital Image Processing"; second edition; Addison-Wesley; 1987.
A book on image processing, with a chapter on image encoding and a paragraph on Huffman encoding.
Nelson, Mark; "Data Compression with the Burrows Wheeler Transform"; Dr. Dobb's Journal; September 1996 (Volume 21, issue 9).
A good description and a nice (C++) implementation of the Burrows Wheeler Transform.
Stevens, Al; "CUA and Data Compression"; Dr. Dobb's Journal; February 1991 (Volume 16, issue 2).
A simple description, plus a working (but slow) implementation of a Huffman compressor and decompressor.