           Q L  H A C K E R ' S  J O U R N A L
----------------------------------------------------------
   Number 11                               Nov 1992
----------------------------------------------------------

   The QL Hacker's Journal (QHJ) is an amateur publication
published by Tim Swenson strictly as a hobby and as a
service to the QL Community.
   The QHJ is copyright by Tim Swenson and respective
contributors, but may be freely copied and distributed
(please do) to all QL users.
   The QHJ is always interested in article submissions.
If interested, please send mail to the address below
(Snail Mail or E-Mail).
   Electronic copies of the QHJ are available on disk
or via e-mail from:

        QL Hacker's Journal
            c/o Tim Swenson
        4773 W. Braddock Rd. #3
        Alexandria, VA 22311
        (703) 820-6657
        tswenson@sesky4102b.pl.osd.mil
        tswenson@dgis.dtic.dla.mil

Editor's Forum

   One problem I have noticed with writing this section of
the QHJ is that I don't have the words to introduce the
issue.  I try to get each QHJ done at the end of the month
before publication.  This coincides with the publication
schedule of the CATS newsletter (the other newsletter that
I edit).  By the time I get to the QHJ I'm somewhat burnt
out from CATS.

   So how about some news:  Via MausNet comes news of the
4th International QL Meeting, to be held 24 Jan in Reggio
Emilia, Italy.  The meeting is being put on by Ergon
Development and the QITALY Club.  Full information can be
had from Davide Santachiara, Via Emilio De Marchi 2, 42100
Reggio Emilia, Italy, +39 522 70409.

   From Jeff Kuhlmann, an American in Germany, I have
received three shareware emulators written by Carlos
Delhez.  They are Spectator (Spectrum Emulator for the QL),
Extricator (ZX81 Emulator for the QL), and XTender (ZX81
Emulator for MS-DOS).  If anyone is interested in these,
send a disk (5.25) plus postage for each program.

   Also from MausNet comes news of two Russians, Andrew
Lavrov and Anatoly Tishin, who are working on a QL clone
based on the 68040 (about 40 MHz). This computer should also
be able to "clone" most any 680xx based computer (Amiga, ST,
MAC).  For the moment consider this Vaporware.  We'll have
to wait and see if anything comes of this.

   Giuseppe Zanetti has posted QLTOOLS to MausNet.  QLTOOLS
is a C program that will read a QL disk.  It was written for
UNIX but recently ported to MS-DOS.  I've received a copy
and had a friend compile it (Turbo C++).  It does read the
disks, but I have found that when reading a file to STDOUT,
all that is output is a number of spaces.  The larger the
file, the more spaces.  I have yet to talk with Giuseppe
about this problem, so it may only be localized to either my
copy of the source or the MS-DOS port.  Giuseppe has
mentioned that he does not have the time to make the program
write QL disks.  Maybe someone will pick up the challenge.

   One last bit of news, it looks like I will be moving
near the first of the year.  After four years in the
Pentagon, I decided it was time to move on.  I put in for a
job at Wright-Patterson AFB, Dayton, Ohio, and it looks
like I will get the job.  I will be moving about the middle
of January. I will let everyone know of the new address when I
get it.

   Going to Wright-Patterson AFB (WPAFB) means that I may
lose my Internet connection.  I know WPAFB does have an
Internet connection, but I'm not too sure if I can get
access.  I'm pretty sure that I can swing it, but it's not
100%.

   So until next time, see you on the Internet and Happy
Hacking.

C68 Version 3.03
   By Tim Swenson

   I have received version 3.03 of C68.  Just before this I
picked up a copy of 3.01 and had problems with it.  I
expected to get version 3.02 (a serious bug fix of 3.01) but
got 3.03 instead.  Version 3.04 has been released by Dave
Walker.  He seems to put out a new release about every 1-2
months, so don't expect this information to stay
up-to-date.

   C68 is now up to 12 disks.  There are 3 disks for Runtime
and documentation, 4 disks for source, 1 disk for tutorial,
and 4 assorted disks.  The assorted disks contain the
most interesting items.

   Debugging Tools: Headers, libraries, doc's, and source.
   Not being a user of C debugging tools, I can't really say
much on these tools.

   C Programming Tools: cproto, indent, unproto,
executables, docs, & source.
   cproto is used to generate C function prototypes and
convert function definitions.  unproto is used to convert
ANSI programs to K&R so they can be compiled with C68.

   LIBQPTR: System files, doc's, & source.
   Not knowing a lot about the QPTR pointer interface, I can
only make a few comments on this disk.  It contains Window
Manager and some code dealing with Menus (a Menu Manager?).
I've only started looking over the doc's.  I don't know if
any other software is needed to use this library.

   CPORT Support Library.
   This disk contains a library to use with programs ported
to C using CPORT, a commercial SuperBasic to C utility.
From reading the doc's, the CPORT library can not be
distributed to non-CPORT owners.  This library is for use
in distributing your CPORT compiled programs.

   The first version of C68 that I received, and the one I
used the most, was version 1.05.  With version 3.01 and
greater, C68 comes with C68Menu, a menued front-end for
C68.  It makes C68 easier to learn, but has a few
drawbacks.

   From MausNet, Franz Herrmann reports that Helge Kauke
has found that the trig functions of the C68 math library
do not work.  The article does not mention which version of
C68, so I'll include the test program here so all can test
it for themselves.

   #include <stdio.h>
   #include <math.h>
   #ifndef PI              /* C68's math.h defines PI; this */
   #define PI 3.14159265358979   /* guard is for other compilers */
   #endif
   /*  This program hangs up when calling sin(x) at x = 2*PI.
         And the results of cos()/sin() are wrong.
   */

   int main ( void) {

      double x;

      for ( x = 0; x <= 2*PI; x += PI/2) {
         printf("%8f %8f %8f\n", x, sin(x), cos(x));
      }

      return 0;
   }


Disk Eraser
    By Herb Schaaf and Tim Swenson

   In one of my program idea brainstorming sessions, I came
up with the idea of writing a program to completely erase a
floppy disk.  The idea comes from government regulations.
When deleting classified information from a disk, the disk
must be written with all 1's, then all 0's, and then random
1's and 0's.  After this, the disk can be re-formatted and
considered clean enough for unclassified use.

   It has been shown that reformatting a disk will not
delete all of the information.  Some bit patterns will
remain, although they will be faint.  The above procedure
ensures that enough writes happen to the disk to completely
wipe out any bit patterns on the disk.

   To implement this program, I looked at Herb Schaaf's
Diskinfo program (QHJ #2).  I got a few things figured out
by looking at the code, but not enough to really give the
program a try.

   I gave Herb a call to see if he would help me on the
program, or write it himself.  Herb, always ready to accept
a challenge, sent me the program a few days later.  I have
taken what Herb wrote and made it more procedural and added
a few touches of my own.

   As is, the program will only write 1's and 0's.  Adding
the random 1's and 0's should be fairly easy to do if you
really want to use this program.  Just as with a data
encryption program, I don't think that many QL users will
really need to use this program.  It's just a good exercise
in programming and does demonstrate how to address every
sector of a disk.

** wipedisk_ssb
**
**  This program will write all 0's then all 1's
**   to a disk and then format it.  This is designed
**   to totally delete all information from the disk.
**   A format does not always delete the information
**   at the lowest levels.
**
**  Government regulations decree that all disks
**   that contain classified information go through
**   a program like this to guarantee that all of
**   the information is gone for good.


BORDER #3,1,4
PAPER #3,2 : INK #3,1 : CLS #3

** TK2_EXT

** Write 0's to the whole disk
PRINT #3,"Writing Zero's To Disk"
null$ = FILL$(CHR$(0),512)
wipe (null$)
AT #3,6,0 : PRINT "Done"
PAUSE 200
CLS #3

** Write 1's to the whole disk
PRINT #3,"Writing One's To Disk"
full$ = FILL$(CHR$(255),512)
wipe (full$)
AT #3,6,0 : PRINT #3,"Done"
PAUSE 200
CLS #3
PRINT #3,"Formatting Disk"
FORMAT "flp2_"
PRINT #3,"Disk Formatted"

STOP

DEFine PROCedure wipe (string$)
   LOCal side, sector, track, in$

   OPEN #4,'flp2_*d2d'

   FOR track = 0 TO 79
      FOR side = 0 TO 1
         FOR sector = 1 TO 9
            PUT #4\sector+(side*256)+(track*65536),string$
            AT #3,1,2: PRINT #3,"Track : ";track
            AT #3,2,2: PRINT #3,"Side  : ";side
            AT #3,3,2: PRINT #3,"Sector: ";sector
         END FOR sector
      END FOR side
   END FOR track

   CLOSE #4
END DEFine wipe


CR/LF to LF in Editors
   by Tim Swenson

   All of the C programs that I get from non-QL sources get
to the QL via an MS-DOS disk.  As most know, MS-DOS uses
CR/LF for newline and the QL uses just LF.  I've written a
simple SuperBasic program that will strip off the last
character from each line of a file, thereby getting rid of
the unnecessary CR.

   I've found that it's a bother to have to run each file
through the program first, then edit it.  I figured there
had to be a way to get rid of the CR in editors that have a
macro language.

   The editor that I use the most for programming is
Metacomco ED.  As powerful as MicroEmacs is, I like ED
because it can be resized to almost any size.  Below is an
ED macro that will delete the last character of each line of
the resident file.  You should be at the top of the file
before executing the macro.  Since ED does not have a way of
checking for End-Of-File, the macro will stop when it comes
to an error, which will happen when the macro reaches the
end of the file.

    RP ( CE; CL; DC; N; )

   This macro will also work in QED, the freeware ED
work-alike that comes with C68.  One caveat about the
macro: don't worry if the display looks kind of weird while
the macro is executing; everything will come back to
normal.  In fact, it is a good idea to do a screen redraw
(F4) to make sure the display looks as it should.

   My next task was to write a macro for MicroEmacs.  Since
I did not find an End-Of-File check in MicroEmacs, it took
me a while to realize that MicroEmacs, like ED, will stop a
macro when an error is encountered.

   After coming up with a script that I felt would work, I
loaded a file into MicroEmacs and gave it a try.  I found
the Execute-File macro, but it is not bound to a key in
version 3.9.  To run the macro stored in a file, hit M-X
(Esc X) to Execute-Program, enter execute-file, and then
enter the file name to execute.

   Once I figured this out, I thought I had it made. Wrong.
Once I entered the file name to execute, nothing happened.
I've tried it many times and still nothing.  Obviously I
must be doing something wrong, but I can't figure it out.
Below
is the script that I think should work in MicroEmacs.
Someone out there should know how to use MicroEmacs better
than I and can test it out.

    beginning-of-file
    !while
       end-of-line
       delete-previous-character
       next-line
    !endwhile


Random ASCII Stereograms
   By Tim Swenson

   In QHJ #9, Herb Schaaf wrote a program that created
Random Dot Stereograms based on a Mathematica program
published in a Mathematica magazine.  A number of RDS's
have been posted to alt.3d in PostScript form.  Herb and I
have ruined our eyes looking at these things.  All of the
posted RDS's have a high dot resolution.

   Recently, David Coleman posted a different approach to
these stereograms called Random ASCII Stereograms (RAS).
His program takes an input file that describes the object
to show in 3D, and produces an ASCII pattern that, when
viewed correctly, shows the object in 3D.

   The stereograms are not as complicated as the RDS's.
They only show two levels of 3D, seen as a flat object above
a flat surface.  The RAS's are not as interesting as RDS's
but are easily implemented on systems that may not display
graphics, easy to port, and simpler to understand than
RDS's.

   I showed these to an associate at work, who has a big
background in computer security, and we wondered if text
messages could be embedded into RDS's, "invisible" to the
average person.  With RAS's, this can easily be done.  The
input file can be a simple representation of characters and
then easily e-mailed to others in "private". Granted, not
very private, but private unless you know the trick.

   The only changes I have made to Dave's program are:

      1) Use the proper Rand function for C68.
      2) Send output to a file (flp1_ras_out) instead of
         stdout.

/*
ras.c   A simple random ascii stereogram generator
  08/24/92    David Coleman, coleman@netmail.austin.ibm.com
  10/05/92    Ported to the QL by Tim Swenson
              added code to send output to flp1_ras_out
              instead of stdout.
              tswenson@se3c763a.pl.osd.mil
*/
#include <stdio_h>
#include <stdlib_h>
#include <math_h>

main(argc,argv)
  int argc;
  char *argv[];
{

   int c,d,i,j,k,row,col,flag,done;
   char rstr[17];
   char line[24][80];
   FILE *fp, *fd2;

   flag =  done = 0;
   if (2!=argc)  {
      fprintf(stderr,"usage: %s datafile\n",argv[0]);
      exit(1);
   }

   srand(getpid());

   for (i=0;i<20;i++) {
     for (j=1;j<16;j++) {
       done=0;
       while(0==done) {
         c = (43+rnd(79));
         flag = 0;
         for (k=1;k<j;k++) {
           if(rstr[k-1]==c) flag=1;
         }
         if (0==flag) done=1;
       }
       rstr[j-1]=c;
     }
     rstr[16]= '\0';
   sprintf(line[i],"%s%s%s%s%s",rstr,rstr,rstr,rstr,rstr);
   }

   fp=fopen(argv[1],"r");
   col = row = 0;
   while((c=fgetc(fp)) != EOF) {
    if ('\n'==c) {
      col = 0;
      ++row;
    }
    else {
      ++col;
      if (' '!=c) {
        line[row][col]=line[row][col+1];
        d = (43+rnd(79));
        line[row][col+1] = d;
        if((col+16)<80) {
          line[row][col+15]=line[row][col+16];
          line[row][col+16] = d;
        }
        if((col+31)<80) {
          line[row][col+30]=line[row][col+31];
          line[row][col+31] = d;
        }
        if((col+46)<80) {
          line[row][col+45]=line[row][col+46];
          line[row][col+46] = d;
        }
      }
    }
   }
  fclose(fp);
  for(i=0;i<20;i++) {
     for(j=60;j<75;j++) {
       if((' '==line[i][j])||(0==line[i][j])) line[i][j]=
          line[i][j-15];
     }
     line[i][75]=0;
  }

  fd2 = fopen("flp1_ras_out","w");
  fprintf(fd2," (32 spaces)   O              O\n");
  for(i=0;i<20;i++) fprintf(fd2,"%s\n",line[i]);
  fclose(fd2);
}

int rnd(num)
int num;
{
   rand();
   return (rand() % num);
}

Random ASCII Stereogram Sample Data File:
   ras_dat

                          #
                          ##
                          ###
                          ####          #######
                          #####         #######
                          ######        #######
                          #######       #######
                          ########
                          #########
                          ##########
                          ###########
                          ############
                          #############
                          ##############


LZW Compression
  by Tim Swenson

   In the June and Oct 1992 issues of Dr. Dobb's Journal,
there were two articles on LZW compression.  The second
article is an improvement to the first program.

   The original source code was formatted for 80 columns.
Since the QHJ is 60 columns, I've had to do some
reformatting of the code.  Some comments follow the code
that they refer to.  I've put in ^'s to mark this.

   The only changes made to the code to port it to the QL
were those sections relating to files.  Instances of . were
changed to _ and reference was made to FLP1_.  The program
would not use the TOOLKIT II defaults of PROG_USE or
DATA_USE.

/* Basic LZW Data Compression program published in DDJ
 * October 1989 issue.
 * Original Author: Mark R. Nelson
 * Updated by: Shawn M. Regan, January 1990
 * Added: - Method to clear table when compression ratio
 *          degrades
 *        - Self adjusting code size capability (up to
 *          14 bits)
 * Updated functions are marked with "MODIFIED". main() has
 * been updated also
 * Compile with -ml (large model) for MAX_BITS == 14 only
 */

#include <stdio_h>

#define INIT_BITS 9
#define MAX_BITS 14
    /* Do not exceed 14 with this program */
#define HASHING_SHIFT MAX_BITS - 8

#if MAX_BITS == 14
 /* ^ Set the table size. Must be a prime    */
#define TABLE_SIZE 18041
 /* ^ number somewhat larger than 2^MAX_BITS.*/
#elif MAX_BITS == 13
#define TABLE_SIZE 9029
#else
#define TABLE_SIZE 5021
#endif

#define CLEAR_TABLE 256
 /* ^ Code to flush the string table */
#define TERMINATOR  257
 /* ^ To mark EOF Condition, instead of MAX_VALUE */
#define FIRST_CODE  258
 /* ^ First available code for code_value table */
#define CHECK_TIME  100
 /* ^ Check comp ratio every CHECK_TIME chars input */
#define MAXVAL(n) (( 1 <<( n )) -1)
 /* ^ max_value formula macro */

unsigned input_code();
void *malloc();

int *code_value;
 /* ^ This is the code value array */
unsigned int *prefix_code;
 /* ^ This array holds the prefix codes */
unsigned char *append_character;
 /* ^ This array holds the appended chars */
unsigned char decode_stack[4000];
 /* ^ This array holds the decoded string */

int num_bits=INIT_BITS;
 /* ^ Starting with 9 bit codes */
unsigned long bytes_in=0,bytes_out=0;
 /* ^ Used to monitor compression ratio */
int max_code;
 /* ^ old MAX_CODE */
unsigned long checkpoint=CHECK_TIME;
 /* ^ For compression ratio monitoring */

main(int argc, char *argv[])
{
   FILE *input_file, *output_file, *lzw_file;
   char input_file_name[81];
 /* The three buffers for the compression phase.  */
   code_value=malloc(TABLE_SIZE*sizeof(unsigned int));
   prefix_code=malloc(TABLE_SIZE*sizeof(unsigned int));
   append_character=malloc(TABLE_SIZE*sizeof
     (unsigned char));

   if (code_value==NULL || prefix_code==NULL ||
         append_character==NULL) {
      printf("Error allocating table space!\n");
      exit(1);
   }
/* Get the file name, open it, and open the output file. */
   if (argc>1)
      strcpy(input_file_name,argv[1]);
   else {
      printf("Input file name: ");
      scanf("%s",input_file_name);
   }
   input_file=fopen(input_file_name,"rb");
   lzw_file=fopen("test_lzw","wb");
   if (input_file == NULL || lzw_file == NULL) {
      printf("Error opening files\n");
      exit(1);
   }
   max_code = MAXVAL(num_bits);
    /* ^ Initialize max_value & max_code */
   compress(input_file,lzw_file);
    /* ^ Call compression routine */

   fclose(input_file);
   fclose(lzw_file);
   free(code_value);
   /* ^ Needed only for compression */

   lzw_file=fopen("test_lzw","rb");
   output_file=fopen("test_out","wb");
   if (lzw_file == NULL || output_file == NULL) {
      printf("Error opening files\n");
      exit(1);
   }
   num_bits=INIT_BITS;
   /* ^ Re-initialize for expansion */
   max_code = MAXVAL(num_bits);
   expand(lzw_file,output_file);
    /* ^ Call expansion routine */

   fclose(lzw_file);
    /* ^ Clean it all up */
   fclose(output_file);
   free(prefix_code);
   free(append_character);
}
/* MODIFIED This is the new compression routine. The
 * first two 9-bit codes
 * have been reserved for communication between the
 * compressor and expander.
 */
compress(FILE *input, FILE *output)
{
   unsigned int next_code=FIRST_CODE;
   unsigned int character;
   unsigned int string_code;
   unsigned int index;
   int i,             /* All purpose integer */
   ratio_new,
    /* ^ New compression ratio as a percentage */
   ratio_old=100;     /* Original ratio at 100% */

   for (i=0;i<TABLE_SIZE;i++)
    /* ^ Initialize the string table first */
      code_value[i]=-1;
   printf("Compressing\n");
   string_code=getc(input);     /* Get the first code */

 /* This is the main compression loop. Notice when the
  * table is full we try
  * to increment the code size. Only when num_bits ==
  * MAX_BITS and the code
  * value table is full do we start to monitor the
  * compression ratio.
  */
   while((character=getc(input)) != (unsigned)EOF) {
      if (!(++bytes_in % 1000)) {
       /* ^ Count input bytes and pacifier */
         putchar('.');
      }
      index=find_match(string_code,character);
      if (code_value[index] != -1)
         string_code=code_value[index];
      else {
         if (next_code <= max_code ) {
            code_value[index]=next_code++;
            prefix_code[index]=string_code;
            append_character[index]=character;
         }
         output_code(output,string_code);
      /* ^ Send out current code */
         string_code=character;
         if (next_code > max_code) {
      /* ^ Is table Full? */
            if ( num_bits < MAX_BITS) {
         /* ^ Any more bits? */
               putchar('+');
               max_code = MAXVAL(++num_bits);
            /* ^ Increment code size then */
            }
            else if (bytes_in > checkpoint) {
         /* ^ At checkpoint? */
               if (num_bits == MAX_BITS ) {
                ratio_new = bytes_out*100/bytes_in;
         /* ^ New compression ratio */
                if (ratio_new > ratio_old) {
         /* ^ Has ratio degraded? */
                  output_code(output,CLEAR_TABLE);
           /* ^ YES,flush string table */
                  putchar('C');
                  num_bits=INIT_BITS;
                  next_code=FIRST_CODE;
           /* ^ Reset to FIRST_CODE */
                  max_code = MAXVAL(num_bits);
           /* ^ Re-Initialize this stuff */
                  bytes_in = bytes_out = 0;
                  ratio_old=100;
           /* ^ Reset compression ratio */
                  for (i=0;i<TABLE_SIZE;i++)
           /* ^ Reset code value array */
                        code_value[i]=-1;
               }
               else          /* NO, then save new */
               ratio_old = ratio_new;
            /* ^ compression ratio */
            }
            checkpoint = bytes_in + CHECK_TIME;
         /* ^ Set new checkpoint */
            }
         }
      }
   }
   output_code(output,string_code);
    /* ^ Output the last code */
   if (next_code == max_code) {
    /* ^ Handles special case for bit */
      ++num_bits;             /* increment on EOF */
      putchar('+');
   }
   output_code(output,TERMINATOR);
    /* ^ Output the end of buffer code */
   output_code(output,0);
    /* ^ Flush the output buffer */
   output_code(output,0);
   output_code(output,0);
   putchar('\n');
}
/* UNCHANGED from original
 * This is the hashing routine.
 */
find_match(int hash_prefix, unsigned int hash_character)
{
   int index, offset;

   index = (hash_character<<HASHING_SHIFT )^hash_prefix;
   if (index == 0 )
      offset=1;
   else
      offset = TABLE_SIZE - index;
   while(1) {
      if (code_value[index] == -1 )
         return(index);
      if (prefix_code[index] == hash_prefix &&
            append_character[index] == hash_character)
         return(index);
      index -= offset;
      if (index < 0)
         index += TABLE_SIZE;
   }
}
/* MODIFIED This is the modified expansion routine. It
 * must now check for the
 * CLEAR_TABLE code and know when to increment the code
 * size.
 */
expand(FILE *input, FILE *output)
{
   unsigned int next_code=FIRST_CODE;
   unsigned int new_code;
   unsigned int old_code;
   int character,
   counter=0,
   clear_flag=1;
    /* ^ Need to clear the code value array */
   unsigned char *string;
   char *decode_string(unsigned char *buffer, unsigned
      int code);

   printf("Expanding\n");

   while((new_code=input_code(input)) != TERMINATOR) {
      if (clear_flag) {
       /* ^ Initialize or Re-Initialize */
         clear_flag=0;
         old_code=new_code;
      /* ^ The next three lines have been moved */
         character=old_code;  /* from the original */
         putc(old_code,output);
         continue;
      }
      if (new_code == CLEAR_TABLE) {
       /* ^ Clear string table */
         clear_flag=1;
         num_bits=INIT_BITS;
         next_code=FIRST_CODE;
         putchar('C');
         max_code = MAXVAL(num_bits);
         continue;
      }
      if (++counter == 1000) {           /* Pacifier */
         counter=0;
         putchar('.');
      }
      if (new_code >= next_code) {
       /* ^ Check for string+char+string */
         *decode_stack=character;
         string=decode_string(decode_stack+1,old_code);
      }
      else
         string=decode_string(decode_stack,new_code);

      character = *string;
       /* ^ Output decoded string in reverse */
      while (string >= decode_stack)
         putc(*string--,output);

      if (next_code <= max_code) {
       /* ^ Add to string table if not full */
         prefix_code[next_code]=old_code;
         append_character[next_code++]=character;
         if (next_code==max_code && num_bits < MAX_BITS) {
            putchar('+');
            max_code = MAXVAL(++num_bits);
         }
      }
      old_code=new_code;
   }
   putchar('\n');
}
/* UNCHANGED from original
 * Decode a string from the string table, storing it in a
 * buffer.
 * The buffer can then be output in reverse order by the
 * expansion program.
 */
char *decode_string(unsigned char *buffer, unsigned
   int code)
{
   int i=0;

   while(code > 255 ) {
      *buffer++ = append_character[code];
      code=prefix_code[code];
      if (i++ >= 4000 ) {
         printf("Error during code expansion\n");
         exit(1);
      }
   }
   *buffer=code;
   return(buffer);
}

/* UNCHANGED from original
 * Input a variable length code.
 */
unsigned input_code(FILE *input)
{
   unsigned int return_value;
   static int input_bit_count=0;
   static unsigned long input_bit_buffer=0L;

   while (input_bit_count <= 24 ) {
     input_bit_buffer |= (unsigned long) getc(input) <<
         (24 - input_bit_count);
     input_bit_count += 8;
   }
   return_value=input_bit_buffer >> (32-num_bits);
   input_bit_buffer <<= num_bits;
   input_bit_count -= num_bits;
   return(return_value);
}
/* MODIFIED Output a variable length code.
 */
output_code(FILE *output, unsigned int code)
{
   static int output_bit_count=0;
   static unsigned long output_bit_buffer=0L;

   output_bit_buffer |= (unsigned long) code << (32 -
    num_bits - output_bit_count);
   output_bit_count += num_bits;
   while (output_bit_count >= 8) {
      putc(output_bit_buffer >> 24, output);
      output_bit_buffer <<= 8;
      output_bit_count -= 8;
      bytes_out++;
       /* ^ ADDED for compression monitoring */
   }
}


Token Reconstruction
   by Tim Swenson

   Through work I have come to know one Alex Bocast.  When
I first met Alex, he mentioned working on a pattern
matching algorithm, which he calls Token Reconstruction,
and said that he was pursuing a patent on it.  A few months
ago I saw Alex again and queried him on how his patent was
going.  After informing me that it had been granted, he
gave me some brief details on the algorithm.  Since he had
a copy of the patent with him, he provided me a copy and
challenged me to work out the code.

    Now, patent documents are not the easiest things to
read, but there were some easy-to-follow flow charts that
really helped.  After a few days (with many interruptions),
I had the program running and generating the same results
listed in the patent.

    The patent is titled "Method and Apparatus for
Reconstructing a Token from a Token Fragment."  The
application for which the patent was sought is this: given
an incoming word/string/token, find the word/string/token
it matches most closely.  The incoming word/token may be
misspelled or have lost some characters in transmission, so
a pattern match must be made.

   The core of the algorithm is a fast pattern match that
determines the percentage of likeness between two tokens,
the unreconstructed token and the vocabulary token.  The
algorithm was designed to be computationally fast.  The
algorithm appears not to be robust, but Alex has run some
tests to show that it produces good results.  Other
algorithms are more robust, but at the cost of being
computationally slow.

   Alex best described the algorithm this way:  both
tokens are compared from the left (beginning/forward) and
right (end/rear).  A "vector" is calculated during both
matches and a calculation is made to see if forward and
rear "vectors" are pointing toward each other, as in the
vocabulary token.  The "vectors" will diverge if there are
few matches.  The results are then calculated given a
tolerance.  The tolerance defines how "tight" you want the
algorithm to be.  Tolerance usually goes up for longer
strings.

   The algorithm described in the patent can have many
variations.  It can be fast or slow, forward-biased,
rearward-biased, or balanced.  I picked the slow and
balanced variation.

   I ran some tests with this algorithm versus the
Ratcliff/Obershelp Pattern Matching algorithm published in
the first QHJ.  Both algorithms have been written in
SuperBasic.  My tests showed the Bocast algorithm to be
faster but less correct, and the R/O algorithm to be slower
but more correct.

   After I ran these tests, I provided Alex with the draft
of my document and my source code.  I sent him three pages
of text and he returned six pages of comments.

   Alex feels that it is not valid to compare individual
results from different algorithms; it is better to compare
the overall results of which token/word they determine to
be the correct one.  It's not how one determines the result
that matters, as long as the result is correct.  Alex does
have a good point.

   He prefers to describe it as:  "The measure returned by
Token Reconstruction, called the Reconstruction Index, is
the relative likelihood that a vocabulary token V is a
reconstruction of the unreconstructed token U, given a
tolerance for error T."

   So, I will not reveal my results, mostly because of the
above and because my code needed a few changes that would
affect my initial results.  Instead I'll list some results
that Alex has come up with in testing his algorithm against
others.

     Technique              521-Word  1142-Word  1872-Word

Token Reconstruction           80%       78%         75%
Minimum Edit Distance          64%       62%         60%
Simple Correlation Matrix
  Dot Product                  58%       54%         52%
  Hamming Distance             69%       68%         67%
  Cosine Distance              76%       75%         74%
SVD Correlation Matrix
  Cosine Distance              81%       76%         74%

   These results are from testing published in
Communications of the ACM, May 1992.  Personally, I'm not
familiar with the various algorithms mentioned.

   Here are some results testing Token Reconstruction
versus the spelling checkers in some popular programs.

 Position of Intended word in list of
    suggested Replacements.
                       1     2    3    4    5    6+
Token Reconst.        78%   89%  94%  95%  95%  97%
Alis/Proximity        77%   85%  85%  87%  87%  89%
WordPerfect 5.0       64%   80%  82%  84%  85%  88%
Enable                72%   78%  80%  80%  80%  80%

   Alex had a few comments about my initial results that do
show some more insight into the algorithm.  Alex took the
examples in my test and provided these results:

       U vs. V                   f(U,V,T)        f(V,U,T)
pencilvania vs. pennsylvania    67.5% (T=2)     44.4% (T=3)
misisipi vs. mississippi        73.9% (T=1)     9%    (T=4)
*abcdef vs. abcdef              60.6% (T=1)     89.7% (T=1)

U = unreconstructed token
V = vocabulary token
T = tolerance
  The second column of results ( f(V,U,T) ) is the same
algorithm with the two tokens reversed.


   One thing that I did discover about Token Reconstruction
when implementing, is that it has two almost-worst cases and
one worst case.  But, in Alex's testing, he showed that by
reversing the strings ( switching U and V ) the worst case
goes away.  Alex has asked me not to reveal the worst case,
as this is one way he can determine patent infringement.
I'll leave it up to the reader to figure it out.  I must
point out that the worst case is very unlikely to come up
in real life.  It only causes the algorithm problems if a
formal proof is applied to it.

Now for the legal stuff:
   Mr. Bocast has assured me that anyone is free to use this
patented algorithm for purely personal, non-commercial use
without paying a license fee.  Anyone who requests to use
the algorithm for such non-commercial personal use will be
gladly granted a royalty-free exploratory license by writing
that request to Design Services Group, Inc., 1357 MacBeth
Street, McLean, VA 22102 USA.  Alex will provide mail/phone
support to these licensed users -- the algorithm is simple
to code but requires care to configure correctly for optimum
performance on any given problem.
   However, all rights are reserved for all commercial
applications.  This means that software that contains this
patented algorithm may not be distributed in any way to any
person without a licensing agreement to cover such
distribution.  In other words, this algorithm is not in the
public domain.

   If your application is dependent on speed, then the
Bocast algorithm may fit the bill.  Just keep in mind the
caveats that come with it.  For those interested, I can
make copies of the patent, or they can be ordered from the
U.S. Patent Office for $1.50 each.  The patent number is
5,008,818.


** bocast_ssb
**   Implementation of Bocast Pattern Matching Algorithm
**     described in U.S. Patent 5,008,818
**     Method and Apparatus for Reconstructing a Token From
**       a Token Fragment
**

INPUT "Enter Check String :";s$
INPUT "Enter Vocabulary String :";v$
INPUT "Enter Tolerance :";tolerance

** Optional Code to make the shorter
** string as s$
** IF LEN(s$) > LEN(v$) THEN
**    temp$ = s$
**    s$ = v$
**    v$ = temp$
** END IF

j = 1 : jlim = LEN(s$) : jlen = jlim
k = 1 : klim = LEN(v$) : klen = klim : lastk = 0
forward_index = 0
rear_index = 0

PRINT s$;" vs. ";v$

** Front-Emphasis IDA/FCR

REPEAT loop
   IF  (j <= jlim) AND (k <= klim) THEN
      IF s$(j) = v$(k) THEN
         j = j + 1
         lastk = k
      END IF
      k = k + 1
   ELSE
      EXIT loop
   END IF
END REPEAT loop

** Front-Emphasis RDA/LCR

REPEAT loop
   IF (j <= jlim) AND (lastk < klim) THEN
      IF s$(jlim) = v$(klim) THEN
         jlim = jlim - 1
      END IF
      klim = klim - 1
   ELSE
      EXIT loop
   END IF
END REPEAT loop

IF ( jlim < LEN(s$)) OR ( j > 1 ) THEN
   IF (jlim - j) < tolerance THEN
      jj = j + jlim
      kk = k + klim
      IF jj < kk THEN
         forward_index = jj/kk
      ELSE
         forward_index = kk/jj
      END IF
   END IF
END IF

j = 1 : jlim = LEN(s$)
k = 1 : klim = LEN(v$) : lastk = klen + 1

** Rear-Emphasis

REPEAT loop
   IF (jlim > 0) AND (klim > 0) THEN
      IF s$(jlim) = v$(klim) THEN
         jlim = jlim - 1
         lastk = klim
      END IF
      klim = klim - 1
   ELSE
      EXIT loop
   END IF
END REPEAT loop

** Rear-Emphasis RDA/LCR

REPEAT loop
   IF (j <= jlim) AND (k < lastk) THEN
      IF s$(j) = v$(k) THEN
         j = j + 1
      END IF
      k = k + 1
   ELSE
      EXIT loop
   END IF
END REPEAT loop

IF (jlim < LEN(s$)) OR ( j > 1) THEN
   IF (jlim - j) < tolerance THEN
      jj = jlen*2 - j - jlim + 2
      kk = klen*2 - k - klim + 2
      IF jj < kk THEN
         rear_index = jj/kk
      ELSE
         rear_index = kk/jj
      END IF
   END IF
END IF

answer = (forward_index + rear_index)/2
PRINT "Forward Index is ";forward_index
PRINT "Rear Index is    ";rear_index
PRINT "Result is        ";answer
