
				BASE64 GUIDE FOR UUENCODING TYPES
                             _______________________________________


	This brief guide to working with Base64 encoded images found on USENET newsgroups,
	along with the free decoder that should accompany it, was written and posted in
	response to the many "What the @#$%* is Base64" postings one sees in newsgroups
	that carry a lot of binaries. While UUEncoding is familiar to most DOS/Windows
	users on the Net, Base64 encoding is something of a mystery outside the UNIX world.
	Hopefully, this document will help with that. There is also a lack of intelligent
	Windows based software for Base64 encoding and decoding. If you got this file from
	an archive called SJHB64.ZIP, there's a free decoder in the archive that tries to
	address that need.



	WHAT THE !@#$%* IS BASE64!?
	___________________________

	Like UUencoding, Base64 is a method of taking binary data like image, audio, video,
	PostScript or executable files and encoding it into 7 bit ASCII, a text format. It
	does this by splitting every 3 bytes (24 bits) of the binary into four 6 bit pieces
	and writing each piece as one of 64 printable characters, so the encoded file is
	about a third larger than the original. Once the binary is encoded, you should be
	able to load it in a text editor like Windows Notepad and view the ASCII encoded
	data. Mail and news reader programs that cannot read or forward normal binary data
	can still send encoded binaries through Internet mail gateways. The recipient then
	has to convert (decode) the ASCII data back to its original binary format.
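	For the curious, the conversion itself is simple enough to sketch. The Python
	function below is a minimal illustration only (it is not the decoder from
	SJHB64.ZIP): it packs every 3 bytes into a 24 bit number, cuts that into four
	6 bit pieces, looks each piece up in the 64 character alphabet, and pads a short
	final group with "=". The name encode_base64 is made up for this example; the
	standard base64 module is used only to double-check the result.

```python
import base64

# The 64-character alphabet used by MIME Base64.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def encode_base64(data: bytes) -> str:
    """Encode bytes as Base64: every 3 input bytes become 4 output characters."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-filling a short final chunk.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit indexes into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 leftover byte gives 2 real characters, 2 give 3; pad the rest with "=".
        used = len(chunk) + 1
        out.append("".join(chars[:used]) + "=" * (4 - used))
    return "".join(out)

if __name__ == "__main__":
    sample = b"GIF87a"                     # the first six bytes of a GIF file
    print(encode_base64(sample))           # R0lGODdh
    assert encode_base64(sample) == base64.b64encode(sample).decode("ascii")
```

	Note that "R0lGODdh" is exactly the start of the encoded GIF shown later in this guide.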



	WHY DOESN'T EVERYBODY USE UUENCODING?
	_____________________________________

	You've probably run across UUEncoded files that will not decode properly. While this
	is often due to buggy encoding software or bad phone lines that hack files in
	transmission, the way UNIX mail gateways work is also a problem. UUEncoding uses
	punctuation characters (commas and semicolons, for example) as well as letters and
	digits, and many gateways strip or translate non-alphanumeric characters in mail
	packets, making a UUEncoded file unusable. Your decoder will often appear to decode
	it normally, but you'll find the resulting binary file, be it JPEG, AVI, etc., is
	corrupted. Base64 uses only upper and lowercase letters (A-Z and a-z), the digits
	0-9 and the two symbols + and /, plus = for padding at the very end, so this problem
	is largely avoided. This is one reason why Base64 is among the most popular methods
	for encoding binaries on UNIX systems.
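	To see the difference for yourself, here's a small Python sketch that encodes the
	same 45 bytes both ways, using the standard binascii module's uuencoder and the
	base64 module, and lists any characters that fall outside letters, digits, +, /
	and =. The helper name risky_chars is made up for this example, and the output
	says nothing about what any particular gateway actually strips.

```python
import base64
import binascii
import string

# Letters, digits, the two Base64 symbols and the "=" pad character.
SAFE_CHARS = set(string.ascii_letters + string.digits + "+/=")

def risky_chars(encoded: str) -> set:
    """Characters in `encoded` outside SAFE_CHARS, ignoring line breaks."""
    return {c for c in encoded if c not in SAFE_CHARS and c not in "\r\n"}

data = bytes(range(45))                              # 45 bytes fill one UU line
uu_line = binascii.b2a_uu(data).decode("ascii")      # classic uuencode line
b64_text = base64.b64encode(data).decode("ascii")    # same bytes in Base64

print("UUencode extras:", repr("".join(sorted(risky_chars(uu_line)))))
print("Base64 extras:  ", repr("".join(sorted(risky_chars(b64_text)))))
```

	The uuencoded line shows a space and assorted punctuation; the Base64 line shows
	nothing extra.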

	Base64 is also the encoding method of choice for embedding binaries in UNIX mail files.
	MIME (Multipurpose Internet Mail Extensions) is the most popular UNIX mail format, and
	the MIME specification supports embedded Base64 encoded binaries. One of MIME's most
	powerful features is its support for multiple embedded binaries in a single message.
	Like UUE files, MIME files are 7 or 8 bit ASCII text, so you can even embed UUEncoded
	binaries in a MIME file as 7 bit ASCII data.
	If you load a MIME file into a text editor, you'll see something like the fragment below.
	The first five lines are a typical USENET header tacked on to the beginning of the MIME
	file by the uploader's news reader. The two lines after them identify the message as
	MIME and name the boundary string that separates its parts:



	Newsgroups: comp.binaries.images
	From: Somebody <somebody@some.com>
	Subject: I can't decode this image..
	Message-ID: <3157F28F.4E1@some.com>
	Date: Tue, 26 Mar 1996 13:35:11 GMT
	MIME-Version: 1.0
	Content-Type: multipart/mixed; boundary="------------5213E5135CD"

	This is a multi-part message in MIME format.

	--------------5213E5135CD
	Content-Type: text/plain; charset=us-ascii
	Content-Description: "Is this image corrupt?"
	Content-Transfer-Encoding: 7bit

	Hi..I'm a programmer working on a jpeg decoder and I can't get it to
	decode this particular image. Is the image corrupt or is it my program??

	Thanks,

	Scott

	--------------5213E5135CD
	Content-Type: image/jpeg
	Content-Description: "Base64 encode of hacked.jpg"
	Content-Transfer-Encoding: base64
	Content-Disposition: inline; filename="HACKED.JPG"

	/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRof
	Hh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwh
        [More Base64 data follows] 


	The two lines above, beginning with "/9j/", are the start of the Base64 encoded binary,
	in this case a .JPG image file. The lines that start with "---------" are boundaries
	which separate the header from the text message and in turn separate the text message
	from the Base64 encoded binary. The binary and text sections each have a subheader
	that describes the type of data the mail reader is about to encounter. When an embedded
	binary is present, its subheader usually includes a Content-Disposition field (defined
	in an extension to the MIME specification) containing the keyword INLINE or ATTACHMENT
	and the name of the embedded file. Interestingly, the MIME specification supports
	multiple messages and binaries within the same file, and different types of binaries,
	such as a .WAV audio file and an .MPEG video file, can be combined in the same file.
	One common problem Windows and DOS users run into is multiple binaries: most of the
	shareware Windows based decoders that claim to support Base64 are unable to extract
	more than one binary from a message file containing several. The only way you'll know
	if a MIME message file has more binaries than your decoder can handle is by loading the
	file into a text editor and manually finding each subheader. If your decoder can't
	handle multiple binaries, your only choice is to cut and paste the binaries into
	separate files and then try to decode them.
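	If you're comfortable with a little scripting, the cut-and-paste step can be avoided.
	The sketch below is one way to do it in Python, assuming the message is a well-formed
	MIME file; it uses the standard email package to walk every part and returns each
	Base64 encoded binary separately. The function name extract_parts and the fallback
	"UNKNOWN" names are illustrative choices, not part of any standard.

```python
import email

def extract_parts(raw_message: str):
    """Return (filename, data) for every Base64 encoded part of a MIME message."""
    msg = email.message_from_string(raw_message)
    found = []
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue                      # containers only hold other parts
        encoding = part.get("Content-Transfer-Encoding", "").strip().lower()
        if encoding != "base64":
            continue                      # skip plain text and other encodings
        # Use the name from Content-Disposition (or Content-Type) if there is one.
        name = part.get_filename() or f"UNKNOWN{index}"
        found.append((name, part.get_payload(decode=True)))
    return found
```

	Each returned name comes straight from the message, so check it before using it to
	write a file.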


	Even though millions of Windows and DOS users are using the Internet, it's important 
	to remember that the servers, hubs and gateways that actually ARE the net are still
	predominantly UNIX sites that support the MIME standard, which in turn uses Base64
	for binary encoding. It's safe to assume for the time being that this encoding method
	won't go away, so the remainder of this document will focus on how to work with Base64
	encoded binaries.


	
	TYPES OF FILES CONTAINING BASE64 ENCODED BINARIES
	_________________________________________________

	Base64 encoded binaries are generally found in three forms on Usenet:

	a) Embedded in a MIME or other type of Email file as shown above. These are
	   usually created in an EMail program and the user posts the resulting file
	   to a Usenet newsgroup. Most UUEncoders which support Base64 will include
           some or all of the header fields found in MIME files.

	b) Appended to a short USENET header. This type of file usually comes
	   from a News Reader program. Unlike MIME files, the header does not always
	   conform to a specified format. Here's an example:


	   From: Somebody <somebody@some.com>
   	   Newsgroups: comp.binaries.images
	   Subject: I can't decode this image..
	   Date: Tue, 26 Mar 1996 13:35:11 GMT
	   Message-ID: <3157F28F.4E1@some.com>


           R0lGODdhrgCcAPcAAA4EBBMIBxgMBBMIDh0IBx0QCygMBy0MDDwKCUINCVEQB1sMB0IUClsQ
           DkcYC0wmFlsYDmIZCF8UEWYYEnYSCW4cDnkbDIgZC18lEXIlEIMlC44hFXIpFI0pEXsuF4JE
           [More Base64 data follows]

	   As in our sample MIME file, the two lines above, beginning with "R0lGOD", are the
	   start of the Base64 encoded binary, in this case a .GIF image file. Notice that
	   there is no filename or file type specified in the header. This is often the
	   case with files created by news readers: most of them handle all data as if it's
	   a text message and have no awareness of encoded binaries. Most users who post
	   binaries from a news reader put a filename in the description you see when you
	   browse newsgroup postings. This, however, is not much help to a program that has
	   to decode this stuff. Most Base64 decoders will take a file like this, extract
	   and decode the binary, and give it a generic name (like UNKNOWN) with no
	   extension. Unless the user remembers the file type from the newsgroup
	   description, he or she won't even know what program to load the decoded
	   binary into.

	c) As raw Base64 data. Sometimes you'll encounter a newsgroup posting that looks
           like this:  


  	   UklGRoI0AABXQVZFZm10IBAAAAABAAEA8FUAAPBVAAABAAgAZGF0YV00AAB3dXNxb25ubm5vcHFz
           dXd5e32AgoWHioyOkJGRkpOUlZaWl5eYmJiYmJiZmZqampqZmZiXlpSTkY6MiYaDf3x5dnNxb21r
           [More Base64 data follows]
	   
	   This is simply encoded Base64 data with no header, in this case a .WAV file.
	   The decoding program has no idea what type of binary this is and if the person who 
	   posted it didn't specify the filename in the Newsgroup description, neither do
	   you. Sometimes users create these themselves by copying the raw data and pasting
	   it into their news reader for posting with no header information. Files like this,
	   with incomplete or missing headers, drive end users crazy, as they are often left
	   to guess what type of binary they just decoded. Base64 encoded binaries are
	   usually one of the following MIME content types:

	          1) application/octet-stream  (usually executables or archive files)
	          2) application/postscript    (formatted text and EPS files)
	          3) image/jpeg                (.JPG images)
	          4) image/gif                 (.GIF images)
	          5) image/x-bmp               (.BMP bitmap images)
	          6) video/mpeg                (compressed video data)
	          7) audio/x-wav               (.WAV sound files)
	          8) audio/x-voc               (.VOC sound files)

	          Types 1-4 and 6 are defined in the original MIME specification; the
	          X- subtypes are unofficial names that are nonetheless in wide use.
	          (This list does not include Mac-specific formats and some more exotic
	          data types added when the MIME spec was recently updated.)

	   God only knows why, but many encoders have an option to encode a binary file without 
	   header data. If for some reason you use this option, you'll wind up with a raw Base64
	   file like the one above. Usually, a program courageous enough to decode this will write 
	   a file with a generic name and no extension and it's up to you to figure out what it is.
           A feature lacking in the Windows based decoders is the ability to "recognize" the common 
	   binary types in encoded form when there's no header information available. If you have 
	   an "unknown" on your hard drive, load the original Base64 file into a text editor, take 
	   a look at the list of encoded binary signatures below and see if you can match your 
	   mystery file with a signature. If you can, all you have to do is rename the decoded
	   file in order to (hopefully) use it.
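	The renaming trick above can also be done on the decoded file itself, since the
	signature bytes are back in their original form after decoding. Here's a minimal
	Python sketch, assuming a small hand-picked table of leading bytes ("magic numbers")
	for the types this guide covers; the names guess_extension and rename_unknown are
	hypothetical, and a short signature such as BMP's two bytes can produce false
	matches.

```python
from pathlib import Path

# Leading bytes for the types discussed in this guide: a small sample, not a registry.
MAGIC = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"BM", ".bmp"),                  # only two bytes, so false matches are possible
    (b"MZ", ".exe"),
    (b"PK\x03\x04", ".zip"),
    (b"%!PS-Adobe", ".eps"),          # also matches plain PostScript
    (b"\x00\x00\x01\xb3", ".mpg"),    # MPEG video sequence header
    (b"\x00\x00\x01\xba", ".mpg"),    # MPEG system stream pack header
]

def guess_extension(head: bytes):
    """Return a likely extension for a file starting with `head`, or None."""
    if head[:4] == b"RIFF":
        # RIFF is a container: bytes 8-11 say what kind (WAVE, AVI, ...).
        return {b"WAVE": ".wav", b"AVI ": ".avi"}.get(head[8:12])
    for magic, ext in MAGIC:
        if head.startswith(magic):
            return ext
    return None

def rename_unknown(path: Path) -> Path:
    """Add a guessed extension to a decoded file that has none."""
    with path.open("rb") as f:
        ext = guess_extension(f.read(16))
    if ext is None or path.suffix:
        return path                   # leave it alone if unsure or already named
    new_path = path.with_suffix(ext)
    path.rename(new_path)
    return new_path
```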

	
	DECODING BASE64 ENCODED DATA 
	____________________________

	To do this, obviously you need a decoder. If you're a UNIX type, you're in luck:
	there are a number of quality decoders available, many of them free. If you're a
	Windows user, your Base64 choices are limited to DOS command line decoders that are
	awkward to use or Windows based UUDecoders that don't support Base64 very well,
	unless you use a MIME compliant mail program. Eudora and Pegasus are the best known
	Windows based mailers.
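	For anyone who would rather roll their own, here is a minimal decoder sketch in
	Python using the standard base64 module. It assumes the posting holds a single
	binary in one of the forms described earlier (MIME, a short news header, or raw
	data) and simply skips everything before the first long run of Base64 lines. The
	40 character threshold and the function name decode_posting are assumptions made
	for this example, not part of any standard.

```python
import base64
import re

# First data line: only Base64 characters and at least 40 of them, so header
# lines (which contain ':' and spaces) and short words like "Thanks" are skipped.
START_LINE = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
# Later data lines may be shorter (the last line of the data usually is).
MORE_LINE = re.compile(r"[A-Za-z0-9+/]+={0,2}")

def decode_posting(text: str) -> bytes:
    """Decode the first run of Base64 lines found in a posting."""
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        pattern = MORE_LINE if chunks else START_LINE
        if pattern.fullmatch(line):
            chunks.append(line)
        elif chunks:
            break                     # first non-data line ends the binary
    if not chunks:
        raise ValueError("no Base64 data found")
    return base64.b64decode("".join(chunks))
```

	The sketch expects a blank or non-Base64 line after the data; a signature word
	pasted directly under the last data line would be swallowed and spoil the decode.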
	

	WHAT IF THE DECODED FILE DOESN'T WORK?
	______________________________________

	There are a number of reasons this could happen. The encoded data could have
	been corrupt to start with, could have been hacked by one of the many buggy
	encoders out there, or could have been corrupted by a bad phone line during
	uploading or downloading. If the Base64 data itself is corrupted, decoders will
	often, though not always, show an error message of some kind. If, however, the
	source data (the original binary file) was corrupt, most decoders have no way of
	knowing this and will merrily decode hacked data which you won't be able to use.
	Some commercial mailers run CRC (Cyclic Redundancy Check) tests on decoded binaries
	to verify their integrity. Even this method is not foolproof: a CRC can only confirm
	that the decoded file matches what the sender encoded, so a file that was already
	damaged before encoding will still pass. That is especially frustrating with
	compressed data like JPEG or GIF images or MPEG video, where a single bad byte can
	spoil everything after it.
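	If the poster does publish a CRC for the original file (a courtesy some people
	extend, not something Base64 itself provides), checking it is easy. Below is a
	minimal Python sketch using zlib.crc32, the same 32 bit CRC used by ZIP; the
	function names are made up for this example. A match only proves your decoded
	file is identical to the file the poster checked, not that the poster's file was
	any good.

```python
import zlib
from typing import Iterable

def crc32_hex(chunks: Iterable[bytes]) -> str:
    """CRC-32 of a sequence of byte blocks, as 8 uppercase hex digits."""
    crc = 0
    for block in chunks:
        crc = zlib.crc32(block, crc)      # carry the running value between blocks
    return f"{crc:08X}"

def crc32_of_file(path: str) -> str:
    """CRC-32 of a file, read 64K at a time so large binaries need little memory."""
    with open(path, "rb") as f:
        return crc32_hex(iter(lambda: f.read(65536), b""))
```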

	Another common problem is the way in which some news readers and EMail programs
	interpret the data they're handling. Some programs will convert 7 bit encoded ASCII
	data to 8 bit or strip carriage return and/or linefeed characters, making it harder
	for your decoder to read the data properly (the EMail software used by the major online
	services often does this). If you load a file like this into a text editor, you may
	see lines that are hundreds or thousands of characters long (Base64 encoded data is
	usually arranged in lines of no more than 76 characters, the MIME limit). Sometimes
	you can recover a file like this by loading it into an editor that understands
	UNIX-style line endings (like the Windows95 Wordpad app) and saving it, which restores
	the stripped carriage returns. Many programmers' text editors can also load text files
	in binary mode, and it's worth a try when you have a file that decoded without errors
	but simply doesn't work. (Note: While Notepad always saves files as ASCII text, you
	must specify Save As Text when using Wordpad. As a rule, it's a good idea to stay away
	from word processors, since they often strip carriage returns and insert control
	characters that can throw a decoder off. Notepad also has a limitation that makes it
	less than ideal for editing Base64 files: it can't load a file larger than 40K or so,
	and many encoded binaries are larger than that.)
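	A decoder can also be made immune to mangled line breaks by ignoring them. The
	Python sketch below, assuming you've already cut away any header so only the
	encoded data remains, throws away every carriage return, linefeed, space and tab
	and re-wraps the data into CRLF-terminated lines of at most 76 characters. The
	function name repair_base64_text is hypothetical.

```python
import base64
import re

def repair_base64_text(text: str, width: int = 76) -> str:
    """Rejoin Base64 data whose line breaks were mangled, then re-wrap it."""
    # Drop every CR, LF, space and tab, whatever mix the mailer left behind.
    data = re.sub(r"\s+", "", text)
    # Re-split into lines of at most `width` characters with DOS-style CRLF endings.
    lines = [data[i:i + width] for i in range(0, len(data), width)]
    return "\r\n".join(lines) + "\r\n"

if __name__ == "__main__":
    mangled = base64.b64encode(bytes(range(200))).decode("ascii")   # one long line
    fixed = repair_base64_text(mangled)
    assert base64.b64decode(fixed) == bytes(range(200))
```

	Python's own base64.b64decode already discards characters outside the Base64
	alphabet by default, line breaks included, so re-wrapping matters mainly for
	decoders that are pickier.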



	USING AN EDITOR WITH PROBLEM BASE64 FILES
	_________________________________________

	One common situation where a text editor comes in handy is when you've decoded a file
	and don't know what type of binary it contains. If, for whatever reason, a decoder
	can't extract a filename from an encoded file, it will write a file with a default
	name and no extension. Now that you've decoded this mystery file, how do you figure
	out what it is? Sometimes the only way is to load the Base64 encoded file in a text
	editor. Once it's up there on your screen, look for a header. MIME compliant files
	will normally have a line in the header that identifies the file type, such as:

             Content-Type: image/jpeg 
	
	If you find this, you know your mystery file is a JPEG file, and all you have to do is
	add the extension .JPG to the decoded binary. MIME compliant headers often list the
	full filename of the encoded binary as well. Look for a line like:

	     Content-Disposition: inline; filename=

	or   Content-Disposition: attachment; filename=
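	Reading these header lines can be automated too. Here's a Python sketch, assuming
	you've copied one part's header block (the lines above the encoded data) into a
	string; it uses the standard email package's HeaderParser, prefers the filename=
	value, and otherwise builds a name from the content type. The EXTENSIONS table and
	the name_from_headers function are illustrative and cover only the types listed
	earlier in this guide.

```python
from email.parser import HeaderParser

# Extensions for the content types listed earlier in this guide (hand-made table).
EXTENSIONS = {
    "image/jpeg": ".jpg", "image/gif": ".gif", "image/x-bmp": ".bmp",
    "video/mpeg": ".mpg", "audio/x-wav": ".wav", "audio/x-voc": ".voc",
    "application/postscript": ".eps", "application/octet-stream": "",
}

def name_from_headers(header_text: str, default: str = "UNKNOWN") -> str:
    """Pick a filename from a MIME part header: filename= first, then Content-Type."""
    headers = HeaderParser().parsestr(header_text)
    # get_filename() reads Content-Disposition, falling back to name= on Content-Type.
    filename = headers.get_filename()
    if filename:
        return filename
    # get_content_type() lowercases the type and defaults to text/plain if missing.
    return default + EXTENSIONS.get(headers.get_content_type(), "")
```

	As with any name taken from a message, check the result before writing a file with it.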

	If the encoded file has an incomplete or non-existent header, you'll need to do some
	detective work. Most common binary file types begin with a fixed signature (often
	called a "magic number") unique to that file type. Because the signature sits at the
	very start of the file, it is always encoded the same way, so although it looks
	different after Base64 encoding, the encoded version is still unique to that file
	type. You must find the start of the Base64 encoded data and check the beginning of
	the first line. Chances are, you'll find one of the following signatures:


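		FILE TYPE                    ENCODED SIGNATURE
		_________                    _________________

		JPEG                         /9j/  (/9j/4AAQSkZJRgABAQ for JFIF files)

		GIF                          R0lGODdh (GIF87a), R0lGODlh (GIF89a)

		BMP                          Qk

		WAV                          UklGR (any RIFF file, including AVI)

		MPEG                         AAABu (system stream), AAABs (video stream)

		EXE                          TV

		ZIP                          UEsDB

		PostScript/EPS               JSFQUy1BZG
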

	This list is far from complete, but these are among the most common types.
	If you know where to find the start of the encoded data and can identify
	one of these signatures, then you've solved the mystery.
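	The lookup in the table above is easy to automate. Here's a minimal Python sketch,
	assuming you pass it the first line of encoded data (not a header line); the
	function name identify is hypothetical. Because UklGR only says "RIFF file", it also
	peeks at characters 13 to 16, which encode bytes 10 to 12 of the file: "QVZF" there
	means a WAVE file and "Vkkg" an AVI file. Very short signatures like TV and Qk can
	match by accident, so treat the answer as a hint.

```python
# Encoded signatures from the table above. No prefix is the start of another,
# so the order of this list does not matter.
SIGNATURES = [
    ("/9j/", "JPEG"),
    ("R0lGOD", "GIF"),
    ("Qk", "BMP"),
    ("AAABu", "MPEG"),
    ("AAABs", "MPEG"),
    ("TV", "EXE"),
    ("UEsDB", "ZIP"),
    ("JSFQUy1BZG", "PostScript/EPS"),
]

def identify(first_data_line: str) -> str:
    """Guess a file type from the first line of Base64 encoded data."""
    line = first_data_line.strip()
    if line.startswith("UklGR"):
        # RIFF container: characters 13-16 encode bytes 10-12 of the file.
        return {"QVZF": "WAV", "Vkkg": "AVI"}.get(line[12:16], "RIFF (unknown kind)")
    for prefix, kind in SIGNATURES:
        if line.startswith(prefix):
            return kind
    return "unknown"
```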
	                  
	Another common situation where a good text editor can help is one that I call
	"Mailer Syndrome". Many EMail and news reader programs default to a "quoted"
	mode, where the program puts a greater-than sign [ > ] at the beginning of each
	line of an existing USENET posting it is responding to. If the posting happens
	to contain Base64 encoded data, some poor soul will eventually download a file
	that won't decode properly. If you load one of these files into a text editor,
	you'll see something like this:
	
           >/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsL
           >DBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/
           >2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIy
	
	Remember, apart from line breaks, Base64 encoded data should consist only of the
	upper and lowercase letters A-Z and a-z, the digits 0-9, and the + and / characters.
	The only exception is at the end of the encoded data, where you'll sometimes see one
	or two equal [=] signs; these are padding that tells the decoder the last group of
	bytes was short. If you see any other characters (like the > signs above), chances
	are that's why the file won't decode correctly.
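	Cleaning up "Mailer Syndrome" is mostly a matter of deleting the quote marks. Here's
	a minimal Python sketch, assuming the quote prefix is one or more > signs (possibly
	with spaces, as in "> >"), which is the common convention but not a universal one;
	unquote_lines and bad_lines are names made up for this example. Since > is not in
	the Base64 alphabet, no real data line can start with one, so stripping it is safe.

```python
import re

# A clean data line: Base64 characters, with "=" padding allowed only at the end.
CLEAN_LINE = re.compile(r"[A-Za-z0-9+/]*={0,2}")
# One or more leading quote marks, e.g. ">", "> >", ">>".
QUOTE_PREFIX = re.compile(r"^\s*(?:>\s*)+")

def unquote_lines(text: str) -> list:
    """Strip reply quote marks from the start of every line."""
    return [QUOTE_PREFIX.sub("", line) for line in text.splitlines()]

def bad_lines(lines) -> list:
    """Return (line number, text) for every line that isn't clean Base64."""
    return [(n, line) for n, line in enumerate(lines, 1)
            if not CLEAN_LINE.fullmatch(line.strip())]
```

	Running bad_lines on the file before and after unquote_lines shows at a glance
	whether anything besides quote marks is still in the way.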


	Thanks for your time, and hopefully we'll all see less "What the @#$%* is Base64" posts.

	

	Scott Hanrahan
	SJHDesign, Inc.

	EMail : SJHDES@ibm.com   


	
