*** *** *** *** *** *** *** *** *** *** *** ***

FAST! File Weeder 2.3 - Copyright (C) 2002,2003 Bumblebee

        Contents:

        1. Introduction
        2. Features
        3. Usage
        4. History
        5. Known Issues

*** *** *** *** *** *** *** *** *** *** *** ***


        1. Introduction

This is a file weeder using MD5 (RFC 1321). It's intended to be fast and
to use little memory. For this purpose it is optimized for win32 systems
(if you're really interested in porting it to another 32-bit system, let
me know). If you wanna send me a bug report or whatever, try to find
someone who can find me.
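As a rough illustration of the hashing step, here is a minimal Python sketch (not fweeder's own code, which is a win32 program). It computes a file's MD5 digest while reading the file in fixed-size chunks, so memory use stays the same whatever the file size. The 64 KB chunk size is an arbitrary assumption.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    Illustrative sketch only, not fweeder's implementation. The file is
    read in chunks of `chunk_size` bytes (an arbitrary choice), so only
    one chunk is held in memory at a time.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hash a small temporary file containing "abc" (an RFC 1321 test input).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abc")
        sample = tmp.name
    try:
        print(md5_of_file(sample))
    finally:
        os.remove(sample)
```

For the input "abc", the RFC 1321 test suite gives the digest 900150983cd24fb0d6963f7d28e17f72, which is what the script prints.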

This program is freeware and is provided "as is". All warranties, express
or implied, are disclaimed. This program is intended to be used, so feel
free to redistribute it in any way.

I coded this weeder as a challenge between me and my good friend
VirusBuster. Many thanks to him for his support and help while
developing this little tool.

At the current point of development, fweeder is one of the fastest file
weeders using MD5 under win32 systems.

You can get fweeder releases from the VS2000 distribution, from the
VS2000 site as an independent package, and from simtel.net (search for
'file weeder').


        2. Features

o Intended to be fast and to use little memory.

o It uses MD5 as a secure hash function.

o Has the common functions required for a file weeder (create a database,
add files to an existing database, check for duplicate files against a
database, manage different databases, delete duplicated files, ...).

o Statistical reports.

o It's a small and lightweight win32 console application.

o Support for long filenames (including spaces in paths; just put the
path between double quotes on the command line).

Other interesting things:

o VirusBuster and I try to keep our databases compatible (since
VirWeeder Plus version 1.2; more may follow).

o Easy to work with several databases.

o CRC32 + file size as an alternative 'secure' hash.

o It's able to weed inside archives (using the CRC32 + file size hash).
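The CRC32 + file size key can be sketched in Python as below (an illustration under stated assumptions, not fweeder's code). zlib.crc32 computes the standard CRC-32 used in ZIP files; pairing it with the size means two files only match when both the 32-bit checksum and the length agree. The chunk size is an arbitrary assumption.

```python
import os
import tempfile
import zlib


def crc32_size_key(path, chunk_size=64 * 1024):
    """Return the (CRC-32, size in bytes) key for the file at `path`.

    Illustrative sketch only, not fweeder's implementation. The CRC is
    updated chunk by chunk, so memory use does not grow with file size.
    """
    crc = 0
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # continue the running CRC
            size += len(chunk)
    return crc, size


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"123456789")  # the usual CRC-32 check input
        sample = tmp.name
    try:
        crc, size = crc32_size_key(sample)
        print(f"{crc:08X} {size}")
    finally:
        os.remove(sample)
```

The standard CRC-32 check value for the nine bytes 123456789 is CBF43926, so the script prints CBF43926 9.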


        3. Usage

  fweeder <command> [<switch>]

  available command list:

  -h                Little help screen

  -c <path>         Create a database for this path.

                    It writes a report with the names of the duplicated
                    files in a fairly standard format:

                    FILE_DUPLICATE is a duplicate of FILE_ORIGINAL

  -a <path>         Add files to the current database.

                    The old database is saved with the .old extension. A
                    report of duplicated files is created in the same way
                    as with the -c command.

  -v <path>         Look for duplicated files using the current database.

                    Results will be placed into found.log.

  -i <database>     Look for new files by comparing the current database
                    with an external database.

                    Results will be placed into found.log.

                    This command is intended to compare two different
                    databases. It doesn't support the -k or -0 switches.

  -m <database>     Merge the current database with an external database.

                    Works in the same way as the -a command, but a database
                    is used instead of a path. It doesn't support the -k or
                    -0 switches.

                    Both databases must use the same hash function in order
                    to be merged. You can use the -o command to check which
                    hash a database uses.

                    The old database is saved with the .old extension.

  -o                Optimizes the current database.

                    This is a must when you use databases from other
                    weeders (such as VirWeeder Plus), but not really needed
                    if you use a database created with fweeder. Even so,
                    you'll get better performance if you optimize a
                    database with a great number of files.

                    You should use it before -a, -v, -i or -m, and it
                    doesn't support the -k or -0 switches.

                    If you notice fweeder spends too much time loading the
                    database, it's a good idea to optimize it.

                    The optimized database avoids fweeder's worst case.
                    Just consider that with 1024 records, adding one file
                    takes about 10 tests in the worst case with an
                    optimized database, but hundreds of tests (around 512
                    on average) with the worst possible database.

                    You must use -x to optimize CRC32 databases and, in the
                    same way, must not use -x with MD5 databases. Fweeder
                    will report such mistakes.
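The -o numbers can be checked with a small Python model. It assumes, purely for illustration, that the database behaves like a plain binary search tree (this README doesn't describe fweeder's real structure). Inserting sorted records one by one builds a degenerate tree, the worst possible database, while making the middle record the root at every level builds a balanced one. The function insert returns how many comparisons ('tests') one addition needed.

```python
class Node:
    """One record in a binary search tree (hypothetical model, not fweeder's)."""

    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert `key` iteratively; return (root, number of comparisons made)."""
    if root is None:
        return Node(key), 0
    node, tests = root, 0
    while True:
        tests += 1
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root, tests
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root, tests
            node = node.right


def build_balanced(sorted_keys):
    """Build a balanced tree by making the middle key the root, recursively."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    node = Node(sorted_keys[mid])
    node.left = build_balanced(sorted_keys[:mid])
    node.right = build_balanced(sorted_keys[mid + 1:])
    return node


if __name__ == "__main__":
    keys = list(range(0, 2048, 2))  # 1024 sorted records

    worst = None
    for k in keys:  # sorted insertion order degenerates into a list
        worst, _ = insert(worst, k)

    balanced = build_balanced(keys)

    new_key = 2047  # larger than every stored key
    print("worst-case tree:", insert(worst, new_key)[1], "tests")
    print("balanced tree:  ", insert(balanced, new_key)[1], "tests")
```

Under these assumptions the script reports 1024 tests for the degenerate tree and 10 for the balanced one. A key smaller than every record takes 11 tests in the balanced tree (still "about 10"), and averaged over all possible new keys the degenerate tree needs roughly 512 tests.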

  available switch list:

  -d <database>     Use <database> as the current database
  -k                Delete duplicated files
  -0                Delete zero size files
  -n                Use normal priority for the fweeder thread
  -s                Detailed scan (for use with -c, -a and -v)
  -b                Beep at exit

  -x                Use CRC32 + file size instead of MD5

                    CRC32 is good enough for many people because it's fast,
                    even though it's less secure than MD5. I do not
                    recommend using it unless the machine you're using is
                    very slow or your collection is compressed.

                    Fweeder is designed to handle both MD5 and CRC32 + file
                    size databases correctly, so if you forget to put the
                    -x switch while adding files to a CRC32 database,
                    fweeder will add it (or remove it) for you as needed.

                    Note that CRC32 databases ARE NOT compatible with
                    VirWeeder databases. Fweeder is only compatible with
                    VirWeeder Plus (using MD5).
                    
  -z                Weed inside archives: ZIP, RAR, ARJ (only with -x)

                    When this switch is used, fweeder will weed inside
                    archives. This switch only works with the CRC32 hash,
                    so the -x switch is needed.

                    -k, -0 and -s will work with files inside archives. An
                    archive will be deleted if it has zero size and -0 is
                    used.
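Weeding inside ZIP archives without decompressing them is possible because each entry in a ZIP file's central directory already records its CRC-32 and uncompressed size. The minimal Python sketch below shows that idea with the standard zipfile module. It only covers ZIP (not RAR or ARJ), and grouping entries into a dictionary is an illustrative assumption, not fweeder's internals.

```python
import io
import zipfile
import zlib


def zip_entry_keys(zip_source):
    """Group ZIP members by their (CRC-32, uncompressed size) key.

    Illustrative sketch only, not fweeder's implementation. Only the
    central directory is read; member data is never decompressed.
    Directory entries are skipped.
    """
    groups = {}
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            key = (info.CRC, info.file_size)
            groups.setdefault(key, []).append(info.filename)
    return groups


if __name__ == "__main__":
    # Build a small archive in memory with one duplicated payload.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("a.txt", b"hello")
        archive.writestr("docs/", b"")
        archive.writestr("docs/b.txt", b"hello")
        archive.writestr("c.txt", b"other")

    # The stored key matches a CRC computed directly from the data.
    hello_key = (zlib.crc32(b"hello"), len(b"hello"))
    print(zip_entry_keys(buffer)[hello_key])
```

The two members holding "hello" share one key, so they would be reported as duplicates of each other.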


By default fweeder uses 'current' as the database name, so (unless you
use the -d switch) you'll find the following files:

  current.db        database
  current.log       report (-c, -a, -m)
  found.log         report (-v, -i)

Examples:

  fweeder -c c:\folder

  Creates a database for c:\folder using 'current' as the db name.

  fweeder -a c:\folder -d mdb -k

  Adds the files found in c:\folder using 'mdb' as the db name. Duplicated
  files will be deleted.

  fweeder -v c:\tmp -k

  Looks for duplicated files in c:\tmp using the current database.
  Duplicated files will be deleted.

  fweeder -c \collection -d my_collection -k -0

  Creates a database for the \collection directory on the current drive
  (the drive won't be added, so you can work with relative paths). The
  database will be my_collection.db and the report will be in
  my_collection.log. Duplicated files and zero size files will be deleted.

  fweeder -i external -d my_database

  Looks for new files in the external.db database, using my_database.db
  for the comparison. Results will be placed into found.log.

  fweeder -c d:\files -d files -k -0 -x -s

  Creates a database for the d:\files folder using 'files' as the database
  name (files.db); the report will be in files.log. Duplicated files and
  zero size files will be deleted. CRC32 will be used instead of MD5. The
  directory names will be shown on the screen due to the detailed scan
  switch.

  fweeder -c z:\collection -d packed -k -0 -x -z

  Creates a database for z:\collection using 'packed' as the database name
  (packed.db). The hash used will be CRC32 + file size, and fweeder will
  weed inside archives. Duplicated and zero size files will be deleted.

  Command/switch order is flexible, so you can use:

  fweeder -k -c z:\collection -0 -x -d packed -z
  
  and the result will be the same.
  
  fweeder -c \files 2> warning.txt
  
  Creates a database for \files. All warning/error messages are redirected
  into the file warning.txt. That's useful if you wanna track zero size
  files or bad archives.
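The report line format used by -c and -a (FILE_DUPLICATE is a duplicate of FILE_ORIGINAL) and the -i and -m commands can be modelled in a few lines of Python. This is a hedged sketch: it assumes an in-memory dictionary from hash to the first path seen, while the real .db file format (shared with VirWeeder Plus) isn't described here. The hashes are passed in precomputed, and the "digest-N" strings are hypothetical placeholders.

```python
def add_files(database, entries, report):
    """Add (path, digest) pairs to `database`, a dict of digest -> path.

    Hypothetical model, not fweeder's code. The first file seen with a
    digest becomes the original; later files with the same digest are
    reported as duplicates and are not stored again.
    """
    for path, digest in entries:
        original = database.get(digest)
        if original is None:
            database[digest] = path
        else:
            report.append(f"{path} is a duplicate of {original}")
    return database


def new_files(current, external):
    """-i style check: paths in `external` whose digest `current` lacks."""
    return sorted(path for digest, path in external.items()
                  if digest not in current)


if __name__ == "__main__":
    report = []
    current = add_files({}, [
        (r"c:\folder\a.txt", "digest-1"),
        (r"c:\folder\b.txt", "digest-2"),
        (r"c:\folder\copy of a.txt", "digest-1"),
    ], report)
    print("\n".join(report))

    external = {"digest-2": r"x:\b.txt", "digest-3": r"x:\new.txt"}
    print(new_files(current, external))

    # -m style merge: feed the external database's records through add_files.
    merged = ((path, digest) for digest, path in external.items())
    add_files(current, merged, report)
    print(len(current), "records after merging")
```

Keeping only the first path per hash is what lets a merge report duplicates "in the process", as the -m description says.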

IMPORTANT: Note that '.db' is the standard extension for databases and
fweeder appends it to the name given with the -d switch. If you wanna use
VIRWEEDP.CRC, just rename it to VIRWEEDP.DB. Note also that VirWeeder
Plus can use several hash functions, so be sure the database was
generated using MD5.


        4. History

2.3 - released 29 January 2003

o Note that you cannot weed the files fweeder itself is using (databases
and logs). Trying it should report an error (unable to open file, and so
on), and fweeder will continue without problems. I've added more
descriptive error messages (a description instead of an error code).

o Little fixes into RAR routines.

o The -k switch has been reviewed for use with -v. It used to delete ALL
duplicated files, not only those that were duplicates of files in the
database. That was kinda confusing. Now, when it's used along with -v,
-k only deletes files that are duplicates of entries in the selected
database. With the rest of the commands the switch works as usual.

o Severe bug fixed in the ARJ routines. In some cases fweeder crashed due
to a memory leak while parsing the ARJ structure. The archive was not
damaged, anyway. Thanks to VirusBuster for his help while testing the
whole thing.

o Bug fix in -v: the safety check was enabled even without the kill switch.

2.2 - released 14 January 2003

o Added a safety check to avoid processing (with kill) a file against
itself. Now fweeder won't delete the file that is in the database.

o RAR and ARJ support added (scan, -k and -0). Now you can weed inside
RAR and ARJ archives (as well as ZIP archives, supported since the
previous version) with the -z switch.

RAR volumes are not supported (moreover, fweeder only checks the .RAR
extension to recognize RAR archives). RAR archives tested with RAR 3.00
beta 4 (4 Mar 2002).

Even though fweeder manages ARJ archives fine, this format is kinda
outdated, and I do not recommend using it. There are some free
applications able to manage ZIP files here:
http://www.info-zip.org/pub/infozip/. Well, it's true that RAR is a
better archiver, but I don't know any good free tools for it (and ZIP is
in fact a widely used standard). ARJ archives tested with ARJ 3.08a
(ARJ32, 11 Oct 2000).

o Little fix in the ZIP routines (with the -s switch). Now directories
inside ZIP archives are not listed (even with -s), so all archive formats
work the same way (mainly due to ARJ support).

2.1 - released 17 December 2002

o IMPORTANT: the CRC32 polynomial has been changed to be compatible with
PKWARE's CRC32. CRC32 databases from fweeder 2.0 ARE NOT COMPATIBLE with
the current hash function. Since fweeder cannot tell the old hash from
the new one, processing old CRC32 databases with fweeder 2.1 will give
wrong and unpredictable results. Please rebuild your old CRC32
databases.

o New -z switch. With this switch you can weed inside archives without
decompressing them and without external tools. Only ZIP archives are
supported in this release.

Tested with ZIP  archives created  with: WinZip 7.0,  PKZIP 2.50 (WIN32)
and PKZIP 2.04g (DOS). Others should work, try them at your own risk.

Note that spanned disks are not supported. If a file is split across two
or more archive files, fweeder will fail if the file must be deleted.

ZIP archives must be well formed. That means some ZIP files that are fine
for your compressor of choice may not be fine for fweeder. This is
reasonably acceptable: fweeder cannot guess whether a file is an archive
and repair it if 'it seems to be an archive'. Use the proper tool for
this.

I will add support for other archive formats in future releases.

o New command -m  added. With this  command you can  merge two different
databases with the same  hash function, looking  for duplicated files in
the process.

o Better database saving routines. I've improved their speed, but I guess
you won't notice it unless you play with huge databases. Almost all
fweeder commands need to save a database, so it's an important
improvement. Now fweeder is even faster!

o Little -s switch review (plus ZIP support).

o Statistical data are no longer written to the logs; they're shown on
screen instead. If you wanna save the stats, just redirect fweeder's
output to a file.

o Fweeder now  manages  absolute  paths in the  databases.  That is done
mainly to improve the compatibility with VirWeeder Plus.
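About the 2.1 CRC32 change: the CRC-32 that PKWARE uses in ZIP files is the common variant with polynomial 0x04C11DB7, processed bit-reversed as 0xEDB88320, starting from all ones and finally XORed with 0xFFFFFFFF. The Python sketch below implements it bit by bit and can be checked against zlib.crc32. It is an illustration, not fweeder's code; which polynomial fweeder 2.0 used isn't stated here. A CRC built with a different polynomial gives a different 32-bit value for the same file, and nothing in the stored number says which polynomial produced it, which is why old databases must be rebuilt.

```python
import zlib

CRC32_POLY_REFLECTED = 0xEDB88320  # 0x04C11DB7 with its bits reversed


def crc32_bitwise(data, crc=0):
    """Bit-at-a-time CRC-32 as used by PKZIP (and by zlib.crc32).

    Illustrative sketch only, not fweeder's implementation. `crc` is a
    previous result, so long inputs can be processed in pieces, just
    like zlib.crc32(data, value).
    """
    crc ^= 0xFFFFFFFF  # undo the previous final XOR (or start at all ones)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    for sample in (b"", b"123456789", b"hello world"):
        mine = crc32_bitwise(sample)
        print(f"{mine:08X}", mine == zlib.crc32(sample))
```

The table-driven versions used in practice give the same results; the bitwise loop just makes the polynomial visible.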

2.0 - released 23 September 2002

Why version 2.0? Well, fweeder is pretty mature. I've added almost all
the features I think a file weeder must have. It's stable and ready for
daily use. With this version I've added CRC32 support to make the people
who like that hash happy. That's an important change in the tool, so I
wanna distinguish it from previous versions.

o -p switch removed and -n switch added. I'm not happy with this change,
but it's needed. You know there are other weeders out there, and if you
compare two and fweeder is slower, you won't use it. The problem is that
most users don't use the -p switch, while other weeders use the highest
priority by default. Now fweeder uses it by default too, but I've added
a switch (-n) to run the weeder without steroids. With -n it can be, in
some cases, about 30% slower. However, it's useful if you wanna run the
weeder in the background on a low-resource machine (like mine).

o -s switch added.  It provides a  detailed scan: you  can see directory
names while scanning.

o -b switch added. Fweeder beeps when  the tool exits. Nice when you run
it in a minimized window.

o CRC32 + file size support added: -x switch. Now you can use CRC32 +
size instead of MD5 as the hash function. Although the database format
is practically the same, fweeder checks the stored data to guess which
hash was used. Many thanks to Roadkill, VirusBuster, Perikles and VirusP
for their help while testing the CRC32 stuff.

o Little fixes in path processing.

o Bug fix in optimizer.

1.6 - released 29 August 2002

o Little (but important after all) bug fixed. Thanks to Yello and
VirusBuster for their reports.

1.5 - released 30 June 2002

o Some changes aimed at gaining more speed. i586 opcodes are used, so
keep your 1.4 version safe for 386/486 computers. And do check out the
new version ;)

o New switch: -p. Now you can run fweeder at the highest priority using
this switch. It can be applied to any command. The program will get more
CPU time, so be careful using this switch if you're running lots of
apps.

1.4 - released 12 May 2002

o New -i command. Now you can compare two different databases looking
for new files. This command is for traders: you can send a database of
the files you want to exchange, and the other collector can check it
against his database.

o Little improvements loading databases.

1.3 - released 10 May 2002

o Fixed issue #004. It now works great under Windows XP. Win2k and WinNT
are not tested, but let's say it works (until someone reports an error).
Note that I cannot support systems I don't have available for testing.

o Some little bug fixes for rare cases.

1.2 - released 4 May 2002

o Many optimizations.

o Fixed issue #003. The -o command is now available to optimize
databases, so you can use VirWeeder Plus databases without problems.

1.1 - released 1 May 2002

o Documentation moved to an external file (this file!).

o Many little optimizations.

o Elapsed time calculation is now more accurate (I don't know why
database loading was excluded, and the result was wrong most of the
time). Thanks to VirusBuster for the tips on coding the time routines :)

o Added -0 switch that deletes files with zero size.

o Fixed issue #002. Zero size files are now managed better (the program
will show it's ignoring a file, but the awful error message won't
appear).

o Fixed issue #001. Spaces in paths work great.

o Fixed issue #000. Now adding files *should* be faster when working with
a db created by fweeder. If you use a database optimized for VirWeeder
Plus, just wait some time, because that's the worst case again (it needs
about one minute to rebuild a database with 60000 files). Issue #003 is
still there :/ Now I know how to make a 'complete tree' to optimize
addition, but the algorithm is pretty complex, so I'll add it to the next
release. For now it's more important to release a fix for severe issues
(#000 mainly).

1.0 - released 28 April 2002

o First release


        5. Known Issues
        
2.3 for next 2.4

There are no known issues.

*** *** *** *** *** *** *** *** *** *** *** ***

