Using the power of the PC to find new species in lists

on 12.02.2009 08:38:54 by Bill Mudry

I have spent years now trying to find as many species of woody plants
as I can. My usual practice has been to manually compare two lists: a
master list of the species I have already found, kept in one file
(usually a text file (ASCII, .doc, comma-delimited, etc.), a
spreadsheet or a database file), and a target list of species.

The major task of each comparison has been to separate out the new
species not yet in the master list and write them into a new file.
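
To make the task concrete, here is a rough sketch in PHP of the
comparison I have in mind. It is only a sketch under assumptions: both
lists are plain text with one species name per line, names match if
they agree after ignoring case and surrounding spaces, and the function
name findNewSpecies and the file names are made up for illustration.
It uses modern PHP syntax (7.0 or later).

```php
<?php
// Hypothetical helper: returns the names in $target that are not in $master.
// Assumption: names are compared ignoring case and surrounding whitespace.
function findNewSpecies(array $master, array $target): array
{
    // Index the master list by normalized name so each lookup is fast.
    $known = [];
    foreach ($master as $name) {
        $key = strtolower(trim($name));
        if ($key !== '') {
            $known[$key] = true;
        }
    }

    // Keep every target name that has not been seen yet.
    $new = [];
    foreach ($target as $name) {
        $clean = trim($name);
        $key = strtolower($clean);
        if ($key !== '' && !isset($known[$key])) {
            $known[$key] = true; // also skips repeats within the target list
            $new[] = $clean;
        }
    }
    return $new;
}

// Example use with hypothetical file names, one species per line:
// $flags  = FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES;
// $master = file('master.txt', $flags);
// $target = file('target.txt', $flags);
// $new    = findNewSpecies($master, $target);
// file_put_contents('new_species.txt', implode(PHP_EOL, $new) . PHP_EOL);
```

Using the names as array keys means the master list is read once and
each target name is checked with a single lookup, so 6,000 entries
should be no strain at all.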

Since the master list now has over 6,000 species listed, you can
imagine that manual comparisons have taken me hours and hours, if not
days! I know, too, that comparing two lists is one of the most
classic jobs for a computer. What takes me hours, a computer should
be able to whip through in seconds! I have a large collection of
target files, so having computer power to do this would be like
lighting a rocket under the project!

I am surprised that it has been that hard for me to find something that
would work when this should be quite basic coding.

There are all kinds of file comparison programs out there. The largest
problem I have come across using them is that they only show the
results side by side as color differences, where I need all records
that are new to the master file to be written into a new file instead.
I have been a bit surprised how hard it has been to find such programs,
since this is one of the tenets of basic computer training; ISAM files
used to be merged and compared very frequently in mainframe days
to update company records.

Any suggestions on code in PHP, or even a finished application,
that would do this? If I could get the code, that would be even better,
so that later I can tweak it to fit what I do even more. A simple
comparison would be just fine for now. Later I hope to improve on it in
a number of ways, such as:
- Eliminating duplicates before comparison
- Being able to do comparisons on a variety of file formats
- Facilities to massage target files into a common set of
fields before trying to compare them. The target files can
come in all kinds of file formats and layouts that need
massaging before comparisons would mean anything.
- Cleaning out control characters and other garbage from a
target file before comparisons.

--- but for now and for some time to come, all this extra is a lot of
coding work that can wait ;-) . I would be so elated just to be able
to run through such file comparisons at record-breaking speed
compared to my manual methods.
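
For what it is worth, the first and last items on that wish list
(removing duplicates and cleaning out control characters) might look
something like the sketch below. Again this is a guess rather than
working practice: the function names cleanName and cleanList are
hypothetical, "control characters" is taken to mean the ASCII range
0x00-0x1F plus 0x7F, and duplicates are judged ignoring case.

```php
<?php
// Hypothetical helper: strips control characters and tidies spacing in one name.
function cleanName(string $raw): string
{
    // Replace ASCII control characters (0x00-0x1F and 0x7F) with a space.
    $text = preg_replace('/[\x00-\x1F\x7F]/', ' ', $raw);
    // Collapse runs of spaces into one and trim both ends.
    return trim(preg_replace('/ +/', ' ', $text));
}

// Hypothetical helper: cleans every line, then drops blank lines and
// duplicates (compared ignoring case), keeping the first spelling seen.
function cleanList(array $lines): array
{
    $seen = [];
    $out = [];
    foreach ($lines as $line) {
        $name = cleanName($line);
        $key = strtolower($name);
        if ($name !== '' && !isset($seen[$key])) {
            $seen[$key] = true;
            $out[] = $name;
        }
    }
    return $out;
}
```

Running both the master and the target list through something like
cleanList before comparing them would keep stray tabs or carriage
returns from making a known species look new.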

P.S. -- I am just starting to gain some proficiency in using PHP.
Studying such code would also be a good learning exercise.

Always in thanks for any help given,

Bill Mudry
Mississauga, Ontario
