MD5 checksums from downloaded pdfs to prevent duplication

on 21.04.2008 03:53:28 by Andre Steinert

I end up downloading duplicate (or more!) copies of journal papers (PDF)
because I sometimes forget that I already have a copy. Annoying. I was
trying to think of a way to prevent this. My existing bibliographic s/w
(EndNote) is not much help here.

Would MD5 checksums be a good workaround? Every time I do a download:

create an MD5 checksum
if absent in log:
    download pdf
    add MD5 checksum to log
else:
    blow up.

A lot of this work is on a WinXP machine but since I have the drive
Samba-mapped to my RHEL box I guess I can generate / check the MD5-sums
from the much-better Linux command line.

Any opinions / caveats? ....or "stupid idea"?

--
Rahul

Re: MD5 checksums from downloaded pdfs to prevent duplication

on 21.04.2008 09:26:54 by PK

On Monday 21 April 2008 03:53, Rahul wrote:

> Would MD5 checksums be a good workaround? Every time I do a download:
>
> create an MD5 checksum
> if absent in log:
>     download pdf
>     add MD5 checksum to log
> else:
>     blow up.
>
> A lot of this work is on a WinXP machine but since I have the drive
> Samba-mapped to my RHEL box I guess I can generate / check the MD5-sums
> from the much-better Linux command line.
>
> Any opinions / caveats? ....or "stupid idea"?

If you download stuff from the command line, then I see no major problems
with doing what you want (apart from the obvious fact that, to calculate the
MD5 of a file, you still have to download it first, but you can do that in a
temporary directory).
How were you thinking of implementing that?
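
For what it's worth, here is a rough Python sketch of the idea, reordered so
that the file is fetched into a temporary directory first and only moved into
the library if its MD5 is not yet in the log. Everything named here
(md5_of_file, load_log, file_if_new, fetch_pdf, and the one-line-per-file log
layout) is made up for illustration and is not something from the original
post; treat it as a starting point under those assumptions, not a finished
tool.

```python
import hashlib
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_log(log_path):
    """Return the set of checksums already recorded in the log.

    Assumed layout: one "checksum  filename" entry per line.
    """
    log_path = Path(log_path)
    if not log_path.exists():
        return set()
    known = set()
    for line in log_path.read_text().splitlines():
        if line.strip():
            known.add(line.split()[0])
    return known


def file_if_new(candidate, library_dir, log_path):
    """Move candidate into library_dir unless its checksum is already logged.

    Returns the new path, or None for a duplicate (the candidate is then
    left where it was, so the caller decides what to do with it).
    """
    checksum = md5_of_file(candidate)
    if checksum in load_log(log_path):
        return None
    library_dir = Path(library_dir)
    library_dir.mkdir(parents=True, exist_ok=True)
    target = library_dir / Path(candidate).name
    if target.exists():
        # Same name but different content: refuse rather than overwrite.
        raise FileExistsError(target)
    shutil.move(str(candidate), str(target))
    with open(log_path, "a") as log:
        log.write(f"{checksum}  {target.name}\n")
    return target


def fetch_pdf(url, library_dir, log_path):
    """Download url into a temporary directory, then keep it only if new."""
    with tempfile.TemporaryDirectory() as tmp:
        name = Path(urllib.parse.urlparse(url).path).name or "download.pdf"
        candidate = Path(tmp) / name
        urllib.request.urlretrieve(url, str(candidate))
        # A duplicate is discarded together with the temporary directory.
        return file_if_new(candidate, library_dir, log_path)
```

fetch_pdf(url, "papers", "md5.log") would return the stored path for a new
paper and None for one that is already logged. Note that MD5 only catches
byte-identical files: if the same paper comes from two sources that stamp or
re-generate the PDF differently, the checksums will differ and the duplicate
will slip through.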

And, this is way OT, but I suggest you try Zotero instead of EndNote.

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.