wget for downloading scientific papers

on 01.11.2007 15:32:59 by jonas

Hi,

WHAT I WANT:
I would like to download scientific papers by using the program
"wget".

THE PROBLEM:
Scientific papers are usually protected via usernames and password.
From the university I can download them, because the IP from the
university is registered. When I am at home I can't download them.
Somehow I have to tell wget that it shall tunnel to the university
and then download the paper.

The following is NOT what I want:
1) ssh-connection to the university
2) download the paper onto the computer at the university
3) copy the paper to my local hard drive via scp

I would like to do everything from my local computer using VNC (?) or
a proxy (?) via a script.

Unfortunately I don't know why
"http://www.quantenblog.net/physics/read-articles" is not working for me.

e.g.:
How can I download the PDF of this paper:
"http://prola.aps.org/abstract/PRL/v83/i13/p2541_1"
The PDF file is located at
"http://prola.aps.org/pdf/PRL/v83/i13/p2541_1"
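
The two URLs above differ in a single path component, so a script could derive the PDF location from the abstract link. A minimal sketch, assuming the abstract-to-pdf pattern seen in this one example also holds for other articles on the site (the function name abstract_to_pdf_url is made up for illustration):

```shell
#!/bin/sh
# abstract_to_pdf_url URL
# Print the PDF URL for a given abstract URL by replacing the first
# "/abstract/" path component with "/pdf/".
# Assumption: other articles follow the layout of the one example above.
abstract_to_pdf_url() {
    printf '%s\n' "$1" | sed 's#/abstract/#/pdf/#'
}
```

Called with the abstract URL from this post, it prints the PDF URL given above.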


Cheers.
Jonas

Re: wget for downloading scientific papers

on 01.11.2007 15:55:48 by Bill Marcum

On 2007-11-01, Jonas wrote:
> Hi,
>
> WHAT I WANT:
> I would like to download scientific papers by using the program
> "wget".
>
> THE PROBLEM:
> Scientific papers are usually protected via usernames and password.
> From the university I can download them, because the IP from the
> university is registered. When I am at home I can't download them.
> Somehow I have to tell wget that it shall tunnel to the university
> and then download the paper.
>
You can't get the papers with http://user:password@hostname/... ?
You are looking for a way to automate getting these files; can you get
them manually using Firefox or another browser?

Re: wget for downloading scientific papers

on 01.11.2007 16:30:08 by jellybean stonerfish

On Thu, 01 Nov 2007 07:32:59 -0700, Jonas wrote:

> Hi,
>
> WHAT I WANT:
> I would like to download scientific papers by using the program
> "wget".
>
> THE PROBLEM:
> Scientific papers are usually protected via usernames and password.

Have you read the wget manpage? 'man wget'

From the wget manpage


--user=user
--password=password
Specify the username user and password password for both FTP and
HTTP file retrieval. These parameters can be overridden using the
--ftp-user and --ftp-password options for FTP connections and the
--http-user and --http-password options for HTTP connections.


stonerfish

Re: wget for downloading scientific papers

on 01.11.2007 17:32:09 by jonas

On 1 Nov., 15:55, Bill Marcum wrote:
> On 2007-11-01, Jonas wrote:
> > Hi,
>
> > WHAT I WANT:
> > I would like to download scientific papers by using the program
> > "wget".
>
> > THE PROBLEM:
> > Scientific papers are usually protected via usernames and password.
> > From the university I can download them, because the IP from the
> > university is registered. When I am at home I can't download them.
> > Somehow I have to tell wget that it shall tunnel to the university
> > and then download the paper.
>
> You can't get the papers with http://user:password@hostname/... ?
> You are looking for a way to automate getting these files; can you get
> them manually using Firefox or another browser?

Thank you for the suggestion, but this will not work out, because I
don't know the password. When I use my account at the university I
don't have to enter a username or a password, because the university
IP is registered and I am automatically forwarded. So what I have to
do is to change my IP-address when I am at home, to the one which I
have when I am at the university.

This could be done with a proxy-server and a browser (like firefox),
but that is not the solution to my problem:
I want to use wget, because I wrote a script which downloads not only
the PDF file but also a lot of other useful information (BibTeX; it
also lets me enter a DOI number, ...). So I would like to
use this script to get the complete information. Therefore I need to
find a way to use a tunnel through the university.

Jonas

Re: wget for downloading scientific papers

on 01.11.2007 17:46:15 by jonas

On 1 Nov., 16:30, jellybean stonerfish
wrote:
> On Thu, 01 Nov 2007 07:32:59 -0700, Jonas wrote:
> > Hi,
>
> > WHAT I WANT:
> > I would like to download scientific papers by using the program
> > "wget".
>
> > THE PROBLEM:
> > Scientific papers are usually protected via usernames and password.
>
> Have you read the wget manpage? 'man wget'
>
> From the wget manpage
>
> --user=user
> --password=password
> Specify the username user and password password for both FTP and
> HTTP file retrieval. These parameters can be overridden using the
> --ftp-user and --ftp-password options for FTP connections and the
> --http-user and --http-password options for HTTP connections.
>
> stonerfish

I don't know the username and the password. When I am at the
university I don't have to enter them, because the IP-address is
registered and therefore I am automatically forwarded/allowed to
download the file.

I could use a proxy and a browser, but that is not the solution I am
looking for. I would like to use wget, because I wrote a script, which
downloads not only the pdf-file, but lots of other information as
well. The script is pretty handy and therefore I would like to tunnel
through my university account using an ssh-connection (?).

Jonas

PS: Yes, I read the wget man page and also found the option
"--proxy-server=on". But I could not download the file; maybe I am
not using the right options. So it would be very nice if you could
post the command that works for you.

Re: wget for downloading scientific papers

on 01.11.2007 18:43:45 by Jan Schampera

Jonas wrote:

> I don't know the username and the password. When I am at the
> university I don't have to enter them, because the IP-address is
> registered and therefore I am automatically forwarded/allowed to
> download the file.

I'd say it's time to contact your admins.

J.

Re: wget for downloading scientific papers..possible solution

on 01.11.2007 18:51:09 by jellybean stonerfish

On Thu, 01 Nov 2007 09:46:15 -0700, Jonas wrote:


>
> PS: Yes I read the man-page of wget and found as well the option
> "--proxy-server=on". But I could not download the file -- might be that I
> am not using the right options.

I hope I didn't offend. I assumed you didn't have a clue. Very rude of
me. Now that I know you have some skill I have a better solution at
bottom of post.

> So it would be very nice of you, if
> you could write the command which is working for you.

I didn't get it to work. I thought you had a password.

Maybe you could automate it from home with a script, and still use your
school's computer? You don't need to save the file on your school's
computer, but can pass it straight through. Something like this may work:

ssh username@school wget --output-document=- \
http://prola.aps.org/pdf/PRL/v83/i13/p2541_1 > filename.pdf

This will ssh into the school machine and run the given command in the
remote shell. --output-document=- tells wget to write the file to standard
output. You redirect that standard output to filename.pdf on your own machine.


This could be put in a script.....
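
A minimal sketch of such a script, not a tested solution for any particular university setup. The function name fetch_paper and the variables SCHOOL_LOGIN and SSH are made up for illustration; SSH exists only so the command can be dry-run (for example with SSH=echo) without a real connection.

```shell
#!/bin/sh
# fetch_paper URL OUTFILE
# Run wget on the school machine over ssh and write the document to
# OUTFILE on the local machine. Nothing is stored on the school side.
#
# Assumptions (not from the thread): SCHOOL_LOGIN holds the ssh login,
# SSH can be overridden (e.g. SSH=echo) for a dry run, and URL contains
# no single quotes, since it is single-quoted for the remote shell.
fetch_paper() {
    url=$1
    out=$2
    if ${SSH:-ssh} "${SCHOOL_LOGIN:-username@school}" \
        "wget --quiet --output-document=- '$url'" > "$out"
    then
        return 0
    fi
    # Do not leave an empty or partial file behind on failure.
    rm -f "$out"
    return 1
}
```

A call would look like: fetch_paper http://prola.aps.org/pdf/PRL/v83/i13/p2541_1 p2541_1.pdf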


stonerfish

Re: wget for downloading scientific papers

on 01.11.2007 19:00:28 by Bill Marcum

On 2007-11-01, Jonas wrote:
> On 1 Nov., 15:55, Bill Marcum wrote:
>> On 2007-11-01, Jonas wrote:
>> > Hi,
>>
>> > WHAT I WANT:
>> > I would like to download scientific papers by using the program
>> > "wget".
>>
>> > THE PROBLEM:
>> > Scientific papers are usually protected via usernames and password.
>> > From the university I can download them, because the IP from the
>> > university is registered. When I am at home I can't download them.
>> > Somehow I have to tell wget that it shall tunnel to the university
>> > and then download the paper.
>>
>> You can't get the papers with http://user:password@hostname/... ?
>> You are looking for a way to automate getting these files; can you get
>> them manually using Firefox or another browser?
>
> Thank you for the suggestion, but this will not work out, because I
> don't know the password. When I use my account at the university I
> don't have to enter a username or a password, because the university
> IP is registered and I am automatically forwarded. So what I have to
> do is to change my IP-address when I am at home, to the one which I
> have when I am at the university.
>
> This could be done with a proxy-server and a browser (like firefox),
> but that is not the solution to my problem:
> I want to use wget, because I wrote a script, which does not only
> download the pdf-file, but many other useful information as well
> (BibTex, I am allowed to enter doi-number, ...). So I would like to
> use this script to get the complete information. Therefore I need to
> find a way to use a tunnel through the university.
>
I'm not sure this is possible unless you can arrange for the university
to give you a password or register your home IP, or use the solution
that you rejected in the original post.

Re: wget for downloading scientific papers..possible solution

on 01.11.2007 21:00:08 by jellybean stonerfish

Or make a script at school that, when passed the document title, grabs the
various files and writes them out as a tarball on standard output (tar's
"-f -" option). Using pipes, and the "--output-document=-" option to wget,
you could do this without leaving any files on your school computer. Then
call this script, and pipe its output through tar to unpack it:

ssh username@school graberscript docname | tar -xvf -
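
What the school-side script might contain, as a rough sketch only. tar archives files from disk, so this version uses a temporary directory that is removed again once the tarball has been streamed out. The function names bundle_dir and grab_paper, the file names, and the idea that the document name is a path such as PRL/v83/i13/p2541_1 are assumptions for illustration.

```shell
#!/bin/sh
# bundle_dir DIR
# Write a tar archive of everything in DIR to standard output.
bundle_dir() {
    tar -C "$1" -cf - .
}

# grab_paper DOCID
# Hypothetical body for "graberscript": fetch the abstract page and the
# PDF for one article into a temporary directory, stream them out as a
# tarball, then remove the temporary directory again.
# Assumption: DOCID looks like PRL/v83/i13/p2541_1, as in the thread.
grab_paper() {
    tmp=$(mktemp -d) || return 1
    wget --quiet --output-document="$tmp/abstract.html" \
        "http://prola.aps.org/abstract/$1"
    wget --quiet --output-document="$tmp/paper.pdf" \
        "http://prola.aps.org/pdf/$1"
    bundle_dir "$tmp"
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# When installed as a script, the last line would be: grab_paper "$1"
```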


stonerfish

Re: wget for downloading scientific papers

on 01.11.2007 21:18:52 by Barry Margolin

In article <1193934729.634075.176110@22g2000hsm.googlegroups.com>,
Jonas wrote:

> This could be done with a proxy-server and a browser (like firefox),
> but that is not the solution to my problem:
> I want to use wget, because I wrote a script, which does not only
> download the pdf-file, but many other useful information as well
> (BibTex, I am allowed to enter doi-number, ...). So I would like to
> use this script to get the complete information. Therefore I need to
> find a way to use a tunnel through the university.

And why can't you do this using a proxy server? Set the http_proxy
environment variable (wget reads the lowercase name) and wget will use it.
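
A sketch of how that could look from home, under assumptions that are not confirmed anywhere in this thread: the university runs an HTTP proxy, and an ssh tunnel makes it reachable on a local port. The host name proxy.university.example, the port 3128, and the names proxy_wget, PROXY_PORT and WGET are placeholders.

```shell
#!/bin/sh
# proxy_wget [WGET-ARGS...]
# Run wget with http_proxy pointing at a local port.
# Assumptions (not from the thread): the university runs an HTTP proxy,
# and an ssh tunnel forwards local port PROXY_PORT (default 3128) to it,
# for example:
#   ssh -f -N -L 3128:proxy.university.example:3128 username@school
# proxy.university.example is a placeholder host name.
# WGET can be overridden (e.g. WGET=env) to inspect the environment.
proxy_wget() {
    http_proxy="http://localhost:${PROXY_PORT:-3128}/" ${WGET:-wget} "$@"
}
```

With the tunnel up, an existing wget-based script would only need its wget calls routed through proxy_wget, or http_proxy exported once at the top.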

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

Re: wget for downloading scientific papers

on 02.11.2007 03:49:47 by cfajohnson

On 2007-11-01, Jonas wrote:
>
> WHAT I WANT:
> I would like to download scientific papers by using the program
> "wget".
>
> THE PROBLEM:
> Scientific papers are usually protected via usernames and password.
> From the university I can download them, because the IP from the
> university is registered. When I am at home I can't download them.
> Somehow I have to tell wget that it shall tunnel to the university
> and then download the paper.
>
> The following is NOT what I want:
> 1) ssh-connection to the university
> 2) download the paper onto the computer at the university
> 3) copy the paper to my local hard drive via scp
>
> I would like to do everything from my local computer using VNC (?) or
> a proxy (?) via a script.
>
> Unfortunately I don't know why
> "http://www.quantenblog.net/physics/read-articles" is not working for me.
>
> e.g.:
> How can I download the PDF of this paper:
> "http://prola.aps.org/abstract/PRL/v83/i13/p2541_1"
> The PDF file is located at
> "http://prola.aps.org/pdf/PRL/v83/i13/p2541_1"

ssh YourUniversity "wget -O- http://prola.aps.org/pdf/PRL/v83/i13/p2541_1" |
cat > p2541_1.pdf

--
Chris F.A. Johnson, author
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Re: wget for downloading scientific papers

on 02.11.2007 06:06:49 by jellybean stonerfish

On Thu, 01 Nov 2007 22:49:47 -0400, Chris F.A. Johnson wrote:

> ssh YourUniversity "wget -O- http://prola.aps.org/pdf/PRL/v83/i13/p2541_1" |
> cat > p2541_1.pdf

What does the cat do?

stonerfish

Re: wget for downloading scientific papers

on 02.11.2007 21:03:20 by cfajohnson

On 2007-11-02, jellybean stonerfish wrote:
>
> On Thu, 01 Nov 2007 22:49:47 -0400, Chris F.A. Johnson wrote:
>
>> ssh YourUniversity "wget -O- http://prola.aps.org/pdf/PRL/v83/i13/p2541_1" |
>> cat > p2541_1.pdf
>
> What does the cat do?

Nothing that couldn't be done without it.

ssh YourUniversity "wget -O- http://prola.aps.org/pdf/PRL/v83/i13/p2541_1" > p2541_1.pdf


Re: wget for downloading scientific papers

on 03.11.2007 18:53:29 by Allodoxaphobia

On Fri, 02 Nov 2007 05:06:49 GMT, jellybean stonerfish wrote:
> On Thu, 01 Nov 2007 22:49:47 -0400, Chris F.A. Johnson wrote:
>
>> ssh YourUniversity "wget -O- http://prola.aps.org/pdf/PRL/v83/i13/p2541_1" |
>> cat > p2541_1.pdf
>
> What does the cat do?

It provokes nominations for the UUoC Award.