Capturing output (domain name) of wget after http redirect

on 26.09.2007 14:17:20 by JamesG

Hi,

I've been battling with this problem for hours.

Basically, I have a long list of domain names; for each domain, I want
to get the "final" URL.

So I can use wget -S --spider http://url and I get a long list of HTTP
responses, eventually settling on a 200 OK, which is the final one.

I need to capture the domain of this URL and save it to a file.

I tried using grep but couldn't get it to work. Can anyone help?

Thanks

Re: Capturing output (domain name) of wget after http redirect

on 26.09.2007 14:22:37 by Joachim Schmitz

"JamesG" schrieb im Newsbeitrag
news:1190809040.834660.122030@d55g2000hsg.googlegroups.com.. .
> Hi,
>
> I've been battling with this problem for hours.
>
> Basically, I have a long list of domain names; for each domain, I want
> to get the "final" URL.
>
> So I can use wget -S --spider http://url and I get a long list of HTTP
> responses, eventually settling on a 200 OK, which is the final one.
>
> I need to capture the domain of this URL and save it to a file.
>
> I tried using grep but couldn't get it to work. Can anyone help?
wget writes its output to stderr, so try
wget ... 2>&1 | grep ...
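
For example, something along these lines should surface the redirect
chain (a minimal sketch, assuming GNU wget, which writes its status
messages to stderr):

wget -S --spider http://url 2>&1 | grep 'Connecting to'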

Bye, Jojo

Re: Capturing output (domain name) of wget after http redirect

on 26.09.2007 15:10:02 by JamesG

On Sep 26, 1:22 pm, "Joachim Schmitz" wrote:
> wget writes its output to stderr, so try
> wget ... 2>&1 | grep ...
>
> Bye, Jojo

Thanks Jojo.

There are now two remaining problems.
Because there are multiple hops (redirects), I get a list of several
entries.

For example:
Connecting to www.perfiliate.com|62.190.8.171|:80... connected.
Connecting to ad.uk.doubleclick.net|209.62.178.57|:80... connected.
Connecting to ad.doubleclick.net|65.205.8.52|:80... connected.
Connecting to ad.doubleclick.net|65.205.8.52|:80... connected.
Connecting to fls.doubleclick.net|216.73.87.48|:80... connected.
Connecting to www.theaa.com|213.225.133.206|:80... connected.

I need it to only select the last entry.
The final problem is separating www.theaa.com out. I think I can do
this with "sed".

Any ideas on the first problem? Thanks

Re: Capturing output (domain name) of wget after http redirect

on 26.09.2007 15:32:37 by Icarus Sparry

On Wed, 26 Sep 2007 06:10:02 -0700, JamesG wrote:

> There are now two remaining problems.
> Because there are multiple hops (redirects), I get a list of several
> entries.
>
> For example:
> Connecting to www.perfiliate.com|62.190.8.171|:80... connected.
> Connecting to ad.uk.doubleclick.net|209.62.178.57|:80... connected.
> Connecting to ad.doubleclick.net|65.205.8.52|:80... connected.
> Connecting to ad.doubleclick.net|65.205.8.52|:80... connected.
> Connecting to fls.doubleclick.net|216.73.87.48|:80... connected.
> Connecting to www.theaa.com|213.225.133.206|:80... connected.
>
> I need it to only select the last entry. The final problem is separating
> www.theaa.com out. I think I can do this with "sed".
>
> Any ideas on the first problem? Thanks

"tail -1", or if you are going to be using sed anyhow

sed -n '$s/Connecting to \([^|]*\)|.*/\1/p'
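
Here -n suppresses sed's default output, the $ address restricts the
substitution to the last input line, and \([^|]*\) captures everything
up to the first |, i.e. the host name. A roughly equivalent pipeline
using tail might look like this (a sketch, reusing the placeholder URL
from above):

wget -S --spider http://url 2>&1 | grep '^Connecting to' | tail -1 |
sed 's/Connecting to \([^|]*\)|.*/\1/'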

Re: Capturing output (domain name) of wget after http redirect

on 26.09.2007 15:51:51 by JamesG

On Sep 26, 2:32 pm, Icarus Sparry wrote:
> "tail -1", or if you are going to be using sed anyhow:
>
> sed -n '$s/Connecting to \([^|]*\)|.*/\1/p'

Thanks for the help.

Using that sed command doesn't seem to do the trick :(

I'm running:

wget -S --spider http://www.google.com 2>&1 | sed -n '$s/Connecting to \([^|]*\)|.*/\1/p'

Am I missing something stupid?

Thanks

Re: Capturing output (domain name) of wget after http redirect

on 27.09.2007 07:25:08 by Icarus Sparry

On Wed, 26 Sep 2007 06:51:51 -0700, JamesG wrote:

> Using that sed command doesn't seem to do the trick :(
>
> I'm running:
>
> wget -S --spider http://www.google.com 2>&1 | sed -n '$s/Connecting to \([^|]*\)|.*/\1/p'
>
> Am I missing something stupid?
>
> Thanks

I was rather expecting you to keep the grep in the command:

wget -S --spider http://www.google.com 2>&1 >/dev/null |
grep '^Connecting to' |
sed -n '$s/Connecting to \([^|]*\)|.*/\1/p'
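
The order of the redirections matters here: 2>&1 first duplicates
stderr onto the pipe, and >/dev/null then redirects stdout alone, so
only wget's status messages reach grep and the ordinary output is
discarded.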

Re: Capturing output (domain name) of wget after http redirect

on 28.09.2007 12:32:28 by JamesG

On Sep 27, 6:25 am, Icarus Sparry wrote:
> I was rather expecting you to keep the grep in the command:
>
> wget -S --spider http://www.google.com 2>&1 >/dev/null |
> grep '^Connecting to' |
> sed -n '$s/Connecting to \([^|]*\)|.*/\1/p'

Thanks Icarus,

If I try running this command, it replaces every entry with just one:
the last replacement. I need it to replace each line on an independent
basis. Can you help with this?

Thanks

Re: Capturing output (domain name) of wget after http redirect

on 28.09.2007 17:23:04 by Icarus Sparry

On Fri, 28 Sep 2007 03:32:28 -0700, JamesG wrote:

> If I try running this command, it replaces every entry with just one:
> the last replacement. I need it to replace each line on an independent
> basis. Can you help with this?
>
> Thanks

Email me (the address in the header is valid) with some real sample
addresses that you want to use and the output that you want. At the
moment I am finding it difficult to understand your exact requirement:
you want it "to only select the last entry", but not to "replace every
entry with just one: the last replacement".
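
One guess, though: if the output for all your domains is fed through a
single sed invocation, the $ address matches only the last line of the
whole combined stream, so everything appears to collapse into one
result. Running the pipeline once per domain keeps each result
independent. A minimal sketch, assuming a hypothetical file
domains.txt with one URL per line:

# Run the whole pipeline once per domain, so sed's $ address
# sees only that domain's "Connecting to" lines.
while read -r url
do
    wget -S --spider "$url" 2>&1 >/dev/null |
    grep '^Connecting to' |
    sed -n '$s/Connecting to \([^|]*\)|.*/\1/p'
done < domains.txt > final_domains.txt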