Simple string search

on 01.11.2007 15:57:36 by jack

Hi guys,
A little problem here. I am very new to Perl and I am having a problem
searching for a substring in a file. So here is a sample:

(this is my id for id="wksOI*84sk_")
(this is my id for id="@s3dSSos_")
(this is my id for id="dksWDkps_")

So I have a page with 20 of these lines. All I am interested in is the id
part of each line, i.e. wksOl*84sk_. As you may be able to tell, the id
part of each line is 12 chars and it always ends with _"). I think the
regex must be for an expression that starts with id=" and ends with
") with 12 letters in the middle. Once this id has been found, I
need to write it to a file.

I know I can find stuff with m/regex/, but I don't know how to return the
cryptic id.

Any solutions?

Thanks

Re: Simple string search

on 01.11.2007 16:20:12 by glex_no-spam

Jack wrote:
> hi guys,
> A little problem here. I am very new to perl and i am having a problem
> search for a substring in a file. So here is a sample
>
> (this is my id for id="wksOI*84sk_")
> (this is my id for id="@s3dSSos_")
> (this is my id for id="dksWDkps_")
>
> So i have page with 20 of these lines. all I am interested in the id
> part of each line ie, wksOl*84sk_ . As you maybe able to tell the id
> part of each line is 12 char and it always ends with _"). I think the
> regex must be for an expression that starts with id=" and ends with
> ") with 12 letters in the middle. So once this id has been found I
> need to write it in a file.
>
> I know with m/regex/ I can find stuff, but I don' t how to return the
> cryptic id.
>
> Any solutions.

Many solutions.

perldoc perlretut

See "Extracting matches"

Re: Simple string search

on 01.11.2007 19:25:39 by Josef Moellers

Jack wrote:
> hi guys,
> A little problem here. I am very new to perl and i am having a problem
> search for a substring in a file. So here is a sample
>
> (this is my id for id="wksOI*84sk_")
> (this is my id for id="@s3dSSos_")
> (this is my id for id="dksWDkps_")
>
> So i have page with 20 of these lines. all I am interested in the id
> part of each line ie, wksOl*84sk_ . As you maybe able to tell the id
> part of each line is 12 char and it always ends with _"). I think the
> regex must be for an expression that starts with id=" and ends with
> ") with 12 letters in the middle. So once this id has been found I
> need to write it in a file.

This is somewhat inconsistent. The example you gave (wksOl*84sk_) is
only 11 characters long.

> I know with m/regex/ I can find stuff, but I don' t how to return the
> cryptic id.
>
> Any solutions.

if ($string =~ m/id="(.{12})"/) {
    $desired_id = $1;
}
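
For the whole task (read the lines, pull out each id, write it somewhere), a minimal sketch might look like the following. The sub name write_ids and the 12-character id in the sample are made up for illustration, since the ids actually posted are shorter than 12 characters. The demo reads from an in-memory handle and prints to STDOUT; the file names mentioned in the comment are placeholders.

```perl
use strict;
use warnings;

# Copy every 12-character id found on the input handle to the output handle.
# Returns the number of ids written.
sub write_ids {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        # The parentheses capture the 12 characters between the quotes into $1.
        if ($line =~ m/id="(.{12})"/) {
            print {$out} "$1\n";
            $count++;
        }
    }
    return $count;
}

# Made-up input: only the first line carries a 12-character id.
my $sample = <<'END';
(this is my id for id="abcDEF*1234_")
(this is my id for id="@s3dSSos_")
END

# In-memory handle for the demo; for real files, open them with
# open(my $in, '<', 'page.txt') and open(my $out, '>', 'ids.txt').
open(my $in, '<', \$sample) or die "Cannot open sample: $!";
my $written = write_ids($in, \*STDOUT);
close $in;
print "ids written: $written\n";
```

Passing the handles in keeps the matching logic independent of where the data comes from and where the ids end up.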

--
Mails please to josef dot moellers
and I'm on gmx dot de.

Re: Simple string search

on 01.11.2007 23:33:18 by jordilin

On 1 nov, 18:25, Josef Moellers <5502109103600...@t-online.de> wrote:
> Jack wrote:
> > hi guys,
> > A little problem here. I am very new to perl and i am having a problem
> > search for a substring in a file. So here is a sample
>
> > (this is my id for id="wksOI*84sk_")
> > (this is my id for id="@s3dSSos_")
> > (this is my id for id="dksWDkps_")
>
> > So i have page with 20 of these lines. all I am interested in the id
> > part of each line ie, wksOl*84sk_ . As you maybe able to tell the id
> > part of each line is 12 char and it always ends with _"). I think the
> > regex must be for an expression that starts with id=" and ends with
> > ") with 12 letters in the middle. So once this id has been found I
> > need to write it in a file.
>
> This is somewhat inconsistent. The example you gave (wksOl*84sk_) is
> only 11 characters long.
>
> > I know with m/regex/ I can find stuff, but I don' t how to return the
> > cryptic id.
>
> > Any solutions.
>
> if ($string =~ m/id="(.{12})"/) {
> $desired_id = $1;
>
> }
>
> --
> Mails please to josef dot moellers
> and I'm on gmx dot de.

I have quickly written the following and tested it successfully:

while (<>) {
    if (/^\(.* id="(.*)"\)/) {
        print "$1\n";
    }
}

This works.
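
One thing to keep in mind with (.*) is that it is greedy. On the sample lines that does no harm, but if a line ever carried a second quoted field, the capture would run on to the last ") on the line. A small sketch of the difference, using a hypothetical line that is not in the original data:

```perl
use strict;
use warnings;

# Hypothetical line with a second quoted field after the id.
my $line = '(this is my id for id="dksWDkps_" note="x")';

# Greedy: (.*) runs on to the last "\) on the line.
my ($greedy) = $line =~ /^\(.* id="(.*)"\)/;

# Negated class: ([^"]*) cannot cross a double quote.
my ($bounded) = $line =~ /id="([^"]*)"/;

print "greedy:  $greedy\n";
print "bounded: $bounded\n";
```

Here the greedy version captures dksWDkps_" note="x, while the negated character class stops at the first closing quote and captures dksWDkps_.
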
Best regards,
jordi

Re: Simple string search

on 02.11.2007 19:36:58 by Josef Moellers

jordilin wrote:
> On 1 nov, 18:25, Josef Moellers <5502109103600...@t-online.de> wrote:
>> Jack wrote:
>>> hi guys,
>>> A little problem here. I am very new to perl and i am having a problem
>>> search for a substring in a file. So here is a sample
>>> (this is my id for id="wksOI*84sk_")
>>> (this is my id for id="@s3dSSos_")
>>> (this is my id for id="dksWDkps_")
>>> So i have page with 20 of these lines. all I am interested in the id
>>> part of each line ie, wksOl*84sk_ . As you maybe able to tell the id
>>> part of each line is 12 char and it always ends with _"). I think the
>>> regex must be for an expression that starts with id=" and ends with
>>> ") with 12 letters in the middle. So once this id has been found I
>>> need to write it in a file.
>> This is somewhat inconsistent. The example you gave (wksOl*84sk_) is
>> only 11 characters long.
>>
>>> I know with m/regex/ I can find stuff, but I don' t how to return the
>>> cryptic id.
>>> Any solutions.
>> if ($string =~ m/id="(.{12})"/) {
>> $desired_id = $1;
>>
>> }
>>
>> --
>> Mails please to josef dot moellers
>> and I'm on gmx dot de.
>
> I have quickly written the following and tested it successfully:
>
> while (<>) {
> if (/^\(.* id="(.*)"\)/) {
> print "$1\n";
> }
> }
>
> This works.

Fine. TMTOWTDI.

This regex also works:
\(this is my id for id="(.*_)"\)

It's a question of the requirement: how is the input structured and how
much of the input has to be matched in order to avoid false positives.
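
To illustrate the trade-off, here is a rough sketch that runs a loose and a strict pattern over two lines. The second line is a made-up decoy and the sub name ids_for is hypothetical; only the first line comes from the original post.

```perl
use strict;
use warnings;

my @lines = (
    '(this is my id for id="@s3dSSos_")',   # wanted
    '(see also id="other")',                # hypothetical decoy line
);

# Return the captured ids for every line that matches the given pattern.
sub ids_for {
    my ($re, @input) = @_;
    my @ids;
    for my $line (@input) {
        push @ids, $1 if $line =~ $re;
    }
    return @ids;
}

my @loose  = ids_for(qr/id="(.*)"/, @lines);
my @strict = ids_for(qr/\(this is my id for id="(.*_)"\)/, @lines);

print "loose:  @loose\n";
print "strict: @strict\n";
```

The loose pattern also picks up "other" from the decoy line, while the strict one returns only the wanted id. Which of the two is right depends on what else can appear in the input.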

--
Mails please to josef dot moellers
and I'm on gmx dot de.

Re: Simple string search

on 02.11.2007 20:50:24 by jordilin

On 1 nov, 18:25, Josef Moellers <5502109103600...@t-online.de> wrote:
> Jack wrote:
> > hi guys,
> > A little problem here. I am very new to perl and i am having a problem
> > search for a substring in a file. So here is a sample
>
> > (this is my id for id="wksOI*84sk_")
> > (this is my id for id="@s3dSSos_")
> > (this is my id for id="dksWDkps_")
>
> > So i have page with 20 of these lines. all I am interested in the id
> > part of each line ie, wksOl*84sk_ . As you maybe able to tell the id
> > part of each line is 12 char and it always ends with _"). I think the
> > regex must be for an expression that starts with id=" and ends with
> > ") with 12 letters in the middle. So once this id has been found I
> > need to write it in a file.
>
> This is somewhat inconsistent. The example you gave (wksOl*84sk_) is
> only 11 characters long.
>
> > I know with m/regex/ I can find stuff, but I don' t how to return the
> > cryptic id.
>
> > Any solutions.
>
> if ($string =~ m/id="(.{12})"/) {
> $desired_id = $1;
>
> }
>
> --
> Mails please to josef dot moellers
> and I'm on gmx dot de.

Yes, there is more than one way to do it. This is Perl, isn't it? By
the way, can you explain the following regex?
m/id="(.{12})"/
I am not sure about this 12 between brackets.
Thanks in advance,
Jordi

Re: Simple string search

on 02.11.2007 20:54:34 by glex_no-spam

jordilin wrote:
[...]
> By the way, can you explain the following regex?
> m/id="(.{12})"/
> I am not sure about this 12 between brackets.

When in doubt, read the documentation, or a book.

perldoc perlretut

Look for 'Matching repetitions'.

Re: Simple string search

on 02.11.2007 21:10:47 by jordilin

On 2 nov, 19:54, "J. Gleixner" wrote:
> jordilin wrote:
>
> [...]
>
> > By the way, can you explain the following regex?
> > m/id="(.{12})"/
> > I am not sure about this 12 between brackets.
>
> When in doubt, read the documentation, or a book.
>
> perldoc perlretut
>
> Look for 'Matching repetitions'.

The reason I am asking is that I have tried this particular regex
and it does not work on this particular example, and I want the
author's explanation.
It is very easy to say "look at the docs". By the way, I recommend
Mastering Regular Expressions from O'Reilly.
best regards,
Jordi

Re: Simple string search

on 02.11.2007 22:02:26 by jurgenex

Doug Miller wrote:
> So .{12} means "any sequence of exactly 12 characters",

So far so good

> and (.{12})
> means "open paren, followed by any sequence of exactly 12 characters,
> followed by close paren".

Aehmmm, no.

jue

Re: Simple string search

on 02.11.2007 22:06:14 by jordilin

On 2 nov, 21:39, spamb...@milmac.com (Doug Miller) wrote:
> In article <1194033024.617604.170...@19g2000hsx.googlegroups.com>, jordilin wrote:
>
> >Yes, there is more than one way to do it. This is Perl, isn't it? By
> >the way, can you explain the following regex?
> >m/id="(.{12})"/
> >I am not sure about this 12 between brackets.
>
> . means "any character"
> {12} means "whatever came just before this, we're looking for 12 of it".
>
> So .{12} means "any sequence of exactly 12 characters", and (.{12}) means
> "open paren, followed by any sequence of exactly 12 characters, followed by
> close paren".
>
> --
> Regards,
> Doug Miller (alphageek at milmac dot com)
>
> It's time to throw all their damned tea in the harbor again.

Well, I understand. The problem is that in this example the ids
differ in length, so it does not work here. We should write something like

m/id="(.{7,})"/

which matches at least 7 times, assuming there are no ids with fewer
than 7 chars.
Thanks
jordi

Re: Simple string search

on 02.11.2007 22:10:03 by jurgenex

jordilin wrote:
> Well, I understand. The problem is that, in this example the ids
> differ in length, so it does not work here. We should write sth like
>
> m/id="(.{7,})"/
>
> match at least 7 times, taking into account there are no ids with less
> than 7 chars.

Taking into account that HTML is not a regular language only a fool would
try to parse HTML using Regular Expressions. Even with the non-regular
enhancements in Perl REs are the wrong tool to parse HTML. This has been
discussed in this NG gazillions of times.
Or do you also use a hammer to fasten a screw? It works, ... sort of.

Use a tool that is meant to parse HTML if you want to parse HTML, e.g.
HTML::Parse.

jue

Re: Simple string search

on 02.11.2007 22:16:33 by jordilin

On 2 nov, 21:10, "Jürgen Exner" wrote:
> jordilin wrote:
> > Well, I understand. The problem is that, in this example the ids
> > differ in length, so it does not work here. We should write sth like
>
> > m/id="(.{7,})"/
>
> > match at least 7 times, taking into account there are no ids with less
> > than 7 chars.
>
> Taking into account that HTML is not a regular language only a fool would
> try to parse HTML using Regular Expressions. Even with the non-regular
> enhancements in Perl REs are the wrong tool to parse HTML. This has been
> discussed in this NG gazillions of times.
> Or do you also use a hammer to fasten a screw? It works, ... sort of.
>
> Use a tool that is meant to parse HTML if you want to parse HTML, e.g.
> HTML::Parse.
>
> jue

I think you have posted in the wrong thread, mate. This is not about
HTML.
Best regards,
Jordi

Re: Simple string search

on 02.11.2007 22:25:45 by jurgenex

jordilin wrote:
> On 2 nov, 21:10, "Jürgen Exner" wrote:
>> Taking into account that HTML is not a regular language only a fool
>> would try to parse HTML using Regular Expressions.
> I think you have posted in the wrong thread mate. This is not about
> html,


Oooops, indeed.
Sorry, I got two threads confused. You are right.

jue

Re: Simple string search

on 02.11.2007 22:32:37 by Josef Moellers

jordilin wrote:
> On 2 nov, 21:39, spamb...@milmac.com (Doug Miller) wrote:
>> In article <1194033024.617604.170...@19g2000hsx.googlegroups.com>, jordilin wrote:
>>
>>> Yes, there is more than one way to do it. This is Perl, isn't it? By
>>> the way, can you explain the following regex?
>>> m/id="(.{12})"/
>>> I am not sure about this 12 between brackets.
>> . means "any character"
>> {12} means "whatever came just before this, we're looking for 12 of it".
>>
>> So .{12} means "any sequence of exactly 12 characters", and (.{12}) means
>> "open paren, followed by any sequence of exactly 12 characters, followed by
>> close paren".
>>
>> --
>> Regards,
>> Doug Miller (alphageek at milmac dot com)
>>
>> It's time to throw all their damned tea in the harbor again.
>
> Well, I understand. The problem is that, in this example the ids
> differ in length, so it does not work here. We should write sth like
>
> m/id="(.{7,})"/
>
> match at least 7 times, taking into account there are no ids with less
> than 7 chars.

But "Jack" writes in the original post "all I am interested in the id
part of each line ie, wksOl*84sk_ . As you maybe able to tell the id
part of each line is 12 char and it always ends with _")."
So I thought that whatever is between the "" is the id and it's supposed
to be 12 characters long.
If you now state that it should have been 7 or more, please re-read the
original post.

If the requirement is "at least 7", then, indeed, ".{7,}" is correct, as
can be found in "perldoc perlre". If the requirement is "12", then ".{12}"
is correct. If the requirement were "anything between the quote signs,
no matter how much", then ".*" would be correct.
I was under the assumption that the OP wanted to filter out illegal ids
which are not 12 characters long.
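
As a quick check of how the three quantifiers behave on the three sample lines from the original post, something like this could be run (the sub name count_matches is just for illustration):

```perl
use strict;
use warnings;

# The three sample lines from the original post.
my @lines = (
    '(this is my id for id="wksOI*84sk_")',
    '(this is my id for id="@s3dSSos_")',
    '(this is my id for id="dksWDkps_")',
);

# Count how many of the given lines a pattern matches.
sub count_matches {
    my ($re, @input) = @_;
    return scalar grep { $_ =~ $re } @input;
}

my $exactly_12 = count_matches(qr/id="(.{12})"/, @lines);
my $at_least_7 = count_matches(qr/id="(.{7,})"/, @lines);
my $anything   = count_matches(qr/id="(.*)"/,    @lines);

print ".{12} matches $exactly_12 of 3 lines\n";
print ".{7,} matches $at_least_7 of 3 lines\n";
print ".*    matches $anything of 3 lines\n";
```

The posted ids are 11, 9 and 9 characters long, so ".{12}" matches none of the sample lines, while ".{7,}" and ".*" match all three.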

YMMV,

Josef
--
Mails please to josef dot moellers
and I'm on gmx dot de.

Re: Simple string search

on 02.11.2007 22:39:02 by spambait

In article <1194033024.617604.170330@19g2000hsx.googlegroups.com>, jordilin wrote:

>Yes, there is more than one way to do it. This is Perl, isn't it? By
>the way, can you explain the following regex?
>m/id="(.{12})"/
>I am not sure about this 12 between brackets.

. means "any character"
{12} means "whatever came just before this, we're looking for 12 of it".

So .{12} means "any sequence of exactly 12 characters", and (.{12}) means
"open paren, followed by any sequence of exactly 12 characters, followed by
close paren".

--
Regards,
Doug Miller (alphageek at milmac dot com)

It's time to throw all their damned tea in the harbor again.

Re: Simple string search

on 02.11.2007 23:42:57 by spambait

"Jürgen Exner" wrote:
>Doug Miller wrote:
>> So .{12} means "any sequence of exactly 12 characters",
>
>So far so good
>
>> and (.{12})
>> means "open paren, followed by any sequence of exactly 12 characters,
>> followed by close paren".
>
>Aehmmm, no.
>
My fault -- you're right. It *would* mean that if the parens were escaped,
i.e. \( and \). As is, it just means a sequence of 12 characters.
>jue
>
>
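
A small sketch of that difference, using a made-up test string: unescaped parentheses only group and capture, while escaped ones have to match literal ( and ) characters in the text.

```perl
use strict;
use warnings;

# Made-up text: the first id is wrapped in literal parentheses, the second is not.
my $text = 'id="(abcdefghijkl)" id="mnopqrstuvwx"';

# Unescaped parentheses capture the 12 characters; they match nothing themselves.
my ($captured) = $text =~ /id="(.{12})"/;

# Escaped parentheses must appear literally in the text; the outer,
# unescaped pair captures them together with the 12 characters.
my ($with_parens) = $text =~ /id="(\(.{12}\))"/;

print "capture group:  $captured\n";
print "literal parens: $with_parens\n";
```

The first pattern skips the parenthesised id, because 12 characters starting at ( are not followed by a quote, and captures mnopqrstuvwx. The second captures (abcdefghijkl), parentheses included.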

--
Regards,
Doug Miller (alphageek at milmac dot com)

It's time to throw all their damned tea in the harbor again.

Re: Simple string search

on 03.11.2007 00:03:18 by jordilin

On 2 nov, 21:32, Josef Moellers <5502109103600...@t-online.de> wrote:
> jordilin wrote:
> > On 2 nov, 21:39, spamb...@milmac.com (Doug Miller) wrote:
> >> In article <1194033024.617604.170...@19g2000hsx.googlegroups.com>, jordilin wrote:
>
> >>> Yes, there is more than one way to do it. This is Perl, isn't it? By
> >>> the way, can you explain the following regex?
> >>> m/id="(.{12})"/
> >>> I am not sure about this 12 between brackets.
> >> . means "any character"
> >> {12} means "whatever came just before this, we're looking for 12 of it"
>
> >> So .{12} means "any sequence of exactly 12 characters", and (.{12}) means
> >> "open paren, followed by any sequence of exactly 12 characters, followed by
> >> close paren".
>
> >> --
> >> Regards,
> >> Doug Miller (alphageek at milmac dot com)
>
> >> It's time to throw all their damned tea in the harbor again.
>
> > Well, I understand. The problem is that, in this example the ids
> > differ in length, so it does not work here. We should write sth like
>
> > m/id="(.{7,})"/
>
> > match at least 7 times, taking into account there are no ids with less
> > than 7 chars.
>
> But "Jack" writes in the original post "all I am interested in the id
> part of each line ie, wksOl*84sk_ . As you maybe able to tell the id
> part of each line is 12 char and it always ends with _")."
> So I thought that whatever is between the "" is the id and it's supposed
> to be 12 characters long.
> If you now state that it should have been 7 or more, please re-read the
> original post.
>
> If the requirement is "at least 7", then, indeed, ".{7,}" is correct, as
> can be found in "perldoc perlre". If the requirement is "12", then ".{12}"
> is correct. If the requirement were "anything between the quote signs,
> no matter how much", then ".*" would be correct.
> I was under the assumption that the OP wanted to filter out illegal ids
> which are not 12 characters long.
>
> YMMV,
>
> Josef
> --
> Mails please to josef dot moellers
> and I'm on gmx dot de.

Well, you are absolutely right. The original poster should state
clearly what he wants, but he doesn't.
In any case, I think we have already answered several options that the
original poster can take to solve his problem.
regards,
jordi