Simple find question

on 22.10.2007 23:13:10 by Anoop kumar V

Can somebody please help me with the find command using regex. For
some reason it is not behaving as a regular expression should.

I have a directory with files as below:

$ ls -ltr | grep "TRPT"
-rw------- 1 xx other 0 Oct 22 16:23 TRPT_SCH_TABLE_20072356723.xml.gz
-rw------- 1 xx other 0 Oct 22 16:24 TRPTSCHTABLE_20072356723.xml.gz
-rw------- 1 xx other 0 Oct 22 16:43 TRPTSCHTABLE_20072456923.xml.gz
$

The names of the files begin with TRPT, have an optional _, then SCH,
again an optional _ and the rest of the filename.
I want to invoke find so that I can find these (all 3) files. I tried
these commands:

$ find . -name "TRPT_?*" -print
./TRPT_SCH_TABLE_20072356723.xml.gz
$ find . -name "TRPT_?S*" -print
$ find . -name "TRPT_?SCH_?TABLE_2007*" -print
$ find . -name "*TRPT_?SCH_?TABLE_2007*" -print
$

I should have got all 3 files listed if I find "TRPT_?S*" but I find
only the one which has _. The ? should treat it as optional but it
isn't working that way. What may be the problem with my command?

If I remove the _ altogether I get this:

$ find . -name "TRPT?SCH?TABLE_2007*" -print
./TRPT_SCH_TABLE_20072356723.xml.gz

I am using SunOS: 5.10 Generic_120011-14 sun4u sparc SUNW,Sun-Fire

Thank you

Re: Simple find question

on 22.10.2007 23:22:36 by Stephane CHAZELAS

2007-10-22, 21:13(-00), Anoop:
> Can somebody please help me with the find command using regex. For
> some reason it is not behaving as a regular expression should.

For the simple reason that find -name's patterns are not regular
expressions but fnmatch(3)-like wildcards (as in shell
globbing).
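
A minimal sketch of what that means in practice, using the shell's own
case matching, which follows the same wildcard rules. The helper name
glob_match is made up for illustration, and the unquoted $2 is treated
as a pattern in sh, ksh and bash (zsh behaves differently by default).

```shell
# glob_match NAME PATTERN (hypothetical helper): succeed if NAME matches
# the wildcard PATTERN, the way find -name would compare a base name.
glob_match() {
  case $1 in
    $2) return 0 ;;   # unquoted, so $2 is used as a pattern
    *)  return 1 ;;
  esac
}

# '?' must consume exactly one character, so only the name that really
# has a character between TRPT and SCH can match.
for name in TRPT_SCH_TABLE_20072356723.xml.gz TRPTSCHTABLE_20072356723.xml.gz
do
  if glob_match "$name" 'TRPT?SCH?TABLE*'; then
    echo "match:    $name"
  else
    echo "no match: $name"
  fi
done
```

The same reasoning explains why "TRPT_?S*" finds nothing: after TRPT_
the ? swallows the S, and the next character is C, not S.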


> I have a directory with files as below:
>
> $ ls -ltr | grep "TRPT"
> -rw------- 1 xx other 0 Oct 22 16:23 TRPT_SCH_TABLE_20072356723.xml.gz
> -rw------- 1 xx other 0 Oct 22 16:24 TRPTSCHTABLE_20072356723.xml.gz
> -rw------- 1 xx other 0 Oct 22 16:43 TRPTSCHTABLE_20072456923.xml.gz
> $
>
> The name of the files begin with TRPT, have an optional _, then SCH,
> again an optional _ and the rest of the filename.
> I want to invoke find so that I can find these (all 3) files. I tried
> these commands:
[...]

find . \( -name 'TRPT_SCH_TABLE*' -o \
-name 'TRPT_SCHTABLE*' -o \
-name 'TRPTSCHTABLE*' -o \
-name 'TRPTSCH_TABLE*' \) -print
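
One way to try the command above without touching real data is a
scratch directory. This is only a sketch: the decoy file name is
invented, and mktemp -d is assumed to be available on your system.

```shell
# Scratch directory with the three file names from the question plus one
# decoy that the wildcard alternatives should not pick up.
dir=$(mktemp -d) || exit 1
touch "$dir/TRPT_SCH_TABLE_20072356723.xml.gz" \
      "$dir/TRPTSCHTABLE_20072356723.xml.gz" \
      "$dir/TRPTSCHTABLE_20072456923.xml.gz" \
      "$dir/TRPTxSCHTABLE_decoy.xml.gz"

# Same four alternatives as above, run from inside the scratch directory.
found=$(cd "$dir" && find . \( -name 'TRPT_SCH_TABLE*' -o \
                                -name 'TRPT_SCHTABLE*' -o \
                                -name 'TRPTSCHTABLE*' -o \
                                -name 'TRPTSCH_TABLE*' \) -print)
printf '%s\n' "$found"

# Clean up the scratch directory.
rm -rf "$dir"
```

The parentheses matter: without them -print would bind only to the
last -name alternative.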

Or you could install GNU find that has the -regex predicate:

find . -regex '.*/TRPT_?SCH_?TABLE[^/]*' -print

Or you could use zsh:

print -rl -- **/TRPT(_|)SCH(_|)TABLE*

--
Stéphane

Re: Simple find question

on 23.10.2007 00:10:38 by Steffen Schuler

Hi Anoop, hello netlanders!

On Mon, 22 Oct 2007 21:13:10 +0000, Anoop wrote:

> Can somebody please help me with the find command using regex. For some
> reason it is not behaving as a regular expression should.
>
> I have a directory with files as below:
>
> $ ls -ltr | grep "TRPT"
> -rw------- 1 xx other 0 Oct 22 16:23 TRPT_SCH_TABLE_20072356723.xml.gz
> -rw------- 1 xx other 0 Oct 22 16:24 TRPTSCHTABLE_20072356723.xml.gz
> -rw------- 1 xx other 0 Oct 22 16:43 TRPTSCHTABLE_20072456923.xml.gz
> $
>
> The name of the files begin with TRPT, have an optional _, then SCH,
> again an optional _ and the rest of the filename. I want to invoke find
> so that I can find these (all 3) files. I tried these commands:
>
> $ find . -name "TRPT_?*" -print
> ./TRPT_SCH_TABLE_20072356723.xml.gz
> $ find . -name "TRPT_?S*" -print
> $ find . -name "TRPT_?SCH_?TABLE_2007*" -print
> $ find . -name "*TRPT_?SCH_?TABLE_2007*" -print
> $
>
> I should have got all 3 files listed if I find "TRPT_?S*" but I find
> only the one which has _. The ? should treat it as optional but it isn't
> working that way. What may be the problem with my command?
>
> If I remove the _ altogether I get this:
>
> $ find . -name "TRPT?SCH?TABLE_2007*" -print
> ./TRPT_SCH_TABLE_20072356723.xml.gz
>
> I am using SunOS: 5.10 Generic_120011-14 sun4u sparc SUNW,Sun-Fire
>
> Thank you

find -name patterns are not regular expressions. In a -name pattern
'?' matches exactly one character and '*' matches any string of
characters, including the empty string.

In extended regular expressions '?' and '*' are quantifiers. If 'r' is a
regex, '(r)?' matches any string which is empty or matches r. And '(r)*'
matches any string s1s2...sn where n >= 0 and si matches r for each i in
{1,...,n}.

So in the following lines the lhs pattern is equivalent to rhs regex:

pattern regex

'ab?c' '^ab.?c$'
'ab*c' '^ab.*c$'

In the regexes '^' and '$' are anchors which match an empty substring at
the beginning ('^') or end ('$') of the string in question and '.' matches
"any" character. What counts as "any" depends on the regex application.


In your case simply use:

find . -name "TRPT*SCH*" | egrep '^.*/TRPT_?SCH[^/]*$'

if you dislike Stephane's scripts.

This is tested on Solaris 10 (Intel).

Kind regards,

Steffen "goedel" Schuler

Re: Simple find question

on 23.10.2007 00:34:50 by Steffen Schuler

On Mon, 22 Oct 2007 22:10:38 +0000, Steffen Schuler wrote:

> So in the following lines the lhs pattern is equivalent to rhs regex:
>
> pattern regex
>
> 'ab?c' '^ab.?c$'

sorry, this should be:

'ab?c' '^ab.c$'
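
A small check of that correction, as a sketch only: the helper name
compare is made up, and grep -E stands in for egrep (on Solaris 10 that
may have to be /usr/xpg4/bin/grep, which is an assumption on my part).

```shell
# compare STRING (hypothetical helper): report whether STRING matches the
# wildcard ab?c, the regex ^ab.c$ (one character) and the regex ^ab.?c$
# (optional character).
compare() {
  case $1 in
    ab?c) glob=yes ;;
    *)    glob=no ;;
  esac
  if printf '%s\n' "$1" | grep -E '^ab.c$' >/dev/null; then
    one=yes
  else
    one=no
  fi
  if printf '%s\n' "$1" | grep -E '^ab.?c$' >/dev/null; then
    opt=yes
  else
    opt=no
  fi
  echo "$1: wildcard=$glob one_char=$one optional_char=$opt"
}

for s in abc abxc abxyc; do
  compare "$s"
done
```

The wildcard column and the one_char column agree for every input,
while "abc" is accepted only by the optional_char regex.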

>
> In your case simply use:
>
> find . -name "TRPT*SCH*" | egrep '^.*/TRPT_?SCH[^/]*$'

and this is shorter:

find . -name "TRPT*SCH*" | egrep '/TRPT_?SCH[^/]*$'

>
> if you dislike Stephane's scripts.
>
> This is tested on Solaris 10 (Intel).
>
> Kind regards,
>
> Steffen "goedel" Schuler

Re: Simple find question

on 23.10.2007 00:58:14 by Anoop kumar V

On Oct 22, 6:34 pm, Steffen Schuler wrote:
> On Mon, 22 Oct 2007 22:10:38 +0000, Steffen Schuler wrote:
> > So in the following lines the lhs pattern is equivalent to rhs regex:
>
> > pattern regex
>
> > 'ab?c' '^ab.?c$'
>
> sorry, this should be:
>
> 'ab?c' '^ab.c$'
>
>
>
> > In your case simply use:
>
> > find . -name "TRPT*SCH*" | egrep '^.*/TRPT_?SCH[^/]*$'
>
> and this is shorter:
>
> find . -name "TRPT*SCH*" | egrep '/TRPT_?SCH[^/]*$'
>
>
>
> > if you dislike Stephane's scripts.
>
> > This is tested on Solaris 10 (Intel).
>
> > Kind regards,
>
> > Steffen "goedel" Schuler

Thank you so much both of you - I assumed find would work with regex
and I wish I could install GNU find, but I can't.

And the solution:
find . -name "TRPT*SCH*" | egrep '/TRPT_?SCH[^/]*$'

works, but so does:

find . -name "TRPT*SCH*"

so a bit uncertain about the reason we need to include the egrep part.

Thanks!

Re: Simple find question

on 23.10.2007 02:04:23 by Ed Morton

Anoop wrote:
> On Oct 22, 6:34 pm, Steffen Schuler wrote:
>
>>On Mon, 22 Oct 2007 22:10:38 +0000, Steffen Schuler wrote:
>>
>>>So in the following lines the lhs pattern is equivalent to rhs regex:
>>
>>>pattern regex
>>
>>>'ab?c' '^ab.?c$'
>>
>>sorry, this should be:
>>
>> 'ab?c' '^ab.c$'
>>
>>
>>
>>
>>>In your case simply use:
>>
>>>find . -name "TRPT*SCH*" | egrep '^.*/TRPT_?SCH[^/]*$'
>>
>>and this is shorter:
>>
>> find . -name "TRPT*SCH*" | egrep '/TRPT_?SCH[^/]*$'
>>
>>
>>
>>
>>>if you dislike Stephane's scripts.
>>
>>>This is tested on Solaris 10 (Intel).
>>
>>>Kind regards,
>>
>>>Steffen "goedel" Schuler
>
>
> Thank you so much both of you - I assumed find would work with regex
> and I wish I could install GNU find, but I can't.
>
> And the solution:
> find . -name "TRPT*SCH*" | egrep '/TRPT_?SCH[^/]*$'
>
> works, but so does:
>
> find . -name "TRPT*SCH*"
>
> so a bit uncertain about the reason we need to include the egrep part.

The find alone would find files with any sequence of characters between
"TRPT" and "SCH" plus any sequence of characters after "SCH", e.g. it'd
find a file named:

TRPThereswordsSCHandmorewords

The subsequent egrep reduces the output to paths whose last component
starts with "TRPT", has a single underscore or nothing at all between
"TRPT" and "SCH", and has only non-slash characters (zero or more of
them) between "SCH" and the end of the line.
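
A quick way to see what the filter keeps, as a sketch: the function
name keep_trpt and the third and fourth sample names are invented, and
grep -E is used in place of egrep (on Solaris 10 that may mean
/usr/xpg4/bin/grep, which I have not checked).

```shell
# keep_trpt (hypothetical name): the egrep stage of the pipeline, reading
# path names on stdin and printing only the ones the regex accepts.
keep_trpt() {
  grep -E '/TRPT_?SCH[^/]*$'
}

# The first two paths pass; the other two would be printed by the
# find -name "TRPT*SCH*" stage but are dropped by the filter.
printf '%s\n' \
  ./TRPT_SCH_TABLE_20072356723.xml.gz \
  ./TRPTSCHTABLE_20072356723.xml.gz \
  ./TRPThereswordsSCHandmorewords \
  ./TRPT__SCHTABLE.xml.gz |
  keep_trpt
```

With only the three files from the original listing in the directory,
the find alone and the find plus filter give the same output, which is
why both appeared to work.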

Ed.

Re: Simple find question

on 23.10.2007 05:46:57 by Anoop kumar V

On Oct 22, 8:04 pm, Ed Morton wrote:
> Anoop wrote:
> > On Oct 22, 6:34 pm, Steffen Schuler wrote:
>
> >>On Mon, 22 Oct 2007 22:10:38 +0000, Steffen Schuler wrote:
>
> >>>So in the following lines the lhs pattern is equivalent to rhs regex:
>
> >>>pattern regex
>
> >>>'ab?c' '^ab.?c$'
>
> >>sorry, this should be:
>
> >> 'ab?c' '^ab.c$'
>
> >>>In your case simply use:
>
> >>>find . -name "TRPT*SCH*" | egrep '^.*/TRPT_?SCH[^/]*$'
>
> >>and this is shorter:
>
> >> find . -name "TRPT*SCH*" | egrep '/TRPT_?SCH[^/]*$'
>
> >>>if you dislike Stephane's scripts.
>
> >>>This is tested on Solaris 10 (Intel).
>
> >>>Kind regards,
>
> >>>Steffen "goedel" Schuler
>
> > Thank you so much both of you - I assumed find would work with regex
> > and I wish I could install GNU find, but I can't.
>
> > And the solution:
> > find . -name "TRPT*SCH*" | egrep '/TRPT_?SCH[^/]*$'
>
> > works, but so does:
>
> > find . -name "TRPT*SCH*"
>
> > so a bit uncertain about the reason we need to include the egrep part.
>
> The find alone would find files with any sequence of characters between
> "TRPT" and "SCH" plus any sequence of characters after "SCH", e.g. it'd
> find a file named:
>
> TRPThereswordsSCHandmorewords
>
> The subsequent egrep reduces the output to paths whose last component
> starts with "TRPT", has a single underscore or nothing at all between
> "TRPT" and "SCH", and has only non-slash characters (zero or more of
> them) between "SCH" and the end of the line.
>
> Ed.

Perfect - thanks Ed for the clarification.

Anoop