A shorter regexp?

on 01.10.2004 22:12:08 by Deane.Rothenmaier

Hi, all.

Sorry to've ignited such a firestorm with my questions regarding stat() vs
ls for file dating... ;-)) I hope this question will be less provocative.

At one point in my script I still use ls to get a line of data for files.
What I'm trying for is a regexp that'll pull the filename out--including
soft links. What I have right now--while it works--is rather, um, long:

$line =~
/^[-l][-rswx]{9}\s+\d+\s+\w+\s+\w+\s+\d+\s+\w+\s+\d+\s+(?:\d{4}|\d\d:\d\d)\s+(.*)$/;

I had a shorter regexp, actually *much* shorter:

$line =~ /^.+(?:\d{4}|\d\d:\d\d)\s+(.*)$/;

but it ran into a problem with the following file line:

lrwxrwxrwx ... Sep 10 15:15 1614 -> /opt/K/SCO/...1614

It tripped on the 1614 and returned only "-> /opt/K/SCO/...1614" as the
filename. Anyone have any suggestions for shortening that monster regexp
listed above?

Thanks!

Deane
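Why the shorter regexp trips: the greedy .+ first swallows the whole line and then gives characters back only until the rest of the pattern fits, so it settles on the last four-digit run followed by whitespace, which here is the 1614 in the file name. A small Perl sketch of that, plus one possible fix that is not from the thread: make the prefix non-greedy and require a "Mon dd" date just before the time or year. The sample line is hypothetical (owner, group, size and link target are invented, since the original line is elided), and short_name and anchored_name are illustrative names only.

```perl
use strict;
use warnings;

# Hypothetical ls -l line modeled on the one in the post; the owner, group,
# size and link target are invented for illustration.
my $line = 'lrwxrwxrwx   1 deane    staff         18 Sep 10 15:15 1614 -> /tmp/link/1614';

# The short regex. The greedy .+ backs off from the end of the line only as
# far as it must, so the LAST "\d{4}" or "\d\d:\d\d" followed by whitespace
# wins: here that is the "1614" in the file name.
sub short_name {
    my ($l) = @_;
    return $l =~ /^.+(?:\d{4}|\d\d:\d\d)\s+(.*)$/ ? $1 : undef;
}

# One possible fix (an assumption, not from the thread): a non-greedy prefix
# plus a required "Mon dd" date, so the FIRST date-like run wins. Making .+
# non-greedy on its own is not enough; it would stop at a size like 1024.
sub anchored_name {
    my ($l) = @_;
    return $l =~ /^.+?\s[A-Z][a-z]{2}\s+\d{1,2}\s+(?:\d{4}|\d\d:\d\d)\s(.*)$/
        ? $1 : undef;
}

print 'short:    ', short_name($line), "\n";      # -> /tmp/link/1614
print 'anchored: ', anchored_name($line), "\n";   # 1614 -> /tmp/link/1614
```

The anchored version still depends on an English three-letter month abbreviation, so it shares the locale assumptions of the original long regexp.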
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

RE: A shorter regexp?

on 01.10.2004 22:33:47 by marms

Not shorter, but maybe more readable :-)
This is a regex I use to match "ls -la" output on
Unix/Linux/Cygwin systems for directories, files,
and symlinks.

/^
  ([-dl]\S{9})                                        # perms
  (\+)?\s+                                            # access control list present
  (\d+)\s+                                            # number of links
  (\S+)\s+                                            # owner
  (\S+)\s+                                            # group
  (\d+)\s+                                            # size
  ([A-Z][a-z][a-z]\s+\d\d?\s+(?:\d\d:\d\d|\d{4,}))\s  # mtime
  (.*?)                                               # name
  (?:\s+->\s(.*))?                                    # symlink target
$/x;

--
Mike Arms
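To try Mike's pattern, it can be compiled once with qr//x and wrapped in a small helper that hands back just the name and the symlink target. A minimal sketch: parse_ls_line is a hypothetical name, and the sample lines (spacing included) are invented to cover a symlink, an HH:MM time, and a year with a space in the file name.

```perl
use strict;
use warnings;

# Mike's "ls -la" pattern from the post, compiled once.
my $ls_re = qr/^
    ([-dl]\S{9})                                        # perms
    (\+)?\s+                                            # access control list present
    (\d+)\s+                                            # number of links
    (\S+)\s+                                            # owner
    (\S+)\s+                                            # group
    (\d+)\s+                                            # size
    ([A-Z][a-z][a-z]\s+\d\d?\s+(?:\d\d:\d\d|\d{4,}))\s  # mtime
    (.*?)                                               # name
    (?:\s+->\s(.*))?                                    # symlink target
$/x;

# Return (name, target); target is undef for anything but a symlink.
# Returns an empty list (undef in scalar context) when the line does not match.
sub parse_ls_line {
    my ($line) = @_;
    my @f = $line =~ $ls_re or return;
    return @f[7, 8];    # captures 8 and 9: name, symlink target
}

# Hypothetical sample lines.
for my $line (
    'lrwxrwxrwx   1 deane    staff         14 Sep 10 15:15 1614 -> /tmp/target1614',
    '-rw-rw-r--   1 marms    marms         30 Sep  3 16:01 z',
    '-rw-r--r--   1 marms    marms       1024 May  3  2003 old file.txt',
) {
    my ($name, $target) = parse_ls_line($line);
    print defined $target ? "$name -> $target\n" : "$name\n";
}
```

Because the name group is non-greedy and the symlink group is optional, a name containing spaces is kept whole, and " -> " is split off only when it is followed by a target at the end of the line.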



RE: A shorter regexp?

on 01.10.2004 22:58:23 by Christopher.J.Gerber

> At one point in my script, I still use ls to get a line of data for
> files, what I'm trying for is a regexp that'll pull the filename
> out--including soft links. What I have right now--while it works--is
> rather, um, long:
>
> $line =~ /^[-l][-rswx]{9}\s+\d+\s+\w+\s+\w+\s+\d+\s+\w+\s+\d+\s+
> (?:\d{4}|\d\d:\d\d)\s+(.*)$/;
>
> I had a shorter regexp, actually *much* shorter:
>
> $line =~ /^.+(?:\d{4}|\d\d:\d\d)\s+(.*)$/;
>
> but it ran into a problem with the following file line:
>
> lrwxrwxrwx ... Sep 10 15:15 1614 -> /opt/K/SCO/...1614

Deane,

Just taking a quick look, and ignoring a whole host of assumptions, it seems
like you're looking for everything after the date:

$line = 'lrwxrwxrwx ... Sep 10 15:15 1614 -> /opt/K/SCO/...1614';
$line =~ /\w{3} \d+ \d+:\d+ (.*)$/;
print $1;

Chris
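Chris's pattern works on the abbreviated line from the post, but it assumes exactly one space between the date fields and an HH:MM time. A quick check, where after_date is a hypothetical helper and the last two sample lines are made up: ls normally pads a single-digit day to two columns ("Sep  3"), and shows a year instead of HH:MM for files that are not recent.

```perl
use strict;
use warnings;

# Chris's pattern, wrapped in a helper that returns undef on no match.
sub after_date {
    my ($line) = @_;
    return $line =~ /\w{3} \d+ \d+:\d+ (.*)$/ ? $1 : undef;
}

# His test line from the post, plus two hypothetical lines showing
# layouts the pattern does not cover.
my @samples = (
    'lrwxrwxrwx ... Sep 10 15:15 1614 -> /opt/K/SCO/...1614',
    '-rw-rw-r--   1 marms  marms    30 Sep  3 16:01 z',            # day padded with two spaces
    '-rw-r--r--   1 marms  marms  1024 May  3  2003 old file.txt', # year instead of HH:MM
);

for my $s (@samples) {
    my $name = after_date($s);
    print defined $name ? "[$name]\n" : "no match\n";
}
```

Relaxing the single spaces to \s+ and allowing (?:\d+:\d+|\d{4}) would cover both cases, at the cost of a slightly longer pattern.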



Re: A shorter regexp?

on 01.10.2004 23:06:11 by Todd Beverly

Deane.Rothenmaier@walgreens.com wrote:

>At one point in my script, I still use ls to get a line of data for files,
>what I'm trying for is a regexp that'll pull the filename out--including
>soft links. What I have right now--while it works--is rather, um, long:
>
>$line =~
>/^[-l][-rswx]{9}\s+\d+\s+\w+\s+\w+\s+\d+\s+\w+\s+\d+\s+(?:\d{4}|\d\d:\d\d)\s+(.*)$/;
>
>I had a shorter regexp, actually *much* shorter:
>
>$line =~ /^.+(?:\d{4}|\d\d:\d\d)\s+(.*)$/;
>
The long ls listing on my machine looks like a fixed format: the time
starts at character 53, so start the regexp with exactly 52 "any"
characters:

$line =~ /^.{52}(\d\d:\d\d)\s+(.*)$/;

Either that or use unpack:
my ($time, $name) = unpack("x52a5xa*", $line);

>
>but it ran into a problem with the following file line:
>
>lrwxrwxrwx ... Sep 10 15:15 1614 -> /opt/K/SCO/...1614
>
>It tripped on the 1614 and returned only "-> /opt/K/SCO/...1614" as the
>filename. Anyone have any suggestions for shortening that monster regexp
>listed above?
>
>
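Todd's fixed-offset idea, sketched in Perl to show what each piece of the unpack template does. The sample line is built with sprintf using field widths invented purely so that the time begins at character 53; real ls column widths can differ by system and by file size.

```perl
use strict;
use warnings;

# Hypothetical fixed-width line: these field widths are made up so that
# the 52-character prefix ends just before the time (character 53).
my $prefix = sprintf '%-10s %4s %-8s %-8s %10s %s ',
    '-rw-r--r--', 1, 'todd', 'users', 123, 'Oct  1';
my $line = $prefix . '22:45 notes.txt';

# x52: skip 52 bytes; a5: the 5-byte HH:MM; x: skip one space;
# a*: whatever remains, i.e. the file name.
my ($time, $name) = unpack 'x52a5xa*', $line;

# Todd's regex form of the same fixed-offset idea.
my ($rtime, $rname) = $line =~ /^.{52}(\d\d:\d\d)\s+(.*)$/;

print length($prefix), "\n";  # 52
print "$time $name\n";        # 22:45 notes.txt
print "$rtime $rname\n";      # 22:45 notes.txt
```

Both forms hard-code the same offset, so they succeed or fail together if a field grows wider than expected.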



RE: A shorter regexp?

on 02.10.2004 00:20:13 by marms

Todd Beverly [todd.beverly@kodak.com] wrote:
>Deane.Rothenmaier@walgreens.com wrote:
>>At one point in my script, I still use ls to get a line of data for files,
>>what I'm trying for is a regexp that'll pull the filename out--including
>>soft links. What I have right now--while it works--is rather, um, long:
>>
>>$line =~
>>/^[-l][-rswx]{9}\s+\d+\s+\w+\s+\w+\s+\d+\s+\w+\s+\d+\s+(?:\d{4}|\d\d:\d\d)\s+(.*)$/;
>>
>>I had a shorter regexp, actually *much* shorter:
>>
>>$line =~ /^.+(?:\d{4}|\d\d:\d\d)\s+(.*)$/;
>>
>The long ls on my machine looks like it's a fixed format. It's 53
>characters from the line start to the first character of the time, so
>start the regexp with the exact number of "any" characters :
>
>$line =~ /^.{52}(\d\d:\d\d)\s+(.*)$/;
>
>Either that or use unpack:
>my ($time, $name) = unpack("x52a5xa*", $line);
>
>>
>>but it ran into a problem with the following file line:
>>
>>lrwxrwxrwx ... Sep 10 15:15 1614 -> /opt/K/SCO/...1614
>>
>>It tripped on the 1614 and returned only "-> /opt/K/SCO/...1614" as the
>>filename. Anyone have any suggestions for shortening that monster regexp
>>listed above?

First off, I think Todd's figure of 53 characters is off (at least
in my tests): the HH:MM of the mtime starts at character 51. This
is true for Cygwin, Solaris, and Linux.

> ls -l z
-rw-rw-r--    1 marms    marms          30 Sep  3 16:01 z
         1         2         3         4         5         6
123456789012345678901234567890123456789012345678901234567890

Also, if you have a big file (over 99,999,999 bytes), you will
find that the size gets jammed in and upsets the nice columns.
Thus for big files, the HH:MM will start later (depending on how
many extra digits it takes to represent the file size):

> ls -l TheOpenCD-v1.4.iso.zip TheOpenCD-v1.4.iso.zip.md5
-r--r--r--    1 marms    marms    279681009 May  3 14:53 TheOpenCD-v1.4.iso.zip
-r--r--r--    1 marms    marms          57 May  3 14:53 TheOpenCD-v1.4.iso.zip.md5
         1         2         3         4         5         6
123456789012345678901234567890123456789012345678901234567890

So this solution is not safe.

--
Mike Arms
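Since the columns shift, one alternative (not proposed in the thread) is to skip fixed offsets entirely and let split do the work: splitting on whitespace with a limit of 9 leaves everything after the time or year, spaces and " -> target" included, in the last field. A sketch, assuming the standard eight leading columns of ls -l; options that add or drop columns (inode numbers, block counts, and the like) would need a different count, and leading spaces in a file name are lost. ls_name is a hypothetical helper, and the sample lines reuse Mike's file names with assumed spacing.

```perl
use strict;
use warnings;

# Return everything after the 8th whitespace-separated column of an
# "ls -l" line: the name, plus " -> target" for a symlink.
# split ' ' skips leading whitespace and splits on runs of whitespace;
# the limit of 9 keeps the remainder (embedded spaces and all) in one field.
sub ls_name {
    my ($line) = @_;
    my @f = split ' ', $line, 9;
    return @f == 9 ? $f[8] : undef;   # undef for e.g. a "total 8" line
}

# Sample lines: the 9-digit size pushes the time one column further
# right in the second line, but the split does not care.
my @lines = (
    '-r--r--r--    1 marms    marms          57 May  3 14:53 TheOpenCD-v1.4.iso.zip.md5',
    '-r--r--r--    1 marms    marms    279681009 May  3 14:53 TheOpenCD-v1.4.iso.zip',
    'lrwxrwxrwx    1 deane    staff          14 Sep 10 15:15 1614 -> /tmp/target1614',
);

print ls_name($_), "\n" for @lines;
```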


RE: A shorter regexp?

on 04.10.2004 13:42:44 by Brian Raven

Deane.Rothenmaier@walgreens.com wrote:
> Hi, all.
>
> Sorry to've ignited such a firestorm with my questions regarding
> stat() vs ls for file dating... ;-)) I hope this question will be
> less provocative.
>
> At one point in my script, I still use ls to get a line of data for
> files, what I'm trying for is a regexp that'll pull the filename
> out--including soft links. What I have right now--while it works--is
> rather, um, long:
>
> $line =~
> /^[-l][-rswx]{9}\s+\d+\s+\w+\s+\w+\s+\d+\s+\w+\s+\d+\s+(?:\d{4}|\d\d:\d\d)\s+(.*)$/;
>
> I had a shorter regexp, actually *much* shorter:
>
> $line =~ /^.+(?:\d{4}|\d\d:\d\d)\s+(.*)$/;
>
> but it ran into a problem with the following file line:
>
> lrwxrwxrwx ... Sep 10 15:15 1614 -> /opt/K/SCO/...1614
>
> It tripped on the 1614 and returned only "-> /opt/K/SCO/...1614" as
> the filename. Anyone have any suggestions for shortening that monster
> regexp listed above?

If you are using GNU ls, then you could use the -Q switch to quote
the file name(s), which should make them easier to locate with a regex,
or possibly Text::Balanced.
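A sketch of what the -Q suggestion buys: once names are wrapped in double quotes, a regex only has to find a quoted string with backslash escapes at the end of the line. This assumes GNU ls -lQ style output in which the symlink target is quoted as well and an embedded quote appears as \"; the sample line and the quoted_name helper are hypothetical, and escapes are returned undecoded.

```perl
use strict;
use warnings;

# A double-quoted string with backslash escapes, e.g. "say \"hi\".txt".
my $qstr = qr/"((?:[^"\\]|\\.)*)"/;

# Return (name, target) from the end of an "ls -lQ"-style line; target is
# undef when there is no " -> ". Escapes are left undecoded.
sub quoted_name {
    my ($line) = @_;
    return unless $line =~ /$qstr(?:\s+->\s+$qstr)?\s*$/;
    return ($1, $2);
}

# Hypothetical sample line: a link whose target contains a space.
my ($name, $target) =
    quoted_name('lrwxrwxrwx 1 deane staff 16 Sep 10 15:15 "1614" -> "/tmp/target 1614"');
print "$name\n$target\n";
```

The first double quote on the line belongs to the name, so the leftmost match starts there and the digits in the name can no longer be mistaken for a time or year.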

However, I don't know why you would want to use ls and a regex. The
following seems easier and more reliable to me:

foreach (<*>) {
    print $_;
    print " -> ", readlink if -l;
    print "\n";
}

HTH

--
Brian Raven
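Building on Brian's loop, lstat can also supply the modification time that ls was being parsed for, with no regex at all. A minimal sketch: describe_entry is a hypothetical helper, the date format is an arbitrary choice, and lstat is used so that a symlink reports its own time rather than its target's.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Return "YYYY-MM-DD HH:MM name", plus " -> target" for symlinks.
# lstat reports on the link itself; stat would follow it to the target.
sub describe_entry {
    my ($file) = @_;
    my @st = lstat $file or return;    # nothing if lstat fails
    my $desc = strftime('%Y-%m-%d %H:%M', localtime $st[9]) . " $file";
    $desc .= ' -> ' . readlink($file) if -l $file;
    return $desc;
}

# Same walk over the current directory as Brian's loop.
print describe_entry($_), "\n" for glob '*';
```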


