pattern matching in ksh

on 22.01.2008 12:27:59 by hyperboogie

Hello all

I'm trying to find a better way to harness the kornshell pattern
matching abilities.
Since kornshell patterns are builtin and run as part of the process,
they should also be faster than sed/grep etc.
However, they seem a little awkward to use when trying to substitute
for grep, for example.
Is there a comfortable way to search for a string in a file?

The only way I managed to do this was to iterate with a loop over the
lines in the file and look for the pattern using an if statement, like
so:

I created a dummy file:
> cat junk
hello there, this is just for testing
pattern matching in kornshell
this line will not match

>cat junk | while read line ; do
> if [[ $line = *+(korn)* ]];then
> print "$line"
> fi
>done
pattern matching in kornshell

is there an easier way to do this?

Re: pattern matching in ksh

on 22.01.2008 14:39:53 by Stephane CHAZELAS

On Tue, 22 Jan 2008 03:27:59 -0800 (PST), hyperboogie wrote:
[...]
> I'm trying to find a better way to harness the kornshell pattern
> matching abilities.
> since kornshell patterns are builtin and run as part of the process,
> they should also be faster than sed/grep etc...
[...]

No, that's misunderstanding what a shell is. A shell is above
all a command-line interpreter. It's a tool to run commands. If
you want a text processing programming language instead,
consider perl, ruby or awk.

Your while loop has at least 5 bugs in it and will be less
efficient and less legible than grep.

--
Stephane

Re: pattern matching in ksh

on 22.01.2008 15:20:48 by hyperboogie

On Jan 22, 3:39 pm, Stephane Chazelas wrote:
> On Tue, 22 Jan 2008 03:27:59 -0800 (PST), hyperboogie wrote:
>
> [...]
> > I'm trying to find a better way to harness the kornshell pattern
> > matching abilities.
> > since kornshell patterns are builtin and run as part of the process,
> > they should also be faster than sed/grep etc...
>
> [...]
>
> No, that's misunderstanding what a shell is. A shell is above
> all a command-line interpreter. It's a tool to run commands. If
> you want a text processing programming language instead,
> consider perl, ruby or awk.
>
> Your while loop has at least 5 bugs in it and will be less
> efficient and less legible than grep.
>
> --
> Stephane

Hi Stephane, and thanks for your reply - this is not the first time you
have helped me :-)

Obviously, using the loop is not nearly as efficient as say:
cat junk | grep korn ...

If it was just text processing I would use grep/perl etc. But I
write a lot of korn shell scripts which involve more than (but
including) text processing.
Normally, spawning a new sub-shell is MUCH more expensive than using a
builtin, and I can say for a fact (and from experience) that using some
of the kornshell pattern matching operators (for example
${variable%%pattern}) is both more effective and more comfortable than
using, say, sed for the job. That's why I wanted to know if there was
a better way of using the pattern matching abilities without the
awkward loop for text files.

By the way, what are the bugs in the loop? It seems to work
fine.
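
The thread never spells the five bugs out. Issues commonly raised with a loop of this shape are: read without -r treats backslashes as escapes, read without IFS= trims leading and trailing blanks, a last line lacking a newline is skipped, print "$line" without -r -- expands backslash sequences and takes a line starting with - as options, and cat adds a needless process. A minimal sketch that avoids these follows; it is an assumption about the kind of thing meant, not the list Stephane had in mind, and the sample file mirrors junk from the original post.

```shell
# Recreate the sample file from the original post in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'hello there, this is just for testing' \
    'pattern matching in kornshell' \
    'this line will not match' > "$dir/junk"

# IFS= keeps leading/trailing blanks, -r keeps backslashes literal,
# and "|| [ -n "$line" ]" still handles a last line without a newline.
matches=$(
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            (*korn*) printf '%s\n' "$line" ;;
        esac
    done < "$dir/junk"
)
printf '%s\n' "$matches"

# The external tool gives the same answer in one short command.
grep_matches=$(grep korn < "$dir/junk")
rm -r "$dir"
```

Even in this form the loop stays slower and longer than the grep call on anything but a tiny file.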

Re: pattern matching in ksh

on 22.01.2008 15:54:16 by Stephane CHAZELAS

On Tue, 22 Jan 2008 06:20:48 -0800 (PST), hyperboogie wrote:
[...]
> Obviously, using the loop is not nearly as efficient as say:
> cat junk | grep korn ...

and even less than

grep korn < junk

(why are you concatenating a single file?!)

> If it was just text processing I would use grep/perl etc. But ... I
> write a lot of korn shell scripts which involve more than (but
> including) text processing.
> Normally, spawning a new sub-shell is MUCH more expensive than using a
> builtin and I can say for a fact (and from experience) that using some
> of the kornshell pattern matching operators (for example
> ${variable%%pattern}) is both more effective and more comfortable than
> using say ... sed for the job. That's why I wanted to know if there was
> a better way of using the pattern matching abilities without the
> awkward loop for text files.
[...]

Sounds like you're trying to use the shell as you would use a
programming language. You may need to think differently.

${xxx%xxx} may be faster than using sed, if you use them as many
times as you spawn a sed process.

But in a shell script, you will call sed only a few times and in
parallel with other commands, not sequentially, because sed
processes a stream of lines and does its job on every line of
the stream, while you need to use ${xxx%xxx} for every line.
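
To make the trade-off concrete: for a single string already held in a variable, a parameter expansion does in-process what would otherwise cost a fork and exec of sed. A minimal sketch, with a made-up file name as the value:

```shell
# Hypothetical value, used only for illustration.
file=archive.tar.gz

# In-process: strip the longest suffix matching ".*" (no new process).
base_builtin=${file%%.*}

# Same result through an external process: one sed is spawned per call.
base_sed=$(printf '%s\n' "$file" | sed 's/\..*$//')

printf '%s %s\n' "$base_builtin" "$base_sed"
```

Once the same edit is needed on every line of a file, the balance tips the other way, since one sed over the whole stream replaces one expansion per line.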

Start with getting rid of all your shell loops. That's not the
proper way of doing shell scripting.

while IFS= read -r line; do
  line=$(echo "$line" | sed '...')
  echo "$line"
done < file

is nonsense and unreliable, it can be written instead:

sed '...' < file

--
Stephane

Re: pattern matching in ksh

on 22.01.2008 18:24:43 by hyperboogie

> and even less than
>
> grep korn < junk
>
> (why are you concatenating a single file?!)
>
Obviously you're right - just a silly mistake - point taken :-)


> ${xxx%xxx} may be faster than using sed, if you use them as many
> times as you spawn a sed process.
>
> But in a shell script, you will call sed only a few times and in
> parallel with other commands, not sequentially, because sed
> processes a stream of lines and does its job on every line of
> the stream, while you need to use ${xxx%xxx} for every line.
>

Not necessarily (BUT MOSTLY CORRECT). For relatively small files,
ksh93 offers an alternative:

$ wc -c < junkp
5394
$ time (var=$( temp1)

real 0m0.00s
user 0m0.00s
sys 0m0.00s
$ time sed 's|kornshell|ksh|g' < junkp > temp

real 0m0.01s
user 0m0.00s
sys 0m0.01s

As you can see, because there was no need to fork sed, the kornshell
version is slightly faster, but this is only true for relatively small
files.
When I tried this on a large enough log file, the sed version was
almost an ORDER OF MAGNITUDE FASTER - not surprising, since as you
said sed works on a stream, and with the ksh93 version the file's
content had to be copied first to the variable and then printed out
using the ${var//x/y} quasi-sed operator.
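
The timed ksh93 command above is cut off in the archived post, so the exact original cannot be recovered. Going by the description (copy the file's content to a variable, then print it through ${var//x/y}), the comparison was presumably along these lines. This is a sketch only: the sample content is made up, and printf stands in for ksh's print so that it also runs in bash.

```shell
# Needs ksh93, bash or zsh: $(<file) and ${var//x/y} are not POSIX sh.
dir=$(mktemp -d)
printf '%s\n' 'pattern matching in kornshell' 'kornshell again' > "$dir/junkp"

# Builtin route: slurp the file into a variable, substitute, write out.
var=$(<"$dir/junkp")
printf '%s\n' "${var//kornshell/ksh}" > "$dir/temp1"

# External route, as shown in the post.
sed 's|kornshell|ksh|g' < "$dir/junkp" > "$dir/temp"

# Both routes should produce identical files for this input.
if cmp "$dir/temp1" "$dir/temp"; then same=yes; else same=no; fi
printf '%s\n' "$same"
rm -r "$dir"
```

Wrapping each route in time on files of different sizes is how the crossover described above could be observed.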

> Start with getting rid of all your shell loops. That's not the
> proper way of doing shell scripting.
>
> while IFS= read -r line; do
> line=$(echo "$line" | sed '...')
> echo "$line"
> done < file
>
> is nonsense and unreliable, it can be written instead:
>
> sed '...' < file
>
God forbid I use a loop for such a task - It's an abomination :-)
I think I was misunderstood. I actually made a point of saying that
it's an awkward way to do it. I just used it to demonstrate the
question, and ask for a reasonable kornshell builtin alternative.

In any case your points are well taken - I just hoped there was a ksh
builtin way to handle such tasks.

Thanks

Re: pattern matching in ksh

on 22.01.2008 18:31:58 by OldSchool

On Jan 22, 12:24 pm, hyperboogie wrote:
> > and even less than
>
> > grep korn < junk
>
> > (why are you concatenating a single file?!)
>
> Obviously you're right - just a silly mistake - point taken :-)
>
> > ${xxx%xxx} may be faster than using sed, if you use them as many
> > times as you spawn a sed process.
>
> > But in a shell script, you will call sed only a few times and in
> > parallel with other commands, not sequentially, because sed
> > processes a stream of lines and does its job on every line of
> > the stream, while you need to use ${xxx%xxx} for every line.
>
> Not necessarily (BUT MOSTLY CORRECT). for relatively small files,
> ksh93 offers an alternative:
>
> $ wc -c < junkp
> 5394
> $ time (var=$( temp1)
>
> real 0m0.00s
> user 0m0.00s
> sys 0m0.00s
> $ time sed 's|kornshell|ksh|g' < junkp > temp
>
> real 0m0.01s
> user 0m0.00s
> sys 0m0.01s
>

Re: pattern matching in ksh

on 22.01.2008 18:34:41 by OldSchool

> $ wc -c < junkp
> 5394
> $ time (var=$( temp1)


> real 0m0.00s
> user 0m0.00s
> sys 0m0.00s
> $ time sed 's|kornshell|ksh|g' < junkp > temp


> real 0m0.01s
> user 0m0.00s
> sys 0m0.01s


What I had intended to post is:

The example shown is mostly meaningless, as the two commands are performing
different tasks. You would have to find and substitute the strings
within the shell command for them to be comparable.

Re: pattern matching in ksh

on 22.01.2008 18:35:28 by Stephane CHAZELAS

On Tue, 22 Jan 2008 09:24:43 -0800 (PST), hyperboogie wrote:
[...]
> Not necessarily (BUT MOSTLY CORRECT). for relatively small files,
> ksh93 offers an alternative:
>
> $ wc -c < junkp
> 5394
> $ time (var=$( temp1)
>
> real 0m0.00s
> user 0m0.00s
> sys 0m0.00s
> $ time sed 's|kornshell|ksh|g' < junkp > temp
>
> real 0m0.01s
> user 0m0.00s
> sys 0m0.01s
>
> As you can see because there was no need to fork sed, the kornshell
> version is slightly faster, but this is only true for relatively small
> files.
[...]

But that's still sacrificing legibility and reliability ($(<
strips trailing LFs, print adds a trailing LF and expands \x and
fails if junkp starts with -) for less than 10ms.
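
The first of these points is easy to check. A small sketch, using $(cat file) so that it runs in any POSIX shell; $(<file) strips trailing newlines in the same way. The file content is made up for the demonstration.

```shell
dir=$(mktemp -d)
# A file that ends in three newlines: 4 bytes in total.
printf 'a\n\n\n' > "$dir/in"

# Command substitution drops every trailing newline ...
var=$(cat "$dir/in")
# ... and printing with '\n' puts exactly one back: 2 bytes.
printf '%s\n' "$var" > "$dir/out"

# $(( ... )) normalises the padding some wc implementations add.
in_bytes=$(( $(wc -c < "$dir/in") ))
out_bytes=$(( $(wc -c < "$dir/out") ))
printf 'in=%s out=%s\n' "$in_bytes" "$out_bytes"
rm -r "$dir"
```

So the round trip through a variable is not byte-for-byte faithful, whereas sed passes the trailing empty lines through unchanged.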

> when I tried this on a large enough log file, the sed version was
> almost an ORDER OF MAGNITUDE FASTER - not surprising, since as you
> said sed works on a stream, and with the ksh93 version the file's
> content had to be copied first to the variable and then printed out
> using the ${var//x/y} quasi-sed operator.
[...]

Thanks for that good illustration of my point :).

${xx//xx} may make you save 10ms when it doesn't matter. Using
the proper tool for the task (like sed) will make you save
seconds or minutes when it matters.

--
Stephane

Re: pattern matching in ksh

on 22.01.2008 19:59:46 by cfajohnson

On 2008-01-22, hyperboogie wrote:
>
> I'm trying to find a better way to harness the kornshell pattern
> matching abilities.
> since kornshell patterns are builtin and run as part of the process,
> they should also be faster than sed/grep etc...

The loop needed to read the file will take much longer unless it
is a small file.

> However they seem a little awkward to use when trying to substitute
> grep for example...
> Is there a comfortable way to search for a string in a file???

Use grep unless it's a small file.

> The only way I managed to do this was iterating with a loop over the
> lines in the file and look for the pattern using an if statement, like
> so
>
> I created a dummy file:
>> cat junk
> hello there, this is just for testing
> pattern matching in kornshell
> this line will not match
>
>>cat junk | while read line ; do
>> if [[ $line = *+(korn)* ]];then
>> print "$line"
>> fi
>>done
> pattern matching in kornshell
>
> is there an easier way to do this?

This works in any POSIX shell:

while read line
do
  case $line in
    *korn*) printf "%s\n" "$line" ;;
  esac
done < junk

--
Chris F.A. Johnson, author
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Re: pattern matching in ksh

on 23.01.2008 22:56:56 by brian_hiles

hyperboogie wrote:
> > ...
> Not necessarily (BUT MOSTLY CORRECT). for relatively small files,
> ksh93 offers an alternative:
> [snip]
> As you can see because there was no need to fork sed, the kornshell
> version is slightly faster, but this is only true for relatively small
> files.
> when I tried this on a large enough log file, the sed version was
> almost an ORDER OF MAGNITUDE FASTER - not surprising, since as you
> said sed works on a stream, and with the ksh93 version the file's
> content had to be copied first to the variable and then printed out
> using the ${var//x/y} quasi-sed operator.

Ah yes, the classic example of the tradeoff of overhead versus
execution times. See my past post which addresses (and confirms)
this issue:

http://groups.google.com/group/comp.unix.shell/msg/420faabca9bf31f9

(Only once sed(1) actually gets to reading input does it
become quite efficient.)

As a rule of thumb, a "sufficient" builtin (such as ${var//}) with
a trivial data set (say, one invocation, versus a multi-megabyte
file) will execute between 2 and 4 orders of magnitude faster
than the equivalent (but frequently more capable, of course)
external software tool (e.g. sed(1)), when process forking,
script scanning and parsing, etc. are taken into account.

E.g.: I try to write all my library functions invoking no
outside processes, but this can be problematic when seeking
portability to older shells. The simplest case I have from a
portable abstraction framework I have written is:

function expr { print -r -- "$(($@))"; } # IIRC

which has the desirable attribute that "portable" (hah!) scripts
written in sh(1), which typically have to use expr(1) to
accomplish arithmetic evaluation, don't have to be changed.
When the above function is autoloaded, it is transparently used
instead -- for a 100-fold speedup.
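
A sketch of the same idea in POSIX-only syntax, for shells without print or the function keyword. The name calc is hypothetical, chosen so that the real expr(1) is not shadowed here; the version above deliberately reuses the name expr.

```shell
# Evaluate the arguments with the shell's own arithmetic instead of
# running the external expr(1).
calc() {
    # "$*" joins the arguments with spaces, e.g. "6 * 7".
    printf '%s\n' "$(( $* ))"
}

sum=$(calc 1 + 2)
product=$(calc 6 \* 7)
printf '%s %s\n' "$sum" "$product"
```

Callers keep the expr-style calling convention (operands and operators as separate arguments), which is what lets existing scripts stay unchanged.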

=Brian

Re: pattern matching in ksh

on 24.01.2008 00:54:18 by James Michael Fultz

* bsh :
[ ... ]
> portability to older shells. The simplest case I have from a
> portable abstraction framework I have written is:
>
> function expr { print -r -- "$(($@))"; } # IIRC
>
> which has the desirable attribute that "portable" (hah!) scripts
> written in sh(1), which typically have to use expr(1) to
> accomplish arithmetic evaluation, don't have to be changed.
> When the above function is autoloaded, it is transparently used
> instead -- for a 100-fold speedup.

That function isn't a real drop-in replacement for expr since it lacks
support for the regex pattern matching feature of expr.

% expr abc : a
1

zsh% print -r -- "$((abc : a))"
zsh: ':' without '?'

ksh$ print -r -- "$((abc : a))"
ksh: abc : a: unexpected `:'

--
James Michael Fultz

Re: pattern matching in ksh

on 25.01.2008 01:44:46 by brian_hiles

James Michael Fultz wrote:
> bsh :
> >
> That function isn't a real drop-in replacement for expr since it lacks
> support for the regex pattern matching feature of expr.

Yes, I know, but it would have been a trivial addendum to the
function. I could have easily posted the function which was a
proper emulation of POSIX expr(1), but as I said, it was my
simplest example, and from memory.
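
The fuller emulation is not posted in the thread. As a rough sketch of one way to cover the "string : regex" case raised above, the shim below simply hands that form to the real expr(1) and keeps plain arithmetic in the shell. It delegates rather than emulates, and the name expr_shim is hypothetical.

```shell
# Hypothetical shim: arithmetic stays in the shell, while the
# three-argument "string : regex" form goes to the external expr(1).
expr_shim() {
    if [ "$#" -eq 3 ] && [ "$2" = ":" ]; then
        # "command" bypasses shell functions, so this would still reach
        # the external utility if the shim itself were named expr.
        command expr "$@"
    else
        printf '%s\n' "$(( $* ))"
    fi
}

match_len=$(expr_shim abc : a)
product=$(expr_shim 6 \* 7)
printf '%s %s\n' "$match_len" "$product"
```

Note that expr(1) also signals a zero or empty result through its exit status, which the arithmetic branch here does not reproduce.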

Indeed, before ksh(1) became ubiquitous -- and the programming
of a software tool frequently called from sh(1) no longer worth the
time I would spend on it -- and before discovering dc.sed by Greg
Ubben, I thought to write a trigonometric calculator of indefinite
precision utilizing, of all things, sed(1).

"General Decimal Arithmetic":
http://www2.hursley.ibm.com/decimal/

"dc.sed":
http://sed.sourceforge.net/grabbag/scripts/dc.sed

=Brian