showmatch script from Unix Power Tools gives error

on 18.10.2007 00:02:46 by BartlebyScrivener

On page 648 of Unix Power Tools 3rd, there's a script called showmatch
for testing regexp and the like.

When run it is supposed to highlight the match on stdout with carets
under the match:

match
^^^^^^

#! /bin/sh
# showmatch -- mark string that matches pattern
pattern=$1; shift
nawk 'match($0,pattern) > 0 {
s = substr($0,1,RSTART-1)
m = substr($0,1,RLENGTH)
gsub (/[^\b- ]/, " ", s)
gsub (/./, "^", m)
printf "%s\n%s%s\n", $0, s, m
}' pattern="$pattern" $*

However when I run it, I get

nawk: cmd. line:3: fatal: Invalid range end: /[- ]/

I'm still learning Unix, so I don't know what's causing the error.

Any help appreciated.

Thanks,

rpd

Re: showmatch script from Unix Power Tools gives error

on 18.10.2007 02:37:49 by Maxwell Lol

BartlebyScrivener writes:

> #! /bin/sh
> # showmatch -- mark string that matches pattern
> pattern=$1; shift
> nawk 'match($0,pattern) > 0 {
> s = substr($0,1,RSTART-1)
> m = substr($0,1,RLENGTH)
> gsub (/[^\b- ]/, " ", s)

The error is on this line^^^^
Try
gsub (/[^\b]/, " ", s)
or perhaps
gsub (/[^\b ]/, " ", s)


Let me try an explanation

When a regular expression has a '-' between [ and ] and it's not at
the beginning or end, it's a range. If it says [a-Z], it means anything
between a and Z. Trouble is, the value of 'Z' might be less than 'a',
which is true if you use ASCII characters. That's an illegal range.
'[a-zA-Z]' is okay, as is the newer [[:alpha:]].

In this case, it's a range between '\b' and ' '. The '\b' is a word
boundary, and not really a character which can be used as a range.
It matches a 'nil' character between words.

I deleted the '-' as I didn't think a range is used, and I didn't
think the character '-' should be in the set.
Try that....
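
To illustrate the range rule described in this post, here is a minimal sketch. The function names and sample inputs are made up for illustration, plain "awk" is used instead of nawk, and LC_ALL=C is an assumption added so that ranges follow ASCII byte order.

```shell
#!/bin/sh
# Hypothetical helper: replace every ASCII letter on stdin with "_".
# LC_ALL=C (an assumption, not from the thread) pins ranges to byte order.
letters_to_underscores() {
    LC_ALL=C awk '{ gsub(/[a-zA-Z]/, "_"); print }'
}

# Hypothetical helper: a "-" placed last inside the brackets is a literal
# hyphen, not a range, so this replaces "a" and "-" with "+".
a_or_hyphen_to_plus() {
    LC_ALL=C awk '{ gsub(/[a-]/, "+"); print }'
}
```

For example, printf 'abc123XYZ\n' | letters_to_underscores prints ___123___, and printf 'a-b\n' | a_or_hyphen_to_plus prints ++b.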

Re: showmatch script from Unix Power Tools gives error

on 18.10.2007 04:50:13 by Icarus Sparry

On Wed, 17 Oct 2007 22:02:46 +0000, BartlebyScrivener wrote:

> On page 648 of Unix Power Tools 3rd, there's a script called showmatch
> for testing regexp and the like.
>
> When run it is supposed to highlight the match on stdout with carets
> under the match:
>
> match
> ^^^^^^
>
> #! /bin/sh
> # showmatch -- mark string that matches pattern
> pattern=$1; shift
> nawk 'match($0,pattern) > 0 {
> s = substr($0,1,RSTART-1)
> m = substr($0,1,RLENGTH)
> gsub (/[^\b- ]/, " ", s)
> gsub (/./, "^", m)
> printf "%s\n%s%s\n", $0, s, m
> }' pattern="$pattern" $*
>
> However when I run it, I get
>
> nawk: cmd. line:3: fatal: Invalid range end: /[- ]/
>
> I'm still learning Unix, so I don't know what's causing the error.

Well it works for me, using "mawk" rather than "nawk".
The awk program works by first setting "s" to be the characters before
the match, and "m" being a string that happens to be the length of the
match. It then changes any character whose value is not between 8 and 32
inclusive (assuming an ASCII character set) in s to a space, and all the
characters in m to a caret. It then prints out the line, followed by the
s and m strings.

The reason for keeping certain characters in the "s" string is to
preserve tabs and backspaces in particular as they affect the column in
the output.

Try changing the [^\b- ] to [^\010-\040], and see if that helps.
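
The suggestion above can be sketched as a shell function. This is only a sketch under stated assumptions, not the book's script: it calls plain "awk" rather than nawk, pins the locale with LC_ALL=C, passes files as "$@" instead of $*, and takes the matched text itself for "m" (only its length matters, as explained above).

```shell
#!/bin/sh
# showmatch: print each matching line, then a row of carets under the match.
# Assumptions not in the thread: plain "awk", LC_ALL=C, and "$@" for files.
showmatch() {
    pattern=$1; shift
    LC_ALL=C awk -v pattern="$pattern" 'match($0, pattern) > 0 {
        s = substr($0, 1, RSTART - 1)      # text before the match
        m = substr($0, RSTART, RLENGTH)    # the matched text (only length is used)
        gsub(/[^\010-\040]/, " ", s)       # keep bytes 8..32 (BS, TAB, space), blank the rest
        gsub(/./, "^", m)                  # one caret per matched character
        printf "%s\n%s%s\n", $0, s, m
    }' "$@"
}
```

With no file arguments the function reads standard input, so printf 'hello world\n' | showmatch wor prints the line followed by six spaces and ^^^.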

Re: showmatch script from Unix Power Tools gives error

on 18.10.2007 10:37:21 by Stephane CHAZELAS

2007-10-17, 20:37(-04), Maxwell Lol:
[...]
>> gsub (/[^\b- ]/, " ", s)
[...]
> In this case, it's a range between '\b' and ' '. The '\b' is a word
> boundary, and not really a character which can be used as a range.
> It matches a 'nil' character between words.
[...]

No, \b is the BS (backspace) character, aka ^H, 0x8. That's why
the error message displays as "[- ]": the BS character, when
sent to a terminal, moves the cursor one column to the left, so the
^ was overwritten. In ASCII, BS comes before space, which is 0x20.
So [^\b- ] should be OK in an ASCII, iso8859-x or UTF-8
locale.

But maybe it is not an ASCII SPC we are seeing above?

--
Stéphane
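
One way to check the two claims in this post is sketched below. The function names are hypothetical, plain "awk" stands in for nawk, and LC_ALL=C is an assumption used to force ASCII ordering for the range.

```shell
#!/bin/sh
# Print the octal value of the byte that awk produces for "\b".
awk_bs_octal() {
    awk 'BEGIN { printf "%s", "\b" }' | od -An -to1 | tr -d ' \n'
}

# Apply the book's original range with the locale pinned to C (an assumption):
# every character outside BS..space becomes "x"; tabs and spaces survive.
blank_non_control() {
    LC_ALL=C awk '{ gsub(/[^\b- ]/, "x"); print }'
}
```

awk_bs_octal prints 010, the backspace byte. To see control characters hidden in an error message, its output can be piped through od -c instead of being sent straight to the terminal.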

Re: showmatch script from Unix Power Tools gives error

on 18.10.2007 13:45:55 by Maxwell Lol

Stephane CHAZELAS writes:

> 2007-10-17, 20:37(-04), Maxwell Lol:
> [...]
> >> gsub (/[^\b- ]/, " ", s)
> [...]
> > In this case, it's a range between '\b' and ' '. The '\b' is a word
> > boundary, and not really a character which can be used as a range.
> > It matches a 'nil' character between words.
> [...]
>
> No, \b is the BS (backspace character) aka ^H, 0x8.


That makes sense, but I guess \b has different meanings in regular expressions.
I checked a few on-line references before replying, and these say \b is a word boundary.

http://www.regular-expressions.info/wordboundaries.html
http://www.regular-expressions.info/reference.html

Re: showmatch script from Unix Power Tools gives error

on 18.10.2007 14:21:46 by Ed Morton

Maxwell Lol wrote:
> Stephane CHAZELAS writes:
>
>
>>2007-10-17, 20:37(-04), Maxwell Lol:
>>[...]
>>
>>>> gsub (/[^\b- ]/, " ", s)
>>
>>[...]
>>
>>>In this case, it's a range between '\b' and ' '. The '\b' is a word
>>>boundary, and not really a character which can be used as a range.
>>>It matches a 'nil' character between words.
>>
>>[...]
>>
>>No, \b is the BS (backspace character) aka ^H, 0x8.
>
>
>
> That makes sense, but I guess \b has different meanings in regular expressions.
> I checked a few on-line references before replying, and these say \b is a word boundary.
>
> http://www.regular-expressions.info/wordboundaries.html
> http://www.regular-expressions.info/reference.html
>
>
>

Not in awk, see how GNU awk uses \y instead of \b in:

http://www.gnu.org/software/gawk/manual/gawk.html#Escape-Sequences
http://www.gnu.org/software/gawk/manual/gawk.html#GNU-Regexp-Operators

Regards,

Ed.
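
As a small illustration of the \y operator mentioned above, here is a sketch that needs GNU awk to be installed as "gawk". The function name mark_cat and the sample text are made up for the example.

```shell
#!/bin/sh
# Requires GNU awk: \y is a gawk extension meaning "word boundary".
# Hypothetical helper: upper-case each whole-word "cat" on stdin and
# print the number of replacements followed by the changed line.
mark_cat() {
    gawk '{ n = gsub(/\ycat\y/, "CAT"); print n, $0 }'
}
```

Given printf 'cat concat cat\n' | mark_cat, the "cat" inside "concat" is left alone because no word boundary precedes it, so the output is 2 CAT concat CAT.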

Re: showmatch script from Unix Power Tools gives error

on 18.10.2007 16:59:52 by BartlebyScrivener

On Oct 17, 9:50 pm, Icarus Sparry wrote:

> Well it works for me, using "mawk" rather than "nawk".

Hey, mawk works for me too! Thanks!

This stuff is WAY over my head. I've been Linux only for a year or so,
but I work in Vim most of the time as a writer. Now I'm trying to
learn sed and awk etc. So I still have a long way to go. I don't know
the varieties of these yet. How many awks are there? :) Don't answer,
I'll work away at this. But this showmatch script really helps testing
the regexps first.

Thanks for all of the responses.

rpd

Re: showmatch script from Unix Power Tools gives error

on 22.10.2007 23:25:43 by brian_hiles

BartlebyScrivener wrote:
> This stuff is WAY over my head. I've been Linux only for a year or so,
> but I work in Vim most of the time as a writer.

Merely understanding the _usefulness_ of REs is the crucial 10% of
the iceberg that appears above the surface. One need not have an
understanding of the internals of REs to effectively use them. Kudos
to you, a writer, for using vim(1) and with it, REs -- a fabulously
useful tool that, alas, PCers have never adequately exploited.

> Now I'm trying to
> learn sed and awk etc. So I still have a long way to go.

You understand the necessity of a good but simple debugger. I am the
author of an advanced sed(1) debugger, in case anyone wants to ever
use it :( but sedcheck.sed will probably be more practical.

"sd.ksh": my interactive sed(1) debugger
http://sed.sourceforge.net/grabbag/scripts/sd.ksh.txt
(not currently accessible -- get from Google cache)

"sedcheck.sed": sed(1) debugger: static checker
http://lvogel.free.fr/sed/
http://lvogel.free.fr/jsed/check.html

"sedsed.py": sed(1) debugger: trace/indent/tokenize/HTMLize
http://freshmeat.net/projects/sedsed/
http://sedsed.sf.net/

Also, I recommend the book:

Dougherty & Robbins. "Sed And Awk" 2nd ed. O'Reilly. <oreilly.com/catalog/sed2>.

> But this showmatch script really helps testing the regexps first.

I have not been able to find and experiment with showmatch.awk, but
there are many tools that accomplish a similar goal. Here are
a few links to such tools that you or other readers may find useful,
depending on your installed interpreters, with the ones I presume
applicable to your issue sorted to the top.

"highlite.{c,pl}": highlite matching regex
http://sf.net/projects/highlite

"Kiki.py": regex debugger: under Linux/Mac/Win32; utilizes wxPython
http://project5.freezope.org/kiki/

"kodos.py": regex debugger: requires PyQT and SIP
http://kodos.sf.net
http://fresh.t-systems-sfr.com/unix/src/privat2/

"Redet.tcl": Regular Expression Development and Execution Tool
http://www.billposer.org/Software/redet.html

"The Regex Coach.lisp": regex debugger
http://www.weitz.de/regex-coach/

"Regexpviewer.tcl": regex debugger
http://freshmeat.net/projects/regexpviewer/

"rx.pl": regex debugger
http://perl.plover.com/Rx/
http://perl.plover.com/yak/rx/

"Viewglob.": glob tracking, visualization, and completion: GUI
http://sf.net/projects/viewglob/

"visual-regexp.tcl": interactively debug regular expressions
http://packages.debian.org/stable/devel/visual-regexp/

"regexp-gen.awk": read regexes and write the strings in the language
associated with the regexp read
http://groups.google.com/group/comp.lang.awk/browse_thread/thread/9c0808a23d591d8c/237b9518b2061b84?lnk=st&q=#237b9518b2061b84

"Kregexpeditor.": generate regular expression: implements Qt toolkit
or emacs(1)
http://docs.kde.org/stable/en/kdeutils/KRegExpEditor/

"rebug.pl": regex debugger


"reWork: A Regular Expression Workbench": NFA toolkit
http://osteele.com/tools/rework/
"reAnimator.js": NFA visualizer
http://osteele.com/tools/reanimator/

"shellpat.sh": convert shell patterns to grep regular expressions
http://www.shelldorado.com/scripts/cmds/shellpat.txt

"^txt2regex$.bash": convert NL to regex: multiplatform
http://txt2regex.sf.net/
https://sf.net/projects/txt2regex/
http://freshmeat.net/projects/txt2regex/

> I don't know the varieties of these yet. How many awks are there? :)

http://www.shelldorado.com/articles/awkcompat.html

(You can't go wrong using any modern, native nawk or awk preinstalled
on your OS, but gawk(1) or "The One True Awk" are best).

http://www.gnu.org/software/gawk/
"The One True Awk": http://cm.bell-labs.com/cm/cs/awkbook/index.html

Are you aware of the awk-only Usenet group comp.lang.awk ?

Good luck!

=Brian