Writing a "substring"/"replace substring" function in ksh88

on 12.09.2007 19:24:25 by andrew.fabbro

I'm having the hardest time writing substring functions in ksh88
without resorting to cut.

What I'm trying to do is write functions to:
- print characters from within a string given a range (e.g., substring
(string, start, length))
- replace a character in a string given an index (e.g., replace
(string, position, replacement_ch))

I'm aware I could call out to the shell and use cut -c (or sed/awk/
grep regex, etc.)...but I was trying to do it without an external
process call because I imagine it'd be faster without it. It'll be a
frequently-used library function. (Why not just write it in perl?
Don't ask).

In ksh88, there's no substring operator I can find. And regular
expressions in ksh88 don't support beginning- or end-of-line (^ and
$), or repeat factors like {N} and {M}, which would be pretty
essential to addressing the string by a position index.

Is there a way to have IFS set to null? i.e., to use it to break a
string down into its component characters? If so, I could march
through them, reporting or replacing and that would be easy. i.e.,

string="abcdef"
for char in $(some magic that transforms $string into "a b c d e f") ;
do
process each $char, pick out characters I want via a counter, etc.
done

My last avenue of attack was to use typeset -L and typeset -R to chop
strings up. I thought if I wanted to replace character 2 of a 20-
character string, I could typeset -L1 to get character 1, typeset -R18
to get characters 3-18, and then echo left, new character, right.

But it breaks badly on spaces. Apparently typeset -R ignores leading
spaces and the function must handle spaces. For example, here's an
example of a broken "replace" script:

#!/bin/ksh

# set up a 20-character string
string="a "
echo "start: |$string|"
echo "len : ${#string}"

# say we want to replace the 2nd character with "B"
# so let's try taking left 1, right 18, and then mush them together
# with the B in the middle

typeset -L1 lefty
lefty=$string
echo "lefty: |$lefty|"
typeset -R18 righty
righty=$string
echo "right: |$righty|"
new="${lefty}B${righty}"
echo "new : |$new|"

This gives me "new : |aB a|" which is obviously
broken. typeset -R apparently ignores spaces when right-justifying.
Gack. Same problems in just extracting and reporting a substring.

I think I'm running out of options and should just get a life and use
cut -c ;-) There's nothing wrong with that, of course...just thought
I'd ask the brain trust here if that is really the only way.

Re: Writing a "substring"/"replace substring" function in ksh88

on 12.09.2007 19:41:49 by Stephane CHAZELAS

2007-09-12, 10:24(-07), Andrew Fabbro:
> I'm having the hardest time writing substring functions in ksh88
> without resorting to cut.
>
> What I'm trying to do is write functions to:
> - print characters from within a string given a range (e.g., substring
> (string, start, length))
> - replace a character in a string given an index (e.g., replace
> (string, position, replacement_ch))
[...]

Do the whole thing in awk which is a text processing tool
contrary to ksh which is a command running tool.

--
Stéphane
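
For comparison, here is a minimal sketch of what handing the substring work to awk could look like from a shell function. The function name "substring" is made up for illustration, and the sketch assumes a POSIX awk (nawk or later) that provides ARGV. Each call still starts one external process, which is the cost the original question wanted to avoid.

```shell
#!/bin/sh
# Hypothetical helper: print LENGTH characters of STRING, starting at the
# 1-based position START, by delegating to awk's substr().
# The values are passed through ARGV so that awk does not interpret
# backslashes in the string (as it would with -v assignments).
substring() {
    awk 'BEGIN { print substr(ARGV[1], ARGV[2], ARGV[3]) }' "$1" "$2" "$3"
}

substring "abcdef" 2 3    # prints "bcd"
```

Because the program consists of a BEGIN action only, awk never tries to read the arguments as input files.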

Re: Writing a "substring"/"replace substring" function in ksh88

on 12.09.2007 22:20:16 by cfajohnson

On 2007-09-12, Andrew Fabbro wrote:
> I'm having the hardest time writing substring functions in ksh88
> without resorting to cut.

See the string-funcs library from Chapter 3 of my book, Shell
Scripting Recipes. The scripts are available at
.

> What I'm trying to do is write functions to:
> - print characters from within a string given a range (e.g., substring
> (string, start, length))
> - replace a character in a string given an index (e.g., replace
> (string, position, replacement_ch))
>
> I'm aware I could call out to the shell and use cut -c (or sed/awk/
> grep regex, etc.)...but I was trying to do it without an external
> process call because I imagine it'd be faster without it. It'll be a
> frequently-used library function. (Why not just write it in perl?
> Don't ask).
>
> In ksh88, there's no substring operator I can find. And regular
> expressions in ksh88 don't support beginning- or end-of-line (^ and
> $), or repeat factors like {N} and {M}, which would be pretty
> essential to addressing the string by a position index.

The library I mentioned above has sub, gsub and substr functions
written entirely in the shell.

> Is there a way to have IFS set to null?

Just set it to an empty string:

IFS=

> i.e., to use it to break a string down into its component
> characters? If so, I could march through them, reporting or
> replacing and that would be easy. i.e.,
>
> string="abcdef"
> for char in $(some magic that transforms $string into "a b c d e f") ;
> do
> process each $char, pick out characters I want via a counter, etc.
> done

newstring=
while [ -n "$string" ]
do
  temp=${string#?}                    ## everything after the first character
  printf "%s\n" "${string%"$temp"}"   ## the first character, on its own line
  newstring="${newstring:+"$newstring "}${string%"$temp"}"
  string=$temp
done
printf "%s\n" "$newstring"

> My last avenue of attack was to use typeset -L and typeset -R to chop
> strings up. I thought if I wanted to replace character 2 of a 20-
> character string, I could typeset -L1 to get character 1, typeset -R18
> to get characters 3-18, and then echo left, new character, right.
>
> But it breaks badly on spaces. Apparently typeset -R ignores leading
> spaces and the function must handle spaces. For example, here's an
> example of a broken "replace" script:

I never use typeset as it is non-standard. (I also dislike the use
of the word typeset for this purpose, but that's another story.)

> #!/bin/ksh
>
> # set up a 20-character string
> string="a "
> echo "start: |$string|"
> echo "len : ${#string}"
>
> # say we want to replace the 2nd character with "B"
> # so let's try taking left 1, right 18, and then mush them together
> # with the B in the middle
>
> typeset -L1 lefty
> lefty=$string
> echo "lefty: |$lefty|"
> typeset -R18 righty
> righty=$string
> echo "right: |$righty|"
> new="${lefty}B${righty}"
> echo "new : |$new|"
>
> This gives me "new : |aB a|" which is obviously
> broken. typeset -R apparently ignores spaces when right-justifying.
> Gack. Same problems in just extracting and reporting a substring.
>
> I think I'm running out of options and should just get a life and use
> cut -c ;-) There's nothing wrong with that, of course...just thought
> I'd ask the brain trust here if that is really the only way.

There's no need to use an external command to do simple string
manipulation. Generally, external commands should be reserved for
operating on files, not strings.

--
Chris F.A. Johnson, author
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence
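
The character-peeling loop in the post above can be extended into a substring function. What follows is only a sketch under stated assumptions: it is not the code from the string-funcs library, "substr" and the underscore-prefixed variables are names chosen here, single-byte characters are assumed, and the shell is assumed to support $(( )) arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: substr STRING START LENGTH prints LENGTH characters
# of STRING, starting at the 1-based position START, using only the # and %
# parameter expansions (no external command).
substr() {
    _rest=$1 _pos=1 _out=
    # Stop when the string is used up or the requested range has been passed.
    while [ -n "$_rest" ] && [ "$_pos" -lt $(($2 + $3)) ]
    do
        _tail=${_rest#?}            # everything after the first character
        _char=${_rest%"$_tail"}     # the first character itself
        [ "$_pos" -ge "$2" ] && _out=$_out$_char
        _rest=$_tail
        _pos=$((_pos + 1))
    done
    printf '%s\n' "$_out"
}

substr "abcdef" 2 3     # prints "bcd"
```

Spaces need no special handling here because nothing is ever word-split: every expansion is either quoted or on the right-hand side of an assignment.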

Re: Writing a "substring"/"replace substring" function in ksh88

on 13.09.2007 16:59:48 by spcecdt

In article <1189617865.321910.76930@57g2000hsv.googlegroups.com>,
Andrew Fabbro wrote:
>In ksh88, there's no substring operator I can find. And regular
>expressions in ksh88 don't support beginning- or end-of-line (^ and
>$),

Not sure what you mean by this. The globbing patterns available in
ksh88 are implicitly anchored at the start and end.
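
A small illustration of that implicit anchoring, using case (which applies the same globbing patterns). The helper name "matches" is hypothetical; the point is only that a pattern has to cover the whole string unless a * is added at one end or the other.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the glob PATTERN ($2) matches the
# whole of STRING ($1). $2 is left unquoted so it is treated as a pattern.
matches() {
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

matches "abcdef" "abc"  || echo "abc:  no match, the pattern must cover the whole string"
matches "abcdef" "abc*" && echo "abc*: match, still anchored at the start"
matches "abcdef" "*def" && echo "*def: match, still anchored at the end"
matches "abcdef" "??c*" && echo "??c*: match, each ? stands for exactly one position"
```

The last line is the useful one for indexing: a run of N question marks addresses exactly N characters, which takes the place of a {N} repeat factor.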

>My last avenue of attack was to use typeset -L and typeset -R to chop
>strings up. I thought if I wanted to replace character 2 of a 20-
>character string, I could typeset -L1 to get character 1, typeset -R18
>to get characters 3-18, and then echo left, new character, right.
>
>But it breaks badly on spaces. Apparently typeset -R ignores leading
>spaces and the function must handle spaces. For example, here's an
>example of a broken "replace" script:
>
>#!/bin/ksh
>
># set up a 20-character string
>string="a "
>echo "start: |$string|"
>echo "len : ${#string}"
>
># say we want to replace the 2nd character with "B"
># so let's try taking left 1, right 18, and then mush them together
># with the B in the middle
>
>typeset -L1 lefty
>lefty=$string
>echo "lefty: |$lefty|"
>typeset -R18 righty
>righty=$string
>echo "right: |$righty|"
>new="${lefty}B${righty}"
>echo "new : |$new|"

An alternative is to use the # and % operators. For example:

#!/bin/ksh

# set up a 20-character string
string="a "
echo "start: |$string|"
echo "len : ${#string}"

# say we want to replace the 2nd character with "B"
# so let's try taking left 1, right 18, and then mush them together
# with the B in the middle

lefty=${string%???????????????????}
echo "lefty: |$lefty|"
righty=${string#??}
echo "right: |$righty|"
new="${lefty}B${righty}"
echo "new : |$new|"

John
--
John DuBois spcecdt@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/
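
The hard-coded runs of question marks in the script above fit one string length and one position only. A sketch of the same # and % idea for an arbitrary position follows; it builds the question-mark patterns in a loop. "replace_char" and its variables are hypothetical names, and $(( )) arithmetic plus single-byte characters are assumed.

```shell
#!/bin/sh
# Hypothetical helper: replace_char STRING POSITION CHAR prints STRING with
# the character at the 1-based POSITION replaced by CHAR.
replace_char() {
    _len=${#1}
    # Out-of-range positions leave the string unchanged.
    if [ "$2" -lt 1 ] || [ "$2" -gt "$_len" ]; then
        printf '%s\n' "$1"
        return 0
    fi
    # _head: one "?" per character to strip from the front (POSITION of them).
    # _tail: one "?" per character from POSITION to the end of the string.
    _head= _tail= _i=0
    while [ "$_i" -lt "$2" ]; do _head="$_head?"; _i=$((_i + 1)); done
    _i=$2
    while [ "$_i" -le "$_len" ]; do _tail="$_tail?"; _i=$((_i + 1)); done
    # Left part, replacement character, right part.
    printf '%s\n' "${1%$_tail}$3${1#$_head}"
}

replace_char "abcdef" 2 B    # prints "aBcdef"
```

The pattern variables are deliberately left unquoted inside the expansions so that each ? keeps its meaning as "any one character"; spaces in the string are preserved because the result itself is quoted.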

Re: Writing a "substring"/"replace substring" function in ksh88

on 21.09.2007 04:26:57 by brian_hiles

Andrew Fabbro wrote:
> I'm having the hardest time writing substring functions in ksh88
> without resorting to cut.
> What I'm trying to do is write functions to:
> + print characters from within a string given a range.
> + replace a character in a string given an index.

> I'm aware I could call out to the shell and use cut -c (or sed/awk/
> grep regex, etc.)...but I was trying to do it without an external
> process call because I imagine it'd be faster without it. It'll be a
> frequently-used library function. (Why not just write it in perl?
> Don't ask).

Good man. Every single subshell invocation has the overhead
of _thousands_ of typical function calls. But, I tend to agree
with what I presume is your disposition to not want to spend
the time to write a satisfactory function library yourself, just
for the sake of efficiency -- especially on deadline.

SC's advice to use [n]awk(1) is IMNSHO optimal only when the
entire program can be implemented in that language, as the
per-call invocation overhead is otherwise especially egregious.
Additionally, the pattern matching of ksh88 is equal or even a bit
superior to [n]awk(1), and with the later revisions of ksh93 (q.v.
below) it is now superior even to sed(1). I know, having
written a parser in awk(1), and regular expression debuggers
for both ksh(1) and sed(1).

> In ksh88, there's no substring operator I can find. And regular
> expressions in ksh88 don't support beginning- or end-of-line (^ and
> $), or repeat factors like {N} and {M}, which would be pretty
> essential to addressing the string by a position index.

All ksh-type [extended] pattern matching is de facto anchored
to BOS and EOS.

The later revisions of ksh(1), version 1993 and newer, have greatly
expanded parameter substitution, with a builtin global substitution
operator like sed(1)'s "s" command. Very impressive. Most online
ksh93 manpages don't even mention the best and latest
functionality:

?(pattern-list) - Optionally matches any one of the given patterns.
*(pattern-list) - Matches zero or more occurrences of the given
patterns.
+(pattern-list) - Matches one or more occurrences of the given
patterns.
@(pattern-list) - Matches exactly one of the given patterns.
!(pattern-list) - Matches anything except one of the given patterns.

New:
{n}(pattern-list) - Matches n occurrences of the given patterns.
{m,n}(pattern-list) - Matches from m to n occurrences of the given
patterns.

Note: pattern-lists are delimited by either "&" (all patterns must
match) or "|" (any pattern must match).
Note: Use "-(" instead of "(" for the shortest (nongreedy) match.
Note: The escapes \d, \D, \s, \S, \w and \W are also recognized.
Note: In "~(options:pattern-list)" (either options or :pattern-list
can be omitted), options can consist of one or more of the following
characters:
+ Enable the following options. (default)
- Disable the following options.
i Treat the match as case insensitive.
g Find the longest match (greedy). (default)

I don't suppose you can use ksh93, can you? It's freely
available through kornshell.com.
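
If ksh93 is an option, both of the original tasks shrink to single expansions. The lines below are a sketch that assumes ksh93 (bash happens to accept the same syntax); they will not work in ksh88. The variable names are only for illustration.

```shell
# Requires ksh93; will not run in ksh88 or a plain POSIX sh.
string="abcdef"

# Substring expansion: the offset is 0-based, so offset 1 is the 2nd character.
printf '%s\n' "${string:1:3}"                    # prints "bcd"

# Replace the character at 1-based position 2 by slicing around it.
pos=2
printf '%s\n' "${string:0:pos-1}B${string:pos}"  # prints "aBcdef"

# Global substitution, comparable to sed's s/.../.../g.
printf '%s\n' "${string//[ace]/_}"               # prints "_b_d_f"
```

The offset and length fields are evaluated as arithmetic expressions, which is why pos-1 can be written directly inside the braces.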

> Is there a way to have IFS set to null?....

Nope, not for the behavior that you specify.

> My last avenue of attack was to use typeset -L and typeset -R to chop
> strings up. I thought if I wanted to replace character 2 of a 20-
> character string, I could typeset -L1 to get character 1, typeset -R18
> to get characters 3-18, and then echo left, new character, right.

Ah, so you have discovered the typical idiom of simulating
substring extraction. Add to that the use of the #, ##, %,
and %% parameter substitution operators, and you've very
appropriately reinvented the wheel. Congratulations! ;)

> typeset -R apparently ignores spaces when right-justifying. Gack.
> Just thought I'd ask the brain trust here if that is really the only way.

I ran into this [undocumented!] gotcha many years ago.
I haven't completely parsed your example, but IIRC I used
the workaround of prepending a known character (so whitespace
would never prepend the string) and after processing, making
sure to remove it.

And speaking of reinventing the wheel...

I'm surprised that JD didn't mention his own "strings"
function library from his impressive script archive:

"strings.ksh":
ftp://ftp.armory.com/pub/lib/ksh/strings

.... and CFAJ didn't explicitly point the OQ to his very
workable function:

"_gsub.ksh":
http://cfaj.freeshell.org/src/scripts/gsub-sh

To be complete, WP also has a strings(3)-like function
library [in bash(1)] at:

"string.bash":
http://home.eol.ca/~parkw/#string
http://home.eol.ca/~parkw/string.sh
http://linuxgazette.net/108/park.html

I myself wrote a strings(3) clone of the usual C strings
functions, including the *r* variants, in greedy _and_ non-
greedy forms(!) ... but they are not of distributable quality,
being left unfinished for a variety of technical reasons and
a change of specifications.

=Brian

Re: Writing a "substring"/"replace substring" function in ksh88

on 28.09.2007 01:06:28 by Dan Mercer

try googling (Groups):

ksh function substr mercer group:comp.unix.shell

You'll find plenty of examples.

Dan Mercer