Field splitting: Is the default IFS treated differently?

on 27.01.2008 04:37:41 by Eze

Hello!

Please tell me where the catch is!

A="a      b"   # six spaces
echo $A        # outputs 'a b' (one space, ok)
IFS=","
B="a,,,,,,b"
echo $B        # outputs 'a      b' (six spaces!) why not 'a b'?
echo $A        # again 'a      b', so IFS seems to play a role in the
               # default behavior above

Thank you much in advance for any light on this puzzle.

Cheers,

Ezequiel

Re: Field splitting: Is the default IFS treated differently?

on 27.01.2008 04:54:41 by Eze

OK, I think I got it after re-re-re-...-reading the POSIX
specification on word splitting
[http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06_05]:

Otherwise [if IFS is not ''], the following rules
shall be applied in sequence. The term "IFS white space" is used to
mean any sequence (zero or more instances) of white space characters
that are in the IFS value (for example, if IFS contains
<space>/<comma>/<tab>, any sequence of <space>s and <tab>s is
considered IFS white space).

1. IFS white space shall be ignored at the beginning and end of the
input.
2. Each occurrence in the input of an IFS character that is not IFS
white space, along with any adjacent IFS white space, shall delimit a
field, as described previously.
3. Non-zero-length IFS white space shall delimit a field.

So I guess from 2 that the (four) middle commas, displayed as spaces,
are taken to delimit void fields. (Sigh.)
Any comments/corrections are still welcome.
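
To make the rules quoted above concrete, here is a minimal sketch that counts the fields produced by splitting, assuming a POSIX sh such as bash or dash. The helper name count_fields is hypothetical and is not part of any shell.

```sh
# count_fields STRING IFS_VALUE
# Hypothetical helper: prints how many fields STRING splits into
# when IFS is set to IFS_VALUE. The body runs in a subshell so the
# caller's IFS and options are left alone.
count_fields() (
    set -f        # disable globbing so only field splitting happens
    IFS=$2
    set -- $1     # unquoted on purpose: this is where splitting occurs
    echo "$#"
)

count_fields "a,,,,,,b" ","    # 7: "a", five empty fields, "b" (rule 2)
count_fields "a      b" " "    # 2: the run of spaces is one delimiter (rule 3)
count_fields "a , b"    " ,"   # 2: a comma plus adjacent IFS white space
                               #    delimits a single field (rule 2)
```

The seven fields in the first call are why echo, which joins its arguments with single spaces, prints six spaces between a and b.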

Re: Field splitting: Is the default IFS treated differently?

on 27.01.2008 04:57:40 by G_r_a_n_t_

On Sat, 26 Jan 2008 19:37:41 -0800 (PST), Eze wrote:

>Hello!
>
>Please tell me where the catch is!
>
>A="a      b" # six spaces
>echo $A # outputs 'a b' (one space, ok)

~$ A="a      b"; echo $A "-> $A"
a b -> a      b

IOW: Quote your variables!

Grant.
--
http://bugsplatter.mine.nu/

Re: Field splitting: Is the default IFS treated differently?

on 27.01.2008 21:46:50 by Stephane CHAZELAS

On Sat, 26 Jan 2008 19:54:41 -0800 (PST), Eze wrote:
> OK, I think I got it after re-re-re-...-reading the POSIX
> specification on word splitting
> [http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06_05]:
>
> Otherwise [if IFS is not ''], the following rules
> shall be applied in sequence. The term "IFS white space" is used to
> mean any sequence (zero or more instances) of white space characters
> that are in the IFS value (for example, if IFS contains
> <space>/<comma>/<tab>, any sequence of <space>s and <tab>s is
> considered IFS white space).
>
> 1. IFS white space shall be ignored at the beginning and end of the
> input.
> 2. Each occurrence in the input of an IFS character that is not IFS
> white space, along with any adjacent IFS white space, shall delimit a
> field, as described previously.
> 3. Non-zero-length IFS white space shall delimit a field.
>
> So I guess from 2 that the (four) middle commas, displayed as spaces,
> are taken to delimit void fields. (Sigh.)
> Any comments/corrections are still welcome.
[...]

Yes, that's correct.

That's an area where POSIX shells differ from the Bourne shell.
In the Bourne shell (and also in rc-like shells), any sequence
of delimiter characters (either white space or not) constitutes
one separator, and the leading and trailing delimiter characters
are discarded.

In zsh and ksh93, you can use IFS=$'\t\t' or IFS='  ' (that is,
double the white space chars in $IFS) if you want them to be
treated the same way as other characters wrt splitting.

Note that there is one area where the POSIX shells differ among
themselves: trailing delimiters. And the POSIX spec is not clear
about that.

For some, if IFS=: and var=foo:, $var gets split into "foo" and
"" (IFS means internal field separator), and for others (some
versions of bash and some implementations of ksh and ash), it
gets split into "foo" only (IFS means internal field
s^Hterminator). I guess the rationale is that for those where S
means "separator", there's no way to represent a list
consisting of one empty element, which is a bit embarrassing.
What that means is that you can't do:

set -f; IFS=:
for i in $PATH; do
...
done

as the behavior differs from shell to shell in the quite common
case where $PATH ends in ":".
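
One way to sidestep the difference is to avoid field splitting altogether and peel components off with parameter expansion. The following is only a sketch, assuming a POSIX sh; the function name split_list is hypothetical, and treating a trailing ":" as one extra empty component is a choice made here, not something the spec mandates.

```sh
# split_list LIST
# Hypothetical helper: prints each component of a colon-separated
# LIST on its own line, wrapped in brackets so that empty components
# stay visible. It does not use IFS, so the result should not depend
# on how a given shell treats trailing delimiters.
split_list() (
    rest=$1:                           # terminate the last component
    while [ -n "$rest" ]; do
        printf '[%s]\n' "${rest%%:*}"  # text before the first ":"
        rest=${rest#*:}                # drop that component and its ":"
    done
)

split_list "/bin:/usr/bin:"
# [/bin]
# [/usr/bin]
# []
```

In a real loop over $PATH, an empty component conventionally stands for the current directory, so the body would need to handle that case itself.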

--
Stephane