graphic chars, set-font and sed

on 12.04.2005 18:25:47 by Luca Ferrari

Hi,
I've got a few problems with semigraphic chars (those typically used in DOS
or in ncurses applications) under Linux. First of all, if I use the setfont
command on a tty I can see files with the above characters displayed
correctly, but I cannot do this on a pseudo-tty (like those opened through
telnet or ssh). Any trick for this? Second, I've noticed that sed regular
expressions get confused by the presence of multiple semigraphic chars, while
a single one seems to work OK. Does anybody know a way to "escape" those
chars, in order to make them understandable to sed and other programs?

Thanks,
Luca
--
Luca Ferrari,
fluca1978@infinito.it
-
To unsubscribe from this list: send the line "unsubscribe linux-admin" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: graphic chars, set-font and sed

on 12.04.2005 18:44:18 by Jens Knoell

Hi Luca,

On Tuesday 12 April 2005 10:25, Luca Ferrari wrote:
> Hi,
> I've got a few problems with semigraphic chars (those typically used in DOS
> or in ncurses applications) under Linux. First of all, if I use the setfont
> command on a tty I can see files with the above characters displayed
> correctly, but I cannot do this on a pseudo-tty (like those opened through
> telnet or ssh). Any trick for this? Second, I've noticed that sed regular
> expressions get confused by the presence of multiple semigraphic chars,
> while a single one seems to work OK. Does anybody know a way to "escape"
> those chars, in order to make them understandable to sed and other programs?
>
> Thanks,
> Luca

These fonts are usually set on the local side, i.e. if you telnet or SSH into
your machine, the font used is the font on your LOCAL side, not the one on
the remote machine. Assuming that you come in from another Linux box, issue
the setfont command there before you telnet/ssh over to your box.

If you use an X terminal (or a Windows telnet/SSH client) it is usually the
same thing, except that with these you usually use the GUI to change the
terminal font (which really is an X font in this case).

Can't help with the regex question though, sorry.

J

Re: graphic chars, set-font and sed

on 14.04.2005 22:47:10 by Glynn Clements

Luca Ferrari wrote:

> I've got a few problems with semigraphic chars (those typically used in DOS
> or in ncurses applications) under Linux. First of all, if I use the setfont
> command on a tty I can see files with the above characters displayed
> correctly, but I cannot do this on a pseudo-tty (like those opened through
> telnet or ssh). Any trick for this?

setfont operates at the hardware level, and is specific to the Linux
virtual terminal driver. It essentially uploads the specified font to
the graphics card. You have to run it on the system whose graphics
card you wish to reconfigure (i.e. the "terminal" end of an ssh or
telnet session, not the "server" end).

You can configure xterm etc. to use a different font; there's a
standard "VGA" font (vga.bdf) which is bundled with a few programs
which require DOS-compatible graphics (e.g. dosemu).

> Second, I've noticed that sed regular expressions get
> confused by the presence of multiple semigraphic chars, while a single one
> seems to work OK. Does anybody know a way to "escape" those chars, in order
> to make them understandable to sed and other programs?

sed itself should be 8-bit clean; are you sure that this isn't an
encoding (e.g. ISO-8859-1 vs UTF-8) issue?

--
Glynn Clements

Re: graphic chars, set-font and sed

on 20.04.2005 18:46:07 by Luca Ferrari

On Thursday 14 April 2005 22:47 Glynn Clements's cat walking on the keyboard
wrote:

> > Second, I've noticed that sed regular expressions get
> > confused by the presence of multiple semigraphic chars, while a single
> > one seems to work OK. Does anybody know a way to "escape" those chars,
> > in order to make them understandable to sed and other programs?
>
> sed itself should be 8-bit clean; are you sure that this isn't an
> encoding (e.g. ISO-8859-1 vs UTF-8) issue?

I don't know what you mean by "encoding issue". How can I discover it?

Luca

--
Luca Ferrari,
fluca1978@infinito.it

Re: graphic chars, set-font and sed

on 21.04.2005 17:34:29 by Glynn Clements

Luca Ferrari wrote:

> > > Second, I've noticed that sed regular expressions get
> > > confused by the presence of multiple semigraphic chars, while a single
> > > one seems to work OK. Does anybody know a way to "escape" those chars,
> > > in order to make them understandable to sed and other programs?
> >
> > sed itself should be 8-bit clean; are you sure that this isn't an
> > encoding (e.g. ISO-8859-1 vs UTF-8) issue?
>
> I don't know what you mean by "encoding issue". How can I discover it?

An encoding is a mechanism for representing characters as bytes.
Examples of commonly-used encodings are ASCII, ISO-8859-1 and UTF-8.

ISO-8859-1 is a single-byte encoding. There are 191 printable
characters and 65 control characters, each encoded as a single byte.
E.g. the character "æ" (a-e ligature, code 230) is represented by the
byte 230 ("\xE6" in C notation).

UTF-8 is a multi-byte encoding. As originally specified it supports up
to 2^31 characters, each of which is encoded using between 1 and 6
bytes (the current definition, RFC 3629, stops at U+10FFFF and so never
needs more than 4 bytes). The first 128 characters (the ASCII subset)
are encoded as a single byte; the next 1920 characters are encoded as
two bytes. E.g. the character "æ" (a-e ligature, code 230) is
represented by the byte sequence 195,166 ("\xC3\xA6" in C notation).
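The two representations can be checked directly. This is a minimal
sketch, assuming Python 3 is available; it only illustrates the byte
values quoted above and has nothing to do with sed itself.

```python
# Encode the same character under the two encodings discussed above.
ch = "\u00e6"  # the a-e ligature, code 230

latin1_bytes = ch.encode("iso-8859-1")  # single-byte encoding
utf8_bytes = ch.encode("utf-8")         # multi-byte encoding

print(list(latin1_bytes))  # [230]
print(list(utf8_bytes))    # [195, 166]
```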

sed itself works with bytes, not characters. This means that it will
work with any single-byte encoding (e.g. ASCII and all of the
ISO-8859-* encodings), but it won't work with multi-byte encodings
such as UTF-8.

If you were to use an expression such as 'æ*' (zero or more
occurrences of the æ character), it would work in ISO-8859-1 (i.e.
zero or more occurrences of byte 230) but not in UTF-8, where it would
be interpreted as byte 195 followed by zero or more occurrences of
byte 166 (the * operator means "zero or more occurrences of the
preceding byte").
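The effect can be reproduced with any byte-oriented regex engine. The
sketch below uses Python's re module on bytes objects as a stand-in for
a byte-wise sed (an assumption made for illustration; it is not sed's
own code), and bytewise_match is a hypothetical helper name.

```python
import re

text = "\u00e6\u00e6\u00e6"  # three a-e ligatures in a row
pattern = "\u00e6*"          # intended: zero or more a-e ligatures


def bytewise_match(pattern: str, text: str, encoding: str) -> bytes:
    """Encode pattern and text, then match them as raw bytes from the start."""
    raw_pattern = pattern.encode(encoding)
    raw_text = text.encode(encoding)
    m = re.match(raw_pattern, raw_text)
    return m.group(0) if m else b""


# ISO-8859-1: '*' applies to byte 230, so all three characters match.
latin1_hit = bytewise_match(pattern, text, "iso-8859-1")

# UTF-8: '*' applies only to byte 166, so the match stops after
# the first character (bytes 195,166).
utf8_hit = bytewise_match(pattern, text, "utf-8")

print(len(latin1_hit), len(utf8_hit))
```

This is consistent with the reported symptom: a single character still
matches, while a run of several does not.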

The semigraphic characters aren't part of the ASCII set, so the
sequence of bytes used to represent them will vary depending upon the
encoding which is used.
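As an illustration, take the horizontal box-drawing line. The sketch
assumes the DOS files use code page 437 (the thread does not say which
code page is involved) and compares it with UTF-8.

```python
# U+2500 BOX DRAWINGS LIGHT HORIZONTAL, a typical semigraphic character.
box = "\u2500"

# Assumption: the "DOS" encoding is code page 437.
cp437_bytes = box.encode("cp437")
utf8_bytes = box.encode("utf-8")

print(cp437_bytes.hex())  # c4
print(utf8_bytes.hex())   # e29480
```

A pattern written for one of these byte sequences will not match data
stored in the other.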

Essentially, you have to bear in mind that sed's regular expressions,
and the stream of data which it processes, are sequences of bytes, not
characters.
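One rough way to find out which situation applies is to test whether
the data is valid UTF-8. The following is only a heuristic sketch;
classify is a hypothetical helper, and text in a single-byte encoding
can occasionally happen to be valid UTF-8 as well.

```python
def classify(data: bytes) -> str:
    """Guess how a byte string is encoded: 'ascii', 'utf-8' or 'other'."""
    if all(b < 0x80 for b in data):
        return "ascii"   # no high bytes, so the encoding does not matter
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "other"   # high bytes that are not valid UTF-8
    return "utf-8"       # high bytes forming valid UTF-8 sequences


print(classify(b"plain text"))      # ascii
print(classify(b"\xc3\xa6"))        # utf-8
print(classify(b"\xc4\xc4\xc4"))    # other
```

To check a file, read it in binary mode, e.g.
classify(open(path, "rb").read()), where path is whatever file sed is
being run on.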

--
Glynn Clements