Removing octal characters from a file
on 07.09.2007 18:29:39 by paintedjazz
Is there a way to remove octal characters e.g. \302\271 or
\342\204\242 using perl or sed or awk. What I would prefer to do is
remove all globally with one command. I'm not sure how I would enter
them as a range or even if that's even possible.
If it's not asking too much, is there also a way to incorporate one
exception into this to replace (rather than remove) octal chars used
by Microsoft instead of a simple apostrophe. Thanks a bunch for any
help.
Re: Removing octal characters from a file
on 07.09.2007 19:43:45 by Icarus Sparry
On Fri, 07 Sep 2007 16:29:39 +0000, paintedjazz wrote:
> Is there a way to remove octal characters e.g. \302\271 or \342\204\242
> using perl or sed or awk. What I would prefer to do is remove all
> globally with one command. I'm not sure how I would enter them as a
> range or even if that's even possible.
>
> If it's not asking too much, is there also a way to incorporate one
> exception into this to replace (rather than remove) octal chars used by
> Microsoft instead of a simple apostrophe. Thanks a bunch for any help.
With perl it is easy; the only question is which characters to keep.
Here I keep \010 (backspace), \011 (tab) and \012 (linefeed, used as
newline by Unix) from the control characters, plus \040 (space) to \176 (~).
perl -pi.bak -e 's/[\000-\007\013-\037\177-\377]//g;' filename
If you know which characters Microsoft uses, then you can certainly use
perl -pi.bak -e 's/MQC/'\''/g; s/[\000-\007\013-\037\177-\377]//g' filename
where MQC is the Microsoft code for a quote.
Re: Removing octal characters from a file
on 07.09.2007 19:49:15 by Janis Papanagnou
paintedjazz@gmail.com wrote:
> Is there a way to remove octal characters
What do you think "octal characters" are?
> e.g. \302\271 or
> \342\204\242 using perl or sed or awk.
The above are just character strings: escaped representations of 8-bit
values, written with octal digits.
Do you want to remove the character that may be displayed as "\302" or
do you want to remove the four-character-sequence '\', '3', '0', '2'
from your data?
> What I would prefer to do is
> remove all globally with one command. I'm not sure how I would enter
> them as a range or even if that's even possible.
If you just have character ranges to remove, then use tr -d.
But if you have the character string representation as given above it
might be easier to do it in two steps; first transform valid sequences
of "\[0-3][0-7][0-7]" into the respective character, then use tr -d
with a range of characters (possibly also specified octal) to delete.
If you explain your task more clearly, we can help you in more detail.
> If it's not asking too much, is there also a way to incorporate one
> exception into this to replace (rather than remove) octal chars used
> by Microsoft instead of a simple apostrophe. Thanks a bunch for any
> help.
What do you mean by "octal chars used by Microsoft"?
Again, if it's just the characters then use tr (in this case without
option -d) as in, for example, tr \' \" or tr A-Z a-z
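
For instance, assuming the file is single-byte Windows-1252 text, where the curly quotes are the bytes \221 to \224, a tr mapping could look like the sketch below. The function name is made up. Under LC_ALL=C tr works one byte at a time, so it cannot collapse a three-byte UTF-8 sequence such as \342\200\231 into one apostrophe.

```shell
# Assumption: input is Windows-1252, where
#   \221 \222 = left/right single quote, \223 \224 = left/right double quote.
cp1252_quotes_to_ascii() {
    # Map each curly quote byte to its plain ASCII counterpart.
    LC_ALL=C tr '\221\222\223\224' "''\"\""
}

printf 'don\222t\n' | cp1252_quotes_to_ascii   # prints: don't
```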
If you choose to use awk, you can use the same function, gsub(), for both
replacing and removing.
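
A small sketch of gsub() doing both jobs, here for the case where the data contains the literal four-character sequences. It assumes the sequence to replace is \342\200\231 (the UTF-8 bytes of U+2019, written out as text); that choice and the function name awk_clean are illustrative only.

```shell
awk_clean() {
    awk '{
        gsub(/\\342\\200\\231/, "\047")   # replace: literal \342\200\231 becomes an apostrophe
        gsub(/\\[0-3][0-7][0-7]/, "")     # remove: any remaining literal \ooo sequence
        print
    }'
}

printf '%s\n' 'It\342\200\231s\302\271 ok' | awk_clean   # prints: It's ok
```

The replacement runs before the removal so that the sequence to keep is not deleted first.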
Janis
Re: Removing octal characters from a file
on 07.09.2007 21:39:48 by Cyrus Kriticos
paintedjazz@gmail.com wrote:
> Is there a way to remove octal characters e.g. \302\271 or
> \342\204\242 using perl or sed or awk. What I would prefer to do is
> remove all globally with one command.
$ echo 'ab\cdd23444\342333\204\242' | sed 's/\\[0-3][0-7][0-7]//g'
ab\cdd23444333
--
Best regards | "The only way to really learn scripting is to write
Cyrus | scripts." -- Advanced Bash-Scripting Guide