filter and then pipe a file back to itself

on 01.04.2008 21:09:47 by Andre Steinert

Is there an easy / elegant way to read from a file, then process the
stream with a set of filters and pipe back to the same file? For
example:

Let's say I have a file t containing

B
C
D

and I want to add an A at the top to get

A
B
C
D

I'd like to do:
( echo "A" ; cat t ) > t

but that spews an angry "cat: t: input file is output file"

This is a trivial toy example but I find myself running into this sort
of situation every so often. I can always work-around by using a temp
file but I'm just wondering if I'm missing an obvious metaphor...

--
Rahul

Re: filter and then pipe a file back to itself

on 01.04.2008 21:40:20 by PK

Rahul wrote:

> I'd like to do:
> ( echo "A" ; cat t ) > t
>
> but that spews an angry "cat: t: input file is output file"
>
> This is a trivial toy example but I find myself running into this sort
> of situation every so often. I can always work-around by using a temp
> file but I'm just wondering if I'm missing an obvious metaphor...

There is a recent post where a similar question was asked (subject: "shell
script replacing original file"). See the discussion there.
In short, from what I understand, it's generally unsafe: programs in a
pipeline are executed in parallel, and you have no guarantee that the
file will be completely read before the redirection starts writing to it
(especially if it's a large file). So what you want can't be done without
a temporary file, but read that thread for the details.

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.

Re: filter and then pipe a file back to itself

on 02.04.2008 05:16:43 by Florian Diesch

Rahul wrote:

> Is there an easy / elegant way to read from a file, then process the
> stream with a set of filters and pipe back to the same file? For
> example:

That's what sponge is for. It's part of moreutils.
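
With moreutils installed, the original example becomes a pipeline ending in sponge, which soaks up all of its standard input before it opens the output file. The sketch below assumes a POSIX sh. The fallback branch, for systems without moreutils, is only an illustration of the same idea using a mktemp temporary; it is not part of sponge.

```shell
#!/bin/sh
# Sample file from the original question.
printf 'B\nC\nD\n' > t

if command -v sponge >/dev/null 2>&1; then
    # sponge (moreutils) reads all of stdin before opening t for writing,
    # so "cat t" has already finished by the time t is rewritten.
    { echo A; cat t; } | sponge t
else
    # Illustrative fallback without moreutils: soak the output into a
    # temporary file first, then copy it over t.
    tmp=$(mktemp) || exit 1
    { echo A; cat t; } > "$tmp" && cat "$tmp" > t
    rm -f "$tmp"
fi

cat t
```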

Florian
--

-----------------------------------------------------------------------
** Hi! I'm a signature virus! Copy me into your signature, please! **
-----------------------------------------------------------------------

Re: filter and then pipe a file back to itself

on 02.04.2008 20:19:24 by Icarus Sparry

On Tue, 01 Apr 2008 21:40:20 +0200, pk wrote:

> Rahul wrote:
>
>> I'd like to do:
>> ( echo "A" ; cat t ) > t
>>
>> but that spews an angry "cat: t: input file is output file"
>>
>> This is a trivial toy example but I find myself running into this sort
>> of situation every so often. I can always work-around by using a temp
>> file but I'm just wondering if I'm missing an obvious metaphor...
>
> There is a recent post where a similar question was asked (subject:
> "shell script replacing original file"). See the discussion there. In
> short, from what I understand, it's generally unsafe since programs in a
> pipeline are executed in parallel, and you don't have any guarantee that
> the file will be completely read before the redirection starts writing
> to it (especially if it's a large file), and thus what you want can't be
> done (without using a temporary file), but read that thread for the
> details.

It can be done. The usual reason it isn't done is that if the
operation is interrupted, you are not in a good position to recover.

exec < t # give a new name (file descriptor 0 in this case) to t
rm t # remove the old name
{ echo A ; cat ; } > t # generate the new content.
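
A minimal sketch of the same three steps, assuming a POSIX shell. Wrapping them in a subshell is an adjustment made here, not part of the original reply: it means exec only replaces the subshell's standard input, so the calling shell keeps its own. The caveat above still applies; if the last step is interrupted, the old contents are lost once the subshell exits.

```shell
#!/bin/sh
printf 'B\nC\nD\n' > t          # sample file from the question

(
    exec < t                    # fd 0 of this subshell now refers to t's data
    rm t                        # remove the name; the data stays readable via fd 0
    { echo A ; cat ; } > t      # creates a new, separate file called t
)                               # the old data is released when the subshell exits

cat t
```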

Re: filter and then pipe a file back to itself

on 03.04.2008 03:13:02 by Janis Papanagnou

Icarus Sparry wrote:
> On Tue, 01 Apr 2008 21:40:20 +0200, pk wrote:
>
>
>>Rahul wrote:
>>
>>
>>>I'd like to do:
>>>( echo "A" ; cat t ) > t
>>>
>>>but that spews an angry "cat: t: input file is output file"
>>>
>>>This is a trivial toy example but I find myself running into this sort
>>>of situation every so often. I can always work-around by using a temp
>>>file but I'm just wondering if I'm missing an obvious metaphor...
>>
>>There is a recent post where a similar question was asked (subject:
>>"shell script replacing original file"). See the discussion there. In
>>short, from what I understand, it's generally unsafe since programs in a
>>pipeline are executed in parallel, and you don't have any guarantee that
>>the file will be completely read before the redirection starts writing
>>to it (especially if it's a large file), and thus what you want can't be
>>done (without using a temporary file), but read that thread for the
>>details.
>
>
> It can be done, the usual reason why it isn't done is that if the
> operation is interrupted you are not in a good position to recover.

It's also bad if you attempt that while the file system is full.

>
> exec < t # give a new name (file descriptor 0 in this case) to t
> rm t # remove the old name
> { echo A ; cat ; } > t # generate the new content.

Janis

Re: filter and then pipe a file back to itself

on 03.04.2008 11:46:58 by PK

Icarus Sparry wrote:

> It can be done, the usual reason why it isn't done is that if the
> operation is interrupted you are not in a good position to recover.
>
> exec < t # give a new name (file descriptor 0 in this case) to t

Try this on the command line.

> rm t # remove the old name
> { echo A ; cat ; } > t # generate the new content.

Ah sure. But then it's not a single command or pipeline. And it seems to me
that this is just another (more disguised) way to use a temporary file,
isn't it? If the "exec < t" is executed in the same command group or
pipeline as the other commands, the file is deleted and the script hangs.

Re: filter and then pipe a file back to itself

on 03.04.2008 14:48:10 by PK

pk wrote:

> Icarus Sparry wrote:
>
>> It can be done, the usual reason why it isn't done is that if the
>> operation is interrupted you are not in a good position to recover.
>>
>> exec < t # give a new name (file descriptor 0 in this case) to t
>
> Try this on the command line.

Ok, I think now I understand that better. For this problem, it seems it's
enough to use a descriptor other than 0:

exec 5< t

>> rm t # remove the old name
>> { echo A ; cat ; } > t # generate the new content.

This works (with cat <&5) because the OS creates a different (new) file on
disk for t, while cat still reads from the old file (which has not been
deleted since it's still open), correct? If that is correct, then the shell
is working with two files, just like when a temporary file is used.
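
One way to check this, assuming a POSIX shell with ls -i and awk available, is to compare the inode number of t before and after. Because fd 5 still holds the unlinked file open, its inode cannot be reused yet, so the new t necessarily gets a different one.

```shell
#!/bin/sh
printf 'B\nC\nD\n' > t
old_inode=$(ls -i t | awk '{print $1}')

exec 5< t                   # keep the original file open on fd 5
rm t                        # unlink the name; the data survives while fd 5 is open
{ echo A ; cat <&5 ; } > t  # create a brand-new t, reading the old data from fd 5
exec 5<&-                   # close fd 5; the old file's storage is now released

new_inode=$(ls -i t | awk '{print $1}')
echo "old inode: $old_inode, new inode: $new_inode"
cat t
```

So yes, two files exist for the duration, which is why a full file system can leave you with neither.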

As for the single-pipeline problem, is it nonetheless correct that the
preliminary descriptor must be opened using a stand-alone operation, and
that the command must not be part of a pipeline (or compound command)
ending with "> t"? Otherwise the shell will wipe the file first (or at
some unpredictable time).

Thanks for any answer.

Re: filter and then pipe a file back to itself

on 03.04.2008 14:51:47 by PK

pk wrote:

> As for the single pipeline problem, is it however correct that the
> preliminary descriptor must be opened

, and the original file removed,

> using a stand-alone operation ...etc.

Re: filter and then pipe a file back to itself

on 03.04.2008 15:37:36 by Janis Papanagnou

On 3 Apr., 14:48, pk wrote:
> pk wrote:
> > Icarus Sparry wrote:
>
> >> It can be done, the usual reason why it isn't done is that if the
> >> operation is interrupted you are not in a good position to recover.
>
> >> exec < t # give a new name (file descriptor 0 in this case) to t
>
> > Try this on the command line.
>
> Ok, I think now I understand that better. For this problem, it seems it's
> enough to use a descriptor other than 0:
>
> exec 5< t
>
> >> rm t # remove the old name
> >> { echo A ; cat ; } > t # generate the new content.
>
> This works (with cat <&5) because the OS creates a different (new) file on
> disk for t, while cat still reads from the old file (which has not been
> deleted since it's still open), correct? If that is correct, then the shell
> is working with two files, just like when a temporary file is used.
>
> As for the single pipeline problem, is it however correct that the
> preliminary descriptor must be opened using a stand-alone operation and the
> command must not be part of a pipeline (or compound command) ending with ">
> t", otherwise the shell will wipe the file first (or at some unpredictable
> time)?
>
> Thanks for any answer.

I want to draw attention, again, to the fact that this is an
extremely unsafe construct - e.g., you will lose your file
completely if the filesystem is full - and it should therefore
not be used or advertised. One would needlessly take on risks
that can be avoided in the first place by using safe shell
constructs. Use explicit temporaries, and remove the original
file only if the modified one could be created completely. Is
that really so unappealing that one would instead favour a
theoretically interesting but inherently unsafe and obfuscating
construct?

Janis

Re: filter and then pipe a file back to itself

on 03.04.2008 16:02:23 by PK

Janis wrote:

> I want to bring, again, to attention that this is an extremely
> unsafe construct - e.g., you will lose your file completely if
> the filesystem is full - and it should therefore not be used or
> advertised. Without need one would buy risks that can be avoided
> in the first place by using safe shell constructs. Use explicit
> temporaries and remove the original file only if the modified one
> could be created completely. Is that really so unappealing to do,
> and instead to favour a theoretically interesting but inherently
> unsafe and obfuscating construct?

I'm not suggesting that anyone use this construct (sorry if it looked that
way), nor am I particularly enthusiastic about it, nor would I use it
personally in my scripts.
Since I already know how to use a true temporary file, I was just trying to
understand how that particular (and, to me, partially new) construct works,
to deepen my knowledge.

Thanks.

Re: filter and then pipe a file back to itself

on 03.04.2008 16:13:39 by Geoff Clare

Rahul wrote:

> Is there an easy / elegant way to read from a file, then process the
> stream with a set of filters and pipe back to the same file? For
> example:
>
> Let's say I have a file
>
> B
> C
> D
>
> and I want to add an A at the top to get a
>
> A
> B
> C
> D
>
> I'd like to do:
> ( echo "A" ; cat t ) > t
>
> but that spews an angry "cat: t: input file is output file"

(echo "A"; cat t) | sed '$!N;P;D' 1<> t

Caveat: the 'N;P;D' is an attempt to ensure sed has read the next
line before writing out the current one over the top of it. If any
line is (much) longer than the next, it may not work; it depends on
internal buffering in sed. It will also lose some data if killed
before it finishes.
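
Applied to the sample file from the question, assuming a sed whose output to a regular file is fully buffered (as with GNU sed). For a file this small, nothing is typically written to t until sed has read all of its input, and the new content is longer than the old, so no stale bytes remain at the end. Neither point holds in general, which is exactly the caveat above.

```shell
#!/bin/sh
printf 'B\nC\nD\n' > t

# 1<> t opens t for reading and writing on sed's stdout without truncating it.
(echo "A"; cat t) | sed '$!N;P;D' 1<> t

cat t
```

Because nothing truncates t, output shorter than the original would leave the old tail of the file in place.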

This technique is more often associated with cases where data is
transformed in a 1-to-1 and repeatable way, e.g.

LC_ALL=C tr A-Z a-z < t 1<> t

Here if the process is killed before it finishes, the file will be
part-processed but the transformation can easily be completed just by
repeating the operation.
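
A sketch of that 1-to-1 case on a small sample file (the contents are made up for illustration). The input and output are separate opens of t with their own offsets; since tr writes exactly as many bytes as it reads, each write lands at an offset it has already read, and no truncation is needed. Running it a second time is a no-op.

```shell
#!/bin/sh
printf 'Hello\nWORLD\n' > t

# Same-length, byte-for-byte transform, written back in place.
LC_ALL=C tr A-Z a-z < t 1<> t
cat t

# Repeating the operation (e.g. after an interruption) gives the same result.
LC_ALL=C tr A-Z a-z < t 1<> t
cat t
```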

--
Geoff Clare

Re: filter and then pipe a file back to itself

on 03.04.2008 17:45:33 by Janis Papanagnou

On 3 Apr., 16:02, pk wrote:
> Janis wrote:
> > I want to bring, again, to attention that this is an extremely
> > unsafe construct - e.g., you will lose your file completely if
> > the filesystem is full - and it should therefore not be used or
> > advertised. Without need one would buy risks that can be avoided
> > in the first place by using safe shell constructs. Use explicit
> > temporaries and remove the original file only if the modified one
> > could be created completely. Is that really so unappealing to do,
> > and instead to favour a theoretically interesting but inherently
> > unsafe and obfuscating construct?
>
> I'm not suggesting anyone to use this construct (sorry if it looked that
> way), nor am I particularly enthusiast about it, nor would I use it
> personally in my scripts.

Consider my warning as addressing the first sentence of the
OP's original posting, not you in particular.
The "I-want-to-do-everything-in-one-step approach" that is
occasionally requested (and which is syntactically possible
to achieve in many cases) often carries huge caveats that
should be pointed out when discussed.

Now, more specifically, regarding your "is it correct"
question upthread: no, it is _generally_ *not* correct, e.g.
in case the file system is full, but technically it works the
way you described if you have sufficient resources and
no interrupts (or other conditions we currently do not think
of).

> Since I already know how to use a true temporary file, I was just trying to
> understand how that particular (and, to me, partially new) construct works,
> to get a better knowledge.

I hope the caveats are part of that knowledge. :-)

Janis

>
> Thanks.

Re: filter and then pipe a file back to itself

on 03.04.2008 18:44:51 by PK

Janis wrote:

> Consider my warning to address the first sentence of the
> OP's original posting, addressed not particularly to you.
> The "I-want-to-do-everything-in-one-step approach" that is
> occasionally requested (and which is syntactically possible
> to achieve in many cases) often carries huge caveats that
> should be pointed out when discussed.

Ok, no problem.

> Now, more specifically, addressed to your "is it correct"
> question upthread; no, it is _generally_ *not* correct, e.g.
> in case the file system is full, but technically it works the
> way you described in case you have sufficient resources and
> no interrupts (or other conditions we currently do not think
> of).

That's enough for me; it's just what I wanted to confirm.

Thanks!