looking for DOS-UNIX-Mac file detector

on 09.10.2007 17:29:56 by Eric Pement

I am looking to see if anyone knows of a basic reporting tool that can
be passed a list of filenames on the command line, and for each file,
report whether the file has Unix line endings (LF), DOS line endings
(CR/LF), Mac line endings (CR), or something else (binary file,
inconsistent file, etc.). I don't want it to change the file, I just
want it to detect and report whether its newlines are for Unix, DOS/
Windows, or Mac systems.

I could probably script this, but if something has already been
written there's no reason for me to redo it. I am currently using
Cygwin, HP-UX, and various Unix types. A perl or awk script would
probably do this, but if there is a dedicated compiled utility already
out there or a good script already written, I would like to know about
it. Thanks.

Re: looking for DOS-UNIX-Mac file detector

on 09.10.2007 22:52:19 by Dummy

Eric Pement wrote:
> I am looking to see if anyone knows of a basic reporting tool that can
> be passed a list of filenames on the command line, and for each file,
> report whether the file has Unix line endings (LF), DOS line endings
> (CR/LF), Mac line endings (CR), or something else (binary file,
> inconsistent file, etc.). I don't want it to change the file, I just
> want it to detect and report whether its newlines are for Unix, DOS/
> Windows, or Mac systems.

man file


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Re: looking for DOS-UNIX-Mac file detector

on 11.10.2007 17:58:29 by Eric Pement

On Oct 9, 3:52 pm, "John W. Krahn" wrote:

> man file

That would be a solution where GNU file is available, using
"file -e soft filename", but I find from experimentation with Cygwin
that "file" misses some matches of bona fide DOS files, incorrectly
reporting them simply as "data" when they should be reported as having
"CRLF line terminators". In short, it works some of the time but does
not work reliably.

More directly, I also need a solution that will work on Unixes that do
not have (and will not install) GNU file. Thanks anyway.

Eric

Re: looking for DOS-UNIX-Mac file detector

on 11.10.2007 18:24:48 by huge

On 2007-10-11, Eric Pement wrote:
> On Oct 9, 3:52 pm, "John W. Krahn" wrote:
>
>> man file
>
> That would be a solution where GNU file is available, using "file -e
> soft filename", but I find from experimentation with Cygwin that
> "file" misses some matches of bona-fide DOS files, incorrectly
> reporting them simply as "data" when they should be reported as having
> "CRLF line terminators". In short, it works some of the time but does
> not work reliably.
>
> More directly, I also need a solution that will work on Unixes that do
> not have (and will not install) GNU file. Thanks anyway.

"file" predates GNU, and AFAIK, all modern Unixes have it.



--
"Religion poisons everything."
[email me at huge {at} huge (dot) org uk]

Re: looking for DOS-UNIX-Mac file detector

on 12.10.2007 04:52:37 by gazelle

In article , Huge wrote:
>On 2007-10-11, Eric Pement wrote:
>> On Oct 9, 3:52 pm, "John W. Krahn" wrote:
>>
>>> man file
>>
>> That would be a solution where GNU file is available, using "file -e
>> soft filename", but I find from experimentation with Cygwin that
>> "file" misses some matches of bona-fide DOS files, incorrectly
>> reporting them simply as "data" when they should be reported as having
>> "CRLF line terminators". In short, it works some of the time but does
>> not work reliably.
>>
>> More directly, I also need a solution that will work on Unixes that do
>> not have (and will not install) GNU file. Thanks anyway.
>
>"file" predates GNU, and AFAIK, all modern Unixes have it.

Francisco Franco - still dead.

I.e., that's not the point.

Re: looking for DOS-UNIX-Mac file detector

on 12.10.2007 04:58:27 by Barry Margolin

In article ,
Huge wrote:

> On 2007-10-11, Eric Pement wrote:
> > On Oct 9, 3:52 pm, "John W. Krahn" wrote:
> >
> >> man file
> >
> > That would be a solution where GNU file is available, using "file -e
> > soft filename", but I find from experimentation with Cygwin that
> > "file" misses some matches of bona-fide DOS files, incorrectly
> > reporting them simply as "data" when they should be reported as having
> > "CRLF line terminators". In short, it works some of the time but does
> > not work reliably.
> >
> > More directly, I also need a solution that will work on Unixes that do
> > not have (and will not install) GNU file. Thanks anyway.
>
> "file" predates GNU, and AFAIK, all modern Unixes have it.

But do those older versions of file tell you the type of line breaks
that a text file uses?

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

Re: looking for DOS-UNIX-Mac file detector

on 12.10.2007 09:15:11 by huge

On 2007-10-12, Barry Margolin wrote:
> In article ,
> Huge wrote:
>
>> On 2007-10-11, Eric Pement wrote:
>> > On Oct 9, 3:52 pm, "John W. Krahn" wrote:
>> >
>> >> man file
>> >
>> > That would be a solution where GNU file is available, using "file -e
>> > soft filename", but I find from experimentation with Cygwin that
>> > "file" misses some matches of bona-fide DOS files, incorrectly
>> > reporting them simply as "data" when they should be reported as having
>> > "CRLF line terminators". In short, it works some of the time but does
>> > not work reliably.
>> >
>> > More directly, I also need a solution that will work on Unixes that do
>> > not have (and will not install) GNU file. Thanks anyway.
>>
>> "file" predates GNU, and AFAIK, all modern Unixes have it.
>
> But do those older versions of file

What makes you think they are "old" versions of file?

> tell you the type of line breaks
> that a text file uses?

You pay my consultancy rate and I'll find out.

Re: looking for DOS-UNIX-Mac file detector

on 12.10.2007 09:33:19 by cfajohnson

On 2007-10-12, Barry Margolin wrote:
>
>
> In article ,
> Huge wrote:
>
>> On 2007-10-11, Eric Pement wrote:
>> > On Oct 9, 3:52 pm, "John W. Krahn" wrote:
>> >
>> >> man file
>> >
>> > That would be a solution where GNU file is available, using "file -e
>> > soft filename", but I find from experimentation with Cygwin that
>> > "file" misses some matches of bona-fide DOS files, incorrectly
>> > reporting them simply as "data" when they should be reported as having
>> > "CRLF line terminators". In short, it works some of the time but does
>> > not work reliably.
>> >
>> > More directly, I also need a solution that will work on Unixes that do
>> > not have (and will not install) GNU file. Thanks anyway.
>>
>> "file" predates GNU, and AFAIK, all modern Unixes have it.
>
> But do those older versions of file tell you the type of line breaks
> that a text file uses?

Yes, if that's defined in /etc/magic.

--
Chris F.A. Johnson, author
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Re: looking for DOS-UNIX-Mac file detector

on 12.10.2007 19:38:14 by dave+news001

Eric Pement wrote:
> I am looking to see if anyone knows of a basic reporting tool that can
> be passed a list of filenames on the command line, and for each file,
> report whether the file has Unix line endings (LF), DOS line endings
> (CR/LF), Mac line endings (CR), or something else (binary file,
> inconsistent file, etc.). I don't want it to change the file, I just
> want it to detect and report whether its newlines are for Unix, DOS/
> Windows, or Mac systems.

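Bug: because everything except carriage returns and line feeds is
deleted first, isolated carriage returns and line feeds that happen to
alternate are treated as carriage-return/line-feed pairs, so such a
file is reported as "dos".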

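#! /bin/sh

for f in "$@" ; do
    # Keep only CR (octal 015) and LF (octal 012) bytes, dump them in octal.
    tr -cd '\015\012' < "$f" |
    od -A n -t o1 |
    awk -v f="$f" '
        # od collapses repeated output lines into a single "*"; skip those
        # and blank lines, and give up after more than two distinct lines.
        !/^[[:space:]]*\*?[[:space:]]*$/ { if (++n > 2) exit ; s = s $0 }
        END {
            # Anything irregular, including a file with no CR or LF at all,
            # falls through to "binary".
            t = "binary"
            if (n <= 2) {
                if (s ~ /^( 012)+$/) { t = "unix" }
                else if (s ~ /^( 015 012)+$/) { t = "dos" }
                else if (s ~ /^( 015)+$/) { t = "mac" }
            }
            printf "%s: %s\n", f, t
        }'
done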
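A sketch of one way around the bug noted above, assuming a POSIX od and awk: instead of deleting everything but CR and LF, dump every byte with od -v (which turns off the "*" squeezing) and let awk look at adjacent bytes, so a CR only counts toward a CR/LF pair when the very next byte is an LF. The helper name eol_report and the labels it prints are hypothetical, and treating any NUL byte as "binary" is a heuristic of this sketch rather than something proposed in the thread.

```shell
#!/bin/sh
# Sketch: report the line-ending style of each file named on the command line.
# Assumes POSIX od and awk; eol_report is a hypothetical helper name.

eol_report() {
    # -v: print every line (no "*" squeezing); -t o1: one octal byte per field
    od -A n -v -t o1 < "$1" |
    awk -v f="$1" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "000") nul++            # NUL byte: treat as binary
                if ($i == "015") cr++             # carriage return
                if ($i == "012") {                # line feed
                    lf++
                    if (prev == "015") crlf++     # LF immediately after a CR
                }
                prev = $i                         # carries across od lines
            }
        }
        END {
            if (nul)                            t = "binary"
            else if (cr == 0 && lf == 0)        t = "no line endings"
            else if (cr == 0)                   t = "unix"
            else if (lf == 0)                   t = "mac"
            else if (cr == crlf && lf == crlf)  t = "dos"
            else                                t = "inconsistent"
            printf "%s: %s\n", f, t
        }'
}

for f in "$@"; do
    eol_report "$f"
done
```

Looking at every byte is slower than Dave's tr-first approach on large files, but it distinguishes "a CR b LF" (inconsistent) from "a CR LF" (dos).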

Re: looking for DOS-UNIX-Mac file detector

on 12.10.2007 22:01:03 by William James

Eric Pement wrote:
> I am looking to see if anyone knows of a basic reporting tool that can
> be passed a list of filenames on the command line, and for each file,
> report whether the file has Unix line endings (LF), DOS line endings
> (CR/LF), Mac line endings (CR), or something else (binary file,
> inconsistent file, etc.). I don't want it to change the file, I just
> want it to detect and report whether its newlines are for Unix, DOS/
> Windows, or Mac systems.
>
> I could probably script this, but if something has already been
> written there's no reason for me to redo it. I am currently using
> Cygwin, HP-UX, and various Unix types. A perl or awk script would
> probably do this, but if there is a dedicated compiled utility already
> out there or a good script already written, I would like to know about
> it. Thanks.

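#!/usr/bin/awk -f
# (adjust the interpreter path for your system)
# Use a record separator (control-A, octal 001) that should never appear
# in a text file, so each file is read as a single record.
BEGIN { RS = FS = "\1" }

{
    # gsub() returns the number of replacements; replacing each match
    # with itself ("&") just counts the matches.
    returns   += gsub( /\r/, "&" )
    linefeeds += gsub( /\n/, "&" )
    pairs     += gsub( /\r\n/, "&" )
}

END {
    if (returns==linefeeds && returns==pairs && pairs)
        { print "dos"; exit }
    if (returns==0 && linefeeds)
        { print "unix"; exit }
    if (returns && linefeeds==0)
        { print "mac"; exit }
    print "inconsistent or binary"
}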
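As posted, the awk program accumulates its counts over all of its input, so given several files it prints a single verdict for the lot, while Eric asked for one line per file. A minimal sketch of one way to get that is to run the same program once per file from a shell loop and prefix each verdict with the file name. The function name eol_check is hypothetical; the awk text is William's program unchanged apart from being inlined.

```shell
#!/bin/sh
# Run the awk classifier once per file so each file gets its own verdict.
# eol_check is a hypothetical name; the awk program is inlined unchanged.

eol_check() {
    awk '
    BEGIN { RS = FS = "\1" }
    {
        returns   += gsub( /\r/, "&" )
        linefeeds += gsub( /\n/, "&" )
        pairs     += gsub( /\r\n/, "&" )
    }
    END {
        if (returns==linefeeds && returns==pairs && pairs)
            { print "dos"; exit }
        if (returns==0 && linefeeds)
            { print "unix"; exit }
        if (returns && linefeeds==0)
            { print "mac"; exit }
        print "inconsistent or binary"
    }' "$1"
}

for f in "$@"; do
    printf '%s: %s\n' "$f" "$(eol_check "$f")"
done
```

Because each awk process sees only one file, the counters start from zero for every file without any FILENAME bookkeeping.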

Re: looking for DOS-UNIX-Mac file detector

on 13.10.2007 05:09:38 by Barry Margolin

In article ,
Huge wrote:

> On 2007-10-12, Barry Margolin wrote:
> > In article ,
> > Huge wrote:
> >
> >> On 2007-10-11, Eric Pement wrote:
> >> > On Oct 9, 3:52 pm, "John W. Krahn" wrote:
> >> >
> >> >> man file
> >> >
> >> > That would be a solution where GNU file is available, using "file -e
> >> > soft filename", but I find from experimentation with Cygwin that
> >> > "file" misses some matches of bona-fide DOS files, incorrectly
> >> > reporting them simply as "data" when they should be reported as having
> >> > "CRLF line terminators". In short, it works some of the time but does
> >> > not work reliably.
> >> >
> >> > More directly, I also need a solution that will work on Unixes that do
> >> > not have (and will not install) GNU file. Thanks anyway.
> >>
> >> "file" predates GNU, and AFAIK, all modern Unixes have it.
> >
> > But do those older versions of file
>
> What makes you think they are "old" versions of file?

The word "predates".

>
> > tell you the type of line breaks
> > that a text file uses?
>
> You pay my consultancy rate and I'll find out.

You're the one who implied that non-GNU versions of file can serve the
OP's purpose. Doesn't that suggest that you already know the answer?
