best way to determine (MIME) content type of a stream of bytes?

on 08.03.2010 16:57:14 by rpjday

hi, i'm interested in the most comprehensive way to determine the
content type of a stream of bytes that's been uploaded to a PHP
script? assuming that the bytes are uploaded simply via a POST
parameter, i can see that there are a couple ways to do it:

* getimagesize()
* FileInfo

i've been doing some testing this morning and a few video formats
handed to FileInfo come back as "application/octet-stream" which isn't
particularly informative. and i want to support as many different
formats of image, audio and video as possible.

so ... what's the best way? oh, by the way, when i used fileinfo, i
didn't bother handing over a magic file. i'm starting to think that
would make a difference. and is there a noticeable advantage to
upgrading to PHP 5.3, since the server (centos 5.4) is currently
running only PHP 5.1.6? thanks.

rday
--

========================================================================
Robert P. J. Day Waterloo, Ontario, CANADA

Linux Consulting, Training and Kernel Pedantry.

Web page: http://crashcourse.ca
Twitter: http://twitter.com/rpjday
========================================================================

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: best way to determine (MIME) content type of a stream of bytes?

on 08.03.2010 17:14:11 by Ashley Sheridan

On Mon, 2010-03-08 at 10:57 -0500, Robert P. J. Day wrote:

> hi, i'm interested in the most comprehensive way to determine the
> content type of a stream of bytes that's been uploaded to a PHP
> script? assuming that the bytes are uploaded simply via a POST
> parameter, i can see that there are a couple ways to do it:
>
> * getimagesize()
> * FileInfo
>
> i've been doing some testing this morning and a few video formats
> handed to FileInfo come back as "application/octet-stream" which isn't
> particularly informative. and i want to support as many different
> formats of image, audio and video as possible.
>
> so ... what's the best way? oh, by the way, when i used fileinfo, i
> didn't bother handing over a magic file. i'm starting to think that
> would make a difference. and is there a noticeable advantage to
> upgrading to PHP 5.3 since the server (centos 5.4) is currently
> running only PHP 5.1.6. thanks.


If you want to grab details about a clip, what about using mplayer
for video clips? It has more than a few command-line options that can
return various levels of detail about a media file. You could use the
extension of the clip as a hint about how to determine the file's
exact type. So, if a file came in with a jpg, png or gif extension, you
could use GD functions to check whether it's really an image. If it's
a .avi, .mpg, .mp4, .mp3 or .ogg, you could use mplayer to deal with it.

This does seem to be a bit of an area where PHP is lacking. Even the
manual pages are cryptic. They seem to suggest that the Mime functions
which we should use in place of the deprecated ones themselves rely on
those same deprecated functions!

Having said that, I've had good results from using "file -i filename" on
Linux (the -i option prints the MIME type), with version 5.03 of file on
my system.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: best way to determine (MIME) content type of a stream of bytes?

on 08.03.2010 17:31:59 by Ashley Sheridan

On Mon, 2010-03-08 at 11:33 -0500, Robert P. J. Day wrote:

> On Mon, 8 Mar 2010, Ashley Sheridan wrote:
>
> > On Mon, 2010-03-08 at 10:57 -0500, Robert P. J. Day wrote:
> >
> > > hi, i'm interested in the most comprehensive way to determine the
> > > content type of a stream of bytes that's been uploaded to a PHP
> > > script? assuming that the bytes are uploaded simply via a POST
> > > parameter, i can see that there are a couple ways to do it:
> > >
> > > * getimagesize()
> > > * FileInfo
> > >
> > > i've been doing some testing this morning and a few video formats
> > > handed to FileInfo come back as "application/octet-stream" which
> > > isn't particularly informative. and i want to support as many
> > > different formats of image, audio and video as possible.
> > >
> > > so ... what's the best way? oh, by the way, when i used
> > > fileinfo, i didn't bother handing over a magic file. i'm starting
> > > to think that would make a difference. and is there a noticeable
> > > advantage to upgrading to PHP 5.3 since the server (centos 5.4) is
> > > currently running only PHP 5.1.6. thanks.
>
> > If you want to grab details about a clip, what about using
> > mplayer for video clips? It has more than a few command-line
> > options that can return various levels of detail about a media
> > file. You could use the extension of the clip as a hint about how
> > to determine the file's exact type. So, if a file came in with
> > a jpg, png or gif extension, you could use GD functions to check
> > whether it's really an image. If it's a .avi, .mpg, .mp4, .mp3 or
> > .ogg, you could use mplayer to deal with it.
>
> in order to make life as difficult as possible, all i can assume is
> an incoming stream of bytes. i will have no idea where it came from,
> or its original file name. all of the mime/type identification has to
> be done by the PHP script on the server end, based solely on the
> content. (i'm fairly sure that means a "magic" file will have to be
> involved.)
>
> > This does seem to be a bit of an area where PHP is lacking. Even the
> > manual pages are cryptic. They seem to suggest that the Mime
> > functions which we should use in place of the deprecated ones
> > themselves rely on those same deprecated functions!
>
> i have noticed that. the "mime_content_type()" function looked like
> a good candidate but it's marked as deprecated. the best option
> appears to be the Fileinfo stuff.


What about writing the first n bytes to a file and then passing that to
the command line? I'm assuming a Linux server here, but it should do the
trick.
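
A rough sketch of that idea, in case it helps. The function name
detect_mime_with_file_command() and the 4096-byte cutoff are assumptions
made up for illustration, and it assumes a file(1) from the 5.x series
that understands -b and --mime-type. It returns null if the command is
missing or fails.

```php
<?php
// Hypothetical helper: write the first $maxBytes bytes to a temp file
// and ask the file(1) command for the MIME type.
function detect_mime_with_file_command($bytes, $maxBytes = 4096)
{
    $tmp = tempnam(sys_get_temp_dir(), 'mime');
    if ($tmp === false) {
        return null;
    }

    // Only the leading bytes are needed for most magic signatures.
    file_put_contents($tmp, substr($bytes, 0, $maxBytes));

    $output = array();
    $status = 1;
    // -b: omit the filename from the output; --mime-type: print only the type.
    exec('file -b --mime-type ' . escapeshellarg($tmp) . ' 2>/dev/null', $output, $status);
    unlink($tmp);

    if ($status !== 0 || !isset($output[0])) {
        return null;
    }
    return trim($output[0]);
}
```

Truncating keeps the temp file small, but some container formats keep
their identifying data further into the file, so the cutoff may need to
be raised for those.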

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: best way to determine (MIME) content type of a stream of bytes?

on 08.03.2010 17:33:18 by rpjday

On Mon, 8 Mar 2010, Ashley Sheridan wrote:

> On Mon, 2010-03-08 at 10:57 -0500, Robert P. J. Day wrote:
>
> > hi, i'm interested in the most comprehensive way to determine the
> > content type of a stream of bytes that's been uploaded to a PHP
> > script? assuming that the bytes are uploaded simply via a POST
> > parameter, i can see that there are a couple ways to do it:
> >
> > * getimagesize()
> > * FileInfo
> >
> > i've been doing some testing this morning and a few video formats
> > handed to FileInfo come back as "application/octet-stream" which
> > isn't particularly informative. and i want to support as many
> > different formats of image, audio and video as possible.
> >
> > so ... what's the best way? oh, by the way, when i used
> > fileinfo, i didn't bother handing over a magic file. i'm starting
> > to think that would make a difference. and is there a noticeable
> > advantage to upgrading to PHP 5.3 since the server (centos 5.4) is
> > currently running only PHP 5.1.6. thanks.

> If you want to grab details about a clip, what about using
> mplayer for video clips? It has more than a few command-line
> options that can return various levels of detail about a media
> file. You could use the extension of the clip as a hint about how
> to determine the file's exact type. So, if a file came in with
> a jpg, png or gif extension, you could use GD functions to check
> whether it's really an image. If it's a .avi, .mpg, .mp4, .mp3 or
> .ogg, you could use mplayer to deal with it.

in order to make life as difficult as possible, all i can assume is
an incoming stream of bytes. i will have no idea where it came from,
or its original file name. all of the mime/type identification has to
be done by the PHP script on the server end, based solely on the
content. (i'm fairly sure that means a "magic" file will have to be
involved.)

> This does seem to be a bit of an area where PHP is lacking. Even the
> manual pages are cryptic. They seem to suggest that the Mime
> functions which we should use in place of the deprecated ones
> themselves rely on those same deprecated functions!

i have noticed that. the "mime_content_type()" function looked like
a good candidate but it's marked as deprecated. the best option
appears to be the Fileinfo stuff.
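
for reference, content-only detection with Fileinfo might look roughly
like the sketch below. the helper name detect_mime_from_bytes() is made
up for illustration, and it assumes the Fileinfo extension is loaded and
that the FILEINFO_MIME_TYPE constant is available (PHP 5.3 and later; on
5.1.6 the PECL extension would be needed and the constants may differ).
passing a magic file is optional here.

```php
<?php
// Hypothetical helper: detect a MIME type from raw bytes, no filename needed.
function detect_mime_from_bytes($bytes, $magicFile = null)
{
    // Use the built-in magic database unless a magic file path is given.
    $finfo = ($magicFile === null)
        ? new finfo(FILEINFO_MIME_TYPE)
        : new finfo(FILEINFO_MIME_TYPE, $magicFile);

    $type = $finfo->buffer($bytes);

    // Fall back to the generic type when detection fails.
    return ($type === false) ? 'application/octet-stream' : $type;
}

// Example call; the POST parameter name 'payload' is an assumption:
// $type = detect_mime_from_bytes($_POST['payload']);
```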

rday

Re: best way to determine (MIME) content type of a stream of bytes?

on 08.03.2010 17:35:22 by Ashley Sheridan

On Mon, 2010-03-08 at 11:37 -0500, Robert P. J. Day wrote:

> On Mon, 8 Mar 2010, Ashley Sheridan wrote:
>
> > What about writing the first n bytes to a file and then passing that
> > to the command line? I'm assuming a Linux server here, but it should
> > do the trick.
>
> gaaaaah! i was hoping for something that wouldn't make me want to
> gouge out my eyes with a soup spoon. :-)


Lol, that's about the easiest way I can think of doing it reliably!

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: best way to determine (MIME) content type of a stream of bytes?

on 08.03.2010 17:37:24 by rpjday

On Mon, 8 Mar 2010, Ashley Sheridan wrote:

> What about writing the first n bytes to a file and then passing that
> to the command line? I'm assuming a Linux server here, but it should
> do the trick.

gaaaaah! i was hoping for something that wouldn't make me want to
gouge out my eyes with a soup spoon. :-)

rday

Re: best way to determine (MIME) content type of a stream of bytes?

on 09.03.2010 09:39:52 by Auke van Slooten

Robert P. J. Day wrote:
> On Mon, 8 Mar 2010, Ashley Sheridan wrote:
>
>> What about writing the first n bytes to a file and then passing that
>> to the command line? I'm assuming a Linux server here, but it should
>> do the trick.
>
> gaaaaah! i was hoping for something that wouldn't make me want to
> gouge out my eyes with a soup spoon. :-)

Maybe slightly less painful: you can always write your own 'mimemagic'
detection method. The magic.mime file is relatively easy to parse. This
is the route we took some years ago to make our mime detection OS
independent. We've made a php script which parses the mime.types file
and the magic.mime file and generates a php module which uses that
information to figure out the correct mimetype.

The resulting php module has a large array, which looks like this:

...
$mimemagic_data[0][4]["\0\0\1\273"]="video/mpeg";
$mimemagic_data[0][4]["\0\0\2\0"]="application/x-123";
$mimemagic_data[0][4]["\0\0\32\0"]="application/x-123";
$mimemagic_data[0][4]["\0\6\25\141"]="application/x-dbm";
$mimemagic_data[0][4]["\101\104\111\106"]="audio/X-HX-AAC-ADIF";
$mimemagic_data[0][4]["\103\120\103\262"]="image/x-cpi";
...

The first key is the offset at which to start, the next key is the
length of the snippet to check (I guess that could have been skipped...)
and the final key is the exact string to match.
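
A lookup against an array of that shape might be sketched as follows.
This is not the actual mod_mimemagic code: the function name
mimemagic_lookup() is an assumption for illustration, and the sample
table only holds two of the entries shown above.

```php
<?php
// Sample table in the [offset][length][bytes] => mimetype layout.
$mimemagic_data = array();
$mimemagic_data[0][4]["\0\0\1\273"] = "video/mpeg";
$mimemagic_data[0][4]["\103\120\103\262"] = "image/x-cpi";

// Hypothetical lookup: return the first matching mimetype, or null.
function mimemagic_lookup(array $table, $bytes)
{
    foreach ($table as $offset => $byLength) {
        foreach ($byLength as $length => $patterns) {
            // Cut out the snippet this group of patterns applies to.
            $snippet = substr($bytes, $offset, $length);

            // Skip when the input is too short to hold a full snippet.
            if (!is_string($snippet) || strlen($snippet) !== $length) {
                continue;
            }
            if (isset($patterns[$snippet])) {
                return $patterns[$snippet];
            }
        }
    }
    return null;
}
```

Because the bytes are used directly as array keys, each offset/length
pair costs one substr() and one hash lookup, however many patterns are
stored under it.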

The magic.mime file is no magic bullet, though. There are occasions when
it won't match a file, but that's usually with more complex types
like Microsoft Office documents, not with images.

If you're interested, the mimemagic module can be found here:
http://svn.muze.nl/trunk/lib/modules/mod_mimemagic.php?revision=4299&root=ariadne-php5

And the builder script (which you should run on a unix system with
magic.mime and mime.types files) is here:
http://svn.muze.nl/trunk/bin/utils/build_mimemagic_script.php?revision=4299&root=ariadne-php5

Hope this helps,
Auke van Slooten
Muze
