Convert some files from html to plaintext
on 11.11.2007 19:05:34 by lucavilla
I have many html files named like these:
c:\dir\femo-black.html
c:\dir\loren-white.html
c:\dir\spark-white.html
c:\dir\kim-black.html
c:\dir\paul-white.html
How can I convert only the files named "c:\dir\*-white.html" to
plaintext files named c:\dir\(original filename)-text.txt?
Is there a PHP module that does a good-quality HTML-to-plaintext
conversion?
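For reference, here is a minimal sketch that uses only PHP built-ins (`glob()`, `strip_tags()`, `html_entity_decode()`). It is a plain tag-stripping approach and not a renderer like Lynx. It assumes that "(original filename)" means the name without `.html`, so `loren-white.html` becomes `loren-white-text.txt`. The function names `html_to_text()` and `convert_dir()` are hypothetical.

```php
<?php
// Minimal sketch: convert every <dir>/*-white.html file into a
// <name>-text.txt file next to it, using only PHP built-ins.
// Assumption: "(original filename)" means the name without ".html".

function html_to_text(string $html): string
{
    // Drop <script>/<style> blocks entirely; strip_tags() would keep their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Turn <br> and the ends of common block elements into line breaks.
    $html = preg_replace('#<br\s*/?>#i', "\n", $html);
    $html = preg_replace('#</(p|div|h[1-6]|li|tr)>#i', "\n", $html);
    $text = strip_tags($html);
    // Decode entities such as &amp; and &quot;.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of spaces/tabs, and runs of 3+ line breaks into one blank line.
    $text = preg_replace("/[ \t]+/", ' ', $text);
    $text = preg_replace("/\n\s*\n\s*\n+/", "\n\n", $text);
    return trim($text);
}

function convert_dir(string $dir): array
{
    $written = [];
    // glob() may return false on error, so fall back to an empty list.
    foreach (glob($dir . DIRECTORY_SEPARATOR . '*-white.html') ?: [] as $src) {
        $dst = substr($src, 0, -strlen('.html')) . '-text.txt';
        file_put_contents($dst, html_to_text(file_get_contents($src)));
        $written[] = $dst;
    }
    return $written;
}

// Usage with the directory from the question:
// convert_dir('c:\\dir');
```

The output will be rougher than a text browser's, because tables, lists and link targets get no special layout.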
Re: Convert some files from html to plaintext
on 11.11.2007 20:53:53 by zeldorblat
On Nov 11, 1:05 pm, Luca Villa wrote:
> I have many html files named like these:
>
> c:\dir\femo-black.html
> c:\dir\loren-white.html
> c:\dir\spark-white.html
> c:\dir\kim-black.html
> c:\dir\paul-white.html
>
> How can I convert only the files named "c:\dir\*-white.html" to
> plaintext files named c:\dir\(original filename)-text.txt?
>
> Is there a PHP module that does a good quality conversion HTML to
> plaintext?
See this:
Re: Convert some files from html to plaintext
on 11.11.2007 20:58:50 by lucavilla
> See this:
>
>
Isn't there something of higher quality, such as the rendering engine of
the text-mode browser Lynx?
Re: Convert some files from html to plaintext
on 11.11.2007 21:43:41 by Oli Thissen
On Nov 11, 8:58 pm, Luca Villa wrote:
> > See this:
>
> >
>
> Isn't there something of higher quality, like the rendering engine of
> the textual browser Lynx?
I guess you don't simply want to remove all the tags. Rather, you want
to make sure that the content of a given element is followed by an
empty line, or that other elements are indented, etc.
This might seem like overkill, but if all of your files have the
same structure, you might want to write an XSLT stylesheet and have PHP
transform the files into whatever structure you prefer.
Check out the PHP manual here http://www.php.net/ref.xsl and maybe
this tutorial on XSLT http://www.w3schools.com/xsl/
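A rough sketch of the XSLT approach, assuming PHP's `dom` and `xsl` extensions are installed. The stylesheet uses `<xsl:output method="text"/>` so that the transform emits plain text. The element names in the original post were lost, so the choice of `h1` (followed by an empty line) and `li` (indented) is purely illustrative, and `html_to_text_xslt()` is a hypothetical name.

```php
<?php
// Sketch of the XSLT idea: the stylesheet decides how each element is
// rendered as text. Requires PHP's dom and xsl extensions.
// The elements handled below (h1, li) are illustrative assumptions.

const TEXT_XSL = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <!-- Skip scripts and styles entirely. -->
  <xsl:template match="script|style"/>
  <!-- Headings: text followed by an empty line. -->
  <xsl:template match="h1">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>
  <!-- List items: indented, one per line. -->
  <xsl:template match="li">
    <xsl:text>  - </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
XSL;

function html_to_text_xslt(string $html): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xsl = new DOMDocument();
    $xsl->loadXML(TEXT_XSL);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    return trim((string) $proc->transformToXML($doc));
}

// Usage: file_put_contents($dst, html_to_text_xslt(file_get_contents($src)));
```

Elements without a template fall through to XSLT's built-in rules, which output their text content unchanged.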
Oli
Re: Convert some files from html to plaintext
on 11.11.2007 23:33:24 by lucavilla
Oli, there are ready-made, open-source converters available, such as Lynx,
Links, ELinks, W3M, etc.
I don't think it makes sense to rewrite in XSLT what others have
already done over many years of work.
I had hoped that PHP had a built-in solution for this, such as the engine
of one of the text-mode browsers mentioned...
Re: Convert some files from html to plaintext
on 12.11.2007 13:00:21 by Ulf Kadner
Luca Villa wrote:
> Oli, there are ready and open source converters available like Lynx,
> Links, ELinks, W3M etc...
> I think that it's not the case to re-write with XSLT what's it's
> already done by others with many years of work.
> I hoped that PHP had an integrated solution for this, like the engine
> of one of the mentioned textual browsers...
That's usually very simple. Install Lynx on your server and call it through
one of PHP's command-execution functions:
http://php.net/exec
You don't have other options without a lot of work...
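A minimal sketch of that approach, assuming a `lynx` binary on the PATH. `-dump` writes the rendered page to stdout, `-force_html` treats the input as HTML, and `-nolist` suppresses the trailing list of link URLs. `lynx_command()` and `convert_with_lynx()` are hypothetical names, and the output name again assumes `name.html` becomes `name-text.txt`.

```php
<?php
// Sketch: let Lynx render the HTML and capture its text output via exec().
// Assumes a "lynx" binary is on the PATH.
// -dump: print the rendered page to stdout
// -force_html: treat the input as HTML regardless of extension
// -nolist: do not append the numbered list of link URLs

function lynx_command(string $htmlFile): string
{
    // escapeshellarg() protects against spaces and shell metacharacters.
    return 'lynx -dump -force_html -nolist ' . escapeshellarg($htmlFile);
}

function convert_with_lynx(string $htmlFile): bool
{
    $lines = [];
    $status = 0;
    // exec() fills $lines with the output, minus trailing whitespace per line.
    exec(lynx_command($htmlFile), $lines, $status);
    if ($status !== 0) {
        return false; // lynx is missing or reported an error
    }
    $dst = substr($htmlFile, 0, -strlen('.html')) . '-text.txt';
    return file_put_contents($dst, implode("\n", $lines) . "\n") !== false;
}

// Usage: foreach (glob('c:\\dir\\*-white.html') ?: [] as $f) { convert_with_lynx($f); }
```

Each call starts a new shell and a new Lynx process.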
So long, Ulf
--
_,
_(_p> Ulf [Kado] Kadner
\<_)
^^
Re: Convert some files from html to plaintext
on 12.11.2007 19:33:49 by lucavilla
> Usually very simple. Install Lynx on youre server and call Lynx by one
> of the command executing functions of PHP:
That's the approach I'm following, but calling an external program
thousands of times (I need to process thousands of files) isn't very
efficient...
Re: Convert some files from html to plaintext
on 13.11.2007 01:56:59 by Ulf Kadner
Luca Villa wrote:
>> Usually very simple. Install Lynx on youre server and call Lynx by one
>> of the command executing functions of PHP:
>
> That's the road I'm following, but calling an external program
> thousands of times (I need to process thousand of files) is not much
> efficient...
Sure, it's no performance wonder :-)
You'd do better to write a shell script that reads all the files to convert
from a list file (maybe dynamically generated) and processes them with Lynx
in a loop. That's faster.
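A sketch of that idea driven from PHP. It assumes a POSIX `sh` and `lynx` on the PATH, so it would not work under plain `cmd.exe` on Windows. PHP writes the file list once and starts a single shell that loops over it. This saves one `exec()` call and one shell startup per file, but Lynx itself is still started once per file. `batch_command()` and `convert_batch()` are hypothetical names.

```php
<?php
// Sketch: write the file list once, then start ONE shell that loops over it
// and runs lynx for each file. Assumes POSIX sh and lynx on the PATH.
// "${f%.html}-text.txt" strips ".html" and appends "-text.txt".

const LOOP_SCRIPT = <<<'SH'
while IFS= read -r f; do
  lynx -dump -force_html -nolist "$f" > "${f%.html}-text.txt"
done < "$1"
SH;

function batch_command(string $listFile): string
{
    // sh -c SCRIPT NAME ARG: NAME becomes $0 and ARG becomes $1 in the script.
    return 'sh -c ' . escapeshellarg(LOOP_SCRIPT) . ' sh ' . escapeshellarg($listFile);
}

function convert_batch(array $htmlFiles): int
{
    // One path per line; file names containing newlines are not supported.
    $listFile = tempnam(sys_get_temp_dir(), 'lynx');
    file_put_contents($listFile, implode("\n", $htmlFiles) . "\n");
    $out = [];
    $status = 0;
    exec(batch_command($listFile), $out, $status);
    unlink($listFile);
    return $status; // exit status of the shell (of the last command it ran)
}

// Usage: convert_batch(glob('/path/to/dir/*-white.html') ?: []);
```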
So long, Ulf