What is the best html to latex program on the market or the internet ?

on 22.10.2007 23:57:15 by vasan999

Basically, it should do everything that any of the tools below can do and,
in addition:

1/
Human-readable output that preserves the text lines of the source, i.e.
it does not scramble the text lines or needlessly insert or remove
newlines, and it inserts minimal LaTeX markup.

2/
Maintain cross-links, i.e. convert them, but if the set of HTML files is
incomplete, proceed on the assumption that the referenced target exists,
i.e. don't delete the links or try to modify them or their addresses. One
of the tools I tested is too clever in this respect and actually ruins
the result.

3/
Proper conversion of images, tables, etc. No math mode is involved in the
HTML.


4/
Even an Emacs Lisp function, written by a guru, could do the job.

5/
Is there any commercial WYSIWYG tool?
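A minimal sketch of requirements 1 and 2 in Python, assuming that hyperref's \href is an acceptable LaTeX form for a link (the post does not say what the links should become). The helper name convert_links and the ANCHOR pattern are hypothetical. The address is copied verbatim and the target file is never looked up, so links into files missing from the set survive; because only the anchors are rewritten, the source's line breaks stay where they were. It does not escape LaTeX special characters in the link text, and a regular expression is no substitute for a real HTML parser on messy input.

```python
import re

# Matches <a ... href="address" ...>link text</a>, case-insensitively.
# Assumption: addresses are double-quoted; single-quoted or unquoted
# href values are not handled by this sketch.
ANCHOR = re.compile(r'<a\s+[^>]*?href\s*=\s*"([^"]*)"[^>]*>(.*?)</a>',
                    re.IGNORECASE | re.DOTALL)


def convert_links(html: str) -> str:
    """Hypothetical helper: turn HTML anchors into hyperref \\href commands.

    The address is copied unchanged and its target is never checked,
    so links to files missing from the set are kept as they are.
    """
    def to_href(m):
        address, text = m.group(1), m.group(2)
        return '\\href{%s}{%s}' % (address, text)

    # Only the anchors themselves are rewritten; all other text,
    # including every newline, is left in place.
    return ANCHOR.sub(to_href, html)
```

The output assumes the LaTeX document loads the hyperref package; internal links could equally be mapped to \ref or \hyperref if the HTML anchors are turned into \label names.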


LaTeX etc

* html2latex is a program based on the NCSA html parser. Contact:
Nathan.Torkington@vuw.ac.nz.
* Another html2latex can combine several HTML files into a single
LaTeX file, converting links between the files to references. External
URLs can be converted into footnotes or into a bibliography sorted on
URL. Contact: F.J.Faase@cs.utwente.nl (Frans J. Faase)
* Another html2latex, implemented on Linux with yacc+lex+C. Also
available from the TSX-11 Linux FTP site as nc-html2latex-0.97.tar.gz.
Contact: naochan@naochan.com (Naoya Tozuka)
* htmlatex.pl is a perl script to do the conversion (may be moving
soon). Contact: n9146070@cc.wwu.edu (Jake Kesinger)
* There is also a sed script to convert HTML into LaTeX.

Re: What is the best html to latex program on the market or the internet ?

on 23.10.2007 02:05:22 by vasan999

The site says that this will convert HTML to LaTeX. Can anyone explain
this code to me? I am not familiar with such difficult commands,
especially since there are no comments giving a line-by-line explanation
or describing the overall operation.

1i\
\\documentstyle{article}
1i\
\\begin{document}
$a\
\\end{document}
# Too bad there's no way to make sed ignore case!
/<[Xx][Mm][Pp]>/,/<.[Xx][Mm][Pp]>/b lit
/<.[Xx][Mm][Pp]>/b lit
/<[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/,/<.[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/b lit
/<.[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/b lit
/<[Pp][Rr][Ee]>/,/<.[Pp][Rr][Ee]>/b pre
/<.[Pp][Rr][Ee]>/b pre
# Stuff to ignore
s?<[Ii][Ss][Ii][Nn][Dd][Ee][Xx]>??
s???g
s?<[Nn][Ee][Xx][Tt][Ii][Dd][^>]*>??g
# character set translations for LaTeX special chars
s?>.?>?g
s?<.? s?\\?\\backslash ?g
s?{?\\{?g
s?}?\\}?g
s?%?\\%?g
s?\$?\\$?g
s?&?\\&?g
s?#?\\#?g
s?_?\\_?g
s?~?\\~?g
s?\^?\\^?g
# Paragraph borders
s?<[Pp]>?\\par ?g
s???g
# Headings
s?<[Tt][Ii][Tt][Ll][Ee]>\([^<]*\)?\
\section*{\1}?g
s?<[Hh]n>?\\part{?g
s??}?g
s?<[Hh]1>?\\section*{?g
s??}?g
s?<[Hh]2>?\\subsection*{?g
s?<[Hh]3>?\\subsubsection*{?g
s?<[Hh]4>?\\subsubsection*{?g
s?<[Hh]5>?\\paragraph{?g
s?<[Hh]6>?\\subparagraph{?g
# UL is itemize
s?<[Uu][Ll]>?\\begin{itemize}?g
s??\\end{itemize}?g
s?<[Ll][Ii]>?\\item ?g
# DL is description
s?<[Dd][Ll]>?\\begin{description}?g
s??\\end{description}?g
# closing delimiter for DT is first < or end of line whichever comes first NO
#s?<[Dd][Tt]>\([^<]*\) #s?<[Dd][Tt]>\([^<]*\)$?\\item[\1]?g
#s?<[Dd][Dd]>??g
s?<[Dd][Tt]>?\\item[ s?<[Dd][Dd]>?]?g
# Other common SGML markup. this is ad-hoc
s???
s?
??g
# Italics
s?\([^<]*\)?{\\it \1 }?g
# Get rid of Anchors
:pre
s?<[Aa][^>]*>??g
s???g
# This is a subroutine in sed, in case you are not a sed guru
: lit
s?<[Xx][Mm][Pp]>?\\begin{verbatim}?g
s??\\end{verbatim}?
s?<[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>?\\begin{verbatim}?g
s??\\end{verbatim}?
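The "character set translations" section of the script escapes LaTeX's special characters with one substitution per character, which only works if the backslash is handled before the others. Two details are worth checking against the LaTeX documentation: \backslash is a math-mode command, so in running text it stops with a "Missing $ inserted" error, and \~ and \^ are accent commands rather than literal characters. Below is a sketch of a single-pass alternative in Python; the names LATEX_SPECIALS and escape_latex are hypothetical, the table covers only the ten characters the script handles, and it is meant for text content, not for markup that has already been converted.

```python
import re

# Text-mode replacement for each LaTeX special character.
# \textbackslash, \textasciitilde and \textasciicircum are used because
# \backslash only works in math mode and \~, \^ are accent commands.
LATEX_SPECIALS = {
    '\\': r'\textbackslash{}',
    '{': r'\{',
    '}': r'\}',
    '%': r'\%',
    '$': r'\$',
    '&': r'\&',
    '#': r'\#',
    '_': r'\_',
    '~': r'\textasciitilde{}',
    '^': r'\textasciicircum{}',
}

# One character class matching any key; re.escape keeps \ ^ etc. literal.
SPECIALS_RE = re.compile(
    '[' + ''.join(re.escape(c) for c in LATEX_SPECIALS) + ']')


def escape_latex(text: str) -> str:
    # Each character is looked up exactly once, so the braces introduced
    # by \textbackslash{} are never escaped a second time, whatever the
    # order of the table.
    return SPECIALS_RE.sub(lambda m: LATEX_SPECIALS[m.group(0)], text)
```

Doing the lookup in one pass removes the ordering constraint that a chain of sed substitutions has to respect.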


On Oct 22, 2:57 pm, vasan...@hotmail.com wrote:
> Basically, it should do all that any of the tools below and in
> addition,
[snip]

Re: What is the best html to latex program on the market or the internet ?

on 23.10.2007 03:26:10 by vasan999

Maybe I should post in the European TeX groups too.

On Oct 22, 2:57 pm, vasan...@hotmail.com wrote:
> Basically, it should do all that any of the tools below and in
> addition,
[snip]

Re: What is the best html to latex program on the market or the internet ?

on 23.10.2007 10:33:32 by Edd Barrett

On Oct 23, 2:26 am, vasan...@hotmail.com wrote:
> maybe I should post in european tex groups also
[snip]

Hi,

I don't know if this can be of help:
http://openwetware.org/wiki/User:Austin_J._Che/Extensions/LatexDoc

This is something we are looking into, to allow researchers to
distribute documents in both PDF and web-based form (we hope).

Thanks

Edd

Re: What is the best html to latex program on the market or the internet ?

on 23.10.2007 20:13:09 by metaperl

I like PlasTeX.SF.Net

> Basically, it should do all that any of the tools below and in
> addition,

Re: What is the best html to latex program on the market or the internet ?

on 23.10.2007 20:44:13 by gnuist006

On Oct 23, 11:13 am, "metaperl.com" wrote:
> I like PlasTeX.SF.Net
>
> > Basically, it should do all that any of the tools below and in
> > addition,

I think the OP wanted HTML -> LaTeX; plasTeX goes the other way:

http://plastex.sourceforge.net/

SAS is currently using plasTeX to generate HTML and DocBook for
10,000+ pages of scientific documentation nightly.

Re: What is the best html to latex program on the market or the internet?

on 24.10.2007 00:24:50 by Peter Flynn

vasan999@hotmail.com wrote:
> The site says, that this will convert html to latex. Can anyone
> explain me this code? I am not familiar with such difficult commands
> especially there are no comments line by line explanation and overall
> operation.
>
> 1i\
> \\documentstyle{article}
[snip]

This is a sed(1) script. sed is a stream editor, available on most
platforms.

///Peter
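For readers who don't know sed: it reads its input one line at a time and runs every command of the script on each line in turn. In the script above, 1i\ inserts text before line 1, $a\ appends text after the last line, /start/,/end/ selects a range of lines, b label jumps to :label (skipping the commands in between), and s?pattern?replacement?g substitutes every match on the line, using ? as the delimiter instead of the usual /. The Python sketch below imitates that structure with a hypothetical, much-reduced rule set (RULES, PRE_START, PRE_END and sed_like_convert are illustrative names); it is not a translation of the whole script, it omits the special-character escaping, and its range handling only roughly matches sed's. It writes \documentclass, the LaTeX2e form, where the script uses the older LaTeX 2.09 \documentstyle.

```python
import re

# Hypothetical reduced rule set in the spirit of the quoted script:
# (pattern, replacement) pairs applied in order to every line, like s commands.
RULES = [
    (re.compile(r'<p>', re.I), r'\\par '),
    (re.compile(r'<h1>', re.I), r'\\section*{'),
    (re.compile(r'</h1>', re.I), '}'),
    (re.compile(r'<i>(.*?)</i>', re.I), r'{\\it \1}'),
]
PRE_START = re.compile(r'<pre>', re.I)
PRE_END = re.compile(r'</pre>', re.I)


def sed_like_convert(lines):
    # Like "1i\": emit the preamble before the first input line.
    out = [r'\documentclass{article}', r'\begin{document}']
    in_pre = False
    for line in lines:
        # Like "/<pre>/,/<\/pre>/b pre": lines in the range skip the rules.
        if in_pre or PRE_START.search(line):
            in_pre = not PRE_END.search(line)
            out.append(line)
            continue
        # Every rule runs on the line in turn, as sed runs each command.
        for pattern, replacement in RULES:
            line = pattern.sub(replacement, line)
        out.append(line)
    # Like "$a\": emit the closing line after the last input line.
    out.append(r'\end{document}')
    return out
```

Because each input line produces exactly one output line, this style of converter keeps the source's line structure, which is one reason the sed approach appeals despite its fragility.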

Re: What is the best html to latex program on the market or the internet?

on 24.10.2007 00:27:29 by Peter Flynn

vasan999@hotmail.com wrote:
> Basically, it should do all that any of the tools below and in
> addition,

You've already asked this, and been given the answer, but in case you
didn't see it...

XSLT.

Run your HTML through Tidy to produce XHTML.
Then write an XSLT script to transform it to LaTeX.
This gives you 100% control and ensures robustness.

However, handling all the stupid things HTML authors do may make it
long-winded if you want to cope with them all. On the other hand, if
you are dealing with a reasonably consistent subset, it's probably the
most reliable method.

///Peter
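Python's standard library has no XSLT processor, so the sketch below only imitates the XSLT approach: a table of per-element "templates" (the role played by <xsl:template match="h1"> and friends) applied recursively to a parsed tree, with elements that have no template passing their content through, much as XSLT's built-in rules do. It assumes the input has already been through Tidy and is well-formed XHTML. The names TEMPLATES, transform and xhtml_to_latex are hypothetical, the element coverage is tiny, and the escaping handles only a few characters (braces, backslash, ~ and ^ are left alone). A real XSLT stylesheet run with a processor such as xsltproc would be the direct equivalent.

```python
import xml.etree.ElementTree as ET


def escape(text):
    # Partial escaping only: braces, backslash, ~ and ^ are not handled.
    for char in '&%$#_':
        text = text.replace(char, '\\' + char)
    return text


# One "template" per element name, standing in for <xsl:template match="...">.
# Each receives the already-converted content and the element itself.
TEMPLATES = {
    'h1': lambda body, el: '\\section*{%s}' % body,
    'h2': lambda body, el: '\\subsection*{%s}' % body,
    'p': lambda body, el: '\\par %s' % body,
    'em': lambda body, el: '\\emph{%s}' % body,
    'i': lambda body, el: '\\emph{%s}' % body,
    'ul': lambda body, el: '\\begin{itemize}%s\\end{itemize}' % body,
    'li': lambda body, el: '\\item %s' % body,
    'a': lambda body, el: '\\href{%s}{%s}' % (el.get('href', ''), body),
}


def local_name(tag):
    # Tidy's XHTML output puts elements in a namespace: '{uri}p' -> 'p'.
    return tag.rsplit('}', 1)[-1]


def transform(el):
    # Like <xsl:apply-templates/>: text and children in document order.
    parts = [escape(el.text or '')]
    for child in el:
        parts.append(transform(child))
        parts.append(escape(child.tail or ''))
    body = ''.join(parts)
    template = TEMPLATES.get(local_name(el.tag))
    # No template: pass the content through, like XSLT's built-in rules.
    return template(body, el) if template else body


def xhtml_to_latex(xhtml):
    return transform(ET.fromstring(xhtml))
```

Text nodes, including their newlines, are copied in document order, so the line structure of the XHTML largely survives into the LaTeX.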

Re: What is the best html to latex program on the market or the internet ?

on 24.10.2007 05:42:46 by gnuist006

On Oct 23, 3:27 pm, Peter Flynn wrote:
> XSLT.
>
> Run your HTML through Tidy to produce XHTML.
> Then write an XSLT script to transform it to LaTeX.
[snip]

forgot to cc to myself.
Janusz

Re: What is the best html to latex program on the market or the internet ?

on 24.10.2007 09:08:53 by Victor Ivrii

On Oct 23, 6:27 pm, Peter Flynn wrote:
[snip]
> However, handling all the stupid things HTML authors do may make it
> long-winded if you want to cope with them all. On the other hand, if
> you are dealing with a reasonably consistent subset, it's probably the
> most reliable method.

One should remember that while a TeX parser (tex/latex/...) can run in
quiet mode, that is not the default, and a finished TeX document normally
does not contain any TeX errors. Meanwhile, few HTML parsers (web
browsers) even report errors. As a result, the vast majority of HTML
sources contain errors, from a few to a few hundred (the latter is usually
the case with commercial web pages produced by community college
graduates who check their pages only against a specific version of MSIE).
Converting such HTML sources into error-free TeX seems really daunting.
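One cheap defence against the error-ridden HTML described above is to refuse input that is not well-formed XML, and report where it breaks, before attempting any conversion; that is the job Tidy is meant to do in the XSLT route. A minimal sketch in Python using the standard library's ElementTree parser; the function name well_formedness_error is hypothetical, and well-formedness says nothing about whether the markup is valid HTML or will convert sensibly.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if markup parses as XML, else 'line L, column C: message'."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) reported by the parser.
        line, column = err.position
        return 'line %d, column %d: %s' % (line, column, err)
    return None
```

Typical tag-soup such as an unclosed <br> or overlapping <b>...</p> is rejected here, which is exactly what has to be repaired (by Tidy or by hand) before any XML-based converter can be trusted.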

Re: What is the best html to latex program on the market or the internet ?

on 24.10.2007 17:21:29 by tsy

On Oct 24, 5:27 am, Peter Flynn wrote:
> vasan...@hotmail.com wrote:
> Run your HTML through Tidy to produce XHTML.
> Then write an XSLT script to transform it to LaTeX.
> This gives you 100% control and ensures robustness.
Is XSLT way easier than using a decent scripting language with a SAX
library?
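For comparison with the tree-based XSLT route, a minimal sketch of the SAX alternative in Python, using the standard library's xml.sax. The handler emits LaTeX as start-tag, text and end-tag events arrive, so it never needs the whole document in memory, but any context (list nesting, being inside a table) has to be tracked by hand in the handler. The class and function names and the three-element mapping are hypothetical; like the other approaches it assumes well-formed XHTML input.

```python
import xml.sax


class LatexHandler(xml.sax.ContentHandler):
    """Hypothetical streaming converter: emits LaTeX as SAX events arrive."""

    # LaTeX emitted at the start and end of each known element;
    # unknown elements contribute nothing but their text.
    START = {'h1': '\\section*{', 'em': '\\emph{', 'p': '\\par '}
    END = {'h1': '}', 'em': '}', 'p': ''}

    def __init__(self):
        super().__init__()
        self.out = []

    def startElement(self, name, attrs):
        self.out.append(self.START.get(name.lower(), ''))

    def endElement(self, name):
        self.out.append(self.END.get(name.lower(), ''))

    def characters(self, content):
        # Text may arrive in several chunks; escaping each chunk is safe
        # because every rule works on a single character.
        for char in '&%$#_':
            content = content.replace(char, '\\' + char)
        self.out.append(content)


def sax_to_latex(xhtml):
    handler = LatexHandler()
    xml.sax.parseString(xhtml.encode('utf-8'), handler)
    return ''.join(handler.out)
```

For flat element-to-command mappings like this the two styles are about equally short; the difference shows once a rule depends on where an element sits, which XSLT's match patterns express directly and a SAX handler must track itself.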

Re: What is the best html to latex program on the market or the internet ?

on 27.10.2007 01:06:01 by Peter Flynn

On Wed, 24 Oct 2007 08:21:29 -0700, tsy wrote:

> On Oct 24, 5:27 am, Peter Flynn wrote:
>> vasan...@hotmail.com wrote:
>> Run your HTML through Tidy to produce XHTML. Then write an XSLT script
>> to transform it to LaTeX. This gives you 100% control and ensures
>> robustness.
> Is XSLT way easier than using a decent scripting language with a SAX
> library?

Yes. XSLT *is* a decent scripting (well, transformation-to-other-formats)
language.

///Peter