PHP Read PDF
on 19.09.2007 17:14:38 by Shelly
I had to do my first investigation regarding PDF files. Surprisingly, I
found that the only functions in PHP were for creating PDF files.
The potential customer receives order forms from the corporate headquarters
and they are PDF forms. What we want to do is to extract information from
these forms and process the data into a database. To do this we need to
read certain set fields. Nowhere did I find a function to be able to read
PDF files, let alone extract information from them.
My thought, in the absence of such a function, is to find a way to open
the file, strip the formatting, and then work on the text stream. The key
unknown for me is how to strip the formatting.
So, do I hear any suggestions for either of these?
(1) How to read predetermined field entries from a PDF file or
(2) How to convert a PDF into an unformatted text stream
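
For option (2), once some extractor has reduced the form to plain text, picking out the fixed fields could look roughly like the sketch below. It assumes each field sits on its own line as "Label: value"; the function name extract_fields and the labels "Order No", "Customer" and "Total" are made up for illustration and are not taken from the actual order forms.

```php
<?php
/**
 * Pull "Label: value" pairs out of plain text extracted from a PDF.
 * Assumption: every wanted field is on its own line as "Label: value".
 */
function extract_fields(string $text, array $labels): array
{
    $fields = [];
    foreach ($labels as $label) {
        // m = match per line, i = ignore case; preg_quote() escapes the label
        $pattern = '/^\s*' . preg_quote($label, '/') . '\s*:\s*(.+?)\s*$/mi';
        if (preg_match($pattern, $text, $m)) {
            $fields[$label] = $m[1];
        }
    }
    return $fields;
}

// Hypothetical extracted text standing in for a real order form.
$text = "ORDER FORM\nOrder No: 10023\nCustomer:  Acme Corp \nTotal: 149.50\n";
$fields = extract_fields($text, ['Order No', 'Customer', 'Total']);
print_r($fields);
```

The resulting array can then be written to the database with ordinary prepared statements. Labels that are not found are simply left out of the result, so the caller can tell which fields were missing.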
Shelly
Re: PHP Read PDF
on 19.09.2007 20:48:33 by Shelly
Any suggestions?
Re: PHP Read PDF
on 19.09.2007 21:46:36 by Good Man
yikes, found this expensive option via the folks at pdflib:
http://www.pdflib.com/products/tet/
.... also found a link that suggests PDF files are just gzipped XML, so
maybe you could write your own extractor:
http://www.thescripts.com/forum/thread631837.html
Re: PHP Read PDF
on 19.09.2007 22:12:48 by Shelly
"Good Man" wrote in message
news:Xns99B0A095D3484sonicyouth@216.196.97.131...
> yikes, found this expensive option via the folks at pdflib:
>
> http://www.pdflib.com/products/tet/
yikes is an understatement
>
>
> ... also found a link that suggests PDF files are just gzipped XML, so
> maybe you could write your own extractor:
>
> http://www.thescripts.com/forum/thread631837.html
hmm.
Re: PHP Read PDF
on 19.09.2007 22:41:58 by Shelly
> "Good Man" wrote in message
>> ... also found a link that suggests PDF files are just gzipped XML, so
>> maybe you could write your own extractor:
>>
>> http://www.thescripts.com/forum/thread631837.html
>
> hmm.
I tried a very simple test with a very small PDF file. The code is:
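<?php
$pdfFile = "./images/Postcard.pdf";
echo $pdfFile . "<br>";
// gzopen() passes non-gzip input through unchanged, so this reads the raw PDF bytes
$fp = gzopen($pdfFile, "r");
$rawStream = gzread($fp, 5000000);
gzclose($fp);
echo "**" . $rawStream . "**<br>";
// this is the call that reports the "data error"
$stream = gzuncompress($rawStream);
echo $stream;
?>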
It came up with a "data error" on this line:
$stream = gzuncompress($rawStream);
so the error is in gzuncompress.
Re: PHP Read PDF
on 20.09.2007 12:40:24 by gosha bine
Some parts of a PDF are compressed using the zip (deflate) algorithm, but
the PDF itself is not a ZIP or gzip file. You cannot read the whole file
with the gz functions.
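
To make that distinction concrete, here is a minimal sketch that applies gzuncompress() only to the individual stream sections instead of the whole file. The function name inflate_pdf_streams is invented for this example, the "PDF" is a tiny hand-built stand-in rather than a valid document, and the splitting on the stream/endstream keywords is a simplification: a real reader should honour each stream's /Length entry and its /Filter.

```php
<?php
/**
 * Inflate every zlib-compressed stream found in a PDF byte string.
 * Simplification: splits on the "stream"/"endstream" keywords and skips
 * anything that does not inflate (images, other filters, plain streams).
 */
function inflate_pdf_streams(string $pdf): array
{
    $inflated = [];
    preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $pdf, $matches);
    foreach ($matches[1] as $raw) {
        $data = @gzuncompress($raw);   // false when $raw is not zlib data
        if ($data !== false) {
            $inflated[] = $data;
        }
    }
    return $inflated;
}

// Hand-built stand-in for a PDF with one Flate-compressed stream.
$content = 'BT (Hello PDF) Tj ET';
$packed  = gzcompress($content);
$pdf = "%PDF-1.4\n1 0 obj\n<< /Length " . strlen($packed)
     . " /Filter /FlateDecode >>\nstream\n" . $packed
     . "\nendstream\nendobj\n";

var_dump(@gzuncompress($pdf));      // whole file: not zlib data
print_r(inflate_pdf_streams($pdf)); // only the stream section inflates
```

What comes out is still PDF page-description code (text operators such as Tj), not clean text, so a further parsing step is needed before the data is usable.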
--
gosha bine
makrell ~ http://www.tagarga.com/blok/makrell
php done right ;) http://code.google.com/p/pihipi
Re: PHP Read PDF
on 21.09.2007 02:54:51 by Jerry Stuckle
Good Man wrote:
> ... also found a link that suggests PDF files are just gzipped XML, so
> maybe you could write your own extractor:
>
> http://www.thescripts.com/forum/thread631837.html
No, that is incorrect.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
Re: PHP Read PDF
on 21.09.2007 13:11:11 by dshesnicky
> I had to do my first investigation regarding PDF files. Surprisingly, I
> found that the only functions in PHP were for creating PDF files.
How about pdf2text? Google it if you're interested.
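
One way to call such a converter from PHP is sketched below. It assumes the command-line tool pdftotext (shipped with Xpdf and Poppler) is installed and on the PATH, which may or may not be the exact tool meant above; the function names pdftotext_command and pdf_to_text are made up for this example.

```php
<?php
/**
 * Build the shell command that asks pdftotext to write the text of
 * $pdfPath to standard output ("-"), keeping the page layout.
 */
function pdftotext_command(string $pdfPath): string
{
    // escapeshellarg() protects against spaces and shell metacharacters
    return 'pdftotext -layout ' . escapeshellarg($pdfPath) . ' -';
}

/**
 * Run pdftotext and return the extracted text, or null on failure.
 */
function pdf_to_text(string $pdfPath): ?string
{
    exec(pdftotext_command($pdfPath), $lines, $status);
    return $status === 0 ? implode("\n", $lines) : null;
}

// Usage (needs pdftotext installed; the path is the one used in this thread):
// $text = pdf_to_text('./images/Postcard.pdf');
```

The -layout option tries to keep columns aligned as they appear on the page, which tends to help when values have to be matched to the labels next to them.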
Don
Re: PHP Read PDF
on 15.10.2007 15:08:28 by atulkapoor
Well, you do have an option for reading PDF files from PHP.
First you will need to install PDFlib, written by Thomas Merz.
Then, using the statements below:
$pdf = PDF_new();
PDF_open_file($pdf);
you can read the contents of the PDF file
"Where $var represents the variable to store the PDF object reference
(to be used in the next function in place of ) and
[filename] represents an optional parameter specifying a already
existing PDF file to open. If no filename is specified, then a new PDF
document is created."
for more reference please visit:
http://www.zend.com/zend/spotlight/creatingpdfmay1.php