Re: Spreadsheet_Excel_Reader problem

Re: Spreadsheet_Excel_Reader problem

am 16.03.2010 13:15:22 von Ashley Sheridan

--=-NBYS9+a+uiG3u3w9O9RH
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Tue, 2010-03-16 at 20:16 +0800, I am on the top of the world!
Borlange University wrote:

> i have a problem of reading values from excel file via
> spreadsheet_excel_reader which is a php class used to manipulate excel
> files.
>
> $data = new Spreadsheet_Excel_Reader();
> $data->setOutputEncoding('CP936');
> $data->read("d:\\tmp.xls");
> $rows=$data->sheets[0]['numRows'];
> $cell = $data->sheets[0]['cells'][1][1];
>
> if i type many rows,say 1000, in the tmp.xls, it can read, $rows shows 1000,
> however, 1000 rows of data are copied from another excel file and these
> cells may have different background or other changes that differ from normal
> cells, it fails to read.the variable $rows shows nothing....what i can do
> now is to split these data into small groups, small enough to be read, it
> really takes plenty of time.
>
>
> i dont know why, has somebody met this problem ever?


If it's just the data you need, try using a csv file instead. The Excel
format is closed, and as such, the PHP classes won't have full support
for all of it's features. It seems like formatting is causing the data
to be written to the spreadsheet differently maybe.

Thanks,
Ash
http://www.ashleysheridan.co.uk



--=-NBYS9+a+uiG3u3w9O9RH--

Spreadsheet_Excel_Reader problem

am 16.03.2010 13:16:59 von michael3832628

--0016e6480e327175370481e9f91a
Content-Type: text/plain; charset=ISO-8859-1

i have a problem of reading values from excel file via
spreadsheet_excel_reader which is a php class used to manipulate excel
files.

$data = new Spreadsheet_Excel_Reader();
$data->setOutputEncoding('CP936');
$data->read("d:\\tmp.xls");
$rows=$data->sheets[0]['numRows'];
$cell = $data->sheets[0]['cells'][1][1];

if i type many rows,say 1000, in the tmp.xls, it can read, $rows shows 1000,
however, 1000 rows of data are copied from another excel file and these
cells may have different background or other changes that differ from normal
cells, it fails to read.the variable $rows shows nothing....what i can do
now is to split these data into small groups, small enough to be read, it
really takes plenty of time.


i dont know why, has somebody met this problem ever?

--0016e6480e327175370481e9f91a--

Re: Spreadsheet_Excel_Reader problem

am 16.03.2010 14:19:43 von jochen schultz

Or maybe you want to check out this:

http://www.codeplex.com/PHPExcel

I havn't checked it out for your special purpose but i found it useful
for some other jobs (especially excell 2007 support).

regards
Jochen

Ashley Sheridan schrieb:
> On Tue, 2010-03-16 at 20:16 +0800, I am on the top of the world!
> Borlange University wrote:
>
>> i have a problem of reading values from excel file via
>> spreadsheet_excel_reader which is a php class used to manipulate excel
>> files.
>>
>> $data = new Spreadsheet_Excel_Reader();
>> $data->setOutputEncoding('CP936');
>> $data->read("d:\\tmp.xls");
>> $rows=$data->sheets[0]['numRows'];
>> $cell = $data->sheets[0]['cells'][1][1];
>>
>> if i type many rows,say 1000, in the tmp.xls, it can read, $rows shows 1000,
>> however, 1000 rows of data are copied from another excel file and these
>> cells may have different background or other changes that differ from normal
>> cells, it fails to read.the variable $rows shows nothing....what i can do
>> now is to split these data into small groups, small enough to be read, it
>> really takes plenty of time.
>>
>>
>> i dont know why, has somebody met this problem ever?
>
>
> If it's just the data you need, try using a csv file instead. The Excel
> format is closed, and as such, the PHP classes won't have full support
> for all of it's features. It seems like formatting is causing the data
> to be written to the spreadsheet differently maybe.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>
>

--
Sport Import GmbH - Amtsgericht Oldenburg - Tel: +49-4405-9280-63
Industriestrasse 39 - HRB 1202900 -
26188 Edewecht - GF: Michael Müllmann

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 12:35:33 von Ashley Sheridan

--=-eSGpgkPMWpbapRrKrY7C
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Thu, 2010-03-18 at 19:21 +0800, I am on the top of the world!
Borlange University wrote:
> sounds good, i havnt checked out it with cvs format.
>
> thanks
>
>
> On Tue, Mar 16, 2010 at 8:15 PM, Ashley Sheridan
> wrote:
>
>
> On Tue, 2010-03-16 at 20:16 +0800, I am on the top of the
> world! Borlange University wrote:
>
> > i have a problem of reading values from excel file via
> > spreadsheet_excel_reader which is a php class used to manipulate excel
> > files.
> >
> > $data = new Spreadsheet_Excel_Reader();
> > $data->setOutputEncoding('CP936');
> > $data->read("d:\\tmp.xls");
> > $rows=$data->sheets[0]['numRows'];
> > $cell = $data->sheets[0]['cells'][1][1];
> >
> > if i type many rows,say 1000, in the tmp.xls, it can read, $rows shows 1000,
> > however, 1000 rows of data are copied from another excel file and these
> > cells may have different background or other changes that differ from normal
> > cells, it fails to read.the variable $rows shows nothing....what i can do
> > now is to split these data into small groups, small enough to be read, it
> > really takes plenty of time.
> >
> >
> > i dont know why, has somebody met this problem ever?
>
>
>
>
> If it's just the data you need, try using a csv file instead.
> The Excel format is closed, and as such, the PHP classes won't
> have full support for all of it's features. It seems like
> formatting is causing the data to be written to the
> spreadsheet differently maybe.
>
>
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>
>
>
>


The .csv format is just a plain text format, so you won't get formatting
or formulas in your 'sheets' (csv is also a sheetless format) but it's
been used for years by many systems for data.

A lot of database systems will let you import csv files as well, which
is quite convenient, although you will have to make sure the csv uses
commas to delimit the data. Microsoft managed to basterdise this format
a bit as well, and lets you use tabs, spaces and all sorts of other
characters to delimit data fields. Someone obviously didn't mention to
them that the file type is 'comma separated values'!

Thanks,
Ash
http://www.ashleysheridan.co.uk



--=-eSGpgkPMWpbapRrKrY7C--

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 12:55:24 von Ashley Sheridan

--=-lkbX973rMvl4B8GKPjTg
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Thu, 2010-03-18 at 12:56 +0100, Jochen Schultz wrote:

> > Microsoft managed to basterdise this format
> > a bit as well, and lets you use tabs, spaces and all sorts of other
> > characters to delimit data fields. Someone obviously didn't mention to
> > them that the file type is 'comma separated values'!
>
> Or maybe it is because someone told them, that there is data out there,
> that might contain commas?
>
> regards
> Jochen


The csv format allows for quote marks which can serve to escape commas:

"data, with commas"

And for escaping quote marks, you just use triple quote marks:

"some text """with quotes""" "

Thanks,
Ash
http://www.ashleysheridan.co.uk



--=-lkbX973rMvl4B8GKPjTg--

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 12:56:02 von jochen schultz

> Microsoft managed to basterdise this format
> a bit as well, and lets you use tabs, spaces and all sorts of other
> characters to delimit data fields. Someone obviously didn't mention to
> them that the file type is 'comma separated values'!

Or maybe it is because someone told them, that there is data out there,
that might contain commas?

regards
Jochen

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 15:05:48 von Paul M Foster

On Thu, Mar 18, 2010 at 11:35:33AM +0000, Ashley Sheridan wrote:



>
> The .csv format is just a plain text format, so you won't get formatting
> or formulas in your 'sheets' (csv is also a sheetless format) but it's
> been used for years by many systems for data.
>
> A lot of database systems will let you import csv files as well, which
> is quite convenient, although you will have to make sure the csv uses
> commas to delimit the data. Microsoft managed to basterdise this format
> a bit as well, and lets you use tabs, spaces and all sorts of other
> characters to delimit data fields. Someone obviously didn't mention to
> them that the file type is 'comma separated values'!

I process a lot of CSV files, and what I typically see is that Excel
will enclose fields which might contain commas in quotes. This gets
messy. So I finally wrote a C utility which parses the file and yields
tab-delimited records without the quotes.

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 16:08:16 von jochen schultz

AFAIK, there is no real standard out there for CSV file definition and
since Microsoft and many others (me too btw) use other chars for field
separation in so called CSV files, i think it is a good way to deal with
it and let it be how it is (actually i have to look into every CSV file
that is going to be exchanged and that has some advantages as well).

Btw. who told you to use tripple quotes?

http://www.rfc-editor.org/rfc/rfc4180.txt

7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:

"aaa","b""bb","ccc"

regards
Jochen

Ashley Sheridan schrieb:
> On Thu, 2010-03-18 at 12:56 +0100, Jochen Schultz wrote:
>> > Microsoft managed to basterdise this format
>> > a bit as well, and lets you use tabs, spaces and all sorts of other
>> > characters to delimit data fields. Someone obviously didn't mention to
>> > them that the file type is 'comma separated values'!
>>
>> Or maybe it is because someone told them, that there is data out there,
>> that might contain commas?
>>
>> regards
>> Jochen
>
> The csv format allows for quote marks which can serve to escape commas:
>
> "data, with commas"
>
> And for escaping quote marks, you just use triple quote marks:
>
> "some text """with quotes""" "
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 16:40:26 von TedD

At 11:35 AM +0000 3/18/10, Ashley Sheridan wrote:
>The .csv format is just a plain text format, so you won't get formatting
>or formulas in your 'sheets' (csv is also a sheetless format) but it's
>been used for years by many systems for data.
>
>A lot of database systems will let you import csv files as well, which
>is quite convenient, although you will have to make sure the csv uses
>commas to delimit the data. Microsoft managed to basterdise this format
>a bit as well, and lets you use tabs, spaces and all sorts of other
>characters to delimit data fields. Someone obviously didn't mention to
>them that the file type is 'comma separated values'!
>
>Thanks,
>Ash

Ash:

I remember many years ago I had a problem with M$ version of plain
text csv files -- the first cell of each spreadsheet was different
than all other cells. It took me a while to discover that difference
and build in an exception.

I've found that M$ always has a better idea, even if they are the
only ones who think so.

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 16:57:00 von Tommy Pham

On Thu, Mar 18, 2010 at 8:40 AM, tedd wrote:
> At 11:35 AM +0000 3/18/10, Ashley Sheridan wrote:
>>
>> The .csv format is just a plain text format, so you won't get formatting
>> or formulas in your 'sheets' (csv is also a sheetless format) but it's
>> been used for years by many systems for data.
>>
>> A lot of database systems will let you import csv files as well, which
>> is quite convenient, although you will have to make sure the csv uses
>> commas to delimit the data. Microsoft managed to basterdise this format
>> a bit as well, and lets you use tabs, spaces and all sorts of other
>> characters to delimit data fields. Someone obviously didn't mention to
>> them that the file type is 'comma separated values'!
>>
>> Thanks,
>> Ash
>
> Ash:
>
> I remember many years ago I had a problem with M$ version of plain text csv
> files -- the first cell of each spreadsheet was different than all other
> cells. It took me a while to discover that difference and build in an
> exception.
>
> I've found that M$ always has a better idea, even if they are the only ones
> who think so.
>
> Cheers,
>
> tedd
>
> --

Personally, I find working with fixed widths is best. The text file
might be larger but I don't have worry about escaping any type of
characters ;)

Regards,
Tommy

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 17:12:42 von Paul M Foster

On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:



>
> Personally, I find working with fixed widths is best. The text file
> might be larger but I don't have worry about escaping any type of
> characters ;)

I find this impossible, since I never know the largest width of all the
fields in a file. And a simple explode() call allows pulling all the
fields into an array, based on a common delimiter.

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 17:15:33 von Ashley Sheridan

--=-BTdWiXzMGUCVxFwJ+//5
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:

> On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:
>
>
>
> >
> > Personally, I find working with fixed widths is best. The text file
> > might be larger but I don't have worry about escaping any type of
> > characters ;)
>
> I find this impossible, since I never know the largest width of all the
> fields in a file. And a simple explode() call allows pulling all the
> fields into an array, based on a common delimiter.
>
> Paul
>
> --
> Paul M. Foster
>


Explode won't work in the case of a comma in a field value.

Also, newlines can exist within a field value, so a line in the file
doesn't equate to a row of data

The best way is just to start parsing at the beginning of the file and
break it into fields one by one from there.

The bit I don't like about characters other than a comma being used in a
"comma separated values" file is that you can't automatically tell what
character has been used as the delimiter. Hence being asked by
spreadsheet programs what the delimiter is if a comma doesn't give up
what it recognises as valid fields.

Thanks,
Ash
http://www.ashleysheridan.co.uk



--=-BTdWiXzMGUCVxFwJ+//5--

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 17:16:30 von Mattias Thorslund

Paul M Foster wrote:
> I process a lot of CSV files, and what I typically see is that Excel
> will enclose fields which might contain commas in quotes. This gets
> messy. So I finally wrote a C utility which parses the file and yields
> tab-delimited records without the quotes.
>
> Paul
>

And fgetcsv() didn't work for you?

http://www.php.net/fgetcsv

Cheers,

Mattias

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 17:41:10 von Paul M Foster

On Thu, Mar 18, 2010 at 09:16:30AM -0700, Mattias Thorslund wrote:

> Paul M Foster wrote:
>> I process a lot of CSV files, and what I typically see is that Excel
>> will enclose fields which might contain commas in quotes. This gets
>> messy. So I finally wrote a C utility which parses the file and yields
>> tab-delimited records without the quotes.
>>
>> Paul
>>
>
> And fgetcsv() didn't work for you?
>
> http://www.php.net/fgetcsv

I wrote my utility (and the infrastructure to process these files) long
before I was working with PHP. For what I do with the files, I must pipe
one operation's results to another process/command to get the final
result. This is impossible with web-based PHP. So I shell out from PHP
to do it. Like this:

// convert original file to tab-delimited
cat maillist.csv | cqf | filter.cq3or4 > jones.tab
// filter unwanted fields and reorder fields
mlt3.py nady jones.tab jones.rdb
// build basic DBF file
dbfsak -r mailers.rdb jones.dbf
// append rdb records to DBF file
dbfsak -a jones.rdb jones.dbf

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 17:57:30 von Paul M Foster

On Thu, Mar 18, 2010 at 04:15:33PM +0000, Ashley Sheridan wrote:

> On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:
>
> On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:
>
>
>
> >
> > Personally, I find working with fixed widths is best. The text file
> > might be larger but I don't have worry about escaping any type of
> > characters ;)
>
> I find this impossible, since I never know the largest width of all the
> fields in a file. And a simple explode() call allows pulling all the
> fields into an array, based on a common delimiter.
>
> Paul
>
> --
> Paul M. Foster
>
>
>
> Explode won't work in the case of a comma in a field value.

That's why I convert the files to tab-delimited first. explode() does
work in that case.

>
> Also, newlines can exist within a field value, so a line in the file doesn't
> equate to a row of data

I've never seen this in the files I receive.

>
> The best way is just to start parsing at the beginning of the file and break it
> into fields one by one from there.
>
> The bit I don't like about characters other than a comma being used in a "comma
> separated values" file is that you can't automatically tell what character has
> been used as the delimiter. Hence being asked by spreadsheet programs what the
> delimiter is if a comma doesn't give up what it recognises as valid fields.

I've honestly never seen a "CSV" or "Comma-separated Values" which used
tabs for delimiters. At that point, it's really not a *comma* separated
value file.

My application for all this is accepting mailing lists from customers
which I have to convert into DBFs for a commercial mailing list program.
Because most of my customers can barely find the on/off switch on their
computers, I never know what I'm going to get. So before I string
together the filters to process the file, I have to actually look at and
analyze the file to find out what it is. Could be a fixed-field length
file, a CSV, a tab-delimited file, or anything in between. Once I've
selected the filters, the sequence they will be put together in, and the
fields from the file I want to capture, I hit the button. After it's all
done, I now have to look at the result to ensure that the requested
fields ended up where they were supposed to.

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 18:00:24 von Ashley Sheridan

--=-WqubokJ40UdezkFyoBG1
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Thu, 2010-03-18 at 12:57 -0400, Paul M Foster wrote:

> On Thu, Mar 18, 2010 at 04:15:33PM +0000, Ashley Sheridan wrote:
>
> > On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:
> >
> > On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:
> >
> >
> >
> > >
> > > Personally, I find working with fixed widths is best. The text file
> > > might be larger but I don't have worry about escaping any type of
> > > characters ;)
> >
> > I find this impossible, since I never know the largest width of all the
> > fields in a file. And a simple explode() call allows pulling all the
> > fields into an array, based on a common delimiter.
> >
> > Paul
> >
> > --
> > Paul M. Foster
> >
> >
> >
> > Explode won't work in the case of a comma in a field value.
>
> That's why I convert the files to tab-delimited first. explode() does
> work in that case.
>
> >
> > Also, newlines can exist within a field value, so a line in the file doesn't
> > equate to a row of data
>
> I've never seen this in the files I receive.
>
> >
> > The best way is just to start parsing at the beginning of the file and break it
> > into fields one by one from there.
> >
> > The bit I don't like about characters other than a comma being used in a "comma
> > separated values" file is that you can't automatically tell what character has
> > been used as the delimiter. Hence being asked by spreadsheet programs what the
> > delimiter is if a comma doesn't give up what it recognises as valid fields.
>
> I've honestly never seen a "CSV" or "Comma-separated Values" which used
> tabs for delimiters. At that point, it's really not a *comma* separated
> value file.
>
> My application for all this is accepting mailing lists from customers
> which I have to convert into DBFs for a commercial mailing list program.
> Because most of my customers can barely find the on/off switch on their
> computers, I never know what I'm going to get. So before I string
> together the filters to process the file, I have to actually look at and
> analyze the file to find out what it is. Could be a fixed-field length
> file, a CSV, a tab-delimited file, or anything in between. Once I've
> selected the filters, the sequence they will be put together in, and the
> fields from the file I want to capture, I hit the button. After it's all
> done, I now have to look at the result to ensure that the requested
> fields ended up where they were supposed to.
>
> Paul
>
> --
> Paul M. Foster
>


But surely whatever character is used as the delimiter could be part of
the fields value?

I hadn't even known that newlines would exist in the fields, until it
broke a script of mine!

And I believe that when MS Office saves a CSV out with a character other
than a comma as the delimiter, it still saves it as a .csv by default.

Thanks,
Ash
http://www.ashleysheridan.co.uk



--=-WqubokJ40UdezkFyoBG1--

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 18:26:26 von Andrew Ballard

On Thu, Mar 18, 2010 at 1:00 PM, Ashley Sheridan
wrote:
[snip]
> And I believe that when MS Office saves a CSV out with a character other
> than a comma as the delimiter, it still saves it as a .csv by default.

Nope. If you save as CSV, it is comma-separated with double-quotes as
the text qualifier. There is also an option to save in tab-delimited
format, but the default extension for that is .txt.

The only issue I have with Excel handling text files is with columns
like ZIP code that should be treated as text (they are string
sequences that happen to contain only numeric digits where leading
zeros are significant) but are interpreted as numbers.

Andrew

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 18.03.2010 20:25:31 von Paul M Foster

On Thu, Mar 18, 2010 at 05:00:24PM +0000, Ashley Sheridan wrote:

> On Thu, 2010-03-18 at 12:57 -0400, Paul M Foster wrote:
>
> On Thu, Mar 18, 2010 at 04:15:33PM +0000, Ashley Sheridan wrote:
>
> > On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:
> >
> > On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:
> >



> > Explode won't work in the case of a comma in a field value.
>
> That's why I convert the files to tab-delimited first. explode() does
> work in that case.
>
> >
> > Also, newlines can exist within a field value, so a line in the
> file doesn't
> > equate to a row of data
>
> I've never seen this in the files I receive.
>
> >
> > The best way is just to start parsing at the beginning of the file
> and break it
> > into fields one by one from there.
> >
> > The bit I don't like about characters other than a comma being used
> in a "comma
> > separated values" file is that you can't automatically tell what
> character has
> > been used as the delimiter. Hence being asked by spreadsheet programs
> what the
> > delimiter is if a comma doesn't give up what it recognises as valid
> fields.
>
> I've honestly never seen a "CSV" or "Comma-separated
> Values" which used
> tabs for delimiters. At that point, it's really not a *comma* separated
> value file.
>
> My application for all this is accepting mailing lists from customers
> which I have to convert into DBFs for a commercial mailing list program.
> Because most of my customers can barely find the on/off switch on their
> computers, I never know what I'm going to get. So before I string
> together the filters to process the file, I have to actually look at and
> analyze the file to find out what it is. Could be a fixed-field length
> file, a CSV, a tab-delimited file, or anything in between. Once I've
> selected the filters, the sequence they will be put together in, and the
> fields from the file I want to capture, I hit the button. After it's all
> done, I now have to look at the result to ensure that the requested
> fields ended up where they were supposed to.
>
> Paul
>
> --
> Paul M. Foster
>
>
>
> But surely whatever character is used as the delimiter could be part of the
> fields value?

Well, remember I shove these into tab-delimited files. It does
occasionally happen that someone will slip a tab into a field. When that
happens, I can tell when the final result is off. Then I do a hex dump
of the file (in PHP) to determine if it actually is a tab. If so, I have
a filter I prepend to the line of filters which removes tabs from the
original CSV file. Then proceed as before.

Occasionally someone will send me a file in a "label" format which
contains \x0C characters or somesuch at page boundaries. I actually have
to look at the file and find out what they've inserted. I have filters
for most anything I find like that.

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 19.03.2010 23:01:38 von Jason Pruim

On Mar 18, 2010, at 1:26 PM, Andrew Ballard wrote:

> On Thu, Mar 18, 2010 at 1:00 PM, Ashley Sheridan
> wrote:
> [snip]
>> And I believe that when MS Office saves a CSV out with a character
>> other
>> than a comma as the delimiter, it still saves it as a .csv by
>> default.
>
> Nope. If you save as CSV, it is comma-separated with double-quotes as
> the text qualifier. There is also an option to save in tab-delimited
> format, but the default extension for that is .txt.
>
> The only issue I have with Excel handling text files is with columns
> like ZIP code that should be treated as text (they are string
> sequences that happen to contain only numeric digits where leading
> zeros are significant) but are interpreted as numbers.
>
> Andrew


Hi Andrew,

As a fellow mailing list processor I can feel your pain... One thing I
have found is when you are importing the data, you can select the zip
column and change the format from "general" to "text" and it will
maintain the leading zero's. Or a simple filter applied to it
afterwards will help to.

But if you have a decent CASS software then it should add the zip back
in hehe :)


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 19.03.2010 23:13:54 von Ashley Sheridan

--=-qvo9/CZMH5g8ESpZ+rGI
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Fri, 2010-03-19 at 18:01 -0400, Jason Pruim wrote:

> On Mar 18, 2010, at 1:26 PM, Andrew Ballard wrote:
>
> > On Thu, Mar 18, 2010 at 1:00 PM, Ashley Sheridan
> > wrote:
> > [snip]
> >> And I believe that when MS Office saves a CSV out with a character
> >> other
> >> than a comma as the delimiter, it still saves it as a .csv by
> >> default.
> >
> > Nope. If you save as CSV, it is comma-separated with double-quotes as
> > the text qualifier. There is also an option to save in tab-delimited
> > format, but the default extension for that is .txt.
> >
> > The only issue I have with Excel handling text files is with columns
> > like ZIP code that should be treated as text (they are string
> > sequences that happen to contain only numeric digits where leading
> > zeros are significant) but are interpreted as numbers.
> >
> > Andrew
>
>
> Hi Andrew,
>
> As a fellow mailing list processor I can feel your pain... One thing I
> have found is when you are importing the data, you can select the zip
> column and change the format from "general" to "text" and it will
> maintain the leading zero's. Or a simple filter applied to it
> afterwards will help to.
>
> But if you have a decent CASS software then it should add the zip back
> in hehe :)
>
>


It's not really just that. In the csv format, a field value of 00123 (I
don't really know what zip code formats are) is perfectly valid.
Unfortunately, Excel (and Calc) tries to be clever and strips out
leading zeros on a field it recognises as all numbers. This is annoying
for things like zip codes and phone numbers (which in the UK mostly all
start with a 0)

I think short of enclosing the field in quote marks to signify it's a
string and not something that the software should guess at is the only
way to ensure it works effectively.

Thanks,
Ash
http://www.ashleysheridan.co.uk



--=-qvo9/CZMH5g8ESpZ+rGI--

Re: Spreadsheet_Excel_Reader problem

am 20.03.2010 01:30:21 von Andrew Ballard

On Fri, Mar 19, 2010 at 6:01 PM, Jason Pruim wrote:
>
> On Mar 18, 2010, at 1:26 PM, Andrew Ballard wrote:
>
>> On Thu, Mar 18, 2010 at 1:00 PM, Ashley Sheridan
>> wrote:
>> [snip]
>>>
>>> And I believe that when MS Office saves a CSV out with a character other
>>> than a comma as the delimiter, it still saves it as a .csv by default.
>>
>> Nope. If you save as CSV, it is comma-separated with double-quotes as
>> the text qualifier. There is also an option to save in tab-delimited
>> format, but the default extension for that is .txt.
>>
>> The only issue I have with Excel handling text files is with columns
>> like ZIP code that should be treated as text (they are string
>> sequences that happen to contain only numeric digits where leading
>> zeros are significant) but are interpreted as numbers.
>>
>> Andrew
>
>
> Hi Andrew,
>
> As a fellow mailing list processor I can feel your pain... One thing I have
> found is when you are importing the data, you can select the zip column and
> change the format from "general" to "text" and it will maintain the leading
> zero's. Or a simple filter applied to it afterwards will help to.
>
> But if you have a decent CASS software then it should add the zip back in
> hehe :)
>
>

That works - if I'm the first one to open the file. Often I get files
that someone else opened in Excel to "fix" some things then saved back
to CSV and sent merrily along. :-)

Andrew

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 20.03.2010 01:32:47 von Andrew Ballard

On Fri, Mar 19, 2010 at 6:13 PM, Ashley Sheridan
wrote:
>
> On Fri, 2010-03-19 at 18:01 -0400, Jason Pruim wrote:
>
> On Mar 18, 2010, at 1:26 PM, Andrew Ballard wrote:
>
> > On Thu, Mar 18, 2010 at 1:00 PM, Ashley Sheridan
> > wrote:
> > [snip]
> >> And I believe that when MS Office saves a CSV out with a character
> >> other
> >> than a comma as the delimiter, it still saves it as a .csv by
> >> default.
> >
> > Nope. If you save as CSV, it is comma-separated with double-quotes as
> > the text qualifier. There is also an option to save in tab-delimited
> > format, but the default extension for that is .txt.
> >
> > The only issue I have with Excel handling text files is with columns
> > like ZIP code that should be treated as text (they are string
> > sequences that happen to contain only numeric digits where leading
> > zeros are significant) but are interpreted as numbers.
> >
> > Andrew
>
>
> Hi Andrew,
>
> As a fellow mailing list processor I can feel your pain... One thing I
> have found is when you are importing the data, you can select the zip
> column and change the format from "general" to "text" and it will
> maintain the leading zero's. Or a simple filter applied to it
> afterwards will help to.
>
> But if you have a decent CASS software then it should add the zip back
> in hehe :)
>
>
>
> It's not really just that. In the csv format, a field value of 00123 (I d=
on't really know what zip code formats are) is perfectly valid. Unfortunate=
ly, Excel (and Calc) tries to be clever and strips out leading zeros on a f=
ield it recognises as all numbers. This is annoying for things like zip cod=
es and phone numbers (which in the UK mostly all start with a 0)
>
> I think short of enclosing the field in quote marks to signify it's a str=
ing and not something that the software should guess at is the only way to =
ensure it works effectively.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>

I don't think even that works. I think what Jason suggested (going
through the text import wizard -- which does not always launch if you
just open the CSV file since Excel thinks it knows how to handle it --
and specifying to treat the column as text) is the only way to be
sure.

Andrew

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 20.03.2010 07:52:28 von Paul M Foster

On Fri, Mar 19, 2010 at 10:13:54PM +0000, Ashley Sheridan wrote:



>
> It's not really just that. In the csv format, a field value of 00123 (I don't
> really know what zip code formats are) is perfectly valid.

ZIP codes are simply five digits. The starting digit (0-9) identifies a
broad region of the country. The beginning 3-digit sequence identifies a
major processing center. A full 5-digit ZIP code can encompass a few
blocks or hundreds of square kilometers, depending on the density of
addresses (population). ZIPs ending in 98 or 99 are often reserved for
the Post Office itself.

Fascinating, huh? ;-}

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Spreadsheet_Excel_Reader problem

am 20.03.2010 07:58:18 von Paul M Foster

On Fri, Mar 19, 2010 at 06:01:38PM -0400, Jason Pruim wrote:



>
> But if you have a decent CASS software then it should add the zip back
> in hehe :)

For the sake of those in Europe and elsewhere, CASS software is software
certified by the US Postal Service which cleans up addresses to conform
to what the Post Office wants them to look like, including adding
correct ZIP codes where possible. There is an additional kind of
software, called PAVE software, which sorts mailing lists into proper
groupings for simple delivery by the Post Office.

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php