CSV pain and suffering with punctuation and non-ASCII characters

CSV pain and suffering with punctuation and non-ASCII characters

am 15.12.2009 03:50:56 von Jan Steinman

I am trying to import a database of notable quotations. It is really
quite simple, but I'm finding importing via phpMyAdmin is (IMHO)
erroneous.

The database came from FileMaker Pro 7, and there are no options for
changing the output. There are two choices:
1) tabs between fields (with no opportunity to include the tab
character in a field), and
2) CSV, which double-quotes fields and has commas in-between, which
apparently doubles embedded quotes.

I am using UTF-8 bin as the character set, and as the encoding for all
fields.

The problem is that numerous punctuation characters appear to
terminate the reading of a field, whether imported as a TAB file or as
a CSV file.

For example, importing a quote with a "Context" field of:
The Hitchhiker's Guide to the Galaxy

results in a field containing:
The Hitchhiker

whether I use TAB or CSV. Note that this in an ASCII single quote
character, not something exotic. I have also noted this on fields that
contain "curly quotes."

I am using phpMyAdmin's "CSV using LOAD DATA" option, checking
"Replace table data with file", terminating fields with a comma,
enclosing fields with double quotes, and blanking (or leaving the '\',
it doesn't matter) the "Fields escaped by" field.

It seems to me that using CSV, with double-quoted, comma separated
fields should tell the import process, "Hands off until you see
another double quote!" And that using TAB format, nothing between TAB
characters should be interpreted. Why are characters inside the double
quotes or tabs being interpreted?

I have not tried mysqlimport from the command line, assuming (perhaps
incorrectly) that phpMyAdmin was simply passing things along, and not
interpreting them.

Ah, one light in the tunnel: my "Quote" field was TEXT. Changing it to
BLOB preserves non-ASCII characters, but I still see the strange
behaviour noted above with a single quote, which was in a VARCHAR(256)
UTF8 field.

Thanks for whatever help you can offer!

:::: My pants just went on a wild rampage through a Long Island
Bowling Alley! -- Zippy the Pinhead
:::: Jan Steinman ::::


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: CSV pain and suffering with punctuation and non-ASCII characters

am 15.12.2009 05:11:17 von Neil Aggarwal

Jan:

> For example, importing a quote with a "Context" field of:
> The Hitchhiker's Guide to the Galaxy
>
> results in a field containing:
> The Hitchhiker

You are going to have to escape quotes, so your string should
look like this:
The Hitchhiker\'s Guide to the Galaxy

See this page:
http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html

Whenever I run into this situation, I use the Ostermiller
Excel CSV parser:
http://ostermiller.org/utils/ExcelCSV.html
with a custom class I wrote to escape strings using the
MySQL format. I don't know if there is an off the shelf
tool to do what you want. Maybe someone has one.

Neil

--
Neil Aggarwal, (281)846-8957, http://UnmeteredVPS.net
Host your MySQL database on a CentOS virtual server for $25/mo
Unmetered bandwidth = no overage charges, 7 day free trial


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org