dash was converted to a weird character
on 15.03.2010 23:08:23 by Rotsen
I have an SQL file that I dumped (with mysqldump) and then installed on a new
system, and somehow the dashes in the file were changed to some weird character.
When I look at the SQL file from my Windows machine using PuTTY,
I get stuff like "1.01.A â the second" (byte 0xE2).
When I look at the same file from my Linux machine via "ssh -y", I get
stuff like "1.01.A – the second" (byte 0x96).
All I know is that this weird character was originally a dash (-).
How can I search for this character and convert it to a dash?
Thanks,
Nestor :-)
Re: dash was converted to a weird character
on 16.03.2010 10:06:24 by Johan De Meersman
On *nix, look for a utility called convmv.
I've got a hunch that your original file comes from a Windows host, and the
filenames may have been copied from a Word document or something similar.
Microsoft knows best, and thus tends to convert regular dashes into some
weird, slightly elongated version. If you copy that into a filename and then
move that file to a *nix host, you get strange stuff. It's all for your own
good, apparently.
On Mon, Mar 15, 2010 at 11:08 PM, Néstor wrote:
> [[...]]
--
Beer with grenadine
Is like mustard with wine
She who drinks it is a prude
He who drinks it will soon be an ass
Re: dash was converted to a weird character
on 16.03.2010 20:03:09 by Michael Dykman
On Tue, Mar 16, 2010 at 5:06 AM, Johan De Meersman wrote:
> On *nix, look for a utility called convmv.
>
> I've got a hunch that your original file comes from a windows host, and the
> filenames may have been copied from a word document or something similar.
> Microsoft knows best, and thus tends to convert regular dashes into some
> weird, slightly elongated version. If you copy that to a filename, and then
> move that file to a *nix host, you get strange stuff. It's all for your own
> good, apparently.
That is exactly the phenomenon I was referring to, and I run into it
again and again.
Here is a copy of the table explaining the details of those
characters. It should inspire some ideas on how to address these in a
manner appropriate to your environment.
name            glyph  Unicode        HTML     HTML/XML             TeX   Windows char code
figure dash     ‒      U+2012 (8210)  none     &#8210; or &#x2012;  none  none
en dash         –      U+2013 (8211)  &ndash;  &#8211; or &#x2013;  --    Alt+0150
em dash         —      U+2014 (8212)  &mdash;  &#8212; or &#x2014;  ---   Alt+0151
horizontal bar  ―      U+2015 (8213)  none     &#8213; or &#x2015;  none  none
swung dash      ⁓      U+2053 (8275)  none     &#8275; or &#x2053;  none  none
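The byte values in the original question are consistent with this table: in Windows-1252, 0x96 and 0x97 are the en dash and em dash (the Alt+0150 and Alt+0151 codes), while in UTF-8 the en dash is stored as three bytes starting with 0xE2. The Python sketch below uses only the standard library; the names `DASHES`, `show_bytes` and `misread` are illustrative. It prints both encodings and shows what a UTF-8 en dash looks like when it is decoded as Latin-1, which is one plausible explanation for the "â" seen in PuTTY, not a confirmed diagnosis.

```python
# Compare how the dash characters from the table are stored as bytes.
DASHES = {
    "en dash": "\u2013",
    "em dash": "\u2014",
}

def show_bytes(label: str, ch: str) -> None:
    # Windows-1252 stores each of these dashes in a single byte ...
    cp1252 = ch.encode("cp1252")
    # ... while UTF-8 needs three bytes, the first of which is 0xE2.
    utf8 = ch.encode("utf-8")
    print(f"{label}: U+{ord(ch):04X}  cp1252={cp1252.hex(' ')}  utf-8={utf8.hex(' ')}")

for label, ch in DASHES.items():
    show_bytes(label, ch)

# A UTF-8 en dash decoded as Latin-1 becomes "â" followed by two
# invisible C1 control characters (U+0080 and U+0093).
misread = "\u2013".encode("utf-8").decode("latin-1")
print(repr(misread))
```

If a terminal shows "â" where a dash should be, the file is probably UTF-8 and the terminal is decoding it as a single-byte character set; if the raw byte is 0x96, the file itself is probably Windows-1252.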
--
- michael dykman
- mdykman@gmail.com
May the Source be with you.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
Re: dash was converted to a weird character
on 17.03.2010 10:02:03 by Johan De Meersman
On Tue, Mar 16, 2010 at 8:03 PM, Michael Dykman wrote:
> [[...]]
>
> Here is a copy of the table explaining the details of those
> characters. It should inspire some ideas on how to address these in a
> manner appropriate to your environment.
>
I would suggest that the manner appropriate to almost any environment is to
just use plain ASCII for your filenames :-) The "swung dash" you refer to
is called a tilde, btw, and is mostly used in Spanish.
Identifiers (was: Re: dash was converted to a weird character)
on 17.03.2010 10:49:29 by Joerg Bruehe
Hi everybody!
Johan De Meersman wrote:
> On Tue, Mar 16, 2010 at 8:03 PM, Michael Dykman wrote:
>
>> [[...]]
>>
>
> I would suggest that the manner appropriate to most any environment is to
> just use plain ascii for your filenames :-) [[...]]
Let me voice my full support for this position.
It is necessary to support local character sets and collations in the
data, but any use of non-ASCII in the names of files or tables
(sometimes even columns) introduces compatibility and portability
problems which turn into risks of data loss.
So for your own good: Stay with ASCII in any names that may become
visible to the operating system, and anything whose correct input
(on different platforms - think heterogeneous client-server!)
may become essential for the correct operation of your system.
(The same holds for mail - my name contains the German umlauts "ö" and
"ü", but I will always use the ASCII spellings "oe" and "ue" in mail
addresses, subject lines, and signature files, as shown below.)
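Joerg's "oe"/"ue" convention can be applied mechanically to names before they reach the file system or the schema. A minimal Python sketch follows; the helper `ascii_name` and its substitution table (German umlauts and ß only) are hypothetical, chosen for illustration. Any other non-ASCII character is rejected rather than guessed at.

```python
# Hypothetical helper: spell German umlauts the ASCII way ("ö" -> "oe")
# and refuse any other non-ASCII character instead of silently dropping it.
UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def ascii_name(name: str) -> str:
    converted = name.translate(UMLAUTS)
    if not converted.isascii():
        bad = sorted({c for c in converted if not c.isascii()})
        raise ValueError(f"non-ASCII characters left in {name!r}: {bad}")
    return converted

print(ascii_name("Jörg Brühe"))
```

Raising an error for unknown characters is deliberate: a silent fallback would reintroduce exactly the "looks right but isn't" problem discussed in this thread.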
On a related note: blanks!
Do yourself a favor and avoid blanks in table, file, and column names.
There is a reason why programming languages have the concept of
"identifier".
IMNSHO you should never call a column (or table) "customer name" but
rather use "customer_name" or "CustomerName" (where I prefer the former,
because of problems with case significance on some platforms).
Anybody who ever had to deal with blanks in file names breaking a script
will know my reasons.
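One rough way to enforce this before creating tables or columns is sketched below in Python. The helpers `to_identifier` and `is_plain_identifier` are hypothetical: the first turns a label such as "customer name" into the lower-case `customer_name` form Joerg prefers, and the second accepts only ASCII letters, digits and underscores, not starting with a digit. That rule is stricter than MySQL's own (MySQL also accepts `$` and many non-ASCII characters in unquoted identifiers) and is chosen only to match the advice above.

```python
import re

# Only ASCII letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_plain_identifier(name: str) -> bool:
    return _IDENTIFIER.fullmatch(name) is not None

def to_identifier(label: str) -> str:
    # Replace each run of whitespace with one underscore and lower-case the
    # result, sidestepping case-sensitivity differences between platforms.
    ident = re.sub(r"\s+", "_", label.strip()).lower()
    if not is_plain_identifier(ident):
        raise ValueError(f"cannot derive a plain identifier from {label!r}")
    return ident

print(to_identifier("customer name"))        # customer_name
print(is_plain_identifier("customer name"))  # False
```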
Regards,
Jörg
(using the umlaut in the mail body only)
--
Joerg Bruehe, MySQL Build Team, Joerg.Bruehe@Sun.COM
Sun Microsystems GmbH, Komturstrasse 18a, D-12099 Berlin
Geschaeftsfuehrer: Thomas Schroeder
Amtsgericht Muenchen: HRB161028
RE: dash was converted to a weird character
on 17.03.2010 16:41:01 by Jerry Schwartz
>I would suggest that the manner appropriate to most any environment is to
>just use plain ascii for your filenames :-) The "swung dash" you refer to
>is called a tilde, btw, and is mostly used in spanish.
>
[JS] ... and mathematical notation.
I certainly agree with your suggestion about file names; I don't even like to
see spaces (although they are common on Windows platforms).
However, I thought the original question had to do with the CONTENTS, not the
file names. Getting back to what I think was the original question: I work in
a multilingual environment (mostly Western Europe and Asia). As long as I stick
to UTF-8 and web browsers, I'm okay. Unfortunately, I have to import data from
MS Excel worksheets a lot, and then all bets are off. When you get a worksheet
that was populated by copy/paste from MS Word, you'll wind up with all kinds of
characters that "look right" but are actually not what you think they are.
(There are some Cyrillic punctuation marks that look like Latin-1 punctuation
marks, but are not the same.)
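When a character "looks right" but is not, printing its code point and Unicode name is a quick way to find out what it really is. This also covers the "how can I search for this character" part of the original question. The sketch below uses only Python's standard `unicodedata` module; the function name `report_non_ascii` is made up for this example.

```python
import unicodedata

def report_non_ascii(text: str) -> list:
    """Return (position, code point, Unicode name) for every non-ASCII char."""
    found = []
    for pos, ch in enumerate(text):
        if ord(ch) > 0x7F:
            name = unicodedata.name(ch, "<unnamed>")
            found.append((pos, f"U+{ord(ch):04X}", name))
    return found

line = "1.01.A \u2013 the second"
for hit in report_non_ascii(line):
    print(hit)  # (7, 'U+2013', 'EN DASH')
```

Run over each line of a dump file, this pinpoints every stray character, including look-alikes such as a swung dash (U+2053, SWUNG DASH) standing in for an ASCII tilde (U+007E, TILDE).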
I spent a very long time experimenting with this, and summarized my
conclusions in http://lists.mysql.com/mysql/212392.
Here's a little bit of code that I use when cleaning up blocks of text. It
might not do what you want! It works on a "transliteration" scheme that maps
"funky" characters into rough equivalents. We use this throughout our system,
so the contents of the database are consistent in certain areas.
There are other places where we use UTF-8, because the source is UTF-8. It
would have been better to use UTF-8 throughout, rather than this
transliteration scheme, but not only did I inherit a lot of existing data, my
colleagues in Japan also use the "MS Mincho" font, which can't handle these
characters. (If they used "Arial Unicode MS" it would solve a lot of problems,
but I don't run the zoo.)
It's written in Visual Basic, and implemented as an Excel function, but is
easily re-used. (I have implemented the same algorithm in PHP, for our web
site.)
==========
Option Explicit
Const VERSION As String = "2009-12-18 - 11:51"

Public Function FixCP1252(CellToScan As String) As String
    ' This function will transliterate the common high-ANSI (CP1252)
    ' characters that come from pasting MS Office text into an Excel
    ' spreadsheet.
    Dim Temp As String
    Dim I As Integer
    ' Column 0: CP1252 character to find; column 1: its ASCII replacement.
    Dim CharsToReplace(7, 1) As String
    CharsToReplace(0, 0) = Chr(&H96)   ' en dash
    CharsToReplace(0, 1) = "-"
    CharsToReplace(1, 0) = Chr(&H97)   ' em dash
    CharsToReplace(1, 1) = "--"
    CharsToReplace(2, 0) = Chr(&H91)   ' left single quotation mark
    CharsToReplace(2, 1) = "'"
    CharsToReplace(3, 0) = Chr(&H92)   ' right single quotation mark
    CharsToReplace(3, 1) = "'"
    CharsToReplace(4, 0) = Chr(&H85)   ' horizontal ellipsis
    CharsToReplace(4, 1) = "..."
    CharsToReplace(5, 0) = Chr(&H93)   ' left double quotation mark
    CharsToReplace(5, 1) = """"
    CharsToReplace(6, 0) = Chr(&H94)   ' right double quotation mark
    CharsToReplace(6, 1) = """"
    CharsToReplace(7, 0) = Chr(&H95)   ' bullet
    CharsToReplace(7, 1) = "*"
    Temp = CellToScan
    ' UBound(CharsToReplace, 1) is the last row index (7), so include it.
    For I = 0 To UBound(CharsToReplace, 1)
        Temp = Replace(Temp, CharsToReplace(I, 0), CharsToReplace(I, 1))
    Next I
    FixCP1252 = Temp
End Function
==========
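For the original problem (a mysqldump file on a Linux box), the same transliteration idea can be applied outside Excel. The following is a Python sketch, not Jerry's code, and it assumes the dump is UTF-8, so the stray characters arrive as the Unicode code points that correspond to the CP1252 bytes listed in FixCP1252 (0x96 is U+2013, and so on). If the file is really Windows-1252, pass `encoding="cp1252"` instead; note that Python's cp1252 codec raises an error on the five byte values CP1252 leaves undefined. The names `fix_cp1252`, `fix_file` and `fix_dashes.py` are placeholders.

```python
import sys

# Same substitutions as FixCP1252, expressed as Unicode code points
# (the CP1252 byte each one comes from is noted alongside).
TRANSLITERATION = str.maketrans({
    "\u2013": "-",    # 0x96 en dash
    "\u2014": "--",   # 0x97 em dash
    "\u2018": "'",    # 0x91 left single quotation mark
    "\u2019": "'",    # 0x92 right single quotation mark
    "\u2026": "...",  # 0x85 horizontal ellipsis
    "\u201C": '"',    # 0x93 left double quotation mark
    "\u201D": '"',    # 0x94 right double quotation mark
    "\u2022": "*",    # 0x95 bullet
})

def fix_cp1252(text: str) -> str:
    return text.translate(TRANSLITERATION)

def fix_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    # Stream line by line so large dumps need not fit in memory;
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(fix_cp1252(line))

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python fix_dashes.py dump.sql dump_fixed.sql
    fix_file(sys.argv[1], sys.argv[2])
```

Using `str.translate` with a single table does all substitutions in one pass, which avoids the off-by-one risk of looping over a replacement array.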
I also have to take text with all of this weirdness and make web pages out of
it. Just in case it comes in handy, here's the code I use for that:
==========
Private Function FixTroubleChars(ByVal LineIn As String) As String
    ' This little function cleans out any really troublesome characters,
    ' making substitutions as appropriate. We'll have to extend the coding
    ' as necessary.
    Dim Temp As String
    ' Fix some Unicode characters that are too weird to handle normally.
    ' According to the Unicode maps, some are for "private" use (meaning that
    ' they have no standard glyph assignment). Microsoft's "Arial Unicode MS"
    ' font can usually give you a suggestion, since the data probably came
    ' from a Windows source.
    Temp = Replace(LineIn, ChrW(&HDBC0), "•")
    Temp = Replace(Temp, ChrW(&HDC83), "")
    Temp = Replace(Temp, ChrW(&HF0A7), "•")
    FixTroubleChars = Temp
End Function

Function MyHTMLEncode(ByVal InString As String) As String
    Dim OutString As String, CleanString As String
    Dim ThisChar As String * 1
    Dim I As Integer
    Dim CodePoint As Long
    ' First, take care of anything that is truly horrible and cannot be used.
    CleanString = FixTroubleChars(InString)
    ' Encode all "special" characters for use in a web page.
    OutString = ""
    For I = 1 To Len(CleanString)
        ThisChar = Mid(CleanString, I, 1)
        ' Safe ASCII passes through unchanged; "&", "<", ">" and all
        ' non-ASCII characters fall through to the Else branch.
        If ThisChar Like "[- a-zA-Z0-9!""#$%'()*+,./:;=?@]" Then
            OutString = OutString & ThisChar
        Else
            ' AscW returns a signed Integer, so mask it to 0..&HFFFF.
            CodePoint = AscW(ThisChar) And &HFFFF&
            ' Surrogate code units cannot be written as references one at a time.
            If CodePoint >= &HD800& And CodePoint <= &HDFFF& Then
                MsgBox "Untranslatable character in " & vbCrLf & """" & _
                    InString & """" & vbCrLf & vbCrLf _
                    & "Codepoint = " & CodePoint & vbCrLf & vbCrLf _
                    & "Inspect the character and modify function " _
                    & "FixTroubleChars() accordingly", _
                    vbCritical + vbOKOnly, "Bad character"
                Application.Cursor = xlDefault
                End
            End If
            ' Emit a decimal numeric character reference, e.g. &#8211;
            OutString = OutString & "&#" & CodePoint & ";"
        End If
    Next I
    MyHTMLEncode = OutString
End Function
==========
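Outside VBA, this encoding step does not need a hand-written loop. In Python, for example, `html.escape` handles the markup characters and the `xmlcharrefreplace` error handler turns every remaining non-ASCII character into a decimal numeric reference such as `&#8211;`. This is a swapped-in standard-library approach, not a port of MyHTMLEncode: it never rejects a character, and because Python strings hold whole code points, characters outside the BMP become one reference rather than a surrogate pair. The function name `html_encode` is illustrative.

```python
import html

def html_encode(text: str) -> str:
    # Escape &, <, > and both quote characters first ...
    escaped = html.escape(text, quote=True)
    # ... then replace every non-ASCII character with &#NNNN;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(html_encode("1.01.A \u2013 <the second>"))
# 1.01.A &#8211; &lt;the second&gt;
```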
I hope this helps somebody.
Regards,
Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032
860.674.8796 / FAX: 860.674.8341
www.the-infoshop.com