Weird sorting order

Weird sorting order

am 16.07.2010 14:14:09 von Robert Voinea

Hi...

I have the following table:

CREATE TABLE test
(
id SERIAL PRIMARY KEY,
val VARCHAR(32) NOT NULL
);

INSERT INTO test VALUES
(DEFAULT, '##34''),
(DEFAULT, '##32'),
(DEFAULT, '##31'),
(DEFAULT, '2ff'),
(DEFAULT, '##26'),
(DEFAULT, '2##33'),
(DEFAULT, '2##25'),
(DEFAULT, '2##24'),
(DEFAULT, '2##23'),
(DEFAULT, '211'),
(DEFAULT, '210'),
(DEFAULT, '203'),
(DEFAULT, '202'),
(DEFAULT, '201'),
(DEFAULT, '200');

Why is it that when running the query:

SELECT * FROM test ORDER BY val;

I get the following result?
id | val
----+-------
1 | 200
2 | 201
3 | 202
4 | 203
5 | 210
6 | 211
7 | 2##23
8 | 2##24
9 | 2##25
10 | 2##33
11 | ##26
12 | 2ff
13 | ##31
14 | ##32
15 | ##34
(15 rows)

Shouldn't value '2ff' be placed right after '211' but before '2##23'?

Tested on PostgreSQL 8.2, 8.4, Linux SuSE & Kubuntu.

Thank you!...


--
Robert Voinea
Software Developer
Phone: +40 21 408 38 00 / ext. 343
Fax: +40 21 408 38 08
Local time: GMT+2
http://www.topex.ro


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Weird sorting order

am 16.07.2010 17:34:06 von Kevin Grittner

Robert Voinea wrote:

> Why is it that when running the query:
>
> SELECT * FROM test ORDER BY val;
>
> I get the following result?
> id | val
> ----+-------
> 1 | 200
> 2 | 201
> 3 | 202
> 4 | 203
> 5 | 210
> 6 | 211
> 7 | 2##23
> 8 | 2##24
> 9 | 2##25
> 10 | 2##33
> 11 | ##26
> 12 | 2ff
> 13 | ##31
> 14 | ##32
> 15 | ##34
> (15 rows)
>
> Shouldn't value '2ff' be placed right after '211' but before
> '2##23'?

That depends on your collation configuration. What do you get
from?:

show lc_collate;

In many collations, special characters such as '#' are ignored.

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Weird sorting order

am 16.07.2010 17:34:59 von DHSC Webmaster

This is a multi-part message in MIME format.
--------------020401090705090308060301
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

It's ignoring the hash marks. It's like they are invisible characters.
Tested on 8.1.16

-------- Original Message --------
Subject: [ADMIN] Weird sorting order
Date: Fri, 16 Jul 2010 15:14:09 +0300
From: Robert Voinea
Organization: Topex
To: pgsql-admin@postgresql.org

Hi...

I have the following table:

CREATE TABLE test
(
id SERIAL PRIMARY KEY,
val VARCHAR(32) NOT NULL
);

INSERT INTO test VALUES
(DEFAULT, '##34''),
(DEFAULT, '##32'),
(DEFAULT, '##31'),
(DEFAULT, '2ff'),
(DEFAULT, '##26'),
(DEFAULT, '2##33'),
(DEFAULT, '2##25'),
(DEFAULT, '2##24'),
(DEFAULT, '2##23'),
(DEFAULT, '211'),
(DEFAULT, '210'),
(DEFAULT, '203'),
(DEFAULT, '202'),
(DEFAULT, '201'),
(DEFAULT, '200');

Why is it that when running the query:

SELECT * FROM test ORDER BY val;

I get the following result?
id | val
----+-------
1 | 200
2 | 201
3 | 202
4 | 203
5 | 210
6 | 211
7 | 2##23
8 | 2##24
9 | 2##25
10 | 2##33
11 | ##26
12 | 2ff
13 | ##31
14 | ##32
15 | ##34
(15 rows)

Shouldn't value '2ff' be placed right after '211' but before '2##23'?

Tested on PostgreSQL 8.2, 8.4, Linux SuSE & Kubuntu.

Thank you!...


--
Robert Voinea
Software Developer
Phone: +40 21 408 38 00 / ext. 343
Fax: +40 21 408 38 08
Local time: GMT+2
http://www.topex.ro


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


--------------020401090705090308060301
Content-Type: text/plain;
name="SpamAssassinReport.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="SpamAssassinReport.txt"

U3BhbSBkZXRlY3Rpb24gc29mdHdhcmUsIHJ1bm5pbmcgb24gdGhlIHN5c3Rl
bSAibWFpbC5kaHMtY2x1Yi5jb20iLCBoYXMNCmlkZW50aWZpZWQgdGhpcyBp
bmNvbWluZyBlbWFpbCBhcyBwb3NzaWJsZSBzcGFtLiAgVGhlIG9yaWdpbmFs
IG1lc3NhZ2UNCmhhcyBiZWVuIGF0dGFjaGVkIHRvIHRoaXMgc28geW91IGNh
biB2aWV3IGl0IChpZiBpdCBpc24ndCBzcGFtKSBvciBsYWJlbA0Kc2ltaWxh
ciBmdXR1cmUgZW1haWwuICBJZiB5b3UgaGF2ZSBhbnkgcXVlc3Rpb25zLCBz
ZWUNCnBvc3RtYXN0ZXJAZGhzLWNsdWIuY29tIGZvciBkZXRhaWxzLg0KDQpD
b250ZW50IHByZXZpZXc6ICBIaS4uLiBJIGhhdmUgdGhlIGZvbGxvd2luZyB0
YWJsZTogQ1JFQVRFIFRBQkxFIHRlc3QgKCBpZCBTRVJJQUwNCiAgIFBSSU1B
UlkgS0VZLCB2YWwgVkFSQ0hBUigzMikgTk9UIE5VTEwgKTsgWy4uLl0gDQoN
CkNvbnRlbnQgYW5hbHlzaXMgZGV0YWlsczogICAoMi4wIHBvaW50cywgNS4w
IHJlcXVpcmVkKQ0KDQogcHRzIHJ1bGUgbmFtZSAgICAgICAgICAgICAgZGVz
Y3JpcHRpb24NCi0tLS0gLS0tLS0tLS0tLS0tLS0tLS0tLS0tLSAtLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQ0K
IDAuNiBSQ1ZEX0lOX1NPUkJTX1dFQiAgICAgIFJCTDogU09SQlM6IHNlbmRl
ciBpcyBhIGFidXNlYWJsZSB3ZWIgc2VydmVyDQogICAgICAgICAgICAgICAg
ICAgICAgICAgICAgWzg5LjM4LjE3NC4xOTQgbGlzdGVkIGluIGRuc2JsLnNv
cmJzLm5ldF0NCiAwLjAgQkFZRVNfNTAgICAgICAgICAgICAgICBCT0RZOiBC
YXllc2lhbiBzcGFtIHByb2JhYmlsaXR5IGlzIDQwIHRvIDYwJQ0KICAgICAg
ICAgICAgICAgICAgICAgICAgICAgIFtzY29yZTogMC41MDE1XQ0KIDEuNCBB
V0wgICAgICAgICAgICAgICAgICAgIEFXTDogRnJvbTogYWRkcmVzcyBpcyBp
biB0aGUgYXV0byB3aGl0ZS1saXN0DQoNCg0KDQo=

--------------020401090705090308060301
Content-Type: text/plain
Content-Disposition: inline
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable


--=20
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

--------------020401090705090308060301--

Re: Weird sorting order

am 19.07.2010 07:23:13 von Robert Voinea

On Friday 16 July 2010 18:34:06 Kevin Grittner wrote:
> Robert Voinea wrote:
> > Why is it that when running the query:
> >
> > SELECT * FROM test ORDER BY val;
> >
> > I get the following result?
> >
> > id | val
> >
> > ----+-------
> >
> > 1 | 200
> > 2 | 201
> > 3 | 202
> > 4 | 203
> > 5 | 210
> > 6 | 211
> > 7 | 2##23
> > 8 | 2##24
> > 9 | 2##25
> >
> > 10 | 2##33
> > 11 | ##26
> > 12 | 2ff
> > 13 | ##31
> > 14 | ##32
> > 15 | ##34
> >
> > (15 rows)
> >
> > Shouldn't value '2ff' be placed right after '211' but before
> > '2##23'?
>
> That depends on your collation configuration. What do you get
> from?:
>
> show lc_collate;
>
> In many collations, special characters such as '#' are ignored.

=> show lc_collate;
lc_collate
-------------
en_US.UTF-8

--
Robert Voinea
Software Developer
Phone: +40 21 408 38 00 / ext. 343
Fax: +40 21 408 38 08
Local time: GMT+2
http://www.topex.ro


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Weird sorting order

am 19.07.2010 16:11:39 von Kevin Grittner

Robert Voinea wrote:

> => show lc_collate;
> lc_collate
> -------------
> en_US.UTF-8

I'm afraid the order you're seeing is what you're supposed to get
for that collation sequence. In that collation, special characters
(including spaces) are only used as tie-breakers for values which
are tied when the special characters are ignored. There may be a
few consequences of that which you haven't yet found. As one
example:

test=# show lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)

test=# create table t1 (c1 text);
CREATE TABLE
test=# insert into t1 values ('one'),(' one'),('one
'),('##one'),('one##');
INSERT 0 5

test=# select '"' || c1 || '"' from t1 order by c1;
?column?
----------
"one"
" one"
"##one"
"one "
"one##"
(5 rows)

test=# select '"' || c1 || '"' from t1 order by c1 desc;
?column?
----------
"one##"
"one "
"##one"
" one"
"one"
(5 rows)

For that reason, we have (so far) used the C locale, which provides
the binary sort you probably expected, and we use special columns,
maintained by triggers, to control selection and sequencing as
needed -- for example we have a "searchName" column in any table
where we have name columns, which is forced into a canonical format.

FWIW, our algorithm for generating a canonical name also excludes
spaces and the '#' character, although it still has significant
differences from the en_US.UTF-8 collation.

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Weird sorting order

am 19.07.2010 16:28:47 von Robert.Burgholzer

This is really interesting. Is there a way to set the locale on the fly
for a sort? Having looked at the docs, it appears as if the initDB step
is the only opportunity to do so.

Regards,
r.b.


--=20
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Weird sorting order

am 19.07.2010 18:15:32 von Kevin Grittner

Burgholzer, Robert (DEQ)"
wrote:

> This is really interesting. Is there a way to set the locale on
> the fly for a sort? Having looked at the docs, it appears as if
> the initDB step is the only opportunity to do so.

Starting with 8.4 you can specify it at the database level:

http://www.postgresql.org/docs/8.4/interactive/sql-createdat abase.html

Peter Eisentraut is working on a way to set the collation for an
individual column or (I think) for a sort; but that won't be in 9.0.
Hopefully in 9.1....

http://archives.postgresql.org/message-id/1279045531.32647.1 4.camel@vanquo.pezone.net

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Weird sorting order

am 21.07.2010 07:56:14 von Robert Voinea

On Monday 19 July 2010 19:15:32 Kevin Grittner wrote:
> Burgholzer, Robert (DEQ)"
>
> wrote:
> > This is really interesting. Is there a way to set the locale on
> > the fly for a sort? Having looked at the docs, it appears as if
> > the initDB step is the only opportunity to do so.
>
> Starting with 8.4 you can specify it at the database level:
>
> http://www.postgresql.org/docs/8.4/interactive/sql-createdat abase.html
>
> Peter Eisentraut is working on a way to set the collation for an
> individual column or (I think) for a sort; but that won't be in 9.0.
> Hopefully in 9.1....
>
> http://archives.postgresql.org/message-id/1279045531.32647.1 4.camel@vanquo.
> pezone.net
>
> -Kevin


Thank you for clarifying this.

--
Robert Voinea
Software Developer
Phone: +40 21 408 38 00 / ext. 343
Fax: +40 21 408 38 08
Local time: GMT+2
http://www.topex.ro


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin