Sorting lists in .Net - why it sucks
Sorting lists in .Net - why it sucks
am 25.01.2008 14:38:18 von nightwatch77
Today I stumbled upon a very interesting 'feature' of .net 2.0
My application uses a generic SortedList class for storing a sorted
list of objects (on a DateTime key) - that is, SortedList
object>. However, the program kept crashing for some time until I
debugged it to find out that it crashes on inserting data to the
sorted list. Problem was that I was inserting two items with the same
date (the same key).
Hey, how do you think, should sorted list just crash when there is a
duplicate value? Normal lists are expected to store duplicate values,
but some great inventor at Microsoft has made it better. They have
documented this behavior, but why they assume sorting shouldn't work
for duplicate entries in the list? Normal sorting algorithms are just
fine with that.
Of course, I had to use non-generic ArrayList instead with custom
IComparer for sorting.
And there goes another surprise - ArrayList.Sort implements an
unstable sort algorithm. That is, for equal objects, it does not
guarantee that after sorting they will keep the original order. Of
course this is not documented at all. And of course causes problems -
in my case it would make sequential messages with timestamp come in
wrong order.
I don't know what sort algorithm did Microsoft use, but simple
QuickSort implementation is a stable algorithm and has no problem with
sorting duplicated values. When Microsoft recruits any developer, they
ask him to implement some list sorting algorithm during job interview,
so I thought all developers in Microsoft are capable of implementing
it correctly.
Best regards,
RG
Re: Sorting lists in .Net - why it sucks
am 25.01.2008 15:41:55 von Alexander Gilman Carver
For info, Wintellect (http://www.wintellect.com/PowerCollections.aspx) has
written a set of generic collection classes that support what you are
looking for (OrderedBag).
From Jeffrey Richter Book:
---------------------------
At Microsoft's request, Wintellect has produced the "Power Collections
Library" to bring some of the C++ Standard Template Library's collection
classes to the CLR programmer.
- José
"nightwatch77" wrote in message
news:8c3f6f35-5f68-4cdf-8fd5-8aa2b462c433@v46g2000hsv.google groups.com...
> Today I stumbled upon a very interesting 'feature' of .net 2.0
> My application uses a generic SortedList class for storing a sorted
> list of objects (on a DateTime key) - that is, SortedList
> object>. However, the program kept crashing for some time until I
> debugged it to find out that it crashes on inserting data to the
> sorted list. Problem was that I was inserting two items with the same
> date (the same key).
> Hey, how do you think, should sorted list just crash when there is a
> duplicate value? Normal lists are expected to store duplicate values,
> but some great inventor at Microsoft has made it better. They have
> documented this behavior, but why they assume sorting shouldn't work
> for duplicate entries in the list? Normal sorting algorithms are just
> fine with that.
> Of course, I had to use non-generic ArrayList instead with custom
> IComparer for sorting.
> And there goes another surprise - ArrayList.Sort implements an
> unstable sort algorithm. That is, for equal objects, it does not
> guarantee that after sorting they will keep the original order. Of
> course this is not documented at all. And of course causes problems -
> in my case it would make sequential messages with timestamp come in
> wrong order.
>
> I don't know what sort algorithm did Microsoft use, but simple
> QuickSort implementation is a stable algorithm and has no problem with
> sorting duplicated values. When Microsoft recruits any developer, they
> ask him to implement some list sorting algorithm during job interview,
> so I thought all developers in Microsoft are capable of implementing
> it correctly.
>
> Best regards,
> RG
>
>
Re: Sorting lists in .Net - why it sucks
am 25.01.2008 16:05:08 von Alexander Gilman Carver
ouups, the one you need is:
OrderedMultiDictionary
- José
"José Joye" wrote in message
news:FF2FA6B1-C294-4013-A1DA-C444038D8FC8@microsoft.com...
> For info, Wintellect (http://www.wintellect.com/PowerCollections.aspx) has
> written a set of generic collection classes that support what you are
> looking for (OrderedBag).
>
> From Jeffrey Richter Book:
> ---------------------------
> At Microsoft's request, Wintellect has produced the "Power Collections
> Library" to bring some of the C++ Standard Template Library's collection
> classes to the CLR programmer.
>
> - José
>
> "nightwatch77" wrote in message
> news:8c3f6f35-5f68-4cdf-8fd5-8aa2b462c433@v46g2000hsv.google groups.com...
>> Today I stumbled upon a very interesting 'feature' of .net 2.0
>> My application uses a generic SortedList class for storing a sorted
>> list of objects (on a DateTime key) - that is, SortedList
>> object>. However, the program kept crashing for some time until I
>> debugged it to find out that it crashes on inserting data to the
>> sorted list. Problem was that I was inserting two items with the same
>> date (the same key).
>> Hey, how do you think, should sorted list just crash when there is a
>> duplicate value? Normal lists are expected to store duplicate values,
>> but some great inventor at Microsoft has made it better. They have
>> documented this behavior, but why they assume sorting shouldn't work
>> for duplicate entries in the list? Normal sorting algorithms are just
>> fine with that.
>> Of course, I had to use non-generic ArrayList instead with custom
>> IComparer for sorting.
>> And there goes another surprise - ArrayList.Sort implements an
>> unstable sort algorithm. That is, for equal objects, it does not
>> guarantee that after sorting they will keep the original order. Of
>> course this is not documented at all. And of course causes problems -
>> in my case it would make sequential messages with timestamp come in
>> wrong order.
>>
>> I don't know what sort algorithm did Microsoft use, but simple
>> QuickSort implementation is a stable algorithm and has no problem with
>> sorting duplicated values. When Microsoft recruits any developer, they
>> ask him to implement some list sorting algorithm during job interview,
>> so I thought all developers in Microsoft are capable of implementing
>> it correctly.
>>
>> Best regards,
>> RG
>>
>>
>
RE: Sorting lists in .Net - why it sucks
am 25.01.2008 16:26:00 von FamilyTreeMike
"nightwatch77" wrote:
> Today I stumbled upon a very interesting 'feature' of .net 2.0
> My application uses a generic SortedList class for storing a sorted
> list of objects (on a DateTime key) - that is, SortedList
> object>. However, the program kept crashing for some time until I
> debugged it to find out that it crashes on inserting data to the
> sorted list. Problem was that I was inserting two items with the same
> date (the same key).
> Hey, how do you think, should sorted list just crash when there is a
> duplicate value? Normal lists are expected to store duplicate values,
> but some great inventor at Microsoft has made it better. They have
> documented this behavior, but why they assume sorting shouldn't work
> for duplicate entries in the list? Normal sorting algorithms are just
> fine with that.
> Of course, I had to use non-generic ArrayList instead with custom
> IComparer for sorting.
> And there goes another surprise - ArrayList.Sort implements an
> unstable sort algorithm. That is, for equal objects, it does not
> guarantee that after sorting they will keep the original order. Of
> course this is not documented at all. And of course causes problems -
> in my case it would make sequential messages with timestamp come in
> wrong order.
>
> I don't know what sort algorithm did Microsoft use, but simple
> QuickSort implementation is a stable algorithm and has no problem with
> sorting duplicated values. When Microsoft recruits any developer, they
> ask him to implement some list sorting algorithm during job interview,
> so I thought all developers in Microsoft are capable of implementing
> it correctly.
>
> Best regards,
> RG
>
>
You should create your own class that encapsulates a DateTime object and
implements IComparable. You just need to handle the equals case as you see
fit. Here is an example of what I mean:
namespace SortedListTest
{
public class ComparableDateTime : IComparable
{
private DateTime mDT;
public int CompareTo ( object obj )
{
if (mDT.Ticks < ((ComparableDateTime) obj).mDT.Ticks) return -1;
return 1;
}
public ComparableDateTime ( long milliseconds )
{
mDT = new DateTime ( milliseconds );
}
}
class Program
{
static void Main ( string [ ] args )
{
SortedList msl;
msl = new SortedList ();
msl.Add ( new ComparableDateTime ( 123456786 ), "string one" );
msl.Add ( new ComparableDateTime ( 123456786 ), "string two" );
Console.Out.WriteLine ( );
foreach (string s in MySortedList.Values)
{
Console.WriteLine ( s );
}
Console.Out.WriteLine ( );
Console.ReadLine ();
}
}
}
>
Re: Sorting lists in .Net - why it sucks
am 25.01.2008 16:38:26 von Marc Gravell
When you're done with the venom, this is clearly documented:
http://msdn2.microsoft.com/en-us/library/ms132319.aspx
"Every key in a SortedList<(Of <(TKey, TValue>)>) must be unique."
(equally, the exception tells you very clearly what the problem is)
With regards to unstable sort, this too is clearly documented:
http://msdn2.microsoft.com/en-us/library/b0zbh7b6.aspx
"This implementation performs an unstable sort; that is, if two
elements are equal, their order might not be preserved."
I might be mistaken but I seem to recall that the LINQ sort is stable.
Likewise in like there is an ILookup for duplicated keys,
but the default implementation (Lookup) is immutable. It
took me only a few moments to write a mutable ILookup
implementation, however.
Marc
Re: Sorting lists in .Net - why it sucks
am 25.01.2008 17:20:47 von skeet
On Jan 25, 1:38 pm, nightwatch77 wrote:
> I don't know what sort algorithm did Microsoft use, but simple
> QuickSort implementation is a stable algorithm and has no problem with
> sorting duplicated values.
I assume by "simple" you mean "using extra memory rather than sorting
in place". The more efficient in-place quick sort isn't stable.
Jon
Re: Sorting lists in .Net - why it sucks
am 25.01.2008 20:17:57 von nightwatch77
Hi,
- Family Tree Mike, you probably didn't notice I have implemented
IComparer, thanks anyway.
- Marc, I know Microsoft has documented the weird SortedList behavior,
but does it add more sense to the API?
- Jon, you're right, in-place quicksort is unstable, but anyway, there
is a stable version.
Still, the sorting functionality is very essential to a collections
API, just like a good hashtable implementation. It should be done
better and programmers should have the option to select sorting
algorithm.
best regards
RG
Re: Sorting lists in .Net - why it sucks
am 25.01.2008 23:59:53 von Dejan Stanic
Check out http://www.itu.dk/research/c5/.
Great stuff.
LP,
Dejan
Re: Sorting lists in .Net - why it sucks
am 26.01.2008 01:15:02 von FamilyTreeMike
I guess what I was too subtle about was, that by
1. writing an IComparable (not IComparer) class wrapping DateTime,
2. that the comparison routine never returns equal,
the code does not crash on equal keys.
"nightwatch77" wrote:
> Hi,
>
> - Family Tree Mike, you probably didn't notice I have implemented
> IComparer, thanks anyway.
> - Marc, I know Microsoft has documented the weird SortedList behavior,
> but does it add more sense to the API?
> - Jon, you're right, in-place quicksort is unstable, but anyway, there
> is a stable version.
> Still, the sorting functionality is very essential to a collections
> API, just like a good hashtable implementation. It should be done
> better and programmers should have the option to select sorting
> algorithm.
> best regards
> RG
>
>
>
Re: Sorting lists in .Net - why it sucks
am 28.01.2008 12:32:27 von nightwatch77
On Jan 26, 1:15 am, Family Tree Mike
wrote:
> I guess what I was too subtle about was, that by
>
> 1. writing an IComparable (not IComparer) class wrapping DateTime,
> 2. that the comparison routine never returns equal,
>
> the code does not crash on equal keys.
Mike, your solution is very simple but doesn't help a bit because it
doesn't guarantee stable sorting. Equal timestamps is something I have
to handle in a certain way, that is I have to get them in the same
order as in input, and your IComparable just makes sure two timestamps
are never equal - that solves some problem, but not mine.
Re: Sorting lists in .Net - why it sucks
am 28.01.2008 12:51:46 von Marc Gravell
To restate - the LINQ sort is stable:
http://technet.microsoft.com/en-us/library/bb549422.aspx
"This method performs a stable sort; that is, if the keys of two
elements are equal, the order of the elements is preserved."
If .NET 3.5 isn't an option, then hacky though it sounds, perhaps
consider adding a member to keep track of the original insertion order
(just an int), and then use that as a secondary after your primary
sort condition(s) - i.e. the following will give you a stable sort
(assuming you give each Foo incremental Sequence values, perhaps using
something in the ctor):
class Foo {
public int Sequence { get; set; }
public double Value { get; set; }
}
public class FooComparer : IComparer {
int Compare(Foo x, Foo y) {
// primary sort
int result = x.Value.CompareTo(y.Value);
// backup for stability
if (result == 0) {
result = x.Sequence.CompareTo(y.Sequence);
}
return result;
}
}
Yes it is duplication...
Third option: use an external library.
Marc