Clustered index on the identity column or a datetime column

Clustered index on the identity column or a datetime column

am 25.09.2007 15:57:37 von Tommy Hayes

Hello all,

We have a table with about 2 million rows that is used to store log
events. The table has an identity column and also a datetime column to
record the event time. It is expected to at least double in size over
its lifetime. Because the datetime records the current time, the value
of that column goes up as the value in the identity column goes up.

When doing searches on the table it is normal to have a date range as
part of the search. This would lead me to think that the clustered
index of the main table should be on the datetime column, letting SQL
Server know that a date range is sequential in the table as opposed to
the database server hopping all over the place to retrieve rows.

However is there any argument for putting the clustered index on the
identity column of the main table? The datetimes may not be unique
(highly unlikely in the given application, but logically possible I
guess) - would that lessen the advantage of the clustered index on the
datetime?

How about putting the clustered index on the identity column, and if
given two dates to form a range then look up the two corresponding
identities and turn the query into an indentity range query, thus
allowing the identity column clustered index to be used for the date
range query as well?

Or is there some way to let SQL Server know that two columns are
'linked' so that it knows to alter the query itself? (i.e. a range on
column A is equivalent to a range on column B, and column B is the
clustered index column)

Any ideas would be appreciated.
Thanks,
Tommy.

Re: Clustered index on the identity column or a datetime column

am 25.09.2007 16:08:18 von Roy Harvey

I would be inclined to cluster (unique) on the datetime/identity pair,
in that order, and set the PK on the identity non-clustered. That
gives you the date range efficiency you need.

I can see no benefit to clustering on the identity. There is no way
to tell the optimizer that the two columns are moving along "in
parallel" the way they are, so don't worry about it.

Roy Harvey
Beacon Falls, CT

On Tue, 25 Sep 2007 13:57:37 -0000, Tommy Hayes
wrote:

>Hello all,
>
>We have a table with about 2 million rows that is used to store log
>events. The table has an identity column and also a datetime column to
>record the event time. It is expected to at least double in size over
>its lifetime. Because the datetime records the current time, the value
>of that column goes up as the value in the identity column goes up.
>
>When doing searches on the table it is normal to have a date range as
>part of the search. This would lead me to think that the clustered
>index of the main table should be on the datetime column, letting SQL
>Server know that a date range is sequential in the table as opposed to
>the database server hopping all over the place to retrieve rows.
>
>However is there any argument for putting the clustered index on the
>identity column of the main table? The datetimes may not be unique
>(highly unlikely in the given application, but logically possible I
>guess) - would that lessen the advantage of the clustered index on the
>datetime?
>
>How about putting the clustered index on the identity column, and if
>given two dates to form a range then look up the two corresponding
>identities and turn the query into an indentity range query, thus
>allowing the identity column clustered index to be used for the date
>range query as well?
>
>Or is there some way to let SQL Server know that two columns are
>'linked' so that it knows to alter the query itself? (i.e. a range on
>column A is equivalent to a range on column B, and column B is the
>clustered index column)
>
>Any ideas would be appreciated.
>Thanks,
>Tommy.

Re: Clustered index on the identity column or a datetime column

am 25.09.2007 18:36:23 von Tommy Hayes

Thanks again Roy,

Tommy.


On Sep 25, 3:08 pm, "Roy Harvey (SQL Server MVP)"
wrote:
> I would be inclined to cluster (unique) on the datetime/identity pair,
> in that order, and set the PK on the identity non-clustered. That
> gives you the date range efficiency you need.
>
> I can see no benefit to clustering on the identity. There is no way
> to tell the optimizer that the two columns are moving along "in
> parallel" the way they are, so don't worry about it.
>
> Roy Harvey
> Beacon Falls, CT
>
> On Tue, 25 Sep 2007 13:57:37 -0000, Tommy Hayes
>
>
>
> wrote:
> >Hello all,
>
> >We have a table with about 2 million rows that is used to store log
> >events. The table has an identity column and also a datetime column to
> >record the event time. It is expected to at least double in size over
> >its lifetime. Because the datetime records the current time, the value
> >of that column goes up as the value in the identity column goes up.
>
> >When doing searches on the table it is normal to have a date range as
> >part of the search. This would lead me to think that the clustered
> >index of the main table should be on the datetime column, letting SQL
> >Server know that a date range is sequential in the table as opposed to
> >the database server hopping all over the place to retrieve rows.
>
> >However is there any argument for putting the clustered index on the
> >identity column of the main table? The datetimes may not be unique
> >(highly unlikely in the given application, but logically possible I
> >guess) - would that lessen the advantage of the clustered index on the
> >datetime?
>
> >How about putting the clustered index on the identity column, and if
> >given two dates to form a range then look up the two corresponding
> >identities and turn the query into an indentity range query, thus
> >allowing the identity column clustered index to be used for the date
> >range query as well?
>
> >Or is there some way to let SQL Server know that two columns are
> >'linked' so that it knows to alter the query itself? (i.e. a range on
> >column A is equivalent to a range on column B, and column B is the
> >clustered index column)
>
> >Any ideas would be appreciated.
> >Thanks,
> >Tommy.- Hide quoted text -
>
> - Show quoted text -

Re: Clustered index on the identity column or a datetime column

am 25.09.2007 23:17:13 von Gert-Jan Strik

If there are relatively few duplicates in the datetime (which is typical
for a log file), then I would go for a clustered nonunique index on the
datetime column, because that would be a bit more efficient then a
unique compound index, since under the covers the uniqueifier is only
added for duplicates. Obviously this applies to the index tree (because
the data pages would require more space in case of duplicates), but it
might just save one index depth.

--
Gert-Jan


"Roy Harvey (SQL Server MVP)" wrote:
>
> I would be inclined to cluster (unique) on the datetime/identity pair,
> in that order, and set the PK on the identity non-clustered. That
> gives you the date range efficiency you need.
>
> I can see no benefit to clustering on the identity. There is no way
> to tell the optimizer that the two columns are moving along "in
> parallel" the way they are, so don't worry about it.
>
> Roy Harvey
> Beacon Falls, CT
>
> On Tue, 25 Sep 2007 13:57:37 -0000, Tommy Hayes
> wrote:
>
> >Hello all,
> >
> >We have a table with about 2 million rows that is used to store log
> >events. The table has an identity column and also a datetime column to
> >record the event time. It is expected to at least double in size over
> >its lifetime. Because the datetime records the current time, the value
> >of that column goes up as the value in the identity column goes up.
> >
> >When doing searches on the table it is normal to have a date range as
> >part of the search. This would lead me to think that the clustered
> >index of the main table should be on the datetime column, letting SQL
> >Server know that a date range is sequential in the table as opposed to
> >the database server hopping all over the place to retrieve rows.
> >
> >However is there any argument for putting the clustered index on the
> >identity column of the main table? The datetimes may not be unique
> >(highly unlikely in the given application, but logically possible I
> >guess) - would that lessen the advantage of the clustered index on the
> >datetime?
> >
> >How about putting the clustered index on the identity column, and if
> >given two dates to form a range then look up the two corresponding
> >identities and turn the query into an indentity range query, thus
> >allowing the identity column clustered index to be used for the date
> >range query as well?
> >
> >Or is there some way to let SQL Server know that two columns are
> >'linked' so that it knows to alter the query itself? (i.e. a range on
> >column A is equivalent to a range on column B, and column B is the
> >clustered index column)
> >
> >Any ideas would be appreciated.
> >Thanks,
> >Tommy.