SQLServer Table Partitioning

SQLServer Table Partitioning

on 05.06.2007 18:58:41 by giorgi.piero

Hi!

I have a question:

I already have a DB that uses partitions to divide data in US
Counties, partitioned by state.

Can I use TWO levels of partitioning?

I mean... 3077 filegroups and 50 partition functions that address
them, but can I use another function to group the 50 states?

Thanks!

Piero

Re: SQLServer Table Partitioning

on 05.06.2007 22:52:27 by Erland Sommarskog

Piero 'Giops' Giorgi (giorgi.piero@gmail.com) writes:
> I already have a DB that uses partitions to divide data in US
> Counties, partitioned by state.
>
> Can I use TWO levels of partitioning?
>
> I mean... 3077 filegroups and 50 partition functions that address
> them, but can I use another function to group the 50 states?

Do I understand it correctly that you already have 50 partitions, and
now you want even more? About what size do you expect per partition?

I'm not sure that partitioning by state is the best strategy. The partition
for California will be a lot bigger than the ones for Alaska and Rhode
Island.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Re: SQLServer Table Partitioning

on 05.06.2007 23:10:42 by giorgi.piero

> Do I understand it correctly that you already have 50 partitions, and
> now you want even more? About what size do you expect per partition?
>
> I'm not sure that partitioning by state is the best strategy. The partition
> for California will be a lot bigger than the ones for Alaska and Rhode
> Island.

I know that, but partitioning by county makes the DB a lot easier to
maintain.
I have to work that way because I'm dealing with criminal records, and
they are separated by county with a ton of different files, so for
many of them I have to clear the table and reload the whole county
every time I get an update. Easier on partitions... :-)

Table size can be anywhere from 8000 to 3 million records, depending
on the county.

The best way to do that would be having a table partitioned over 3077
filegroups, so storing the data will go by COUNTY in this way:

CA_ALAMEDA
CA_ALPINE
CA_AMADOR
CA_BUTTE
CA_CALAVERAS
CA_COLUSA
CA_CONTRA_COSTA
CA_DEL_NORTE
CA_EL_DORADO
CA_FRESNO
CA_GLENN
CA_HUMBOLDT
CA_IMPERIAL
CA_INYO

With the COUNTY as the partition Parameter.

But, before trying, can I have 3077 files in ONE partition, and drop
all the states stuff?

Thanks!

Piero

Re: SQLServer Table Partitioning

on 06.06.2007 00:24:43 by Erland Sommarskog

Piero 'Giops' Giorgi (giorgi.piero@gmail.com) writes:
> I know that, but partitioning by county makes the DB a lot easier to
> maintain.
> I have to work that way because I'm dealing with criminal records, and
> they are separated by county with a ton of different files, so for
> many of them I have to clear the table and reload the whole county
> every time I get an update. Easier on partitions... :-)
>
> Table size can be anywhere from 8000 to 3 million records, depending
> on the county.

Deleting 8000 rows is a breeze, but deleting 3 million rows takes
some resources, particularly if the rows are wide. But it is still only
a matter of minutes.

> But, before trying, can I have 3077 files in ONE partition, and drop
> all the states stuff?

No, in the topic for CREATE PARTITION FUNCTION, I found that you
cannot have more than 999 boundary values.
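
To make the limit concrete, here is a minimal sketch. The function name PF_County_Demo and the varchar(50) key are assumptions, and only the first four counties from the list quoted earlier are used. Each county needs roughly one boundary value in the partition function, so 3077 counties would need far more than the 999 values allowed.

```sql
-- Hypothetical function name; the boundary list is cut to four counties for brevity.
-- RANGE LEFT: each boundary value is the upper end of its partition.
CREATE PARTITION FUNCTION PF_County_Demo (varchar(50))
AS RANGE LEFT FOR VALUES ('CA_ALAMEDA', 'CA_ALPINE', 'CA_AMADOR', 'CA_BUTTE');
GO

-- Shows which partition number a given county value maps to.
SELECT $PARTITION.PF_County_Demo('CA_AMADOR') AS PartitionNumber;
GO

-- Counts the boundary values defined so far for the demo function.
SELECT COUNT(*) AS BoundaryCount
FROM sys.partition_range_values AS v
JOIN sys.partition_functions AS f
  ON f.function_id = v.function_id
WHERE f.name = N'PF_County_Demo';
GO
```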


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Re: SQLServer Table Partitioning

on 06.06.2007 01:59:10 by giorgi.piero

On Jun 5, 3:24 pm, Erland Sommarskog wrote:

>> But, before trying, can I have 3077 files in ONE partition, and drop
>> all the states stuff?

> No, in the topic for CREATE PARTITION FUNCTION, I found that you
> cannot have more than 999 boundary values.

Dang it... I knew there was a catch.

So, I'll be forced to have 3077 filegroups, grouped with 50
partitions.
Is there a way to have a partition function/scheme that sees other
schemes, instead of filegroups?

I mean Filegroups Counties (3077) - grouped by state (50) - all
together in ONE partitioned table.

Any Ideas?

Thank you!

Piero

Re: SQLServer Table Partitioning

on 06.06.2007 07:02:32 by Ed Murphy

Erland Sommarskog wrote:

> Piero 'Giops' Giorgi (giorgi.piero@gmail.com) writes:
>> I know that, but partitioning by county makes the DB a lot easier to
>> maintain.
>> I have to work that way because I'm dealing with criminal records, and
>> they are separated by county with a ton of different files, so for
>> many of them I have to clear the table and reload the whole county
>> every time I get an update. Easier on partitions... :-)
>>
>> Table size can be anywhere from 8000 to 3 million records, depending
>> on the county.
>
> Deleting 8000 rows is a breeze, but deleting 3 million rows takes
> some resources, particularly if the rows are wide. But it is still only
> a matter of minutes.

I do assume that (state, county) is an index. If not, then get
that fixed yesterday.

Re: SQLServer Table Partitioning

on 06.06.2007 14:19:31 by Dan Guzman

> So, I'll be forced to have 3077 filegroups, grouped with 50
> partitions.
> Is there a way to have a partition function/scheme that sees other
> schemes, instead of filegroups?

Why do you need separate filegroups? It seems to me that the main purpose of
partitioning here is for manageability and all those files/filegroups only
add to administration complexity and wasted space.

You might consider a hybrid solution with 50 individual state tables
included in a partitioned view, with each state table partitioned by county.
This approach would leverage partitioning to quickly reload individual
counties yet provide a seamless view of the entire country.
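
A rough sketch of one such state table, under stated assumptions: the names PF_State_WA, PS_State_WA and dbo.State_WA are illustrative, the boundary list is shortened to four Washington counties, and the columns and key mirror the simplified sample script posted later in the thread (a real table would need a further key column, since this key allows only one row per county). All partitions go to a single filegroup, in line with the point about not needing separate filegroups.

```sql
-- Hypothetical names throughout; boundary list shortened to four counties.
CREATE PARTITION FUNCTION PF_State_WA (varchar(50))
AS RANGE LEFT FOR VALUES ('Adams', 'Asotin', 'Benton', 'Spokane');
GO

-- ALL TO ([PRIMARY]) keeps every partition in one filegroup.
CREATE PARTITION SCHEME PS_State_WA
AS PARTITION PF_State_WA ALL TO ([PRIMARY]);
GO

-- The CHECK constraint on StateCode is what a partitioned view over
-- the state tables relies on; the table itself is partitioned by County.
CREATE TABLE dbo.State_WA
(
    StateCode  char(2)      NOT NULL
        CONSTRAINT CK_State_WA_StateCode CHECK (StateCode = 'WA'),
    County     varchar(50)  NOT NULL,
    CountyData varchar(100) NULL,
    CONSTRAINT PK_State_WA
        PRIMARY KEY CLUSTERED (StateCode, County)
        ON PS_State_WA (County)
);
GO
```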


--
Hope this helps.

Dan Guzman
SQL Server MVP

Re: SQLServer Table Partitioning

on 06.06.2007 18:14:54 by giorgi.piero

> You might consider a hybrid solution with 50 individual state tables
> included in a partitioned view, with each state table partitioned by county.
> This approach would leverage partitioning to quickly reload individual
> counties yet provide a seamless view of the entire country.

THANK YOU!
That is exactly what I want to do, but unfortunately I'm not (YET)
able to do it.

How can I have a partitioned view of partitioned tables?
I have the 50 state tables partitioned by county, but I can't get to
the next step.

Can someone post a small example of the thing?

Thanks

Piero

Re: SQLServer Table Partitioning

on 06.06.2007 18:15:41 by giorgi.piero

On Jun 5, 10:02 pm, Ed Murphy wrote:

> I do assume that (state, county) is an index. If not, then get
> that fixed yesterday.

Of course!
Actually it was fixed the day BEFORE yesterday... :-)

Piero

Re: SQLServer Table Partitioning

on 06.06.2007 23:13:01 by Erland Sommarskog

Piero 'Giops' Giorgi (giorgi.piero@gmail.com) writes:
>> You might consider a hybrid solution with 50 individual state tables
>> included in a partitioned view, with each state table partitioned by
>> county. This approach would leverage partitioning to quickly reload
>> individual counties yet provide a seamless view of the entire country.
>
> That is exactly what I want to do, but unfortunately I'm not (YET)
> able to do it.
>
> How can I have a partitioned view of partitioned tables?
> I have the 50 state tables partitioned by county, but I can't get to
> the next step.
>
> Can someone post a small example of the thing?

To me that sounds like a manageability nightmare. While you can query
the beast in one query, when you need to flush the rows for Orange
County, you would have to explicitly go to the CA table to
switch partitions, which would mean a lot of dynamic SQL.

I don't know if there is any catch with partitioned views over partitioned
tables (I really need to find some time to play with partitioned tables
to learn them!), but in a normal partitioned view you would have:

CREATE TABLE CA (state char(2) DEFAULT 'CA' CHECK (state = 'CA'),
-- other columns
PRIMARY KEY (state, county, whatever))

CREATE TABLE RI (state char(2) DEFAULT 'RI' CHECK (state = 'RI'),
...

CREATE VIEW thewholebunch AS
SELECT state, county, .....
FROM CA
UNION ALL
SELECT state, county, .....
FROM RI
....
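
For reference, a self-contained variant of the outline above that runs as written. The table and view names (dbo.Demo_CA, dbo.Demo_RI, dbo.Demo_AllStates) and the id column are placeholders invented for this sketch, and the member tables are plain tables rather than tables partitioned by county.

```sql
-- Two hypothetical member tables; the CHECK constraint tells the optimizer
-- which state each table can contain.
CREATE TABLE dbo.Demo_CA
(
    state  char(2)     NOT NULL DEFAULT 'CA' CHECK (state = 'CA'),
    county varchar(50) NOT NULL,
    id     int         NOT NULL,
    PRIMARY KEY (state, county, id)
);
CREATE TABLE dbo.Demo_RI
(
    state  char(2)     NOT NULL DEFAULT 'RI' CHECK (state = 'RI'),
    county varchar(50) NOT NULL,
    id     int         NOT NULL,
    PRIMARY KEY (state, county, id)
);
GO

-- The view glues the member tables together with UNION ALL.
CREATE VIEW dbo.Demo_AllStates
AS
SELECT state, county, id FROM dbo.Demo_CA
UNION ALL
SELECT state, county, id FROM dbo.Demo_RI;
GO

-- Filtering on the constrained column lets the optimizer skip member
-- tables whose CHECK constraint rules them out.
SELECT state, county, id
FROM dbo.Demo_AllStates
WHERE state = 'RI';
GO
```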

But personally I would look into making the merging of new files more
effective than just dropping all existing rows.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Re: SQLServer Table Partitioning

on 06.06.2007 23:47:50 by giorgi.piero

On Jun 6, 2:13 pm, Erland Sommarskog wrote:

> To me that sounds like a manageability nightmare. While you can query
> the beast in one query, when you need to flush the rows for Orange
> County, you would have to explicitly go to the CA table to
> switch partitions, which would mean a lot of dynamic SQL.

That can be, but the issue is the way counties update their records.

Some (VERY few) counties send a monthly "Update" file, but almost all
of them just send the whole dump of their DB.

This gives us two problems:

1) Normalizing the data in a common format (Naturally, no two counties
have the same record format)

2) Updating the db in an "Online" state: for Orange County alone (with
more than 3 million rows), checking for "Pre-Existing" records would just
kill the server, and there are 3077 counties...

Any better ideas?

Piero

Re: SQLServer Table Partitioning

on 07.06.2007 09:27:44 by Erland Sommarskog

Piero 'Giops' Giorgi (giorgi.piero@gmail.com) writes:
> 1) Normalizing the data in a common format (Naturally, no two counties
> have the same record format)

But this is a problem you have no matter the solution to load the data,
right? Since you work with a partitioned table, you have the same schemas
for all counties. Once you have normalised the format, you can load it
into a staging table and work from there.

> 2) Updating the db in an "Online" state: for Orange County alone (with
> more than 3 million rows), checking for "Pre-Existing" records would just
> kill the server, and there are 3077 counties...

Did you actually try it? With proper indexing this does not have to be
that painful.
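
A minimal sketch of what such an indexed merge could look like, written without MERGE so it also runs on SQL 2005. Everything here is an assumption for illustration: the temp tables #Live and #Incoming stand in for the county's current rows and the normalised file, and RecordID is an invented record key. Both tables are clustered on the join columns, so the existence checks become index seeks.

```sql
-- Hypothetical stand-ins: #Live for current rows, #Incoming for the new file.
CREATE TABLE #Live
(
    County     varchar(50)  NOT NULL,
    RecordID   int          NOT NULL,
    CountyData varchar(100) NULL,
    PRIMARY KEY CLUSTERED (County, RecordID)
);
CREATE TABLE #Incoming
(
    County     varchar(50)  NOT NULL,
    RecordID   int          NOT NULL,
    CountyData varchar(100) NULL,
    PRIMARY KEY CLUSTERED (County, RecordID)
);

BEGIN TRANSACTION;

-- 1) Delete rows that are missing from the new file.
DELETE l
FROM #Live AS l
WHERE l.County = 'Orange'
  AND NOT EXISTS (SELECT *
                  FROM #Incoming AS i
                  WHERE i.County = l.County
                    AND i.RecordID = l.RecordID);

-- 2) Update rows whose data changed (NULL-safe comparison).
UPDATE l
SET CountyData = i.CountyData
FROM #Live AS l
JOIN #Incoming AS i
  ON i.County = l.County
 AND i.RecordID = l.RecordID
WHERE l.County = 'Orange'
  AND (l.CountyData <> i.CountyData
       OR (l.CountyData IS NULL AND i.CountyData IS NOT NULL)
       OR (l.CountyData IS NOT NULL AND i.CountyData IS NULL));

-- 3) Insert rows that are new in the file.
INSERT INTO #Live (County, RecordID, CountyData)
SELECT i.County, i.RecordID, i.CountyData
FROM #Incoming AS i
WHERE i.County = 'Orange'
  AND NOT EXISTS (SELECT *
                  FROM #Live AS l
                  WHERE l.County = i.County
                    AND l.RecordID = i.RecordID);

COMMIT TRANSACTION;
```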

Then again, if you have code to normalise the data for 3077 counties, I
guess 50 tables with a total of 3077 partitions is a smaller headache.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Re: SQLServer Table Partitioning

on 07.06.2007 14:15:56 by Dan Guzman

Erland, thanks for providing Piero with the partitioned view example.


> To me that sounds like a manageability nightmare. While you can query
> the beast in one query, when you need to flush the rows for Orange
> County, you would have to explicitly go to the CA table to
> switch partitions, which would mean a lot of dynamic SQL.

It's true that Piero's county import process will need to be state-table
aware. Rather than traditional dynamic SQL, an alternative is SQLCMD
scripts with variables executed via SQLCMD.EXE or SSMS in SQLCMD mode.
IMHO, the SQLCMD variable approach is a bit cleaner.

The example below assumes the state tables, partition functions and
partition schemes all have a state code suffix. The archive and staging
tables use the same partitioning scheme as the primary table to simplify
things. This ensures those objects are on the same file groups and also
eliminates the need to create a check constraint on the staging table county
column.


> I don't know if there is any catch with partitioned views over partitioned
> tables

I must admit I had not considered a partitioned view over partitioned tables
before this thread. I did some cursory testing with Piero's state/county
scenario and it seems to work as expected but there might be gotchas. My
biggest concern here is with query complexity when joining the partitioned
view. This approach is probably rarely used so Piero should probably test
thoroughly before committing.


> (I really need to find some time to play with partitioned tables
> to learn them!)

I'm fortunate because I had to develop a complex sliding-window partitioning
scheme for one of our applications that gave me the opportunity to learn the
finer points of SQL 2005 partitioning. There are just so many cool features
in the product nowadays that it's hard to find the time to thoroughly learn
most, let alone all. Now if I could only get fully up to speed on the new
features before SQL Server 2008 ;-)


--sample script for county import process

--define and initialize SQLCMD variables
:setvar StateCode WA
:setvar County Spokane

--create archive table for county
IF OBJECT_ID(N'dbo.CountyArchive', 'U') IS NOT NULL
    DROP TABLE dbo.CountyArchive
GO
CREATE TABLE dbo.CountyArchive
(
    StateCode char(2) NOT NULL,
    County varchar(50) NOT NULL,
    CountyData varchar(100),
    CONSTRAINT PK_CountyArchive
        PRIMARY KEY CLUSTERED (StateCode, County)
        ON PS_State_$(StateCode)(County)
)
GO

--create staging table for county
IF OBJECT_ID(N'dbo.CountyStaging', 'U') IS NOT NULL
    DROP TABLE dbo.CountyStaging
GO

CREATE TABLE dbo.CountyStaging
(
    StateCode char(2) NOT NULL,
    County varchar(50) NOT NULL,
    CountyData varchar(100),
    CONSTRAINT PK_CountyStaging
        PRIMARY KEY CLUSTERED (StateCode, County)
        ON PS_State_$(StateCode)(County)
)
GO

---------------------------------
--load dbo.CountyStaging table here
---------------------------------

--add constraint needed for partitioned view
ALTER TABLE dbo.CountyStaging
    ADD CONSTRAINT CK_CountyStaging_State CHECK (StateCode = '$(StateCode)')

--move old county data to archive table
ALTER TABLE dbo.State_$(StateCode)
    SWITCH PARTITION $PARTITION.PF_State_$(StateCode)('$(County)') TO
    dbo.CountyArchive PARTITION
    $PARTITION.PF_State_$(StateCode)('$(County)')

--move new county data into state table
ALTER TABLE dbo.CountyStaging
    SWITCH PARTITION $PARTITION.PF_State_$(StateCode)('$(County)') TO
    dbo.State_$(StateCode) PARTITION
    $PARTITION.PF_State_$(StateCode)('$(County)')
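
One way to sanity-check the result of the switches, sketched under the assumption that the state table is named dbo.State_WA (the script above run with StateCode = WA): sys.partitions lists an approximate row count per partition, so the reloaded county's partition can be compared before and after.

```sql
-- Assumes a state table named dbo.State_WA; returns no rows if it does not exist.
SELECT p.partition_number,
       p.rows AS ApproximateRows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.State_WA')
  AND p.index_id IN (0, 1)   -- heap or clustered index only
ORDER BY p.partition_number;
```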


--
Hope this helps.

Dan Guzman
SQL Server MVP

Re: SQLServer Table Partitioning

on 07.06.2007 20:48:01 by giorgi.piero

On Jun 7, 5:15 am, "Dan Guzman" wrote:

> Hope this helps.

That's exactly what I had in mind. Thank you!

One thing... is there any way to partition the states, too?
I mean Query the whole beast with only one SQL Query?

Piero

Re: SQLServer Table Partitioning

on 07.06.2007 23:16:19 by Erland Sommarskog

Piero 'Giops' Giorgi (giorgi.piero@gmail.com) writes:
> One thing... is there any way to partition the states, too?
> I mean Query the whole beast with only one SQL Query?

You would query the view.

One idea that occurred to me is that you could have a mix, so that big
counties like Orange County(*) have a single partition, whereas
smaller counties and states would be gathered in the same partition.
This would mean that you would have different models to load the files.
For Orange County you switch table in and out, whereas for smaller
counties you delete and insert. Of course, this means more complex
code since there would be two code paths. But there would be far
fewer partitions to care about.
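
The delete-and-insert path for a small county could be as simple as the following sketch. The temp tables #SharedPartition and #SmallCountyFile, the RecordID column and the county name are placeholders for illustration only.

```sql
-- Hypothetical stand-ins for the shared partition and the small county's new file.
CREATE TABLE #SharedPartition
(
    County     varchar(50)  NOT NULL,
    RecordID   int          NOT NULL,
    CountyData varchar(100) NULL,
    PRIMARY KEY CLUSTERED (County, RecordID)
);
CREATE TABLE #SmallCountyFile
(
    County     varchar(50)  NOT NULL,
    RecordID   int          NOT NULL,
    CountyData varchar(100) NULL,
    PRIMARY KEY CLUSTERED (County, RecordID)
);

-- Replace the county's rows in one transaction so readers never see it half-loaded.
BEGIN TRANSACTION;

DELETE FROM #SharedPartition
WHERE County = 'Alpine';

INSERT INTO #SharedPartition (County, RecordID, CountyData)
SELECT County, RecordID, CountyData
FROM #SmallCountyFile
WHERE County = 'Alpine';

COMMIT TRANSACTION;
```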


(*) When I picked Orange County as an example, I did not know that it
was one of the biggies. I just picked it as it was one of the county
names I knew; the name appears in a few Zappa tracks.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Re: SQLServer Table Partitioning

on 08.06.2007 18:31:01 by giorgi.piero

On Jun 7, 2:16 pm, Erland Sommarskog wrote:
> Piero 'Giops' Giorgi (giorgi.pi...@gmail.com) writes:
>
> > One thing... is there any way to partition the states, too?
> > I mean Query the whole beast with only one SQL Query?
>
> You would query the view.

Yes, that is what I'm doing. It's not optimal, but it's working well
enough.

> One idea that occurred to me is that you could have a mix, so that big
> counties like Orange County(*) have a single partition, whereas
> smaller counties and states would be gathered in the same partition.

That would give problems while updating the DB.
Ok, I drop the big ones and update the small ones, but IMHO it would
be better to stay with one partition per County, grouped by state.
The BEST thing would be to be able to partition the states (Already
partitioned by County) in one big table.

Possible?

> (*) When I picked Orange County as an example, I did not know that it
> was one of the biggies. I just picked it as it was one of the county
> names I knew; the name appears in a few Zappa tracks.

It's a BIG one!
But you should see New York...
I'm not saying that there are more criminals in NY, but the city is
bigger and there are a lot more people, so...

For now, I'm experimenting with 50 States, partitioned over 3077
Counties. Seems promising, but the UNION query is a resource hog. :-(

Thank you all!

P

Piero