Transform/transfer 50Gb - how to do it fast?
am 28.04.2007 07:54:56 von B D Jensen
Hello!
I have a big table with 50 GB of data, have written some functions
that clean up the data, and want to do something like this:
insert into newtable
select id, func1(col1), func2(col2)
from oldtable;
I also plan to make newtable partitioned (before the insert).
But how can I get the insert to run as fast as possible?
Greetings
Bjorn D. Jensen
Re: Transform/transfer 50Gb - how to do it fast?
am 28.04.2007 10:53:10 von Mork69
Hi,
The quickest way to do this is to use INSERT INTO...SELECT FROM as it
is a non-logged operation
regards,
Malc
B D Jensen wrote:
> Hello!
> I have an big table with 50Gb data, written som functions
> that cleanup data and want to do something like this
>
> insert into newtable
> select id, func1(col1), func2(col2)
> from oldtable;
>
> I'm also plan to make newtable partioned (before insert).
>
> But how could i get the insert as fast as possible?
> Greetings
> Bjorn D. Jensen
Re: Transform/transfer 50Gb - how to do it fast?
am 28.04.2007 11:04:31 von Erland Sommarskog
B D Jensen (bjorn.d.jensen@gmail.com) writes:
> I have an big table with 50Gb data, written som functions
> that cleanup data and want to do something like this
>
> insert into newtable
> select id, func1(col1), func2(col2)
> from oldtable;
>
> I'm also plan to make newtable partioned (before insert).
>
> But how could i get the insert as fast as possible?
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.
Apart from that, there are a couple of possible strategies for this
situation. One is SELECT INTO, but since you plan to make the new
table partitioned, I don't think SELECT INTO is good for this. (SELECT
INTO creates a new table.)
Another is to use BCP to first unload the table to a file. You would
then use queryout or a view with your functions, so that what ends up
in the file is the cleaned-up version. Then you use BCP to load the
data into the new table. The key here is that there should be no
indexes on the table and it should be empty; in that case the bulk
load is minimally logged.
Of course, you also need to account for the time it takes to create
the indexes.
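To make the round trip concrete, here is a minimal sketch of the unload/load approach; the server, database, table, and file names are hypothetical, -n uses native format, and -T assumes a trusted connection:

```shell
# Unload the cleaned-up data with a query (queryout); native format (-n) is fastest
bcp "SELECT id, dbo.func1(col1), dbo.func2(col2) FROM mydb.dbo.oldtable" queryout cleaned.dat -n -S myserver -T

# Load into the empty, index-free target table; the TABLOCK hint enables minimal logging
bcp mydb.dbo.newtable in cleaned.dat -n -S myserver -T -h "TABLOCK"
```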
And the final option is to use a plain INSERT. But a single INSERT
statement will not be good for your transaction log. It's better to
batch the insert, say, 100000 rows at a time, preferably with the
database set to simple recovery. You should batch on the clustered
index of the old table:
   SELECT @start = 1
   WHILE EXISTS (SELECT * FROM oldtable WHERE clustercol >= @start)
   BEGIN
      INSERT newtable (...)
      SELECT ...
      FROM   oldtable
      WHERE  clustercol >= @start AND clustercol < @start + 100000
      SELECT @start = @start + 100000
   END
Here the actual increment would depend on the nature of your clustered
key. If it's a date, maybe taking one month at a time is a good idea.
If the new table will have the same clustered index as the old table,
have the clustered index in place when you run the above, but wait
with adding non-clustered indexes until you have the data in place.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Re: Transform/transfer 50Gb - how to do it fast?
am 28.04.2007 17:55:29 von Erland Sommarskog
Mork69 (mleach@bigfoot.com) writes:
> The quickest way to do this is to use INSERT INTO...SELECT FROM as it
> is a non-logged operation
This is not correct. INSERT SELECT FROM is a fully-logged operation. You are
thinking of SELECT INTO which is a minimally logged operation. That is,
all that is logged are the extent allocations. There are no write operations
in SQL Server that are entirely non-logged.
Re: Transform/transfer 50Gb - how to do it fast?
am 29.04.2007 06:01:44 von B D Jensen
>
> Exactly what is in those functions? Do they perform data access? Are
> they written in T-SQL or in the CLR? I ask, because they could have a
> great impact on performance.
>
Hello Erland!
The functions are written in T-SQL (I also wrote them in CLR, but in
this case they were slower). The original columns have incorrect
datatypes that use too much storage, so the functions check that the
values are in the correct domain and return NULL if not
- which is a correct result, because such values are physically
impossible.
I wondered why you only wrote that I can't use SELECT INTO for
partitioned tables.
I also expected that I must create the table first (and then not be
able to use SELECT INTO) because of the new datatypes - but of course
I could write:
select id, cast(func1(col1) as ) into newtbl from oldtbl
Doing this multiple times with the appropriate WHERE clause
followed by partition switches may be the solution; I'll
investigate...
... but if you have some comments, let me hear ;-)
Greetings
Bjorn D. Jensen
bjorn.d.jensen@gmail.com
P.S. I already know your website, thanks for all that good
information!
Re: Transform/transfer 50Gb - how to do it fast?
am 29.04.2007 10:52:32 von Erland Sommarskog
B D Jensen (bjorn.d.jensen@gmail.com) writes:
> Functions are written in T-SQL (i also wrote them in CLR, but in this
> case they were slower). The original columns have incorrect datatypes,
> that uses too much storage, so the functions check that values are in
> correct domain and if not they return null - what is a correct result,
> because the values then are physical impossible.
I would recommend that you have the expressions inline, at least if
you desire to cut down execution time.
> I wondered why you only wrote that I can't use "select into" for
> patitioned tables.
I assumed that it is not possible to create a partitioned table from
existing ones. But I have not worked much with partitioned tables, so
I could be wrong.
Re: Transform/transfer 50Gb - how to do it fast?
am 29.04.2007 14:20:20 von B D Jensen
Hi again!
As I see it, the cast is not needed, because the functions return the
correct datatype.
Is that what you mean by "inline"?
I think there is another problem: the original table is not in the
right filegroup, and if I understand it correctly, it must be to make
the switch ultra fast;
I'll investigate...
Greetings
Bjorn
Re: Transform/transfer 50Gb - how to do it fast?
am 29.04.2007 17:43:09 von Erland Sommarskog
B D Jensen (bjorn.d.jensen@gmail.com) writes:
> As I see the cast is not needed, because the functions return correct
> datatype.
> Is it that what you mean with "inline"??
I don't know what your functions do, but it seemed from your description
that I could expect something like:
   CREATE FUNCTION makesmaller(@x bigint) RETURNS tinyint AS
   BEGIN
      RETURN (CASE WHEN @x BETWEEN 0 AND 255 THEN @x ELSE NULL END)
   END

Then in your INSERT operation you would rather write:

   SELECT CASE WHEN bigcol BETWEEN 0 AND 255 THEN bigcol ELSE NULL END,
          ...

than

   SELECT dbo.makesmaller(bigcol), ...

There is an overhead for the call, although it seems to be a lot less
in SQL 2005 than in SQL 2000.
> I think there is another problem: the original table is not in the right
> filegroup and if I understand it right, It must be or must be to make
> switch ultra fast;
Yes, but as you are about to ditch the original table, that is not
much of an issue, is it?
Re: Transform/transfer 50Gb - how to do it fast?
am 29.04.2007 18:26:13 von B D Jensen
Hi!
You nearly guessed one of my functions,
but I use <= and >= instead of BETWEEN.
I didn't understand the last part about "ditch" (what does that mean?).
Will the use of functions make the SELECT INTO very slow?
Greetings
Bjorn
Re: Transform/transfer 50Gb - how to do it fast?
am 29.04.2007 18:44:17 von Erland Sommarskog
B D Jensen (bjorn.d.jensen@gmail.com) writes:
> I didn't understand the last part about "ditch" (what means that?).
To ditch = slänga, kasta, göra sig av med (Swedish for "to throw away,
get rid of").
> Will the use of functions make the select into very slow?
Slower. I cannot say how much slower, but I would never use functions for
this situation. Since this appears to be a one-off, code maintainability
does not seem to be important.
Re: Transform/transfer 50Gb - how to do it fast?
am 29.04.2007 22:04:42 von Dan Guzman
> I assumed that it is not possible to create a partitioned tables from
> existing ones. But I have not worked much with partitioned tables, so
> I could be wrong.
It is possible to move a non-partitioned table (actually a single partition)
into a partitioned table with ALTER TABLE...SWITCH PARTITION. The
source and target tables must have the same schema (including indexes),
and the tables/indexes must reside on the same filegroup(s). Also, the
source table must have a check constraint on the partitioning column to
ensure the data is within the target partition boundaries.
One caveat is that the index stats are not updated when data is switched
into the partitioned table, so it's probably a good idea to update
statistics after the SWITCH.
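For illustration, a minimal sketch of the mechanics described above; the table, column, partition number, and boundary values are all hypothetical:

```sql
-- Source table needs a check constraint matching the target partition's boundaries
ALTER TABLE stage2007 ADD CONSTRAINT ck_range
      CHECK (orderdate >= '20070101' AND orderdate < '20080101');

-- Move the staged rows into partition 3 of the partitioned table (a metadata-only operation)
ALTER TABLE stage2007 SWITCH TO newtable PARTITION 3;

-- SWITCH does not update index statistics, so refresh them afterwards
UPDATE STATISTICS newtable;
```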
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Erland Sommarskog" wrote in message
news:Xns99216E9ABE6C8Yazorman@127.0.0.1...
>B D Jensen (bjorn.d.jensen@gmail.com) writes:
>> Functions are written in T-SQL (i also wrote them in CLR, but in this
>> case they were slower). The original columns have incorrect datatypes,
>> that uses too much storage, so the functions check that values are in
>> correct domain and if not they return null - what is a correct result,
>> because the values then are physical impossible.
>
> I would recommend that you have the expressions inline, at least if
> you desire to cut down execution time.
>
>> I wondered why you only wrote that I can't use "select into" for
>> patitioned tables.
>
> I assumed that it is not possible to create a partitioned tables from
> existing ones. But I have not worked much with partitioned tables, so
> I could be wrong.
>
>
Re: Transform/transfer 50Gb - how to do it fast?
am 29.04.2007 23:19:25 von Erland Sommarskog
Dan Guzman (guzmanda@nospam-online.sbcglobal.net) writes:
> It is possible to move a non-partitioned table (actually a single
> partition) into a partitioned table with ALTER TABLE...SWITCH PARTITION.
> The source/target table must have the same schema (including indexes)
> and table/indexes must reside on the same , filegroup(s). Also, the
> source table must have a check constraint on the partitioning column to
> ensure data is within the target partition boundaries.
Ah, that's great. That means it would be possible for Bjørn to create
his partitions with SELECT INTO, add the required indexes and
constraints, and then glue them together.
Thanks, Dan, for the information. ... I really need to start playing
with partitioning some day.
Re: Transform/transfer 50Gb - how to do it fast?
am 30.04.2007 06:19:39 von B D Jensen
Hi Dan!
Thanks for the details about requirements.
But I'm afraid I must then create newtbl first,
because the old table is in the PRIMARY filegroup.
And as I see it, there is no way of saying:
select id, func1(col1) into newtbl MYNEWFG from oldtbl.
So I think I have to look at unload/load now...
Best regards
Bjorn D. Jensen
Re: Transform/transfer 50Gb - how to do it fast?
am 30.04.2007 09:34:30 von Erland Sommarskog
B D Jensen (bjorn.d.jensen@gmail.com) writes:
> Thanks for the details about requirements.
> But I'm afraid I then must create the newtbl first,
> because the old table is in the Primary filegroup.
>
> And as I see there is no way for saying:
> select id, func1(col1) into newtbl MYNEWFG from oldtbl.
>
> So I think I have look at unload/load now....
But doesn't ALTER DATABASE permit you to specify a different filegroup as
the default filegroup? You could do that, and then your SELECT INTO tables
should end up there. At least that is what I would expect.
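Assuming a filegroup MYNEWFG already exists, the change suggested above would look something like this (the database name is hypothetical):

```sql
-- Make MYNEWFG the default filegroup, so SELECT INTO creates its tables there
ALTER DATABASE mydb MODIFY FILEGROUP MYNEWFG DEFAULT;

SELECT id, dbo.func1(col1) INTO newtbl FROM oldtbl;

-- Restore PRIMARY as the default filegroup afterwards, if desired
ALTER DATABASE mydb MODIFY FILEGROUP [PRIMARY] DEFAULT;
```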
Re: Transform/transfer 50Gb - how to do it fast?
am 30.04.2007 10:22:08 von B D Jensen
On 30 Apr., 09:34, Erland Sommarskog wrote:
> B D Jensen (bjorn.d.jen...@gmail.com) writes:
>
> > Thanks for the details about requirements.
> > But I'm afraid I then must create the newtbl first,
> > because the old table is in the Primary filegroup.
>
> > And as I see there is no way for saying:
> > select id, func1(col1) into newtbl MYNEWFG from oldtbl.
>
> > So I think I have look at unload/load now....
>
> But doesn't ALTER DATABASE permit you to specify a different filegroup as
> the default filegroup? You could do that, and then your SELECT INTO tables
> should end up there. At least that is what I would expect.
>
very, very good point!
/Bjorn
Re: Transform/transfer 50Gb - how to do it fast?
am 30.04.2007 11:04:46 von B D Jensen
On 29 Apr., 18:44, Erland Sommarskog wrote:
> B D Jensen (bjorn.d.jen...@gmail.com) writes:
>
> > I didn't understand the last part about "ditch" (what means that?).
>
> To ditch = slänga, kasta, göra sig av med.
>
> > Will the use of functions make the select into very slow?
>
> Slower. I cannot say how much slower, but I would never use functions for
> this situation. Since this appears to be a one-off, code maintainability
> does not seem to be important.
>
I made a comparison for the case of converting to tinyint
and wrote a loop going from -1 million to +1 million:

using a T-SQL function: 72 seconds
using BETWEEN directly (not in a function): 70 seconds; it's in
between ;^)
using <= and >= directly: 67 seconds
(and a CLR function: 2 min 27 seconds)

So you are right: not writing it in a separate function is faster (in
this case), so it depends (... ;^) on the situation whether the
difference is too costly.
Maybe it's one time only, but if you think you can reuse it, then
overall there is less typing and, more important, your code is much
more readable, because it becomes shorter and more natural to read.
And if one finds a better implementation, you can just replace it
without affecting dependent code.
Again: it depends ;^)
Best regards
Bjørn
Re: Transform/transfer 50Gb - how to do it fast?
am 30.04.2007 23:26:17 von Erland Sommarskog
B D Jensen (bjorn.d.jensen@gmail.com) writes:
> I made a comparison for the case of converting to tinyint and wrote a
> loop going from -1mio to +1mio using TSQL-function: 72 seconds using
> directly between (not in a function): 70 seconds ; it's in between ;^)
> using directly <= and >= : 67seconds
>
> So you are right not writing it in seperate function is faster (in
> this case),
I find it difficult to believe that there is any case where a scalar
T-SQL UDF would be faster.
Then again, with the numbers you present it's dubious whether you actually
have found a significant difference.
> (and CLR-function: 2min 27seconds)
With a more complex operation, you would have had a different outcome.
I once did a test where I had to convert zoned numbers with fixed
decimal from an AS400 system. In that case a CLR function was faster
than all T-SQL solutions. I think I have heard that when you have more
than four operations, the CLR pays off.
> Maybe it's one time only, but if you think you can reuse it, then at all
> there is less typing and more important: your code is much more
> readable, because it becomes shorter and much more natural too read.
Or you sit asking yourself "wonder what this function does".
> Again: it depends ;^)
True. That's the answer to almost all performance questions.
Re: Transform/transfer 50Gb - how to do it fast?
am 01.05.2007 11:08:57 von Mork69
On Apr 28, 4:55 pm, Erland Sommarskog wrote:
> This is not correct. INSERT SELECT FROM is a fully-logged operation. You are
> thinking of SELECT INTO which is a minimally logged operation. That is,
> all that is logged are the extent allocations. There are no write operations
> in SQL Server that are entirely non-logged.
Yes, sorry, I was clearly having a bad day, SELECT INTO is what I
meant.
Regarding the statement that it is a "non-logged" operation -
obviously all operations write to the transaction log in some way; I
was just using the term in general use, which was erroneously
introduced by Books Online. In any case, as the only records that are
written merely log the table's creation, the difference is somewhat
irrelevant in this context.
Re: Transform/transfer 50Gb - how to do it fast?
am 01.05.2007 13:42:57 von Erland Sommarskog
Mork69 (mleach@bigfoot.com) writes:
> Regarding the statement that it is a "non logged" operation -
> obviously all operations write to the transaction log in some way, I
> was just using the term that is in general use that was erroneously
> started by Books Online.
Actually, not even that. Books Online for SQL 2000 is very careful to
talk about minimally logged. I looked in Books Online for SQL 6.5,
which indeed talks about non-logged, but that was loooong ago. And the
architecture was different way back then.
> In any case, as the only records that are written are merely to log a
> table's creation.
Not only. The extent allocations are also logged. If they weren't, and
the operation failed on an illegal convert operation half-way through,
you would be left with a table that has a couple of rows in it.