optimization strategies based on file-level storage

optimization strategies based on file-level storage

am 14.06.2011 10:44:47 von Bennett Haselton

I'm looking for some tips & tricks documentation that explains how
different data types in rows are stored at the file level (in MyISAM
tables, at least), and how to optimize tables for faster queries,
updates, table definition modification, etc. based on this knowledge.

For example, I've heard that if all of your columns are fixed-length,
that makes it faster to find individual rows since row N is located
at position N*rowlength in the file. (And, presumably, having just
one variable-length column will slow things down considerably.) But
I've also read that having TEXT and BLOB columns will also slow down
row-finding operations. This seems to make no sense because I
thought TEXT was not actually stored in the row, but the row just
stored a constant-length reference to the TEXT whose actual data was
stored somewhere else. Is this correct? Then is it incorrect to say
that a TEXT column will slow down the locate-row-N operation,
assuming all other columns are fixed-length? This is the kind of
thing I'm looking for a document to explain.

Another example: It sometimes takes me an extremely long time to add
new columns to a table. What if I had a table with all fixed-length
columns, and I "reserved" some space at the end of each row to be
used for columns to be added in the future. Would it then be
possible to add new columns much more quickly? You wouldn't have to
move around the existing row data to make room for the new column
(although presumably you would still have to *write* to the place in
reach row where the new column had just been defined, to fill it in
with its default value).

In particular, I'm not looking for a list of optimization tricks, so
much as a document that explains how the rows are stored at the file
level, and thereby explains how the optimization tricks *follow
logically from* this information. The reason is that if I just have
a grab-bag of optimization hints (of which I've found many on the
Web), some of them will be not applicable to my situation, or just
plain wrong, and I'll have no way of knowing which ones. But if you
know *why* something works, you can more easily figure out if it
applies to your situation.

-Bennett


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: optimization strategies based on file-level storage

am 14.06.2011 16:29:17 von Bennett Haselton

At 05:46 AM 6/14/2011, Carlos Eduardo Caldi wrote:
>Hello Bennett
>
>
>On the Mysql developer site have a grate documentation, try the
>links above.
>
>http://dev.mysql.com/doc/refman/5.0/en/optimizing-database- structure.html
>http://dev.mysql.com/doc/refman/5.0/en/data-size.html

Thanks, this gets me a little closer to the answer but doesn't really
provide the level of detail that I'm looking for. For example, it
says: "For MyISAM tables, if you do not have any variable-length
columns (VARCHAR, TEXT, or BLOB columns), a fixed-size row format is
used. This is faster..." I still don't understand: If TEXT and BLOB
columns are stored not by putting the data in the row but by having
the row store a reference to the TEXT/BLOB data stored somewhere
else, then can't a row with TEXT and BLOB data types *still* be a
fixed-size row, with the resulting increased speed?

My main motivation is that I have a table with 22 million records and
it takes a few hours for me to add a new column to it. I suspect
this would be faster if I designed the table correctly from the
beginning, and I want to change it to a smarter design, but I only
want to do it once. So I want to understand really thoroughly *why*
a different design would make it faster to complete the table
modifications. (For example, the question I asked earlier about
whether you can declare extra space at the end of each row that is
"reserved for future columns".)


>Att.
>Carlos,
>
> > Date: Tue, 14 Jun 2011 01:44:47 -0700
> > To: mysql@lists.mysql.com
> > From: bennett@peacefire.org
> > Subject: optimization strategies based on file-level storage
> >
> > I'm looking for some tips & tricks documentation that explains how
> > different data types in rows are stored at the file level (in MyISAM
> > tables, at least), and how to optimize tables for faster queries,
> > updates, table definition modification, etc. based on this knowledge.
> >
> > For example, I've heard that if all of your columns are fixed-length,
> > that makes it faster to find individual rows since row N is located
> > at position N*rowlength in the file. (And, presumably, having just
> > one variable-length column will slow things down considerably.) But
> > I've also read that having TEXT and BLOB columns will also slow down
> > row-finding operations. This seems to make no sense because I
> > thought TEXT was not actually stored in the row, but the row just
> > stored a constant-length reference to the TEXT whose actual data was
> > stored somewhere else. Is this correct? Then is it incorrect to say
> > that a TEXT column will slow down the locate-row-N operation,
> > assuming all other columns are fixed-length? This is the kind of
> > thing I'm looking for a document to explain.
> >
> > Another example: It sometimes takes me an extremely long time to add
> > new columns to a table. What if I had a table with all fixed-length
> > columns, and I "reserved" some space at the end of each row to be
> > used for columns to be added in the future. Would it then be
> > possible to add new columns much more quickly? You wouldn't have to
> > move around the existing row data to make room for the new column
> > (although presumably you would still have to *write* to the place in
> > reach row where the new column had just been defined, to fill it in
> > with its default value).
> >
> > In particular, I'm not looking for a list of optimization tricks, so
> > much as a document that explains how the rows are stored at the file
> > level, and thereby explains how the optimization tricks *follow
> > logically from* this information. The reason is that if I just have
> > a grab-bag of optimization hints (of which I've found many on the
> > Web), some of them will be not applicable to my situation, or just
> > plain wrong, and I'll have no way of knowing which ones. But if you
> > know *why* something works, you can more easily figure out if it
> > applies to your situation.
> >
> > -Bennett
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe: http://lists.mysql.com/mysql?unsub=ce_caldi@hotmail.com
> >


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: optimization strategies based on file-level storage

am 14.06.2011 20:45:17 von Johan De Meersman

----- Original Message -----
> From: "Bennett Haselton"
>
> modifications. (For example, the question I asked earlier about
> whether you can declare extra space at the end of each row that is
> "reserved for future columns".)

That question I can answer: you can't "reserve" space, but if you know what kind of rows you'll want to add later you can pre-add them (and incur the accompanying storage cost), and simply rename them appropriately later.

ALTER TABLE will do a full re-write of your table when that is necessary (like adding or deleting columns), but will (likely) not do so when you're just renaming an existing column. Look hard enough on the web and you can find info on how to hack the .frm files, too, so some operations that would do a full re-write don't - but that's fishy business, and always at your own risk.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: optimization strategies based on file-level storage

am 15.06.2011 11:17:49 von Bennett Haselton

At 11:45 AM 6/14/2011, Johan De Meersman wrote:


>----- Original Message -----
> > From: "Bennett Haselton"
> >
> > modifications. (For example, the question I asked earlier about
> > whether you can declare extra space at the end of each row that is
> > "reserved for future columns".)
>
>That question I can answer: you can't "reserve" space, but if you
>know what kind of rows you'll want to add later you can pre-add them
>(and incur the accompanying storage cost), and simply rename them
>appropriately later.

Thanks. It would be more flexible if I could declare, say, 50 bytes,
and decide later if I wanted to use them for a datetime, a char(n),
or an int, but this is still helpful :)

Do you happen to know the answer to my other problem -- if I have
TEXT and BLOB columns but all my other columns are fixed-length, can
I still get the benefit of faster lookups resulting from fixed-length
rows, if each row just contains a fixed-length reference to the
actual TEXT or BLOB data which is stored elsewhere?

-Bennett


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: optimization strategies based on file-level storage

am 17.06.2011 07:47:09 von Johan De Meersman

----- Original Message -----
> From: "Bennett Haselton"
>
> Do you happen to know the answer to my other problem -- if I have
> TEXT and BLOB columns but all my other columns are fixed-length, can
> I still get the benefit of faster lookups resulting from fixed-length
> rows, if each row just contains a fixed-length reference to the
> actual TEXT or BLOB data which is stored elsewhere?

entirely unsure, but given the single-datafile structure of MyISAM tables, I strongly suspect BLOBS get stored inline.

For InnoDB, the answer appears to be "it varies":
If a row is less than half a page long, all of it is stored locally
within the page. If it exceeds half a page, variable-length columns
are chosen for external off-page storage until the row fits within
half a page. For a column chosen for off-page storage, InnoDB stores
the first 768 bytes locally in the row, and the rest externally into
overflow pages.

See http://dev.mysql.com/doc/refman/5.0/en/innodb-file-space.htm l for more on that.

Also, I *think* the concept of "fixed-length" rows is only applicable to MyISAM, InnoDB has index-organised tables - that is to say, it stores all row data in the leaves of the primary index. The consequence, of course, is that no additional pointer lookup gets done for primary key selects; the tradeoff is that all nonprimary key lookups get detoured through the primary key.


The online documentation is really pretty good; but for the really low-level things, I guess the best documentation may be the source.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: optimization strategies based on file-level storage

am 17.06.2011 10:19:59 von Bennett Haselton

At 10:47 PM 6/16/2011, Johan De Meersman wrote:


>----- Original Message -----
> > From: "Bennett Haselton"
> >
> > Do you happen to know the answer to my other problem -- if I have
> > TEXT and BLOB columns but all my other columns are fixed-length, can
> > I still get the benefit of faster lookups resulting from fixed-length
> > rows, if each row just contains a fixed-length reference to the
> > actual TEXT or BLOB data which is stored elsewhere?
>
>entirely unsure, but given the single-datafile structure of MyISAM
>tables, I strongly suspect BLOBS get stored inline.

I can't find a source that says for sure.
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements. html
does say: "The maximum size of a row in a MyISAM table is 65,535
bytes. This figure excludes BLOB or TEXT columns, which contribute
only 9 to 12 bytes toward this size... For BLOB and TEXT data, the
information is stored internally in a different area of memory than
the row buffer." But that's talking about memory, not disk. When
people talk about performance improvements from using fixed-length
rows, are they talking primarily about memory or hard disk?

Hold up though, I just got this reply from posting the question in a forum:
http://forums.mysql.com/read.php?21,423433,423846
which says "Almost always the discussions recommending Fixed length
records in MyISAM are myths. The apparent performance improvement is
swamped by the performance loss of shoveling around the wasted
padding bytes" and goes on to give reasons.

Actually, that does make sense that it's a myth. I was surprised to
hear so many sources claiming that there was a big performance
increase from being able to find row N by jumping to position
N*rowlength. Because even with variable-length rows, you can just
store a table associating row numbers with the position of the row in
the file, can't you -- which would mean it would only take one
near-instantaneous lookup to be able to jump to the row you're looking for.

What I was really trying to figure out was why it takes me 4 hours to
add a new column to my 22-million-row table, and whether a different
table design can avoid that problem. That reply in the forum says,
"ALTER TABLE ... ADD COLUMN will always copy the entire table over,
and rebuild all the indexes. (And, effectively, do an OPTIMIZE.) Do
_not_ leave space for extra columns, it won't help." I'm about to
reply and point out the trick that you suggested to me: create dummy
columns early and then just rename them later :)

-Bennett


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: optimization strategies based on file-level storage

am 17.06.2011 16:11:11 von Jerry Schwartz

>-----Original Message-----


>What I was really trying to figure out was why it takes me 4 hours to
>add a new column to my 22-million-row table, and whether a different
>table design can avoid that problem. That reply in the forum says,
>"ALTER TABLE ... ADD COLUMN will always copy the entire table over,
>and rebuild all the indexes. (And, effectively, do an OPTIMIZE.) Do
>_not_ leave space for extra columns, it won't help." I'm about to
>reply and point out the trick that you suggested to me: create dummy
>columns early and then just rename them later :)
>
> -Bennett
>
[JS] They will be too small, or the wrong type, or there won't be enough of
them. Based upon 30+ years of database design, I'd bet money on it. ;-)

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com







--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: optimization strategies based on file-level storage

am 17.06.2011 18:02:19 von mussatto

------=_20110617090219_57342
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit



On Fri, June 17, 2011 07:11, Jerry Schwartz wrote:
>>-----Original Message-----
>
>
>>What I was really trying to figure out was why it takes me 4
hours to
>>add a new column to my 22-million-row table, and
whether a different
>>table design can avoid that problem.
That reply in the forum says,
>>"ALTER TABLE ... ADD
COLUMN will always copy the entire table over,
>>and rebuild
all the indexes. (And, effectively, do an OPTIMIZE.) Do
>>_not_
leave space for extra columns, it won't help." I'm about to
>>reply and point out the trick that you suggested to me: create
dummy
>>columns early and then just rename them later :)
>>
>> -Bennett
>>
> [JS]
They will be too small, or the wrong type, or there won't be enough
> of
> them. Based upon 30+ years of database design, I'd bet
money on it. ;-)
>
> Regards,
>
> Jerry
Schwartz
> Global Information Incorporated
The only
"alternative design" would be to create another table with the
added columns and a common key field and then lock the primary table and
populate it with the keys from the original table, and I'm not convinced
that would be any faster or less disruptive.
------
William R.
Mussatto
Systems Engineer
http://www.csz.com
909-920-9154

------=_20110617090219_57342--