how things get messed up

how things get messed up

am 10.02.2010 20:09:23 von jheim

About 5 years ago, I was asked to write a php app for my department. The app
keeps track of graduate school applicants to my department at the
university. The main data elements are the scores each professor gives to
each applicant. There are only about 400 applicants each year so even with
all the personal data, scores, transcripts, etc for each student, it's not
much. for the first 2 years, it was under a meg of data. Well, then the
selection committee asked me to add something so that if a student e-mailed
the department a document, say a paper he'd written or a photo of himself,
or whatever, it could be tacked on to the info they saw about him while
grading the applicant.

So I said, "Well, there is only going to be maybe 10 or 20 of those a year.
And even if all 400 applicants submit a PDF of a paper they'd written, it
would be only 400 docs. 4,000 after 10 years. Yeah, lets just create a
documents table in the database and store them in mysql."

For the first 2 years, only 2 students sent in documents to attach to their
application. I figured I'd wasted my time. Then the next year, the graduate
school changed their web application form to allow students to upload
documents. "Fine," I said, "My worst case scenario has already come true.
But, well, this is why you plan for the worst case."

Then they started taking letters of recommendation as PDF documents. In
fact, they started requiring PDF docs. Each student has 3 to 6 letters of
recommendation. All in all, I figure we're at about 100 times as many docs
in our database as I originally expected and about 10x my worst case
scenario.

I should either be fired or shot. Maybe fired *then* shot. Actually, its
not as bad as all that. I can pretty easily write a perl script to export
the docs to files and access them via a network mounted filesystem. After
all, saving myself 5 hours of work 5 years ago is worth what? -- maybe
10hours today? It is amazing how often quick & dirty turns out just being
dirty in the end.




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 10.02.2010 20:38:08 von Buford Tannen

John G. Heim wrote:
> About 5 years ago,...
> then the selection committee asked me to add something ...
> Then the next year, the graduate school changed their ...
> Then they started taking letters of recommendation ...

One might think of this as scope-creep, but really, given the
length of time over which it happened, it seems more like a
natural progression of more work begotten as a consequence of
past good work and impressive results. I hope you were
commensurately recognized for your valuable contribution to your
employer's operation!

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: how things get messed up

am 10.02.2010 21:30:53 von Ilya Kazakevich

Hello John,

There was a joke in russian PHP club: "why strore images in database? Are
you going to have a full text search on them?"

IMHO:
Storing files in DB is probably bad idea.
Here are some advantages of storing files on filesystem:
1) Theoretically it takes more time to load data from database then to load
it directly from filesystem if data is huge (we are not speaking about
databases, installed on raw disks)
2) while storing files on fs, you can access them via web server and ftp and
network fs and what-ever-you-like. Almost all servers for your platform
supports filesystem, but not database.
Yes, its possible to create script to access data from db via web, but you
have to create it and support HTTP cache (modified-sience, e-tag and so on)
and you also need to store modification date in db (to make this cache work)
and so on. If you store files as files -- you have filesystem and servers to
access them. If you use db as filesystem -- you should create servers by
yourself. Looks like invention of wheel.
3) Tables with BLOB are not so flexible, i.e. HEAP tables do not support
such fields. So, in some situation you'll have "filesort" that you cannt
fix.
4) You can not easly do "SELECT *" from table with 10.000 rows if each row
contains field with 200Kb pdf file.

And there is only one disadvantage: consistency. When deleting user -- all
her data could be deleted automatically using foreign keys, but you have to
do it in your app if files are stored externally.

If somebody knows more reasons to store files in DB -- post it here, please.
It would be interesting. Thanks.

Ilya.

-----Original Message-----
From: John G. Heim [mailto:jheim@math.wisc.edu]
Sent: Wednesday, February 10, 2010 10:09 PM
To: mysql@lists.mysql.com
Subject: how things get messed up

About 5 years ago, I was asked to write a php app for my department. The app
keeps track of graduate school applicants to my department at the
university. The main data elements are the scores each professor gives to
each applicant. There are only about 400 applicants each year so even with
all the personal data, scores, transcripts, etc for each student, it's not
much. for the first 2 years, it was under a meg of data. Well, then the
selection committee asked me to add something so that if a student e-mailed
the department a document, say a paper he'd written or a photo of himself,
or whatever, it could be tacked on to the info they saw about him while
grading the applicant.

So I said, "Well, there is only going to be maybe 10 or 20 of those a year.
And even if all 400 applicants submit a PDF of a paper they'd written, it
would be only 400 docs. 4,000 after 10 years. Yeah, lets just create a
documents table in the database and store them in mysql."

For the first 2 years, only 2 students sent in documents to attach to their
application. I figured I'd wasted my time. Then the next year, the graduate
school changed their web application form to allow students to upload
documents. "Fine," I said, "My worst case scenario has already come true.
But, well, this is why you plan for the worst case."

Then they started taking letters of recommendation as PDF documents. In
fact, they started requiring PDF docs. Each student has 3 to 6 letters of
recommendation. All in all, I figure we're at about 100 times as many docs
in our database as I originally expected and about 10x my worst case
scenario.

I should either be fired or shot. Maybe fired *then* shot. Actually, its
not as bad as all that. I can pretty easily write a perl script to export
the docs to files and access them via a network mounted filesystem. After
all, saving myself 5 hours of work 5 years ago is worth what? -- maybe
10hours today? It is amazing how often quick & dirty turns out just being
dirty in the end.




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/mysql?unsub=kazakevich@devexperts.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 11.02.2010 04:53:22 von Cui Shijun

Hello Ilya,
I think store files in DB has no complicated reasons, just for
convenience. For example, I might make files automatically be
backup-ed, and no budget for an independent backup solution. Thanks.
Cui

2010/2/11 Ilya Kazakevich :
> Hello John,
>
> There was a joke in russian PHP club: "why strore images in database? Are
> you going to have a full text search on them?"
>
> IMHO:
> Storing files in DB is probably bad idea.
> Here are some advantages of storing files on filesystem:
> 1) Theoretically it takes more time to load data from database then to lo=
ad
> it directly from filesystem if data is huge (we are not speaking about
> databases, installed on raw disks)
> 2) while storing files on fs, you can access them via web server and ftp =
and
> network fs and what-ever-you-like. Almost all servers for your platform
> supports filesystem, but not database.
> Yes, its possible to create script to access data from db via web, but yo=
u
> have to create it and support HTTP cache (modified-sience, e-tag and so o=
n)
> and you also need to store modification date in db (to make this cache wo=
rk)
> and so on. If you store files as files -- you have filesystem and servers=
to
> access them. If you use db as filesystem -- you should create servers by
> yourself. Looks like invention of wheel.
> 3) Tables with BLOB are not so flexible, i.e. HEAP tables do not support
> such fields. So, in some situation you'll have "filesort" that you cannt
> fix.
> 4) You can not easly do "SELECT *" from table with 10.000 rows if each ro=
w
> contains field with 200Kb pdf file.
>
> And there is only one disadvantage: consistency. When deleting user -- al=
l
> her data could be deleted automatically using foreign keys, but you have =
to
> do it in your app if files are stored externally.
>
> If somebody knows more reasons to store files in DB -- post it here, plea=
se.
> It would be interesting. Thanks.
>
> Ilya.
>
> -----Original Message-----
> From: John G. Heim [mailto:jheim@math.wisc.edu]
> Sent: Wednesday, February 10, 2010 10:09 PM
> To: mysql@lists.mysql.com
> Subject: how things get messed up
>
> About 5 years ago, I was asked to write a php app for my department. The =
app
> keeps track of graduate school applicants to my department at the
> university. The main data elements are the scores each professor gives to
> each applicant. There are only about 400 applicants each year so even wit=
h
> all the personal data, scores, transcripts, =A0etc for each student, it's=
not
> much. for the first 2 years, it was under a meg of data. Well, then the
> selection committee asked me to add something so that if a student e-mail=
ed
> the department a document, say a paper he'd written or a photo of himself=
,
> or whatever, it could be tacked on to the info they saw about him while
> grading the applicant.
>
> So I said, "Well, there is only going to be maybe 10 or 20 of those a yea=
r.
> And even if all 400 applicants submit a PDF of a paper they'd written, it
> would be only 400 docs. 4,000 after 10 years. Yeah, lets just create a
> documents table in the database and store them in mysql."
>
> For the first 2 years, only 2 students sent in documents to attach to the=
ir
> application. I figured I'd wasted my time. Then the next year, the gradua=
te
> school changed their =A0web application form to allow students to upload
> documents. "Fine," I said, "My worst case scenario has already come true.
> But, well, this is why you plan for the worst case."
>
> Then they started taking letters of recommendation as PDF documents. In
> fact, they started requiring PDF docs. Each student has 3 to 6 letters of
> recommendation. All in all, I figure we're at about 100 times as many doc=
s
> in our database as I originally expected and about 10x my worst case
> scenario.
>
> I should either be fired or shot. =A0Maybe fired *then* shot. Actually, i=
ts
> not as bad as all that. I can pretty easily write a perl script to export
> the docs to files and access them via a network mounted filesystem. After
> all, saving myself 5 hours of work 5 years ago is worth what? -- maybe
> 10hours today? It is amazing how often quick & dirty turns out just being
> dirty in the end.
>
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/mysql?unsub=3Dkazakevich@devexperts.c om
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: =A0 =A0http://lists.mysql.com/mysql?unsub=3Drancpine@gmai=
l.com
>
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: how things get messed up

am 11.02.2010 15:56:23 von Martijn Tonies

Hello John,

> About 5 years ago, I was asked to write a php app for my department. The
> app keeps track of graduate school applicants to my department at the
> university. The main data elements are the scores each professor gives to
> each applicant. There are only about 400 applicants each year so even with
> all the personal data, scores, transcripts, etc for each student, it's
> not much. for the first 2 years, it was under a meg of data. Well, then
> the selection committee asked me to add something so that if a student
> e-mailed the department a document, say a paper he'd written or a photo of
> himself, or whatever, it could be tacked on to the info they saw about him
> while grading the applicant.
>
> So I said, "Well, there is only going to be maybe 10 or 20 of those a
> year. And even if all 400 applicants submit a PDF of a paper they'd
> written, it would be only 400 docs. 4,000 after 10 years. Yeah, lets just
> create a documents table in the database and store them in mysql."
>
> For the first 2 years, only 2 students sent in documents to attach to
> their application. I figured I'd wasted my time. Then the next year, the
> graduate school changed their web application form to allow students to
> upload documents. "Fine," I said, "My worst case scenario has already come
> true. But, well, this is why you plan for the worst case."
>
> Then they started taking letters of recommendation as PDF documents. In
> fact, they started requiring PDF docs. Each student has 3 to 6 letters of
> recommendation. All in all, I figure we're at about 100 times as many docs
> in our database as I originally expected and about 10x my worst case
> scenario.
>
> I should either be fired or shot. Maybe fired *then* shot. Actually, its
> not as bad as all that. I can pretty easily write a perl script to export
> the docs to files and access them via a network mounted filesystem. After
> all, saving myself 5 hours of work 5 years ago is worth what? -- maybe
> 10hours today? It is amazing how often quick & dirty turns out just being
> dirty in the end.

Not sure what the problem is really... What are you running into?

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

Download Database Workbench for Oracle, MS SQL Server, Sybase SQL
Anywhere, MySQL, InterBase, NexusDB and Firebird!

Database questions? Check the forum:
http://www.databasedevelopmentforum.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 11.02.2010 16:11:33 von David Giragosian

--0016e65b52f2fc86e4047f5490df
Content-Type: text/plain; charset=UTF-8

On Thu, Feb 11, 2010 at 8:56 AM, Martijn Tonies wrote:

> Hello John,
>
> About 5 years ago, I was asked to write a php app for my department. The
>> app keeps track of graduate school applicants to my department at the
>> university. The main data elements are the scores each professor gives to
>> each applicant. There are only about 400 applicants each year so even with
>> all the personal data, scores, transcripts, etc for each student, it's not
>> much. for the first 2 years, it was under a meg of data. Well, then the
>> selection committee asked me to add something so that if a student e-mailed
>> the department a document, say a paper he'd written or a photo of himself,
>> or whatever, it could be tacked on to the info they saw about him while
>> grading the applicant.
>>
>> So I said, "Well, there is only going to be maybe 10 or 20 of those a
>> year. And even if all 400 applicants submit a PDF of a paper they'd written,
>> it would be only 400 docs. 4,000 after 10 years. Yeah, lets just create a
>> documents table in the database and store them in mysql."
>>
>> For the first 2 years, only 2 students sent in documents to attach to
>> their application. I figured I'd wasted my time. Then the next year, the
>> graduate school changed their web application form to allow students to
>> upload documents. "Fine," I said, "My worst case scenario has already come
>> true. But, well, this is why you plan for the worst case."
>>
>> Then they started taking letters of recommendation as PDF documents. In
>> fact, they started requiring PDF docs. Each student has 3 to 6 letters of
>> recommendation. All in all, I figure we're at about 100 times as many docs
>> in our database as I originally expected and about 10x my worst case
>> scenario.
>>
>> I should either be fired or shot. Maybe fired *then* shot. Actually, its
>> not as bad as all that. I can pretty easily write a perl script to export
>> the docs to files and access them via a network mounted filesystem. After
>> all, saving myself 5 hours of work 5 years ago is worth what? -- maybe
>> 10hours today? It is amazing how often quick & dirty turns out just being
>> dirty in the end.
>>
>
> Not sure what the problem is really... What are you running into?
>

I think John is just sharing an experience - a lesson learned if you will.
With the same spirit in mind, many projects in my work culture begin with a
specification of, "Just put up anything so our (internal) users can react to
it." Talk about vague. Geesh! However, a senior programmer told me years
ago that the life of a programmer is often filled with doing, undoing, and
redoing. And not enough appreciation for the work involved. I try to keep
that in mind.

David

--0016e65b52f2fc86e4047f5490df--

RE: how things get messed up

am 11.02.2010 17:24:24 von Jerry Schwartz

>-----Original Message-----
>From: Cui Shijun [mailto:rancpine@gmail.com]
>Sent: Wednesday, February 10, 2010 10:53 PM
>To: Ilya Kazakevich
>Cc: John G. Heim; mysql@lists.mysql.com
>Subject: Re: how things get messed up
>
>Hello Ilya,
> I think store files in DB has no complicated reasons, just for
>convenience. For example, I might make files automatically be
>backup-ed, and no budget for an independent backup solution. Thanks.
> Cui
>
[JS] Storing BLOBs in a table must slow down the inner workings of the
database engine. After all, the tables are really disk files under the covers.
Can the database engine read only parts of a file record?

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com








--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: how things get messed up

am 11.02.2010 17:24:24 von Jerry Schwartz

>-----Original Message-----
>From: John G. Heim [mailto:jheim@math.wisc.edu]
>Sent: Wednesday, February 10, 2010 2:09 PM
>To: mysql@lists.mysql.com
>Subject: how things get messed up
>It is amazing how often quick & dirty turns out just being
>dirty in the end.
>
>
[JS] Hee-hee... How right you are. I've had the pleasure of writing "perfect"
programs whose logic I was unable to understand a week later.

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com







--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 11.02.2010 17:46:47 von Joerg Bruehe

Hi Ilya, everybody!


Ilya Kazakevich wrote:
> [[...]]
>=20
> IMHO:
> Storing files in DB is probably bad idea.
> Here are some advantages of storing files on filesystem:
> [[...]]

We could discuss them individually, but I agree several of your point=
s
are valid. The remaining question is which importance you assign to t=
hem.

>=20
> And there is only one disadvantage: consistency. When deleting user=
-- all
> her data could be deleted automatically using foreign keys, but you=
have to
> do it in your app if files are stored externally.

"Only one ...: consistency": I contradict your use of "only", very st=
rongly.
Probably that is because I value consistency extremely high, definite=
ly
higher than ease of programming and performance.

If you keep related data in the same storage mechanism (say, the
database), then the chances of dealing with them in a consistent mann=
er
improve greatly. This shows when you think about access privileges,
referential integrity, export/import, backup/restore, migration to a =
new
system, replication, ...

By keeping your data in related tables (nobody forces you to have the
images in the person table, using the same ID in two tables and defin=
ing
referential integrity constraints is sufficient), you have a much bet=
ter
chance to get the application and the handling correct.
If your data are valuable (and applicant data are - just consider the
costs of a law suit when a rejected applicant goes to court), that
correctness is an essential requirement.

>=20
> [[...]]
>=20

Regards,
Jörg

--=20
Joerg Bruehe, MySQL Build Team, Joerg.Bruehe@Sun.COM
Sun Microsystems GmbH, Komturstraße 18a, D-12099 Berlin
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels
Vorsitzender des Aufsichtsrates: Martin Haering Muenchen: HRB1610=
28


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: how things get messed up

am 11.02.2010 18:03:45 von Jochem van Dieten

On 2/10/10, Ilya Kazakevich wrote:
> There was a joke in russian PHP club: "why strore images in database? Are
> you going to have a full text search on them?"

Yes. That is what EXIF data is for, isn't it?

And considering this is about PDFs any inability of a database engine
to do a full text search on them surely is a limitation of that
database, not a conceptual disqualification of storing binary data in
a database.

Jochem


--
Jochem van Dieten
http://jochem.vandieten.net/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 11.02.2010 18:27:30 von Martijn Tonies

>> I think store files in DB has no complicated reasons, just for
>>convenience. For example, I might make files automatically be
>>backup-ed, and no budget for an independent backup solution. Thanks.
>> Cui
>>
> [JS] Storing BLOBs in a table must slow down the inner workings of the
> database engine. After all, the tables are really disk files under the
> covers.
> Can the database engine read only parts of a file record?
>
> Jerry Schwartz

Well, there's where things go wrong -- first of all, a "database engine"
can do -anything- it likes with the actual data.

For example, the Firebird DBMS stores (longer) Blob data not right
there in the record, so whenever you don't request the blob (that is,
not selecting it), it ignores it completely and it can go through the file
quickly.

If BLOB data makes the DBMS slowish by default, it's an implementation
issue that needs to be fixed.

Conceptually, there's much to say -for- storing binary data in database
tables,
see the post by Joerg for example.

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

Download Database Workbench for Oracle, MS SQL Server, Sybase SQL
Anywhere, MySQL, InterBase, NexusDB and Firebird!

Database questions? Check the forum:
http://www.databasedevelopmentforum.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: how things get messed up

am 11.02.2010 23:42:03 von Jerry Schwartz

>> [JS] Storing BLOBs in a table must slow down the inner workings of the
>> database engine. After all, the tables are really disk files under the
>> covers.
>> Can the database engine read only parts of a file record?
>>
>> Jerry Schwartz
>
>Well, there's where things go wrong -- first of all, a "database engine"
>can do -anything- it likes with the actual data.
>
>For example, the Firebird DBMS stores (longer) Blob data not right
>there in the record, so whenever you don't request the blob (that is,
>not selecting it), it ignores it completely and it can go through the file
>quickly.
>
[JS] That's an interesting idea that hadn't occurred to me. Essentially, it's
creating a dependent table behind your back.

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com








--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 12.02.2010 07:56:13 von Johan De Meersman

--0016369205f66a493f047f61c3e1
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Feb 11, 2010 at 5:24 PM, Jerry Schwartz
wrote:

> [JS] Storing BLOBs in a table must slow down the inner workings of the
> database engine. After all, the tables are really disk files under the
> covers.
> Can the database engine read only parts of a file record?
>

They recently invented this funky little thing called "random access". No
more need to spool your tape forward for ten minutes before finding the data
you want.

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--0016369205f66a493f047f61c3e1--

Re: how things get messed up

am 12.02.2010 08:05:26 von Martijn Tonies

>>> [JS] Storing BLOBs in a table must slow down the inner workings of the
>>> database engine. After all, the tables are really disk files under the
>>> covers.
>>> Can the database engine read only parts of a file record?
>>>
>>> Jerry Schwartz
>>
>>Well, there's where things go wrong -- first of all, a "database engine"
>>can do -anything- it likes with the actual data.
>>
>>For example, the Firebird DBMS stores (longer) Blob data not right
>>there in the record, so whenever you don't request the blob (that is,
>>not selecting it), it ignores it completely and it can go through the file
>>quickly.
>>
> [JS] That's an interesting idea that hadn't occurred to me. Essentially,
> it's
> creating a dependent table behind your back.
> Jerry Schwartz

Firebird uses 1 or multiple files per database, not per table, this file has
a special "blob area", so to speak, and the records include a "blob ID".

When a client/stored routine selects a blob that isn't available in the
record
data (stored "in between"), it then goes to the area where blobs are stored
and gets it from there. Works quite well.

As I said, a DBMS can do anything it likes with it's data as long as it
returns
it correctly.

I'm surprised there are DBMSses that don't do this, Firebird comes from
InterBase comes from Groton Database Systems which was the first
with BLOBs and they had the idea right, so it seems ;-) Enjoy the read:
http://www.ibphoenix.com/main.nfs?a=ibphoenix&page=ibp_blob_ history


With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

Download Database Workbench for Oracle, MS SQL Server, Sybase SQL
Anywhere, MySQL, InterBase, NexusDB and Firebird!

Database questions? Check the forum:
http://www.databasedevelopmentforum.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 12.02.2010 08:23:00 von Johan De Meersman

--001636c5971b349103047f62231b
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Feb 12, 2010 at 8:05 AM, Martijn Tonies wrote:

> Firebird uses 1 or multiple files per database, not per table, this file
> has
> a special "blob area", so to speak, and the records include a "blob ID".
>
> When a client/stored routine selects a blob that isn't available in the
> record
> data (stored "in between"), it then goes to the area where blobs are stored
> and gets it from there. Works quite well.
>
> As I said, a DBMS can do anything it likes with it's data as long as it
> returns
> it correctly.
>

As a matter of interest, FaceBook has basically written their own blob
engine for the storage of userpics. They basically append every new picture
to the end of a huge binary file (probably a raw device, really), and simply
put the start and end byte of the picture in an index.

Every webserver (you can imagine they have a few) that hasn't got a
requested image in it's local cache yet, then simply does a seek() to the
start byte and reads the necessary number of bytes to fetch the image.
Highly efficient, and very simple.

Also of interest is that they never actually clear images from the binary
storage - it would be too slow to reclaim the free space, and it's faster
and cheaper for them to just keep adding storage.

The main performance benefit of this over files-on-disk might well be that
you don't have the overhead of a filesystem, and thus no constant open/close
operations - the image serving daemon keeps the file open at all times.

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--001636c5971b349103047f62231b--

Re: how things get messed up

am 12.02.2010 08:27:57 von Martijn Tonies

>>On Fri, Feb 12, 2010 at 8:05 AM, Martijn Tonies
>>wrote:
>>Firebird uses 1 or multiple files per database, not per table, this file
>>has
>>a special "blob area", so to speak, and the records include a "blob ID".
>>
>>When a client/stored routine selects a blob that isn't available in the
>>record
>>data (stored "in between"), it then goes to the area where blobs are
>>stored
>>and gets it from there. Works quite well.
>>
>>As I said, a DBMS can do anything it likes with it's data as long as it
>>returns
>>it correctly.
>
>As a matter of interest, FaceBook has basically written their own blob
>engine for the storage of userpics. They basically append every new picture
> >to the end of a huge binary file (probably a raw device, really), and
>simply put the start and end byte of the picture in an index.
>
>Every webserver (you can imagine they have a few) that hasn't got a
>requested image in it's local cache yet, then simply does a seek() to the
>start >byte and reads the necessary number of bytes to fetch the image.
>Highly efficient, and very simple.
>
>Also of interest is that they never actually clear images from the binary
>storage - it would be too slow to reclaim the free space, and it's faster
>and >cheaper for them to just keep adding storage.
>
>
>The main performance benefit of this over files-on-disk might well be that
>you don't have the overhead of a filesystem, and thus no constant
> >open/close operations - the image serving daemon keeps the file open at
>all times.

Sounds logical, what's also nice to see, is that even though people here
tend to say "don't put binaries in the database", apparently Facebook
thought it would be nice to do so (for all sorts of reasons) and even took
the time to write their own blob storage mechanism ;-)

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

Download Database Workbench for Oracle, MS SQL Server, Sybase SQL
Anywhere, MySQL, InterBase, NexusDB and Firebird!

Database questions? Check the forum:
http://www.databasedevelopmentforum.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 12.02.2010 08:39:32 von Johan De Meersman

--0016364d31074dc980047f625e5a
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Feb 12, 2010 at 8:27 AM, Martijn Tonies wrote:

> Sounds logical, what's also nice to see, is that even though people here
> tend to say "don't put binaries in the database", apparently Facebook
> thought it would be nice to do so (for all sorts of reasons) and even took
> the time to write their own blob storage mechanism ;-)


The whole point is that they *aren't*' putting blobs in their database -
that has way too much overhead. They're using a custom service that does
nothing but "read from byte X to byte Y". No concepts of tablespaces,
integrity, indices, whatever.

The only thing they store in their database, is the start- and end-byte of
each image.

I doubt they even took it as far as to write a plugin engine - that would
again bring too much overhead.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--0016364d31074dc980047f625e5a--

Re: how things get messed up

am 12.02.2010 09:19:38 von Vikram A

--0-443838374-1265962778=:73441
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Sir, I am in the situation to storing student and staff images. every =
year 2000 new photos has to be added in our application. Can i have yo=
ur suggestion, which is the best one, storing as a blob Or using NFS?=0AIt =
will be great help to me, because such experts are sharing your own experie=
nce on this binary storage issue. Thank you. VIKRAM A =
=0A________________________________=0AFrom: Johan De Meersman era.be>=0ATo: Martijn Tonies =0ACc: mysql@lists.mysql=
..com=0ASent: Fri, 12 February, 2010 1:09:32 PM=0ASubject: Re: how things ge=
t messed up On Fri, Feb 12, 2010 at 8:27 AM, Martijn Tonies upscene.com>wrote: > Sounds logical, what's also nice to see, is that =
even though people here=0A> tend to say "don't put binaries in the database=
", apparently Facebook=0A> thought it would be nice to do so (for all sorts=
of reasons) and even took=0A> the time to write their own blob storage mec=
hanism ;-) =0AThe whole point is that they *aren't*' putting blobs in =
their database -=0Athat has way too much overhead. They're using a custom s=
ervice that does=0Anothing but "read from byte X to byte Y". No concepts of=
tablespaces,=0Aintegrity, indices, whatever. The only thing they stor=
e in their database, is the start- and end-byte of=0Aeach image. I dou=
bt they even took it as far as to write a plugin engine - that would=0Aagai=
n bring too much overhead. =0A-- =0ABier met grenadyn=0AIs als mosterd=
by den wyn=0ASy die't drinkt, is eene kwezel=0AHy die't drinkt, is ras een=
ezel Your Mail works best with the New Yahoo Optimized IE=
8. Get it NOW! http://downloads.yahoo.com/in/internetexplorer/
--0-443838374-1265962778=:73441--

Re: how things get messed up

am 12.02.2010 09:53:01 von Johan De Meersman

--0050450158d7157b54047f6365fa
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Feb 12, 2010 at 9:19 AM, Vikram A wrote:

> I am in the situation to storing student and staff images. every year 2000
> new photos has to be added in our application.
>
> Can i have your suggestion, which is the best one, storing as a blob Or
> using NFS?
> It will be great help to me, because such experts are sharing your own
> experience on this binary storage issue.
>


I never said you had to grovel, though :-p

This whole thread has been a discussion of just that. My personal opinion is
that it's better to store binary objects (like images) out-of-band, for
instance on an NFS system like you suggest. Other people on the list have
made their own arguments for BLOB storage.

In the end, it's down to your own situation and decisions, but I will keep
defending the position that filesystems are made for storing files, and
databases are made for storing data - it saves you on both database and PHP
requests, as (from a web point of view) you can't return the image data
inside your HTML - it requires a second HTTP call. Filesystem image serving,
however, could perfectly be offloaded to a subdomain that runs a
lightweight, threaded HTTP server that need not run the heavy PHP processes.
You could even run that on your NFS server, if you want.

If you do go for BLOBs, though, for god's sake keep them in a separate
table, lest you fragment your datafiles. Split records are a disaster for
performance.




--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--0050450158d7157b54047f6365fa--

Re: how things get messed up

am 12.02.2010 10:12:42 von Vikram A

--0-566488995-1265965962=:96632
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Sir, Thanks for your suggestion,=0AI will go for blob storage, because=
our application will maintain the data on yearly basis[stupersonal2008, st=
upersonal2009 etc.]. So i feel we may not face such kind of performance iss=
ue in our application. =0AThank you VIKRAM A =0A____________=
____________________=0AFrom: Johan De Meersman =0ATo: V=
ikram A =0ACc: MY SQL Mailing list ..com>=0ASent: Fri, 12 February, 2010 2:23:01 PM=0ASubject: Re: how things g=
et messed up =0AOn Fri, Feb 12, 2010 at 9:19 AM, Vikram A @yahoo.in> wrote: I am in the situation to storing student and staff i=
mages. every year 2000 new photos has to be added in our application.=0A>=
=0A>Can i have your suggestion, which is the best one, storing as a blob Or=
using NFS?=0A>>It will be great help to me, because such experts are shari=
ng your own experience on this binary storage issue.=0A> =0AI never sa=
id you had to grovel, though :-p This whole thread has been a discussi=
on of just that. My personal opinion is that it's better to store binary ob=
jects (like images) out-of-band, for instance on an NFS system like you sug=
gest. Other people on the list have made their own arguments for BLOB stora=
ge. In the end, it's down to your own situation and decisions, but I w=
ill keep defending the position that filesystems are made for storing files=
, and databases are made for storing data - it saves you on both database a=
nd PHP requests, as (from a web point of view) you can't return the image d=
ata inside your HTML - it requires a second HTTP call. Filesystem image ser=
ving, however, could perfectly be offloaded to a subdomain that runs a ligh=
tweight, threaded HTTP server that need not run the heavy PHP processes. Yo=
u could even run that on your NFS server, if you want. If you do go fo=
r BLOBs, though, for god's sake keep them in a separate table, lest you fra=
gment your datafiles. Split records are a disaster for performance. =
=0A-- =0ABier met grenadyn=0AIs als mosterd by den wyn=0ASy die't dri=
nkt, is eene kwezel=0AHy die't drinkt, is ras een ezel The=
INTERNET now has a personality. YOURS! See your Yahoo! Homepage. http://in=
..yahoo.com/
--0-566488995-1265965962=:96632--

Re: how things get messed up

am 12.02.2010 16:32:39 von Ann

Martijn Tonies wrote:
>
> For example, the Firebird DBMS stores (longer) Blob data not right
> there in the record, so whenever you don't request the blob (that is,
> not selecting it), it ignores it completely and it can go through the file
> quickly.

As do most of the MySQL storage engines.


Cheers,

Ann

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 15.02.2010 10:51:17 von Martijn Tonies

>> Sounds logical, what's also nice to see, is that even though people here
>> tend to say "don't put binaries in the database", apparently Facebook
>> thought it would be nice to do so (for all sorts of reasons) and even
>> took
>> the time to write their own blob storage mechanism ;-)
>
>
> The whole point is that they *aren't*' putting blobs in their database -
> that has way too much overhead. They're using a custom service that does
> nothing but "read from byte X to byte Y". No concepts of tablespaces,
> integrity, indices, whatever.
>
> The only thing they store in their database, is the start- and end-byte of
> each image.
>
> I doubt they even took it as far as to write a plugin engine - that would
> again bring too much overhead.

Right, not the actual files, indeed. But not "just" on the file system
either,
interesting approach.

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

Download Database Workbench for Oracle, MS SQL Server, Sybase SQL
Anywhere, MySQL, InterBase, NexusDB and Firebird!

Database questions? Check the forum:
http://www.databasedevelopmentforum.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 15.02.2010 10:53:29 von Martijn Tonies

>> I am in the situation to storing student and staff images. every year
>> 2000
>> new photos has to be added in our application.
>>
>> Can i have your suggestion, which is the best one, storing as a blob Or
>> using NFS?
>> It will be great help to me, because such experts are sharing your own
>> experience on this binary storage issue.
>>
>
>
> I never said you had to grovel, though :-p
>
> This whole thread has been a discussion of just that. My personal opinion
> is
> that it's better to store binary objects (like images) out-of-band, for
> instance on an NFS system like you suggest. Other people on the list have
> made their own arguments for BLOB storage.
>
> In the end, it's down to your own situation and decisions, but I will keep
> defending the position that filesystems are made for storing files, and
> databases are made for storing data - it saves you on both database and
> PHP
> requests, as (from a web point of view) you can't return the image data
> inside your HTML - it requires a second HTTP call. Filesystem image
> serving,

Doesn't an image always required additional http calls from the tag?

> however, could perfectly be offloaded to a subdomain that runs a
> lightweight, threaded HTTP server that need not run the heavy PHP
> processes.
> You could even run that on your NFS server, if you want.
>
> If you do go for BLOBs, though, for god's sake keep them in a separate
> table, lest you fragment your datafiles. Split records are a disaster for
> performance.

I agree -for MySQL-, but this is a MySQL limitation!! Not a DBMS one.

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

Download Database Workbench for Oracle, MS SQL Server, Sybase SQL
Anywhere, MySQL, InterBase, NexusDB and Firebird!

Database questions? Check the forum:
http://www.databasedevelopmentforum.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 15.02.2010 10:55:38 von Martijn Tonies

Hi Ann,

From: "Ann W. Harrison"
> Martijn Tonies wrote:
>>
>> For example, the Firebird DBMS stores (longer) Blob data not right
>> there in the record, so whenever you don't request the blob (that is,
>> not selecting it), it ignores it completely and it can go through the
>> file
>> quickly.
>
> As do most of the MySQL storage engines.

But they do seem to store "in line", right? If not, why would there be
any performance problems?

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

Download Database Workbench for Oracle, MS SQL Server, Sybase SQL
Anywhere, MySQL, InterBase, NexusDB and Firebird!

Database questions? Check the forum:
http://www.databasedevelopmentforum.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 15.02.2010 18:11:58 von Ann

Martijn Tonies wrote:
>>>
>>> For example, the Firebird DBMS stores (longer) Blob data not right
>>> there in the record, so whenever you don't request the blob (that is,
>>> not selecting it), it ignores it completely and it can go through the
>>> file quickly.
>>
>> As do most of the MySQL storage engines.
>

InnoDB uses a variety of strategies,

http://www.innodb.com/doc/innodb_plugin-1.0/innodb-row-forma t.html

MyISAM uses a strategy more like Firebird

http://dev.mysql.com/doc/refman/5.0/en/storage-requirements. html

And here's a comment from a very experienced MySQL DBA

http://sheeri.com/archives/39

> But they do seem to store "in line", right? If not, why would there be
> any performance problems?

There are lots of ways to screw up storage.

Cheers,

Ann




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 16.02.2010 15:23:06 von Johan De Meersman

--005045014455f23dea047fb878d6
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Feb 15, 2010 at 10:53 AM, Martijn Tonies wrote:

> databases are made for storing data - it saves you on both database and PHP
>> requests, as (from a web point of view) you can't return the image data
>> inside your HTML - it requires a second HTTP call. Filesystem image
>> serving,
>>
>
> Doesn't an image always required additional http calls from the tag?


Yes, that's what I'm saying. However, there's a significant difference
between an HTTP call that only needs to pump a file on to the network, and
an HTTP call that needs to start a PHP process that then in turn needs to
connect to the DB, which then needs to parse and execute a query.

Offloading static files (images, css, whatnot) to a separate server that
runs a lightweight, threaded httpd without PHP and whatnot compiled in, also
means that you'll need less hardware to serve the same amount of requests -
our PHP machines typically run 400 apache processes, but a static server on
the same hardware could easily serve a few thousand threads - suddenly it's
not memory/cpu but disk/network bandwidth that becomes your bottleneck.



> If you do go for BLOBs, though, for god's sake keep them in a separate
>> table, lest you fragment your datafiles. Split records are a disaster for
>> performance.
>>
>
> I agree -for MySQL-, but this is a MySQL limitation!! Not a DBMS one.


True. I think (but am too lazy to verify) that even in MySQL this is mainly
an issue with MyISAM, not InnoDB. Given how MyISAM is the default engine,
though, I thought it worthwile to mention.

Separate LOB storage still leaves the overhead of query parsing and other
generic DBMS stuff which isn't there on a filesystem, though, not to mention
that each image requests takes up a database connection for as long as it
takes to transfer the (potentially huge) data. Don't forget that in a
well-tuned database, network transfer is often a significant part of your
total connection lifetime for select statements.



--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--005045014455f23dea047fb878d6--

Re: how things get messed up

am 16.02.2010 15:34:57 von Johan De Meersman

--001636c931ca55dee1047fb8a3ec
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Feb 15, 2010 at 6:11 PM, Ann W. Harrison wrote:

> And here's a comment from a very experienced MySQL DBA
>
> http://sheeri.com/archives/39



Not so much a comment as her starting up the same discussion we're having
:-)

If you have the kind of needs where replicating NFS servers won't hold, go
for distributed redundant storage, also known as cluster filesystems.




--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--001636c931ca55dee1047fb8a3ec--

RE: how things get messed up

am 16.02.2010 17:02:22 von Jerry Schwartz

>-----Original Message-----
>From: Ann W. Harrison [mailto:ann@mysql.com]
>Sent: Monday, February 15, 2010 12:12 PM
>To: Martijn Tonies
>Cc: mysql@lists.mysql.com
>Subject: Re: how things get messed up
>
>There are lots of ways to screw up storage.
>
[JS] As the one who started this topic, I can't agree more. I was away for
several days, but I must insert that I'm not a novice (although I haven't
looked at the internals of a database for several years). Some of the things
that can affect data transfer between the client and the physical storage are:

- Seek time of the mechanics
- Internal buffering in the physical drives
- Delayed writing at the device level
- Ability, or lack thereof, of the drives and/or controllers to chain commands
- Ability, or lack thereof, of the drives and/or controllers to reorder
chained commands
- Raid level
- Buss speed
- Raw vs. file system storage
- Fragmentation of the file allocation on the disks
- Ability of the file system to do anticipatory reads
- File system buffer strategies
- File system buffer size/number
- Database engine buffer strategies
- Database engine buffer size/number
- Fragmentation of the tables within the data file (if applicable)
- Fragmentation of the data within the tables
- Design of the database/tables
- Ability of the engine to optimize queries
- Ability of the programmer to write sane queries
- Use, design, and optimization of stored procedures
- Speed of the pipe between the client and the database host
- Efficiency of the database access library
- Language in which the client application is written (compiled vs.
interpreted being the big differentiator)
- Efficiency of the client's handling of the retrieved data
- Efficiency of the presentation layer
- Speed of the pipe between the client and the user
- ROI on optimizing any or all of the above
- Management meddling

A profiler that could integrate all of the above would be a nifty tool,
wouldn't it?

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com



>Cheers,
>
>Ann
>
>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=jschwartz@the-
>infoshop.com





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: how things get messed up

am 16.02.2010 17:02:22 von Jerry Schwartz

>-----Original Message-----
>From: Vikram A [mailto:vikkiatbipl@yahoo.in]
>Sent: Friday, February 12, 2010 4:13 AM
>To: Johan De Meersman
>Cc: MY SQL Mailing list
>Subject: Re: how things get messed up
>
>Sir,
>
>Thanks for your suggestion,
>I will go for blob storage, because our application will maintain the data on
>yearly basis[stupersonal2008, stupersonal2009 etc.]. So i feel we may not
>face
>such kind of performance issue in our application.
>
[JS] It sounds like you are planning to have one table per year. Regardless of
where you put your blobs, I think that is a bad idea from a design standpoint.
It will make it harder to find historical information.

If your database is relatively small, then I'd just keep everything in one
table. If it is big, then roll data that is five years old into an archive
table. That will give you only two places, and an easy-to-follow rule to tell
you where to look.

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: how things get messed up

am 16.02.2010 17:21:29 von Martin Gainty

--_3f307ba6-f93c-4cf5-a38a-eb62bdab0c5e_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable


i agree with jerry

=20

put date/timestamps on each record..(that way you know when the record was =
created/modified)

Martin Gainty=20
______________________________________________=20
Verzicht und Vertraulichkeitanmerkung/Note de d=E9ni et de confidentialit=
=E9

Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaeng=
er sein=2C so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiter=
leitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient l=
ediglich dem Austausch von Informationen und entfaltet keine rechtliche Bin=
dungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen w=
ir keine Haftung fuer den Inhalt uebernehmen.

Ce message est confidentiel et peut =EAtre privil=E9gi=E9. Si vous n'=EAtes=
pas le destinataire pr=E9vu=2C nous te demandons avec bont=E9 que pour sat=
isfaire informez l'exp=E9diteur. N'importe quelle diffusion non autoris=E9e=
ou la copie de ceci est interdite. Ce message sert =E0 l'information seule=
ment et n'aura pas n'importe quel effet l=E9galement obligatoire. =C9tant d=
onn=E9 que les email peuvent facilement =EAtre sujets =E0 la manipulation=
=2C nous ne pouvons accepter aucune responsabilit=E9 pour le contenu fourni=
..



=20

> From: jschwartz@the-infoshop.com
> To: vikkiatbipl@yahoo.in=3B vegivamp@tuxera.be
> CC: mysql@lists.mysql.com
> Subject: RE: how things get messed up
> Date: Tue=2C 16 Feb 2010 11:02:22 -0500
>=20
> >-----Original Message-----
> >From: Vikram A [mailto:vikkiatbipl@yahoo.in]
> >Sent: Friday=2C February 12=2C 2010 4:13 AM
> >To: Johan De Meersman
> >Cc: MY SQL Mailing list
> >Subject: Re: how things get messed up
> >
> >Sir=2C
> >
> >Thanks for your suggestion=2C
> >I will go for blob storage=2C because our application will maintain the =
data on
> >yearly basis[stupersonal2008=2C stupersonal2009 etc.]. So i feel we may =
not=20
> >face
> >such kind of performance issue in our application.
> >
> [JS] It sounds like you are planning to have one table per year. Regardle=
ss of=20
> where you put your blobs=2C I think that is a bad idea from a design stan=
dpoint.=20
> It will make it harder to find historical information.
>=20
> If your database is relatively small=2C then I'd just keep everything in =
one=20
> table. If it is big=2C then roll data that is five years old into an arch=
ive=20
> table. That will give you only two places=2C and an easy-to-follow rule t=
o tell=20
> you where to look.
>=20
> Regards=2C
>=20
> Jerry Schwartz
> The Infoshop by Global Information Incorporated
> 195 Farmington Ave.
> Farmington=2C CT 06032
>=20
> 860.674.8796 / FAX: 860.674.8341
>=20
> www.the-infoshop.com
>=20
>=20
>=20
>=20
>=20
> --=20
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dmgainty@hotmail.com
>=20
=20
____________________________________________________________ _____
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/201469230/direct/01/=

--_3f307ba6-f93c-4cf5-a38a-eb62bdab0c5e_--

Re: how things get messed up

am 17.02.2010 08:59:22 von Paul McCullagh


Just like to mention that http://www.blobstreaming.org was created to
solve this problem in MySQL.

The Launchpad project is here: https://launchpad.net/pbxt


On Feb 16, 2010, at 3:23 PM, Johan De Meersman wrote:

> On Mon, Feb 15, 2010 at 10:53 AM, Martijn Tonies
> wrote:
>
>> databases are made for storing data - it saves you on both database
>> and PHP
>>> requests, as (from a web point of view) you can't return the image
>>> data
>>> inside your HTML - it requires a second HTTP call. Filesystem image
>>> serving,
>>>
>>
>> Doesn't an image always required additional http calls from the
>> tag?
>
>
> Yes, that's what I'm saying. However, there's a significant difference
> between an HTTP call that only needs to pump a file on to the
> network, and
> an HTTP call that needs to start a PHP process that then in turn
> needs to
> connect to the DB, which then needs to parse and execute a query.
>
> Offloading static files (images, css, whatnot) to a separate server
> that
> runs a lightweight, threaded httpd without PHP and whatnot compiled
> in, also
> means that you'll need less hardware to serve the same amount of
> requests -
> our PHP machines typically run 400 apache processes, but a static
> server on
> the same hardware could easily serve a few thousand threads -
> suddenly it's
> not memory/cpu but disk/network bandwidth that becomes your
> bottleneck.
>
>
>
>> If you do go for BLOBs, though, for god's sake keep them in a
>> separate
>>> table, lest you fragment your datafiles. Split records are a
>>> disaster for
>>> performance.
>>>
>>
>> I agree -for MySQL-, but this is a MySQL limitation!! Not a DBMS one.
>
>
> True. I think (but am too lazy to verify) that even in MySQL this is
> mainly
> an issue with MyISAM, not InnoDB. Given how MyISAM is the default
> engine,
> though, I thought it worthwile to mention.
>
> Separate LOB storage still leaves the overhead of query parsing and
> other
> generic DBMS stuff which isn't there on a filesystem, though, not to
> mention
> that each image requests takes up a database connection for as long
> as it
> takes to transfer the (potentially huge) data. Don't forget that in a
> well-tuned database, network transfer is often a significant part of
> your
> total connection lifetime for select statements.
>
>
>
> --
> Bier met grenadyn
> Is als mosterd by den wyn
> Sy die't drinkt, is eene kwezel
> Hy die't drinkt, is ras een ezel


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: how things get messed up

am 17.02.2010 17:00:44 von Barry Leslie

Hi,

The problem of BLOB storage with MySQL is not the actual storage of the BLO=
B
data, it is the getting BLOBs to and from the client. Traditionally MySQL
treats the BLOB data as any other data and as a result there can be major
memory usage and performance issues as the BLOB is passed back to the
client app. This is why no matter how clever the storage engine is in how i=
t
stores the BLOB, the main problem remains.

That is why the BLOB streaming daemon was developed because it allows the
actual BLOB data to be streamed to and from the database server outside of
the normal MySQL client/server connection.

So you can have the best of both worlds:

- Similar to storing the BLOB in the file system: BLOB references are store=
d
with the normal table data, not the actual BLOB data, so the tables do not
get bloated, and because the BLOB data is accessed through the BLOB
streaming daemon it has little impact on server performance and memory
requirements are not dependent on BLOB size.

- Similar to traditional storage of BLOBs in tables: the BLOB will
automatically be delete when the row referencing it is deleted and the
application designer doesn't need to implement their own file system based
BLOB storage.

So if you are looking at BLOB storage systems please have a look at

http://www.blobstreaming.org

or the launch pad project:

https://launchpad.net/pbms (not pbxt as Paul mentioned.)

Sorry if this is a bit of a plug for PBMS but the subject was BLOB storage
and that is PBMS's sole purpose in life.

Barry


On 2/16/10 8:21 AM, "Martin Gainty" wrote:

>=20
> i agree with jerry
>=20
> =20
>=20
> put date/timestamps on each record..(that way you know when the record wa=
s
> created/modified)
>=20
> Martin Gainty=20
> ______________________________________________
> Verzicht und Vertraulichkeitanmerkung/Note de d=E9ni et de confidentialit=E9
>=20
> Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfae=
nger
> sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterlei=
tung
> oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient ledigli=
ch
> dem Austausch von Informationen und entfaltet keine rechtliche
> Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koen=
nen
> wir keine Haftung fuer den Inhalt uebernehmen.
>=20
> Ce message est confidentiel et peut =EAtre privil=E9gi=E9. Si vous n'=EAtes pas l=
e
> destinataire pr=E9vu, nous te demandons avec bont=E9 que pour satisfaire info=
rmez
> l'exp=E9diteur. N'importe quelle diffusion non autoris=E9e ou la copie de cec=
i est
> interdite. Ce message sert =E0 l'information seulement et n'aura pas n'impo=
rte
> quel effet l=E9galement obligatoire. =C9tant donn=E9 que les email peuvent
> facilement =EAtre sujets =E0 la manipulation, nous ne pouvons accepter aucune
> responsabilit=E9 pour le contenu fourni.
>=20
>=20
>=20
> =20
>=20
>> From: jschwartz@the-infoshop.com
>> To: vikkiatbipl@yahoo.in; vegivamp@tuxera.be
>> CC: mysql@lists.mysql.com
>> Subject: RE: how things get messed up
>> Date: Tue, 16 Feb 2010 11:02:22 -0500
>>=20
>>> -----Original Message-----
>>> From: Vikram A [mailto:vikkiatbipl@yahoo.in]
>>> Sent: Friday, February 12, 2010 4:13 AM
>>> To: Johan De Meersman
>>> Cc: MY SQL Mailing list
>>> Subject: Re: how things get messed up
>>>=20
>>> Sir,
>>>=20
>>> Thanks for your suggestion,
>>> I will go for blob storage, because our application will maintain the d=
ata
>>> on
>>> yearly basis[stupersonal2008, stupersonal2009 etc.]. So i feel we may n=
ot
>>> face
>>> such kind of performance issue in our application.
>>>=20
>> [JS] It sounds like you are planning to have one table per year. Regardl=
ess
>> of=20
>> where you put your blobs, I think that is a bad idea from a design
>> standpoint.=20
>> It will make it harder to find historical information.
>>=20
>> If your database is relatively small, then I'd just keep everything in o=
ne
>> table. If it is big, then roll data that is five years old into an archi=
ve
>> table. That will give you only two places, and an easy-to-follow rule to=
tell
>> you where to look.
>>=20
>> Regards,
>>=20
>> Jerry Schwartz
>> The Infoshop by Global Information Incorporated
>> 195 Farmington Ave.
>> Farmington, CT 06032
>>=20
>> 860.674.8796 / FAX: 860.674.8341
>>=20
>> www.the-infoshop.com
>>=20
>>=20
>>=20
>>=20
>>=20
>> --=20
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dmgainty@hotmail.com
>>=20
> =20
> ____________________________________________________________ _____
> Hotmail: Powerful Free email with security by Microsoft.
> http://clk.atdmt.com/GBL/go/201469230/direct/01/

------------------------------------------------------------ -------------
Barry Leslie

SNAP Innovation Softwareentwicklung GmbH
Senior Software Engineer

Tel: (001) 250 595 4228
Fax: (001) 250 595 4233
Email: Barry.Leslie@PrimeBase.com
Web: www.PrimeBase.com

SNAP Innovation Softwareentwicklung GmbH, D-22765 Hamburg,
Max-Brauer-Allee 50, Germany
Amtsgericht Hamburg HRB 61066, Geschäftsführer: Ulrich Zimmer, Paul
McCullagh
------------------------------------------------------------ -------------




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: how things get messed up

am 18.02.2010 05:41:16 von Vikram A

--0-2065974179-1266468076=:50461
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Dear Jerry Schwartz We have applications for colleges in India. The sa=
me idea of having single table for manipulating students records. but we ar=
e not following archiving concept. Ex stupersonal. and stuclass these =
tables are playing wide role in our application. After 7 years now there ar=
e 9000 records[postgresql backend] are there in the table. Because of this =
the entire application [ Fees, attendance, exams etc] performance is gettin=
g down. For the remedy of this I proposed this year wise architecture for o=
ur new version [mysql]. I have problem in year wise also, i have numbe=
r of mutual related tables for students such as stu_last_studies, stu_famil=
y_details, stu_address, stu_extracurri and so on. If i go for year basisis =
i have to make all the above tables also year basis.=0AHence, I feel it dif=
ficult have such number of tables after few years. As you said the ar=
chive system, can you the idea about the archive system[If needed i will gi=
ve the table structures]. It will be grate help to me. Thank you=
VIKRAM A =0A________________________________=0AFrom: Jerry=
Schwartz =0ATo: Vikram A >; Johan De Meersman =0ACc: MY SQL Mailing list lists.mysql.com>=0ASent: Tue, 16 February, 2010 9:32:22 PM=0ASubject: RE: h=
ow things get messed up >-----Original Message-----=0A>From: Vikram A =
[mailto:vikkiatbipl@yahoo.in]=0A>Sent: Friday, February 12, 2010 4:13 AM=0A=
>To: Johan De Meersman=0A>Cc: MY SQL Mailing list=0A>Subject: Re: how thing=
s get messed up=0A>=0A>Sir,=0A>=0A>Thanks for your suggestion,=0A>I will go=
for blob storage, because our application will maintain the data on=0A>yea=
rly basis[stupersonal2008, stupersonal2009 etc.]. So i feel we may not =0A>=
face=0A>such kind of performance issue in our application.=0A>=0A[JS] It so=
unds like you are planning to have one table per year. Regardless of =0Awhe=
re you put your blobs, I think that is a bad idea from a design standpoint.=
=0AIt will make it harder to find historical information. If your dat=
abase is relatively small, then I'd just keep everything in one =0Atable. I=
f it is big, then roll data that is five years old into an archive =0Atable=
.. That will give you only two places, and an easy-to-follow rule to tell =
=0Ayou where to look. Regards, Jerry Schwartz=0AThe Infoshop by G=
lobal Information Incorporated=0A195 Farmington Ave.=0AFarmington, CT 06032=
860.674.8796 / FAX: 860.674.8341 www.the-infoshop.com =0A=
=0A-- =0AMySQL General Mailing List=0AFor list archives: http://lists=
..mysql.com/mysql=0ATo unsubscribe: http://lists.mysql.com/mysql?unsub=3D=
vikkiatbipl@yahoo.in =0A The INTERNET now has a personality. YOUR=
S! See your Yahoo! Homepage. http://in.yahoo.com/
--0-2065974179-1266468076=:50461--

Re: how things get messed up

am 18.02.2010 11:31:23 von Martijn Tonies

>We have applications for colleges in India. The same idea of having single
>table for manipulating students records. but we are not following archiving
> >concept.
>
>Ex stupersonal. and stuclass these tables are playing wide role in our
>application. After 7 years now there are 9000 records[postgresql backend]
> >are there in the table. Because of this the entire application [ Fees,
>attendance, exams etc] performance is getting down. For the remedy of this
>I >proposed this year wise architecture for our new version [mysql].
>
>I have problem in year wise also, i have number of mutual related tables
>for students such as stu_last_studies, stu_family_details, stu_address,
> >stu_extracurri and so on. If i go for year basisis i have to make all the
>above tables also year basis.
>Hence, I feel it difficult have such number of tables after few years.
>
>As you said the archive system, can you the idea about the archive
>system[If needed i will give the table structures].
>
>It will be grate help to me.

If you have performance problems with just 9000 records, there's something
seriously wrong with your queries and indices and/or your application code.

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

Download Database Workbench for Oracle, MS SQL Server, Sybase SQL
Anywhere, MySQL, InterBase, NexusDB and Firebird!

Database questions? Check the forum:
http://www.databasedevelopmentforum.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: how things get messed up

am 18.02.2010 17:24:57 von Jerry Schwartz

------=_NextPart_000_0119_01CAB08D.02902980
Content-Type: text/plain;
charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: Vikram A [mailto:vikkiatbipl@yahoo.in]=20
Sent: Wednesday, February 17, 2010 11:41 PM
To: Jerry Schwartz
Cc: MY SQL Mailing list
Subject: Re: how things get messed up

=20

Dear Jerry Schwartz

We have applications for colleges in India. The same idea of having =
single table for manipulating students records. but we are not following =
archiving concept.

Ex stupersonal. and stuclass these tables are playing wide role in our =
application. After 7 years now there are 9000 records[postgresql =
backend] are there in the table. Because of this the entire application =
[ Fees, attendance, exams etc] performance is getting down. For the =
remedy of this I proposed this year wise architecture for our new =
version [mysql].



[JS] You have 9000 records? That should not slow down any application. I =
must not understand you.


I have problem in year wise also, i have number of mutual related tables =
for students such as stu_last_studies, stu_family_details, stu_address, =
stu_extracurri and so on. If i go for year basisis i have to make all =
the above tables also year basis.=20
Hence, I feel it difficult have such number of tables after few years.=20



[JS] I did not mean that you should have tables for each year. I was =
suggesting that you have tables for recent data and tables for archived =
data.=20


As you said the archive system, can you the idea about the archive =
system[If needed i will give the table structures].=20



[JS] This is best described with a picture. Here is a small example of =
what I meant:

=20

`student_master_table` (all years)

/\

/ \

`grades_current` `grades_archive`

| /

`class_master_table`

=20

The structures of the two grades tables should be almost the same, =
something like

=20

grade_id

student_id

class_id

class_start_date

grade_received

=20

You would add new grade records to the `grades_current` table.

=20

Now, suppose that you donâ€=99t usually need data more than five =
years old. Once a year you would run these queries:

=20

INSERT INTO `grades_archive` SELECT * FROM `grades_current` WHERE =
`class_start_date` < YEAR(DATE_SUB(NOW(), INTERVAL 4 YEAR));

DELETE FROM `grades_current` WHERE `class_start_date` < =
YEAR(DATE_SUB(NOW(), INTERVAL 4 YEAR));

=20

That would keep the `grades_current` table small. If you want to find a =
studentâ€=99s recent grade history, you would use a query like

=20

SELECT * FROM `grades_current` WHERE `student_id` =3D 12345;

=20

If you decide that you need a studentâ€=99s complete history, you =
could do=20

=20

SELECT * FROM `grades_current` WHERE `student_id` =3D 12345 UNION ALL =
SELECT * FROM `grades_archive` WHERE `student_id` =3D 12345;

=20

That is a quick outline of what I was saying.

=20

I donâ€=99t know how big your database is, so I canâ€=99t begin =
to guess whether or not this is necessary. On my desktop computer, where =
I do my testing, I have two tables: one has about 104000 records, the =
other has about 200000 records. The query=20

=20

SELECT `prod`.`prod_num`, `prod_price`.`prod_price_del_format`, =
`prod_price`.`prod_price_end_price` FROM `prod` JOIN `prod_price` ON =
`prod`.`prod_id` =3D `prod_price`.`prod_id` WHERE `prod`.`prod_num` =3D =
40967;

=20

took .70 seconds. Repeating the same query with different values of =
`prod_num` gave increasingly faster results, showing that caching is =
working as expected: after three such queries, the response time was .14 =
seconds.

=20

I understand that schools in India can be very, very big; so perhaps you =
need an archive scheme such as the one I described. In fact, it might be =
useful to extend this whole concept to using an archive database, rather =
than archive tables within the same database. The database engine =
wouldnâ€=99t really care, but since the archive database =
wouldnâ€=99t change very often you wouldnâ€=99t have to back it =
up very often, either.

=20

Regards,

=20

Jerry Schwartz

The Infoshop by Global Information Incorporated

195 Farmington Ave.

Farmington, CT 06032

=20

860.674.8796 / FAX: 860.674.8341

=20

www.the-infoshop.com

=20

=20


It will be grate help to me.

Thank you

VIKRAM A

=20

_____ =20

From: Jerry Schwartz
To: Vikram A ; Johan De Meersman =

Cc: MY SQL Mailing list
Sent: Tue, 16 February, 2010 9:32:22 PM
Subject: RE: how things get messed up

>-----Original Message-----
>From: Vikram A [mailto:vikkiatbipl@yahoo.in]
>Sent: Friday, February 12, 2010 4:13 AM
>To: Johan De Meersman
>Cc: MY SQL Mailing list
>Subject: Re: how things get messed up
>
>Sir,
>
>Thanks for your suggestion,
>I will go for blob storage, because our application will maintain the =
data on
>yearly basis[stupersonal2008, stupersonal2009 etc.]. So i feel we may =
not=20
>face
>such kind of performance issue in our application.
>
[JS] It sounds like you are planning to have one table per year. =
Regardless of=20
where you put your blobs, I think that is a bad idea from a design =
standpoint.=20
It will make it harder to find historical information.

If your database is relatively small, then I'd just keep everything in =
one=20
table. If it is big, then roll data that is five years old into an =
archive=20
table. That will give you only two places, and an easy-to-follow rule to =
tell=20
you where to look.

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com





--=20
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: =
http://lists.mysql.com/mysql?unsub=3Dvikkiatbipl@yahoo.in





_____ =20

Your Mail works best with the New Yahoo Optimized IE8. Get it NOW! =
ernetexplorer/> .


------=_NextPart_000_0119_01CAB08D.02902980--

Re: how things get messed up

am 18.02.2010 17:35:50 von Johan De Meersman

--0016e6498a5c53a501047fe28f77
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

*cough*partitioning*cough*

On Thu, Feb 18, 2010 at 5:24 PM, Jerry Schwartz
wrote:

> From: Vikram A [mailto:vikkiatbipl@yahoo.in]
> Sent: Wednesday, February 17, 2010 11:41 PM
> To: Jerry Schwartz
> Cc: MY SQL Mailing list
> Subject: Re: how things get messed up
>
>
>
> Dear Jerry Schwartz
>
> We have applications for colleges in India. The same idea of having singl=
e
> table for manipulating students records. but we are not following archivi=
ng
> concept.
>
> Ex stupersonal. and stuclass these tables are playing wide role in our
> application. After 7 years now there are 9000 records[postgresql backend]
> are there in the table. Because of this the entire application [ Fees,
> attendance, exams etc] performance is getting down. For the remedy of thi=
s I
> proposed this year wise architecture for our new version [mysql].
>
>
>
> [JS] You have 9000 records? That should not slow down any application. I
> must not understand you.
>
>
> I have problem in year wise also, i have number of mutual related tables
> for students such as stu_last_studies, stu_family_details, stu_address,
> stu_extracurri and so on. If i go for year basisis i have to make all the
> above tables also year basis.
> Hence, I feel it difficult have such number of tables after few years.
>
>
>
> [JS] I did not mean that you should have tables for each year. I was
> suggesting that you have tables for recent data and tables for archived
> data.
>
>
> As you said the archive system, can you the idea about the archive
> system[If needed i will give the table structures].
>
>
>
> [JS] This is best described with a picture. Here is a small example of wh=
at
> I meant:
>
>
>
> `student_master_table` (all years)
>
> /\
>
> / \
>
> `grades_current` `grades_archive`
>
> | /
>
> `class_master_table`
>
>
>
> The structures of the two grades tables should be almost the same,
> something like
>
>
>
> grade_id
>
> student_id
>
> class_id
>
> class_start_date
>
> grade_received
>
>
>
> You would add new grade records to the `grades_current` table.
>
>
>
> Now, suppose that you don=92t usually need data more than five years old.
> Once a year you would run these queries:
>
>
>
> INSERT INTO `grades_archive` SELECT * FROM `grades_current` WHERE
> `class_start_date` < YEAR(DATE_SUB(NOW(), INTERVAL 4 YEAR));
>
> DELETE FROM `grades_current` WHERE `class_start_date` <
> YEAR(DATE_SUB(NOW(), INTERVAL 4 YEAR));
>
>
>
> That would keep the `grades_current` table small. If you want to find a
> student=92s recent grade history, you would use a query like
>
>
>
> SELECT * FROM `grades_current` WHERE `student_id` =3D 12345;
>
>
>
> If you decide that you need a student=92s complete history, you could do
>
>
>
> SELECT * FROM `grades_current` WHERE `student_id` =3D 12345 UNION ALL SEL=
ECT
> * FROM `grades_archive` WHERE `student_id` =3D 12345;
>
>
>
> That is a quick outline of what I was saying.
>
>
>
> I don=92t know how big your database is, so I can=92t begin to guess whet=
her or
> not this is necessary. On my desktop computer, where I do my testing, I h=
ave
> two tables: one has about 104000 records, the other has about 200000
> records. The query
>
>
>
> SELECT `prod`.`prod_num`, `prod_price`.`prod_price_del_format`,
> `prod_price`.`prod_price_end_price` FROM `prod` JOIN `prod_price` ON
> `prod`.`prod_id` =3D `prod_price`.`prod_id` WHERE `prod`.`prod_num` =3D 4=
0967;
>
>
>
> took .70 seconds. Repeating the same query with different values of
> `prod_num` gave increasingly faster results, showing that caching is work=
ing
> as expected: after three such queries, the response time was .14 seconds.
>
>
>
> I understand that schools in India can be very, very big; so perhaps you
> need an archive scheme such as the one I described. In fact, it might be
> useful to extend this whole concept to using an archive database, rather
> than archive tables within the same database. The database engine wouldn=
=92t
> really care, but since the archive database wouldn=92t change very often =
you
> wouldn=92t have to back it up very often, either.
>
>
>
> Regards,
>
>
>
> Jerry Schwartz
>
> The Infoshop by Global Information Incorporated
>
> 195 Farmington Ave.
>
> Farmington, CT 06032
>
>
>
> 860.674.8796 / FAX: 860.674.8341
>
>
>
> www.the-infoshop.com
>
>
>
>
>
>
> It will be grate help to me.
>
> Thank you
>
> VIKRAM A
>
>
>
> _____
>
> From: Jerry Schwartz
> To: Vikram A ; Johan De Meersman e
> >
> Cc: MY SQL Mailing list
> Sent: Tue, 16 February, 2010 9:32:22 PM
> Subject: RE: how things get messed up
>
> >-----Original Message-----
> >From: Vikram A [mailto:vikkiatbipl@yahoo.in]
> >Sent: Friday, February 12, 2010 4:13 AM
> >To: Johan De Meersman
> >Cc: MY SQL Mailing list
> >Subject: Re: how things get messed up
> >
> >Sir,
> >
> >Thanks for your suggestion,
> >I will go for blob storage, because our application will maintain the da=
ta
> on
> >yearly basis[stupersonal2008, stupersonal2009 etc.]. So i feel we may no=
t
> >face
> >such kind of performance issue in our application.
> >
> [JS] It sounds like you are planning to have one table per year. Regardle=
ss
> of
> where you put your blobs, I think that is a bad idea from a design
> standpoint.
> It will make it harder to find historical information.
>
> If your database is relatively small, then I'd just keep everything in on=
e
> table. If it is big, then roll data that is five years old into an archiv=
e
> table. That will give you only two places, and an easy-to-follow rule to
> tell
> you where to look.
>
> Regards,
>
> Jerry Schwartz
> The Infoshop by Global Information Incorporated
> 195 Farmington Ave.
> Farmington, CT 06032
>
> 860.674.8796 / FAX: 860.674.8341
>
> www.the-infoshop.com
>
>
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dvikkiatbipl@yahoo=
..in
>
>
>
>
>
> _____
>
> Your Mail works best with the New Yahoo Optimized IE8. Get it NOW! <
> http://in.rd.yahoo.com/tagline_ie8_new/*http:/downloads.yaho o.com/in/inte=
rnetexplorer/>
> .
>
>


--=20
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--0016e6498a5c53a501047fe28f77--

RE: how things get messed up

am 22.02.2010 23:23:38 von Jerry Schwartz

------=_NextPart_000_00A5_01CAB3E3.EB07B760
Content-Type: text/plain;
charset="utf-8"
Content-Transfer-Encoding: quoted-printable

I thought I had replied publicly to Johanâ€=99s suggestion, with =
some personal experience.

=20

Heâ€=99s absolutely right, that would give you a solution that would =
be completely transparent to your application and therefore much easier =
to implement. You could keep re-arranging your partitions as necessary.

=20

I, myself, have never used portioning so I hope someone with experience =
will chime in here.

=20

One disadvantage is that all of your data would be in one database, =
making your backups bigger and bigger. If you used a separate database =
as an archive, the archive database wouldnâ€=99t have to be backed =
up very often.

=20

I never did get a good feel for how big your database will be. Even if =
youâ€=99re talking about IIT, and assuming 100000 students, 6 =
classes per semester, three semesters, per year, 20 years of history, =
youâ€=99re going to have 36 million class records. I think there are =
much bigger databases running quite well.

=20

Regards,

=20

Jerry Schwartz

The Infoshop by Global Information Incorporated

195 Farmington Ave.

Farmington, CT 06032

=20

860.674.8796 / FAX: 860.674.8341

=20

www.the-infoshop.com

=20

From: Vikram A [mailto:vikkiatbipl@yahoo.in]=20
Sent: Friday, February 19, 2010 11:17 PM
To: Jerry Schwartz
Cc: Johan De Meersman
Subject: Re: how things get messed up

=20

Dear Sir,

I agree with the solution proposed.=20

But one of the member[Johan De Meersman ] of this =
list has commented it.


Do you have any opposition/Suggestions?

Thank you

VIKRAM A

_____ =20

From: Jerry Schwartz
To: Vikram A
Cc: MY SQL Mailing list
Sent: Thu, 18 February, 2010 9:54:57 PM
Subject: RE: how things get messed up

From: Vikram A [mailto:vikkiatbipl@yahoo.in]=20
Sent: Wednesday, February 17, 2010 11:41 PM
To: Jerry Schwartz
Cc: MY SQL Mailing list
Subject: Re: how things get messed up

=20

Dear Jerry Schwartz

We have applications for colleges in India. The same idea of having =
single table for manipulating students records. but we are not following =
archiving concept.

Ex stupersonal. and stuclass these tables are playing wide role in our =
application. After 7 years now there are 9000 records[postgresql =
backend] are there in the table. Because of this the entire application =
[ Fees, attendance, exams etc] performance is getting down. For the =
remedy of this I proposed this year wise architecture for our new =
version [mysql].

[JS] You have 9000 records? That should not slow down any application. I =
must not understand you.


I have problem in year wise also, i have number of mutual related tables =
for students such as stu_last_studies, stu_family_details, stu_address, =
stu_extracurri and so on. If i go for year basisis i have to make all =
the above tables also year basis.=20
Hence, I feel it difficult have such number of tables after few years.=20

[JS] I did not mean that you should have tables for each year. I was =
suggesting that you have tables for recent data and tables for archived =
data.=20


As you said the archive system, can you the idea about the archive =
system[If needed i will give the table structures].=20

[JS] This is best described with a picture. Here is a small example of =
what I meant:

=20

`student_master_table` (all years)

/\

/ \

`grades_current` `grades_archive`

| /

`class_master_table`

=20

The structures of the two grades tables should be almost the same, =
something like

=20

grade_id

student_id

class_id

class_start_date

grade_received

=20

You would add new grade records to the `grades_current` table.

=20

Now, suppose that you donâ€=99t usually need data more than five =
years old. Once a year you would run these queries:

=20

INSERT INTO `grades_archive` SELECT * FROM `grades_current` WHERE =
`class_start_date` < YEAR(DATE_SUB(NOW(), INTERVAL 4 YEAR));

DELETE FROM `grades_current` WHERE `class_start_date` < =
YEAR(DATE_SUB(NOW(), INTERVAL 4 YEAR));

=20

That would keep the `grades_current` table small. If you want to find a =
studentâ€=99s recent grade history, you would use a query like

=20

SELECT * FROM `grades_current` WHERE `student_id` =3D 12345;

=20

If you decide that you need a studentâ€=99s complete history, you =
could do=20

=20

SELECT * FROM `grades_current` WHERE `student_id` =3D 12345 UNION ALL =
SELECT * FROM `grades_archive` WHERE `student_id` =3D 12345;

=20

That is a quick outline of what I was saying.

=20

I donâ€=99t know how big your database is, so I canâ€=99t begin =
to guess whether or not this is necessary. On my desktop computer, where =
I do my testing, I have two tables: one has about 104000 records, the =
other has about 200000 records. The query=20

=20

SELECT `prod`.`prod_num`, `prod_price`.`prod_price_del_format`, =
`prod_price`.`prod_price_end_price` FROM `prod` JOIN `prod_price` ON =
`prod`.`prod_id` =3D `prod_price`.`prod_id` WHERE `prod`.`prod_num` =3D =
40967;

=20

took .70 seconds. Repeating the same query with different values of =
`prod_num` gave increasingly faster results, showing that caching is =
working as expected: after three such queries, the response time was .14 =
seconds.

=20

I understand that schools in India can be very, very big; so perhaps you =
need an archive scheme such as the one I described. In fact, it might be =
useful to extend this whole concept to using an archive database, rather =
than archive tables within the same database. The database engine =
wouldnâ€=99t really care, but since the archive database =
wouldnâ€=99t change very often you wouldnâ€=99t have to back it =
up very often, either.

=20

Regards,

=20

Jerry Schwartz

The Infoshop by Global Information Incorporated

195 Farmington Ave.

Farmington, CT 06032

=20

860.674.8796 / FAX: 860.674.8341

=20

www.the-infoshop.com

=20

=20


It will be grate help to me.

Thank you

VIKRAM A

=20

_____ =20

From: Jerry Schwartz
To: Vikram A ; Johan De Meersman =

Cc: MY SQL Mailing list
Sent: Tue, 16 February, 2010 9:32:22 PM
Subject: RE: how things get messed up

>-----Original Message-----
>From: Vikram A [mailto:vikkiatbipl@yahoo.in]
>Sent: Friday, February 12, 2010 4:13 AM
>To: Johan De Meersman
>Cc: MY SQL Mailing list
>Subject: Re: how things get messed up
>
>Sir,
>
>Thanks for your suggestion,
>I will go for blob storage, because our application will maintain the =
data on
>yearly basis[stupersonal2008, stupersonal2009 etc.]. So i feel we may =
not=20
>face
>such kind of performance issue in our application.
>
[JS] It sounds like you are planning to have one table per year. =
Regardless of=20
where you put your blobs, I think that is a bad idea from a design =
standpoint.=20
It will make it harder to find historical information.

If your database is relatively small, then I'd just keep everything in =
one=20
table. If it is big, then roll data that is five years old into an =
archive=20
table. That will give you only two places, and an easy-to-follow rule to =
tell=20
you where to look.

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com





--=20
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: =
http://lists.mysql.com/mysql?unsub=3Dvikkiatbipl@yahoo.in

=20

_____ =20

Your Mail works best with the New Yahoo Optimized IE8. Get it NOW! =
ernetexplorer/> .





_____ =20

The INTERNET now has a personality. YOURS! See =
your Yahoo! =
Homepage.


------=_NextPart_000_00A5_01CAB3E3.EB07B760--

Re: how things get messed up

am 24.02.2010 05:15:16 von Vikram A

--0-327255169-1266984916=:61457
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Sirs, Because one table will hold the large amount of data, only the r=
ecent data will be used for transactions; so rest of the old records are re=
main same with out any transaction. So we have decided to go for year based=
storage; here even old records can be taken out by join queries. I h=
ope you experts will agree with this. Or your comments and suggestions are =
welcome for the better design. Thank you VIKRAM A =
=0A________________________________=0AFrom: Jerry Schwartz nfoshop.com>=0ATo: Vikram A =0ACc: Johan De Meersman =
; MY SQL Mailing list =0ASent: T=
ue, 23 February, 2010 3:53:38 AM=0ASubject: RE: how things get messed up=0A=
=0A =0AI thought I had replied publicly to Johanâ€=99s suggestion, with=
=0Asome personal experience.=0A =0AHeâ€=99s absolutely right, that woul=
d give you a solution that would=0Abe completely transparent to your applic=
ation and therefore much easier to=0Aimplement. You could keep re-arranging=
your partitions as necessary.=0A =0AI, myself, have never used portioning =
so I hope someone with=0Aexperience will chime in here.=0A =0AOne disadvant=
age is that all of your data would be in one=0Adatabase, making your backup=
s bigger and bigger. If you used a separate=0Adatabase as an archive, the a=
rchive database wouldnâ€=99t have to be backed up very=0Aoften.=0A =0AI=
never did get a good feel for how big your database will be.=0AEven if you=
â€=99re talking about IIT, and assuming 100000 students, 6 classes per=
=0Asemester, three semesters, per year, 20 years of history, youâ€=99re=
going to have 36=0Amillion class records. I think there are much bigger da=
tabases running quite=0Awell.=0A =0ARegards,=0A =0AJerry Schwartz=0AThe Inf=
oshop by Global Information Incorporated=0A195 Farmington Ave.=0AFarmington=
, CT 06032=0A =0A860.674.8796 / FAX: 860.674.8341=0A =0Awww.the-infoshop.co=
m=0A =0AFrom:Vikram A=0A[mailto:vikkiatbipl@yahoo.in] =0ASent: Friday, Febr=
uary 19, 2010 11:17 PM=0ATo: Jerry Schwartz=0ACc: Johan De Meersman=0ASubje=
ct: Re: how things get messed up=0A =0ADear=0ASir, I agree with the so=
lution proposed. But one of the member[Johan De Meersman era.be>] of this list has=0Acommented it. Do you have any opposition/S=
uggestions? Thank you VIKRAM A _____________________________=
___=0A =0AFrom:Jerry Schwartz=0A=0ATo: Vikram A=
=0ACc: MY SQL Mailing list =
=0ASent: Thu, 18 February, 2010 9:54:57 PM=0ASubject: RE: how things get me=
ssed up=0AFrom:Vikram A=0A[mailto:vikkiatbipl@yahoo.in] =0ASent: Wednesday,=
February 17, 2010 11:41 PM=0ATo: Jerry Schwartz=0ACc: MY SQL Mailing list=
=0ASubject: Re: how things get messed up=0A =0ADear Jerry Schwartz We =
have applications for colleges in India. The same idea of having single=0At=
able for manipulating students records. but we are not following archiving=
=0Aconcept. Ex stupersonal. and stuclass these tables are playing wide=
role=0Ain our application. After 7 years now there are 9000 records[postgr=
esql=0Abackend] are there in the table. Because of this the entire applicat=
ion [ Fees,=0Aattendance, exams etc] performance is getting down. For the r=
emedy of this I=0Aproposed this year wise architecture for our new version =
[mysql].=0A[JS] You have 9000 records? That should not slow down any=0Aappl=
ication. I must not understand you. I have problem in year wise also, =
i have number of mutual related tables for=0Astudents such as stu_last_stud=
ies, stu_family_details, stu_address,=0Astu_extracurri and so on. If i go f=
or year basisis i have to make all the above=0Atables also year basis. =0AH=
ence, I feel it difficult have such number of tables after few years. =0A[J=
S] I did not mean that you should have tables for each year. I=0Awas sugges=
ting that you have tables for recent data and tables for archived=0Adata. =
As you said the archive system, can you the idea about the archive sy=
stem[If=0Aneeded i will give the table structures]. =0A[JS] This is best de=
scribed with a picture. Here is a small=0Aexample of what I meant:=0A =0A =
=0A`student_master_table` (all years)=0A =0A/\=0A=
=0A/ \=0A `grades_current` `grades_archive`=0A =
=0A| /=0A `class_master_table`=0A =0AThe structures o=
f the two grades tables should be almost the=0Asame, something like=0A =0Ag=
rade_id =0Astudent_id =0Aclass=
_id =0Aclass_start_date=0Agrade_received=0A =0AYou would add new gra=
de records to the `grades_current` table.=0A =0ANow, suppose that you don=
â€=99t usually need data more than five=0Ayears old. Once a year you wo=
uld run these queries:=0A =0AINSERT INTO `grades_archive` SELECT * FROM `gr=
ades_current` WHERE=0A`class_start_date` < YEAR(DATE_SUB(NOW(), INTERVAL 4 =
YEAR));=0ADELETE FROM `grades_current` WHERE `class_start_date` <=0AYEAR(DA=
TE_SUB(NOW(), INTERVAL 4 YEAR));=0A =0AThat would keep the `grades_current`=
table small. If you want to=0Afind a studentâ€=99s recent grade histor=
y, you would use a query like=0A =0ASELECT * FROM `grades_current` WHERE `s=
tudent_id` =3D 12345;=0A =0AIf you decide that you need a studentâ€=99s=
complete history, you=0Acould do =0A =0ASELECT * FROM `grades_current` WHE=
RE `student_id` =3D 12345 UNION=0AALL SELECT * FROM `grades_archive` WHERE =
`student_id` =3D 12345;=0A =0AThat is a quick outline of what I was saying.=
=0A =0AI donâ€=99t know how big your database is, so I canâ€=99t be=
gin to guess=0Awhether or not this is necessary. On my desktop computer, wh=
ere I do my=0Atesting, I have two tables: one has about 104000 records, the=
other has about=0A200000 records. The query =0A =0ASELECT `prod`.`prod_num=
`, `prod_price`.`prod_price_del_format`,=0A`prod_price`.`prod_p rice_end_pri=
ce` FROM `prod` JOIN `prod_price` ON=0A`prod`.`prod_id` =3D `prod_price`.`p=
rod_id` WHERE `prod`.`prod_num` =3D 40967;=0A =0Atook .70 seconds. Repeatin=
g the same query with different values=0Aof `prod_num` gave increasingly fa=
ster results, showing that caching is working=0Aas expected: after three su=
ch queries, the response time was .14 seconds.=0A =0AI understand that scho=
ols in India can be very, very big; so=0Aperhaps you need an archive scheme=
such as the one I described. In fact, it=0Amight be useful to extend this =
whole concept to using an archive database,=0Arather than archive tables wi=
thin the same database. The database engine=0Awouldnâ€=99t really care,=
but since the archive database wouldnâ€=99t change very often=0Ayou wo=
uldnâ€=99t have to back it up very often, either.=0A =0ARegards,=0A =0A=
Jerry Schwartz=0AThe Infoshop by Global Information Incorporated=0A195 Farm=
ington Ave.=0AFarmington, CT 06032=0A =0A860.674.8796 / FAX: 860.674.8341=
=0A =0Awww.the-infoshop.com=0A =0A It will be grate help to me. =
Thank you VIKRAM A=0A ________________________________=0A =0AFro=
m:Jerry Schwartz=0A=0ATo: Vikram A @yahoo.in>; Johan De Meersman=0A=0ACc: MY SQL Mailing l=
ist =0ASent: Tue, 16 February, 2010 9:32:22 PM=0ASub=
ject: RE: how things get messed up >-----Original Message-----=0A>From=
: Vikram A [mailto:vikkiatbipl@yahoo.in]=0A>Sent: Friday, February 12, 2010=
4:13 AM=0A>To: Johan De Meersman=0A>Cc: MY SQL Mailing list=0A>Subject: Re=
: how things get messed up=0A>=0A>Sir,=0A>=0A>Thanks for your suggestion,=
=0A>I will go for blob storage, because our application will maintain the d=
ata=0Aon=0A>yearly basis[stupersonal2008, stupersonal2009 etc.]. So i feel =
we may not =0A>face=0A>such kind of performance issue in our application.=
=0A>=0A[JS] It sounds like you are planning to have one table per year. Reg=
ardless of =0Awhere you put your blobs, I think that is a bad idea from a d=
esign standpoint. =0AIt will make it harder to find historical information.=
If your database is relatively small, then I'd just keep everything i=
n one =0Atable. If it is big, then roll data that is five years old into an=
archive =0Atable. That will give you only two places, and an easy-to-follo=
w rule to tell =0Ayou where to look. Regards, Jerry Schwartz=0ATh=
e Infoshop by Global Information Incorporated=0A195 Farmington Ave.=0AFarmi=
ngton, CT 06032 860.674.8796 / FAX: 860.674.8341 www.the-infoshop=
..com -- =0AMySQL General Mailing List=0AFor list archives:=
http://lists.mysql.com/mysql=0ATo unsubscribe: http://lists.mysql.com/m=
ysql?unsub=3Dvikkiatbipl@yahoo.in=0A ________________________________=
=0A =0AYour Mail works best with the New Yahoo Optimized IE8. Get it NOW!.=
________________________________=0A =0AThe INTERNET now has a p=
ersonality. YOURS! See=0Ayour Yahoo! Homepage. =0A Your Mail work=
s best with the New Yahoo Optimized IE8. Get it NOW! http://downloads.yahoo=
..com/in/internetexplorer/
--0-327255169-1266984916=:61457--