Running Queries When INSERTing Data?

am 08.11.2010 19:18:24 von Hal Vaughan

I'm redesigning some software that's been in use since 2002. I'll be =
working with databases that will start small and grow along the way. =20

In the old format, data would come to us in mega-big text files that had =
to be parsed and manipulated and so on with Perl to remove crap and =
finally produce one tab delimited file. Once that file was created, =
another simple program would go through and use the data in each line =
for an INSERT statement that would put the data in a table. This table =
also has an Idx field that is an auto-incrementing primary key for the =
table.

Each night at 3:30 am, a program would run and would go through the same =
process for each client. I never timed it, but it could take something =
like 30-60 seconds per client, but timing wasn't a major issue, since it =
had a LONG time from then until new data would be inserted into the DB. =
The SELECT statements to pull the data for each client involve a number =
of AND and OR conditions. The first one of these would create a =
temporary table with its results, then another long SELECT statement =
would create a 2nd temporary table by filtering the data out more. This =
would continue for a few temporary tables until the data was filtered. =
Then it would be packaged up and encrypted, then sent out to the client, =
who has a program on his computer to read that data and print it out if =
desired.

This has worked, but for a number of reasons, a once-a-day data pull and =
send won't work as well with the new design. The program on the =
clients' computers will be able to access a special server just for them =
directly. (I know the concept of a server for each human client sounds =
inefficient, but it actually improves operations in a number of ways.) =
So each server will only have to provide data for one client.

The big difference is that I'd like to make it so they can access the =
data live, or almost live. I don't mean all the data, but the subset =
that meets their needs. In other words, the equivalent of what was sent =
to them daily in the old system. Their individual servers will still =
get the big tab-delimited file that will still be INSERTed in to their =
DB line by line. But I'd like to be able to select from the new data as =
it comes in, once it's been given a new number in the Idx field.

Is there any way to run a row of data through SELECT queries as it is =
being INSERTed into a table -- or just after?

The reason for doing this, instead of INSERTing all the data, then =
running a program is that as the database grows, pulling out the data =
will take longer and longer, so if there were a way to screen data as it =
comes in, that would make it easier to provide "instantly available" =
data.

I also know my knowledge of MySQL is quite limited, so if this can be =
done in better ways, I'd be interested in hearing about them.

Thank you!

Hal=

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

RE: Running Queries When INSERTing Data?

am 08.11.2010 19:23:20 von Gavin Towey

I'm not sure I understand exactly what you mean, but I think you just need =
to keep a timestamp associated with each row as it is inserted, put an inde=
x on it, then you can select "new" data just by using the appropriate time =
range.

Also, if you're parsing files into tab delimited format, you don't need to =
write a separate parser to insert rows line by line. MySQL has LOAD DATA I=
NFILE which takes delimited text files and inserts data in bulk, which is m=
uch faster than inserting line by line.

Regards,
Gavin Towey

-----Original Message-----
From: Hal Vaughan [mailto:hal@halblog.com]
Sent: Monday, November 08, 2010 10:18 AM
To: mysql@lists.mysql.com
Subject: Running Queries When INSERTing Data?

I'm redesigning some software that's been in use since 2002. I'll be worki=
ng with databases that will start small and grow along the way.

In the old format, data would come to us in mega-big text files that had to=
be parsed and manipulated and so on with Perl to remove crap and finally p=
roduce one tab delimited file. Once that file was created, another simple =
program would go through and use the data in each line for an INSERT statem=
ent that would put the data in a table. This table also has an Idx field t=
hat is an auto-incrementing primary key for the table.

Each night at 3:30 am, a program would run and would go through the same pr=
ocess for each client. I never timed it, but it could take something like =
30-60 seconds per client, but timing wasn't a major issue, since it had a L=
ONG time from then until new data would be inserted into the DB. The SELEC=
T statements to pull the data for each client involve a number of AND and O=
R conditions. The first one of these would create a temporary table with i=
ts results, then another long SELECT statement would create a 2nd temporary=
table by filtering the data out more. This would continue for a few tempo=
rary tables until the data was filtered. Then it would be packaged up and =
encrypted, then sent out to the client, who has a program on his computer t=
o read that data and print it out if desired.

This has worked, but for a number of reasons, a once-a-day data pull and se=
nd won't work as well with the new design. The program on the clients' com=
puters will be able to access a special server just for them directly. (I =
know the concept of a server for each human client sounds inefficient, but =
it actually improves operations in a number of ways.) So each server will =
only have to provide data for one client.

The big difference is that I'd like to make it so they can access the data =
live, or almost live. I don't mean all the data, but the subset that meets=
their needs. In other words, the equivalent of what was sent to them dail=
y in the old system. Their individual servers will still get the big tab-d=
elimited file that will still be INSERTed in to their DB line by line. But=
I'd like to be able to select from the new data as it comes in, once it's =
been given a new number in the Idx field.

Is there any way to run a row of data through SELECT queries as it is being=
INSERTed into a table -- or just after?

The reason for doing this, instead of INSERTing all the data, then running =
a program is that as the database grows, pulling out the data will take lon=
ger and longer, so if there were a way to screen data as it comes in, that =
would make it easier to provide "instantly available" data.

I also know my knowledge of MySQL is quite limited, so if this can be done =
in better ways, I'd be interested in hearing about them.

Thank you!

Hal
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgtowey@ffn.com

This message contains confidential information and is intended only for the=
individual named. If you are not the named addressee, you are notified th=
at reviewing, disseminating, disclosing, copying or distributing this e-mai=
l is strictly prohibited. Please notify the sender immediately by e-mail i=
f you have received this e-mail by mistake and delete this e-mail from your=
system. E-mail transmission cannot be guaranteed to be secure or error-fre=
e as information could be intercepted, corrupted, lost, destroyed, arrive l=
ate or incomplete, or contain viruses. The sender therefore does not accept=
liability for any loss or damage caused by viruses or errors or omissions =
in the contents of this message, which arise as a result of e-mail transmis=
sion. [FriendFinder Networks, Inc., 220 Humboldt Court, Sunnyvale, CA 94089=
, USA, FriendFinder.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Running Queries When INSERTing Data?

am 08.11.2010 19:39:06 von Hal Vaughan

On Nov 8, 2010, at 1:23 PM, Gavin Towey wrote:

> I'm not sure I understand exactly what you mean, but I think you just =
need to keep a timestamp associated with each row as it is inserted, put =
an index on it, then you can select "new" data just by using the =
appropriate time range.

But won't that take just as long as any other queries? Or will it be =
speeded up because all the matching records would be adjacent to each =
other -- like all at the end?

> Also, if you're parsing files into tab delimited format, you don't =
need to write a separate parser to insert rows line by line. MySQL has =
LOAD DATA INFILE which takes delimited text files and inserts data in =
bulk, which is much faster than inserting line by line.

THANKS! Is this new? I never read about it before, but then again, =
from about 2005-2006 until now the system was running smoothly and =
that's a long gap with no programming work!

Hal

>=20
> Regards,
> Gavin Towey
>=20
> -----Original Message-----
> From: Hal Vaughan [mailto:hal@halblog.com]
> Sent: Monday, November 08, 2010 10:18 AM
> To: mysql@lists.mysql.com
> Subject: Running Queries When INSERTing Data?
>=20
> I'm redesigning some software that's been in use since 2002. I'll be =
working with databases that will start small and grow along the way.
>=20
> In the old format, data would come to us in mega-big text files that =
had to be parsed and manipulated and so on with Perl to remove crap and =
finally produce one tab delimited file. Once that file was created, =
another simple program would go through and use the data in each line =
for an INSERT statement that would put the data in a table. This table =
also has an Idx field that is an auto-incrementing primary key for the =
table.
>=20
> Each night at 3:30 am, a program would run and would go through the =
same process for each client. I never timed it, but it could take =
something like 30-60 seconds per client, but timing wasn't a major =
issue, since it had a LONG time from then until new data would be =
inserted into the DB. The SELECT statements to pull the data for each =
client involve a number of AND and OR conditions. The first one of =
these would create a temporary table with its results, then another long =
SELECT statement would create a 2nd temporary table by filtering the =
data out more. This would continue for a few temporary tables until the =
data was filtered. Then it would be packaged up and encrypted, then =
sent out to the client, who has a program on his computer to read that =
data and print it out if desired.
>=20
> This has worked, but for a number of reasons, a once-a-day data pull =
and send won't work as well with the new design. The program on the =
clients' computers will be able to access a special server just for them =
directly. (I know the concept of a server for each human client sounds =
inefficient, but it actually improves operations in a number of ways.) =
So each server will only have to provide data for one client.
>=20
> The big difference is that I'd like to make it so they can access the =
data live, or almost live. I don't mean all the data, but the subset =
that meets their needs. In other words, the equivalent of what was sent =
to them daily in the old system. Their individual servers will still =
get the big tab-delimited file that will still be INSERTed in to their =
DB line by line. But I'd like to be able to select from the new data as =
it comes in, once it's been given a new number in the Idx field.
>=20
> Is there any way to run a row of data through SELECT queries as it is =
being INSERTed into a table -- or just after?
>=20
> The reason for doing this, instead of INSERTing all the data, then =
running a program is that as the database grows, pulling out the data =
will take longer and longer, so if there were a way to screen data as it =
comes in, that would make it easier to provide "instantly available" =
data.
>=20
> I also know my knowledge of MySQL is quite limited, so if this can be =
done in better ways, I'd be interested in hearing about them.
>=20
> Thank you!
>=20
>=20
>=20
> Hal
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgtowey@ffn.com
>=20
>=20
> This message contains confidential information and is intended only =
for the individual named. If you are not the named addressee, you are =
notified that reviewing, disseminating, disclosing, copying or =
distributing this e-mail is strictly prohibited. Please notify the =
sender immediately by e-mail if you have received this e-mail by mistake =
and delete this e-mail from your system. E-mail transmission cannot be =
guaranteed to be secure or error-free as information could be =
intercepted, corrupted, lost, destroyed, arrive late or incomplete, or =
contain viruses. The sender therefore does not accept liability for any =
loss or damage caused by viruses or errors or omissions in the contents =
of this message, which arise as a result of e-mail transmission. =
[FriendFinder Networks, Inc., 220 Humboldt Court, Sunnyvale, CA 94089, =
USA, FriendFinder.com
>=20

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

RE: Running Queries When INSERTing Data?

am 08.11.2010 21:33:45 von BMBasal

>> But won't that take just as long as any other queries? Or will it be
speeded up because all the matching records would be adjacent to each other
-- like all at the end?

You can order the result data set by timestamp in descending order, so the
latest will come up first, i.e., LIFO.

-----Original Message-----
From: Hal Vaughan [mailto:hal@halblog.com]
Sent: Monday, November 08, 2010 1:39 PM
To: mysql@lists.mysql.com
Subject: Re: Running Queries When INSERTing Data?

On Nov 8, 2010, at 1:23 PM, Gavin Towey wrote:

> I'm not sure I understand exactly what you mean, but I think you just need
to keep a timestamp associated with each row as it is inserted, put an index
on it, then you can select "new" data just by using the appropriate time
range.

But won't that take just as long as any other queries? Or will it be
speeded up because all the matching records would be adjacent to each other
-- like all at the end?

> Also, if you're parsing files into tab delimited format, you don't need to
write a separate parser to insert rows line by line. MySQL has LOAD DATA
INFILE which takes delimited text files and inserts data in bulk, which is
much faster than inserting line by line.

THANKS! Is this new? I never read about it before, but then again, from
about 2005-2006 until now the system was running smoothly and that's a long
gap with no programming work!

Hal

>
> Regards,
> Gavin Towey
>
> -----Original Message-----
> From: Hal Vaughan [mailto:hal@halblog.com]
> Sent: Monday, November 08, 2010 10:18 AM
> To: mysql@lists.mysql.com
> Subject: Running Queries When INSERTing Data?
>
> I'm redesigning some software that's been in use since 2002. I'll be
working with databases that will start small and grow along the way.
>
> In the old format, data would come to us in mega-big text files that had
to be parsed and manipulated and so on with Perl to remove crap and finally
produce one tab delimited file. Once that file was created, another simple
program would go through and use the data in each line for an INSERT
statement that would put the data in a table. This table also has an Idx
field that is an auto-incrementing primary key for the table.
>
> Each night at 3:30 am, a program would run and would go through the same
process for each client. I never timed it, but it could take something like
30-60 seconds per client, but timing wasn't a major issue, since it had a
LONG time from then until new data would be inserted into the DB. The
SELECT statements to pull the data for each client involve a number of AND
and OR conditions. The first one of these would create a temporary table
with its results, then another long SELECT statement would create a 2nd
temporary table by filtering the data out more. This would continue for a
few temporary tables until the data was filtered. Then it would be packaged
up and encrypted, then sent out to the client, who has a program on his
computer to read that data and print it out if desired.
>
> This has worked, but for a number of reasons, a once-a-day data pull and
send won't work as well with the new design. The program on the clients'
computers will be able to access a special server just for them directly.
(I know the concept of a server for each human client sounds inefficient,
but it actually improves operations in a number of ways.) So each server
will only have to provide data for one client.
>
> The big difference is that I'd like to make it so they can access the data
live, or almost live. I don't mean all the data, but the subset that meets
their needs. In other words, the equivalent of what was sent to them daily
in the old system. Their individual servers will still get the big
tab-delimited file that will still be INSERTed in to their DB line by line.
But I'd like to be able to select from the new data as it comes in, once
it's been given a new number in the Idx field.
>
> Is there any way to run a row of data through SELECT queries as it is
being INSERTed into a table -- or just after?
>
> The reason for doing this, instead of INSERTing all the data, then running
a program is that as the database grows, pulling out the data will take
longer and longer, so if there were a way to screen data as it comes in,
that would make it easier to provide "instantly available" data.
>
> I also know my knowledge of MySQL is quite limited, so if this can be done
in better ways, I'd be interested in hearing about them.
>
> Thank you!
>
>
>
> Hal
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=gtowey@ffn.com
>
>
> This message contains confidential information and is intended only for
the individual named. If you are not the named addressee, you are notified
that reviewing, disseminating, disclosing, copying or distributing this
e-mail is strictly prohibited. Please notify the sender immediately by
e-mail if you have received this e-mail by mistake and delete this e-mail
from your system. E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost, destroyed,
arrive late or incomplete, or contain viruses. The sender therefore does not
accept liability for any loss or damage caused by viruses or errors or
omissions in the contents of this message, which arise as a result of e-mail
transmission. [FriendFinder Networks, Inc., 220 Humboldt Court, Sunnyvale,
CA 94089, USA, FriendFinder.com
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=linyu@cs.albany.edu

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Running Queries When INSERTing Data?

am 08.11.2010 22:53:42 von Gavin Towey

If you are selecting records within a certain time range that is a subset o=
f the entire set of data, then indexes which use the timestamp column will =
be fine.

More generally: create appropriate indexes to optimize queries.

Although typically, you should design the database to be "correct" first, t=
hen identify performance bottlenecks in the production system, and fix them=
individually. Trying to predict performance is difficult. If you're conc=
erned, then it's best to create a test that reproduces the exact situation =
of which you are unsure so you can observe real performance, and experiment=
with optimizations.

-----Original Message-----
From: Hal Vaughan [mailto:hal@halblog.com]
Sent: Monday, November 08, 2010 10:39 AM
To: mysql@lists.mysql.com
Subject: Re: Running Queries When INSERTing Data?

On Nov 8, 2010, at 1:23 PM, Gavin Towey wrote:

> I'm not sure I understand exactly what you mean, but I think you just nee=
d to keep a timestamp associated with each row as it is inserted, put an in=
dex on it, then you can select "new" data just by using the appropriate tim=
e range.

But won't that take just as long as any other queries? Or will it be speed=
ed up because all the matching records would be adjacent to each other -- l=
ike all at the end?

> Also, if you're parsing files into tab delimited format, you don't need t=
o write a separate parser to insert rows line by line. MySQL has LOAD DATA=
INFILE which takes delimited text files and inserts data in bulk, which is=
much faster than inserting line by line.

THANKS! Is this new? I never read about it before, but then again, from a=
bout 2005-2006 until now the system was running smoothly and that's a long =
gap with no programming work!

Hal

>
> Regards,
> Gavin Towey
>
> -----Original Message-----
> From: Hal Vaughan [mailto:hal@halblog.com]
> Sent: Monday, November 08, 2010 10:18 AM
> To: mysql@lists.mysql.com
> Subject: Running Queries When INSERTing Data?
>
> I'm redesigning some software that's been in use since 2002. I'll be wor=
king with databases that will start small and grow along the way.
>
> In the old format, data would come to us in mega-big text files that had =
to be parsed and manipulated and so on with Perl to remove crap and finally=
produce one tab delimited file. Once that file was created, another simpl=
e program would go through and use the data in each line for an INSERT stat=
ement that would put the data in a table. This table also has an Idx field=
that is an auto-incrementing primary key for the table.
>
> Each night at 3:30 am, a program would run and would go through the same =
process for each client. I never timed it, but it could take something lik=
e 30-60 seconds per client, but timing wasn't a major issue, since it had a=
LONG time from then until new data would be inserted into the DB. The SEL=
ECT statements to pull the data for each client involve a number of AND and=
OR conditions. The first one of these would create a temporary table with=
its results, then another long SELECT statement would create a 2nd tempora=
ry table by filtering the data out more. This would continue for a few tem=
porary tables until the data was filtered. Then it would be packaged up an=
d encrypted, then sent out to the client, who has a program on his computer=
to read that data and print it out if desired.
>
> This has worked, but for a number of reasons, a once-a-day data pull and =
send won't work as well with the new design. The program on the clients' c=
omputers will be able to access a special server just for them directly. (=
I know the concept of a server for each human client sounds inefficient, bu=
t it actually improves operations in a number of ways.) So each server wil=
l only have to provide data for one client.
>
> The big difference is that I'd like to make it so they can access the dat=
a live, or almost live. I don't mean all the data, but the subset that mee=
ts their needs. In other words, the equivalent of what was sent to them da=
ily in the old system. Their individual servers will still get the big tab=
-delimited file that will still be INSERTed in to their DB line by line. B=
ut I'd like to be able to select from the new data as it comes in, once it'=
s been given a new number in the Idx field.
>
> Is there any way to run a row of data through SELECT queries as it is bei=
ng INSERTed into a table -- or just after?
>
> The reason for doing this, instead of INSERTing all the data, then runnin=
g a program is that as the database grows, pulling out the data will take l=
onger and longer, so if there were a way to screen data as it comes in, tha=
t would make it easier to provide "instantly available" data.
>
> I also know my knowledge of MySQL is quite limited, so if this can be don=
e in better ways, I'd be interested in hearing about them.
>
> Thank you!
>
>
>
> Hal
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgtowey@ffn.com
>
>
> This message contains confidential information and is intended only for t=
he individual named. If you are not the named addressee, you are notified =
that reviewing, disseminating, disclosing, copying or distributing this e-m=
ail is strictly prohibited. Please notify the sender immediately by e-mail=
if you have received this e-mail by mistake and delete this e-mail from yo=
ur system. E-mail transmission cannot be guaranteed to be secure or error-f=
ree as information could be intercepted, corrupted, lost, destroyed, arrive=
late or incomplete, or contain viruses. The sender therefore does not acce=
pt liability for any loss or damage caused by viruses or errors or omission=
s in the contents of this message, which arise as a result of e-mail transm=
ission. [FriendFinder Networks, Inc., 220 Humboldt Court, Sunnyvale, CA 940=
89, USA, FriendFinder.com
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgtowey@ffn.com

This message contains confidential information and is intended only for the=
individual named. If you are not the named addressee, you are notified th=
at reviewing, disseminating, disclosing, copying or distributing this e-mai=
l is strictly prohibited. Please notify the sender immediately by e-mail i=
f you have received this e-mail by mistake and delete this e-mail from your=
system. E-mail transmission cannot be guaranteed to be secure or error-fre=
e as information could be intercepted, corrupted, lost, destroyed, arrive l=
ate or incomplete, or contain viruses. The sender therefore does not accept=
liability for any loss or damage caused by viruses or errors or omissions =
in the contents of this message, which arise as a result of e-mail transmis=
sion. [FriendFinder Networks, Inc., 220 Humboldt Court, Sunnyvale, CA 94089=
, USA, FriendFinder.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg