Working with large datasets
Working with large datasets
am 10.10.2011 17:30:22 von Jason Pruim
Hey everyone,
I am working with a database that has close to 8 million records in it =
and it will be growing. I have a state field in the data, and I am =
attempting to test some query's on it, all but 2 records right now have =
the same state.
My test info won't get pulled up... I believe it keeps timing out the =
connection.
Is there any advice for working with large datasets? I'm wanting this to =
be able to load quickly.
Thanks in advance!=20
Jason Pruim
lists@pruimphotography.com
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
RE: Working with large datasets
am 10.10.2011 18:22:19 von Toby Hart Dyke
It sounds as though you don't have an index on the right field. 8 million
records should be no problem if you have the right indexes applied, and
you're not trying to do anything too complicated.
Toby
-----Original Message-----
From: Jason Pruim [mailto:lists@pruimphotography.com]
Sent: Monday, October 10, 2011 11:30 AM
To: php-db@lists.php.net
Subject: [PHP-DB] Working with large datasets
Hey everyone,
I am working with a database that has close to 8 million records in it and
it will be growing. I have a state field in the data, and I am attempting to
test some query's on it, all but 2 records right now have the same state.
My test info won't get pulled up... I believe it keeps timing out the
connection.
Is there any advice for working with large datasets? I'm wanting this to be
able to load quickly.
Thanks in advance!
Jason Pruim
lists@pruimphotography.com
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 10.10.2011 20:20:23 von Jim Giner
""Toby Hart Dyke"" wrote in message
news:00da01cc8768$ca9e9200$5fdbb600$@hartdyke.com...
>
> It sounds as though you don't have an index on the right field. 8 million
> records should be no problem if you have the right indexes applied, and
> you're not trying to do anything too complicated.
>
> Toby
I doubt that the State field is a primary index, or that it would be used as
one, which means that it could be a secondary one. If it is - that would be
a pretty long record itself and could be the problem therein. With
virtually all the records tied to one secondary key it is mostly a worthless
secondary index. I'd try removing it and seeing what happens.
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 10.10.2011 20:37:25 von Jason Pruim
Jason Pruim
pruimj@gmail.com
On Oct 10, 2011, at 2:20 PM, Jim Giner wrote:
>=20
> ""Toby Hart Dyke"" wrote in message=20
> news:00da01cc8768$ca9e9200$5fdbb600$@hartdyke.com...
>>=20
>> It sounds as though you don't have an index on the right field. 8 =
million
>> records should be no problem if you have the right indexes applied, =
and
>> you're not trying to do anything too complicated.
>>=20
>> Toby
>=20
> I doubt that the State field is a primary index, or that it would be =
used as=20
> one, which means that it could be a secondary one. If it is - that =
would be=20
> a pretty long record itself and could be the problem therein. With=20
> virtually all the records tied to one secondary key it is mostly a =
worthless=20
> secondary index. I'd try removing it and seeing what happens.=20
Actually it will be in the end if they keep going with the site... It's =
a "Report who called you and why" type of site. So for SEO purposes I'm =
changing the links from: index.php?phone=3DXXXXXXXXX&state=3DNY to: =
/new-york/XXXXXXXXXX=20
RIght now though I only have 1 state inputed to work with though. I may =
need to just increase the max execution time as well... But it still =
runs too slowly ... Even from the commandline searching for a simple: =
SELECT * from Table WHERE state=3D"test"; takes 56.96 seconds to search =
and returns only 2 records with 4 columns... Could this just be a =
hardware problem?
Here is the structure of the table Im working with:
+--------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+----------------+
| ID | int(11)| NO | PRI | NULL | auto_increment |
| phone| text | NO | MUL | NULL | |
| config | text | NO | | NULL | |
| state | text | YES | | NULL | |
+--------+---------+------+-----+---------+----------------+
I'm starting to lean more towards it being a problem with hardware =
though...
I'm going to try and get the specs of the machine it's running on... =
(Not my host :))
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 10.10.2011 20:42:37 von Jim Giner
I don't do command line stuff so I may not be right in my thinking. If you
are running a php query from a client, does the query get executed on the
database server, or does all the data have to come down to you to be
queried?
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 10.10.2011 20:48:06 von Jason Pruim
Jason Pruim
pruimj@gmail.com
On Oct 10, 2011, at 2:42 PM, Jim Giner wrote:
> I don't do command line stuff so I may not be right in my thinking. =
If you=20
> are running a php query from a client, does the query get executed on =
the=20
> database server, or does all the data have to come down to you to be=20=
> queried?
When you do it from the commandline it gets executed directly on the =
machine. When you do it from PHP (Or something else) It gets executed =
first on the server and then pipped down to you.
Basically commandline should be faster by a long shot since it's being =
executed directly.
But when you run from a web browser, you have to throw in the overhead =
for apache, or what ever web server you're using.
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 10.10.2011 20:52:28 von Jim Giner
When you say machine, do you mean the client that you're sitting at, or the
machine that hosts the data?
As for doing it thru a web server, the amount of time Apache, et al, would
consume is miniscule. The web interface would not be involved in the
reading of the data or the processing, just the output-ing of the results.
:)
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
RE: Working with large datasets
am 10.10.2011 23:02:03 von Toby Hart Dyke
You have a field in your WHERE clause that isn't indexed - you need an
index. Try something like this:
ALTER TABLE `Database`.`Table` ADD INDEX `state`(`state`);
Think about it - you're asking for the rows that have a certain value in
the 'state' field. If you don't provide the database with an index, it has
to do a full table scan to retrieve the results.
Toby
-----Original Message-----
From: Jason Pruim [mailto:pruimj@gmail.com]
Sent: Monday, October 10, 2011 2:37 PM
To: Jim Giner
Cc: php-db@lists.php.net
Subject: Re: [PHP-DB] Working with large datasets
RIght now though I only have 1 state inputed to work with though. I may need
to just increase the max execution time as well... But it still runs too
slowly ... Even from the commandline searching for a simple: SELECT * from
Table WHERE state="test"; takes 56.96 seconds to search and returns only 2
records with 4 columns... Could this just be a hardware problem?
Here is the structure of the table Im working with:
+--------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+----------------+
| ID | int(11)| NO | PRI | NULL | auto_increment |
| phone| text | NO | MUL | NULL | |
| config | text | NO | | NULL | |
| state | text | YES | | NULL | |
+--------+---------+------+-----+---------+----------------+
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 10.10.2011 23:19:28 von Phpster
On 2011-10-10, at 11:30 AM, Jason Pruim wrote:
> Hey everyone,
>=20
>=20
> I am working with a database that has close to 8 million records in it and=
it will be growing. I have a state field in the data, and I am attempting t=
o test some query's on it, all but 2 records right now have the same state.
>=20
> My test info won't get pulled up... I believe it keeps timing out the conn=
ection.
>=20
> Is there any advice for working with large datasets? I'm wanting this to b=
e able to load quickly.
>=20
> Thanks in advance!=20
>=20
>=20
> Jason Pruim
> lists@pruimphotography.com
>=20
>=20
>=20
>=20
> --=20
> PHP Database Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>=20
Assuming mysql, what is the my.conf set for? Check that you are using the la=
rge dataset one. By default it's usually a small one. That will give you mor=
e memory and sort spaces work with the data.=20
We routinely handle 8-10mm records and it's not tough. The tricks are
1: ensure enough sort space
2: ensure enough memory for large sets
3: ensure about php memory for results
4: try to add additional filters to reduce the data sets. A cardinality of t=
wo on a status will always return tons of records and you want to reduce tha=
t, maybe with a date range=20
Bastien Koert
905-904-0334=
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
RE: Working with large datasets
am 10.10.2011 23:27:58 von JimiT
I really think that you should try running it from the command line and see=
what the issues are. Get both Apache and php out of the way. I've seen s=
ome PHP scripts use up all the file handles (OS limit) even on a 64 bit ser=
ver when they start doing complex things with data sets. =0A=
=0A=
If it works ok without PHP/Apache then you can start lookig at PHP and APac=
he. =0A=
=0A=
ISOLATE the issue not complicate it....=0A=
=0A=
My 2 cents,=0A=
=0A=
Jimi=0A=
________________________________________=0A=
From: Bastien [phpster@gmail.com]=0A=
Sent: Monday, October 10, 2011 4:19 PM=0A=
To: Jason Pruim=0A=
Cc: php-db@lists.php.net=0A=
Subject: Re: [PHP-DB] Working with large datasets=0A=
=0A=
On 2011-10-10, at 11:30 AM, Jason Pruim wrote:=
=0A=
=0A=
> Hey everyone,=0A=
>=0A=
>=0A=
> I am working with a database that has close to 8 million records in it an=
d it will be growing. I have a state field in the data, and I am attempting=
to test some query's on it, all but 2 records right now have the same stat=
e.=0A=
>=0A=
> My test info won't get pulled up... I believe it keeps timing out the con=
nection.=0A=
>=0A=
> Is there any advice for working with large datasets? I'm wanting this to =
be able to load quickly.=0A=
>=0A=
> Thanks in advance!=0A=
>=0A=
>=0A=
> Jason Pruim=0A=
> lists@pruimphotography.com=0A=
>=0A=
>=0A=
>=0A=
>=0A=
> --=0A=
> PHP Database Mailing List (http://www.php.net/)=0A=
> To unsubscribe, visit: http://www.php.net/unsub.php=0A=
>=0A=
=0A=
Assuming mysql, what is the my.conf set for? Check that you are using the l=
arge dataset one. By default it's usually a small one. That will give you m=
ore memory and sort spaces work with the data.=0A=
=0A=
We routinely handle 8-10mm records and it's not tough. The tricks are=0A=
=0A=
1: ensure enough sort space=0A=
2: ensure enough memory for large sets=0A=
3: ensure about php memory for results=0A=
4: try to add additional filters to reduce the data sets. A cardinality of =
two on a status will always return tons of records and you want to reduce t=
hat, maybe with a date range=0A=
=0A=
=0A=
=0A=
=0A=
Bastien Koert=0A=
905-904-0334=0A=
--=0A=
PHP Database Mailing List (http://www.php.net/)=0A=
To unsubscribe, visit: http://www.php.net/unsub.php=0A=
=0A=
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 11.10.2011 00:05:40 von Karl DeSaulniers
On Oct 10, 2011, at 4:27 PM, Thompson, Jimi wrote:
> I really think that you should try running it from the command line
> and see what the issues are. Get both Apache and php out of the
> way. I've seen some PHP scripts use up all the file handles (OS
> limit) even on a 64 bit server when they start doing complex things
> with data sets.
>
> If it works ok without PHP/Apache then you can start lookig at PHP
> and APache.
>
> ISOLATE the issue not complicate it....
>
> My 2 cents,
>
> Jimi
> ________________________________________
> From: Bastien [phpster@gmail.com]
> Sent: Monday, October 10, 2011 4:19 PM
> To: Jason Pruim
> Cc: php-db@lists.php.net
> Subject: Re: [PHP-DB] Working with large datasets
>
> On 2011-10-10, at 11:30 AM, Jason Pruim
> wrote:
>
>> Hey everyone,
>>
>>
>> I am working with a database that has close to 8 million records in
>> it and it will be growing. I have a state field in the data, and I
>> am attempting to test some query's on it, all but 2 records right
>> now have the same state.
>>
>> My test info won't get pulled up... I believe it keeps timing out
>> the connection.
>>
>> Is there any advice for working with large datasets? I'm wanting
>> this to be able to load quickly.
>>
>> Thanks in advance!
>>
>>
>> Jason Pruim
>> lists@pruimphotography.com
>>
>>
>>
>>
>> --
>> PHP Database Mailing List (http://www.php.net/)
>> To unsubscribe, visit: http://www.php.net/unsub.php
>>
>
> Assuming mysql, what is the my.conf set for? Check that you are
> using the large dataset one. By default it's usually a small one.
> That will give you more memory and sort spaces work with the data.
>
> We routinely handle 8-10mm records and it's not tough. The tricks are
>
> 1: ensure enough sort space
> 2: ensure enough memory for large sets
> 3: ensure about php memory for results
> 4: try to add additional filters to reduce the data sets. A
> cardinality of two on a status will always return tons of records
> and you want to reduce that, maybe with a date range
>
>
>
>
> Bastien Koert
> 905-904-0334
> --
> PHP Database Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
> --
> PHP Database Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
Hi Jason,
I'd start with your max execution and max memory first
and go from there before you change much on your server.
maybe try a .httaccess file in the directory of the php file
that makes the call to allow only that file the extra memory and
execution?
(I believe I am correct in my thinking that you can specify max memory
and max execution per directory or even per file with an .httaccess,
please correct me if I am wrong)
I think that and the indexing of your database would help.
After that I would go with retrieving sets of info at a time instead
of retrieving the whole database at once.
Or like Bastien stated, narrowing it to certain date ranges.
Then you can create a pagination if your displaying results.
HTH,
Karl DeSaulniers
Design Drumm
http://designdrumm.com
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 11.10.2011 01:35:21 von David Robley
Jason Pruim wrote:
> Hey everyone,
>
>
> I am working with a database that has close to 8 million records in it and
> it will be growing. I have a state field in the data, and I am attempting
> to test some query's on it, all but 2 records right now have the same
> state.
>
> My test info won't get pulled up... I believe it keeps timing out the
> connection.
>
> Is there any advice for working with large datasets? I'm wanting this to
> be able to load quickly.
>
> Thanks in advance!
>
>
> Jason Pruim
> lists@pruimphotography.com
Nobody else has mentioned it yet, so I'll throw EXPLAIN into the mix.
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
Amongst other things "With the help of EXPLAIN, you can see where you should
add indexes to tables so that the statement executes faster by using
indexes to find rows."
Cheers
--
David Robley
"I've been to San Francisco" said Tom heartlessly.
Today is Prickle-Prickle, the 65th day of Bureaucracy in the YOLD 3177.
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 11.10.2011 01:39:41 von Jason Pruim
Jason Pruim
lists@pruimphotography.com
On Oct 10, 2011, at 5:27 PM, Thompson, Jimi wrote:
> I really think that you should try running it from the command line =
and see what the issues are. Get both Apache and php out of the way. =
I've seen some PHP scripts use up all the file handles (OS limit) even =
on a 64 bit server when they start doing complex things with data sets. =20=
>=20
> If it works ok without PHP/Apache then you can start lookig at PHP and =
APache. =20
>=20
> ISOLATE the issue not complicate it....
>=20
> My 2 cents,
Hi Jimi,
I've done it from the command line a few times, first time I run a =
simple: SELECT * FROM Main WHERE state=3D"test"; it takes: 2 rows in =
set (1 min 44.20 sec)
after that initial run it's pretty responsive, usually around 0.01 =
seconds.=20
If I do a select based on the new-york state, which is all of my almost =
9 million records after the initial run, it returns it all VERY quickly: =
25000 rows in set (0.02 sec)
So commandline is running fine after the initial... I'm leaning towards =
either a problem with web server, PHP setup, or my PHP code... I've used =
the code many times before but never on such a large dataset...
When I pull the pagination out completely.... It's pretty much the same =
result... fetching the info for new-york works just fine but not "test"
One thing I am noticing right now though is the fact that when I switch =
over to using the test state.... Right now it's not displaying =
anything... Not even able to view the source...
Okay... Enough rambling right now... Need to do some more checking =
before I can come to a conclusion :)
=20=
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 11.10.2011 06:09:08 von Jack van Zanen
--90e6ba6e836cb50a8b04aefe11e8
Content-Type: text/plain; charset=ISO-8859-1
Hi
You need to index the right fields. even on a laptop a select from 8 million
rows with two rows returned should take a few seconds max only.
The first time you run the query the data has to come from disk, second time
you run same query you'd expect that data to sit in cache and be very quick.
Jack van Zanen
-------------------------
This e-mail and any attachments may contain confidential material for the
sole use of the intended recipient. If you are not the intended recipient,
please be aware that any disclosure, copying, distribution or use of this
e-mail or any attachment is prohibited. If you have received this e-mail in
error, please contact the sender and delete all copies.
Thank you for your cooperation
On Tue, Oct 11, 2011 at 10:39 AM, Jason Pruim wrote:
>
> Jason Pruim
> lists@pruimphotography.com
>
>
>
> On Oct 10, 2011, at 5:27 PM, Thompson, Jimi wrote:
>
> > I really think that you should try running it from the command line and
> see what the issues are. Get both Apache and php out of the way. I've seen
> some PHP scripts use up all the file handles (OS limit) even on a 64 bit
> server when they start doing complex things with data sets.
> >
> > If it works ok without PHP/Apache then you can start lookig at PHP and
> APache.
> >
> > ISOLATE the issue not complicate it....
> >
> > My 2 cents,
>
> Hi Jimi,
>
> I've done it from the command line a few times, first time I run a simple:
> SELECT * FROM Main WHERE state="test"; it takes: 2 rows in set (1 min 44.20
> sec)
> after that initial run it's pretty responsive, usually around 0.01
> seconds.
>
> If I do a select based on the new-york state, which is all of my almost 9
> million records after the initial run, it returns it all VERY quickly:
> 25000 rows in set (0.02 sec)
>
> So commandline is running fine after the initial... I'm leaning towards
> either a problem with web server, PHP setup, or my PHP code... I've used the
> code many times before but never on such a large dataset...
>
> When I pull the pagination out completely.... It's pretty much the same
> result... fetching the info for new-york works just fine but not "test"
>
> One thing I am noticing right now though is the fact that when I switch
> over to using the test state.... Right now it's not displaying anything...
> Not even able to view the source...
>
> Okay... Enough rambling right now... Need to do some more checking before I
> can come to a conclusion :)
>
>
>
> --
> PHP Database Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
--90e6ba6e836cb50a8b04aefe11e8--
Re: Working with large datasets
am 11.10.2011 15:00:41 von Jim Giner
Exactly. That was my first guess - that his commandline request is first
having to download 8M records which can take a long time. The OP's fear of
"overhead from apache..." is not only unfounded, but would most definitely
improve his response by simply running the query on the server and avoid the
download.
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: Working with large datasets
am 11.10.2011 17:14:34 von user080701
--------------------------------------------------
From: "Jack van Zanen"
Sent: Tuesday, October 11, 2011 12:09 PM
To: "Jason Pruim"
Cc: "Thompson, Jimi" ; "Bastien"
;
Subject: Re: [PHP-DB] Working with large datasets
> Hi
>
>
> You need to index the right fields. even on a laptop a select from 8
> million
> rows with two rows returned should take a few seconds max only.
> The first time you run the query the data has to come from disk, second
> time
> you run same query you'd expect that data to sit in cache and be very
> quick.
>
>
> Jack van Zanen
>
> -------------------------
> This e-mail and any attachments may contain confidential material for the
> sole use of the intended recipient. If you are not the intended recipient,
> please be aware that any disclosure, copying, distribution or use of this
> e-mail or any attachment is prohibited. If you have received this e-mail
> in
> error, please contact the sender and delete all copies.
> Thank you for your cooperation
>
>
> On Tue, Oct 11, 2011 at 10:39 AM, Jason Pruim
> wrote:
>
>>
>> Jason Pruim
>> lists@pruimphotography.com
>>
>>
>>
>> On Oct 10, 2011, at 5:27 PM, Thompson, Jimi wrote:
>>
>> > I really think that you should try running it from the command line and
>> see what the issues are. Get both Apache and php out of the way. I've
>> seen
>> some PHP scripts use up all the file handles (OS limit) even on a 64 bit
>> server when they start doing complex things with data sets.
>> >
>> > If it works ok without PHP/Apache then you can start lookig at PHP and
>> APache.
>> >
>> > ISOLATE the issue not complicate it....
>> >
>> > My 2 cents,
>>
>> Hi Jimi,
>>
>> I've done it from the command line a few times, first time I run a
>> simple:
>> SELECT * FROM Main WHERE state="test"; it takes: 2 rows in set (1 min
>> 44.20
>> sec)
>> after that initial run it's pretty responsive, usually around 0.01
>> seconds.
>>
>> If I do a select based on the new-york state, which is all of my almost 9
>> million records after the initial run, it returns it all VERY quickly:
>> 25000 rows in set (0.02 sec)
>>
>> So commandline is running fine after the initial... I'm leaning towards
>> either a problem with web server, PHP setup, or my PHP code... I've used
>> the
>> code many times before but never on such a large dataset...
>>
>> When I pull the pagination out completely.... It's pretty much the same
>> result... fetching the info for new-york works just fine but not "test"
>>
>> One thing I am noticing right now though is the fact that when I switch
>> over to using the test state.... Right now it's not displaying
>> anything...
>> Not even able to view the source...
>>
>> Okay... Enough rambling right now... Need to do some more checking before
>> I
>> can come to a conclusion :)
>>
>>
>>
>> --
>> PHP Database Mailing List (http://www.php.net/)
>> To unsubscribe, visit: http://www.php.net/unsub.php
>>
>>
>
>
>
> __________ Information from ESET NOD32 Antivirus, version of virus
> signature database 6532 (20111010) __________
>
> The message was checked by ESET NOD32 Antivirus.
>
> http://www.eset.com
>
>
__________ Information from ESET NOD32 Antivirus, version of virus signature database 6532 (20111010) __________
The message was checked by ESET NOD32 Antivirus.
http://www.eset.com
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php