Table advice.

Table advice.

am 01.08.2009 16:29:01 von Paul Halliday

I have a database that I am (will) be using to track URL's. The table
structure looks like this:

CREATE TABLE event
(
eid INT UNSIGNED NOT NULL AUTO_INCREMENT,
timestamp INT(10) UNSIGNED NOT NULL DEFAULT 0,
ip INT(10) UNSIGNED NOT NULL DEFAULT 0,
fqdn VARCHAR(255),
domain VARCHAR(63),
tld VARCHAR(63),
action VARCHAR(4),
request TEXT,
referrer TEXT,
client VARCHAR(255),
INDEX eid (eid),
INDEX timestamp (timestamp),
INDEX ip (ip),
INDEX fqdn (fqdn),
INDEX domain (domain),
INDEX tld (tld)
);

The is no real logic behind the indexes, the table was hobbled
together looking at examples. Currently I am trying queries on about
300 million records and the results are pretty crappy. for example, a
query like this:

select domain,count(domain) as count from event where domain like
'%facebook%' group by domain order by count desc;

takes about 5 minutes to complete.

Most of the queries will be like that above but probably with
additional filters like date constraints or IP constraints or a
mixture of both. I can also see searches through the requests for
filetypes etc.

Any suggestions or comments would be appreciated.

Thanks.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Table advice.

am 01.08.2009 20:10:10 von Alexander Kolesen

Hello.
Your query performs a full table scan, because if you match text with '%...' wildcard, MySQL can't using index. Try to use external full-text
search engines like Sphinx (http://www.sphinxsearch.com/) or Lucene (http://lucene.apache.org).
> I have a database that I am (will) be using to track URL's. The table
> structure looks like this:
>
> CREATE TABLE event
> (
> eid INT UNSIGNED NOT NULL AUTO_INCREMENT,
> timestamp INT(10) UNSIGNED NOT NULL DEFAULT 0,
> ip INT(10) UNSIGNED NOT NULL DEFAULT 0,
> fqdn VARCHAR(255),
> domain VARCHAR(63),
> tld VARCHAR(63),
> action VARCHAR(4),
> request TEXT,
> referrer TEXT,
> client VARCHAR(255),
> INDEX eid (eid),
> INDEX timestamp (timestamp),
> INDEX ip (ip),
> INDEX fqdn (fqdn),
> INDEX domain (domain),
> INDEX tld (tld)
> );
>
> The is no real logic behind the indexes, the table was hobbled
> together looking at examples. Currently I am trying queries on about
> 300 million records and the results are pretty crappy. for example, a
> query like this:
>
> select domain,count(domain) as count from event where domain like
> '%facebook%' group by domain order by count desc;
>
> takes about 5 minutes to complete.
>
> Most of the queries will be like that above but probably with
> additional filters like date constraints or IP constraints or a
> mixture of both. I can also see searches through the requests for
> filetypes etc.
>
> Any suggestions or comments would be appreciated.
>
> Thanks.
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=kolesen.a@gmail.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Table advice.

am 03.08.2009 20:05:22 von Gavin Towey

Another trick is to reverse the domain when you store it. For example ins=
tead of storing www.facebook.com, store it as com.facebook.www. That way y=
ou write a query like:

WHERE domain LIKE 'com.facebook%'

This will use the index, since you're not using a wildcard at the beginning=
of the string, and the results will include all subdomains for the given d=
omain you're looking for.

Regards,
Gavin Towey

-----Original Message-----
From: Alexander Kolesen [mailto:kolesen_a@mail.by]
Sent: Saturday, August 01, 2009 11:10 AM
To: mysql@lists.mysql.com
Subject: Re: Table advice.

Hello.
Your query performs a full table scan, because if you match text with '%...=
' wildcard, MySQL can't using index. Try to use external full-text
search engines like Sphinx (http://www.sphinxsearch.com/) or Lucene (http:/=
/lucene.apache.org).
> I have a database that I am (will) be using to track URL's. The table
> structure looks like this:
>
> CREATE TABLE event
> (
> eid INT UNSIGNED NOT NULL AUTO_INCREMENT,
> timestamp INT(10) UNSIGNED NOT NULL DEFAULT 0,
> ip INT(10) UNSIGNED NOT NULL DEFAULT 0,
> fqdn VARCHAR(255),
> domain VARCHAR(63),
> tld VARCHAR(63),
> action VARCHAR(4),
> request TEXT,
> referrer TEXT,
> client VARCHAR(255),
> INDEX eid (eid),
> INDEX timestamp (timestamp),
> INDEX ip (ip),
> INDEX fqdn (fqdn),
> INDEX domain (domain),
> INDEX tld (tld)
> );
>
> The is no real logic behind the indexes, the table was hobbled
> together looking at examples. Currently I am trying queries on about
> 300 million records and the results are pretty crappy. for example, a
> query like this:
>
> select domain,count(domain) as count from event where domain like
> '%facebook%' group by domain order by count desc;
>
> takes about 5 minutes to complete.
>
> Most of the queries will be like that above but probably with
> additional filters like date constraints or IP constraints or a
> mixture of both. I can also see searches through the requests for
> filetypes etc.
>
> Any suggestions or comments would be appreciated.
>
> Thanks.
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dkolesen.a@gmail.c=
om

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgtowey@ffn.com


The information contained in this transmission may contain privileged and c=
onfidential information. It is intended only for the use of the person(s) n=
amed above. If you are not the intended recipient, you are hereby notified =
that any review, dissemination, distribution or duplication of this communi=
cation is strictly prohibited. If you are not the intended recipient, pleas=
e contact the sender by reply email and destroy all copies of the original =
message.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg