joining two tables for a search engine
am 23.04.2006 05:24:47 von none
I am trying to optimize a search engine.
let's say I have 3 tables. one has titles, one has words, and an index
where each title is broken into words (title_id and word_id)
I want to be able to search terms in any order, so "potter harry" is
the same as "harry potter".
select t.name from titles t, words w, index i where (w.name in
('harry", "potter"))
and (w.word_id = i.id) and (i.title_id = t.id);
This produces a list of titles matching my search times. however, with
over 3 million products, it can be quite slow. 3 seconds is too slow.
sometimes it takes a minute.
is there a better way to do a join when there are more search terms
like "harry potter and the chamber of secrets"?
the fastest way I found was to get the word count for each term, join
on the least used word, and then make sure the remaining terms are in
the titles. but sometimes it returns 10,000+ titles.
what is the best way to do this?
I searched for information on join types but it is way too confusing
for me and does not explain it in a way I can understand.
I am willing to pay via paypal $20 if someone can explain it to me over
the phone in detail.
Re: joining two tables for a search engine
am 24.04.2006 04:58:17 von Michael Austin
none wrote:
> I am trying to optimize a search engine.
>
>
> let's say I have 3 tables. one has titles, one has words, and an index
> where each title is broken into words (title_id and word_id)
>
> I want to be able to search terms in any order, so "potter harry" is
> the same as "harry potter".
>
> select t.name from titles t, words w, index i where (w.name in
> ('harry", "potter"))
> and (w.word_id = i.id) and (i.title_id = t.id);
>
> This produces a list of titles matching my search times. however, with
> over 3 million products, it can be quite slow. 3 seconds is too slow.
> sometimes it takes a minute.
>
> is there a better way to do a join when there are more search terms
> like "harry potter and the chamber of secrets"?
>
> the fastest way I found was to get the word count for each term, join
> on the least used word, and then make sure the remaining terms are in
> the titles. but sometimes it returns 10,000+ titles.
>
> what is the best way to do this?
>
> I searched for information on join types but it is way too confusing
> for me and does not explain it in a way I can understand.
>
> I am willing to pay via paypal $20 if someone can explain it to me over
> the phone in detail.
>
what indexes do you have on the 3 tables? any?
make sure the
create index words_I on words (words,word_id)
hopefully the ids are indexed on the other 2 tables as well.
you could also restructure the query to get:
select c.name from (
select t.name from titles t, words w, index i where (w.name in
('harry', 'potter'))
and (w.word_id = i.id) and (i.title_id = t.id)
) c where strtoupper(t.name) like strtoupper('%harry%potter%');