Table structure and query efficiency

Table structure and query efficiency

am 19.10.2006 10:11:54 von Jody

Hi

I've been working on a database which basically incorporates 3 tables to
describe say a widget which is either sold or leased.

I have the Widget table which stores the information related to the
widget itself.

I then have a WidgetSale table which stores only information related to
the sale of the widget (advertised price, headline, copy, date of sale
etc) if it is currently for sale.

Finally I have the WidgetLease table which stores only information on
the lease of the widget (cost of lease, headline, copy, when current
lease is up, etc) if it is available for lease/loan.

A widget may be available for sale and for lease at the same time, in
which case it will have entries in all 3 tables.

When it comes to searching, the searcher may want to include both
widgets available for sale and for lease - as such the one widget should
be returned twice - once with the details for sale and once with the
details for lease. The results are sorted as well so the two entries
may appear in well apart from each other in the result set. This was
the main reason behind separating the tables in the first place.

So basically I create a union of two SQL queries - one which pulls the
same details and one which pulls the lease details. Because the fields
in both queries need to match I select the common fields from the Widget
table first and then for the sale table I select the appropriate sale
fields, then nulls for each field which will come from the lease table.
In the lease query I do the reverse so effectively I end up with -

select w.field1,w.field2,w.field3,ws.sale1,ws.sale2,null as
'lease1',null as 'lease2'
from widgets w inner join widgetsale ws on w.widgetid=ws.widgetid
union
select w.field1,w.field2,w.field3,null as 'sale1',null as
'sale2',wl.lease1,wl.lease2
from widgets w inner join widgetlease ws on w.widgetid=wl.widgetid

Now various criteria can be specified for the widget search which means
both the queries have a where clause. They may also do something like
'widgets for sale between $x and $y or for lease between $a and $b'

I'm wondering is this the best way to achieve it and how efficient is it
assuming that potentially there could be hundreds of thousands of
widgets ?

I'd also considered using a pivot table with 3 columns - widgetid,
saleid and leaseid and then joining the widget table to the pivot table
and from there a left outer join to the sale and lease tables. So for a
widget on both sale and lease I'd end up with two rows in the pivot
table - one with a null saleid and leaseid=widgetid and the other row
with saleid=widgetid and leaseid=null so I'd end up with two records
returned

Thanks
Jody

Re: Table structure and query efficiency

am 20.10.2006 17:34:16 von Skarjune

Jody wrote:

> I'm wondering is this the best way to achieve it and how efficient is it
> assuming that potentially there could be hundreds of thousands of
> widgets ?

> I'd also considered using a pivot table with 3 columns - widgetid,
> saleid and leaseid and then joining the widget table to the pivot table...

Jody, you've walked through your scenario quite well. So, here's a
comment.

I like UNIONs, but your "pivot table" idea could provide a useful
intermediary as an alternative. As for scalable performance, that will
rely upon indexing the criteria fields properly. If the application is
mostly a read for the customers, you can index as many fields as
possible, which should give efficiency as long as you are using numeric
datatypes where applicable and avoiding the LIKE operator (BETWEEN is
OK as it's actually just a compound Boolean).

For categories with known lists, use lookup tables and relate/index
those to the widget table so that you can search categoryids with
integers rather than text fields. If you pull the ids from the user
selections, then the queries are no more complicated. Even if you want
to use text criteria, it will always run faster if you run text
conditions against the lookup table with that joined to the widget
table via ids--although you can end up with lots of JOINs to manage.

Try modeling both and see what works for you. Either way, it seems that
you might want to label the rows for Sale or Lease for the customer,
which can be accomplished by hacking a flag field such as ['Buy' AS
Availability] and ['Lease' AS Availability]

Re: Table structure and query efficiency

am 22.10.2006 02:10:53 von Jody

"Skarjune" wrote in
news:1161358456.365654.252480@b28g2000cwb.googlegroups.com:

Thanks for the advice. I already created a field called forSaleLease in
the main widget table which uses bits 0 and 1 as flags to indicate which
it is. I include this check in each of the unions so it shouldn't need
to go on with the rest of the joins. As you suggest also in the first
part of each query I select a constant as the flag so I know which
select statement generated the result (select 1 as searchType... union
select 2 as searchType).

I also have a table for the categories and just use the id in the widget
table (although I validate the category first and as they are so small I
plan on caching them in my application code and then just looking them
up in code from the index returned). I usually try to avoid any
redundant fields and use lookup tables with id's instead of literal
values where possible.

I have always preferred to do it that way, althugh you end up with often
complicated joins (nested queries, dericed tables, etc), or at least
many tables involved. I wasn't sure whether as databases grew it
sometimes became neccessary to implement some redundancy or storing
literial values in main tables to avoid overhead of additional joins.

I'd already implemented the system using the unions so I guess will
leave it for now and try to find some time later to test the
alternative. As the data is only small at the moment the basic tests I
have done thus far have been inconclusive.

I find I spend more time on the initial design and stressing over the
'correct' or 'most efficient' design than I do on coding the thing! I
guess until data reaches a certain point it is hard to evaluate any
performance penalties and adding indexes or additional restructing is
always an option.

Thanks again.
Jody


> Jody wrote:
>
>> I'm wondering is this the best way to achieve it and how efficient is
>> it assuming that potentially there could be hundreds of thousands of
>> widgets ?
>
>> I'd also considered using a pivot table with 3 columns - widgetid,
>> saleid and leaseid and then joining the widget table to the pivot
>> table...
>
> Jody, you've walked through your scenario quite well. So, here's a
> comment.
>
> I like UNIONs, but your "pivot table" idea could provide a useful
> intermediary as an alternative. As for scalable performance, that will
> rely upon indexing the criteria fields properly. If the application is
> mostly a read for the customers, you can index as many fields as
> possible, which should give efficiency as long as you are using
> numeric datatypes where applicable and avoiding the LIKE operator
> (BETWEEN is OK as it's actually just a compound Boolean).
>
> For categories with known lists, use lookup tables and relate/index
> those to the widget table so that you can search categoryids with
> integers rather than text fields. If you pull the ids from the user
> selections, then the queries are no more complicated. Even if you want
> to use text criteria, it will always run faster if you run text
> conditions against the lookup table with that joined to the widget
> table via ids--although you can end up with lots of JOINs to manage.
>
> Try modeling both and see what works for you. Either way, it seems
> that you might want to label the rows for Sale or Lease for the
> customer, which can be accomplished by hacking a flag field such as
> ['Buy' AS Availability] and ['Lease' AS Availability]
>
>