Selecting SUM and COUNT, but not straightforward..
Selecting SUM and COUNT, but not straightforward..
am 09.01.2008 12:04:42 von KingofGing
Hi All,
I'm having a problem writing an SQL statement that I can't quite wrap
my head around.
First, the background:
I have a journal subscription system including 3 tables,
tblSubscription, tblTransaction and tblIssue, detailed below.
tblSubscription:
CREATE TABLE [dbo].[tblSubscription](
[SubscriptionID] [int] NOT NULL,
[SubscriberID] [int] NOT NULL,
[Status] [int] NOT NULL,
[JournalID] [int] NOT NULL,
[Created] [datetime] NOT NULL,
CONSTRAINT [PK_tblSubscription] PRIMARY KEY CLUSTERED
(
[SubscriptionID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
tblTransaction:
CREATE TABLE [dbo].[tblTransaction](
[TransactionID] [bigint] NOT NULL,
[SubscriptionID] [int] NOT NULL,
[Copies] [int] NOT NULL,
[IssueStart] [int] NOT NULL,
[IssueEnd] [int] NOT NULL,
[LastUpdated] [datetime] NOT NULL,
CONSTRAINT [PK_tblTransaction] PRIMARY KEY CLUSTERED
(
[TransactionID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[tblTransaction] WITH NOCHECK ADD CONSTRAINT
[FK_tblTransaction_tblSubscription] FOREIGN KEY([SubscriptionID])
REFERENCES [dbo].[tblSubscription] ([SubscriptionID])
GO
ALTER TABLE [dbo].[tblTransaction] CHECK CONSTRAINT
[FK_tblTransaction_tblSubscription]
GO
tblIssue
CREATE TABLE [dbo].[tblIssue](
[IssueID] [int] NOT NULL,
[JournalID] [int] NOT NULL,
[JournalSequence] [int] NOT NULL,
[Status] [int] NOT NULL,
[DispatchDate] [datetime] NULL,
CONSTRAINT [PK_tblIssue] PRIMARY KEY CLUSTERED
(
[IssueID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[tblIssue] WITH NOCHECK ADD CONSTRAINT
[FK_tblIssue_tblJournal] FOREIGN KEY([JournalID])
REFERENCES [dbo].[tblJournal] ([JournalID])
GO
ALTER TABLE [dbo].[tblIssue] CHECK CONSTRAINT [FK_tblIssue_tblJournal]
A subscription is to one individual journal and consists of one or
more transactions, and a transaction covers a period of time, say a
year.
If there are 6 issues of this journal per year then a 2 year
subscription might consist of 2 transactions for 1 year each, so for
example
Year 1: Issue13 - Issue18
Year 2: Issue19 - Issue24
However it is possible for a subscription to pause, or lapse, for a
period of time between two transactions and miss some issues, for
example
Year 1: Issue11 - Issue16
Year 2: Issue19 - Issue24
tblIssue is not linked to tblTransaction by any foreign keys, and
issues are referenced by JournalSequence number not IssueID, i.e. for
Year 2 in the second example above, tblTransaction.IssueStart contains
'19' and tblTransaction.IssueEnd contains '24'. Issues are not added
to tblIssue until they are current, so the Issue in tblIssue with the
highest JournalSequence number is the current one (i.e. SELECT
MAX(JournalSequence) FROM tblIssue will select the current issue)
Journal ID is an integer and will be passed into the SQL statement as
a parameter, i.e. @JournalID = 1013
What I need is to be able to determine the number of subscriptions
(and also the total number of copies for those subscriptions) that are
returning with the current issue (e.g. Issue19 in the examples above)
after a lapsed period (the second example), EXCLUDING any that haven't
lapsed, i.e. that have continued straight on (the first example) for
any particular journal.
I currently have (this returns no results, although there should be
some):
(Apologies for the tabs, they appear to have gone a bit crazy)
SELECT
COUNT(tblSubscription.SubscriptionID) AS NoSubs,
SUM(tblTransaction.Copies) AS NoCopies
FROM
tblSubscription INNER JOIN tblTransaction ON
tblSubscription.SubscriptionID = tblTransaction.SubscriptionID
WHERE
(tblSubscription.JournalID = @JournalID) AND
(tblTransaction.IssueStart =
(SELECT
MAX(JournalSequence) AS Expr1
FROM
tblIssue AS tblIssue_1
WHERE
(JournalID = @JournalID))) AND
(tblTransaction.TransactionTypeID = 11) AND
((SELECT
MAX(Transactions.IssueStart) AS RestartIssue
FROM
tblSubscription AS Subscriptions INNER JOIN tblTransaction AS
Transactions ON Subscriptions.SubscriptionID =
Transactions.SubscriptionID
WHERE
(Subscriptions.JournalID = @JournalID) AND
(Subscriptions.SubscriptionID = tblSubscription.SubscriptionID)) >
1 +
(SELECT
MIN(IssueEnd) AS ExpiredIssue
FROM
SELECT
TOP (2) IssueEnd
FROM
(SELECT
Transactions.IssueEnd
FROM
tblSubscription AS Subscriptions INNER JOIN tblTransaction AS
Transactions ON Subscriptions.SubscriptionID =
Transactions.SubscriptionID
WHERE
(Subscriptions.JournalID = @JournalID) AND
(Subscriptions.SubscriptionID =
tblSubscription.SubscriptionID))
AS derivedtbl_2
- AS derivedtbl_1))
Re: Selecting SUM and COUNT, but not straightforward..
am 09.01.2008 12:13:40 von KingofGing
I should clarify, I'm looking for any subscriptions that have a break
of at least 1 issue before returning with the current transaction,
i.e. in the examples above, any subscription whose last transaction
ended on or before issue 17 and are coming back with a new transaction
starting with the current issue of 19.
Re: Selecting SUM and COUNT, but not straightforward..
am 09.01.2008 12:53:09 von KingofGing
On Jan 9, 11:13 am, KingofGing wrote:
> I should clarify, I'm looking for any subscriptions that have a break
> of at least 1 issue before returning with the current transaction,
> i.e. in the examples above, any subscription whose last transaction
> ended on or before issue 17 and are coming back with a new transaction
> starting with the current issue of 19.
OK, rather surprisingly I think I've got it.
It's all the same except the final couple of select statements - I
took one out and re-jigged the other a bit:
SELECT
COUNT(tblSubscription.SubscriptionID) AS NoSubs,
SUM(tblTransaction.Copies) AS NoCopies
FROM
tblSubscription INNER JOIN tblTransaction ON
tblSubscription.SubscriptionID = tblTransaction.SubscriptionID
WHERE
(tblSubscription.JournalID = @JournalID) AND
(tblTransaction.IssueStart =
(SELECT
MAX(JournalSequence) AS Expr1
FROM
tblIssue AS tblIssue_1
WHERE
(JournalID = @JournalID))) AND
(tblTransaction.TransactionTypeID = 11) AND
((SELECT
MAX(Transactions.IssueStart) AS RestartIssue
FROM
tblSubscription AS Subscriptions INNER JOIN tblTransaction AS
Transactions ON Subscriptions.SubscriptionID =
Transactions.SubscriptionID
WHERE
(Subscriptions.JournalID = @JournalID) AND
(Subscriptions.SubscriptionID = tblSubscription.SubscriptionID)) >
1 +
(SELECT
MIN(IssueEnd) AS ExpiredIssue
FROM
(SELECT
TOP (2) IssueEnd
FROM
tblSubscription AS Subscriptions INNER JOIN tblTransaction AS
Transactions ON Subscriptions.SubscriptionID =
Transactions.SubscriptionID
WHERE
(Subscriptions.JournalID = @JournalID) AND
(Subscriptions.SubscriptionID = tblSubscription.SubscriptionID)
ORDER BY IssueEnd DESC)
AS derivedtbl_1))
There might be a more simple solution - I'd be happy to hear it if
anyone comes up with one.
Other than that, thanks for your time!
Re: Selecting SUM and COUNT, but not straightforward..
am 10.01.2008 18:47:06 von Joe Celko
I'm having a problem writing an SQL statement that I can't quite wrap
my head around. Your design is a mess. Journals have a standard
CHAR(8) industry identifier called the ISSN; this is basic research
that you failed to do. You have vague data element names (status of
what? Marriage? shipment?) and keep audit trail data in the tables
being audited. A subscription is usually a relationship between
publication and subscriber; you don't have a subscriber in this model
and why is a transaction totally different from a subscription? It
looks useless in the data model. Never use those silly "tbl-"
affixes, but do use plural or collective noun if more than one element
exists in the set that table models. Why did you avoid multi-column
natural keys that you have to enforce anyway?
Let's try a different schema, with real keys:
CREATE TABLE Journals
(issn CHAR(8) NOT NULL PRIMARY KEY
CHECK ( <>),
journal_title VARCHAR(25) NOT NULL,
etc.);
CREATE TABLE Subscribers
(subscriber_id INTEGER NOT NULL PRIMARY KEY,
subscriber_name VARCHAR(25) NOT NULL,
subscriber_street VARCHAR(25) NOT NULL,
etc.);
CREATE TABLE Subscriptions
(subscriber_id INTEGER NOT NULL
REFERENCES Subscribers(subscriber_id)
ON UPDATE CASCADE,
issn CHAR(8) NOT NULL
REFERENCES Journals (issn)
ON UPDATE CASCADE,
start_issue_seq INTEGER NOT NULL,
end_issue_seq INTEGER NOT NULL,
CHECK (start_issue_seq <= end_issue_seq),
copy_cnt INTEGER DEFAULT 1 NOT NULL
CHECK (copy_cnt > 0),
PRIMARY KEY (subscriber_id, issn, start_issue_seq), --natural key!
etc.);
CREATE TABLE JournalIssues -- subordinate table
(issn CHAR(8) NOT NULL
REFERENCES Journals (issn)
ON UPDATE CASCADE,
issue_seq INTEGER NOT NULL,
PRIMARY KEY (issn, issue_seq),
publication_date DATETIME NOT NULL, -- assuming this exists
etc.);
You might want to have a VIEW on this that uses CURRENT_TIMESTAMP to
filter out future scheduled issues.
>> If there are 6 issues of this journal per year then a 2 year
subscription might consist of 2 transactions for 1 year each,.. <<
Why not say the subscription is for 12 issues, starting with issue
#13 to #24? The year has nothing to do with the number of issues, and
you can locate that fact as an attribute of an issue (i.e. issue #13
is the 2008 July issue of that publication). You need to quit
splitting a single fact across two rows -- look up "attribute
splitting" as a data modeling error.
You will need a stored procedure that looks for existing
subscriptions,then extends the last_issue_seq when someone renews
before expiration.
>> However it is possible for a subscription to pause, or lapse, for a period of time between two transactions and miss some issues, for example
Year 1: Issue11 - Issue16
Year 2: Issue19 - Issue24 <<
That is two separate subscriptions.
>> Issues are not added to Issue until they are current, so the Issue in Issue with the highest Journal Sequence number is the current one <<
No, the current issue is the issue with either the next publication
date or the most recent publication date (if I want to good to the
warehouse and send it out).
>> What I need is to be able to determine the number of subscriptions (and also the total number of copies for those subscriptions) that are returning with the current issue (e.g. Issue19 in the examples above) after a lapsed period (the second example), EXCLUDING any that haven't lapsed, i.e. that have continued straight on (the first example) for any particular journal. <<
We can look for the subscriptions that have a first issue scheduled
for a future date in this data model. You did not say how to handle a
new subscription, so I lumped them into the same bucket with this
query:
SELECT @issn, COUNT(*) AS new_cnt, SUM(copy_cnt) AS issues_cnt
FROM Subscriptions AS S1
WHERE S1.issn = @my_issn
AND S1.start_issue_seq
= (SELECT J1.issue_seq -- next issue number
FROM JournalsIssue AS J1
WHERE J1.issn = S1.issn
AND J1.publication_date
= (SELECT MIN(J2.publication_date) -- next issue date
FROM JournalsIssue AS J2
WHERE J2.issn = S1.issn
AND publication_date >= CURRENT_TIMESTAMP));
You can add another search condition to keep only the renewals:
AND (SELECT COUNT(*)
FROM Subscriptions AS S2
WHERE S2.issn = S2.issn
GROUP BY subscriber_id) > 1 --previous subscription