Query has me stumped
am 12.07.2007 20:24:54 von markdavies2003
Assume the following table, where the columns represent two-word
strings (word1 and word2) and the part of speech of the second word,
along with their frequency:
[TagTable]
freq word1 word2 partOfSpeechWord2
---- ---- ---- ----
1 the plant verb (incorrectly tagged)
21 the plant noun
6 to plant verb
27 to stop verb
3 to stop noun (incorrectly tagged)
4 the stop noun
I want to generate the following list, which show the most frequent
partOfSpeech for a given word1 / word2 combination, e.g.:
21 the plant noun
6 to plant verb
27 to stop verb
4 the stop noun
(but would not include the following, which are presumably errors from
a part of speech tagger):
1 the plant verb
3 to stop noun
What's the SELECT command to get the four correct results above?
----------------------------------------
A somewhat more complicated scenario --
Assume the same [TagTable] above, and then another table with NULL
partOfSpeech tags:
[NeedsTag]
word1 word2 partOfSpeechWord2
---- ---- ----
the plant NULL
to plant NULL
to stop NULL
the stop NULL
What would be the correct UPDATE command to insert noun, verb, verb,
noun (in that order) into this table, based on the most frequent tag
from the first table (i.e. ignoring the incorrect "the plant = V" and
"to stop = N")?
Thanks in advance for your help.
Mark Davies
Re: Query has me stumped
am 12.07.2007 22:49:13 von Roy Harvey
The query:
SELECT A.*
FROM TagTable as A
WHERE partOfSpeechWord2 =
(SELECT TOP 1 B.partOfSpeechWord2
FROM TagTable as B
WHERE A.word1 = B.word1
AND A.word2 = B.word2
ORDER BY B.freq desc)
And the update, which includes the above query as a derived table.
UPDATE NeedsTag
SET partOfSpeechWord2 = X.partOfSpeechWord2
FROM (SELECT A.*
FROM TagTable as A
WHERE partOfSpeechWord2 =
(SELECT TOP 1 B.partOfSpeechWord2
FROM TagTable as B
WHERE A.word1 = B.word1
AND A.word2 = B.word2
ORDER BY B.freq desc)) as X
WHERE NeedsTag.word1 = X.word1
AND NeedsTag.word2 = X.word2
Roy Harvey
Beacon Falls, CT
On Thu, 12 Jul 2007 11:24:54 -0700, markdavies2003@msn.com wrote:
>Assume the following table, where the columns represent two-word
>strings (word1 and word2) and the part of speech of the second word,
>along with their frequency:
>
>[TagTable]
>
>freq word1 word2 partOfSpeechWord2
>---- ---- ---- ----
>1 the plant verb (incorrectly tagged)
>21 the plant noun
>6 to plant verb
>27 to stop verb
>3 to stop noun (incorrectly tagged)
>4 the stop noun
>
>I want to generate the following list, which show the most frequent
>partOfSpeech for a given word1 / word2 combination, e.g.:
>
>21 the plant noun
>6 to plant verb
>27 to stop verb
>4 the stop noun
>
>(but would not include the following, which are presumably errors from
>a part of speech tagger):
>1 the plant verb
>3 to stop noun
>
>What's the SELECT command to get the four correct results above?
>
>----------------------------------------
>
>A somewhat more complicated scenario --
>
>Assume the same [TagTable] above, and then another table with NULL
>partOfSpeech tags:
>
>[NeedsTag]
>
>word1 word2 partOfSpeechWord2
>---- ---- ----
>the plant NULL
>to plant NULL
>to stop NULL
>the stop NULL
>
>What would be the correct UPDATE command to insert noun, verb, verb,
>noun (in that order) into this table, based on the most frequent tag
>from the first table (i.e. ignoring the incorrect "the plant = V" and
>"to stop = N")?
>
>Thanks in advance for your help.
>
>Mark Davies
Re: Query has me stumped
am 13.07.2007 04:29:25 von Joe Celko
>> Assume the following table, where the columns represent two-word strings (word1 and word2) and the part of speech of the second word, along with their frequency: <<
Since the part of speech coumn is computed from word1, why does it
exist at all? Put it in a VIEW.