4.0.5 - Full text search (ft_min_word_len)

on 19.11.2002 12:49:20 by Irek

I think MySQL 4.0.4/4.0.5 has a bug:

How-To-Repeat:

select ...
from table
where match(field1, field2, field3, field4) against ("+T*" in boolean mode)

This finds a lot of rows with words starting with the letter "T", but my
ft_min_word_len=4!


Is it a bug?

Regards
IKS
ICQ: 67420570

------------------------------------------------------------ ---------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)

To request this thread, e-mail bugs-thread13030@lists.mysql.com
To unsubscribe, e-mail

Re: 4.0.5 - Full text search (ft_min_word_len)

on 19.11.2002 17:37:25 by Sergei Golubchik

Hi!

On Nov 19, Irek wrote:
> I think MySQL 4.0.4/4.0.5 has a bug:
>
> How-To-Repeat:
>
> select ...
> from table
> where match(field1, field2, field3, field4) against ("+T*" in boolean mode)
>
> This finds a lot of rows with words starting with the letter "T", but my
> ft_min_word_len=4!

Sorry?
You asked for words that start with "T", so you got them.
What did you expect?

Regards,
Sergei

--
MySQL Development Team
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, http://www.mysql.com/
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
       <___/

Re: 4.0.5 - Full text search (ft_min_word_len)

on 20.11.2002 08:26:29 by Irek

> > I think MySQL 4.0.4/4.0.5 has a bug:
> >
> > How-To-Repeat:
> >
> > select ...
> > from table
> > where match(field1, field2, field3, field4) against ("+T*" in boolean mode)
> >
> > This finds a lot of rows with words starting with the letter "T", but my
> > ft_min_word_len=4!
>
> Sorry?
> You asked for words that start with "T", so you got them.
> What did you expect?
>
> Regards,
> Sergei

OK, that is correct. The variable ft_min_word_len is the minimum length of
words to be indexed, but I thought it was also the minimum length of a search
term. It was my mistake, sorry.

Regards
IKS
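
The distinction reached above (ft_min_word_len limits which words get indexed, not how short a search prefix may be) can be illustrated with a small toy model. This is only a sketch in Python, not MySQL's implementation: the functions build_index and prefix_search, the tokenizing regex, and the sample rows are all hypothetical and exist only for this illustration.

```python
import re

def build_index(rows, min_word_len=4):
    """Map each indexed word (lowercased) to the ids of the rows containing it.

    Words shorter than min_word_len are skipped, which is the role the
    thread attributes to ft_min_word_len. Toy model only, not MySQL code.
    """
    index = {}
    for row_id, text in rows.items():
        for word in re.findall(r"\w+", text.lower()):
            if len(word) >= min_word_len:
                index.setdefault(word, set()).add(row_id)
    return index

def prefix_search(index, prefix):
    """Return ids of rows that have an indexed word starting with prefix."""
    prefix = prefix.lower()
    hits = set()
    for word, row_ids in index.items():
        if word.startswith(prefix):
            hits |= row_ids
    return hits

# Hypothetical sample data, not taken from the thread.
rows = {1: "Tiger and tea", 2: "Table tennis", 3: "Ox cart"}
index = build_index(rows, min_word_len=4)

print(sorted(index))                       # short words such as "tea" are absent
print(sorted(prefix_search(index, "T")))   # a 1-letter prefix still finds rows
```

In this model the one-letter prefix "T" still matches rows 1 and 2, because the length limit is applied to the stored words ("tiger", "table", "tennis"), not to the search prefix.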


BUG in full text search ?

on 20.11.2002 13:20:51 by Irek

Hi,

I have a problem with fulltext search in MySQL 4.0.4.
How to repeat it:

CREATE TABLE mytest (
  id int(10) unsigned NOT NULL auto_increment,
  m_item char(16) NOT NULL default '',
  m_tyt char(128) NOT NULL default '',
  PRIMARY KEY (id),
  UNIQUE KEY idx1 (m_item),
  FULLTEXT KEY f1 (m_item,m_tyt)
) TYPE=MyISAM;

INSERT INTO mytest (id, m_item, m_tyt) VALUES("1", "100", "HISTORY XXI");
INSERT INTO mytest (id, m_item, m_tyt) VALUES("2", "101", "HISTORY XX");
INSERT INTO mytest (id, m_item, m_tyt) VALUES("3", "102", "HISTORY XXII");
INSERT INTO mytest (id, m_item, m_tyt) VALUES("4", "103", "HISTORY AND OTHERS");

My select is:

select mytest.*
from mytest
where 1 and match(mytest.m_item, mytest.m_tyt) against ('+HISTORY*' in boolean mode)

and it works OK, but:

1) against ('+"HISTORY AN"*' in boolean mode)
returned 1 row (that is OK)

2) against ('+"HISTORY XX"*' in boolean mode)
returned 3 rows (that is OK)

3) against ('+"HISTORY A"*' in boolean mode)
returned 4 rows (not 1 - WHY???)

4) against ('+"HISTORY X"*' in boolean mode)
returned 4 rows (not 3 - WHY???)


What is going on? Is this correct?


Regards
IKS
ICQ: 67420570


Re: BUG in full text search ?

on 20.11.2002 15:07:53 by Sergei Golubchik

Hi!

On Nov 20, Irek wrote:
> Hi,
>
> I have a problem with fulltext search in MySQL 4.0.4.
> How to repeat it:
>
> CREATE TABLE mytest (
>   id int(10) unsigned NOT NULL auto_increment,
>   m_item char(16) NOT NULL default '',
>   m_tyt char(128) NOT NULL default '',
>   PRIMARY KEY (id),
>   UNIQUE KEY idx1 (m_item),
>   FULLTEXT KEY f1 (m_item,m_tyt)
> ) TYPE=MyISAM;
>
> INSERT INTO mytest (id, m_item, m_tyt) VALUES("1", "100", "HISTORY XXI");
> INSERT INTO mytest (id, m_item, m_tyt) VALUES("2", "101", "HISTORY XX");
> INSERT INTO mytest (id, m_item, m_tyt) VALUES("3", "102", "HISTORY XXII");
> INSERT INTO mytest (id, m_item, m_tyt) VALUES("4", "103", "HISTORY AND OTHERS");
>
> My select is:
>
> select mytest.*
> from mytest
> where 1 and match(mytest.m_item, mytest.m_tyt) against ('+HISTORY*' in boolean mode)
>
> 3) against ('+"HISTORY A"*' in boolean mode)
> returned 4 rows (not 1 - WHY???)
>
> What is going on? Is this correct?

No, it was a bug. Thank you for the test case; the bug is fixed, and the
fix will ship with 4.0.6.

Regards,
Sergei

Re: BUG in full text search ? - other examples ...

on 20.11.2002 15:35:06 by Irek

Other examples of this bug:

1a)
against ('+"HIS*" -OTHER*' in boolean mode)
returned 4 rows, but 3 rows would be correct

1b)
against ("+'HIS*' -OTHER*" in boolean mode)
returned 4 rows, but 3 rows would be correct

2a)
against ('+"HIS*R*" -OTHER*' in boolean mode)
returned 0 rows, but 3 rows would be correct

2b)
against ("+'HIS*R*' -OTHER*" in boolean mode)
returned 4 rows, but 3 rows would be correct

I think this is the same bug as in my previous email, just with new examples.


I don't understand the difference between ' and ". Could you explain it (or
give me a link)?


Regards
Irek Smaczny
ICQ: 67420570


Re: BUG in full text search ? - other examples ...

on 20.11.2002 18:47:45 by Sergei Golubchik

Hi!

On Nov 20, Irek wrote:
> Other examples of this bug:
>
> 1a)
> against ('+"HIS*" -OTHER*' in boolean mode)
> returned 4 rows, but 3 rows would be correct

Yes, it's the same bug as in your last bug report.
"..." does a strict (though case-insensitive) substring
match, and there is no "HIS*" substring in your table, so the correct
result is 0 rows.

> 1b)
> against ("+'HIS*' -OTHER*" in boolean mode)
> returned 4 rows, but 3 rows would be correct

' (single quote) has no special meaning and is ignored.
The word "OTHERS" is a stopword, so you cannot filter on it: stopwords
are ignored completely. So four rows is the correct result.
Compare with

... against("+'HIS*' -XXI*" in boolean mode)

which, indeed, returns only two rows.

The misfeature is that boolean search should not, by pure logic, be
subject to stopword filtering. To fix that I would have to rewrite the
fulltext search engine almost from scratch. So fixing the stopword issue
is on the TODO list, but it won't appear before MySQL 4.1.

> 2a)
> against ('+"HIS*R*" -OTHER*' in boolean mode)
> returned 0 rows, but 3 rows would be correct

No. '*' is a _truncation_ operator. You can search for a word with a
given prefix with 'prefix*', but '*' in the middle of a word doesn't
work as you might expect. Also, "..." looks for a substring, and there is
no row in your table that contains the "HIS*R*" substring, so 0 rows is
the correct result here.

> 2b)
> against ("+'HIS*R*' -OTHER*" in boolean mode)
> returned 4 rows, but 3 rows would be correct

As above, -OTHER* is ignored. ' (single quote) means nothing, so your
query is identical to "HIS* R*", and MySQL correctly returns all four
rows.

> I don't understand the difference between ' and ". Could you explain it (or
> give me a link)?

" (double quote) is the phrase search (substring) operator. It matches
only rows that contain the quoted substring verbatim. Think of it as a
LIKE '%substring%' match.

' (single quote) has no special meaning and is therefore ignored.

Regards,
Sergei
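
The operator rules described in this reply can be modeled in a few lines of Python. This is a sketch of the behaviour as stated in the thread (phrase search as a case-insensitive substring test, 'prefix*' as a word-prefix test, single quotes dropped), not MySQL's actual parser. The function names are hypothetical; the rows are the (m_item, m_tyt) values from the test case earlier in the thread.

```python
def strip_single_quotes(query):
    """Single quotes carry no meaning in the search string, so drop them."""
    return query.replace("'", "")

def phrase_match(row, phrase):
    """Case-insensitive substring test on any column, like LIKE '%phrase%'."""
    needle = phrase.lower()
    return any(needle in column.lower() for column in row)

def prefix_match(row, prefix):
    """True if any word in any column starts with prefix (the 'prefix*' form)."""
    p = prefix.lower()
    return any(word.startswith(p)
               for column in row
               for word in column.lower().split())

# (m_item, m_tyt) values from the test case in the thread.
table = [
    ("100", "HISTORY XXI"),
    ("101", "HISTORY XX"),
    ("102", "HISTORY XXII"),
    ("103", "HISTORY AND OTHERS"),
]

def count(predicate, arg):
    """Number of rows in the table for which predicate(row, arg) holds."""
    return sum(1 for row in table if predicate(row, arg))

print(count(phrase_match, "HISTORY A"))   # phrase from test case 3
print(count(phrase_match, "HISTORY X"))   # phrase from test case 4
print(count(phrase_match, "HIS*"))        # literal "HIS*" as a phrase
print(count(prefix_match, "HIS"))         # HIS* as a truncated word
print(strip_single_quotes("+'HIS*'"))     # what remains of example 1b's term
```

Under these assumptions the model gives 1 row for "HISTORY A", 3 rows for "HISTORY X", 0 rows for the literal phrase "HIS*", and 4 rows for the prefix HIS*, which agrees with the results called correct in the thread. Stopword filtering is deliberately left out of the sketch.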

Re: BUG in full text search ? - other examples ...

on 21.11.2002 22:22:00 by Irek

OK, so I understand that this bug in full text search will be fixed in
version 4.1? When will that version be available on the MySQL download page?

Regards
Irek Smaczny
smaczny@dst.com.pl


Re: BUG in full text search ? - other examples ...

on 21.11.2002 23:33:11 by Sergei Golubchik

Hi!

On Nov 21, Irek wrote:
> OK, so I understand that this bug in full text search will be fixed in
> version 4.1? When will that version be available on the MySQL download page?

It's not a bug.
By design, boolean fulltext search is currently subject
to stopword filtering.

I plan to change this, but the change cannot go into 4.0 because it is
already in beta, and a beta version accepts only bugfixes.

So this change can be made in 4.1.
It's not done yet; it's still on the TODO list.

Regards,
Sergei
