Searching on AlphaNumeric Content Only

Searching on AlphaNumeric Content Only

am 03.09.2009 05:47:15 von sono-io

Is there is a way to search only for the alphanumeric content of
field in a db? I have an itemID field that contains item #'s that
include dashes, forward slashes, etc, and I want people to be able to
search for an item # even if they don't enter the punctuation exactly.

Here's an example: let's say there is an itemID of 4D-2448-7PS but
someone omits the dashes and searches on 4D24487PS. Is it possible in
PHP to have the find be successful, even if the search criteria
doesn't exactly match what's stored in the field?

If this is possible, I'd appreciate it if someone could just point me
in the right direction so I can read up on it.

Thanks,
Frank

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 06:26:40 von geek.de

--00c09f9b062d4744c00472a4cad8
Content-Type: text/plain; charset=UTF-8

Hi,

It's definitely possible to do when you do it in PHP, but not sure about on
the database side. You could read all records into memory and then iterate
over it with something like:

$toSearch = "4D24487PS"
$charsToIgnore = array('-','+',...);

foreach ($items as $k=>$item) {
$itemVal = str_replace($charsToIgnore, '', $item);
if (strcmp(str_replace($charsToIgnore, '', $toSearch), $itemVal) == 0) {
$return = $item;
break;
}
}

This however might use a lot of memory, but if your DB is a manageable size
it should be ok. You can probably optimise it by iterating over a db result
set instead of reading everything into an array.

Cheers,
Tim
++Tim Hinnerk Heuer++

http://www.ihostnz.com

2009/9/3

> Is there is a way to search only for the alphanumeric content of
> field in a db? I have an itemID field that contains item #'s that include
> dashes, forward slashes, etc, and I want people to be able to search for an
> item # even if they don't enter the punctuation exactly.
>
> Here's an example: let's say there is an itemID of 4D-2448-7PS but
> someone omits the dashes and searches on 4D24487PS. Is it possible in PHP
> to have the find be successful, even if the search criteria doesn't exactly
> match what's stored in the field?
>
> If this is possible, I'd appreciate it if someone could just point
> me in the right direction so I can read up on it.
>
> Thanks,
> Frank
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

--00c09f9b062d4744c00472a4cad8--

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 06:30:09 von Ben Dunlap

> =A0 =A0 =A0 =A0Is there is a way to search only for the alphanumeric cont=
ent of
> field in a db? =A0I have an itemID field that contains item #'s that incl=
ude
> dashes, forward slashes, etc, and I want people to be able to search for =
an
> item # even if they don't enter the punctuation exactly.

Not sure if there's anything specifically PHP-ish that will help you
here, but I would be inclined to start by storing a stripped-down
version of the item ID (alphanumeric characters only) in a separate
column in the database table.

Then, when a user enters some search data, I would remove
non-alphanumeric characters, if any, from the user's input, and then
search the stripped column with this normalized version of the input.

If you want even fuzzier matching (inadvertent transpositions or an
omitted character or two OK, for example), you might read about
Levenshtein distance:

http://en.wikipedia.org/wiki/Levenshtein_distance

PHP has a levenshtein function but you'll have to figure out a way to
use it efficiently with your data set. Or, if Levenshtein isn't quite
right for your needs, the article above might at least point you in a
useful direction.

Ben

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 10:25:25 von Ashley Sheridan

On Wed, 2009-09-02 at 21:30 -0700, Ben Dunlap wrote:
> > Is there is a way to search only for the alphanumeric content of
> > field in a db? I have an itemID field that contains item #'s that include
> > dashes, forward slashes, etc, and I want people to be able to search for an
> > item # even if they don't enter the punctuation exactly.
>
> Not sure if there's anything specifically PHP-ish that will help you
> here, but I would be inclined to start by storing a stripped-down
> version of the item ID (alphanumeric characters only) in a separate
> column in the database table.
>
> Then, when a user enters some search data, I would remove
> non-alphanumeric characters, if any, from the user's input, and then
> search the stripped column with this normalized version of the input.
>
> If you want even fuzzier matching (inadvertent transpositions or an
> omitted character or two OK, for example), you might read about
> Levenshtein distance:
>
> http://en.wikipedia.org/wiki/Levenshtein_distance
>
> PHP has a levenshtein function but you'll have to figure out a way to
> use it efficiently with your data set. Or, if Levenshtein isn't quite
> right for your needs, the article above might at least point you in a
> useful direction.
>
> Ben
>

What's wrong with using the wildcards that are built into most SQL
variants?

SELECT * FROM table WHERE item_id LIKE '%#abcdef'

Will select all records where the item_id field ends in '#abcdef'

Thanks,
Ash
http://www.ashleysheridan.co.uk




--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

RE: Searching on AlphaNumeric Content Only

am 03.09.2009 10:56:51 von Andrea Giammarchi

--_92620e1b-1d7a-4978-8858-6f3185ff13ce_
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable


Which DB?
If it is MySQL=2C as example=2C you can simply use REGEXP syntax "^[a-zA-Z0=
-9]+$" via SELECT

Regards

> From: sono-io@fannullone.us
> To: php-general@lists.php.net
> Date: Wed=2C 2 Sep 2009 20:47:15 -0700
> Subject: [PHP] Searching on AlphaNumeric Content Only
>=20
> Is there is a way to search only for the alphanumeric content of =20
> field in a db? I have an itemID field that contains item #'s that =20
> include dashes=2C forward slashes=2C etc=2C and I want people to be able =
to =20
> search for an item # even if they don't enter the punctuation exactly.
>=20
> Here's an example: let's say there is an itemID of 4D-2448-7PS but =20
> someone omits the dashes and searches on 4D24487PS. Is it possible in =20
> PHP to have the find be successful=2C even if the search criteria =20
> doesn't exactly match what's stored in the field?
>=20
> If this is possible=2C I'd appreciate it if someone could just point me =
=20
> in the right direction so I can read up on it.
>=20
> Thanks=2C
> Frank
>=20
> --=20
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe=2C visit: http://www.php.net/unsub.php
>=20

____________________________________________________________ _____
Drag n=92 drop=97Get easy photo sharing with Windows Live=99 Photos.

http://www.microsoft.com/windows/windowslive/products/photos .aspx=

--_92620e1b-1d7a-4978-8858-6f3185ff13ce_--

RE: Searching on AlphaNumeric Content Only

am 03.09.2009 13:41:08 von Andrea Giammarchi

--_c62c9826-1a7f-4a7d-bde5-e84f7c78862a_
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable



> Indeed you could do it via a regexp=2C but that uses up quite some memory=
..=20
> Every time you do a SELECT. You can simply add a table column with the=20
> stripped value and let the table update itself (with an ON UPDATE ON=20
> INSERT trigger=2C which takes the input value for the itemID and strips i=
t=20
> once).
>=20
> When doing this on inputting the value into the database=2C you save=20
> yourself the pain (and performance) of doing it on every SELECT-query.

Excuse me? Somebody suggested a PHP loop to solve a query problem and you a=
re saying that REGEXP should not be used?
MySQL caches queries and 100 SELECT with a REGEXP will cost zero after the =
first one if nothing changed inside the table.

At the same time an internal REGEXP is faster than everything else has to m=
ove out and be parsed after via=2C probably=2C the same REGEXP engine. Try =
some bench.

This problem=2C imho=2C is a non-problem=2C at least not a PHP problem.

How MySQL optimizes internally REGEXPs is not PHP problem as well.

It's like to create a loop to read byte after byte because file_get_content=
s could be memory greedy (if you do that with 1 Gb of file you are you doin=
g wrong in any case=2C logs need to be split as example) or avoid MATCH AGA=
INST in query if we have too many rows because of performances problem (tab=
le could be slipt as well to optimize performances) ... and these practices=
to avoid native solutions are a bit hilarious=2C imho.

Regards

____________________________________________________________ _____
Drag n=92 drop=97Get easy photo sharing with Windows Live=99 Photos.

http://www.microsoft.com/windows/windowslive/products/photos .aspx=

--_c62c9826-1a7f-4a7d-bde5-e84f7c78862a_--

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 19:39:48 von Ben Dunlap

> Excuse me? Somebody suggested a PHP loop to solve a query problem and you are saying that REGEXP should not be used?
> MySQL caches queries and 100 SELECT with a REGEXP will cost zero after the first one if nothing changed inside the table.

Even if the REGEXP has to change with every query?

Performance aside, I think REGEXP() could be used here, but not in the
way you've suggested. As the OP has described his table, your regex
("^[a-zA-Z0-9]+$") won't match any rows, because all of his product
IDs have non-alphanumeric characters in them.

Suppose this table:

pk | prod_id
1 | 07-ABCD-98
2 | 98-ZCXQ-21

And now suppose the OP's scenario, where a user tries to search on
product id, but enters "07ABCD98".

If the aim is to use REGEXP() to return row 1, I suppose you could
intersperse the search string with ".?" sequences and end up with this
query:

SELECT * FROM table WHERE prod_id REGEXP '^0.?7.?A.?B.?C.?D.?9.?8$'

I think just stripping the alphanumeric characters would end up being
more flexible, though.

-Ben

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 19:47:23 von Ben Dunlap

> What's wrong with using the wildcards that are built into most SQL
> variants?
>
> SELECT * FROM table WHERE item_id LIKE '%#abcdef'
>
> Will select all records where the item_id field ends in '#abcdef'

That works if you know the user is always going to enter the last 7
characters of the product id, but that's not how the OP characterized
the problem. The OP talked about search strings where multiple
characters had been omitted from different parts of the product id.

Ben

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

RE: Searching on AlphaNumeric Content Only

am 03.09.2009 20:21:33 von Andrea Giammarchi

--_591d4176-5dc5-4fda-926f-a983bdbf7415_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable



> Even if the REGEXP has to change with every query?

Ben=2C it does not matter=2C this is not a PHP problem but a DB structure/s=
elect/insert/update problem.
Whatever REGEXP you use=2C a REGEXP is what you need to solve this problem=
=2C certainly not a PHP loop over each row with operations for each rows.

These things are OK if you do not know REGEXP or REGEXP MySQL syntax=2C but=
in this case you should ask for the correct REGEXP rather than talk about =
performances=2C obviously slower outside MySQL and via a runtime interprete=
d language as PHP is=2C or other solutions which aim is to end up with some=
thing that just emulate a select with REGEXP.

Did you get my point? Finally=2C when I say "you" I mean generally speaking=
:)

Regards

____________________________________________________________ _____
With Windows Live=2C you can organize=2C edit=2C and share your photos.
http://www.microsoft.com/middleeast/windows/windowslive/prod ucts/photo-gall=
ery-edit.aspx=

--_591d4176-5dc5-4fda-926f-a983bdbf7415_--

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 21:12:40 von sono-io

Thanks to everyone who has responded. After reading everyone's
response, I think I have a very simple way to solve my "problem".

Using my original example, if someone wants to find item #
4D-2448-7PS, no matter what they type in, I'll take the input, strip
out all non-alphanumeric characters to make it 4D24487PS, add the
wildcard character between each of the remaining characters like so,
4*D*2*4*4*8*7*P*S, and then do the search.

Still being new at this, it seems to be the simplest approach, or is
my thinking flawed? This also keeps me from having to add another
field in the db to search on.

BTW, this solution needs to work with any db, even ASCII files, so it
has to happen in PHP.

Thanks again,
Frank

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 21:17:08 von Ashley Sheridan

On Thu, 2009-09-03 at 12:12 -0700, sono-io@fannullone.us wrote:
> Thanks to everyone who has responded. After reading everyone's
> response, I think I have a very simple way to solve my "problem".
>
> Using my original example, if someone wants to find item #
> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
> out all non-alphanumeric characters to make it 4D24487PS, add the
> wildcard character between each of the remaining characters like so,
> 4*D*2*4*4*8*7*P*S, and then do the search.
>
> Still being new at this, it seems to be the simplest approach, or is
> my thinking flawed? This also keeps me from having to add another
> field in the db to search on.
>
> BTW, this solution needs to work with any db, even ASCII files, so it
> has to happen in PHP.
>
> Thanks again,
> Frank
>
For speed you might want to consider an extra field in the DB in the
future. If the database gets larger, or your query needs to join several
tables together, then things will take a noticeable speed hit. I had a
similar issue myself where I had to search for names based on
mis-spellings of them. In the end I searched with metaphone tags on an
extra field in the DB set up for that purpose, but it was the only way
to do it that didn't affect the speed of the site.

Thanks,
Ash
http://www.ashleysheridan.co.uk




--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 21:24:58 von Eddie Drapkin

On Thu, Sep 3, 2009 at 3:17 PM, Ashley Sheridan w=
rote:
> On Thu, 2009-09-03 at 12:12 -0700, sono-io@fannullone.us wrote:
>>       Thanks to everyone who has responded.  After r=
eading everyone's
>> response, I think I have a very simple way to solve my "problem".
>>
>>       Using my original example, if someone wants to find=
item #
>> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
>> out all non-alphanumeric characters to make it 4D24487PS, add the
>> wildcard character between each of the remaining characters like so,
>> 4*D*2*4*4*8*7*P*S, and then do the search.
>>
>>       Still being new at this, it seems to be the simples=
t approach, or is
>> my thinking flawed?  This also keeps me from having to add another
>> field in the db to search on.
>>
>>       BTW, this solution needs to work with any db, even =
ASCII files, so it
>> has to happen in PHP.
>>
>> Thanks again,
>> Frank
>>
> For speed you might want to consider an extra field in the DB in the
> future. If the database gets larger, or your query needs to join several
> tables together, then things will take a noticeable speed hit. I had a
> similar issue myself where I had to search for names based on
> mis-spellings of them. In the end I searched with metaphone tags on an
> extra field in the DB set up for that purpose, but it was the only way
> to do it that didn't affect the speed of the site.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>
>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

Has anyone considered deploying an actual search engine (Solr, Sphinx,
etc.), as they will take care of the stripping, stemming, spelling
corrections, etc?

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 22:02:39 von Tommy Pham

----- Original Message ----
> From: "sono-io@fannullone.us"
> To: PHP General List
> Sent: Thursday, September 3, 2009 12:12:40 PM
> Subject: Re: [PHP] Searching on AlphaNumeric Content Only
>
> Thanks to everyone who has responded. After reading everyone's response, I
> think I have a very simple way to solve my "problem".
>
> Using my original example, if someone wants to find item # 4D-2448-7PS, no
> matter what they type in, I'll take the input, strip out all non-alphanumeric
> characters to make it 4D24487PS, add the wildcard character between each of the
> remaining characters like so, 4*D*2*4*4*8*7*P*S, and then do the search.

The correct wildcard syntax to work in any DB (Oracle, MySQL, MSSQL, etc), is % and not * if I remember correctly. Searching like this is ok but won't be efficient when you have a lot of rows. As for external file processing txt, csv, etc... I recommend you create a separate mechanism for it since each storage medium is meant for different purposes. txt (both delimited and fix formatted) and csv are usually meant for importing/exporting between various RDBMS types and different companies. They're not mean for fast searching of data. I suggest you think about the amount of the data you have to deal with 1st and how often will the search be done on that data. It's probably easier and faster just to import the ascii into db and do you search on db if you have to work with any ascii.

As for adding another field to the db, perhaps your project just started? If so, wouldn't it be better to do it with the future in mind so later you won't have to go back and redesign the db and modify the codes because now you have over 100k rows to search and the search occurs just about every other hits? That time you now have could be used for code optimizing for better performance, add more features/functionalities to the site, etc... :) Trust me, searching the db table with over 200k rows and return the results with multi-table joins based 1 criteria isn't fun. Keep in mind that you shouldn't keep the users waiting more than 5 seconds. Only exception to that rule is data mining where you'll have millions of rows to work with ;) Then it's no longer your problem. It's the DBA :D


Regards,
Tommy

>
> Still being new at this, it seems to be the simplest approach, or is my
> thinking flawed? This also keeps me from having to add another field in the db
> to search on.
>
> BTW, this solution needs to work with any db, even ASCII files, so it has to
> happen in PHP.
>
> Thanks again,
> Frank
>
> --PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 22:25:53 von Paul M Foster

On Thu, Sep 03, 2009 at 12:12:40PM -0700, sono-io@fannullone.us wrote:

> Thanks to everyone who has responded. After reading everyone's
> response, I think I have a very simple way to solve my "problem".
>
> Using my original example, if someone wants to find item #
> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
> out all non-alphanumeric characters to make it 4D24487PS, add the
> wildcard character between each of the remaining characters like so,
> 4*D*2*4*4*8*7*P*S, and then do the search.

Your expression, if used to directly search in your SQL table, won't
work. The '*' character isn't a valid wildcard for SQL. In PostgreSQL,
the wildcard for any number of characters is '%', and for a single
character is '_'. I don't know that MySQL understands this same
convention. And who knows about Oracle.

As others have mentioned, it would be ideal (though not very
"normalized") to create a new table column which contains the
alphanumerics without the punctuation characters ('-'). In nearly any
SQL dialect, you could do a simple SELECT using LIKE to find your item,
if you're searching on this extra field.

If you want do the searching in PHP, then it becomes more complicated.
You'll have to strip out the dashes from the user input, and then query
all the keys from your table, and test them using a regular expression.
As mentioned before, this is time-consuming for a large table.

Here's something else to consider: Could there ever be two items which
only differ by the placement of their dashes? Like 4D-2448-7PS versus
4D2-44-87PS? If not, then you should store the item number without
punctuation, and use that as the primary key on your table. Have an
"extra" field which shows the item number with dashes. You can use this
extra field in printing inventory labels or whatever (I don't recall the
context of your original post).

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

RE: Searching on AlphaNumeric Content Only

am 03.09.2009 22:33:57 von Andrea Giammarchi

--_421454c5-f61b-4076-9095-92c080911480_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable


stripping=2C stemming=2C spelling corrections ?
... uhm=2C that's probably why they invented regular expressions=2C isn't =
it?

As I said=2C at the end of the day=2C this will be a manual slow=2C potenti=
ally wrong implementation of what we already have and use on daily basis.

But obviously=2C everybody is free to create his own problems=2C no doubts =
about that.

Regards

> Has anyone considered deploying an actual search engine (Solr=2C Sphinx=
=2C
> etc.)=2C as they will take care of the stripping=2C stemming=2C spelling
> corrections=2C etc?


____________________________________________________________ _____
With Windows Live=2C you can organize=2C edit=2C and share your photos.
http://www.microsoft.com/middleeast/windows/windowslive/prod ucts/photo-gall=
ery-edit.aspx=

--_421454c5-f61b-4076-9095-92c080911480_--

Re: Searching on AlphaNumeric Content Only

am 03.09.2009 23:05:30 von Ben Dunlap

>
> stripping, stemming, spelling corrections ?
> =A0... uhm, that's probably why they invented regular expressions, isn't =
it?
>
> As I said, at the end of the day, this will be a manual slow, potentially=
wrong implementation of what we already have and use on daily basis.

If you've got a regular-expression-based method in mind that simply
nails the OP's problem, please share. I'm still not seeing how
"regular expressions" is a sufficient answer to the OP's problem,
which is basically fuzzy search.

My sense is that regular expressions are for situations where you
basically know just what you're searching for, but don't really know
where it falls in your search space.

The OP, on the other hand, is building a system where he won't know
just what he's searching for -- all he'll know is that his search key
is "sort of like" the thing he actually needs to find.

You might be able to squeeze this problem, or at least some part of
it, into a regex-based solution, but I don't think it's a natural fit.

Ben

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 04.09.2009 17:00:30 von Lupus Michaelis

Ashley Sheridan a écrit :
> What's wrong with using the wildcards that are built into most SQL
> variants?

Performance issues. Like is an operator to avoid when possible.

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 04.09.2009 17:02:07 von Ashley Sheridan

On Fri, 2009-09-04 at 17:00 +0200, Lupus Michaelis wrote:
> Ashley Sheridan a écrit :
> > What's wrong with using the wildcards that are built into most SQL
> > variants?
>=20
> Performance issues. Like is an operator to avoid when possible.
>=20
> --=20
> Mickaël Wolff aka Lupus Michaelis
> http://lupusmic.org
>=20
You'll have far greater performance issues if you retrieve all those
records and attempt to do the same thing inside of PHP...

Thanks,
Ash
http://www.ashleysheridan.co.uk




--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 04.09.2009 17:06:46 von Lupus Michaelis

sono-io@fannullone.us a écrit :

> Here's an example: let's say there is an itemID of 4D-2448-7PS but
> someone omits the dashes and searches on 4D24487PS. Is it possible in
> PHP to have the find be successful, even if the search criteria doesn't
> exactly match what's stored in the field?

I think you don't follow the good path. A good way could be to have a
strenger UI management. For exemple, here, you have an ID that is
composed about 3 bits. So do the UI present three input fields, or write
a smart input (with JS).

Of course, in the control side (PHP), you have to check the format of
the returned data. Here you can reject the input, or fall through some
segment of code that try to interpret the submitted value. But in first
time, do it simple and be rough to users. Usability can be improved in a
second time.

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 04.09.2009 18:02:53 von Lupus Michaelis

Ashley Sheridan a écrit :

> You'll have far greater performance issues if you retrieve all those
> records and attempt to do the same thing inside of PHP...

It's why I speak about « avoiding » and not « bannishing ». Like can
be usefull, I used to use it. But it is not the a good answer to all
problems. The problem with like operator is it can't use the index (or
in a very limited way). So I try to warn about it.

So said, I never submit an all-retrieving method. I know it isn't the
solution too.

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 04.09.2009 22:22:18 von Eddie Drapkin

On Fri, Sep 4, 2009 at 12:02 PM, Lupus
Michaelis wrote:
> Ashley Sheridan a écrit :
>
>> You'll have far greater performance issues if you retrieve all those
>> records and attempt to do the same thing inside of PHP...
>
>  It's why I speak about « avoiding » and not « bannis=
hing ». Like can be
> usefull, I used to use it. But it is not the a good answer to all problem=
s.
> The problem with like operator is it can't use the index (or in a very
> limited way). So I try to warn about it.
>
>  So said, I never submit an all-retrieving method. I know it isn't t=
he
> solution too.
>
> --
> Mickaël Wolff aka Lupus Michaelis
> http://lupusmic.org
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

So far, in this thread, there've been a few solutions:
1) LIKE in SQL.
2) REGEXP in SQL.
3) PCRE in PHP
4) Other fetch all methods in PHP.

The one thing that I'm seeing as a consistent agreement is that the
performance hit of whichever of the aforementioned measures is going
to be enough to be considering something else. I briefly mentioned -
I apologize for the brevity of that email because I was in a hurry -
that a legitimate full text search engine is the right solution to
this problem. The only problem with deploying a full text search
engine is going to be the difficulty in the deployment and perhaps
issues if you're on shared hosting (but then again I am of the opinion
that those who choose to run with shared hosting dig their own graves
in more ways than one).

What a full text search engine gives you is flexibility in your
searches, such that the initial question, when I read it, I thought
"Oh, someone will tell him to use Sphinx or Solr as both have special
filters for word seperation and would handle this without any special
instruction." Instead, this is never even brought up!

Why was using a full text search engine to do this sort of thing - not
to mention the other benefits that it would bring (responsiveness and
flexibility in searching, speed, decreased use of MySQL, etc. etc.) -
rejected so offhandedly? I can't actually think of a better way to do
this without requiring a whole heap of overhead, either processing or
programming.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 04.09.2009 22:36:08 von Robert Cummings

Eddie Drapkin wrote:
> On Fri, Sep 4, 2009 at 12:02 PM, Lupus
> Michaelis wrote:
>> Ashley Sheridan a écrit :
>>
>>> You'll have far greater performance issues if you retrieve all those
>>> records and attempt to do the same thing inside of PHP...
>> It's why I speak about « avoiding » and not « bannishing ». Like can be
>> usefull, I used to use it. But it is not the a good answer to all problems.
>> The problem with like operator is it can't use the index (or in a very
>> limited way). So I try to warn about it.
>>
>> So said, I never submit an all-retrieving method. I know it isn't the
>> solution too.
>>
>> --
>> Mickaël Wolff aka Lupus Michaelis
>> http://lupusmic.org
>>
>> --
>> PHP General Mailing List (http://www.php.net/)
>> To unsubscribe, visit: http://www.php.net/unsub.php
>>
>>
>
> So far, in this thread, there've been a few solutions:
> 1) LIKE in SQL.
> 2) REGEXP in SQL.
> 3) PCRE in PHP
> 4) Other fetch all methods in PHP.
>
> The one thing that I'm seeing as a consistent agreement is that the
> performance hit of whichever of the aforementioned measures is going
> to be enough to be considering something else. I briefly mentioned -
> I apologize for the brevity of that email because I was in a hurry -
> that a legitimate full text search engine is the right solution to
> this problem. The only problem with deploying a full text search
> engine is going to be the difficulty in the deployment and perhaps
> issues if you're on shared hosting (but then again I am of the opinion
> that those who choose to run with shared hosting dig their own graves
> in more ways than one).
>
> What a full text search engine gives you is flexibility in your
> searches, such that the initial question, when I read it, I thought
> "Oh, someone will tell him to use Sphinx or Solr as both have special
> filters for word seperation and would handle this without any special
> instruction." Instead, this is never even brought up!
>
> Why was using a full text search engine to do this sort of thing - not
> to mention the other benefits that it would bring (responsiveness and
> flexibility in searching, speed, decreased use of MySQL, etc. etc.) -
> rejected so offhandedly? I can't actually think of a better way to do
> this without requiring a whole heap of overhead, either processing or
> programming.

I've just sort of stopped in on this thread... but why isn't MySQL's
FULLTEXT engine being considered?

Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 05.09.2009 00:11:17 von Tommy Pham

----- Original Message ----=0A> From: Robert Cummings >=0A> To: Eddie Drapkin =0A> Cc: Lupus Michaelis el+php@lupusmic.org>; php-general@lists.php.net=0A> Sent: Friday, September=
4, 2009 1:36:08 PM=0A> Subject: Re: [PHP] Searching on AlphaNumeric Conten=
t Only=0A> =0A> Eddie Drapkin wrote:=0A> > On Fri, Sep 4, 2009 at 12:02 PM,=
Lupus=0A> > Michaeliswrote:=0A> >> Ashley Sheridan a =E9crit :=0A> >>=0A> =
>>> You'll have far greater performance issues if you retrieve all those=0A=
> >>> records and attempt to do the same thing inside of PHP...=0A> >> It'=
s why I speak about =AB avoiding =BB and not =AB bannishing =BB. Like can b=
e=0A> >> usefull, I used to use it. But it is not the a good answer to all =
problems.=0A> >> The problem with like operator is it can't use the index (=
or in a very=0A> >> limited way). So I try to warn about it.=0A> >>=0A> >> =
So said, I never submit an all-retrieving method. I know it isn't the=0A> =
>> solution too.=0A> >>=0A> >> --=0A> >> Micka=EBl Wolff aka Lupus Michaeli=
s=0A> >> http://lupusmic.org=0A> >>=0A> >> --=0A> >> PHP General Mailing Li=
st (http://www.php.net/)=0A> >> To unsubscribe, visit: http://www.php.net/u=
nsub.php=0A> >>=0A> >>=0A> > =0A> > So far, in this thread, there've been a=
few solutions:=0A> > 1) LIKE in SQL.=0A> > 2) REGEXP in SQL.=0A> > 3) PCRE=
in PHP=0A> > 4) Other fetch all methods in PHP.=0A> > =0A> > The one thing=
that I'm seeing as a consistent agreement is that the=0A> > performance hi=
t of whichever of the aforementioned measures is going=0A> > to be enough t=
o be considering something else. I briefly mentioned -=0A> > I apologize =
for the brevity of that email because I was in a hurry -=0A> > that a legit=
imate full text search engine is the right solution to=0A> > this problem. =
The only problem with deploying a full text search=0A> > engine is going t=
o be the difficulty in the deployment and perhaps=0A> > issues if you're on=
shared hosting (but then again I am of the opinion=0A> > that those who ch=
oose to run with shared hosting dig their own graves=0A> > in more ways tha=
n one).=0A> > =0A> > What a full text search engine gives you is flexibilit=
y in your=0A> > searches, such that the initial question, when I read it, I=
thought=0A> > "Oh, someone will tell him to use Sphinx or Solr as both hav=
e special=0A> > filters for word seperation and would handle this without a=
ny special=0A> > instruction." Instead, this is never even brought up!=0A>=
> =0A> > Why was using a full text search engine to do this sort of thing =
- not=0A> > to mention the other benefits that it would bring (responsivene=
ss and=0A> > flexibility in searching, speed, decreased use of MySQL, etc. =
etc.) -=0A> > rejected so offhandedly? I can't actually think of a better =
way to do=0A> > this without requiring a whole heap of overhead, either pro=
cessing or=0A> > programming.=0A> =0A> I've just sort of stopped in on this=
thread... but why isn't MySQL's =0A> FULLTEXT engine being considered?=0A=
=0AIf I remember correctly, it's only on available on MyISAM table type. I=
don't=0Athink the OP mentioned that he's using MyISAM or MySQL for that ma=
tter. > =0A> Cheers,=0A> Rob.=0A> -- =0A> http://www.interjinn.com=0A>=
Application and Templating Framework for PHP=0A> =0A> -- =0A> PHP General =
Mailing List (http://www.php.net/)=0A> To unsubscribe, visit: http://www.ph=
p.net/unsub.php=0A

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 05.09.2009 00:14:09 von Paul M Foster

On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:

> On Fri, Sep 4, 2009 at 12:02 PM, Lupus
> Michaelis wrote:

> if you're on shared hosting (but then again I am of the opinion
> that those who choose to run with shared hosting dig their own graves
> in more ways than one).

Any time you or someone else would like to backstop me in setting up a
dedicated server on rackspace or somewhere else, for free or really
cheap, you let me know! Otherwise, those of us with less than complete
expertise in server setup are stuck with shared hosting. ;-}

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 05.09.2009 02:15:41 von Robert Cummings

Paul M Foster wrote:
> On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:
>
>> On Fri, Sep 4, 2009 at 12:02 PM, Lupus
>> Michaelis wrote:
>
>> if you're on shared hosting (but then again I am of the opinion
>> that those who choose to run with shared hosting dig their own graves
>> in more ways than one).
>
> Any time you or someone else would like to backstop me in setting up a
> dedicated server on rackspace or somewhere else, for free or really
> cheap, you let me know! Otherwise, those of us with less than complete
> expertise in server setup are stuck with shared hosting. ;-}

For a few hundred bucks a year you can get a VPS (Virtual Private
Server), which gives you root access. once you get a couple of clients
on it, it will pay for itself year after year. If you have clients that
don't use much resources, you can put a few on and even make a profit on
the hosting alone.

Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 05.09.2009 06:15:08 von Paul M Foster

On Fri, Sep 04, 2009 at 08:15:41PM -0400, Robert Cummings wrote:

> Paul M Foster wrote:
>> On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:
>>
>>> On Fri, Sep 4, 2009 at 12:02 PM, Lupus
>>> Michaelis wrote:
>>
>>> if you're on shared hosting (but then again I am of the opinion
>>> that those who choose to run with shared hosting dig their own graves
>>> in more ways than one).
>>
>> Any time you or someone else would like to backstop me in setting up a
>> dedicated server on rackspace or somewhere else, for free or really
>> cheap, you let me know! Otherwise, those of us with less than complete
>> expertise in server setup are stuck with shared hosting. ;-}
>
> For a few hundred bucks a year you can get a VPS (Virtual Private
> Server), which gives you root access. once you get a couple of clients
> on it, it will pay for itself year after year. If you have clients that
> don't use much resources, you can put a few on and even make a profit on
> the hosting alone.

Oh sure. Well aware of it. The problem is not finding a VPS or working
with root access or anything like that. The problem is expertise. POP3,
SMTP, SSH, HTTP, DNS, firewall security while still allowing access to
outward facing servers, etc. It's more expertise than most people have,
including me. The servers I run are internal and don't have to deal with
the rigors of the internet, and only serve a couple of people. Setting
up multiple domains under an Apache server is black magic to me, for
instance. And then there's backups, and
what-do-I-do-if-the-server-fails, etc. Sheesh.

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Searching on AlphaNumeric Content Only

am 05.09.2009 07:33:31 von Tommy Pham

----- Original Message ----
> From: Paul M Foster
> To: php-general@lists.php.net
> Sent: Friday, September 4, 2009 9:15:08 PM
> Subject: Re: [PHP] Searching on AlphaNumeric Content Only
>
> On Fri, Sep 04, 2009 at 08:15:41PM -0400, Robert Cummings wrote:
>
> > Paul M Foster wrote:
> >> On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:
> >>
> >>> On Fri, Sep 4, 2009 at 12:02 PM, Lupus
> >>> Michaeliswrote:
> >>
> >>> if you're on shared hosting (but then again I am of the opinion
> >>> that those who choose to run with shared hosting dig their own graves
> >>> in more ways than one).
> >>
> >> Any time you or someone else would like to backstop me in setting up a
> >> dedicated server on rackspace or somewhere else, for free or really
> >> cheap, you let me know! Otherwise, those of us with less than complete
> >> expertise in server setup are stuck with shared hosting. ;-}
> >
> > For a few hundred bucks a year you can get a VPS (Virtual Private
> > Server), which gives you root access. once you get a couple of clients
> > on it, it will pay for itself year after year. If you have clients that
> > don't use much resources, you can put a few on and even make a profit on
> > the hosting alone.
>
> Oh sure. Well aware of it. The problem is not finding a VPS or working
> with root access or anything like that. The problem is expertise. POP3,
> SMTP, SSH, HTTP, DNS, firewall security while still allowing access to
> outward facing servers, etc. It's more expertise than most people have,
> including me. The servers I run are internal and don't have to deal with
> the rigors of the internet, and only serve a couple of people. Setting
> up multiple domains under an Apache server is black magic to me, for
> instance. And then there's backups, and
> what-do-I-do-if-the-server-fails, etc. Sheesh.

I stand corrected. You paid for peace of mind :D


>
> Paul
>
> --
> Paul M. Foster
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php