I'm seeking a solution to check for duplicate entries with an other
on 11.01.2010 12:04:04 by omar zorgui
Hello,
I have a question. I am currently writing a function to remove duplicates
from an array of sentences.
The function must treat sentences that contain exactly the same words, but
in a different order, as duplicates.
The array that I use as input is the following:
$sentences[] = "this is a example sentence";
$sentences[] = "a this is example sentence";
$sentences[] = "example this is a sentence";
$sentences[] = "this is a example sentence for a function";
I want the function to keep only two records of the sentence array in this
case: one of (0|1|2), and 3.
I wrote the script as follows:
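function removeDuplicates($sentences) { // name added here; the original had none
    $sentenceArray = array();
    foreach ($sentences as $sentence) {
        // Split each sentence into its words.
        $words = explode(" ", $sentence);
        $sentenceArray[] = $words;
    }
    foreach ($sentenceArray as $sentence) {
        // Here I want to check whether the same words are in other sentences
    }
}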
My question is: how can I check whether the $sentence array contains records
with exactly the same words as my input?
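
One possible way to make the check concrete, sketched under the assumption that "exactly the same words" means the same words with the same counts, ignoring order and letter case: reduce each sentence to an order-independent key and compare keys. The names `wordKey` and `sameWords` are hypothetical and do not come from the thread.

```php
<?php
// Hypothetical helper: build an order-independent key for a sentence.
// Assumption: words are separated by whitespace and case does not matter.
function wordKey($sentence)
{
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', strtolower(trim($sentence)), -1, PREG_SPLIT_NO_EMPTY);
    // Sort as strings so the same words in any order give the same list.
    sort($words, SORT_STRING);
    return implode(' ', $words);
}

// Hypothetical helper: true when both sentences use exactly the same words.
function sameWords($a, $b)
{
    return wordKey($a) === wordKey($b);
}

var_dump(sameWords("this is a example sentence", "a this is example sentence"));
// bool(true)
var_dump(sameWords("this is a example sentence", "this is a example sentence for a function"));
// bool(false)
```

Because the key keeps repeated words, "a a b" and "a b" are not treated as duplicates under this assumption.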
Re: I'm seeking a solution to check for duplicate entries
on 11.01.2010 18:32:29 by Richard Quadling
2010/1/11 omar zorgui:
> $sentences[] = "this is a example sentence";
> $sentences[] = "a this is example sentence";
> $sentences[] = "example this is a sentence";
> $sentences[] = "this is a example sentence for a function";
<?php
// Define data.
$Sentences = array
(
    "this is a example sentence",
    "a this is example sentence",
    "example this is a sentence",
    "this is a example sentence for a function",
);

// Record hashes of the sorted words in each sentence.
$Hashes = array();
$Reduced = array_filter
(
    $Sentences,
    function($Sentence)
        use(&$Hashes)
    {
        // Explode the sentence into words, forced to lower case.
        $Words = explode(' ', strtolower($Sentence));

        // Sort the words so that word order no longer matters.
        sort($Words);

        // If the hash of the serialized words array is not already known
        if (!in_array($Hash = md5(serialize($Words)), $Hashes))
        {
            // then add it and return true to keep this sentence.
            $Hashes[] = $Hash;
            return True;
        }
        else
        {
            // else return false to filter out this sentence.
            return False;
        }
    }
);

// Show the reduced sentences.
print_r($Reduced);
?>
outputs ...
Array
(
[0] => this is a example sentence
[3] => this is a example sentence for a function
)
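
Note that array_filter() preserves the original keys, which is why the result above is indexed 0 and 3. As a variant sketch only (the function name `uniqueByWords` is hypothetical, and this is not a replacement for the code above), the sorted word list can be used directly as an array key, so each lookup is an isset() test instead of an in_array() scan, and the result is indexed sequentially:

```php
<?php
// Hypothetical function name; a variant of the sort-and-compare idea above.
function uniqueByWords(array $sentences)
{
    $seen = array();    // sorted word list => true
    $unique = array();  // first sentence seen for each word list

    foreach ($sentences as $sentence) {
        // Lower-case, split on single spaces, and sort as strings.
        $words = explode(' ', strtolower($sentence));
        sort($words, SORT_STRING);
        $key = implode(' ', $words);

        // Keep only the first sentence that produces this key.
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $unique[] = $sentence;
        }
    }

    return $unique;
}

print_r(uniqueByWords(array(
    "this is a example sentence",
    "a this is example sentence",
    "example this is a sentence",
    "this is a example sentence for a function",
)));
```

With the four example sentences this keeps the first and the last one, stored under keys 0 and 1.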
--
-----
Richard Quadling
"Standing on the shoulders of some very clever giants!"
EE : http://www.experts-exchange.com/M_248814.html
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
ZOPA : http://uk.zopa.com/member/RQuadling
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: I'm seeking a solution to check for duplicate entries
on 12.01.2010 17:33:23 by Richard Quadling
Is that what you wanted?