# How to sort an array by similarity to an input word

I have a PHP array, for example:

``````
$arr = array("hello", "try", "hel", "hey hello");
``````

Now I want to reorder the array so that the words closest to my `$search` variable come first.

How can I do that?

Do you mean arranging the array in a specific order based on an input value entered by the user?

Take a look at `soundex()` and `metaphone()`. Those are two common ways in PHP to compare how alike two words sound. However, I currently have no idea how to sort the words according to a meaningful metric. Could you explain your sort order a bit more?

## 5 Answers

Here is a quick solution using `similar_text()` (http://php.net/manual/en/function.similar-text.php):

This calculates the similarity between two strings as described in Programming Classics: Implementing the World's Best Algorithms by Oliver (ISBN 0-131-00413-1). Note that this implementation does not use a stack as in Oliver's pseudo code, but recursive calls which may or may not speed up the whole process. Note also that the complexity of this algorithm is O(N**3) where N is the length of the longest string.

``````
$userInput = 'Bradley123';

$list = array('Bob', 'Brad', 'Britney');

usort($list, function ($a, $b) use ($userInput) {
    similar_text($userInput, $a, $percentA);
    similar_text($userInput, $b, $percentB);

    return $percentA === $percentB ? 0 : ($percentA > $percentB ? -1 : 1);
});

var_dump($list); //output: array("Brad", "Britney", "Bob");
``````

Alternatively, use `levenshtein()`. The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform `str1` into `str2`. The complexity of the algorithm is O(m*n), where n and m are the lengths of `str1` and `str2` (rather good when compared to `similar_text()`, which is O(max(n,m)**3), but still expensive).

``````
$userInput = 'Bradley123';

$list = array('Bob', 'Brad', 'Britney');

usort($list, function ($a, $b) use ($userInput) {
    $levA = levenshtein($userInput, $a);
    $levB = levenshtein($userInput, $b);

    return $levA === $levB ? 0 : ($levA > $levB ? 1 : -1);
});

var_dump($list);
// "Brad" and "Britney" both have a distance of 6, so they tie ahead of "Bob" (9).
// PHP 8.0+ sorts stably and keeps "Brad" first; older versions may return either order.
``````

How could the similar_text be applied to a multidimensional array? - MikeeeGeee

Great answer. What if there is no input (as in removing it from the equation altogether)? - Jesse

You can use the `levenshtein()` function:

``````
<?php
// input misspelled word
$input = 'helllo';

// array of words to check against
$words  = array('hello', 'try', 'hel', 'hey hello');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $word);

    // check for an exact match
    if ($lev == 0) {

        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is less than the next found shortest
    // distance, OR if a next shortest word has not yet been found
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest  = $word;
        $shortest = $lev;
    }
}

echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
}

?>
``````

If you want to sort your array, you can do this:

``````
$arr = array("hello", "try", "hel", "hey hello");
$search = "hey"; //your search var

for ($i = 0; $i < count($arr); $i++) {
    $temp_arr[$i] = levenshtein($search, $arr[$i]);
}
asort($temp_arr);
foreach ($temp_arr as $k => $v) {
    $sorted_arr[] = $arr[$k];
}
``````

`$sorted_arr` should then be in ascending order of distance, starting with the closest word to your search term.

Another option is the `similar_text()` function, which can also report the similarity as a percentage. See http://www.php.net/manual/en/function.similar-text.php for more.

While @yceruto's answer is correct and informative, I would like to offer additional insights and demonstrate more modern implementation syntax.

First, about the scores generated by the respective functions...

1. `levenshtein()` and `similar_text()` ARE case-sensitive, so an uppercase `H` is just as much a mismatch as the number `6` when compared with `h`.
2. `levenshtein()` and `similar_text()` ARE NOT multi-byte aware, so an accented character like `ê` will not only be deemed a mismatch for `e`, it will potentially receive a heavier penalty because each of its individual bytes counts as a mismatch.

If you want to make case-insensitive comparisons, you can simply convert both strings to uppercase/lowercase before executing.
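A minimal sketch of that normalization, assuming ASCII-only input so that `strtolower()` is sufficient. The sample data and the `$needleLower` variable are illustrative and not part of the original answer:

```php
<?php
// Illustrative data (an assumption for this sketch).
$needle = 'Hello';
$testStrings = ['hel', 'HALLO', 'Try'];

// Normalize the needle once, outside the comparator.
$needleLower = strtolower($needle);

usort($testStrings, function ($a, $b) use ($needleLower) {
    // Lowercase each candidate only for scoring; the array keeps the original casing.
    return levenshtein($needleLower, strtolower($a))
        <=> levenshtein($needleLower, strtolower($b));
});

print_r($testStrings); // order: HALLO (1), hel (2), Try (5)
```

Without the lowercasing step, `HALLO` would score a distance of 4 against `Hello` and fall behind `hel`.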

If your application requires multi-byte support, you should search for existing repositories that provide this functionality.
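For illustration only, here is a rough sketch of what such a multi-byte aware distance function could look like. `mbLevenshteinSketch()` is a hypothetical helper, not a built-in; it assumes UTF-8 input, the mbstring extension, and PHP 7.4+ for `mb_str_split()`. It is the textbook dynamic-programming algorithm applied to characters instead of bytes:

```php
<?php
// Hypothetical helper (not a PHP built-in): Levenshtein distance counted
// in UTF-8 characters rather than bytes.
function mbLevenshteinSketch(string $a, string $b): int
{
    $charsA = mb_str_split($a, 1, 'UTF-8');
    $charsB = mb_str_split($b, 1, 'UTF-8');
    $lenB = count($charsB);

    // $prev[$j] is the distance between the already processed prefix of $a
    // and the first $j characters of $b.
    $prev = range(0, $lenB);

    foreach ($charsA as $i => $charA) {
        $curr = [$i + 1];
        foreach ($charsB as $j => $charB) {
            $cost = ($charA === $charB) ? 0 : 1;
            $curr[] = min(
                $prev[$j + 1] + 1, // deletion
                $curr[$j] + 1,     // insertion
                $prev[$j] + $cost  // substitution (or match)
            );
        }
        $prev = $curr;
    }

    return $prev[$lenB];
}

var_dump(levenshtein('hello', 'hêllo'));         // int(2): byte-based
var_dump(mbLevenshteinSketch('hello', 'hêllo')); // int(1): character-based
```

A maintained library is likely to be faster and better tested than this sketch.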

Additional techniques for those willing to research more deeply include `metaphone()` and `soundex()`, but I will not delve into these topics in this answer.
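As a starting point for that research, one possible approach (an assumption on my part, not something the original answer prescribes) is to compare phonetic keys instead of the raw strings. The sample data below is illustrative:

```php
<?php
// Illustrative data (an assumption for this sketch).
$needle = 'hello';
$testStrings = ['try', 'helicopter', 'hallo'];

// Compute the needle's phonetic key once.
$needleKey = metaphone($needle);

// Rank candidates by the Levenshtein distance between phonetic keys.
usort($testStrings, fn($a, $b) =>
    levenshtein($needleKey, metaphone($a)) <=> levenshtein($needleKey, metaphone($b))
);

foreach ($testStrings as $string) {
    // Print both keys so the phonetic grouping can be inspected.
    printf("%-10s metaphone=%s soundex=%s\n", $string, metaphone($string), soundex($string));
}
```

Words that sound alike share a key, so this ranks by pronunciation rather than spelling; whether that is desirable depends on the application.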

Scores:

``````
Test vs "hello" |  levenshtein   |  similar_text  |   similar_text's percent   |
----------------+----------------+----------------+----------------------------|
H3||0           |       5        |      0         |       0                    |
Hallo           |       2        |      3         |      60                    |
aloha           |       5        |      2         |      40                    |
h               |       4        |      1         |      33.333333333333       |
hallo           |       1        |      4         |      80                    |
hallå           |       3        |      3         |      54.545454545455       |
hel             |       2        |      3         |      75                    |
helicopter      |       6        |      4         |      53.333333333333       |
hellacious      |       5        |      5         |      66.666666666667       |
hello           |       0        |      5         |     100                    |
hello y'all     |       6        |      5         |      62.5                  |
hello yall      |       5        |      5         |      66.666666666667       |
helów           |       3        |      3         |      54.545454545455       |
hey hello       |       4        |      5         |      71.428571428571       |
hola            |       3        |      2         |      44.444444444444       |
hêllo           |       2        |      4         |      72.727272727273       |
mellow yellow   |       9        |      4         |      44.444444444444       |
try             |       5        |      0         |       0                    |
``````
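Rows like the ones above can be reproduced with a short loop. A minimal sketch using four of the test strings from the table; `$scores` is an illustrative variable name:

```php
<?php
$needle = 'hello';
$testStrings = ['hallo', 'hel', 'hey hello', 'try'];

$scores = [];
foreach ($testStrings as $string) {
    $distance = levenshtein($needle, $string);
    // similar_text() returns the number of matching characters and writes
    // the percentage into its third (by-reference) argument.
    $matching = similar_text($needle, $string, $percent);
    $scores[$string] = [$distance, $matching, $percent];

    printf("%-10s | %d | %d | %s\n", $string, $distance, $matching, $percent);
}
```

The percentage is derived from the match count as `matching * 2 / (length of both strings combined) * 100`, which is why `hel` (3 matches, 8 bytes in total) scores 75.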

Sort by `levenshtein()`, PHP7+ (Demo)

``````
usort($testStrings, function($a, $b) use ($needle) {
    return levenshtein($needle, $a) <=> levenshtein($needle, $b);
});
``````

Sort by `levenshtein()`, PHP7.4+ (Demo)

``````
usort($testStrings, fn($a, $b) => levenshtein($needle, $a) <=> levenshtein($needle, $b));
``````

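Notice that in the `similar_text()` sorts, `$a` and `$b` have changed sides of the `<=>` evaluation for DESC ordering. Also notice that `hello` is not assured to be positioned as the first element, because several strings tie with it on a `similar_text()` score of 5.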

Sort by `similar_text()`, PHP7+ (Demo)

``````
usort($testStrings, function($a, $b) use ($needle) {
    return similar_text($needle, $b) <=> similar_text($needle, $a);
});
``````

Sort by `similar_text()`, PHP7.4+ (Demo)

``````
usort($testStrings, fn($a, $b) => similar_text($needle, $b) <=> similar_text($needle, $a));
``````

Notice the difference in scoring of `hallå` and `helicopter` via `similar_text()`'s return value versus `similar_text()`'s percent value.

Sort by `similar_text()`'s percent, PHP7+ (Demo)

``````
usort($testStrings, function($a, $b) use ($needle) {
    similar_text($needle, $a, $percentA);
    similar_text($needle, $b, $percentB);
    return $percentB <=> $percentA;
});
``````

Sort by `similar_text()`'s percent, PHP7.4+ (Demo)

``````
usort($testStrings, fn($a, $b) =>
    [is_int(similar_text($needle, $b, $percentB)), $percentB]
    <=>
    [is_int(similar_text($needle, $a, $percentA)), $percentA]
);
``````

Notice that I am neutralizing the unwanted return value of `similar_text()` by converting its return value to `true`, then using the generated `percent` value -- this allows the generation of the percent value without returning too soon since arrow function syntax does not permit multi-line execution.

Efficiently sort by `levenshtein()` then only call `similar_text()` when a tiebreak is necessary, PHP7+ (Demo)

``````
usort($testStrings, function($a, $b) use ($needle) {
    return levenshtein($needle, $a) <=> levenshtein($needle, $b)
        ?: similar_text($needle, $b) <=> similar_text($needle, $a);
});
``````

Efficiently sort by `levenshtein()` then only call `similar_text()` and use its percent when a tiebreak is necessary, PHP7.4+ (Demo)

``````
usort($testStrings, fn($a, $b) =>
    levenshtein($needle, $a) <=> levenshtein($needle, $b)
    ?: [is_int(similar_text($needle, $b, $percentB)), $percentB]
       <=>
       [is_int(similar_text($needle, $a, $percentA)), $percentA]
);
``````

Personally, I never use anything but `levenshtein()` in my projects because it consistently delivers the results that I'm looking for.

