How to estimate the similarity between arrays

Asked 2 years ago, Updated 2 years ago, 462 views

What is the most efficient way to estimate the similarity between n arrays?

For example,

array1 = ['aaa', 'bbb', 'ccc', 'ddd']
array2 = ['bbb', 'ccc']
array3 = ['aaa', 'ccc', 'ddd', 'eee']
array4 = ['aaa', 'bbb', 'ccc', 'ddd']
array5 = ['aaa', 'bbb', 'ccc', 'ddd', 'eee']

Given arrays like these, I would like to find how similar each of arrays 2 through 5 is to array1 (the more elements they have in common, the closer they are).
As the example shows, the arrays do not all have the same number of elements.

I plan to implement this in JavaScript, but I would appreciate a language-independent way of thinking about the problem.
Thank you for your cooperation.

javascript array

2022-09-30 21:56

1 Answer

For simplicity, assume that the only thing contributing to similarity is whether elements match exactly. (Even if two strings are similar, any difference between them is simply treated as a mismatch.)

Also, since the elements are strings and you are presumably not looking for a numerical treatment (such as treating an array as a vector), I think it is appropriate to base the measure on the Levenshtein distance defined over arrays. Given two arrays, the Levenshtein distance is the minimum number of single-element deletions, insertions, or replacements needed to turn one array into the other. The good thing about this is that algorithms for computing the Levenshtein distance efficiently are well known.
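As a minimal sketch of the distance described above, the usual dynamic-programming table can be applied to arrays by comparing whole elements with `===`. The function name `levenshteinDistance` is only an illustrative choice, not something from a library.

```javascript
// Levenshtein (edit) distance between two arrays.
// Elements are compared as a whole with ===, so 'dd' and 'ddd' count as a mismatch.
function levenshteinDistance(a, b) {
  // dp[i][j] = edits needed to turn the first i elements of a
  // into the first j elements of b
  const dp = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );

  for (let i = 0; i <= a.length; i++) dp[i][0] = i; // delete i elements
  for (let j = 0; j <= b.length; j++) dp[0][j] = j; // insert j elements

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,       // deletion
        dp[i][j - 1] + 1,       // insertion
        dp[i - 1][j - 1] + cost // match or replacement
      );
    }
  }
  return dp[a.length][b.length];
}
```

This runs in time proportional to the product of the two array lengths, which should be fine for short arrays like the ones in the question.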

Also, since this is a similarity, I would normalize it so that identical arrays score exactly 1 and completely different arrays score 0. This is done by dividing the Levenshtein distance by the length of the longer of the two arrays and subtracting the result from 1.
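The normalization itself is only a line of arithmetic. The sketch below assumes the edit distance has already been computed; `similarityFromDistance` is a hypothetical helper name, and treating two empty arrays as identical (score 1) is my own assumption to avoid dividing by zero.

```javascript
// Turn an edit distance into a similarity score in the range [0, 1].
// distance: edit distance between the two arrays
// lengthA, lengthB: the lengths of the two arrays
function similarityFromDistance(distance, lengthA, lengthB) {
  const longer = Math.max(lengthA, lengthB);
  if (longer === 0) return 1; // assumption: two empty arrays are identical
  return 1 - distance / longer;
}

// A distance of 2 between arrays of length 4 and 2 gives 1 - 2/4
console.log(similarityFromDistance(2, 4, 2)); // 0.5
```

Dividing by the longer length works because the distance can never exceed it, so the score stays between 0 and 1.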

Under this definition, the similarities to arr1 are:

arr2: 0.5
arr3: 0.5
arr4: 1
arr5: 0.8
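To check these figures, here is a rough end-to-end sketch that scores each candidate against array1 and sorts by similarity. It assumes array1 is ['aaa', 'bbb', 'ccc', 'ddd'], which is what the results above imply. The names `editDistance`, `similarity`, and `candidates` are illustrative, and the distance function here keeps only two rows of the table to save memory.

```javascript
// Edit distance between two arrays, keeping only two rows of the table.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity normalized to [0, 1]; two empty arrays are assumed identical.
function similarity(a, b) {
  const longer = Math.max(a.length, b.length);
  return longer === 0 ? 1 : 1 - editDistance(a, b) / longer;
}

// Assumed value of array1 (fourth element taken to be 'ddd')
const array1 = ['aaa', 'bbb', 'ccc', 'ddd'];
const candidates = {
  array2: ['bbb', 'ccc'],
  array3: ['aaa', 'ccc', 'ddd', 'eee'],
  array4: ['aaa', 'bbb', 'ccc', 'ddd'],
  array5: ['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
};

// Score every candidate and list the most similar first
const ranked = Object.entries(candidates)
  .map(([name, arr]) => ({ name, score: similarity(array1, arr) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked);
```

With these inputs array4 comes first with a score of 1, followed by array5 at 0.8, then array2 and array3 at 0.5 each.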


2022-09-30 21:56

