MySQL collate utf8_unicode_ci does not match Suzuki

I use Rails 3.2.13+MySQL5.

Collation in MySQL is collate utf8_unicode_ci.
I didn't know this function existed, so I was surprised that I searched and matched the data in full-width katakana in full-width hiragana.

So, one customer's data was registered in half-width katakana (bad smell!?)
They still search for it in a good way, but somehow I found out that Suzuki and Suzuki don't match.

>User.where("kana LIKE'%ski%'").count
   (4.1ms) SELECT COUNT(*) FROM `users` WHERE(kana LIKE '%ski%')
=>42
>User.where("kana LIKE'% Suzuki%").count
   (2.6ms) SELECT COUNT(*) FROM `users` WHERE(kana LIKE '% Suzuki %')
= > 0
>User.where("kana LIKE'%Suzuki%'").count
   (2.6ms) SELECT COUNT(*) FROM `users` WHERE(kana LIKE '%Suzuki %')
= > 0

>pp User.where("kana LIKE'%ski%'").limit(10).map {|x|x.kana.sub(/...$/, 'xxx')}
  User Load (1.1ms) SELECT `users`.* FROM `users` WHERE(kana LIKE '%ski%') LIMIT10
["Ski xxx",
 "Ski xxx",
 "Skitage xxx",
 "Ski xxx",
 "Skiﾉxxx",
 "Scream xxx",
 "Ski xxx",
 US>"Sus xxx",
 "Skitter xxx",
 "Ski xxx"]

Why is this?I would appreciate it if you could let me know.

"By the way, ""Fukuda/Fukuda"" matches..."

User.where("kana LIKE'% factor %").count#=>9
User.where("kana LIKE'% Fukuda%'").count#=>9
User.where("kana LIKE'%Fukuda%'").count#=>9

I am curious about the recent article "Japanese Developers' Opinion on utf8_unicode_ci.

ruby-on-rails mysql rails-activerecord

2022-09-30 18:28

1 Answers

"Since ""su"" in half-width kana has two characters, ""su"" and ""cho,"" I don't think it matches ""zu""."

"I think ""Fukuda"" matches ""Fukuda"" and ""O"" matches ""%"" wildcard."

"Also, ""Suki"" and ""Suzuki"" match for =, not LIKE."
LIKE matches one character at a time, but = seems to be for normalization and matching.

http://mysql.gr.jp/mysqlml/mysql/msg/15636 may be helpful.
(Half-width kana has been converted to full-width kana, so it's a little hard to understand, but please read it with your heart's eyes.)

2022-09-30 18:28

If you have any answers or tips

Popular Tags

python x 4647

android x 1593

java x 1494

javascript x 1427

c x 927

c++ x 878

ruby-on-rails x 696

php x 692

python3 x 685

html x 656