I want to convert a column of a CSV file to katakana using pykakasi.

Asked 2 years ago, Updated 2 years ago, 49 views

I would like to use pykakasi and pandas in Python to convert a column of a CSV file to katakana.
The code below produces an error. What should I do?
Does this mean the version is out of date?

Error Messages

DeprecationWarning: Call to Deprecated method setMode. (Old API will be removed in v3.0.) -- Deprecated since version 2.1.
  kks.setMode("H", "k")
Traceback (most recent call last):
  File "C:\Users\test\Desktop\Book.csv", line 29, in <module>
    kks.setMode("H", "k")

What I want to do

The Name (Katakana) column of the CSV file below contains a mix of hiragana and romaji, so
I would like to unify everything by converting it to katakana.
Empty cells should be ignored.

CSV file
Note: the names are dummies.

Name,Name (Katakana)
Shinji Ono,Shinji Ono
Mikako Sengoku,Mikako Sengoku
Nakajima Tomoyo,Nakajima Tomoyo
,
Takemura Riyo,Takemura Riyo
Hiroko Morinaga,Hiroko Morinaga
Takashi Yashima,Takashi Yashima
???daka?,
Miyuki Koga,Miyuki Koga
Kumakura Kenji,Kumakura Kenji

Expected behavior

I'd like every value in the Name (Katakana) column to be converted to katakana.

Name,Name (Katakana)
Shinji Ono,Shinji Ono
Mikako Sengoku,Mikako Sengoku
Nakajima Tomoyo,Nakajima Tomoyo
,
Takemura Riyo,Takemura Riyo
Hiroko Morinaga,Hiroko Morinaga
Takashi Yashima,Takashi Yashima
"??daka?",
Miyuki Koga,Miyuki Koga
Kumakura Kenji,Kumakura Kenji

Full code

import pandas as pd
from pykakasi import kakasi

# File name
filename1 = r"C:\Users\test\Desktop\Book.csv"

# Read the CSV
df = pd.read_csv(filename1)
print(df)

# Check the target column
print(df['Name (Katakana)'])

kks = kakasi()

# Hiragana → katakana
kks.setMode("H", "k")
conv=kks.getConverter()
df['Name (Katakana)'] = df['Name (Katakana)'].apply(conv.do)
print(df)

# Romaji → katakana
kks.setMode("a", "k")
conv=kks.getConverter()
df['Name (Katakana)'] = df['Name (Katakana)'].apply(conv.do)
print(df)

# Save the CSV
df.to_csv(filename1, encoding='utf_8_sig', index=False)

Thank you in advance.

python python3 pykakasi

2022-09-30 19:27

1 Answer

Does this mean the version is out of date?

Yes.
The setMode function has been deprecated since pykakasi v2.1 and is scheduled for removal in v3.0, as noted in the error message and in the official documentation.
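A DeprecationWarning on its own normally does not stop a script; it flags a call that is scheduled to disappear. The sketch below uses only the standard library to capture such warnings so that deprecated calls can be located and replaced. The function old_api is a hypothetical stand-in for a deprecated method such as setMode, not part of pykakasi.

```python
import warnings


def old_api():
    """Hypothetical stand-in for a deprecated method: it still works but warns."""
    warnings.warn(
        "Call to deprecated function old_api.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "still works"


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = old_api()

print(result)
print([w.category.__name__ for w in caught])
```

Replacing "always" with "error" in simplefilter would turn the warning into an exception, which can help find every deprecated call during testing.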

Use the convert function instead.

In the sample code below, the to_kana function calls convert and joins the kana reading of each morpheme into a single string.

Sample code

I rewrote it so that romaji is converted to katakana using the jaconv library.

import pandas as pd
import numpy as np
from io import StringIO
from pykakasi import kakasi
import jaconv
import re

csv = StringIO("""Name,Name (Katakana)
Shinji Ono,Shinji Ono
Mikako Sengoku,Mikako Sengoku
Nakajima Tomoyo,Nakajima Tomoyo
,
Takemura Riyo,Takemura Riyo
Hiroko Morinaga,Hiroko Morinaga
Takashi Yashima,Takashi Yashima
"??daka?"
Miyuki Koga,Miyuki Koga
Kumakura Kenji,Kumakura Kenji
The middle part is the alphabet, and the bald fu is the PI.
Hello QWERTY!
English 2, The quick brown fox jumps over the lazy dog.
Symbol, !?\"#$%&'()[]{}@:`*<>.!"#$%&'()@:[]{}
""")
df = pd.read_csv(csv)

kks = kakasi()
pattern = re.compile("[A-Za-z]")

def to_kana(s):
    # Leave empty cells as NaN
    if pd.isna(s):
        return np.nan
    # If the string contains Latin letters, convert them to katakana first.
    # jaconv cannot convert uppercase letters, so lowercase the string beforehand.
    if pattern.search(s):
        s = jaconv.alphabet2kata(s.lower())
    results = kks.convert(s)
    return ''.join([x["kana"] for x in results])

df['Name (Katakana)'] = df['Name (Katakana)'].apply(to_kana)

# Requires: pip install tabulate
print(df.to_markdown())

Output

Calling convert on the romaji string "Koga Miyuki" returns the following.

[{'orig': 'Koga Miyuki', 'hira': 'Koga Miyuki', 'kana': 'Koga Miyuki', 'hepburn': 'Koga Miyuki', 'kunrei': 'Koga Miyuki', 'passport': 'Koga Miyuki'}]
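The joining step inside to_kana can be looked at in isolation. The sketch below assumes only that convert returns a list of dicts with the keys shown above. The helper name join_kana and the hand-written sample list are hypothetical, used so that the snippet runs without pykakasi installed.

```python
def join_kana(results):
    """Concatenate the 'kana' value of every segment dict, in order."""
    return "".join(segment["kana"] for segment in results)


# Hand-written stand-in for a convert() result (hypothetical data, same keys as above).
sample = [
    {"orig": "こが", "hira": "こが", "kana": "コガ",
     "hepburn": "koga", "kunrei": "koga", "passport": "koga"},
    {"orig": "みゆき", "hira": "みゆき", "kana": "ミユキ",
     "hepburn": "miyuki", "kunrei": "miyuki", "passport": "miyuki"},
]

print(join_kana(sample))
```

Because only the "kana" key is read, input that the converter leaves unchanged (such as romaji) passes through as-is.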

As of the latest version at the time of this answer, leaving romaji unconverted appears to be pykakasi's intended behavior, so a different library is needed to convert romaji to kana.
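If only the hiragana-to-katakana step is needed, it can also be done without any library, because the Unicode hiragana letters (U+3041 to U+3096) sit exactly 0x60 code points below their katakana counterparts. A minimal sketch, assuming this simple offset is sufficient for your data. The function name hira_to_kata is hypothetical, and romaji is deliberately left untouched.

```python
# Map each hiragana code point (U+3041 .. U+3096) to its katakana counterpart.
_HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def hira_to_kata(text):
    """Return text with hiragana replaced by katakana; other characters are unchanged."""
    return text.translate(_HIRA_TO_KATA)


print(hira_to_kata("こが みゆき"))
print(hira_to_kata("Koga Miyuki"))  # Latin letters pass through unchanged
```

Kanji and romaji still need a dictionary-based or rule-based converter, so this only covers the hiragana case.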


2022-09-30 19:27

© 2024 OneMinuteCode. All rights reserved.