Automatically generating 〇-like sentences using a Markov chain
After installing Python 3.7.2 and mecab-0.996.exe, I ran learn.py from the command prompt. The script is meant to generate sentences automatically from a text file using a Markov chain and morphological analysis, but it fails with the following error.
C:\Users\Desktop>python learn.py
Traceback (most recent call last):
  File "learn.py", line 98, in <module>
    main()
  File "learn.py", line 84, in main
    print(''.join(sentence.split()))  # need to concatenate space-split text
AttributeError: 'NoneType' object has no attribute 'split'
learn.py is as follows.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from glob import iglob
import re

import MeCab
import markovify

def load_from_file(files_pattern):
    """Read and merge files that match the given file pattern, prepare the text for parsing, and return it.
    """
    # read text
    text = ""
    for path in iglob(files_pattern):
        with open(path, 'r') as f:
            text += f.read().strip()
    # delete some symbols
    unwanted_chars = ['\r', '\u3000', '-', '｜']
    for uc in unwanted_chars:
        text = text.replace(uc, '')
    # delete aozora bunko notations (ruby 《...》 and annotations ［＃...］)
    unwanted_patterns = [re.compile(r'《.*》'), re.compile(r'［＃.*］')]
    for up in unwanted_patterns:
        text = re.sub(up, '', text)
    return text

def split_for_markovify(text):
    """Split text into sentences by newline, and split each sentence into words by space.
    """
    # separate words using mecab
    mecab = MeCab.Tagger()
    split_text = ""
    # these chars might break markovify
    # https://github.com/jsvine/markovify/issues/84
    breaking_chars = [
        '(',
        ')',
        '[',
        ']',
        '"',
        "'",
    ]
    # split whole text into sentences by newline, and split each sentence into words by space.
    for line in text.split():
        mp = mecab.parseToNode(line)
        while mp:
            try:
                if mp.surface not in breaking_chars:
                    split_text += mp.surface  # skip if node is a markovify-breaking char
                if mp.surface != '。' and mp.surface != '、':
                    split_text += ' '  # split words by space
                if mp.surface == '。':
                    split_text += '\n'  # represent sentence by newline
            except UnicodeDecodeError as e:
                # sometimes an error occurs
                print(line)
            finally:
                mp = mp.next
    return split_text

def main():
    # load text
    rampo_text = load_from_file('hoge.txt')
    # split text to learnable form
    split_text = split_for_markovify(rampo_text)
    # learn model from text.
    text_model = markovify.NewlineText(split_text, state_size=5)
    # ...and generate from model.
    sentence = text_model.make_sentence()
    print(split_text)
    print(''.join(sentence.split()))  # need to concatenate space-split text
    # save learned data
    with open('learned_data.json', 'w') as f:
        f.write(text_model.to_json())
    # later, if you want to reuse learned data...
    """
    with open('learned_data.json') as f:
        text_model = markovify.NewlineText.from_json(f.read())
    """

if __name__ == '__main__':
    main()
The text file I used was written in Notepad with some arbitrary text and saved on the desktop as hoge.txt.
Which part of learn.py should I rewrite or add to in order to resolve the above error? Also, the article linked above says to run python3 learn.py at the end, but in my environment the following appears:
C:\Users\Desktop>python3 learn.py
'python3' is not recognized as an internal or external command,
operable program or batch file.
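For reference, the python.org installer for Windows typically registers python (and the py launcher) rather than a python3 command, so the python learn.py invocation used earlier should run the same script. A minimal sketch for confirming which interpreter and version actually execute a script is shown below; the function name describe_interpreter is hypothetical and used only for illustration.

```python
import sys


def describe_interpreter() -> str:
    """Return the version and path of the interpreter running this script.

    describe_interpreter is a hypothetical helper name, not part of learn.py.
    """
    v = sys.version_info
    return f"Python {v.major}.{v.minor}.{v.micro} ({sys.executable})"


if __name__ == '__main__':
    # Run this with the same command used for learn.py to see what it resolves to.
    print(describe_interpreter())
```

Running it with the same command as learn.py shows whether that command resolves to the Python 3 installation that has MeCab and markovify installed.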
python mecab
Looking at jsvine/markovify and markovify 0.7.1, how about lowering only the Python version to the 3.6 series?
The reference article also appears to use Python 3.6.5.
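Separately from the Python version, the AttributeError in the traceback means that make_sentence() returned None. markovify documents this behaviour: by default it makes a limited number of attempts to build a sentence that does not overlap too much with the source text, and returns None when none succeeds. With a short hoge.txt and state_size=5 this is likely to happen often. The following is a minimal sketch of guarding against it, not a definitive fix. The helper name generate_joined and its attempts parameter are assumptions introduced here, and the helper takes any callable so it can be tried without markovify installed.

```python
from typing import Callable, Optional


def generate_joined(make_sentence: Callable[[], Optional[str]],
                    attempts: int = 100) -> Optional[str]:
    """Call make_sentence until it yields a string, then remove the spaces.

    generate_joined is a hypothetical helper, not part of markovify.
    Returns None when every attempt fails, so the caller can handle that
    case instead of crashing on None.split().
    """
    for _ in range(attempts):
        sentence = make_sentence()
        if sentence is not None:
            # The model was trained on space-separated words; join them back.
            return ''.join(sentence.split())
    return None


if __name__ == '__main__':
    # Stand-in for text_model.make_sentence: fails twice, then succeeds.
    results = iter([None, None, '吾輩 は 猫 で ある 。'])
    joined = generate_joined(lambda: next(results))
    print(joined if joined is not None else 'Could not generate a sentence.')
```

In main() this could be called as generate_joined(text_model.make_sentence). If it still returns None, adding more text to hoge.txt or using a smaller state_size (markovify's default is 2) may help.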
This post was edited based on @kunif's comment and posted as Community Wiki.