I tried to implement TF-IDF using MeCab, but it's hard. Is there any way to do it?
ruby ruby-on-rails mecab
I read up on TF and IDF on the wiki and wrote a simple implementation.
Code
# Term counts for one file; terms are split on whitespace.
file_tf = ->(file) do
  File.readlines(file).flat_map(&:split).
    reduce(Hash.new(0)) { |tf, term| tf[term] += 1; tf }
end
# Term counts summed over every file in a directory.
dir_tf = ->(dir) do
  file_tfs = Dir[dir + "/*"].map(&file_tf)
  file_tfs.reduce(Hash.new(0)) { |tf, f| f.keys.each { |t| tf[t] += f[t] }; tf }
end
# Log-scaled term frequency.
tf = ->(term, file) { Math.log(file_tf[file][term] + 1) }
# Inverse document frequency; to_f avoids integer division.
idt = ->(term, dir, files = Dir[dir + "/*"]) do
  Math.log(files.size.to_f / files.map(&file_tf).count { |file| file[term] > 0 })
end
Test
require 'rspec'
include RSpec::Matchers
term = "Starbuck"
doc_dir, file1, file2 = %w(./datas ./datas/moby.txt ./datas/moby_big.txt)
expect( dir_tf[doc_dir].size ).to eq 33780
expect( file_tf[file1][term] ).to eq 22
expect( file_tf[file2][term] ).to eq 67
expect( tf[term, file1] ).to eq Math.log(22 + 1)
expect( tf[term, file2] ).to eq Math.log(67 + 1)
expect( idt[term, doc_dir] ).to eq Math.log(2 / 2)
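The code above keeps TF and IDF separate and splits terms on whitespace, which does not work for unsegmented Japanese text. As a rough sketch of how the two values could be combined into one TF-IDF score with a swappable tokenizer: the module name, method names, and sample documents below are made up for illustration, and a MeCab-backed tokenizer is only assumed to be any callable that returns an array of tokens.

```ruby
# Sketch only: module and method names are hypothetical, not from the answer.
module TfIdfSketch
  module_function

  # Default tokenizer: whitespace split, like file_tf above. For Japanese
  # text, pass a callable that returns MeCab's tokens instead (assumption).
  WHITESPACE = ->(text) { text.split }

  # Raw term counts for one document string.
  def term_counts(text, tokenizer = WHITESPACE)
    tokenizer.call(text).each_with_object(Hash.new(0)) { |term, h| h[term] += 1 }
  end

  # Log-scaled term frequency: log(count + 1).
  def tf(term, text, tokenizer = WHITESPACE)
    Math.log(term_counts(text, tokenizer)[term] + 1)
  end

  # Inverse document frequency: log(N / documents containing the term).
  # Returns 0.0 when no document contains the term, to avoid dividing by zero.
  def idf(term, texts, tokenizer = WHITESPACE)
    hits = texts.count { |t| term_counts(t, tokenizer)[term] > 0 }
    hits.zero? ? 0.0 : Math.log(texts.size.to_f / hits)
  end

  # Combined score: tf * idf.
  def tf_idf(term, text, texts, tokenizer = WHITESPACE)
    tf(term, text, tokenizer) * idf(term, texts, tokenizer)
  end
end

docs = ["call me ishmael", "the whale the whale", "ishmael saw the whale"]
puts TfIdfSketch.tf_idf("whale", docs[1], docs)
```

Passing the tokenizer as an argument keeps the scoring independent of how the text is segmented, so the same methods can be used with whitespace splitting for English and a morphological analyzer for Japanese.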
If the processing isn't complicated, why not implement what you need yourself?
The Hashcode site is built with Ruby on Rails. I recently adopted Elasticsearch to improve its search. Take a look at the Elasticsearch gem.