I'd like to implement a document-scoring algorithm with a morpheme analyzer and TF-IDF in Ruby on Rails. Are there any sites or materials I can refer to?

Asked 2 years ago, Updated 2 years ago, 103 views

I tried to implement it using MeCab, but it's hard. Is there a way to do this?

ruby ruby-on-rails mecab

2022-09-22 21:51

2 Answers

I looked up TF and IDF on the wiki and wrote a simple implementation.

Code

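# Term counts for one file: { term => count }, splitting on whitespace.
file_tf = ->file { File.readlines(file).flat_map(&:split).
                        reduce(Hash.new(0)) {|tf,term| tf[term] += 1; tf} }
# Term counts summed over every file in a directory.
dir_tf = ->dir do
  file_tfs = Dir[dir+"/*"].map(&file_tf)
  file_tfs.reduce(Hash.new(0)) {|tf,f| f.keys.each {|t| tf[t] += f[t]}; tf }
end
tf = ->term,file { Math.log(file_tf[file][term] + 1) } #=> log scale freq
# Inverse document frequency (named idt here): log(files / files containing term).
# fdiv avoids integer division, which would truncate e.g. 3 / 2 to 1.
idt = ->term,dir,files=Dir[dir+"/*"] do
  Math.log( files.size.fdiv(files.map(&file_tf).count {|file| file[term] > 0 }) )
end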

Test

require 'rspec'
include RSpec::Matchers

term = "Starbuck"
doc_dir, file1, file2= %w(./datas ./datas/moby.txt ./datas/moby_big.txt)

expect( dir_tf[doc_dir].size ).to eq 33780
expect( file_tf[file1][term] ).to eq 22
expect( file_tf[file2][term] ).to eq 67

expect( tf[term, file1] ).to eq Math.log(22 + 1)
expect( tf[term, file2] ).to eq Math.log(67 + 1)
expect( idt[term, doc_dir] ).to eq Math.log(2 / 2)

If the processing isn't complicated, why not implement what you need yourself?
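
To connect this back to the original question, one common way to turn TF and IDF into a document score is to sum tf * idf over the terms of a query. The following is only a minimal sketch under stated assumptions: the corpus in `sample_docs`, the whitespace `tokenize` method, and the `score` and `rank` helpers are all hypothetical names made up for illustration, and a morpheme analyzer such as MeCab would take the place of `tokenize`. It needs Ruby 2.7 or later for `tally`.

```ruby
# Hypothetical in-memory corpus, made up for illustration only.
def sample_docs
  {
    "doc1" => "the whale chased the ship",
    "doc2" => "the ship sailed home",
    "doc3" => "a quiet harbor at night"
  }
end

# Whitespace tokenizer as a stand-in; a morpheme analyzer could replace it.
def tokenize(text)
  text.downcase.split
end

# Term counts per document: { name => { term => count } }.
def term_counts(docs)
  docs.transform_values { |text| tokenize(text).tally }
end

# Log-scaled term frequency: log(count + 1).
def tf(term, counts)
  Math.log(counts.fetch(term, 0) + 1)
end

# Inverse document frequency: log(N / number of documents containing the term).
# Returns 0.0 when no document contains the term, to avoid dividing by zero.
def idf(term, all_counts)
  containing = all_counts.count { |_name, counts| counts.key?(term) }
  return 0.0 if containing.zero?
  Math.log(all_counts.size.fdiv(containing))
end

# Score of one document for a query: sum of tf * idf over the query terms.
def score(query, name, all_counts)
  tokenize(query).sum { |term| tf(term, all_counts[name]) * idf(term, all_counts) }
end

# Document names ordered from highest to lowest score.
def rank(query, docs)
  all_counts = term_counts(docs)
  docs.keys.sort_by { |name| -score(query, name, all_counts) }
end

puts rank("whale ship", sample_docs).inspect
```

With this sample corpus, "doc1" ranks first for the query "whale ship" because it contains both terms and "whale" appears in only one document, so it carries the highest idf. Returning 0.0 for unseen terms is a design choice of this sketch, not part of the standard definition.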


2022-09-22 21:51

The Hashcode site is built with Ruby on Rails. I recently added Elasticsearch to improve its search. Take a look at the Elasticsearch gem.


2022-09-22 21:51



© 2024 OneMinuteCode. All rights reserved.