500 errors when scraping on Heroku with Rails + Mechanize

Asked 2 years ago, Updated 2 years ago, 62 views

Background / What I want to achieve

When a URL is entered in the form, I want to scrape the page asynchronously with Mechanize and use jQuery to write the retrieved information into other form fields.

It works locally.
When running on Heroku, the same request returns a 500 error.

Problem / Error messages

It works fine locally, but it stopped working once I deployed it to Heroku.

The Heroku address is as follows:
https://narou-matome.herokuapp.com/

Operation Flow

  • Fill in the form
  • jQuery detects the input
  • The parameters are sent via Ajax
  • The controller runs Mechanize to fetch the novel's information
  • The js.erb file outputs the values from the instance variables
  • jQuery writes the values into the form

Below are the local and Heroku logs.
They are identical up to the point where Heroku returns the 500 error.

I think the 500 error is returned at the point where Mechanize tries to fetch the page.
Please let me know if you can see the cause or a solution.

Local Logs

Started GET "/matomes/scraping_novel?url=https%3A%2F%2Fncode.syosetu.com%2Fn5011em%2F" for 127.0.0.1 at 2018-07-04 02:32:07 +0900
Processing by MatomesController#scraping_novel as HTML
  Parameters: {"url"=>"https://ncode.syosetu.com/n5011em/"}
  Rendering matomes/scraping_novel.js.erb
  Rendered matomes/scraping_novel.js.erb (1.8ms)
Completed 200 OK in 3340ms (Views: 29.7ms | ActiveRecord: 0.0ms)

Heroku Logs

2018-07-03T17:37:10.330369+00:00 app[web.1]: [5e294d36-10c4-48f1-a4af-30b33ef73acf] Started GET "/matomes/scraping_novel?url=https%3A%2F%2Fncode.syosetu.com%2Fn5011em%2F" for 125.12.18.156 at 2018-07-03 17:37:10 +0000
2018-07-03T17:37:10.331338+00:00 app[web.1]: [5e294d36-10c4-48f1-a4af-30b33ef73acf] Processing by MatomesController#scraping_novel as HTML
2018-07-03T17:37:10.331410+00:00 app[web.1]: [5e294d36-10c4-48f1-a4af-30b33ef73acf] Parameters: {"url"=>"https://ncode.syosetu.com/n5011em/"}
2018-07-03T17:37:11.088137+00:00 heroku[router]: at=info method=GET path="/matomes/scraping_novel?url=https%3A%2F%2Fncode.syosetu.com%2Fn5011em%2F" host=narou-matome.herokuapp.com request_id=5e294d36-10c4-48f1-a4af-30b33ef73acf fwd="125.12.18.156" dyno=web.1 connect=0ms service=760ms status=500 bytes=1827 protocol=https
^ The 500 error is returned here
2018-07-03T17:37:11.086753+00:00 app[web.1]: [5e294d36-10c4-48f1-a4af-30b33ef73acf] Completed 500 Internal Server Error in 755ms (ActiveRecord: 0.0ms)
2018-07-03T17:37:11.091354+00:00 app[web.1]: [5e294d36-10c4-48f1-a4af-30b33ef73acf]
2018-07-03T17:37:11.091358+00:00 app[web.1]: [5e294d36-10c4-48f1-a4af-30b33ef73acf] Mechanize::ResponseCodeError (503 => Net::HTTPServiceUnavailable for https://ncode.syosetu.com/n5011em/ -- unhandled response):
2018-07-03T17:37:11.091360+00:00 app[web.1]: [5e294d36-10c4-48f1-a4af-30b33ef73acf]
2018-07-03T17:37:11.091362+00:00 app[web.1]: [5e294d36-10c4-48f1-a4af-30b33ef73acf] app/controllers/matomes_controller.rb:74:in `scraping_novel'

Relevant source code

[html form]

= form_for novel, remote: true do |f|
  .form-group
    = f.label :"Novel Address (Automatically enter novel name/summary from address)"
    = f.text_field :url, placeholder: "Novel Address", required: "required", id: "modal-novel-url", class: "form-control"

[JS]

$(document).on('turbolinks:load', function(){
  $("#get-novel-info-button").click(function(){
      $.ajax({
          url: "scraping_novel",
          type: "GET",
          data: {url: $("#modal-novel-url").val()},
          dataType: "html",
          success: function(data){
              console.log('success');
              console.log(data);
              // The response body is rendered by app/views/matomes/scraping_novel.js.erb.
              // Split it on the string "delimiter" into title and description.
              var split_datas = data.split("delimiter");
              $("#modal-novel-title").val(split_datas[0]);
              $("#modal-novel-description").val(split_datas[1]);
          },
          error: function(data){
              console.log('error');
              alert("The URL is incorrect or this URL is not supported.");
          }
      });
  });
});

[controller]

def scraping_novel
  require 'mechanize'
  require 'nokogiri'

  # Fetch the novel page and extract its title and description
  agent = Mechanize.new
  page = agent.get(params[:url])
  @novel_title = page.at('.novel_title').inner_text
  @novel_description = page.at('#novel_ex').inner_text

  # Renders matomes/scraping_novel.js.erb
  respond_to do |format|
    format.js
  end
end

[gemfile]

gem 'mechanize'

[routes file]

Rails.application.routes.draw do
  get "matomes/scraping_novel"
  resources :novels
  resources :matomes
  devise_for :users
  # For details on the DSL available within this file, see http://guides.rubyonrails.org/routing.html
  root 'matomes#index'
end

[scraping_novel.js.erb]

<%= @novel_title %>delimiter<%= @novel_description %>
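
To make the exchange between this template and the JS concrete, here is a minimal Ruby sketch of the same delimiter scheme. It is only an illustration: `DELIMITER`, `build_payload`, and `parse_payload` are hypothetical names that do not appear in the app, and the split limit of 2 is an assumption added here (the JS above splits without a limit).

```ruby
# Hypothetical constant: the same literal marker the template writes between the fields.
DELIMITER = "delimiter"

# Build the response body the way the template does: title, marker, description.
def build_payload(title, description)
  "#{title}#{DELIMITER}#{description}"
end

# Split the body back into [title, description]. The limit of 2 means only the
# first marker is used, so a later "delimiter" inside the description survives.
def parse_payload(payload)
  payload.split(DELIMITER, 2)
end
```

A title that itself contains the word "delimiter" would still be split in the wrong place, so a structured format such as JSON may be a safer choice for this kind of response.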

Tried

  • Switching the request from GET to POST, which made no difference
  • Adding require 'mechanize' and require 'nokogiri', which did not help
  • I also tried the following:
  • heroku restart
  • rake assets:precompile
  • Switching config.assets.compile between true and false

Please let me know if you need any other files or information.
I will provide them right away.
Thank you in advance.

ruby-on-rails heroku rubygems mechanize

2022-09-30 14:39

1 Answer

Judging from the following error in the log, I think ncode.syosetu.com returned a 503 error (Service Temporarily Unavailable) when Mechanize accessed it from Heroku. Because the controller does not rescue that exception, Rails responds with a 500.

Mechanize::ResponseCodeError (503 => Net::HTTPServiceUnavailable for https://ncode.syosetu.com/n5011em/ -- unhandled response):

Therefore, I don't think there is a problem with the code; I think the problem is on the ncode.syosetu.com side.
(For example, the site may have been temporarily down, or it may block access from Heroku.)
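
If the 503 turns out to be only temporary, one way to cope is to retry the request a few times before giving up. The following is a minimal sketch under that assumption and is not taken from the question's code: `with_retries`, its parameters, and the backoff values are hypothetical and chosen only for illustration.

```ruby
# Hypothetical helper: run the block, retrying when it raises one of the
# error classes listed in `on`. Waits base_delay, then 2x, 4x, ... seconds
# between attempts, and re-raises the error once `attempts` is used up.
def with_retries(on:, attempts: 3, base_delay: 1.0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    raise if tries >= attempts
    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end
```

In the controller this could wrap the fetch, for example `page = with_retries(on: [Mechanize::ResponseCodeError]) { agent.get(params[:url]) }`. If ncode.syosetu.com rejects requests from Heroku outright, retrying will not help; in that case rescuing the error and returning a clear failure status to the Ajax call is about all the app can do.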


2022-09-30 14:39


