Ruby/Rails program for my family #2 (English posting)

Here is the monitoring program. Cron runs it every 5 minutes. It requests the target site's search results page with a specific query URL, then parses the HTML with the Hpricot library. Hpricot is a simple and powerful HTML parser for Ruby. As you will see below, my code is not good. To tell the truth, it's pretty bad: there are no test cases, and I didn't define any domain objects (there are several candidates for a domain model). If you went looking for defects in this program, you could fill a report several pages long.
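The QUERY_URL constant in shopping_radar.rb carries its search keyword as percent-encoded EUC-KR bytes, presumably because the target site expects EUC-KR. A minimal sketch of how such a URL could be built with Ruby's standard uri library, assuming Ruby 1.9.2 or later (the script itself is Ruby 1.8 era). The helper name build_query_url and the keyword '장난감' ("toys") are hypothetical examples, not the keyword in the original URL.

```ruby
require 'uri'

# Hypothetical helper: builds a search URL with the same parameters as
# QUERY_URL, sending the keyword as percent-encoded EUC-KR bytes.
def build_query_url(base, keyword)
  # Convert the UTF-8 keyword to EUC-KR first; URI.encode_www_form then
  # percent-encodes those raw bytes instead of UTF-8 bytes.
  euc_keyword = keyword.encode('EUC-KR')
  params = [
    ['search', euc_keyword], ['search_value', ''], ['idx', ''],
    ['s', '1'], ['w', euc_keyword], ['x', '0'], ['y', '0']
  ]
  "#{base}?#{URI.encode_www_form(params)}"
end

# Example keyword only ("toys").
puts build_query_url(
  'http://net2.i-baby.co.kr/netboard/new_netboard/business/total_search.php',
  '장난감'
)
```

Building the URL this way avoids hand-copying percent-encoded bytes when the search keyword changes.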

However, this is not a large commercial program. It's just a simple program with a simple goal, architecture, and theory. I think that if you are a programmer, not just a coder, you should always consider the trade-offs. If I had to spend one more day on this program, the old way (the human monitoring system) would be better. This kind of human monitoring is just annoying work; it doesn't need many resources. My wife and I can watch the page by pushing the refresh button a few times a day, even though that's not the quickest way to snap up popular second-hand toys. Even if we added up all the time we spend clicking the button and checking the top of the search list in a month, it would probably be less than one day.

So I chose the easy way. Below is 'shopping_radar.rb':



$KCODE = 'U'

require 'iconv'
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'activeresource'

SITE_URL = 'http://net2.i-baby.co.kr'
QUERY_URL = "#{SITE_URL}/netboard/new_netboard/business/total_search.php?search=%C1%F6%BA%D8&search_value=&idx=&s=1&w=%C1%F6%BA%D8&x=0&y=0"

# The server's default locale is UTF-8, but the target site's character encoding is EUC-KR.
# So Hpricot::Elem is reopened to convert its output to UTF-8 automatically.
class Hpricot::Elem
  alias :org_inner_html :inner_html
  alias :org_to_s :to_s
  alias :org_get_attribute :get_attribute

  def get_attribute(attr_name)
    convert_utf_8(org_get_attribute(attr_name))
  end

  def inner_html
    convert_utf_8(org_inner_html)
  end

  def to_s
    convert_utf_8(org_to_s)
  end

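  def convert_utf_8(org_str)
    # A missing attribute comes in as nil; return '' so callers such as gsub keep working.
    return '' if org_str.nil?
    begin
      # Iconv.conv opens and closes its own converter, so no handle is leaked.
      Iconv.conv('UTF-8', 'EUC-KR', org_str)
    rescue Iconv::Failure
      convert_utf_8_except_illegal_characters(org_str)
    end
  end

  def convert_utf_8_except_illegal_characters(org_str, replacement="XX")
    # Sometimes the target string contains broken characters, because the target
    # site's long-line truncation doesn't cut Korean characters properly
    # (a Korean character is 2 bytes in EUC-KR).
    # So convert word by word and replace only the words that can't be converted.
    org_str.split.map do |word|
      begin
        Iconv.conv('UTF-8', 'EUC-KR', word)
      rescue Iconv::Failure
        replacement
      end
    end.join(' ')
  end

  private :convert_utf_8, :convert_utf_8_except_illegal_characters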
end

class Item < ActiveResource::Base
  # Pass the target server's URL as the first argument; 127.0.0.1 is the default for testing.
  self.site = ARGV[0] || 'http://127.0.0.1:3000'
  @saved_count = 0

  alias :org_save :save

  def self.saved_count=(value); @saved_count = value ; end
  def self.saved_count; @saved_count; end

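  # Count only successful saves, and return save's own result to the caller.
  def save
    result = self.org_save
    Item.saved_count += 1 if result
    result
  end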

  def eql?(other)
    self.category == other.category and
    self.title == other.title and
    self.price == other.price and
    self.user == other.user and
    self.date == other.date and
    self.detail_url == other.detail_url
  end
end

def parse_item(category, item_table)
  tds = item_table.search("tr/td")
  # Look up the title cell and its link once instead of three times.
  title_td = tds[1].search("//td")[1]
  title_link = title_td.search("a")
  {
    :category => category,
    :title => title_link.inner_html,
    :desc => title_td.get_attribute('title').gsub(/\r\n/, '<br/>'),
    :price => tds[2].children.map { |e| e.class == Hpricot::Text ? e.to_s.strip : e.inner_html }.join,
    :user => (tds[3]/"a").inner_html,
    :detail_url => SITE_URL + title_link.first['href'],
    :date => tds[4].inner_html,
    :readed_count => tds[5].inner_html
  }
end

def find_form(html, formname)
  html.search("//form[@name='#{formname}']")
end

def all_tables_of(form)
  form.search("table[@width='640']")
end

# Rotate the log: keep 2 old log files, each up to 5 KB.
logger = Logger.new("#{File.dirname(__FILE__)}/shopping_radar.log", 2, 5*1024)
# Requiring activeresource replaces Ruby's Logger formatting with the Rails version,
# so set Logger::Formatter.new explicitly to get Ruby's default log format back.
logger.formatter = Logger::Formatter.new

begin
  html = Hpricot(open(QUERY_URL))
  target_forms = {
    '아기 > 팝니다 > 용품' => 'delete_mart_bs_03',     # Baby > For sale > Supplies
    '아기 > 팝니다 > 장난감' => 'delete_mart_bs_04',   # Baby > For sale > Toys
    '어린이 > 팝니다 > 장난감' => 'delete_mart_ks_03'  # Kids > For sale > Toys
  }

  exist_items = Item.find(:all)
  target_forms.each do |category,form_name|
    tables = all_tables_of( find_form(html, form_name) )
    # An empty table sits between each pair of item tables, so skip the odd-indexed tables.
    (0...tables.length).step(2) do |index|
      item = Item.new(parse_item(category, tables[index]))
      item.save unless exist_items.any? { |i| i.eql?(item) }
    end
  end

  logger.info("New #{Item.saved_count} items were registered.")
rescue => err
  logger.fatal("Something went terribly wrong.")
  logger.fatal(err)
end
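The Hpricot::Elem patch above relies on Iconv (Ruby 1.8) and, when conversion fails, throws away whole words. On Ruby 1.9.3 or later, String#encode can do the EUC-KR to UTF-8 conversion and replace only the broken bytes left by the site's truncation. This is a swapped-in technique, not what the script does; the helper name euc_kr_to_utf8 is hypothetical, and 'XX' is kept as the replacement to match the original.

```ruby
# Hypothetical helper: converts EUC-KR bytes to UTF-8, replacing every
# invalid or truncated byte sequence with +replacement+.
def euc_kr_to_utf8(raw, replacement = 'XX')
  return nil if raw.nil?
  raw.dup.force_encoding(Encoding::EUC_KR)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: replacement)
end

# Simulate the site's truncation: drop the last byte, cutting the final
# 2-byte Korean character in half.
euc = '한국 장난감'.encode('EUC-KR')
truncated = euc.byteslice(0, euc.bytesize - 1)
puts euc_kr_to_utf8(truncated)
```

Because only the dangling byte is replaced, the readable part of a truncated title survives instead of the whole word turning into "XX".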

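The main loop compares every scraped item against every stored item with any?, which is fine for a few dozen rows. If the stored list grows, one alternative is to build a Set of comparison keys once, using the same six fields as Item#eql?. In this sketch, plain hashes stand in for ActiveResource records so it runs on its own; KEY_FIELDS, item_key and new_items are hypothetical names. Unlike the script, it also drops duplicates within a single scraped batch.

```ruby
require 'set'

# The same six attributes that Item#eql? compares.
KEY_FIELDS = [:category, :title, :price, :user, :date, :detail_url].freeze

# Hypothetical helper: the comparison key for one item (a Hash here).
def item_key(item)
  KEY_FIELDS.map { |field| item[field] }
end

# Returns only the scraped items whose key is not already stored.
def new_items(scraped, existing)
  seen = existing.map { |item| item_key(item) }.to_set
  # Set#add? returns nil when the key is already present, so repeated
  # items inside the scraped batch are dropped as well.
  scraped.select { |item| seen.add?(item_key(item)) }
end

# Made-up sample data, for illustration only.
stored  = [{ category: 'toys', title: 'Blocks', price: '10,000', user: 'kim',
             date: '08/10/01', detail_url: '/view?no=1' }]
scraped = stored + [stored.first.merge(title: 'Car', detail_url: '/view?no=2')]
p new_items(scraped, stored).map { |item| item[:title] }
```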

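Item counts saves by aliasing save to org_save and keeping the total in a class-level instance variable. On Ruby 2.0 or later, Module#prepend is a common alternative to that alias chain. A sketch of the pattern with a stand-in base class; FakeResource, SaveCounter and CountedItem are hypothetical names, not ActiveResource classes. It counts only saves that return a truthy value and passes the original return value through.

```ruby
# Stand-in for ActiveResource::Base: save just returns true or false.
class FakeResource
  def initialize(valid = true)
    @valid = valid
  end

  def save
    @valid
  end
end

# Prepended module: its #save runs first and reaches the class's own
# #save through super.
module SaveCounter
  def save
    result = super
    self.class.saved_count += 1 if result
    result
  end
end

class CountedItem < FakeResource
  class << self
    attr_writer :saved_count

    def saved_count
      @saved_count ||= 0
    end
  end

  prepend SaveCounter
end

CountedItem.new.save         # counted
CountedItem.new(false).save  # not counted
puts CountedItem.saved_count
```

With prepend, the original save stays reachable through super, so there is no org_save alias to name or to shadow by accident.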
I'm not a Ruby/Rails expert, so corrections and advice are always welcome. :-)
