Here is the monitoring program. Cron runs it every 5 minutes. It requests the target site's search-results page with a specific query URL, then parses the HTML with the Hpricot library. Hpricot is a simple and powerful HTML parser for Ruby. As you will see below, my code is not good. To tell the truth, it's quite bad: there are no test cases, and I didn't define proper objects (there are several candidates for a domain model). If you went looking for defects in this program, you could fill a report several pages long.
However, this is not a large commercial program. It's a simple program with a simple goal, architecture, and theory. I think that if you are a programmer, not just a coder, you should always weigh the trade-offs. If I had to spend even one more day on this program, the old way (human monitoring) would be better. That kind of monitoring is just a tedious chore; it doesn't take much effort. My wife and I could watch the page by pressing the refresh button a few times a day, even if that is not an agile way to buy popular second-hand toys. Even if we added up all the time spent clicking the button and checking the top of the search list over a month, it would probably come to less than one day.
So I chose the easy way. Below is 'shopping_radar.rb'.
$KCODE = 'U'
require 'iconv'
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'activeresource'
SITE_URL = 'http://net2.i-baby.co.kr'
QUERY_URL = "#{SITE_URL}/netboard/new_netboard/business/total_search.php?search=%C1%F6%BA%D8&search_value=&idx=&s=1&w=%C1%F6%BA%D8&x=0&y=0"
# The server's default locale is UTF-8, but the target site is encoded in EUC-KR.
# So Hpricot::Elem is reopened here so that conversion to UTF-8 happens automatically.
class Hpricot::Elem
  alias :org_inner_html :inner_html
  alias :org_to_s :to_s
  alias :org_get_attribute :get_attribute

  def get_attribute(attr_name)
    convert_utf_8(org_get_attribute(attr_name))
  end

  def inner_html
    convert_utf_8(org_inner_html)
  end

  def to_s
    convert_utf_8(org_to_s)
  end

  # Convert an EUC-KR string to UTF-8, falling back to word-by-word
  # conversion when the string contains an illegal byte sequence.
  def convert_utf_8(org_str)
    Iconv.conv('UTF-8', 'EUC-KR', org_str)
  rescue
    convert_utf_8_except_irregal_charater(org_str)
  end

  def convert_utf_8_except_irregal_charater(org_str, replacement="XX")
    # Sometimes the target string contains broken characters, because the
    # site's long-line truncation does not cut Korean characters on a
    # character boundary. (A Korean character is 2 bytes in EUC-KR.)
    org_str.split.map do |word|
      begin
        Iconv.conv('UTF-8', 'EUC-KR', word)
      rescue
        replacement
      end
    end.join(' ')
  end

  private :convert_utf_8, :convert_utf_8_except_irregal_charater
end
class Item < ActiveResource::Base
  # 127.0.0.1:3000 is the default address, used for testing.
  self.site = ARGV[0].nil? ? 'http://127.0.0.1:3000' : ARGV[0]

  # Class-level counter of items saved during this run.
  @saved_count = 0
  alias :org_save :save
  def self.saved_count=(value); @saved_count = value; end
  def self.saved_count; @saved_count; end

  # Count only successful saves, and keep returning the result of the real save.
  def save
    saved = org_save
    Item.saved_count += 1 if saved
    saved
  end

  # Two items are the same listing when all identifying fields match.
  def eql?(other)
    self.category == other.category &&
      self.title == other.title &&
      self.price == other.price &&
      self.user == other.user &&
      self.date == other.date &&
      self.detail_url == other.detail_url
  end
end
# Build the attribute hash for one item from the table that holds its listing.
def parse_item(category, item_table)
  tds = item_table.search("tr/td")
  # The title cell carries the link, the title text and the description (as its title attribute).
  title_cell = tds[1].search("//td")[1]
  view_url = title_cell.search("a").first['href']
  {
    :category     => category,
    :title        => title_cell.search("a").inner_html,
    :desc         => title_cell.get_attribute('title').gsub(/\r\n/,'<br/>'),
    :price        => tds[2].children.map { |e| e.class == Hpricot::Text ? e.to_s.strip : e.inner_html }.join,
    :user         => (tds[3]/"a").inner_html,
    :detail_url   => SITE_URL + view_url,
    :date         => tds[4].inner_html,
    :readed_count => tds[5].inner_html
  }
end

def find_form(html, formname)
  html.search("//form[@name='#{formname}']")
end

def all_tables_of(form)
  form.search("table[@width='640']")
end
logger = Logger.new("#{File.dirname(__FILE__)}/shopping_radar.log", 2, 5*1024)
# Requiring activeresource replaces Ruby's default Logger formatting with Rails' own.
# So set the formatter back to Logger::Formatter.new to get the standard Ruby log format.
logger.formatter = Logger::Formatter.new
begin
  html = Hpricot(open(QUERY_URL))
  # Category label => name of the form that holds that category's listings.
  target_forms = {
    '아기 > 팝니다 > 용품' => 'delete_mart_bs_03',      # Baby > For sale > Supplies
    '아기 > 팝니다 > 장난감' => 'delete_mart_bs_04',    # Baby > For sale > Toys
    '어린이 > 팝니다 > 장난감' => 'delete_mart_ks_03'   # Children > For sale > Toys
  }
  exist_items = Item.find(:all)
  target_forms.each do |category, form_name|
    tables = all_tables_of(find_form(html, form_name))
    # An empty table sits between the actual item tables, so skip the odd-indexed ones.
    (0...(tables.length)).step(2) do |index|
      item = Item.new(parse_item(category, tables[index]))
      item.save unless exist_items.any? { |i| i.eql?(item) }
    end
  end
  logger.info("#{Item.saved_count} new items were registered.")
rescue => err
  logger.fatal("Something terrible happened.")
  logger.fatal(err)
end
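A note on the encoding conversion above: Iconv was removed from Ruby's standard library in Ruby 2.0, so on a newer Ruby the same idea can probably be expressed with String#encode. The following is only a minimal sketch under that assumption, not part of shopping_radar.rb. The helper name euc_kr_to_utf8 is made up for illustration, and the sample word is simply two Hangul syllables chosen to simulate the site's truncation.

```ruby
# Hypothetical helper (not in the original script): convert EUC-KR bytes to
# UTF-8 with core String methods, substituting a marker for broken bytes.
def euc_kr_to_utf8(raw, replacement = "XX")
  raw.dup.force_encoding(Encoding::EUC_KR).encode(
    Encoding::UTF_8,
    :invalid => :replace,    # e.g. a 2-byte character cut in half by truncation
    :undef   => :replace,    # characters that have no UTF-8 mapping
    :replace => replacement
  )
end

# Simulate the truncation problem: two Hangul syllables encoded as EUC-KR,
# with the final byte chopped off.
word      = "\uC9C0\uBD95".encode(Encoding::EUC_KR)
truncated = word.byteslice(0, word.bytesize - 1)

puts euc_kr_to_utf8(word)       # the intact word converts cleanly
puts euc_kr_to_utf8(truncated)  # the dangling byte becomes the replacement
```

One difference from the fallback in the script: this sketch replaces only the broken bytes, while convert_utf_8_except_irregal_charater replaces the whole whitespace-separated word that contains them.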
I'm not a Ruby/Rails expert, so corrections and advice are always welcome. :-)
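One idea I may try later concerns the duplicate check. The script compares every scraped item against every stored item with eql?. A possible alternative is to build a Set of identity keys once and look each scraped item up in it. This is only a rough sketch under my own assumptions: Listing is a plain Struct standing in for the ActiveResource Item so that it runs without a server, identity_key is a hypothetical helper, and the sample values are invented.

```ruby
require 'set'

# Hypothetical stand-in for Item, holding the six fields that Item#eql? compares.
Listing = Struct.new(:category, :title, :price, :user, :date, :detail_url)

# Pack the identifying fields into an array, which can be used as a Set member.
def identity_key(listing)
  listing.to_a
end

# Invented sample data: one listing already stored, two freshly scraped.
existing = [
  Listing.new('toys', 'roof car', '30,000', 'kim', '2008-01-10', '/view?id=1')
]
scraped = [
  Listing.new('toys', 'roof car', '30,000', 'kim', '2008-01-10', '/view?id=1'),
  Listing.new('toys', 'slide', '15,000', 'lee', '2008-01-11', '/view?id=2')
]

# Build the lookup set once, then keep only listings whose key is not in it.
seen = Set.new(existing.map { |l| identity_key(l) })
new_items = scraped.reject { |l| seen.include?(identity_key(l)) }

puts new_items.map(&:title).inspect
```

For the handful of items on one search page this hardly matters, so it is more a matter of taste than a necessary fix.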