scrape scrape scrape

totally half-finished thought, but maybe it’ll spawn an idea for you… there are a zillion+1 ways to scrape information from the web these days. here’s the easiest i’ve found:

require 'nokogiri'
require 'open-uri'
require 'tidy_ffi'

class CrappyScraper

	attr_accessor :doc	
	
	def search(keyword)
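		# URI.open comes from open-uri; plain Kernel#open stopped fetching URLs in Ruby 3.0
		@doc = Nokogiri::HTML(URI.open("http://www.google.com/search?q=" + URI.encode_www_form_component(keyword)))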

		@doc.xpath('//h3/a').each do |node|
			puts node.text
		end

	end
	
	def scrape(url)
		@doc = Nokogiri::HTML(URI.open(url))

		@doc.xpath('//span/a').each do |node|
			puts node.text
		end
	end

	def write_clean(filename)
		File.open(filename, 'w') do |f|
			doc_clean = TidyFFI::Tidy.new(@doc.to_s).clean
			f.write(doc_clean)
		end
	end
	
	def to_s
		TidyFFI::Tidy.new(@doc.to_s).clean
	end
	
	def write(filename)
		File.open(filename, 'w') { |f| f.write(@doc) }
	end
end


x = CrappyScraper.new
x.search('cowabunga')
puts x.to_s
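
one gotcha with search urls: keywords containing spaces or ampersands need escaping before they go into the query string. here’s a minimal sketch using only ruby’s stdlib. note that `build_search_url` is a made-up helper name for illustration, not part of the class above:

```ruby
require 'uri'

# hypothetical helper: builds the google search url with the keyword escaped
def build_search_url(keyword)
  # encode_www_form escapes spaces, ampersands, etc. in the query value
  query = URI.encode_www_form(q: keyword)
  "http://www.google.com/search?#{query}"
end

puts build_search_url('teenage mutant & ninja')
```

the string it returns can be handed to `URI.open` the same way the hard-coded url is.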

2 Comments

  1. gSaenz says:

    I was actually able to follow along on this one.

  2. jcran says:

    hehe, glad to hear it. admittedly, some of my posting has been pretty cryptic (this one isn’t much better), but thankfully ruby’s super-readable.
