Web Hosting Forum | Lunarpages
July 26, 2008, 05:39:13 AM


Author Topic: Using the Yahoo! Pipes Fetch Page module to make a web scraper  (Read 2524 times)
Mitch
Lunarpages Traffic Cop
Senior Moderator
« on: December 14, 2007, 09:03:17 AM »

From Lunarforums member, Day:

I've written on these forums before about how to write a web scraper in PHP that can be hosted on a Lunarpages account.  But I recently discovered Yahoo! Pipes and its new Fetch Page module, which lets you build web scrapers visually using drag and drop and a few rules.  The resulting scraper is hosted by Yahoo!, so you don't need to write any PHP, figure out how to make cached HTTP requests from your Lunarpages account, or schedule any cron jobs.

So I abandoned my previous PHP scripts and created a Pipe to make the RSS feeds for these forums.  I've written up how I did it in a tutorial.  The general techniques should be usable by anyone who needs to extract data from web pages without RSS feeds or other structured data sets.

Yahoo! Pipes Tutorial - An example using the Fetch Page module to make a web scraper
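
For anyone curious what a Fetch Page style scraper boils down to, here is a rough Python sketch of the same idea: cut the page down to the part between two markers, split that into items on a delimiter, then use a regex to replace the leftover junk with nothing. This is only an illustration, not how Yahoo! implements the module. The function names, the sample markup, and the marker strings are all made up for the example, and the page is passed in as a string so the sketch runs without any network access.

```python
import re
from xml.sax.saxutils import escape


def cut_and_split(html, cut_from, cut_to, delimiter):
    """Keep only the text between two markers, then split it into items."""
    start = html.find(cut_from)
    if start == -1:
        return []  # start marker not found: nothing to scrape
    start += len(cut_from)
    end = html.find(cut_to, start)
    if end == -1:
        end = len(html)  # no end marker: keep everything after the start
    chunks = html[start:end].split(delimiter)
    return [chunk for chunk in chunks if chunk.strip()]


def strip_junk(chunk):
    """Replace HTML tags with nothing useful and collapse the whitespace."""
    text = re.sub(r"<[^>]+>", " ", chunk)
    return re.sub(r"\s+", " ", text).strip()


def to_rss(channel_title, channel_link, titles):
    """Wrap the cleaned item titles in a bare-bones RSS 2.0 document."""
    items = "".join(
        f"<item><title>{escape(title)}</title></item>" for title in titles
    )
    return (
        '<?xml version="1.0"?><rss version="2.0"><channel>'
        f"<title>{escape(channel_title)}</title>"
        f"<link>{escape(channel_link)}</link>"
        f"<description>{escape(channel_title)}</description>"
        f"{items}</channel></rss>"
    )


# Hypothetical page markup, standing in for a fetched forum page.
sample_page = (
    '<html><body><div id="posts">'
    '<div class="post"><a href="/t/1">First topic</a></div>'
    '<div class="post"><a href="/t/2">Second  <b>topic</b></a></div>'
    "</div><!-- end --></body></html>"
)

chunks = cut_and_split(
    sample_page, '<div id="posts">', "<!-- end -->", '<div class="post">'
)
titles = [strip_junk(chunk) for chunk in chunks]
feed = to_rss("Example forum", "http://example.com/", titles)
```

Note that the markers are plain strings, so this approach does not depend on the page using tables; any text that reliably appears before and after the part you want will do.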


Sudija
Newbie
« Reply #1 on: April 17, 2008, 03:16:11 PM »

Awesome tutorial on Pipes!

I'm having a little trouble applying it to the forum I'm trying to turn into a feed - http://www.pigeon-chat.com/search.php?search_id=newposts
Please keep in mind I'm a total newbie in this area.
My problem kicks in when I try to cut out the part that I need. The page has no tables!
The second thing is that I get a lot of junk in my output, and Regex doesn't work for me; I can't get it to replace the crap with nothing to keep it out.

I just want these posts to appear as an RSS feed, that's all.

Any help is appreciated!