Instructables

Spidering an AJAX website with an asynchronous login form

The problem: Spidering tools don't allow AJAX login authentication.
This instructable will show you how to log in through an AJAX form using Python and a module called Mechanize.

Spiders are web automation programs that are becoming an increasingly popular way for people to gather data online. They creep around the web gathering precious materials to fuel the most powerful web companies around. Others crawl around and gather specific sets of data to improve decision making, infer what's currently "in", or find the cheapest travel routes.

Spiders (also called web crawlers, webbots, or screen scrapers) are great for turning HTML goop into some semblance of intelligent data. The trouble starts with AJAX-enabled webpages: their JavaScript-driven, cookie-based sessions are not navigable with the normal set of spidering tools. In this instructable we will be accessing our own member page at pubmatic.com. These steps show you a method to follow, but your page will be different.
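To make the idea concrete, here is a minimal sketch of what a cookie-aware AJAX login looks like from a script. It uses only Python's standard library rather than Mechanize so that it runs without extra installs. The URL, the field names ("email", "password"), and the helper names are all hypothetical; look up the real values for your own page by watching the login request in Firebug.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the URL your login form really posts to.
LOGIN_URL = "https://example.com/login/ajax"


def build_session():
    """Return an opener that remembers cookies between requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, username, password):
    """Build the POST that the page's JavaScript would normally send."""
    # Field names are assumptions; copy the real ones from the observed request.
    body = urllib.parse.urlencode({"email": username, "password": password})
    headers = {
        # Some AJAX backends look for this header; yours may not need it.
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(
        url, data=body.encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    opener, jar = build_session()
    request = build_login_request(LOGIN_URL, "me@example.com", "secret")
    # Calling opener.open(request) would send the login; any session cookie
    # the server sets would land in `jar` and be reused on later requests.
    print(request.get_method(), request.full_url)
```

The request is built separately from being sent, so you can compare it field by field against what Firebug shows before you point it at a live site.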

Have fun!
 

Step 1: Gather Materials

Start by supplementing your programming resources. You will need the following programs; use their guides to help you install them.

Install Firebug
It's a Firefox addon

Install Python
Go to: python.org

Install the Mechanize Module
Get Mechanize

Other useful Spidering tools:
BeautifulSoup
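
Once everything is installed, a quick check that Python can actually find the modules saves confusion later. The sketch below is one way to do it. The helper name missing_modules is made up for illustration, and the module name "bs4" assumes the current BeautifulSoup 4 package (older releases import under a different name).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "mechanize" and "bs4" are the import names assumed for this check.
    missing = missing_modules(["mechanize", "bs4"])
    if missing:
        print("Still need to install:", ", ".join(missing))
    else:
        print("All spidering modules found.")
```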

I would love to check this one out more thoroughly when I finish studying ajax programming. Cool, keep the techy Instructables coming!
ax89 5 years ago
Maman is wreaking havoc in Washington!! lol :D
puffyfluff 5 years ago
Darn. Firebug doesn't work on Firefox 3. Oh well, I've never come across an AJAX website anyways.
ssokolow 6 years ago
Nice choice of pictures. My aunt showed me one of those spider statues a few months ago when I visited her in Ottawa. (the capital of Canada) I'll let you know what I think of your Instructable once I have need for it. At present, it'd be useless to me without also figuring out how to use python-spidermonkey or writing a Javascript parser using regexes, Plex, or PLY. (The guys who run FanFiction.net are either insane, stupid, or hate their users and this extends to treating Javascript as if it's some sort of client-side PHP, so BeautifulSoup and Mechanize on their own are useless)
nagutron 6 years ago
Cool. It's good to see some more technical programming stuff on the site! Not that everyone will appreciate this Instructable, but hopefully people who can really use this knowledge will find it through a search engine, when they're looking it up.