The problem: Spidering tools can't log in through AJAX forms.
This instructable will show you how to log in through an AJAX form using Python and a module called Mechanize.

Spiders are web automation programs that are becoming an increasingly popular way for people to gather data online. Some creep around the web gathering the precious materials that fuel the most powerful web companies. Others crawl around gathering specific sets of data to improve decision making, infer what's currently "in", or find the cheapest travel routes.

Spiders (also called web crawlers, webbots, or screen scrapers) are great for turning HTML goop into some semblance of intelligent data. The trouble starts with AJAX-enabled webpages: their logins rely on JavaScript and cookie-based sessions, which the normal set of spidering tools can't navigate. In this instructable we will access our own member page at pubmatic.com. These steps show you a method to follow, but the details of your page will be different.
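To make the problem concrete, here is a minimal sketch of what an AJAX login usually amounts to at the HTTP level: a POST request sent by the page's JavaScript, plus a session cookie that comes back. It uses only Python's standard library rather than Mechanize, so that it runs without extra installs. The URL, the field names ("email", "password"), and the helper names make_session and build_login_request are all made up for illustration; your page will send something different.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the URL your own login form posts to.
LOGIN_URL = "https://example.com/login"


def make_session():
    """Return an opener that remembers cookies between requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, fields):
    """Build (but do not send) the POST that the page's JavaScript would make."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    # Many AJAX endpoints expect this header; it is a convention, not a rule.
    request.add_header("X-Requested-With", "XMLHttpRequest")
    return request


if __name__ == "__main__":
    opener, jar = make_session()
    # Field names and values here are placeholders, not a real account.
    request = build_login_request(
        LOGIN_URL, {"email": "me@example.com", "password": "secret"}
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually log in you would call: response = opener.open(request)
    # Cookies set by the server would then be stored in `jar`.
```

Nothing is sent over the network until opener.open(request) is called, so you can inspect the request first and compare it with what your browser sends.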

Have fun!

Step 1: Gather Materials

You will need the following programs. Use each one's guide to help you install it.

Install Firebug
It's a Firefox add-on.

Install Python
Go to: python.org

Install the Mechanize Module
Get Mechanize
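
Once everything is installed, a quick check like the one below can confirm that Python can find the Mechanize module. This is only a sketch: check_setup is a made-up helper name, and it assumes the module is importable under the name "mechanize".

```python
import importlib.util
import sys


def check_setup(modules=("mechanize",)):
    """Return a dict mapping each module name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_setup().items():
        # The pip hint assumes the package name matches the module name.
        print(name, "found" if found else "missing; try: pip install " + name)
```

If the module shows up as missing, go back to its install guide before moving on.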

Other useful Spidering tools:

I would love to check this one out more thoroughly when I finish studying ajax programming. Cool, keep the techy Instructables coming!
Maman is wreaking havoc in Washington!! lol :D
Darn. Firebug doesn't work on Firefox 3. Oh well, I've never come across an AJAX website anyways.
Nice choice of pictures. My aunt showed me one of those spider statues a few months ago when I visited her in Ottawa. (the capital of Canada) I'll let you know what I think of your Instructable once I have need for it. At present, it'd be useless to me without also figuring out how to use python-spidermonkey or writing a Javascript parser using regexes, Plex, or PLY. (The guys who run FanFiction.net are either insane, stupid, or hate their users and this extends to treating Javascript as if it's some sort of client-side PHP, so BeautifulSoup and Mechanize on their own are useless)
Cool. It's good to see some more technical programming stuff on the site! Not that everyone will appreciate this Instructable, but hopefully people who can really use this knowledge will find it through a search engine, when they're looking it up.

About This Instructable




Bio: Bilal Ghalib is interested in doing things that surprise him and inspire others. Let's create a future we want to live in together.