Introduction: How to Monitor Websites for Changes
Last year I wrote a simple bash script that uses wget to grab a copy of a web page, compares it to the previous copy of that page, and notifies me if there is a difference. (This was for the job postings page of a company site.)
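#!/usr/bin/env bash
# Keep the previous download so there is something to compare against.
if [[ -f new_page.html ]]; then
    mv new_page.html old_page.html
fi
# Fetch the page quietly; the URL is passed as the first argument.
wget "$1" -O new_page.html > /dev/null 2>&1
# diff exits non-zero when the files differ. It also does so when
# old_page.html is missing, so the very first run reports a change.
if ! diff ./new_page.html ./old_page.html > /dev/null; then
    printf "%s\n" "The file has changed!!!"
    notify-send "The file has changed!!!"
else
    printf "%s\n" "No change"
    notify-send "The file has NOT changed!!!"
fi
date
exit 0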
Step 1: Going Deeper Into the Matrix
ping_status=$(ping -c 1 "$link" | grep "packet" | cut -d"%" -f1 | cut -d"," -f3 | cut -d" " -f2)
When you ping a host, you get back a bunch of results, one of them being 'packet loss'. Typically, if a web server is up and running, it will respond to your ping. Ping can be set up to check multiple times in a row, or continuously. (If this is done fast enough and by enough machines, you have a DDoS attack, but we are just doing a normal ping.) Each run ends with a summary, and part of it is how many pings went unanswered, given as a percentage. (I'm simplifying here; feel free to read up for more details.) For example: 4 pings sent, 4 replies, 0% loss. In the command above, 'grep' finds the summary line that contains 'packet', and the 'cut' commands strip away everything except the number in front of the % sign. 100% loss would be a bad thing.
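The chain of 'cut' commands depends on the exact layout of the summary line, which can shift when ping adds extra fields such as "+1 errors". As a rough alternative sketch (not the script's actual method), the number can be matched directly in front of "% packet loss". The function name parse_packet_loss is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ping output on stdin and print the packet-loss
# percentage as a bare number (for example "0" or "100").
parse_packet_loss() {
    # Capture the number immediately before "% packet loss" so that extra
    # fields in the summary line do not shift the result.
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example usage (assumes $link holds a hostname):
#   ping_status=$(ping -c 4 "$link" | parse_packet_loss)
```

Because it only reads text on stdin, the helper can be tried on a saved ping summary without touching the network.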
Step 2: Does It Blend?
Using wget you can grab a copy of a web page from any site. What I do is grab a copy, give it a specific name of my choosing, then compare it to a previous copy. If they are different, then something needs to be looked into.
wget -o ./website_check/tmp.log "$link" -O "./website_check/$folder/new_page.html"
Here I tell wget to write its details to a log file (-o), fetch the defined $link, and save the grabbed page under a specific name (-O). To avoid a conflict the next time the script runs, I do some filename shuffling before downloading:
if [ -f "./website_check/$folder/new_page.html" ]
then
    mv "./website_check/$folder/new_page.html" "./website_check/$folder/old_page.html"
fi
So I rename the file from the last check so that when I grab a new copy, there is no conflict. Then I compare the two for changes.
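The comparison step itself might look something like the sketch below. It assumes cmp is used in silent mode and that a missing old copy (the first run) should count as "no change"; both choices, and the function name page_changed, are illustrative rather than taken from the full script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 means "the page changed".
#   $1 = path to the previous copy, $2 = path to the fresh copy
page_changed() {
    local old=$1 new=$2
    # No previous copy yet (first run): nothing to compare against,
    # so report "not changed" instead of raising a false alarm.
    [ -f "$old" ] || return 1
    # cmp -s prints nothing and exits 0 only when the files are identical.
    # Any other result (a difference, or an unreadable new copy) counts as changed.
    ! cmp -s "$old" "$new"
}

# Example usage (paths follow the layout used in this step):
#   if page_changed "./website_check/$folder/old_page.html" \
#                   "./website_check/$folder/new_page.html"; then
#       echo "Page changed"
#   fi
```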
Step 3: Now the Geeky Part
Basically, once I have all my information about the website I am monitoring, I send it in a notification to my Galaxy Tab 8.9 (any recent Android device will work; check the site for details).
curl https://nma.usk.bz/publicapi/notify --silent --data-ascii "apikey=$APIKEY" --data-ascii "application=$1" --data-ascii "event=curl-event" --data-ascii "description=$2" --data-ascii "priority=1"
Enter curl: a command-line tool for getting or sending files using URL syntax.
Put simply, you can send information to websites using curl. In this case, I have an account with Notify My Android and I send the results of my ping/comparison to that account. In turn, that information gets sent to the Notify My Android app running on my tablet.
What is really cool is that you can customize the 'subject' and 'message' to be whatever you want. In the message body of the push notification, I use yet another piece of information from the page comparison. When I run the cmp command, one piece of its output is the line number on which the first change was noticed. So I grab that info from the output file and put it into the message body!
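Pulling the line number out of cmp's output could be sketched as follows. When two files differ, cmp prints a line ending in "line N", so the last field is the number we want. This sketch pipes cmp straight into awk instead of going through an output file, and the function name first_changed_line is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line number of the first difference
# between two files, or nothing at all if cmp reports no differing byte.
first_changed_line() {
    # cmp prints something like "old.html new.html differ: byte 120, line 7".
    # The line number is the last field of that message.
    cmp "$1" "$2" 2>/dev/null | awk '/differ:/ {print $NF}'
}

# Example usage:
#   line=$(first_changed_line old_page.html new_page.html)
#   description="First change at line ${line:-unknown}"
```

One caveat: if one file is simply a shorter prefix of the other, cmp reports an end-of-file message on stderr instead, and this sketch prints nothing.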
Now we have a fully functional script that tells you if the server is down and, if it's up, whether the page has changed (e.g. maybe hacked? broken?), and then sends you the notification wherever you are. Pretty cool, huh?
Sure, commercial products have been able to do the same, but this was all done with nothing but an OS and a free account on a website. Commercial-like results from free stuff!
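To make the decision flow concrete, here is a minimal sketch of how the results of the earlier steps might be turned into the notification text. The function name build_message, the "yes"/"no" flag, and treating 100% loss (or an empty ping result) as "down" are all assumptions for illustration, not necessarily what the full script does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the check results into a one-line message.
#   $1 = packet-loss percentage from the ping step (empty counts as 100)
#   $2 = "yes" if the page comparison found a difference, otherwise "no"
#   $3 = line number of the first change (optional)
build_message() {
    local loss=${1:-100} changed=$2 line=${3:-unknown}
    # Drop any decimal part ("0.0" becomes "0") so the integer test works.
    if [ "${loss%.*}" -ge 100 ]; then
        printf 'Server is DOWN (%s%% packet loss)\n' "$loss"
    elif [ "$changed" = "yes" ]; then
        printf 'Server is up, page CHANGED (first difference at line %s)\n' "$line"
    else
        printf 'Server is up, page unchanged\n'
    fi
}
```

The resulting string could then be passed as the description field of the curl notification.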
There are some limitations, and I am working on finding methods that will get around them. For example, if you have a website that runs some sort of updating script (e.g. a Twitter feed or a live stock market index feed), then the page will always be shown as changed whenever those feeds update. If you are interested in following this development, feel free to head on over and subscribe to my geeky blog at http://cosmopolitangeek.wordpress.com/. Full versions of the script are available here: https://app.box.com/s/l2emmq9q9nhayvci25r5