Introduction: Web Page Scraping Via Linux.

About: computoman.blogspot.com. Bite-size articles instead of a trilogy in one post.

One of the most interesting things to do with Linux is use the command line for page scraping. You can hunt the web for the information you need without spending a lot of time online. I also show you the pages the data comes from.


Step 1: Getting the Threat Level.

Getting the threat level can be important to many people in the security business, but you should not have to check a web page every time you need to know what it is.

lynx -dump "http://www.usasecure.org/threat.php" > threatlevel
cat threatlevel | grep "Current Threat Level:" > tl

cat tl

To send it to Twitter, add:

twidge update < tl

$ ./gtl.sh
Current Threat Level: ELEVATED CONDITION

$
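Putting those lines together in a script is straightforward; a minimal sketch of gtl.sh (assuming the page layout has not changed):

[code]
#!/bin/bash
# gtl.sh: dump the page, keep only the threat-level line
lynx -dump "http://www.usasecure.org/threat.php" > threatlevel
grep "Current Threat Level:" threatlevel > tl
cat tl
# uncomment to post the result to Twitter with twidge
# twidge update < tl
[/code]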

Step 2: Get the Weather.

It might be good to know the weather in areas you plan to visit or where you have operations that depend on the weather.

gwp:

[code]
# Get today's weather
ZIP=$1

echo -n "The weather for $ZIP: "

# in the p.m., use Tonight instead of Today
elinks "http://www.weather.com/weather/print/$ZIP" > weather ; cat weather | grep Today
[/code]


$ ./gwp 77331
Today Partly Cloudy / Wind 93°/71° 10 %

Update:

gwp1:
[code]
ZIP=$1

echo "The weather for $ZIP:"

elinks "http://www.weather.com/weather/print/$ZIP" > weather ; cat weather | grep Today
elinks "http://www.weather.com/weather/print/$ZIP" > weather ; cat weather | grep Tonight
[/code]

$ ./gwp1 77331

The weather for 77331:

Tonight Isolated T-Storms 72° 30 %

Yet another way of doing it.


gw:
[code]
#!/bin/bash
# weather.bash
#desc Find current weather stats for your zip code
#desc Ex: ${trig}weather 03301
# weather 1.1 -Crouse
# With Updates by Jeo
# Modified to run stand alone by Brian Masinick
# Example: !weather 03301
# Usage: weather + zipcode

zipcode=$1
if [ -z "$zipcode" ]; then
echo "Please provide a zip code (Ex: weather 03301)"
else
unset response
# Add a backslash (\) after -dump-width 300 if this line splits
# across two lines; Should be one distinct line:
WEATHER="$(elinks -dump -dump-width 300 "http://mobile.wunderground.com/cgi-bin/findweather/getForecast?query=${zipcode}" | grep -A16 Updated)"

if [ -z "$WEATHER" ]; then
response="No Results for $zipcode"
echo "${response}"
else
response[1]="$(echo "$WEATHER" | grep -Eo 'Observed.*' | sed s/\ *\|\ */\|/g | awk -F\| '{print "Weather: " $1}')"
response[2]="$(echo "$WEATHER" | grep -Eo 'Updated.*' |sed s/\ *\|\ */\|/g |awk -F\| '{print $1}')"
response[3]="$(echo "$WEATHER" | grep -Eo 'Temperature.*' | sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}' | sed s/DEG/\ /g )"
response[4]="$(echo "$WEATHER" | grep -Eo 'Windchill.*' | sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}'| sed s/DEG/\ /g)"
response[5]="$(echo "$WEATHER" | grep -Eo 'Wind .*' | sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}')"
response[6]="$(echo "$WEATHER" | grep -Eo 'Conditions.*' | sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}')"
response[7]="$(echo "$WEATHER" | grep -Eo 'Humidity.*' |sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}')"
response[8]="$(echo "$WEATHER" | grep -Eo 'Dew.Point.*' |sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}'| sed s/DEG/\ /g)"
response[9]="$(echo "$WEATHER" | grep -Eo 'Pressure.*' |sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}')"

for index in `seq 1 9`; do
if [ -n "${response[$index]}" ]; then
echo "${response[$index]}"
fi
done
fi
fi
[/code]

$ ./gw 77331
Weather: Observed at Wolf Creek Air Cond., Coldspring, Texas
Updated: 12:52 AM CDT on June 22, 2011
Temperature: 78.9°F / 26.1°C
Wind: WNW at 0.0 mph / 0.0 km/h
Conditions: Overcast
Humidity: 70%
Dew Point: 68°F / 20°C
Pressure: 29.90 in / 1012.4 hPa (Rising)
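Note: the weather.com print pages used above may no longer exist. If they are gone when you try this, a similar one-liner can be pointed at the wttr.in service instead; a minimal sketch, assuming the service and its ?format parameter are still available:

[code]
#!/bin/bash
# one-line weather report for a zip code via wttr.in
ZIP=$1
curl -s "http://wttr.in/$ZIP?format=3"
[/code]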

Step 3: The Horoscope.

You may want to know, for yourself, a partner, an employee, or even the boss, how the day might go.

ghp:

[code]
# Get today's horoscope
hsign="VIRGO"
lynx -width 1000 -dump "http://www.creators.com/lifestylefeatures/horoscopes/horoscopes-by-holiday.html" > hscope ; cat hscope | grep $hsign
[/code]

$ ./ghp
VIRGO (Aug. 23-Sept. 22). Pride of ownership applies to all of your possessions, and you’ll take care that they sparkle, shine and really work. Tonight, you’ll be reminded how much you cherish and need plenty of space to do your thing.

Variation:
ghp1:

[code]
# Get today's horoscope
hsign=$1
date +%D
echo -n "Today's horoscope for:"
lynx -width 1000 -dump "http://www.creators.com/lifestylefeatures/horoscopes/horoscopes-by-holiday.html" > hscope ; cat hscope | grep $hsign
[/code]

$ ./ghp1 PISCES
05/31/11
Today's horoscope for: PISCES (Feb. 19-March 20). There are so many people who are trying to do what you already do so well. You really are doing the world a disservice unless you share what you know. In your heart, you are a teacher.

Yet another way to do the same thing.

wghp:
[code]
# Get today's horoscope
hsign=$1
wget -q "http://www.creators.com/lifestylefeatures/horoscopes/horoscopes-by-holiday.html"
cat horoscopes-by-holiday.html | grep $hsign > hscope
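# strip the first three and last four characters (leftover link markup from the dump)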
cat hscope | sed 's/...\(.*\)..../\1/'
rm horoscopes-by-holiday.html
[/code]

$ ./wghp CANCER
CANCER (June 22-July 22). When your emotional needs are met, you feel physically strong and able. The affection and attention of a loved one will have a positive effect on your health.

--------------------------------------------------------------

Update: Added a graphic. (included in zip file).

$ ./ghp2 cancer
--------------------------------------------
.--.
/ _`. Cancer- The Crab
(_) ( )
'. /
`--'
09/25/11
Today's horoscope for: CANCER (June 22-July 22). You'll hear from someone you were not expecting to hear from. In your excitement, you could forget to ask what you want to know about the events that have occurred since your last visit with this person.
--------------------------------------------

ghp2
[code]
# Get today's horoscope
echo "--------------------------------------------"
hsign=$1
# characters wide
cw=60
hsign="`echo $hsign|tr '[a-z]' '[A-Z]'`"
cat $hsign
date +%D
echo -n "Today's horoscope for:"
lynx -width 1000 -dump "http://www.creators.com/lifestylefeatures/horoscopes/horoscopes-by-holiday.html" | grep $hsign | fold -sw $cw
echo "--------------------------------------------"
[/code]

If you cannot extract the zip file, here are the graphics.

============================================
.-. .-.
(_ \ / _) Aries- The Ram
|
|

. .
'.___.' Taurus- The Bull
.' `.
: :
: :
`.___.'

._____.
| | Gemini- The Twins
| |
_|_|_
' '

.--.
/ _`. Cancer- The Crab
(_) ( )
'. /
`--'

.--.
( ) Leo- The Lion
(_) /
(_,

_
' `:--.--.
| | |_ Virgo- The Virgin
| | | )
| | |/
(J

__
___.' '.___ Libra- The Balance
____________


_
' `:--.--.
| | | Scorpius- The Scorpion
| | |
| | | ..,
`---':

...
.': Sagittarius- The Archer
.'
`..'
.'`.

_
\ /_) Capricorn- The Goat
\ /`.
\ / ;
\/ __.'



.-"-._.-"-._.- Aquarius- The Water Bearer
.-"-._.-"-._.-



`-. .-' Pisces- The Fishes
: :
--:--:--
: :
.-' `-.

Also, to get the weekly version:

gh2p:
[code]
# Get the weekly horoscope
echo "--------------------------------------------"
# character width
cw=60
hsign=$1
hsign="`echo $hsign|tr '[a-z]' '[A-Z]'`"
cat $hsign
echo -n "Today's date: "
date +%D
echo "Weekly horoscope for:"
lynx -width 1000 -dump "http://www.creators.com/lifestylefeatures/horoscopes/holiday-mathis-weekly.html" | grep $hsign | fold -sw $cw
echo "--------------------------------------------"
[/code]

---------------------------------------------------------------
Get the weekly horoscope (defaulting to Virgo when no sign is given):

[code]
# Get the weekly horoscope
echo "--------------------------------------------"
# character width
cw=60
hsign=$1
if [ $# -lt "1" ];
then hsign="Virgo"
fi
hsign="`echo $hsign|tr '[a-z]' '[A-Z]'`"
cat ~/signs/$hsign
echo -n "Today's date: "
date +%D
echo "Weekly horoscope for:"
lynx -width 1000 -dump "http://www.creators.com/lifestylefeatures/horoscopes/holiday-mathis-weekly.html" | grep $hsign | fold -sw $cw
echo "--------------------------------------------"
[/code]

Step 4: Replacement Horoscope Script.

I look at astrology as an intellectual cartoon and an insight into human thinking, so I will peek at it once in a while. It also gave me a chance to play with page scraping again.

$ ./horoscope.sh Virgo

Daily Horoscope for Tuesday 19th May 2015

Share :

Through friends of someone close, you could learn more about their
background. This extra information, particularly if it's related to how
they acquired their qualifications, and the friendships they made en
route, may not be something you wish to discuss with others, but might go
some way towards explaining why they are pulled towards certain
geographical locations. This might even impact on decisions being taken
now for travel in a couple of months time.

VIRGO

---------------------------------------------

I wrote a script to pull the daily horoscope for a particular sign, but the site we were getting the data from has changed, which led me to another site for the time being. Actually, it seems a blessing in disguise, because now we can get more than the daily listing. Here is the original script.

Original script

[code]
#===================================
# Get today's horoscope
# get sign
hsign=""
read -p "Enter your horoscope sign: " hsign
if [ -z "$hsign" ]
then hsign="virgo"
fi
# hsign=$(zenity --entry \
# --title="Daily Horoscope" \
# --text="Enter your _sign:" \
# --entry-text "$hsign")
#-------------------------------------------
# output data
# character width required for information box
cw=38
# create data file (datadir and file name can be changed to your needs)
datadir="/home/eddie/bin/signs"
filename="$datadir/th"
# make sure hsign is uppercase
hsign="`echo $hsign|tr '[a-z]' '[A-Z]'`"
cat $datadir/$hsign > $filename
echo -n "Today's date: " >> $filename
date +%D >> $filename
echo "Today's horoscope for:" >> $filename
lynx -width 1000 -dump "http://www.creators.com/lifestylefeatures/horoscopes/horoscopes-by-holiday.html" | grep $hsign | fold -sw $cw >> $filename
#output
# zenity --text-info --filename=$filename
cat $filename
#remove unneeded file
rm $filename
[/code]

I was not sure how to do the script, but then I remembered the old CNN script we used. A few minor changes and we are back with the scopes again. To invoke the script, use ./horoscope.sh signname:

$ ./horoscope.sh Virgo

The first letter must be capitalized and the rest lowercase.

New script

[code]
####################################
# Horoscope Grabber
#
#===============================
# Assignments
# --------------------------------
datafile="horoscope.txt"
a=1
let "flag = 0"
# end assignments
#=================================
#
# Get data file
#---------------------------------
elinks -dump "http://www.horoscopes.co.uk/$1/Daily-Horoscope.php" > $datafile
#=================================
#
# Extract and display data
#---------------------------------
while read line
do fdata[$a]=$line
echo $line | grep -q "Daily Horoscope"
if [ $? -eq 0 ]; then
# header
clear
let "flag = 1"
fi
if [ $flag -eq 1 ]; then
echo $line | grep -q "$1"
if [ $? -eq 0 ]; then
let "flag = 0"
else
echo $line | grep -q "IMG"
if [ $? -eq 0 ]; then
let "response = donothing" # no-op
else
echo $line | sed 's/\[.*\]//'
fi
fi
fi
let "a += 1"
done < $datafile
# footer
echo ---------------------------------------------
echo
#===================================
# End.
####################################
[/code]

Step 5: Language Translation.

See also: http://www.soimort.org/translate-shell/

Google Translate CLI Lets You Translate Text From The Command Line

Author: Andrew | Date: Wednesday, March 12, 2014

Google Translate CLI is a tool that lets you translate text from the command line using Google Translate.

The tool should work on Linux, OS X, FreeBSD, NetBSD, OpenBSD and Windows (Cygwin or MinGW) and its only requirement is GNU awk.

Install Google Translate CLI

To install Google Translate CLI, first make sure you have gawk installed (for the commands below, you'll also need wget, or you can use Git instead if you want).

In Ubuntu, install them using the following commands:

sudo apt-get install gawk wget

Then, install Google Translate CLI:

$ cd /tmp

$ wget https://github.com/soimort/google-translate-cli/a...

$ tar -xvf master.tar.gz

$ cd google-translate-cli-master/

$ sudo make install

Or, see the official installation instructions.

Usage

By default, Google Translate CLI translates text from any language into English, so simply run "trs" followed by some text in a terminal to translate it into English:

trs "some text"

Example:

$ trs "traducir texto de la línea de comandos utilizando

Google Translate CLI" translate text from the command line using Google Translate CLIInstead of entering some text to translate, you can also enter the path to a text file.

Google Translate CLI supports setting both the source language (the language of the source text) and the target language (the language to translate the source text into):

trs {SOURCE=TARGET} "TEXT TO TRANSLATE"

replacing "SOURCE" with the language code for the source language and "TARGET" with the language code for the target language.

For a complete list of language codes used by Google Translate, see the table at the end of this step.

Example: translate "traducir texto de la línea de comandos utilizando Google Translate CLI" from Spanish to English and French:

$ trs {es=en+fr} "traducir texto de la línea de comandos utilizando Google Translate CLI"
translate text from the command line using Google Translate CLI
traduire le texte de la ligne de commande en utilisant Google Translate CLI

Example 2: translate "hola mundo" from Spanish to English, Romanian, German and Italian:

$ trs {es=en+ro+de+it} "hola mundo"
hello world
Bună ziua lume
Hallo Welt
ciao mondo

You don't have to enter both the source and the target language. For instance, you can enter just the target language and let Google Translate guess the source language. Example:

$ trs {=en+ro+de+it} "hola mundo"
hello world
Bună ziua lume
Hallo Welt
ciao mondo

It's important to note that you shouldn't use special characters (such as "!", "$", "`" or "\") inside the text to be translated, without escaping them. Also, don't use "[" and "]" as this will cause Google Translate CLI to fail.
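Single-quoting the text is the usual way around that caveat; the shell then passes characters like "$" and "\" through untouched (standard shell quoting, nothing specific to this tool):

$ trs {es=en} 'el costo es $5'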


Language name and code:

Afrikaans (af), Albanian (sq), Arabic (ar), Armenian (hy), Azerbaijani (az), Basque (eu), Belarusian (be), Bulgarian (bg), Catalan (ca), Chinese Simplified (zh-CN), Chinese Traditional (zh-TW), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Filipino (tl), Finnish (fi), French (fr), Galician (gl), Georgian (ka), German (de), Greek (el), Haitian Creole (ht), Hebrew (iw), Hindi (hi), Hungarian (hu), Icelandic (is), Indonesian (id), Irish (ga), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Macedonian (mk), Malay (ms), Maltese (mt), Norwegian (no), Persian (fa), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swahili (sw), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Urdu (ur), Vietnamese (vi), Welsh (cy), Yiddish (yi)

Step 6: Get "Instructables" User Stats.

Finally able to grab user stats from the command line.

Usage: ./istats.sh username

So if I use my username:

$ ./istats.sh computothought
     * Date Joined:Dec 16, 2007
     * Instructables:121
     * Total Views:140,055
     * Featured %:1%
     * Subscribers:38
     * Answers:35
     * Topics:3
     * Comments:468
$ _

Here is the code:

istats.sh
[code]
lynx -width 1000   -dump "https://www.instructables.com/member/$1/" > istats
cat istats | grep  " * Date Joined:"
cat istats | grep  " * Instructables:"
cat istats | grep  " * Total Views:"
cat istats | grep  " * Featured %:"
cat istats | grep  " * Subscribers:"
cat istats | grep  " * Answers:"
cat istats | grep  " * Topics:"
cat istats | grep  " * Comments:"
[/code]
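The eight greps make eight passes over the same file; one extended regular expression can do it in a single pass. A sketch, assuming the field labels on the page are unchanged:

[code]
lynx -width 1000 -dump "https://www.instructables.com/member/$1/" |
grep -E " \* (Date Joined|Instructables|Total Views|Featured %|Subscribers|Answers|Topics|Comments):"
[/code]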


Step 7: Coming Soon:

We are going to combine this instructable with https://www.instructables.com/id/Simple-linux-commands-from-a-web-page/ so you do not have to remember all the commands, and you can access them from a web page. It's here: https://www.instructables.com/id/Web-page-scraping-fromto-a-web-page/

Step 8: Get Weather Update:

Get weather update:

[code]
#!/bin/bash
# forecast
#desc Find current weather stats and forecast for your zip code
#desc Ex: forecast 03301
# weather 1.1 -Crouse
# With Updates by Jeo
# Modified to run stand alone by Brian Masinick,
# and also added the forecast logic contributed by Daenyth.
# NOTE: This tool uses the elinks and links text web browsers (if you don't have both,
# adjust accordingly)
# Example: forecast 03301
# Usage: forecast zipcode

zipcode=$1
if [ -z "$zipcode" ]; then
echo "Please provide a zip code (Ex: weather 03301)"
else
unset response
# Add a backslash (\) at the break if this line splits across two
# lines; it should be one distinct line:
WEATHER="$(elinks -dump -dump-width 300 "http://mobile.wunderground.com/cgi-bin/findweather/getForecast?query=${zipcode}" | grep -A16 Updated)"

if [ -z "$WEATHER" ]; then
response="No Results for $zipcode"
echo "${response}"
else
response[1]="$(echo "$WEATHER" | grep -Eo 'Observed.*' | sed s/\ *\|\ */\|/g | awk -F\| '{print "Weather: " $1}')"
response[2]="$(echo "$WEATHER" | grep -Eo 'Updated.*' |sed s/\ *\|\ */\|/g |awk -F\| '{print $1}')"
response[3]="$(echo "$WEATHER" | grep -Eo 'Temperature.*' | sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}' | sed s/DEG/\ /g )"
response[4]="$(echo "$WEATHER" | grep -Eo 'Windchill.*' | sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}'| sed s/DEG/\ /g)"
response[5]="$(echo "$WEATHER" | grep -Eo 'Wind .*' | sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}')"
response[6]="$(echo "$WEATHER" | grep -Eo 'Conditions.*' | sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}')"
response[7]="$(echo "$WEATHER" | grep -Eo 'Humidity.*' |sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}')"
response[8]="$(echo "$WEATHER" | grep -Eo 'Dew.Point.*' |sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}'| sed s/DEG/\ /g)"
response[9]="$(echo "$WEATHER" | grep -Eo 'Pressure.*' |sed s/\ *\|\ */\|/g | awk -F\| '{print $1 ": " $2}')"

for index in `seq 1 9`; do
if [ -n "${response[$index]}" ]; then
echo "${response[$index]}"
fi
done
fi
fi

# This section of code was written by Daenyth.

DEFAULTZIP=03301

getforecast() {
echo "Your 10 Day Weather Forecast as follows:"
echo "Day, Weather, High/Low (F), Precip. %"
links -dump "http://www.weather.com/weather/print/$1" | perl -ne '/\d %\s+$/ && s/DEG//g && print'
echo ""
}

if [ $# -eq 1 ]; then
if (echo "$1" | egrep -q '^[0-9][0-9][0-9][0-9][0-9]$'); then
getforecast $1
fi
else
getforecast $DEFAULTZIP
fi


[/code]

Current forecast:

Weather: Observed at Concord, New Hampshire
Updated: 9:45 PM EDT on May 07, 2009
Temperature: 55.3°F / 12.9°C
Wind: NNW at 0.0 mph / 0.0 km/h
Conditions: Overcast
Humidity: 97%
Dew Point: 54°F / 12°C
Pressure: 29.68 in / 1005.0 hPa (Steady)
Your 10 Day Weather Forecast as follows:
Day, Weather, High/Low (F), Precip. %
Tonight Showers Early 47 30 %
Fri Showers 69/50 40 %
Sat Partly Cloudy 79/50 10 %
Sun Few Showers / Wind 64/42 30 %
Mon Partly Cloudy 60/40 20 %
Tue Showers 63/40 40 %
Wed Sunny 67/44 10 %
Thu Cloudy 68/45 20 %
Fri Showers 71/44 60 %
Sat Showers 69/44 60 %

Step 9: Yet Another Weather Scraper.

My stepfather was into meteorology; in fact, he was a weatherman in the military, and he got us into it. So I like to look at a minimal weather report. Batch files make it easier for servers to display data as needed. I have an NSLU2 running Linux with very low resources, so a batch file is perfect for grabbing data off the web. You can also use it for mobile devices. That also means you can insert the grabbed data into a database for later research.

Actually, I have been doing weather scraping for a while; this is probably the tenth script I have written. From the command line it might look like this:

~$ ./gwp2.sh 22546
The weather for 22546 on Sep 19:
   Updated: 7:05 AM EDT on September 19, 2013
   Observed at Mantico Hill, Beaverdam, Virginia
   Temperature         47.8°F / 8.8°C
   Humidity            98%
   Dew Point           47°F / 8°C
   Windchill           48°F / 9°C
   Wind Gust           0.0 mph / 0.0 km/h
   Pressure            30.19 in / 1022 hPa (Rising)
   Conditions          Mostly Cloudy
   Visibility          10.0 miles / 16.1 kilometers
   UV                  0.0 out of 16
   Clouds              Mostly Cloudy (BKN) : 5500 ft / 1676 m
   Yesterday's Maximum 74°F / 23°C
   Yesterday's Minimum 49°F / 9°C
   Sunrise             6:55 AM EDT
   Sunset              7:11 PM EDT
   Moon Rise           7:09 PM EDT
   Moon Set            7:01 AM EDT
   Moon Phase          Moon Phase
                       Full Moon
   Raw METAR           METAR KEZF 191055Z AUTO 00000KT 10SM BKN055 10/10 A3019 RMK AO2 T00950095

~$


The script to grab the data is fairly straightforward. You pull the whole page off the web and then extract data as needed.

[code]
zip=$1
tmon=$(date +"%b")
tday=$(date +"%d")
echo "The weather for $zip on $tmon $tday:"
lynx -width 1000 -dump "http://m.wund.com/cgi-bin/findweather/getForecast?brand=mobile&query=$zip" > weather
cat weather | grep "Updated"
cat weather | grep "Observed"
cat weather | grep "Temperature"
cat weather | grep "Humidity"
cat weather | grep " Dew Point"
cat weather | grep "Wind" | head -1
cat weather | grep "Wind Gust" | head -1
cat weather | grep "Pressure"
cat weather | grep "Conditions" | head -1
cat weather | grep "Visibility"
cat weather | grep "UV"
cat weather | grep "Clouds"
cat weather | grep "Yesterday's Maximum"
cat weather | grep " Yesterday's Minimum"
cat weather | grep "Sunrise"
cat weather | grep "Sunset"
cat weather | grep "Moon Rise"
cat weather | grep "Moon Set"
cat weather | grep -A1 "Moon Phase"
cat weather | grep "Raw METAR"
[/code]
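Most of those greps can likewise collapse into a single pass over the saved file. A sketch with one extended grep (it drops the head -1 de-duplication, so labels that appear more than once, such as Wind, will print every match):

[code]
grep -E "Updated|Observed|Temperature|Humidity|Dew Point|Windchill|Wind Gust|Pressure|Conditions|Visibility|UV|Clouds|Yesterday's|Sunrise|Sunset|Moon|Raw METAR" weather
[/code]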

Hope this little program helps someone.

Step 10: Get Revision3 Hak5 Episodes List

Here are a couple of simple scripts using Hak5.
Script 1
################################################################
#   New hak5 episode?
#
file1=hak5episodes
file2=hak5episodesold
cp $file1 $file2
elinks "revision3.com/hak5/episodes"  > $file1
# diff_file=diffed
# diff  $file1 $file2 | grep "<" | sed 's/^<//g' > $diff_file
# cat diff_file
I=`wc -c $file1 | cut -d' ' -f1`
J=`wc -c $file2 | cut -d' ' -f1`
if [ $I -ne $J ]
then
echo new episode
echo new episode at $(date) > hak5lastupdate
else
echo no new episode
fi
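Comparing byte counts with wc -c only notices size changes; cmp -s compares file contents directly and would also catch a same-size change. A sketch of the same check using it:

[code]
if ! cmp -s $file1 $file2
then
echo new episode
echo new episode at $(date) > hak5lastupdate
else
echo no new episode
fi
[/code]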
------------------------------------------------
Hak5 episodes
------------------------------------------------
All Episodes
* Point to Point Pineapple mesh continued. Decibels to Watts,
antenna polarization, "cable loss" and why HAMS get all the good...
Point to Point Pineapple Mesh Continued and Syncing with
GoodSync
* Learn the ins and outs of EIRP, 2.4 GHz and the legal way to
balance radio output with antenna gain. This episode is...
Legally build a 60 Watt WiFi Link - 2.4 GHz and EIRP
* This week we go behind the scenes at the  Studio during our
recent studio upgrades. Also Shannon explores some of the...
Upgrading the Studio and Chromecast Tricks
* This week Darren interviews Craig Heffner on his research in to
backdoors on routers. Also find Shannon dices into Seafile...
Discovering Hidden Backdoors In Home Routers And Storing Data With
Seafile
* Darren meets Paul McMillan to see the whole internets VNC servers
in 16 minutes. Also find new was to connect to your phone...
Hidden Device Inputs Revealed!
* Wireless Packet Sniffing!!! Tracking vehicle Tire Pressure Sensor
data with Jared Boone and open source software defined...
Tracking Cars Wirelessly And Intercepting Femtocell Traffic
* Exploring the software development for the castAR with Rick
Johnson. Also seeing the hardware side of castAR with Jeri...
Creating Virtual Worlds With castAR
* The new WiFi Pineapple Mark V is unveiled at this special
launch event featuring Darren, Sebastian, Eighty of Dual...
The New WiFi Pineapple Mark V
* Session Hijacking with Raphael Mudge of Armitage, Disk Forensic
from Raspberry Pi and Custom Hacker Linux Distros from...
Derbycon 2013 Continues and Enterprise Cloud Syncing
* This time on , Darren speaks with RenderMan at Derbycon 2013
on vulnerabilities in the nextgen Air Traffic Control...
Secure Messaging and Air Traffic Control Hacking
* Syncing files with BitTorrent Sync and alternative ways to Sneaker
Net files using optics and audio! All that and more, this...
Alternative Sneaker Nets and BitTorrent Syncing
* Cheap Kali Linux Laptop with a Raspberry Pi, a Lapdock and Custom
Cables - Shannon Morse reports. Then, Persistently...
Kali Linux Raspberry Pi Laptop and Hijack Windows Password
* The latest NSA leaks outline a massive program against internet
encryption. What is safe anymore? Can you trust PGP? How do...
Setup Your Own Private Cloud and Air Gaps
* Cracking Windows passwords in 15 seconds or less with a special
USB Rubber Ducky firmware and mimikatz. Build your own...
Install OwnCloud and Cracking Passwords with a Rubber Ducky
* Windows exfiltration with a USB thumb drive and a USB Rubber
Ducky and Benchmarking Your Linux Systems. All that and more...
How to Benchmark Your Linux System And Exfiltration Ducky
Attacks
* Running the occasional Windows program with out cramping your
Linux lifestyle, Windows exfiltration with the USB Rubber...
What's Up with the Duck?
-------------------------------------
Script 2:
####################################
# Latest Hak5 episodes
#
#===============================
# Assignments
# --------------------------------
datafile="hak5episodes"
a=1
flag=0
# end assignments
#=================================
#
# Get data file
#---------------------------------
elinks "revision3.com/hak5/episodes"  > $datafile
#=================================
#
# Extract and display data
#---------------------------------
while read line
do fdata[$a]=$line
echo $line | grep -q "All Episodes"
if  [ $? -eq 0 ]; then
# header
clear
echo
echo ------------------------------------------------
echo  Hak5 episodes
echo ------------------------------------------------
echo ""
let "flag = 1"
fi
echo $line | grep -q "Load More"
if [ $? -eq 0 ]; then
let "flag = 0"
else
if [ $flag -eq  1 ] ;  then
echo $line | sed 's/\[.*\]//' | sed 's/\Hak5//'
fi
fi
let "a += 1"
done < $datafile
# footer
echo ---------------------------------------------
echo
#===================================
# End.
####################################
-----------------------------------------------------------------

Step 11: Instructables Counts, Part 1 of 2.

Picture of Getting instructable counts.




Note: with the change in the way instructables.com now does web pages, I will probably have to redo this instructable.

Let's say you wanted to know how several instructables are doing. I did not take the time to make it with a GUI; that is your homework. I just picked a few instructables from the first page as an example. You will want to create a data file with the URLs of the instructables you have chosen. Right-click on the links, copy the link location, and paste it into the data file in your editor. (Please see the prior instructables on page scraping if you have any questions.)

idata: (Note: do not type in "[data]" or [/data] or you will get an error)
[data]
https://www.instructables.com/id/Program-an-ATtiny...
https://www.instructables.com/id/Gut-Check-Fridge-...
https://www.instructables.com/id/Air-quality-ballo...
https://www.instructables.com/id/Sun-Bottles/
https://www.instructables.com/id/Wrap-around-workb...
https://www.instructables.com/id/Solar-PV-tracker/
[/data]

Then you need to create a program file to collect the data via web scraping. I did not go to the trouble to make it a GUI for simplicity's sake.

iget.sh
[code]
#================================
#
# Instructablesnumbers catcher
#
#=================================
# Assignments
# --------------------------------
datafile="idata"
# the date
tmon=$(date +"%b")
tday=$(date +"%d")
echo "The views for $dj on $tmon $tday:"
#=================================
#
# Data input
#---------------------------------
while read line
do theurl=$line
# the following line shows the url being checked; comment it out to hide it
echo -n "$theurl"
# get total views
# elinks "$theurl" | grep "Total Views"
# get all the info
elinks "$theurl" | grep -m 2 Views
# just get numbers
# elinks "$theurl" | grep "Total Views" | cut -c 16-25
# Uncomment the next line if you want the output to be a bit more readable
# echo ""
done < $datafile
[/code]

Make it a program:
$ chmod +x iget.sh

Run it:
$ ./iget.sh
The views for on Oct 06:
https://www.instructables.com/id/Program-an-ATtiny...
Total Views: 587
Today Views: 95
https://www.instructables.com/id/Gut-Check-Fridge-...
Total Views: 618
Today Views: 608
https://www.instructables.com/id/Air-quality-ballo...
Total Views: 54,833
Today Views: 216
https://www.instructables.com/id/Sun-Bottles/
Total Views: 43,876
Today Views: 17
https://www.instructables.com/id/Wrap-around-workb...
Total Views: 15,157
Today Views: 12
https://www.instructables.com/id/Solar-PV-tracker/
Total Views: 107,243
Today Views: 46
$ _

The following will save everything to a file if you want.
$ ./iget.sh >> datafile

A real time saver if you have many instructables and do not want to go through each page to get the data. Follow-up instructable: https://www.instructables.com/id/Getting-instructable-counts-continued/

Warning: the data may not always be up to date.

============================================================================

MS Windows:
-----------------------------------
Software needed:
Browser:
Elinks:
http://www.paehl.com/open_source/?TextBrowser_for_Windows:ELINKS_an_other_textbrowser

Grep (from UnxUtils):
http://downloads.sourceforge.net/project/unxutils/unxutils/current/UnxUtils.zip?r=&ts=1331135481&use_mirror=iweb

Qbasic from Microsoft.
http://www.microsoft.com

Winzip:
http://www.winzip.com/win/en/downwz.htm

------------
You will want to create a datafile with the urls of the instructables you want to check on:

idata: (Note: do not type in "[data]" or [/data] or you will get an error)
[data]
https://www.instructables.com/id/Program-an-ATtiny...
https://www.instructables.com/id/Gut-Check-Fridge-...
https://www.instructables.com/id/Air-quality-ballo...
https://www.instructables.com/id/Sun-Bottles/
https://www.instructables.com/id/Wrap-around-workb...
https://www.instructables.com/id/Solar-PV-tracker/
[/data]

Here is the code. You will want to make a program file called scrape.bas and run it from QBasic.

scrape.bas (just use the lines between [code] and [/code]):
[code]
OPEN "idata" FOR INPUT AS #1
while not (eof(1))
INPUT #1, a$
PRINT a$
b$ = "elinks " + a$ + " | grep Views:"
SHELL b$
PRINT
wend
close #1
system
[/code]

Note: If you have FreeBASIC for MS Windows or FreeBASIC for Linux, the code will work on either machine. I suppose it would work on a Mac also if you had the appropriate BASIC compiler. Love portable code!
Once you have created all the files and the programs you downloaded are accessible from the directory, you should be able to get a printout.

c:\> qbasic /run scrape.bas

Afterthought: I probably could have used lynx instead.



========================================================


Temp fix:

#================================
#
# Instructablesnumbers catcher
#
#=================================
# Assignments
# --------------------------------
szAnswer=$(zenity --file-selection --title="Select a iurl file to read")
datafile=$szAnswer
outfile="inumdata"
total=0
# the date
tmon=$(date +"%b")
tday=$(date +"%d")
echo "The views for $dj on $tmon $tday:" > $outfile
#=================================
#
# Data input
#---------------------------------
while read line
do theurl=$line
echo "$theurl"
# echo -n "$theurl'" >> $outfile
# get total views
# count=$(elinks "$theurl" | grep -m 1 "hits-count" | sed 's/[^0-9]*//g')
count=$(elinks "$theurl" | grep -m 1 "views" | sed 's/[^0-9]*//g')
# let total=$total+$count
echo "$count" >> $outfile
done < $datafile
# echo "total: $total" >> $outfile
zenity --text-info --filename=$outfile

Step 12: Instructables Counts, Part 2 of 2.

Picture of Getting instructable counts. (continued)
Notice: Because Instructables has changed their web pages again, this instructable will not work. I am working on a fix.

In the last instructable we just displayed the data. Now we will modify the code and use it to get the data and the counts for importing into a spreadsheet. Sorry I did not use an icon; you can do that.

Original data:

idata: (Note: do not type in "[data]" or "[/data]" or you will get an error)
[data]
https://www.instructables.com/id/Program-an-ATtiny-with-Arduino/
https://www.instructables.com/id/Gut-Check-Fridge-a-Tweeting-and-Facebooking-Fri/
https://www.instructables.com/id/Air-quality-balloons/
https://www.instructables.com/id/Sun-Bottles/
https://www.instructables.com/id/Wrap-around-workbench-under-100/
https://www.instructables.com/id/Solar-PV-tracker/
[/data]

We now have a modified program:

gidata2ss.sh:
[code]
#================================
#
# Instructablesnumbers catcher
#
#=================================
# Assignments
# --------------------------------
szAnswer=$(zenity --file-selection --title="Select a iurl file to read")
datafile=$szAnswer
outfile="inumdata"
# the date
tmon=$(date +"%b")
tday=$(date +"%d")
echo "The views for $dj on $tmon $tday:" > $outfile
#=================================
#
# Data input
#---------------------------------
while read line
do theurl=$line
# the following line shows the url being checked; comment it out to hide it
echo "$theurl"
# get total views
# elinks "$theurl" | grep "Total Views"
# get all the info
# elinks "$theurl" | grep Views
# just get numbers
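# (columns 16-25 hold the count; the sed strips the thousands commas)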
elinks "$theurl" | grep -m 1 "Total Views" | cut -c 16-25 | sed 's,\,,,g' >> $outfile
done < $datafile
zenity --text-info --filename=$outfile
[/code]

Run the program to get the data.
Start a new spreadsheet.
Copy and paste the numbers into the spreadsheet (use fixed width and special numbers).
Add the column titles.
Copy and paste the URLs (with fixed length).
Save and done.


Included a short movie to show how it works.



Update:

I wrote a new version of the script; it should work better.

[code]
#================================
#
# Instructablesnumbers catcher
#
#=================================
# Assignments
# --------------------------------
szAnswer=$(zenity --file-selection --title="Select a iurl file to read")
datafile=$szAnswer
outfile="inumdata"
# the date
tmon=$(date +"%b")
tday=$(date +"%d")
echo "The views for $dj on $tmon $tday:" > $outfile
#=================================
#
# Data input
#---------------------------------
while read line
do theurl=$line
echo "$theurl"
# echo -n "$theurl'" >> $outfile
# get total views
curl -s "$theurl" | grep -m 1 "hits-count" | sed 's/[^0-9]*//g' >> $outfile
done < $datafile
zenity --text-info --filename=$outfile
[/code]


-----------------------------------------------------------------------------

Partial temp fix

#================================
#
#  Instructablesnumbers catcher
#
#=================================
# Assignments
# --------------------------------
szAnswer=$(zenity --file-selection --title="Select a iurl file to read")
datafile=$szAnswer
outfile="inumdata"
total=0
# the date
tmon=$(date +"%b")
tday=$(date +"%d")
echo  "The views for $dj on $tmon $tday:" > $outfile
#=================================
#
# Data input
#---------------------------------
while read line
do theurl=$line
echo  "$theurl"
# echo -n "$theurl'" >> $outfile
# get total views
# count=$(elinks  "$theurl" | grep -m 1 "hits-count" | sed 's/[^0-9]*//g')
count=$(elinks  "$theurl" | grep -m 1 "views" | sed 's/[^0-9]*//g')
# let total=$total+$count
echo "$count" >> $outfile
done < $datafile
# echo "total: $total" >> $outfile
zenity --text-info --filename=$outfile



=======================================================

Get favorites by:
#================================
#
#  Instructablesnumbers catcher
#
#=================================
# Assignments
# --------------------------------
szAnswer=$(zenity --file-selection --title="Select a iurl file to read")
datafile=$szAnswer
outfile="inumdata"
total=0
# the date
tmon=$(date +"%b")
tday=$(date +"%d")
echo  "The views for $dj on $tmon $tday:" > $outfile
#=================================
#
# Data input
#---------------------------------
while read line
do theurl=$line
echo  "$theurl"
# echo -n "$theurl'" >> $outfile
# get total views
# count=$(elinks  "$theurl" | grep -m 1 "hits-count" | sed 's/[^0-9]*//g')
count=$(elinks  "$theurl" | grep -m 1 "favorites" | sed 's/[^0-9]*//g')
# let total=$total+$count
echo "$count" >> $outfile
done < $datafile
# echo "total: $total" >> $outfile
zenity --text-info --filename=$outfile

Step 13: Grab College Football Scores.

Script to get college football scores.

####################################
# Score  Grabber
#
#===============================
# Assignments
# --------------------------------
datafile="tcf"
let "flag = 0"
let "year = 2014"
let "week = 4"
if [ "$week" -lt "10" ]; then
    let "a = 0"
fi
# end assignments
#=================================
#
# Get data file
#---------------------------------
elinks -dump "www.ncaa.com/scoreboard/football/fbs/$year/$a$week/"  > $datafile
#=================================
#
# Extract and display data
#---------------------------------
while read line
do fdata[$a]=$line
    echo $line | grep -q "NCAA Scoreboard"
    if  [ $? -eq 0 ]; then
        # header
        clear
        let "flag = 1"
    fi
    if [ $flag -eq 1 ]; then
        echo $line | grep -q "Featured Sections"
            if [ $? -eq 0 ]; then
            let "flag = 0"
        else
            echo $line | grep -q "GameCenter"        
            if [ $? -eq 0 ]; then
                let "response = donothing"
            else
                echo $line | sed 's/\[.*\]//'
            fi
        fi
    fi
let "a += 1"
done < $datafile
# footer
echo ---------------------------------------------
echo
#===================================
# End.
####################################

Step 14: Grab Pro Football Scores.

Here is the temporary code:

####################################
# Score Grabber
#
#===============================
# Assignments
# --------------------------------
datafile="nflscorefile"
a=1
flag=0
week=12
# phase 1 is preseason phase 2 is regular season phase 3 is
phase=2
season=2013
#finished week = 1 unfinished week = 0
weekfinished=1
league="nfl"
# end assignments
#=================================
#
# Get data file
#---------------------------------
case $weekfinished in
1)
elinks "http://sports.yahoo.com/$league/scoreboard/?week=$week&phase=$phase&season=$season" > $datafile
;;
0)
elinks "http://sports.yahoo.com/$league/scoreboard/" > $datafile
;;
*)
#
;;
esac
#=================================
#
# Extract and display data
#---------------------------------
while read line
do fdata[$a]=$line
echo $line | grep -q "Home Score Away"
if [ $? -eq 0 ]; then
# header
clear
echo
echo ------------------------------------------------
echo $league data for phase = $phase week = $week season = $season
echo ------------------------------------------------
echo
echo " Home Score Away"
echo ""
let "flag = 1"
fi
if [ $flag -eq 1 ]; then
echo $line | grep -q "Latest NFL Videos"
if [ $? -eq 0 ]; then
let "flag = 0"
else
echo $line | grep -q "Home Score Away"
if [ $? -ne 0 ]; then
case $weekfinished in
1)
echo $line | sed 's/\[.*\]//'
;;
0)
echo $line
;;
*)
#
;;
esac
fi
fi
fi
let "a += 1"
done < $datafile
# footer
echo ---------------------------------------------
echo
#===================================
# End.
####################################

Step 15: Some Updated Scripts.

[code]
###################################
# Score Grabber
#
#===============================
# Assignments
# --------------------------------
datafile="tcf"
let "flag = 0"
let "year = 2014"
let "week = 14"
if [ "$week" -lt "10" ]; then
let "a = 0"
fi
# end assignments
#=================================
#
# Get data file
#---------------------------------
elinks -dump "www.ncaa.com/standings/football/fbs/" > $datafile
IFS='%'
#=================================
#
# Extract and display data
#---------------------------------
while read line
do fdata[$a]=$line
echo $line | grep -q "Atlantic Coast"
if [ $? -eq 0 ]; then
# header
clear
let "flag = 1"
fi
if [ $flag -eq 1 ]; then
echo $line | grep -q "NCAA football"
if [ $? -eq 0 ]; then
let "flag = 0"
else
echo $line | grep -q "GameCenter"
if [ $? -eq 0 ]; then
let "response = donothing"
else
line=`echo $line | sed 's/\[.*\]//'`
echo $line
fi
fi
fi
let "a += 1"
done < $datafile
# footer
echo ---------------------------------------------
echo
#===================================
# End.
####################################
[/code]

=================================================

For history's sake, here are the score grabber and other listings.

[code]
####################################
# Score Grabber
#
#===============================
# Assignments
# --------------------------------
datafile="tcf"
let "flag = 0"
let "year = 2014"
let "week = 14"
if [ "$week" -lt "10" ]; then
let "a = 0"
fi
# end assignments
#=================================
#
# Get data file
#---------------------------------
elinks -dump "www.ncaa.com/scoreboard/football/fbs/$year/$a$week/" > $datafile
#=================================
#
# Extract and display data
#---------------------------------
while read line
do fdata[$a]=$line
echo $line | grep -q "NCAA Scoreboard"
if [ $? -eq 0 ]; then
# header
clear
let "flag = 1"
fi
if [ $flag -eq 1 ]; then
echo $line | grep -q "Featured Sections"
if [ $? -eq 0 ]; then
let "flag = 0"
else
echo $line | grep -q "GameCenter"
if [ $? -eq 0 ]; then
let "response = donothing" # no-op
else
echo $line | sed 's/\[.*\]//'
fi
fi
fi
let "a += 1"
done < $datafile
# footer
echo ---------------------------------------------
echo
#===================================
# End.
####################################
[/code]

Here is the code for the pro-football script also.

[code]
####################################
# Score Grabber
#
#===============================
# Assignments
# --------------------------------
datafile="nflscorefile"
a=1
flag=0
week=13
# phase 1 is preseason phase 2 is regular season #phase 3 is
phase=2
season=2014
#finished week = 1 unfinished week = 0
weekfinished=1
league="nfl"
# end assignments
#=================================
#
# Get data file
#---------------------------------
case $weekfinished in
1)
elinks "http://sports.yahoo.com/$league/scoreboard/?week=$week&phase=$phase&season=$season" > $datafile
;;
0)
elinks "http://sports.yahoo.com/$league/scoreboard/" > $datafile
;;
*)
#
;;
esac
#=================================
#
# Extract and display data
#---------------------------------
while read line
do fdata[$a]=$line
echo $line | grep -q "Home Score Away"
if [ $? -eq 0 ]; then
# header
clear
echo
echo ------------------------------------------------
echo $league data for phase = $phase week = $week season = $season
echo ------------------------------------------------
echo
echo " Home Score Away"
echo ""
let "flag = 1"
fi
if [ $flag -eq 1 ]; then
echo $line | grep -q "Latest NFL Videos"
if [ $? -eq 0 ]; then
let "flag = 0"
else
echo $line | grep -q "Home Score Away"
if [ $? -ne 0 ]; then
case $weekfinished in
1)
echo $line | sed 's/\[.*\]//'
;;
0)
echo $line
;;
*)
#
;;
esac
fi
fi
fi
let "a += 1"
done < $datafile
# footer
echo ---------------------------------------------
echo
#===============================
# End.
################################
[/code]

[code]
###################################
# Nfl Standings Grabber
#
#===============================
# Assignments
# --------------------------------
datafile="tcf"
let "flag = 0"
let "year = 2014"
let "week = 14"
if [ "$week" -lt "10" ]; then
let "a = 0"
fi
# end assignments
#=================================
#
# Get data file
#---------------------------------
elinks -dump "http://www.nfl.com/standings" > $datafile
IFS='%'
#=================================
#
# Extract and display data
#---------------------------------
while read line
do fdata[$a]=$line
echo $line | grep -q "American Football Conference"
if [ $? -eq 0 ]; then
# header
clear
let "flag = 1"
fi
if [ $flag -eq 1 ]; then
echo $line | grep -q "NFL Playoff Picture"
if [ $? -eq 0 ]; then
let "flag = 0"
else
echo $line | grep -q "GameCenter"
if [ $? -eq 0 ]; then
let "response = donothing"
else
line=`echo $line | sed 's/\[.*\]//'`
echo $line
fi
fi
fi
let "a += 1"
done < $datafile
# footer
echo ---------------------------------------------
echo
#===================================
# End.
####################################
[/code]

Step 16: Some Data Extraction.

More than one way to skin a cat.

# just a thought
IFS='%'
line=" [173]Virginia 3-5 5-7 310 289 5-2 0-5 Lost 1"
# pull each field out of the line by column position
a=`echo $line | cut -c8-28`
b=`echo $line | cut -c29-31`
c=`echo $line | cut -c33-35`
d=`echo $line | cut -c38-40`
e=`echo $line | cut -c42-44`
f=`echo $line | cut -c46-48`
g=`echo $line | cut -c51-53`
h=`echo $line | cut -c56-61`
# write the fields back out as an html table
# (the original tags were lost when this was posted; the markup below
# is a reconstruction of what the script wrote)
echo "<html>" > somefile.html
echo "<body>" >> somefile.html
echo "<table border=\"1\">" >> somefile.html
echo "<tr><th>School</th><th>W-L</th><th>W-L</th><th>PF</th><th>PA</th><th>Home</th><th>Away</th><th>Current streak</th></tr>" >> somefile.html
echo "<tr><td>$a</td><td>$b</td><td>$c</td><td>$d</td><td>$e</td><td>$f</td><td>$g</td><td>$h</td></tr>" >> somefile.html
echo "</table>" >> somefile.html
echo "</body>" >> somefile.html
echo "</html>" >> somefile.html
#--------------------------------

chmod +x testithtml.sh

./testithtml.sh

links2 -dump somefile.html

Or open somefile.html in Firefox.

School    W-L  W-L  PF   PA   Home  Away  Current streak
Virginia  3-5  5-7  310  289  5-2   0-5   Lost 1
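Fixed column positions break as soon as a school name changes length. Splitting on whitespace is sturdier when the fields themselves contain no spaces; a sketch under that assumption:

[code]
unset IFS # the IFS='%' above would disable whitespace splitting
line="[173]Virginia 3-5 5-7 310 289 5-2 0-5 Lost 1"
# drop the leading [nnn] link marker, then let the shell split the fields
set -- $(echo $line | sed 's/^ *\[[0-9]*\]//')
school=$1; conf=$2; overall=$3; pf=$4; pa=$5; home=$6; away=$7; streak="$8 $9"
echo "$school $conf $overall $pf $pa $home $away $streak"
[/code]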