Introduction: Hack a $30 WiFi Pan-Tilt Camera - Video, Audio, and Motor Control With Python
In this Instructable, you'll learn how to intercept the video, microphone, and controls of the $30 Kaicong SIP1602 wireless pan-tilt camera on Windows, Linux, or OSX! Everything is rolled neatly into Python scripts; you can use the output data for things like voice transcription, computer vision, and automated directional control. If you're feeling truly adventurous, keep reading and you'll learn the methods we used to discover and reverse engineer wireless cameras!
Installation time: ~30 minutes
You will need:
- A Kaicong SIP1602 WiFi Pan-Tilt Camera
- NOTE: Apparently the popularity of this Instructable wiped out Amazon's stock. Others have reported that this $45 Tenvis camera is a good substitute.
- A computer or network router with an ethernet port
- A working 802.11b/g wireless network (wireless N isn't supported by this camera)
- Basic knowledge of command prompt or terminal (change directories, run a file)
For anything other than just installing and running the camera code, intermediate-level experience in Python and OpenCV will also be very useful. Let's get to it!
Step 1: Setting Up the Camera
On the box that contains the camera is Kaicong's motto: "Nothing Important Than Safty". And it shows - they really made the manual secure, because anyone who can't read Mandarin is going to have a pretty hard time understanding it! That said, installation is surprisingly simple.
- Remove the camera, wall charger, and ethernet cable from the box.
- Plug the wall charger into a nearby outlet, and connect the camera to your computer or router ethernet port via the cable.
- Turn your camera over and look on the bottom. You should see a host name listed, as well as the username and password for your camera (spoiler alert: it's "admin" and "123456").
- Type your camera's host name (mine was 385345.kaicong.info) into your browser's address bar. Enter your camera's login information at the prompt, and you'll be directed to a list of links to different browsing modes. Choose the best mode for your browser.
- Have fun clicking buttons for a bit! You'll notice that the Server Push Mode for Chrome and Firefox doesn't have a working microphone or speaker output, which is quite lame. The IE version also requires installing an ActiveX object, but all features work once it is installed.
- Also take note of the IP address in the webpage URL - we'll need this later.
- There should be a small gear icon at the bottom of the column of control buttons on the camera page where you were clicking buttons earlier. Click this, and you'll be taken to the settings page.
- Click on the Wireless Lan Settings link on the left side of the page.
- Click on the text input labeled "SSID" and enter your wireless network's name.
- Make sure the "Authetication" drop down button is set to your network's auth type (usually WPA2-PSK AES if your network has a password)
- Click on the "Share Key" text input and enter in your network password if you have one.
- Click the "Set" button. Your camera will reset and connect to the network.
By default, these cameras are viewable by anyone on the internet who guesses your <number>.kaicong.info address - which can be awesome for projects, but not so awesome for security and privacy. To solve this, you can either change your DDNS username and password, or simply set both of them to blank (thereby making it impossible to access your camera from outside your local network).
Step 2: Installing Python Controls
With the camera set up complete, we'll need to install a few libraries before we can run our scripts.
For Windows: Here are links to Windows installation tutorials, or pages where you can find the Windows installer.
For Ubuntu: setup can be done via this command:
sudo apt-get install python python-opencv python-pyaudio python-pygame
For OSX: First install OpenCV and Homebrew - I had to additionally install eigen (brew install eigen) to prevent compiler errors.
Then run the following:
brew install python
brew install gcc
brew install homebrew/python/pygame
brew install portaudio
Then download the pyaudio wrapper for OSX and install that as well.
Now that we've got the dependencies out of the way, head over to the git repository where this project is hosted, download it, and extract the files. Open up a command window or terminal in the directory with the extracted files, and run each script with the following commands, replacing 192.168.1.19 with the IP address of your camera:
python KaicongAudio.py 192.168.1.19
This script pulls audio from the mic and plays it on your speakers.
python KaicongVideo.py 192.168.1.19
This script pulls video from the camera and displays it in an OpenCV window.
python KaicongMotor.py 192.168.1.19
This script opens up a black Pygame window. Click it with the mouse so it can capture your keyboard, then use the WASD keys to pan and tilt the camera!
At this point, we've successfully hooked up the camera and can intercept audio, video, and motor control from it via programming. But how did we do this? Read on to find out...
Step 3: How We Did It: Hacking Motion
We started out with a camera with a web page interface and wanted to control it programmatically, so what better way to figure out how it works than inspect the code?
We saved the webpage to disk and looked at monitor.htm. It was there that we found some interesting looking variables, such as PTZ_UP and PTZ_STOP, which appeared to be motion control constants. Keeping that in mind, we opened up the web inspection console (Ctrl+Shift+C in Chrome) and inspected the network traffic while clicking the camera motion buttons. We found several calls to a decoder_control.cgi page with a "command=" argument matching the constants we found earlier in the HTML - one whenever a click begins, and another whenever a click ends. So the controls are simple ON/OFF commands sent via HTTP GET requests? Let's find out!
We copied the url we saw:
into the browser and loaded the page, and sure enough the camera began moving! From there it was a matter of throwing the constants and a formattable URL string into Python to complete the controller. Done.
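As a rough illustration of the "constants plus a formattable URL string" idea, here is a minimal sketch. It is not the code from the project repository: the numeric values of PTZ_UP and PTZ_STOP are placeholders (the real ones are the constants in monitor.htm), and the "user" and "pwd" parameter names are assumptions. Only decoder_control.cgi and the "command=" argument come from the traffic described above.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder values: look up the real PTZ_* constants in monitor.htm.
PTZ_UP = 0
PTZ_STOP = 1


def build_motor_url(host, command, user="admin", password="123456"):
    """Build the GET URL for a single decoder_control.cgi command.

    The "user" and "pwd" parameter names are assumptions for illustration.
    """
    query = urlencode({"command": command, "user": user, "pwd": password})
    return "http://{}/decoder_control.cgi?{}".format(host, query)


def nudge(host, start_command, stop_command, seconds=0.5, send=urlopen):
    """Mimic a button click: one request when it begins, one when it ends."""
    send(build_motor_url(host, start_command))
    time.sleep(seconds)
    send(build_motor_url(host, stop_command))
```

If the parameter names and constants match your camera, calling nudge("192.168.1.19", PTZ_UP, PTZ_STOP) should tilt the camera for about half a second. The send argument is there so the function can be exercised without a camera on the network.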
But what about video? A camera's not a camera without it, after all...
Step 4: How We Did It: Hacking Video
As it turns out, video hacking was actually pretty simple - we looked in the network requests and found a lot of requests to snapshot.cgi. Entering one of these into Chrome produced a still image every time the page was loaded. Neat!
But we wanted something a bit more efficient: the streamed video that the ActiveX object seemed to receive. The ActiveX object itself didn't seem too useful to disassemble (reversing assembly code is way overrated), so instead we opened up Wireshark. We filtered the capture down to the IP of our camera (Capture->Options->Capture Filter) and started the capture, before reloading the ActiveX control page in our browser. What we found were two GET requests for livestream.cgi, presumably for the audio and video.
Putting aside the audio url for now, we turned to Google to see if anyone had decoded an IP camera video stream before. Under a search for "IP camera HTTP stream" we found a handy little python script to get everything running in OpenCV. All it took was replacing the script's URL with ours, and we were in business!
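The script we found isn't reproduced here, but a common way to handle an HTTP camera stream is to scan the incoming bytes for JPEG start and end markers and cut out each frame. The sketch below assumes the camera sends a sequence of JPEG images with extra bytes in between; that is an assumption about the stream, not something confirmed by the capture.

```python
JPEG_START = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_END = b"\xff\xd9"    # JPEG end-of-image marker


def extract_frames(buffer):
    """Split complete JPEG frames out of a byte buffer.

    Returns (frames, remainder). The remainder holds any partial frame
    and should be prepended to the next chunk read from the stream.
    """
    frames = []
    while True:
        start = buffer.find(JPEG_START)
        if start == -1:
            break  # no frame has started yet
        end = buffer.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            buffer = buffer[start:]  # keep the partial frame, drop leading junk
            break
        end += len(JPEG_END)
        frames.append(buffer[start:end])
        buffer = buffer[end:]
    return frames, buffer
```

Each returned frame is a complete JPEG that could be handed to OpenCV, for example with cv2.imdecode on a NumPy byte array. This simple scan can misfire on JPEGs that contain embedded thumbnails, so treat it as a starting point.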
Next, it was time to intercept the audio.
Step 5: How We Did It: Hacking Audio
Getting video wasn't too hard. Hopefully audio would be just as easy, right? After a few hours of Google searching, it looked like no one else had ever managed to successfully pull out and decode the audio stream of an IP camera. We were on our own.
Going back to the audiostream.cgi URL we found via Wireshark, we captured a few bytes of audio with Ubuntu:
Then hit Ctrl+C to cut off the stream. Raw audio in hand, we marched over to Audacity to attempt to play it via File->Import->Raw Data. Most attempts sounded like noise, but we found that using the VOX ADPCM encoding at 8kHz produced something recognizable!
There was still the matter of removing that weird pattern of clicks. We figured it had something to do with the packets, as with the video stream we had to remove some headers at the start and end. Maybe the same was true with audio?
We looked a bit more closely at each packet, and noticed that the data started with the same 0x55aa15a8... bytes, plus a value that looked to be counting upwards each packet, and a long stream of zeroes, for a total of 32 bytes. Presumably, Audacity was taking these packet headers as audio data and trying to decode them, which is what made the nasty clicking sounds.
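Here is a minimal sketch of how that header stripping might look. It assumes the four bytes 0x55aa15a8 are enough to locate each 32-byte header and that everything between two headers is audio payload; the names HEADER_MAGIC, HEADER_SIZE, and split_packets are made up for this example.

```python
HEADER_MAGIC = bytes.fromhex("55aa15a8")  # first bytes seen in every packet
HEADER_SIZE = 32                          # magic + counter + zero padding


def split_packets(stream):
    """Return the audio payload of each packet with its header removed.

    Assumes a payload never contains the magic bytes by coincidence,
    which a raw capture does not guarantee.
    """
    payloads = []
    pos = stream.find(HEADER_MAGIC)
    while pos != -1:
        # Look for the next header only after skipping this one.
        next_pos = stream.find(HEADER_MAGIC, pos + HEADER_SIZE)
        end = len(stream) if next_pos == -1 else next_pos
        payloads.append(stream[pos + HEADER_SIZE:end])
        pos = next_pos
    return payloads
```

Joining the returned payloads with b"".join(...) gives a header-free file that can be imported into Audacity the same way as before. The last payload of a capture cut off with Ctrl+C may be incomplete.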
A few experimental Python scripts later, we removed the headers and passed it through the ADPCM decoder in Audacity - most of the clicks were removed! But there were a few left over, specifically during the noisier parts of the audio.
So we read into how ADPCM works - apparently it encodes audio via the difference between samples, and caches the previous audio state so that it can add the two and produce a new sample. After a few more python scripts, we managed to capture the packets directly and reset this state at the start of each packet. Clicks were completely removed, and nothing but camera audio remained. Success!
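To make the per-packet reset concrete, here is a sketch of a 4-bit ADPCM decoder in the VOX (Dialogic) style. It is not the decoder from the project repository. The step table is the commonly published Dialogic one, and the high-nibble-first ordering and the reset to a zero sample and zero step index are assumptions; the camera may use different starting values.

```python
# Commonly published Dialogic/VOX ADPCM step sizes (49 entries).
STEP_TABLE = [
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66,
    73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253,
    279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876,
    963, 1060, 1166, 1282, 1411, 1552,
]
# How far the step index moves for each 3-bit magnitude.
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


def decode_packet(payload):
    """Decode one packet of 4-bit ADPCM into signed 12-bit samples."""
    sample, index = 0, 0  # assumed reset state at the start of each packet
    samples = []
    for byte in payload:
        for nibble in (byte >> 4, byte & 0x0F):  # assumed high nibble first
            step = STEP_TABLE[index]
            # Rebuild the difference from the three magnitude bits.
            diff = step >> 3
            if nibble & 4:
                diff += step
            if nibble & 2:
                diff += step >> 1
            if nibble & 1:
                diff += step >> 2
            # Bit 3 is the sign; add the difference to the previous sample.
            sample += -diff if nibble & 8 else diff
            sample = max(-2048, min(2047, sample))
            index = max(0, min(len(STEP_TABLE) - 1,
                               index + INDEX_ADJUST[nibble & 7]))
            samples.append(sample)
    return samples


def decode_stream(payloads):
    """Decode packets one by one so stale state never leaks across them."""
    samples = []
    for payload in payloads:
        samples.extend(decode_packet(payload))
    return samples
```

Because decode_packet starts from a fresh state every time, an error in one packet cannot carry over into the next, which is the effect described above. The output samples are 12-bit, so they would need scaling (for example, multiplying by 16) before being written out as 16-bit audio.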
Step 6: The Future
It's awesome to have such a complex device completely controllable via Python. We plan on using our camera for person detection and room occupancy tracking as well as spoken voice commands, but we can think of a few other uses for a camera like this one, such as:
- Augment an RC car or plane to display first-person video while driving
- Put it on an airsoft or NERF turret and track your victims
- Set up a rockin' custom built home security system
- Use CV object recognition to track where your pets go automatically when you aren't there
- Make a remote telepresence robot
- Rip off the camera part and attach anything that needs to be precisely positioned (lasers, robot arms...)
- Create a remote time-lapse system with slow panning over time