Introduction: Retro Speech Synthesis. Part : 12 IoT, Home Automation

This article is the 12th in a series of home automation Instructables. It documents how to create an IoT Retro Speech Synthesis Device and integrate it into an existing home automation system, including all the software functionality needed for successful deployment in a domestic environment.

Picture 1 shows the completed IoT speech synth device and Picture 2 shows all the component parts used in the prototype which were form factor reduced to go into the final product.

The video shows the device in action (during testing).


As mentioned above this Instructable details how to make an IoT Retro Speech Synthesis Device and is based around the General Instruments SP0256-AL2.

Its primary purpose is to add 'old school' voice synthesis to an IoT network. Why 'old school' you may ask? Well, because I was around in the 80's when these things were first manufactured and I interfaced one to my BBC Micro, so for me there's some degree of nostalgia surrounding the SP0256-AL2.

I much prefer the challenge of trying to figure out what on earth is being said by this Dalek-sounding voice than listening to the dulcet tones of a hipster Amazon Echo or Siri. Where's the challenge in that, I ask you?

Oh, and not to mention I also have a 'bag load' of 'SP0256-AL2' ICs lying around.

The device is also capable of reading local temperature and humidity so further extends the ambient instrumenting of my existing IoT infrastructure hooking into the MQTT/OpenHAB based IoT network detailed in this series on home automation (HA), building on reused code taken from here.

At its heart is an ESP8266-07 which is responsible for MQTT communications and controlling all system functionality (SD card access, LED control, temperature/humidity sensing, volume control, speech synthesis). The device is fully configurable via text files stored on a local SD card, though calibration and network security parameters can also be programmed via remote MQTT publications.

What parts do I need?

See the bill of materials here

What software do I need?

  • Arduino IDE 1.6.9,
  • Arduino IDE configured to programme the ESP8266-07 (same as this). Then configure the IDE as indicated in the detailed description provided in the software sketch here,
  • Python v3.5.2 if you wish to use the automated test capability, details here

What tools do I need?

  • Microscope at least x3 (for SMT soldering),
  • Molex connector crimping tool (for JST connectors),
  • SMD soldering Iron (with liquid flux pen and flux cored solder),
  • Screwdrivers (various),
  • Heat gun,
  • Drills (various),
  • Countersink handtool,
  • Files (various),
  • Dremel (various bits),
  • Sturdy vice (small and large, like a Black & Decker Workmate),
  • Scalpel,
  • Vernier calipers (used to measure fabrication and useful for sizing PCB components),
  • Spanners and Nut drivers (various),
  • Strong tweezers (for SMT soldering),
  • Junior Hacksaw,
  • Drill (with various drill bits),
  • Fine pliers (point and snub nosed),
  • Flush cutters,
  • DMM with audible continuity check,
  • Dual channel digital scope (handy for debugging signals)

What skills do I need?

  • A lot of patience,
  • A great deal of manual dexterity and excellent hand/eye coordination,
  • Excellent soldering skills,
  • Excellent fabrication skills,
  • The ability to visualise in 3 dimensions,
  • Some knowledge of software development with 'C' (if you want to understand the source code),
  • Some knowledge of Python (how to install and run scripts, if you want to use the automated testing),
  • Knowledge of Arduino and its IDE,
  • Good knowledge of electronics,
  • Some understanding of your home network.

Topics Covered

  • User Manual
  • Circuit Overview
  • PCB Manufacture and Assembly
  • Fabrication
  • Software System Overview
  • Software Overview
  • Sensor Calibration
  • MQTT Topic Naming Convention
  • Debug & Fault Finding
  • Testing the Design
  • Conclusion
  • References Used

Series Links
To Part 11 : IoT Desktop Console. Part : 11 IoT, Home Automation

Step 1: User Manual

Picture 1 above shows the front of the Retro Speech Synthesiser and picture 2 the rear.

Enclosure Front

  1. Speaker Grill
  2. 3.5mm Earphone Jack : The main speaker is disabled when 3.5mm jack is inserted.
  3. Red LED : This LED illuminates whilst a word is being spoken when speech was initiated via an HTTP request.
  4. Blue LED : This LED illuminates whilst a word is being spoken when speech was initiated via an MQTT IoT request.

Enclosure Rear

  1. Reset Button : Used to hard reset the ESP8266-07 IoT device.
  2. Flash Button : When used in conjunction with the Reset Button allows re-flashing of the ESP8266-07.
  3. WiFi Antenna plug (SMA Plug) : For an external WiFi Antenna, giving the least RF path attenuation as the enclosure is aluminium.
  4. External Programming Port : To remove the need to unscrew the enclosure to gain access to the ESP8266-07 for re-programming purposes. The programming pins of the ESP8266-07 have been brought out to the external programming port. Picture 3 is the programming adapter.
  5. Green LED : This is the IoT system LED and is used to indicate the diagnostic status of the device at boot up and whilst operating.
  6. External Temperature/Humidity Sensor (AM2320)
  7. SD Card slot : This holds all config/security data along with web server pages.
  8. 2.1mm Supply jack 6vdc

Step 2: Circuit Overview

The Retro Speech Synth device comprises two PCBs;

  • RetroSpeechSynthIoTBoard : This is a generic, re-usable ESP8266-07/12/12E/13 PCB
  • RetroSpeechSynthBoard : This is a generic SP0256-AL2 PCB

Retro Speech Synth IoT Board

This board allows for either the direct soldering of an ESP8266-07/12/12E/13 or 0.1" pitch sockets accommodating an ESP8266 carrier PCB.

The board was designed to expand its I/O over an I2C connection and can support either 3v3 or 5v supply levels via Q1, Q2, R8-13.

Connection to the board is achieved via one of two headers, J2 and J4: an 8-way DIL IDC ribbon or a 5-way JST/Molex.

U2 and U3 provision 3.3v and 5v on board supply regulation. Alternatively if greater current capacity is required, off board serial shunt regulators may be attached via connectors J10 and J11 respectively.

Connectors J1 and J3 offer external SD card support over SPI. J1 has been designed for an 8-way Molex and J3 has direct pin for pin compatibility support for an off the shelf SD card PCB with either 3v3 or 5v support.

Retro Speech Synth Board

Control of this board is over an I2C 5v compliant connection via J1, J5 or J6, a 4-way JST/Molex, 8-way DIL IDC or 8-way IDC ribbon connector.

U2 MCP23017 provides the I2C to parallel interface to U3 the SP0256-AL2 and LEDs D1 (Green), D2 (Red) and D3 (Blue). The output of the Speech Synth is fed to audio amp CR1 TBA820M via either analogue pot RV1 or digital pot U1 MCP4561.

Digital Pot U1 is also controlled via 5v compliant I2C.


Note : The ESP8266-07 device was chosen as it has an integral IPX RF connector allowing an external WiFi Antenna to be added to the aluminum enclosure.

Step 3: PCB Manufacture and Assembly

Pictures 1 and 2 show the completed and wired PCB sub-assemblies located on the aluminum enclosure substrate.

The two PCBs were designed using KiCad v4.0.7, manufactured by JLCPCB and assembled by me, as shown above in Pics 3 to 13.

Step 4: Fabrication

Picture 1 shows a Haynes Manual style layout of all the prefabricated parts before final assembly.

Pics 2 ... 5 show various shots during the fabrication of the enclosure with minimal clearances.

Step 5: Software System Overview

This IoT Retro Speech Synthesis Device contains six key software components as shown in pic 1 above.

SD Card

This is the external SD SPI Flash Filing System and is used to hold the following information (see pic 2 above);

  • Icons and 'Speech Synth Configuration Home Page' index.htm: Served up by the IoT device when it is unable to connect to your IoT WiFi network (usually due to incorrect security information, or first time use) and provides the user with a means of remotely configuring the sensors without the need to re-flash new SD content.
    It also holds index1.htm, mqtt.htm and sp0256.htm, these are the locally served web pages accessible over a web browser allowing limited control of the speech synth over HTTP.
  • Security Information: This holds the information used at power up by the IoT device to connect to your IoT WiFi network and MQTT Broker. Information submitted via the 'Speech Synth Configuration Home Page' is written to this file ('secvals.txt').
  • Calibration Information: The information contained within the files ('calvals1.txt' and 'calvals2.txt') is used to calibrate the on-board temperature/humidity sensors should it be necessary. Calibration constants can be written to the IoT device via MQTT commands from an MQTT broker or by re-flashing the SD card. 'calvals1.txt' pertains to the AM2320 sensor and 'calvals2.txt' to the DHT22.
  • User configurable system values: The information contained within this file ('confvals.txt'), chosen by the user, controls certain system responses, such as initial digital volume level, auto 'system ready' announcement on MQTT broker subscription etc.

mDNS Server

This functionality is invoked when the IoT device has failed to connect to your WiFi network as a WiFi station and has instead become a WiFi access point, something akin to a domestic WiFi router. With such a router you would typically connect by entering its IP address (usually printed on a label affixed to the box) directly into your browser URL bar, whereupon you would receive a login page asking for the username and password needed to configure the device. The ESP8266-07 in AP mode (Access Point mode) likewise defaults to a fixed IP address; however, with the mDNS server running you only have to enter the human-friendly name 'SPEECHSVR.local' into the browser URL bar to see the 'Speech Synth Configuration Home Page'.

MQTT Client

The MQTT client provides all the necessary functionality to connect to your IoT network MQTT broker, subscribe to the topics of your choice and publish payloads to a given topic. In short, it provisions IoT core functionality.

HTTP Web Server

This web server has two purposes;

  1. If the IoT device is unable to connect to the WiFi network whose SSID, P/W etc. is defined in the Security Information file held on the SD Card, the device will become an Access Point. Once you have connected to the WiFi network provided by the Access Point, the HTTP Web Server allows you to connect directly to the device and change its configuration from a web browser. Its purpose in this mode is to serve up the 'Speech Synth Configuration Home Page' web page, which is also held on the SD Card.
  2. Once the IoT Retro Speech Synthesis Device has connected to the WiFi network and MQTT broker, if accessed, the HTTP Web Server will automatically serve up an HTTP web page allowing limited control of the IoT device to speak a selection of fixed phrases and the ability to cycle the two front Red and Blue LEDS.

WiFi Station

This functionality gives the IoT device the capability to connect to a domestic WiFi network using the parameters in the Security Information file, without this your IoT device will not be able to subscribe/publish to the MQTT Broker.

WiFi Access Point

The ability to become a WiFi Access Point is a means by which the IoT device allows you to connect to it and make configuration changes via a WiFi station and a browser (such as Safari on the Apple iPad). This access point broadcasts an SSID = "SPEECHSYN" + the last 6 digits of the MAC address of the IoT device. The password for this closed network is imaginatively named 'PASSWORD'
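The SSID rule described above can be sketched in a few lines of Python. This is only an illustration, not the device firmware: the function name `ap_ssid` is hypothetical, and it assumes the "last 6 digits" are the final six hexadecimal digits of the MAC address, upper-cased with any separators removed.

```python
def ap_ssid(mac: str, prefix: str = "SPEECHSYN") -> str:
    """Build the access point SSID: prefix + last six hex digits of the MAC.

    Assumption: separators such as ':' or '-' are ignored and the digits
    are upper-cased. The firmware's exact formatting may differ.
    """
    hex_digits = "".join(ch for ch in mac if ch in "0123456789abcdefABCDEF")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits in MAC, got {len(hex_digits)}")
    return prefix + hex_digits[-6:].upper()


if __name__ == "__main__":
    # Made-up MAC address, purely for illustration.
    print(ap_ssid("5C:CF:7F:12:ab:34"))
```

Appending part of the MAC keeps the SSID unique when more than one unconfigured device is powered up at the same time.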

Step 6: Software Overview


To successfully compile this source code you will need a local copy of the code and libraries outlined below in Step 12, References Used. If you are not sure how to install an Arduino library go here.


The software makes use of the state-machine as shown in pic 1 above (full copy of source in my GitHub repository here). There are 5 main states as outlined below;

  • INIT
    • This initialisation state is the first state entered after power up.
    • This state is entered if after power up an invalid or missing secvals.txt file is detected. During this state the Config Page is visible.
    • This state is transitory, entered whilst there exists no WiFi network connection
    • This state is transitory, entered after a WiFi network connection has been made and whilst there exists no connection to an MQTT broker on that network.
    • This is the normal operational state entered once both a WiFi network connection and an MQTT Broker connection have been established. It is during this state that the temperature, heat index and humidity at the IoT Retro Speech Synthesis Device are regularly published to the MQTT Broker. In this state the Speech Synth Home Page is visible.
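As a rough sketch of how the five state descriptions above relate to each other, the Python below picks a state from three conditions. Only INIT and ACTIVE are names used in this Instructable; NO_CONFIG, PENDING_WIFI and PENDING_MQTT are hypothetical labels invented here for the three remaining states, and `resolve_state` is not a function from the real source code. The actual transition events are those shown in pic 1 and may differ in detail (for example, the fall back to Access Point mode is not modelled).

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()          # first state entered after power up
    NO_CONFIG = auto()     # hypothetical name: secvals.txt invalid or missing
    PENDING_WIFI = auto()  # hypothetical name: no WiFi network connection yet
    PENDING_MQTT = auto()  # hypothetical name: WiFi up, no MQTT broker yet
    ACTIVE = auto()        # normal operation: WiFi and MQTT both connected


def resolve_state(config_valid: bool, wifi_up: bool, mqtt_up: bool) -> State:
    """Return the state implied by the conditions once INIT has completed."""
    if not config_valid:
        return State.NO_CONFIG
    if not wifi_up:
        return State.PENDING_WIFI
    if not mqtt_up:
        return State.PENDING_MQTT
    return State.ACTIVE


if __name__ == "__main__":
    state = State.INIT  # power up
    state = resolve_state(config_valid=True, wifi_up=True, mqtt_up=False)
    print(state.name)
```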

The events controlling transitions between states are described in pic 1 above. Transitions between states are also governed by the following SecVals parameters;

  • 1st MQTT Broker IP Address. In dotted decimal form AAA.BBB.CCC.DDD
  • 2nd MQTT Broker Port. In Integer form.
  • 3rd MQTT Broker connection attempts to make before switching from STA mode to AP mode. In Integer form.
  • 4th WiFi Network SSID. In free form text.
  • 5th WiFi Network Password. In free form text.
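To make the five parameters above concrete, here is a minimal Python sketch that reads and validates them. The on-card layout of secvals.txt is not described in this article, so the one-value-per-line format is purely an assumption for illustration, as are the names `SecVals` and `parse_secvals`; check the source code for the real format.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class SecVals:
    broker_ip: str      # 1st: MQTT broker IP address, dotted decimal
    broker_port: int    # 2nd: MQTT broker port
    max_attempts: int   # 3rd: broker connection attempts before STA -> AP
    ssid: str           # 4th: WiFi network SSID, free form text
    password: str       # 5th: WiFi network password, free form text


def parse_secvals(text: str) -> SecVals:
    """Parse the five values, ASSUMING one value per line in the order above."""
    lines = text.splitlines()
    if len(lines) < 5:
        raise ValueError("expected at least 5 lines")
    # IPv4Address raises a ValueError subclass on a malformed address.
    broker_ip = str(ipaddress.IPv4Address(lines[0].strip()))
    broker_port = int(lines[1].strip())
    if not 0 < broker_port < 65536:
        raise ValueError("port out of range")
    max_attempts = int(lines[2].strip())
    if max_attempts < 1:
        raise ValueError("connection attempts must be at least 1")
    # SSID and password are free form text, so they are kept untouched.
    return SecVals(broker_ip, broker_port, max_attempts, lines[3], lines[4])


if __name__ == "__main__":
    example = "192.168.1.10\n1883\n10\nMyHomeNetwork\nmy secret\n"
    print(parse_secvals(example))
```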

As mentioned above, if the IoT device is unable to connect as a WiFi Station to the WiFi network whose SSID and P/W are defined in secvals.txt held on the SD Card, the IoT device will become an Access Point. Once you are connected to this access point it will serve up the 'Speech Synth Configuration Home Page' as shown above in Pic 2 (reached by entering either 'SPEECHSVR.local' or the device's IP address into your browser's URL address bar). This home page allows the reconfiguration of the IoT Retro Speech Synthesis Device via an HTTP browser.

Remote Access whilst in the ACTIVE state

Once connected to the MQTT Broker it is also possible to both re-calibrate and reconfigure the device via MQTT topic publications. The file calvals.txt has R/W access and secvals.txt has write only access exposed.

Also as mentioned above, once in the active mode it is possible to access the Speech Synth via an HTTP interface by entering 'SPEECHSVR.local' or the device's IP address into your browser's URL address bar. This HTTP based interface allows for basic control of the Speech Synth. Pics 3, 4 and 5 show the web pages available.

User debug

During the boot sequence the green System LED at the rear of the enclosure gives the following debug feedback;

  • 1 Short flash : No Config file located on SD Card (secvals.txt)
  • 2 Short flashes : IoT device is attempting to connect to WiFi network
  • Continuous illumination : IoT device is attempting to connect to MQTT Broker
  • Off : Device is active.

IoT Retro Speech Synthesis Device Functionality in ACTIVE State

Once in the ACTIVE state the ESP8266 enters a continual loop calling the following functions; timer_update(), checkTemperatureAndHumidity() and handleSpeech(). The net result has been designed to present the user with an HTTP or MQTT interface, seamlessly service its on-board speech processor with phonemes on demand and publish local ambient parametric values over MQTT.

A comprehensive list of all topic subscriptions and publications including payload values is included in the source code.

Step 7: Sensor Calibration

When the IoT device powers up, as part of the boot sequence two files named 'calvals1.txt' and 'calvals2.txt' are read from the SD Card.

The contents of these files are calibration constants as indicated above in pic 1.

  1. 'calvals1.txt' : Used by the external AM2320
  2. 'calvals2.txt' : Used by the internal DHT22

These calibration constants are used to adjust the readings acquired from the two sensors to bring them into line with a reference device. There is one further value which defines a reporting strategy for each device and is described below along with the procedure followed to calibrate the sensors.

Reporting Strategy

This parameter determines how the remote sensor reports any ambient parametric changes local to it. If a value of 0 is selected the remote sensor will publish any change it sees in the temperature or humidity each time the respective sensor is read (approx every 10 seconds). Any other value will delay the publication of a change by 1...60 minutes. Modifying this parameter allows for optimisation of MQTT network traffic. It should be noted temperature and humidity data from the DHT22 is read alternately due to limitations of the sensor.
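One way to read the reporting strategy is as a simple gate in front of each publication. The Python sketch below is an interpretation rather than the firmware's logic: the function name `should_publish` is hypothetical, and it assumes a non-zero value means "publish a change only once that many minutes have passed since the last publication".

```python
def should_publish(strategy_minutes: int,
                   value_changed: bool,
                   seconds_since_last_publish: float) -> bool:
    """Decide whether a fresh sensor reading should be published.

    strategy_minutes == 0  : publish every change as soon as it is read.
    strategy_minutes 1..60 : assumed to hold changes back until that many
                             minutes have elapsed since the last publication.
    """
    if not 0 <= strategy_minutes <= 60:
        raise ValueError("reporting strategy must be in the range 0..60")
    if not value_changed:
        return False
    if strategy_minutes == 0:
        return True
    return seconds_since_last_publish >= strategy_minutes * 60


if __name__ == "__main__":
    # Sensor read roughly every 10 seconds, strategy of 5 minutes.
    print(should_publish(5, True, 10.0))
    print(should_publish(5, True, 300.0))
```

A larger value trades responsiveness for less MQTT traffic, which is the optimisation described above.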

Temperature calibration

To calibrate the temperature sensor I followed the same process as outlined here step 4, again using a simple y=mx+c relationship. I used IoT Temperature, Humidity Sensor #1 as the reference device. Values from the sensor are in degrees Celsius.
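The y=mx+c relationship can be illustrated with a short Python sketch. It assumes the constants m and c are derived from two pairs of (raw reading, reference reading); the linked procedure may well use more points or a different fit, and the names `fit_line` and `calibrate` are hypothetical. The readings in the example are made up.

```python
def fit_line(p1: tuple, p2: tuple) -> tuple:
    """Return (m, c) for the straight line through two (raw, reference) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two raw readings must differ")
    m = (y2 - y1) / (x2 - x1)
    c = y1 - m * x1
    return m, c


def calibrate(raw: float, m: float, c: float) -> float:
    """Apply y = m*x + c to a raw sensor reading."""
    return m * raw + c


if __name__ == "__main__":
    # Made-up readings: the sensor shows 10.0 and 30.0 degrees Celsius
    # while the reference device shows 10.5 and 31.5.
    m, c = fit_line((10.0, 10.5), (30.0, 31.5))
    print(round(calibrate(20.0, m, c), 2))
```

With m = 1 and c = 0 the reading passes through unchanged, which is a convenient default for an uncalibrated sensor.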

Humidity Calibration

As I possess no means to accurately record or even control local ambient humidity, to calibrate the sensor I used a similar approach to that above here step 4, again using Sensor #1 as reference. That said, I have recently found an excellent article on the web describing how to calibrate humidity sensors. I may well try this approach sometime in the future. Values from the sensor are in percent relative humidity.

Step 8: MQTT Topic Naming Convention

As mentioned in an earlier Instructable (here) I settled on the topic naming convention outlined in pic 1 above.

Namely, 'AccessMethod/DeviceType/WhichDevice/Action/SubDevice'. It's not perfect, but it does allow useful filters to be applied to see all sensor outputs for a given parametric topic, thus allowing for easy comparison as in pic 2 above with MQTTSpy.

This project is the first instance where a single device contains more than one originating source of the same type of publication, i.e. two temperature/humidity sensors, from internal and external sub-devices.


It also supports reasonably extensible logical groupings of functionality within a given IoT device.


In implementing these topics in software I used hard coded topic strings with fixed, embedded numerical identifiers for each device as opposed to dynamically generating the topics at run time so as to save on RAM and keep performance high.
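The five-level convention and the kind of filtering it enables can be sketched in Python as below. This is an illustration only: the helper names `parse_topic` and `matches` are hypothetical, and the matcher is a simplified take on the standard MQTT '+' (single level) and '#' (trailing multi level) wildcards. The example topic is the one quoted in the Conclusion.

```python
from typing import NamedTuple


class Topic(NamedTuple):
    access_method: str
    device_type: str
    which_device: str
    action: str
    sub_device: str


def parse_topic(topic: str) -> Topic:
    """Split 'AccessMethod/DeviceType/WhichDevice/Action/SubDevice'."""
    parts = topic.split("/")
    if len(parts) != 5:
        raise ValueError(f"expected 5 topic levels, got {len(parts)}")
    return Topic(*parts)


def matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT matching: '+' is one level, a final '#' is the rest."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return i == len(f_parts) - 1  # '#' is only valid as the last level
        if i >= len(t_parts):
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)


if __name__ == "__main__":
    t = parse_topic("WFD/SpeechTH/1/Word/Command")
    print(t.device_type, t.which_device)
    # One filter sees the same Action/SubDevice from every device.
    print(matches("WFD/+/+/Word/Command", "WFD/SpeechTH/1/Word/Command"))
```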


Note : If you're not sure how to use MQTTSpy see here 'Setting Up an MQTT Broker. Part 2 : IoT, Home Automation'

Step 9: Debug & Fault Finding

By and large, for my hobby projects, where possible I tend to build a representative hardware prototype against which the software is developed, so I rarely have any issues when integrating the software into the final platform hardware.

However, on this occasion I came across a strange intermittent fault whereby some phonemes would sound out but others would not.

After some initial debugging of the Speech Synth PCB using an Arduino Uno to source phonemes and prove this board was working, I took a scope to the I2C lines between the IoT PCB and the Speech Synth PCB. See Pic 1 above.

You can clearly see the 'saw tooth'/exponential edge to the I2C signal on the traces.

This is usually an indication the I2C pull up values are too high preventing the line voltage from recovering fast enough in an open drain circuit.

As a 'work around' I paralleled the two SMT pull up resistors R12 and R13 with 10Ks to give 4K7 and sure enough the Speech Synth 'burst into life'.

This type of failure is the opposite of what usually happens when debugging these types of projects. In general most of the I2C based modules purchased from Ebay tend to come with 10K or 4K7 pull ups already fitted. If you intend to use 5 or more I2C modules, each with 4K7 pull ups, then the overall pull up resistance is 940R or less, which is too great a load for the output stage of the master. The fix would be to de-solder the pull up resistors on all but one module, preferably keeping the set physically furthest away from the master.
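The pull up arithmetic above is easy to check with a few lines of Python. The sketch below computes the combined value of pull ups in parallel and compares it with a minimum pull up value. The 3 mA sink current and 0.4 V low-level output voltage are the standard-mode figures from the I2C-bus specification as I understand them, so treat them as assumptions and check the datasheet for your particular master; the function names are hypothetical.

```python
def parallel(*resistors_ohms: float) -> float:
    """Combined resistance of resistors connected in parallel."""
    if not resistors_ohms or any(r <= 0 for r in resistors_ohms):
        raise ValueError("need one or more positive resistances")
    return 1.0 / sum(1.0 / r for r in resistors_ohms)


def min_pullup_ohms(vdd: float, vol_max: float = 0.4,
                    iol_amps: float = 0.003) -> float:
    """Smallest pull up the output stage can pull low.

    ASSUMED defaults: 0.4 V maximum low-level voltage at 3 mA sink current.
    """
    return (vdd - vol_max) / iol_amps


if __name__ == "__main__":
    five_modules = parallel(4700, 4700, 4700, 4700, 4700)
    print(round(five_modules))           # five 4K7 pull ups in parallel
    print(round(min_pullup_ohms(3.3)))   # minimum for a 3v3 bus
    print(round(min_pullup_ohms(5.0)))   # minimum for a 5v bus
    print(round(parallel(10000, 10000))) # two 10K resistors in parallel
```

Under these assumptions five 4K7 pull ups fall below the minimum for both 3v3 and 5v buses, which is why removing the surplus resistors matters.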

A useful tip and worth keeping in mind when designing electronics with I2C devices.

Step 10: Testing the Design

Testing was carried out using two methodologies; Manual and Automated.

The first, manual, approach was generally used during initial code development, with MQTT Spy exercising all of the available subscribed topics and checking the published responses (depicted in pic 2 above). As this is a manual process it can be time consuming and prone to errors as code development progresses, although manual execution does enable 100% coverage.

MQTTSpy was chosen for manual testing because it is an excellent tool to hand format a given payload and publish it to any topic with ease. It also displays a clear, time stamped log which is very useful for debugging (pic 3 above).

The second, automated approach was adopted as the source code became more complex (>3700 lines). Increased complexity means longer manual testing cycles and more complex tests. In order to improve the reliability, determinism and quality of tests, automated testing was used via a Python test executive (pic 1). See Step #10 in this Instructable on how automated testing was introduced. A full copy of the automated tests used in this Instructable is available here.


A video of the automated test sequence in operation is shown above. The sequence executes the following steps;

  • Automated via MQTT
    • Connect to MQTT backbone and announce 'System Ready'
    • Exercise Green LED
    • Exercise Red LED
    • Exercise Blue LED
    • Check Digital Pot works
    • Speak using Phonemes
    • Speak using Hex Codes for Phonemes
    • Speak using codes for fixed phrases
    • A little bit of Dr Who and the Daleks fun.
  • Manually via HTTP/Chrome
    • Exercise Blue LED
    • Exercise Red LED
    • Speak fixed phrases 'Steven Quinn', 'System Ready' and 'Hello World'
    • Have the HTTP server, serve up
      • Details on Speech Synth Chip
      • MQTT Details


Step 11: Conclusion

Although it took a lot of effort with files and drills etc. especially for the speaker grille, I think the outcome is aesthetically pleasing and packs into a nice, small enclosure. I could have made it smaller but it would have needed to go onto one PCB and I deliberately broke it into two so I could re-use the PCBs at a later date for other projects. So it's a happy compromise.

The software works well and the IoT device has been in stable operation for quite some time now without any issues.

I've been monitoring the temperature and humidity via Grafana and comparing with a co-located device. The two ambient values have been correlating well, implying the calibration is reasonable (or at least they are similar).

I stopped short of implementing word command ('WFD/SpeechTH/1/Word/Command') because I ran out of time and needed to move on. I may well re-visit this if and when I set up a MySQL database. Right now I'm using InfluxDB.

Step 12: References Used

The following sources were used to put this Instructable together;
Source code for the IoT Retro Speech Synthesis Device (this contains a copy of everything)






For fun

PCB Manufacture

Installing Additional Arduino Libraries

How to Check and Calibrate a Humidity Sensor

SP0256-AL2 Datasheet

Speech Chips Shop


Runner Up in the Arduino Contest 2019