Introduction: Arbitrary Wave Generator With the Raspberry Pi Pico
Just two weeks ago, a new microcontroller, the pico, was released by the Raspberry Pi Foundation, well known for the incredibly successful series of Raspberry Pi single-board computers. The new microcontroller uses a brand new chip, designed in-house, the RP2040. It has two 32-bit cores running by default at 125MHz. It has been criticised for not having Wifi or Bluetooth, and no hardware floating point math. But it has a very fast internal bus and powerful peripherals. It has been designed for makers and has very strong support: it was released with 6 detailed datasheets and a beginner’s guide book, which is available free of charge as a PDF. Best of all, it is cheap at $4. I got 5 for under 30EUR including shipping.
As a test I wanted to see if any of my previous projects based on the Arduino Uno/Nano could benefit from a remake with this much more powerful board: After all it has 4x the bus width, 8x the clock frequency, 130x the RAM, and is more than a decade more modern. My choice fell on the Arbitrary waveform generator (AWG). With the Arduino, I managed to squeeze out 381ksps, since every sample update took 42 instruction cycles, mostly because updating a 32-bit phase counter takes a quadruple loop with an 8-bit CPU. My expectation was that it should be possible to improve this by a factor 8 just from clock speed and maybe another factor 2 because the new board is 32-bit. However, after reading selected parts of the 637-page datasheet of the new RP2040 chip, I realised it might be possible to update every single clock cycle! Just by initialising 2 peripherals, the DMA (Direct Memory Access) and the PIO (programmable Input/Output), an array can be cyclically streamed to the output pins.
Indeed, it works, and the increase in speed compared with the Arduino is more than a factor 300, from 381ksps to 125Msps. That is similar to serious lab-AWGs, which cost ~100EUR for budget models. There is no attempt here to provide a buffer or amplifier for the produced signal: it is beyond my skills and my equipment to come up with a buffer/amplifier beyond the 10MHz range. The produced signal is thus rather weak, with an output impedance of ~1kOhm, and a maximum current draw of ~1mA. Suggestions for a buffer/amplifier are welcome in the comments! There is no attempt either to provide a dedicated user interface in terms of a screen, buttons, rotary encoders etc. That adds cost and complexity. I found it much more convenient and much more powerful to set the requested waveform in the micropython code itself!
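The numbers above can be checked with a few lines of arithmetic. This is only a sketch: the 16MHz Arduino clock is an assumption (the stock Uno/Nano value), while the 42 cycles per sample and the 125MHz pico clock come from the text.

```python
# Back-of-the-envelope check of the sample rates quoted above.
ARDUINO_CLOCK_HZ = 16_000_000   # assumption: stock Arduino Uno/Nano clock
CYCLES_PER_SAMPLE = 42          # from the text: cycles per sample update
PICO_CLOCK_HZ = 125_000_000     # default RP2040 clock, one sample per cycle

arduino_sps = ARDUINO_CLOCK_HZ / CYCLES_PER_SAMPLE
speedup = PICO_CLOCK_HZ / arduino_sps

print(f"Arduino: {arduino_sps / 1e3:.0f} ksps")
print(f"Pico:    {PICO_CLOCK_HZ / 1e6:.0f} Msps")
print(f"Speed-up: {speedup:.0f}x")
```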
For comparison, several instructables (e.g. here, here and here) describe how to make a function generator based on the dedicated AD9833 chip. This chip runs at 25Msps and can generate only 3 predefined waveforms: sine, triangle and square. The pico is 5x faster and can generate any possible wave that fits in an array, up to many thousands of points.
Supplies
Required materials:
- Raspberry pi pico microcontroller with male pin headers
- 1 5x7cm prototype board
- 2 20-pin female pin headers
- 23 resistors of identical value, near 2kOhm
Step 1: Construction of the R2R DAC
The pico itself cannot produce analog signals: it does not have a digital-to-analog converter (DAC). But a DAC is simple to build from resistors. We use here the classic R2R DAC. An 8-bit DAC requires 7 resistors of value R and 9 resistors of value 2R. The actual value of R is not critical, but it is essential that all ‘R’ resistors have the same value and that all 2R resistors have double that value. In practice, this is best achieved by using 23 resistors of 2R, and putting 7 pairs of them in parallel to create the 7 ‘R’ resistors. Resistors with a power rating of 0.25W and a tolerance of 1% are cheap in packs of 100, and that is what I recommend. I had an unused pack of 2kOhm resistors. I think any value of R in the range 1kOhm-10kOhm will be fine. Smaller values will draw more current than the Pico can provide, and larger values result in poor performance because there is too little current to counteract parasitic capacitance and/or inductance at high frequencies.
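As a rough illustration of what the ladder does, the sketch below computes the ideal output of an R2R DAC. It assumes a 3.3V logic-high level and ideal pin drivers, and uses the textbook result that the ladder behaves like a voltage source with output impedance R. The function names are made up for this example.

```python
VREF = 3.3     # assumption: GPIO high level in volts
R = 1000.0     # 'R' in ohms: two 2 kOhm resistors in parallel
NBITS = 8

def r2r_vout(code, vref=VREF, nbits=NBITS):
    """Ideal unloaded output voltage for an integer code 0..2**nbits-1."""
    return vref * code / (1 << nbits)

def loaded_vout(code, r_load, r=R):
    """Output into a resistive load: a voltage divider against the ladder impedance R."""
    return r2r_vout(code) * r_load / (r_load + r)

print(r2r_vout(128))            # mid-scale
print(r2r_vout(255))            # full scale, one step below VREF
print(loaded_vout(255, 1000.0)) # a 1 kOhm load halves the signal
```

The last line shows why a buffer is needed for anything but high-impedance loads such as an oscilloscope probe.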
I numbered all the resistors, measured their values and put them in a spreadsheet. None differed by more than 1% from the nominal value, but by selecting I could reduce the effective spread from 1 percent to 1 per mille. For the ‘2R’ resistors I picked a value near the mean of which there were at least 9, which happened to be the value 2000. Then I picked 7 pairs that were equally distant from the value 2000, for example 3 pairs of 1998+2002 Ohm and 4 pairs of 1997+2003 Ohm. In parallel, equal but opposite deviations cancel!
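The cancellation can be verified numerically. A minimal sketch, using the example pairs from the paragraph above:

```python
def parallel(r1, r2):
    """Resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

NOMINAL_R = 1000.0  # two ideal 2000 Ohm resistors in parallel

for r1, r2 in [(1998, 2002), (1997, 2003)]:
    r = parallel(r1, r2)
    rel_error = (r - NOMINAL_R) / NOMINAL_R
    print(f"{r1} || {r2} = {r:.5f} Ohm, relative error {rel_error:.2e}")
```

A deviation of d on each resistor leaves a relative error of only (d/2R)^2 in the pair, so 1 per mille per resistor shrinks to about 1 part per million.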
Solder the female headers to the prototype board such that the pico fits comfortably. Now solder the resistors according to the provided schematics and pictures. Note that the resistors are mounted vertically only to save space. Feel free to mount them horizontally if space allows. The board layout is done with KiCad. I did not make a PCB, but a PCB layout helps to solder effectively on the 5x7cm prototype board. Note that my build differs slightly from the KiCad design, because I made some improvements to the design after soldering it up.
At the end I added two male header pins: one for the signal and one for ground. Probe clips and crocodile clips attach well to them.
Step 2: Uploading the Software and Run
The pico can be programmed in C or micropython. Using micropython is much easier since you don’t have to install the ‘C-SDK’. Micropython may be 100x slower than C, but here it doesn’t matter. All the code does is make an array with a waveform and then instruct the peripherals to stream that array to the output pins. The CPU is idle and free to perform other tasks.
Make sure your pico has the micropython UF2 file uploaded. I used version rp2-pico-20210205-unstable-v1.14-8-g1f800cac3.uf2. Version 1.13, which was originally provided on release day, lacks the 'uctypes' module. I used the Thonny IDE to upload the python script. I had never heard of it before, but it works well, at least for small scripts like this.
The code as is will run through a few hundred example waves (see video at step 1). To modify this, scroll to line 155 and set up your own wave.
There are 6 basic pulse shapes: sine, pulse, gaussian, sinc, exponential and noise. The sine has no extra parameters, but the pulse has 3: risetime, uptime and falltime. Gaussian, sinc and exponential have one parameter, which determines their width, and noise has one parameter, which determines the pdf of the output values, from 1 (flat) to 8 or larger (nearly gaussian).
A wave will have a shape, but also amplitude and offset. The shape can be replicated within a period, which increases its frequency. That way, operations like summing or multiplying can be done with waves of different frequencies. A non-zero phase shift can also be set, but its effect is only noticeable when combining waves.
Waves can be combined by either summing, multiplying or applying a phase modulation with another wave. The resulting combined wave can then be added, multiplied or modulated with another basic or complex wave. The screenshots and video show just a few examples of how complex shapes can be made.
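To make the idea of shapes, amplitude/offset and combinations concrete, here is a minimal sketch in plain Python. It is not the attached script: all function names and parameters are hypothetical, and only a sine and a gaussian are shown. Shapes are lists of values in the range -1..1, and to_dac() maps them to 8-bit codes.

```python
import math

N = 256  # samples in the buffer (small, for illustration only)

def sine(n=N, replicate=1, phase=0.0):
    """`replicate` periods of a sine in one buffer, values in -1..1."""
    return [math.sin(2 * math.pi * (replicate * i / n + phase)) for i in range(n)]

def gaussian(n=N, width=0.1):
    """Gaussian pulse centred in the buffer; width as a fraction of the period."""
    return [math.exp(-((i / n - 0.5) / width) ** 2) for i in range(n)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def multiply(a, b):
    return [x * y for x, y in zip(a, b)]

def phase_modulate(modulator, depth=0.1, replicate=1):
    """Sine carrier whose phase is shifted by depth * modulator."""
    n = len(modulator)
    return [math.sin(2 * math.pi * (replicate * i / n + depth * modulator[i]))
            for i in range(n)]

def to_dac(wave, amplitude=0.5, offset=0.5):
    """Scale to 0..255 and clip, ready to be streamed to an 8-bit DAC."""
    out = bytearray(len(wave))
    for i, v in enumerate(wave):
        level = int(256 * (offset + amplitude * v))
        out[i] = min(255, max(0, level))
    return out

# A sine burst: 8 periods of a sine multiplied by a gaussian envelope.
burst = to_dac(multiply(sine(replicate=8), gaussian()))
# A frequency-modulated tone: fast carrier, phase-modulated by a slow sine.
fm = to_dac(phase_modulate(sine(), depth=0.5, replicate=16))
```

Because everything is computed once into a buffer, the combined shape repeats with the single base frequency of the buffer.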
There is only one value for the frequency, even when combining waves. Operations are not performed on-the-fly by the CPU, but stored in a buffer and played by the DMA/PIO. The 'duplication' keyword, however, does allow a shape to repeat at double, triple, quadruple etc. values of the (base) frequency.
The buffer size (value of maxnsamp) determines how complex the shape can be made, and how accurate the frequency can be set. For simple waves, like single sine or wide pulses, 1024 samples may be sufficient. A buffer size of 65536 (64kB) is the practical maximum. Filling the buffer may take 20-60s at that size!
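The link between buffer size and frequency accuracy can be sketched as follows. This is an illustration under assumptions, not necessarily how the attached script chooses its values: it assumes one sample per clock cycle, a buffer that holds a whole number of periods, and a length that is a multiple of 4 so that it fills whole 32-bit words. The function plan_buffer is hypothetical.

```python
def plan_buffer(freq_hz, sample_rate_hz=125_000_000, maxnsamp=4096):
    """Pick the number of periods and samples that best approximates freq_hz.

    Returns (nsamp, periods, actual_freq_hz, relative_error), or None if even
    a single period does not fit in maxnsamp samples.
    """
    best = None
    max_periods = int(maxnsamp * freq_hz / sample_rate_hz)
    for periods in range(1, max(1, max_periods) + 1):
        # round to a whole number of 32-bit words (4 samples each)
        nsamp = 4 * round(periods * sample_rate_hz / freq_hz / 4)
        if nsamp < 4 or nsamp > maxnsamp:
            continue
        actual = periods * sample_rate_hz / nsamp
        err = abs(actual - freq_hz) / freq_hz
        if best is None or err < best[3]:
            best = (nsamp, periods, actual, err)
    return best

print(plan_buffer(1_000_000))                  # 4 periods in 500 samples is exact
print(plan_buffer(3_300_000, maxnsamp=256))    # small buffer, coarse frequency
print(plan_buffer(3_300_000, maxnsamp=4096))   # larger buffer, finer frequency
```

A larger buffer offers more combinations of periods and samples, which is why the achievable frequency error shrinks as maxnsamp grows.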
Step 3: Comments About the Code
I admit the code looks obscure: most of it is based on direct register access, and requires studying the 637-page datasheet of the RP2040 chip to understand. But I’ll try to explain the thought behind it.
The crucial peripheral is the so-called Direct Memory Access (DMA) module of the chip. The DMA can be instructed to perform block copies between memory and peripherals without requiring the attention of the CPU. Really, I had no clue this existed just 2 weeks ago either! It is like having an assistant whom you tell to do shopping for you!
Anyway, this DMA is being told to transfer the contents of the array with the waveform to another peripheral (the PIO) which will put the values on the output pins. One complication is that this DMA needs to do this cyclically, and without interruption. For that, a second DMA channel is instructed to reconfigure and restart the first DMA channel as soon as it is done. This is called ‘chaining’. So channel 0 does the transfer, and passes the stick on to channel 1. But channel 1 immediately tells channel 0 to restart. You might expect a delay in this swapping between the channels, but that delay is absorbed in buffers: the DMA transfers 32-bit words, while the PIO only ‘eats’ 8-bit bytes. So the PIO still has some snacks in its buffer while the DMA is losing 1 or 2 cycles to start over again. I am amazed by the engineers and scientists who came up with such intricate hardware!
The DMA cannot control the pins directly, but there is something better: the pico has 2 Programmable Input/Output (PIO) units, which have 4 processing units themselves, (called 'state machines'). They are really 8 tiny microcontrollers inside the microcontroller itself. Here, only 1 state machine is used, and it is programmed with a single command (would this qualify for a Guinness World record of smallest computer program?) The command is ‘out(pins,8)’ which instructs the state machine to pass 8 bits from its buffer to the output pins. Wrapping is implied, so the state machine just keeps doing this single command, every clock cycle. As the 8 bits are shifted to the pins, the buffer will request to be refilled by the DMA when 32 bits have been consumed.
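The interplay between 32-bit DMA words and 8-bit output can be modelled in ordinary Python. This is a simulation for understanding, not PIO code, and the function names are invented. It assumes the state machine shifts its output register to the right, so that the bytes leave in the same order as they sit in memory.

```python
def pack_words(samples):
    """Group 8-bit samples into 32-bit words, first sample in the lowest byte."""
    if len(samples) % 4:
        raise ValueError("number of samples must be a multiple of 4")
    return [int.from_bytes(bytes(samples[i:i + 4]), "little")
            for i in range(0, len(samples), 4)]

def pio_out8(words):
    """Mimic 'out(pins, 8)' with automatic refill: one byte per clock cycle."""
    for osr in words:            # each DMA transfer delivers one 32-bit word
        for _ in range(4):       # four clock cycles per word
            yield osr & 0xFF     # the lowest 8 bits go to the pins
            osr >>= 8            # assumption: shift right

samples = [10, 20, 30, 40, 50, 60, 70, 80]
words = pack_words(samples)
print([hex(w) for w in words])
print(list(pio_out8(words)))
```

The model shows why the DMA only needs to deliver a word every fourth clock cycle: each word feeds the pins for four cycles.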
And that’s all. So the code consists of
- The configuration of the DMA
- The configuration and programming of the PIO
- The filling of the array with the waveform
- Starting up the DMA.
At that moment the waveform is produced and the CPUs of the pico are free for other tasks. There will be data traffic on the bus, but the pico has a highly parallel bus structure, and I expect no noticeable slow-down.
Step 4: Ideas for Improvement
I posted this instructable a bit prematurely to demonstrate the capabilities of this new microcontroller. I hope it will motivate some of you to get a board and try out its new features.
For this AWG project I have several ideas on how to improve it:
- It might very well be possible to have higher resolution (10 or 12 bit) and/or a second channel.
- The RP2040 can be overclocked to 250 MHz or more, resulting in a 250Msps AWG.
- Sweeps of frequency or other parameters can be implemented in the python code after setup.
- A real AWG needs a buffer and/or amplifier to reduce the output impedance and extend the voltage swing.
- A well-designed PCB and SMD resistors might reduce parasitic capacitance or inductance and improve the bandwidth.
- A screen and buttons, or a touch-screen could make it into a self-contained apparatus, like the commercial devices.
Step 5: Appendix on Data Throughput
Several other users have extended the AWG with a larger number of output bits, either for higher precision, a second channel, or both. When doing so, it appears the data throughput is not always able to keep up with the sampling speed. I have investigated this a bit and report here the current status of my findings.
The key quantity is the number of samples that can be stored in a 32-bit word. In the present project that was 4: every 32-bit word contains 4 samples of 8 bits each. Thus, the DMA only has to run at one quarter of the clock speed and it does so without any problem.
For a 10-bit DAC, 3 samples can be stored in a 32-bit word. I found the present setup results in a small hiccup at the restart of the DMA, which, however, is resolved by setting fifo_join=PIO.JOIN_TX in the PIO decorator. This enlarges the FIFO that transmits to the PIO from 4 to 8 words, at the cost of the (here unused) FIFO that receives from the PIO.
For an 11-bit DAC, a 12-bit DAC or a 2-channel 8-bit DAC, a 32-bit word can only fit 2 samples. Here I would have expected that the DMA would be able to cope, but it does not. Apparently, in the present setup, a 32-bit word transfer is done at most every other clock cycle. There may be a way to solve this, but at present I have no idea how. In any case, with the DMA just able to follow up with the PIO, there is a substantial delay when the DMA reconfigures. This is solved by slowing down the PIO by a factor two, but of course this results in the sampling frequency now being only half the clock speed.
For a 2-channel DAC beyond 8 bits, only one sample fits in a 32-bit word. Hiccups can be avoided by slowing down the PIO by a factor three.
Of course a given piece of hardware can be configured either for speed (up to 10 bits, with the remainder unused) or for precision/channels (e.g. 2x11 or 12 bits) running at one third of the sampling speed. Best would be to force the unused pins to zero (not done in the attached script).
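The cases above can be summarised in a small helper. The packing rule (whole samples per 32-bit word) follows from the text; the clock dividers are simply the values reported above, not derived quantities, and the function names are hypothetical.

```python
WORD_BITS = 32

def samples_per_word(bits_per_channel, channels=1):
    """Number of complete samples (all channels) that fit in one 32-bit word."""
    return WORD_BITS // (bits_per_channel * channels)

# PIO slow-down factor reported in the text for each packing density.
# 3 samples per word additionally needs fifo_join=PIO.JOIN_TX.
REPORTED_DIVIDER = {4: 1, 3: 1, 2: 2, 1: 3}

def max_sample_rate(bits_per_channel, channels=1, clock_hz=125_000_000):
    """Highest hiccup-free sample rate according to the findings above."""
    return clock_hz / REPORTED_DIVIDER[samples_per_word(bits_per_channel, channels)]

for bits, chans in [(8, 1), (10, 1), (12, 1), (8, 2), (11, 2)]:
    spw = samples_per_word(bits, chans)
    rate = max_sample_rate(bits, chans) / 1e6
    print(f"{chans} x {bits:2d} bit: {spw} samples/word, about {rate:.1f} Msps")
```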
An updated script is attached here, which makes it possible to demonstrate and test these scenarios. It is set up for a 2-channel 11-bit DAC, running from pin 0 to pin 21, with the first channel going from pin 0 to pin 10 and the second channel from pin 11 to pin 21. The second channel however has inverted bit order: the MSB is on pin 11 and the LSB on pin 21. This was done to reduce cross-talk, but it has the additional advantage that the most significant bits are in the middle, which makes it possible to run it as a 2-channel 8-bit DAC with consecutive pins. The script now also allows setting the clock speed anywhere between 100 and 250 MHz: many have reported that overclocking by up to a factor two from the nominal 125MHz is OK. In addition, matching the clock frequency with the output frequency can result in a particularly stable output wave.
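The pin layout described above can be illustrated with a short packing routine. This is a sketch with invented function names, not an excerpt from the attached script: channel 1 is placed on bits 0..10 unchanged, and channel 2 is bit-reversed before being placed on bits 11..21.

```python
NBITS = 11

def reverse_bits(value, nbits=NBITS):
    """Mirror the lowest nbits of value (bit 0 swaps with bit nbits-1, etc.)."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def pack_two_channels(ch1, ch2, nbits=NBITS):
    """Channel 1 on pins 0..10 (LSB on pin 0), channel 2 mirrored on pins 11..21."""
    mask = (1 << nbits) - 1
    return (ch1 & mask) | (reverse_bits(ch2 & mask, nbits) << nbits)

# The MSB of channel 2 ends up on pin 11, right next to the MSB of channel 1 (pin 10).
print(bin(pack_two_channels(0b10000000000, 0b10000000000)))
```

With this layout the top 8 bits of channel 1 sit on pins 3..10 and the top 8 bits of channel 2 on pins 11..18, which is the block of consecutive pins that allows 2-channel 8-bit operation.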
Participated in the Microcontroller Contest
93 Comments
Question 6 weeks ago
Is it possible that wifi is interrupting this program? If I try using wifi and this program at the same time on pico w it doesn't make sound
Answer 6 weeks ago
Sorry, I have no idea, I don't have a pico W. In principle waveforms are generated 100% in the background by DMA and PIO, so it should not be incompatible with WiFi. However, there may be other issues, e.g. the names and addresses of the registers
Question 4 months ago
Many thanks for this brilliant project that is extremely well explained! I have plenty programming experience, but am a newbie with electronics and microprocessors so have learned a lot! I built and tested the DAC on a breadboard and worked through and understood much of the code, tweaking it to produce a 10Hz sine wave to power an LED where I can measure the frequency by eye. It works perfectly when I powered up from scratch the first time. Brilliant ;-) !!!
I am slightly bamboozled on one point though. If I stop and restart the programme the LED goes to full on (corresponding to DAC output with all 8 GPIO pins pulled high) and then just freezes, so no sine wave. I suspect an initialisation problem that doesn't happen when the pico is powered up from scratch. Do you have any thoughts on what might be going wrong? Do I need to kill the state-machine or any other processes that may have been left running on the pico from the previous attempt to run the code?
Really beautiful project though, and many thanks!
Answer 4 months ago
Glad you liked it. I've noticed too that with Thonny and micropython the PICO often needs a reset. I have resolved it by always having an easy way to do a hardware reset (connecting ground to pin 30 ('RUN')): on the breadboard with a jumper wire, on a protoboard with a button. Sorry I have not investigated it in more detail, maybe some other reader here understood and solved it, if so, please share!
Reply 4 months ago
Thanks for the fast reply. Just implemented your reset switch suggestion. That is fine and much cleaner than reinserting the usb. Thought there should be a clean way to do that in software, but maybe there is a good reason why not!
Reply 4 months ago
I am working in Thonny. I added lines to stop the DMA, and also disable the individual channels.
DMA_ABORT = DMA_BASE + 0x444

@micropython.viper
def stopDMA():
    # request an abort on all channels and wait until the register clears
    mem32[DMA_ABORT] = 0xFFFF
    while int(mem32[DMA_ABORT]) != 0:
        time.sleep_us(1)  # sleep_us expects a whole number of microseconds
    # clear the enable bit (bit 0) of both channels
    mem32[CH0_AL1_CTRL] = mem32[CH0_AL1_CTRL] & 0xFFFFFFFE
    mem32[CH1_CTRL_TRIG] = mem32[CH1_CTRL_TRIG] & 0xFFFFFFFE

So, before I edit/restart/save in Thonny, at the command line I enter
>>> stopDMA()
>>> sm.active(0)
Also, clearing out the PIO is necessary,
PIO(0).remove_program()
Then, saving/stopping/etc works.
4 months ago
Thanks for this - very well explained and discussed! I'm trying to use this to make an audio synthesizer. Every time I give it a new frequency I hear a pop because there is a phase jump - the new wave just starts immediately in the middle of the previous one. Any thoughts on how to address this? Can I make the state machine wait til the wave is done? Or something else?
Thanks for any thoughts...
Reply 4 months ago
Hi. The method applied here (DMA+PIO) does well at very high frequencies, but for dynamically changing the signal it is tricky to synchronize the CPU and the DMA/PIO. For audio I recommend to follow a completely different approach, and do everything with the CPU. See for example https://www.instructables.com/Arduino-Synthesizer-With-FM/ where I implemented audio synthesis with an Arduino. It has the additional advantage that there is no need for an R2R DAC, PWM is sufficient to generate audio-frequencies on a single pin. With the PICO, being at least 8x faster than the Arduino, it should be possible to synthesize spectacular audio.
Question 2 years ago
Can you explain more about how I set it up?
Where do you choose the signal output, frequency, offset, ...?
Don't you lose bits by using an offset and amplitude in the software?
Answer 2 years ago
The first wave is set in line 165:
setupwave(wavbuf[ibuf],freq,wave1); ibuf=(ibuf+1)%2
You can take it out of the loop and fill in your values of the wave properties in lines 155-162, then delete all the code below it.
Yes, resolution is compromised by doing amplitude and offset in software, and it would be better to do offset and amplification/reduction at the analog level. It takes some skill in fast analog electronics to do that well. Nevertheless, a software amp/offset remains needed to construct hybrid shapes that are sums, products or phase-modulations.
User sdwood68 has been giving some suggestions for a suitable opamp/buffer, I'd be happy to hear some more ideas!
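The loss of resolution can be estimated by counting how many of the 256 DAC codes a sine actually uses at a given software amplitude. This is a rough sketch with hypothetical names, assuming the amplitude is expressed as a fraction of full scale (0.5 fills the whole 8-bit range).

```python
import math

def distinct_levels(amplitude, offset=0.5, n=4096):
    """Count the distinct 8-bit codes used by one period of a sine."""
    codes = set()
    for i in range(n):
        v = offset + amplitude * math.sin(2 * math.pi * i / n)
        codes.add(min(255, max(0, int(256 * v))))
    return len(codes)

for amplitude in (0.5, 0.25, 0.125):
    levels = distinct_levels(amplitude)
    print(f"amplitude {amplitude}: {levels} levels, "
          f"about {math.log2(levels):.1f} bits")
```

Each halving of the software amplitude costs roughly one bit of resolution, which is the argument for scaling the signal in the analog domain instead.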
Reply 10 months ago
hi rgco,
this all works great, i isolated a wave. would like to see it in C++ on the arduino IDE, I'm using it, good on the pi pico, not yet on the xiao rp2040.
when you set up the registers of a computer, is it possible to run an 'if then' statement to control different waves? I can't seem to do that no matter where I put the code.
10 months ago
1. Simple example of how to use the 155-165 section and delete all following. Doesn't it need freq etc?
2. Where in the script does it cycle thru types of waves??? I got the rest working
Question 10 months ago on Introduction
Thank you so much for sharing this! Is there any chance of changing the 8-bit output (thus removing the need for the R2R DAC) by using that 8 bit value to produce a PWM output on a single pin? You'll need to make sure the frequency of the PWM generator was large enough not to spoil the intended signal.
Answer 10 months ago
Sure, it'd be a big hit in frequency range though, you'd go from 125 Msps to 0.5 Msps for an 8-bit PWM. But it'd be fine for audio and even a bit of ultrasound. I guess it could be done either with PIO or with the actual PWM.
Reply 10 months ago
Audio was in the back of my mind when asking my question ('emulate' one of the retro 90s Sound Cards). On using 'machine.PWM()' - I've no idea on the range of PWM frequencies allowed on the Pico's version of Micropython but it would have to be over 16 kHz to avoid young people hearing the PWM freq mixed in with your wanted audio.
People using your AWG may be interested in using Bourns 4116R chips from Mouser to avoid the hassle of soldering their own ladder, which reduces the errors to just the tolerance of the resistor values.
Thanks again for your write up. I may build an AM radio using your AWG routine!
1 year ago
Great project! I was surprised to see what small Pico Pi can do due to its fast DMA/PIO transfers (too bad that most ARM processors have much slower GPIOs).
I'm trying to understand pros/cons of this approach and I have a few questions that you may be able to address.
1. Firstly, I'm curious about the relationship between sampling rate and data buffer size (I'm accustomed to C and Python is a bit confusing for me). If I understand your code correctly, it calculates optimal number samples in the LUT and adjusts clock to match it (at least to some extent). Then you store samples for 360° of waveform period. That's just my guess considering that in your scope shots I didn't see any obvious continuous waveform truncation.
However, considering the slight mismatch between sampling rate (clock) and the number of samples in LUT I guess that there could be introduced some slight mismatch between the set frequency of the waveform and the real waveform produced by R2R.
Did you perhaps check out if there is indeed a frequency mismatch of set wave and really produced and how big it is?
2. Secondly, I'm curious to understand how you handle low frequency waveforms. I mean if you have high MSPS then the number of samples needed for one period of a low frequency signal would greatly exceed the number of samples in the LUT. For example, a 1 Hz waveform would require at least a few million discrete samples. The way I understand it, the only seeming way to use a LUT size of 65536 samples (8-bit) would be to repeat sending a number of samples in the LUT. In essence one would have to skip reading each value in each clock and re-use old values a few times before reading and using next ones in the LUT.
For example, for 12 MSPS and a waveform of 1 Hz one would need 12'000'000 discrete samples. With a LUT size of 65536 bytes it would mean that you would have to re-send one value 183 times (12000000 / 65536 = 183.1) before sending the next value in the LUT. In reality it would lower waveform resolution to 183 values for 1 Hz waveform. However, you're using automatic DMA/PIO transfers so repeating same sample several times is obviously not the case.
So, how are you pulling off 1 Hz waveform with very high sampling rate? I'm obviously missing something here.
In any case, you've done beautiful work and I thank you for sharing your work.
Reply 1 year ago
You've been having a very detailed look indeed!
Concerning 1) the frequency mismatch: it is approximately inversely proportional to the buffer size. So for a 1k buffer size the frequency mismatch is O(10^-3). The RAM can be pushed to ~100k samples, so a frequency mismatch (or frequency step) of 10^-5 is possible, but better not. To get more precise requires a phase-accumulator approach as was done in my previous (linked in the intro) Arduino-based project. It is my understanding that serious AWGs indeed implement a phase accumulator in an FPGA. I don't think it's possible with the PIO; it can be done in the CPU, but the sample rate will go down a lot.
Concerning 2) Low frequencies are achieved with prescaling the PIO clocks. It's true I did not comment on that in the description. The pio clock can be lowered by up to a factor 2^16, so a period of 1 second can be reached even with a 2k buffer.
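The figures in this reply can be checked with a little arithmetic. A sketch, assuming the default 125MHz clock and taking the maximum divider of 2^16 from the reply; the function names are made up.

```python
CLOCK_HZ = 125_000_000
MAX_DIVIDER = 1 << 16   # from the reply above

def slowest_period_s(nsamp, clock_hz=CLOCK_HZ, divider=MAX_DIVIDER):
    """Longest period of a buffer of nsamp samples at the largest divider."""
    return nsamp * divider / clock_hz

def divider_for(freq_hz, nsamp, clock_hz=CLOCK_HZ):
    """PIO clock divider that makes an nsamp-long buffer repeat at freq_hz."""
    return clock_hz / (freq_hz * nsamp)

print(f"slowest sample rate: {CLOCK_HZ / MAX_DIVIDER:.0f} samples/s")
print(f"2k buffer, max divider: period {slowest_period_s(2048):.2f} s")
print(f"divider for 1 Hz with a 2k buffer: {divider_for(1.0, 2048):.0f}")
```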
Reply 1 year ago
1. FPGA approach of calculating each sample on the run by using phase accumulator is, of course, the best and it is usually used considering their speed of math and IO. However, from what I see, currently FPGA market is dry with hardly any reasonably capable model available.
Phase accumulator approach of calculating sample to sample could be done by faster ARM by using fixed point math (even on processors with FPU there is some speed gain by using FPM) and approximation based sin(). However, I was surprised to see that great majority of ARM processors have rather slow GPIO, which would inevitably negate processing speed. There is a new ARM by Nexperia that boasts 600 MHz core and fast GPIO but cost is high and availability is even worse than FPGA
Also, an SBC like the Raspberry Pi could be used for real-time, sample-by-sample phase-accumulator calculation but, surprisingly enough, it also has very slow GPIO and on top of that it has variable latency, which could be a problem (no way to precisely time output). Using the PCIe port on the RPI and adding an external FIFO would most likely solve latency issues, but then again one still needs a fast device to read the FIFO and time samples precisely and we're back to square one.
2. Prescaling the clock is quite good solution considering that RP2040 is capable in that department. However that limits capabilities of any AM modulation. My reasoning with implementation of "skipping" frames (in fact a kind of software prescaler) was that in that case one could generate two LUTs and use timer interrupts to do just a simple math (multiplication) of samples from two LUTs. There would still be present inevitable frequency mismatch but at least AM modulation would be possible without waveform truncation.
That solution would require implementing header (a few bytes) at the beginning of LUT to store size of LUT, counter of present position in LUT, number of samples to be repeated and counter of repeated samples. Considering that all is done in integer math the time spent in the interrupt would be short.
Very crude pseudo code example in C would be something like this for a uint16_t LUT. It likely contains errors and is far from optimal working code, but I'm just trying to convey the basic idea. Obviously, it would be much better to use a uint8_t array to increase the number of available samples in SRAM, and in that case global variables or better yet a global struct could be used to hold the uint16_t header values (counters). All of this would take some time but it would still be much faster than dynamically calculating each sample in the ISR (consider it a "hybrid" of LUT and dynamic calculation).
#include <stdint.h>

//positions of the four header words at the start of each LUT
enum {SizeOfLUT, NoOfSamplesToSkip, CounterOfSkippedSamples,
      CounterOfPositionInTheLUT, HeaderWords};

//two values for multiplication
uint16_t SamplesToMultiply[2] = {0, 0};

//initializing LUT0 (malloc() could be used if careful)
//four words in header to hold counters
//this wave is low frequency modulating
//CalculatedSizeOfLUT0 and SamplesToSkip0 are placeholders to be filled in
uint16_t Wave0[CalculatedSizeOfLUT0 + HeaderWords] =
    {CalculatedSizeOfLUT0 + HeaderWords, //total size including header
     SamplesToSkip0,                     //how often each sample is repeated
     0,                                  //counter of skipped samples
     HeaderWords};                       //position, starts after the header

//initializing LUT1 (malloc() could be used if careful)
//four words in header to hold counters
//this wave is high frequency carrier
uint16_t Wave1[FixedSizeOfLUT1 + HeaderWords] =
    {FixedSizeOfLUT1 + HeaderWords, SamplesToSkip1, 0, HeaderWords};

//fill a LUT with the calculated samples
void FillLUT(uint16_t *Wave) {
    for (uint16_t n = HeaderWords; n < Wave[SizeOfLUT]; n++) {
        Wave[n] = 0; //placeholder for the calculated sample
    }
}

//go through a LUT and duplicate samples if necessary
uint16_t NextSample(uint16_t *Wave) {
    Wave[CounterOfSkippedSamples] += 1;
    if (Wave[CounterOfSkippedSamples] > Wave[NoOfSamplesToSkip]) {
        Wave[CounterOfSkippedSamples] = 0;
        Wave[CounterOfPositionInTheLUT] += 1;
        //wrap around at the end of the LUT
        if (Wave[CounterOfPositionInTheLUT] >= Wave[SizeOfLUT])
            Wave[CounterOfPositionInTheLUT] = HeaderWords;
    }
    return Wave[Wave[CounterOfPositionInTheLUT]];
}

ISR (...){
    SamplesToMultiply[0] = NextSample(Wave0);
    SamplesToMultiply[1] = NextSample(Wave1);
    //multiply current samples and send it to GPIO (perhaps by using DMA if faster)
    SendToGPIO(SamplesToMultiply[0] * SamplesToMultiply[1]);
}
Question 1 year ago
How to generate sweep frequency using pi pico ?
Answer 1 year ago
No, this method is not suitable for sweeps.