Introduction: AVR Assembler Tutorial 3

Welcome to tutorial number 3!

Before we get started I want to make a philosophical point. Don't be afraid to experiment with the circuits and the code that we are constructing in these tutorials. Change wires around, add new components, take components out, change lines of code, add new lines, delete lines, and see what happens! It is very difficult to break anything and if you do, who cares? Nothing we are using, including the microcontroller, is very expensive and it is always educational to see how things can fail. Not only will you find out what not to do next time but, more importantly, you will know why not to do it. If you are anything like me, when you were a kid and you got a new toy it wasn't very long before you had it in pieces to see what made it tick, right? Sometimes the toy ended up irreparably damaged, but no big deal. Allowing a child to explore his curiosity even to the point of broken toys is what turns him into a scientist or an engineer instead of a dishwasher.

Today we are going to be wiring a very simple circuit and then getting a bit heavy into the theory. Sorry about this, but we need the tools! I promise we will make up for this in tutorial 4 where we will be doing some more serious circuit building and the result will be pretty cool. However, the way you need to do all of these tutorials is in a very slow, contemplative way. If you just plow through, build the circuit, copy and paste the code, and run it then, sure, it will work, but you won't learn a thing. You need to think about each line. Pause. Experiment. Invent. If you do it that way then by the end of the 5th tutorial you will be off building cool stuff and need no more tutoring. Otherwise you are simply watching rather than learning and creating.

In any case, enough philosophy, let's get started!

In this tutorial you will need:

  1. your prototyping board
  2. an LED
  3. connecting wires
  4. a resistor around 220 to 330 ohms
  5. The Instruction Set Manual: www.atmel.com/images/atmel-0856-avr-instruction-se...
  6. The Datasheet: www.atmel.com/images/Atmel-8271-8-bit-AVR-Microco...
  7. a different crystal oscillator (optional)

Here is a link to the complete collection of tutorials: https://www.instructables.com/id/Command-Line-AVR-T...

Step 1: Constructing the Circuit

The circuit in this tutorial is extremely simple. We are essentially going to write the "blink" program so all we need is the following.

Hook up an LED to PD4, then to a 330 ohm resistor, then to Ground. i.e.

PD4 ---> LED ---> R(330) ---> GND

and that is it!

The theory is going to be tough slogging though...

Step 2: Why Do We Need the Comments and the M328Pdef.inc File?

I think we should start by showing why the include file and the comments are helpful. Neither of them is actually necessary: you can write, assemble, and upload code in the same way without them and it will run perfectly well (although without the include file you may get some complaints from the assembler -- but no errors).

Here is the code we are going to write today, except that I have removed the comments and the include file:

.device ATmega328P
.org 0x0000
jmp a
.org 0x0020
jmp e
a: 
   ldi r16,0x05
   out 0x25,r16
   ldi r16,0x01
   sts 0x6e,r16
   sei
   clr r16
   out 0x26,r16
   sbi 0x0a,0x04
   sbi 0x0b,0x04
b:
   sbi 0x0b,0x04
   rcall c
   cbi 0x0b,0x04
   rcall c
   rjmp b
c:
   clr r17
d:
   cpi r17,0x1e
   brne d
   ret
e: 
   inc r17
   cpi r17, 0x3d
   brne PC+2
   clr r17
   reti

Pretty simple, right? Haha. If you assemble and upload this file, the LED will blink at a rate of 1 blink per second, with the blink lasting 1/2 second and the pause between blinks lasting 1/2 second.

However, looking at this code is hardly enlightening. If you were to write code like this and wanted to modify it or repurpose it in the future, you would have a hard time.

So let's put the comments and include file back in so that we can make some sense of it.

Step 3: Blink.asm

Here is the code we will discuss today:

;************************************
; written by: 1o_o7 
; date: <2014|10|29>
; version: 1.0
; file saved as: blink.asm
; for AVR: atmega328p
; clock frequency: 16MHz (optional)
;************************************

; Program function:--------------------
; counts off seconds by blinking an LED
;
; PD4 ---> LED ---> R(330 ohm) ---> GND
;
;--------------------------------------

.nolist
.include "./m328Pdef.inc"
.list

;==============
; Declarations:

.def temp = r16
.def overflows = r17


.org 0x0000              ; memory (PC) location of reset handler
rjmp Reset               ; jmp takes 3 cpu cycles and rjmp only 2, so unless
                         ; you need to jump more than 2K words (4K bytes)
                         ; away you only need rjmp. Some microcontrollers
                         ; therefore only have rjmp and not jmp
.org 0x0020              ; memory location of Timer0 overflow handler
rjmp overflow_handler    ; go here if a timer0 overflow interrupt occurs 

;============

Reset: 
   ldi temp,  0b00000101
   out TCCR0B, temp      ; set the Clock Select Bits CS02, CS01, CS00 to 101
                         ; this puts Timer/Counter0, TCNT0, into FCPU/1024 mode
                         ; so it ticks at the CPU freq/1024
   ldi temp, 0b00000001
   sts TIMSK0, temp      ; set the Timer Overflow Interrupt Enable (TOIE0) bit 
                         ; of the Timer Interrupt Mask Register (TIMSK0)

   sei                   ; enable global interrupts (sets the I bit in SREG)

   clr temp
   out TCNT0, temp       ; initialize the Timer/Counter to 0

   sbi DDRD, 4           ; set PD4 to output

;======================
; Main body of program:

blink:
   sbi PORTD, 4          ; turn on LED on PD4
   rcall delay           ; delay will be 1/2 second
   cbi PORTD, 4          ; turn off LED on PD4
   rcall delay           ; delay will be 1/2 second
   rjmp blink            ; loop back to the start
  
delay:
   clr overflows         ; set overflows to 0 
   sec_count:
     cpi overflows,30    ; compare number of overflows and 30
   brne sec_count        ; branch back to sec_count if not equal
   ret                   ; if 30 overflows have occurred return to blink

overflow_handler: 
   inc overflows         ; add 1 to the overflows variable
   cpi overflows, 61     ; compare with 61
   brne PC+2             ; Program Counter + 2 (skip next line) if not equal
   clr overflows         ; if 61 overflows occurred reset the counter to zero
   reti                  ; return from interrupt

As you can see, my comments are a bit more brief now. Once we know what the commands in the instruction set do we don't need to explain that in comments. We only need to explain what is going on from the point of view of the program.

We will be discussing what all of this does piece by piece, but first let's try to get a global perspective. The main body of the program works as follows.

First we set bit 4 of PORTD with "sbi PORTD, 4". This sends a 1 to PD4, which puts the voltage on that pin at 5V and turns on the LED. We then call the "delay" subroutine, which counts out 1/2 a second (we will explain how it does this later). We then return to blink and clear bit 4 of PORTD, which sets PD4 to 0V and hence shuts off the LED. We then delay for another 1/2 second, and then jump back to the beginning of blink again with "rjmp blink".

You should run this code and see that it does what it should.

And there you have it! That is all this code does physically. The internal mechanics of what the microcontroller is doing are a bit more involved and that is why we are doing this tutorial. So let's discuss each section in turn.

Step 4: .org Assembler Directives

We already know what the .nolist, .list, .include, and .def assembler directives do from our previous tutorials, so let's first take a look at the 4 lines of code that come after that:

.org 0x0000
rjmp Reset
.org 0x0020
rjmp overflow_handler

The .org statement tells the assembler where in "Program Memory" to put the next statement. As your program executes, the "Program Counter" (abbreviated as PC) contains the address of the current line being executed. So in this case when the PC is at 0x0000 it will see the command "rjmp Reset" residing in that memory location. The reason we want to put rjmp Reset in that location is because when the program begins, or the chip is reset, the PC starts executing code at this spot. So, as we can see, we have just told it to immediately "jump" to the section labeled "Reset". Why did we do that? That means that the last two lines above are just being skipped over! Why?

Well, that is where things get interesting. You are now going to have to open up a pdf viewer with the full ATmega328p datasheet that I pointed to on the first page of this tutorial (that is why it is item 6 in the "you will need" section). If your screen is too small, or you have way too many windows open already (as is the case with me), you could do what I do and put it on an Ereader, or your Android phone. You will be using it all the time if you plan on writing assembly code. The cool thing is that all microcontrollers are organized in very similar ways and so once you get used to reading datasheets and coding from them you will find it almost trivial to do the same for a different microcontroller. So we are actually learning how to use all microcontrollers in a sense and not just the atmega328p.

Okay, turn to page 18 in the datasheet and take a look at Figure 8-2.

This is how the Program Memory in the microcontroller is set up. You can see that it starts with address 0x0000 and is separated into two sections; an application flash section and a boot flash section. If you refer briefly to page 277 table 27-14 you will see that the application flash section takes up the locations from 0x0000 to 0x37FF and the boot flash section takes up the remaining locations from 0x3800 to 0x3FFF.

Exercise 1: How many locations are there in the Program memory? I.e. convert 3FFF to decimal and add 1 since we start counting at 0. Since each memory location is 16 bits (or 2 bytes) wide what is the total number of bytes of memory? Now convert this to kilobytes, remembering that there are 2^10 = 1024 bytes in a kilobyte. The boot flash section goes from 0x3800 to 0x3FFF, how many kilobytes is this? How many kilobytes of memory remain for us to use to store our program? In other words, how big can our program be? Finally, how many lines of code can we have?
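
If you want to check your arithmetic for Exercise 1, here is a minimal sketch in Python, used here purely as a calculator. The helper name section_size is made up for this illustration, and it assumes, as stated above, that each program memory location is 2 bytes wide. Only the application section is worked out; the other ranges are left for you to plug in.

```python
BYTES_PER_WORD = 2  # each program memory location is 16 bits wide

def section_size(first, last):
    """Return (locations, bytes, kilobytes) for an inclusive address range."""
    locations = last - first + 1          # +1 because both ends are included
    size_bytes = locations * BYTES_PER_WORD
    return locations, size_bytes, size_bytes / 1024

# Application flash section from the datasheet table: 0x0000 to 0x37FF
print(section_size(0x0000, 0x37FF))
```

Call it again with the boot section and the whole-memory range and compare with what you worked out by hand.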

Alright, now that we know all about the organization of the flash program memory, let's continue with our discussion of the .org statements. We see that the first memory location 0x0000 contains our instruction to jump to our section we labeled Reset. Now we see what the ".org 0x0020" statement does. It says that we want the instruction on the next line to be placed at memory location 0x0020. The instruction we have placed there is a jump to a section in our code that we have labeled "overflow_handler"... now why the heck would we demand that this jump be placed at memory location 0x0020? To find out, we turn to page 65 in the datasheet and take a look at Table 12-6.

Table 12-6 is a table of "Reset and Interrupt Vectors" and it shows exactly where the PC will go when it receives an "interrupt". For example, look at Vector number 1. The "source" of the interrupt is "RESET", which is defined as "External Pin, Power-on Reset, Brown-out Reset, and Watchdog system reset", meaning that if any of those things happen to our microcontroller, the PC will start executing our program at program memory location 0x0000. What about our .org directive then? Well, we placed a command at memory location 0x0020 and if you look down the table you will see that if a Timer/Counter0 overflow happens (coming from TIMER0 OVF) it will execute whatever is at location 0x0020. So whenever that happens, the PC will jump to the spot we labeled "overflow_handler". Cool right? You will see in a minute why we did this, but first let's finish up this step of the tutorial with an aside.
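
The pattern in that table can be sketched in a couple of lines of Python. This is only an illustration: the function name vector_address is made up, and it assumes that every vector slot is 2 program-memory words wide and that TIMER0 OVF is vector number 17, which is how I read the table for the ATmega328P. Check both against your copy of the datasheet.

```python
WORDS_PER_VECTOR = 2  # assumption: each vector slot is 2 words wide

def vector_address(vector_number):
    """Program memory address of a 1-based interrupt vector number."""
    return (vector_number - 1) * WORDS_PER_VECTOR

print(hex(vector_address(1)))    # RESET
print(hex(vector_address(17)))   # TIMER0 OVF
```

Try a few other vector numbers and compare the results with the address column in the table.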

If we want to make our code more neat and tidy we should really replace the 4 lines we are currently discussing with the following (see page 66):

; each vector slot is 2 words wide, so .org is used to place every entry
.org 0x0000
rjmp Reset            ; PC = 0x0000
.org 0x0002
reti                  ; PC = 0x0002
.org 0x0004
reti                  ; PC = 0x0004
...
.org 0x001E
reti                  ; PC = 0x001E
.org 0x0020
rjmp overflow_handler ; PC = 0x0020
.org 0x0022
reti                  ; PC = 0x0022
...
.org 0x0030
reti                  ; PC = 0x0030
.org 0x0032
reti                  ; PC = 0x0032

So that if a given interrupt occurs it will just "reti" which means "return from interrupt" and nothing else happens. But if we never "Enable" these various interrupts, then they will not be used and we can put program code in these spots. In our current "blink.asm" program we are only going to enable the timer0 overflow interrupt (and of course the reset interrupt which is always enabled) and so we won't bother with the others.

How do we "enable" the timer0 overflow interrupt then? ... that is the subject of our next step in this tutorial.

Step 5: Timer/Counter 0

Take a look at the above picture. This is the decision making process of the "PC" when some outside influence "interrupts" the flow of our program. The first thing it does when it gets a signal from outside that an interrupt has occurred is check whether we have set the "interrupt enable" bit for that type of interrupt. If we haven't, then it just continues to execute our next line of code. If we have set that particular interrupt enable bit (so that there is a 1 in that bit location instead of a 0) it will then check whether or not we have enabled "global interrupts". If not, it will again go to the next line of code and continue. If we have enabled global interrupts as well, then it will go to the Program Memory location of that type of interrupt (as shown in Table 12-6) and execute whatever command we have placed there. So let's see how we have implemented all this in our code.
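
The two decisions described above can be written out as a tiny Python model. This is a simplified sketch of the flow as this tutorial describes it, not of the real hardware, and the function and argument names are invented for the illustration.

```python
def interrupt_response(enable_bit_set, global_interrupts_on, vector_address):
    """Return the address the PC goes to, or None if the program just carries on."""
    if not enable_bit_set:
        return None           # this type of interrupt is not enabled
    if not global_interrupts_on:
        return None           # global interrupts are switched off
    return vector_address     # both checks passed: go to the vector

print(interrupt_response(True, True, 0x0020))    # 32, i.e. 0x0020
print(interrupt_response(True, False, 0x0020))   # None
print(interrupt_response(False, True, 0x0020))   # None
```

Only the first call gets to the vector, which is why our code has to set both bits before anything interesting happens.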

The Reset labeled section of our code begins with the following two lines:

Reset:
   ldi temp,  0b00000101
   out TCCR0B, temp

As we already know, this loads into temp (i.e. R16) the number immediately following, which is 0b00000101. Then it writes this number out to the register called TCCR0B using the "out" command. What is this register? Well, let's head over to page 614 of the datasheet. This is in the middle of a table summarizing all of the registers. At address 0x25 you will find TCCR0B. (Now you know where the line "out 0x25,r16" came from in my un-commented version of the code). We see by the code segment above that we have set the 0th bit and the 2nd bit and cleared all of the rest. By looking at the table you can see that this means we have set CS00 and CS02.

Now let's head over to the chapter in the datasheet called "8-bit Timer/Counter0 with PWM". In particular, go to page 107 of that chapter. You will see the same description of the "Timer/Counter Control Register B" (TCCR0B) register that we just saw in the register summary table (so we could have come straight here, but I wanted you to see how to use the summary tables for future reference). The datasheet continues to give a description of each of the bits in that register and what they do. We will skip all that for now and turn the page to Table 15-9. This table shows the "Clock Select Bit Description". Now look down that table until you find the line that corresponds to the bits that we just set in that register. The line says "clk/1024 (from prescaler)".

What this means is that we want Timer/Counter0 (TCNT0) to tick along at a rate which is the CPU frequency divided by 1024. Since our microcontroller is fed by a 16MHz crystal oscillator, the CPU clock runs at 16 million cycles per second (and since most instructions take a single cycle, that is close to 16 million instructions per second). So the rate that our TCNT0 counter will tick is then 16 million/1024 = 15625 times per second (try it with different clock select bits and see what happens - remember our philosophy?). Let's keep the number 15625 in the back of our mind for later and move on to the next two lines of code:
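
Here is the same arithmetic as a small Python sketch, in case you want to see the tick rate for the other clock select settings before trying them on the chip. The names are invented for this illustration, and the divisors are my reading of the Clock Select table for Timer/Counter0; double check them against Table 15-9.

```python
F_CPU = 16_000_000  # 16 MHz crystal, as used in this tutorial

# Clock select bits (CS02 CS01 CS00) -> prescaler divisor
PRESCALERS = {0b001: 1, 0b010: 8, 0b011: 64, 0b100: 256, 0b101: 1024}

def timer_ticks_per_second(clock_select, f_cpu=F_CPU):
    """Rate at which TCNT0 counts for a given clock select setting."""
    return f_cpu / PRESCALERS[clock_select]

for bits, divisor in PRESCALERS.items():
    rate = timer_ticks_per_second(bits)
    print(f"{bits:03b}  clk/{divisor:<5} {rate:>12.1f} ticks/s")
```

The last row is the setting used in blink.asm and gives the 15625 figure from the text.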

ldi temp, 0b00000001
sts TIMSK0, temp

This sets the 0th bit of a register called TIMSK0 and clears all of the rest. If you take a look at page 109 in the datasheet you will see that TIMSK0 stands for "Timer/Counter Interrupt Mask Register 0" and our code has set the 0th bit which is named TOIE0 which stands for "Timer/Counter0 Overflow Interrupt Enable"... There! Now you see what this is all about. We now have the "interrupt enable bit set" as we wanted from the first decision in our picture at the top. So now all we have to do is enable "global interrupts" and our program will be able to respond to this type of interrupt. We will enable global interrupts shortly, but before we do that you may have been confused by something: why the heck did I use the command "sts" to copy into the TIMSK0 register instead of the usual "out"?

Whenever you see me use an instruction that you haven't seen before the first thing you should do is turn to page 616 in the datasheet. This is the "Instruction Set Summary". Now find the instruction "STS" which is the one I used. It says it takes a number from an R register (we used R16) and "Store direct to SRAM" location k (in our case given by TIMSK0). So why did we have to use "sts", which takes 2 clock cycles (see last column in table), to store in TIMSK0 when we only needed "out", which takes only one clock cycle, to store in TCCR0B before?

To answer this question we need to go back to our register summary table on page 614. You see that the TCCR0B register is at address 0x25 but also at (0x45) right? This means that it is a register in data memory (at 0x45), but it is also a certain type of register called a "port" (or I/O register) with its own I/O address (0x25). If you look at the instruction summary table beside the "out" command you will see that it takes values from the "working registers" like R16 and sends them to a PORT. So we can use "out" when writing to TCCR0B and save ourselves a clock cycle. But now look up TIMSK0 in the register table. You see that it only has the bracketed address (0x6E). This is outside the range of ports (which only cover I/O addresses 0x00 to 0x3F, i.e. 0x20 to 0x5F in data memory) and so you have to fall back to using the sts command and taking two CPU clock cycles to do it. Please read Note 4 at the end of the register summary table on page 615 right now.

Also notice that all of our input and output ports, like PORTD, are located at the bottom of the table. For example, PD4 is bit 4 at address 0x0b (now you see where all the 0x0b stuff came from in my un-commented code!). Okay, quick question: did you change the "sts" to "out" and see what happens? Remember our philosophy! Break it! Don't just take my word for things.
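
The rule of thumb for choosing between "out" and "sts" can be sketched in Python like this. The function name is made up, r16 is just the register we happen to use in this tutorial, and the sketch assumes you feed it the bracketed (data memory) address from the register summary table.

```python
IO_OFFSET = 0x20  # data memory address = I/O address + 0x20

def store_instruction(data_address):
    """Pick the instruction that can write a register at this data memory address."""
    io_address = data_address - IO_OFFSET
    if 0x00 <= io_address <= 0x3F:
        return f"out 0x{io_address:02x}, r16"    # reachable as a port
    return f"sts 0x{data_address:02x}, r16"      # outside the port range

print(store_instruction(0x45))   # TCCR0B, listed as 0x25 (0x45)
print(store_instruction(0x6E))   # TIMSK0, listed as (0x6E) only
```

The two printed lines match the corresponding lines in the un-commented code from Step 2.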

Okay, before we move on, turn to page 19 in the datasheet for a minute. You see a picture of the data memory (SRAM). The first 32 locations (from 0x0000 to 0x001F) are the "general purpose working registers" R0 through R31 that we use all the time as variables in our code. The next 64 locations are the I/O ports, up to 0x005F (i.e. the ones we were talking about that have those un-bracketed addresses beside them in the register table, for which we can use the "out" command instead of "sts"). Finally, the next section contains all the other registers in the summary table up to address 0x00FF, and the rest is internal SRAM. Now quickly, let's turn to page 12 for a second. There you see a table of the "general purpose working registers" that we always use as our variables. You see the thick line between numbers R0 to R15 and then R16 to R31? That line is why we always use R16 as the smallest one: instructions that take an immediate value, such as ldi and cpi, only work on R16 through R31. I will get into it a bit more in the next tutorial where we will also need the three 16-bit indirect address registers, X, Y, and Z. I won't get into that just yet though since we don't need it now and we are getting bogged down enough here.

Flip back one page to page 11 of the datasheet. You will see a diagram of the SREG register at the top right? You see that bit 7 of that register is called "I". Now go down the page and read the description of Bit 7.... yay! It is the Global Interrupt Enable bit. That is what we need to set in order to pass through the second decision in our diagram above and allow timer/counter overflow interrupts in our program. So you might expect the next line of our program to read:

sbi SREG, I

which would set the bit called "I" in the SREG register. However, sbi only works on the lowest 32 I/O registers (addresses 0x00 to 0x1F) and SREG lives at 0x3F, so that line will not actually assemble. Instead we have used the instruction

sei

This bit is set so often in programs that they made a dedicated instruction for it.

Okay! Now we have got the overflow interrupts ready to go so that our "rjmp overflow_handler" will be executed whenever one occurs.

Before we move on, take a quick look at the SREG register (Status Register) because it is very important. Read what each of the flags represents. In particular, many of the instructions that we use will set and check these flags all the time. For example, later on we will be using the command "CPI" which means "compare immediate". Take a look at the instruction summary table for this instruction and notice how many flags it sets in the "flags" column. These are all flags in SREG and our code will be setting them and checking them constantly. You will see examples shortly. Finally the last bit of this section of code is:

clr temp
out TCNT0, temp
sbi DDRD,4

The last line here is pretty obvious. It just sets the 4th bit of the Data Direction Register for PortD causing PD4 to be OUTPUT.

The first two lines set the variable temp to zero and then copy that out to the TCNT0 register. TCNT0 is our Timer/Counter0, so this sets it to zero. As soon as the PC executes this line the timer0 will start again from zero and count at a rate of 15625 times every second. The problem is this: TCNT0 is an "8-bit" register right? So what is the largest number that an 8-bit register can hold? Well 0b11111111 is it. This is the number 0xFF, which is 255. So you see what happens? The timer is zipping along increasing 15625 times a second and every time it goes past 255 it "overflows" and goes back to 0 again. At the same time as it goes back to zero it sends out a Timer Overflow Interrupt signal. The PC gets this and you know what it does by now right? Yep. It goes to Program Memory location 0x0020 and executes the instruction it finds there.
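
If the wrap-around is hard to picture, here is a little Python model of an 8-bit counter. It is only a sketch for illustration (the function name is invented) and it ignores everything about the real timer except the counting.

```python
def count_overflows(ticks, start=0):
    """Tick an 8-bit counter `ticks` times; return (final value, overflows)."""
    value = start
    overflows = 0
    for _ in range(ticks):
        value = (value + 1) & 0xFF   # keep only the low 8 bits
        if value == 0:               # wrapped from 255 back to 0
            overflows += 1
    return value, overflows

print(count_overflows(255))     # (255, 0): not overflowed yet
print(count_overflows(256))     # (0, 1): one more tick and it wraps
print(count_overflows(15625))   # (9, 61): one second's worth of ticks
```

Keep the result of that last line in mind, because the same number shows up in the overflow handler.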

Great! If you are still with me then you are a tireless superhero! Let's keep going...

Step 6: Overflow Handler

So let's assume that the timer/counter0 register has just overflowed. We now know that the program receives an interrupt signal and executes the instruction at 0x0020, which tells the Program Counter, PC, to jump to the label "overflow_handler". The following is the code we wrote after that label:

overflow_handler:
   inc overflows
   cpi overflows, 61
   brne PC+2
   clr overflows
   reti

The first thing it does is increment the variable "overflows" (which is our name for general purpose working register R17) then it "compares" the contents of overflows with the number 61. The way that the instruction cpi works is that it simply subtracts the two numbers and if the result is zero it sets the Z flag in the SREG register (I told you we would be seeing this register all the time). If the two numbers are equal then the Z flag will be a 1, if the two numbers are not equal then it will be a 0.

The next line says "brne PC+2" which means "branch if not equal". Essentially, it checks the Z flag in SREG and if it is NOT a one (i.e. the two numbers are not equal, if they were equal, the zero flag would be set) the PC branches to PC+2, meaning it skips the next line and goes straight to "reti" which returns from the interrupt to whatever place it was in the code when the interrupt arrived. If the brne instruction found a 1 in the zero flag bit it would not branch and instead it would just continue to the next line which would clr overflows resetting it to 0.
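
Here is the handler again as a line-by-line Python model, with the assembly instruction each line stands for in the comments. It is a rough sketch: only the Z flag is modelled, and the Python function names are made up for the illustration.

```python
def cpi(register, constant):
    """Model of cpi: return the Z flag, True when register - constant is zero."""
    return ((register - constant) & 0xFF) == 0

def overflow_handler(overflows):
    """Model of the handler; returns the new value of `overflows`."""
    overflows = (overflows + 1) & 0xFF   # inc overflows
    z_flag = cpi(overflows, 61)          # cpi overflows, 61
    if not z_flag:                       # brne PC+2: not equal, skip the clr
        return overflows                 # reti
    overflows = 0                        # clr overflows
    return overflows                     # reti

print(overflow_handler(5))    # 6
print(overflow_handler(60))   # 0
```

So the value simply climbs by one per call until it would become 61, at which point it is reset.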

What is the net result of all this?

Well we see that every time there is a timer overflow this handler increases the value of "overflows" by one. So the variable "overflows" is counting the number of overflows as they occur. Whenever the number reaches 61 we reset it to zero.

Now why in the world would we do that?

Let's see. Recall that our clock speed for our CPU is 16MHz and we "prescaled" it using TCCR0B so that the timer only counts at a rate of 15625 counts per second right? And every time the timer counts past 255 it overflows, so each overflow takes 256 ticks (0 through 255). So that means it overflows 15625/256 = 61.04 times per second. We are keeping track of the number of overflows with our variable "overflows" and we are comparing that number with 61. So we see that "overflows" will equal 61 once every second (very nearly)! So our handler will reset "overflows" to zero once every second. So if we were to simply monitor the variable "overflows" and take note of each time it resets to zero we would be counting second-by-second in real time (Note that in the next tutorial we will show how to get a more exact delay in milliseconds the same way that the Arduino "delay" routine works).
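
If you would like to see just how close to one second we get, the numbers can be checked with a few lines of Python. This is only a calculator sketch using the values from this tutorial (a 16 MHz crystal and the clk/1024 prescaler).

```python
F_CPU = 16_000_000         # 16 MHz crystal
PRESCALER = 1024           # clock select bits 101
TICKS_PER_OVERFLOW = 256   # TCNT0 counts 0..255, then wraps

overflows_per_second = F_CPU / PRESCALER / TICKS_PER_OVERFLOW
seconds_per_overflow = 1 / overflows_per_second

print(round(overflows_per_second, 2))        # 61.04
print(round(61 * seconds_per_overflow, 4))   # 0.9994
```

So 61 overflows take a hair under one second, which is close enough for a blinking LED.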

Now we have "handled" the timer overflow interrupts. Make sure you understand how this works and then move on to the next step where we make use of this fact.

Step 7: Delay

Now that we have seen that our timer overflow interrupt handler "overflow_handler" routine will set the variable "overflows" to zero once each second we can use this fact to design a "delay" subroutine.

Take a look at the following code from under our delay: label

delay:
   clr overflows
   sec_count:
     cpi overflows,30
   brne sec_count
   ret

We are going to call this subroutine every time we need a delay in our program. The way it works is it first sets the variable "overflows" to zero. Then it enters an area labeled "sec_count" and compares overflows with 30. If they are not equal it branches back to the label sec_count and compares again, and again, etc. until they are finally equal (remember that the whole time this is going on our timer interrupt handler is continuing to increment the variable overflows and so it is changing each time we go around here). When overflows finally equals 30 it gets out of the loop and returns to wherever we called delay: from. The net result is a delay of 1/2 second.
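
To see the delay routine and the overflow handler working together, here is a rough Python simulation. It is a sketch under simplifying assumptions: time only advances in whole timer overflows, the handler is folded into the loop, and the names simulate_delay, threshold and wrap_at are invented for the illustration.

```python
SECONDS_PER_OVERFLOW = 256 * 1024 / 16_000_000   # 16 MHz clock, clk/1024 prescaler

def simulate_delay(threshold=30, wrap_at=61):
    """Count the timer overflows that pass before delay returns."""
    overflows = 0                    # clr overflows
    elapsed = 0
    while overflows != threshold:    # cpi overflows,30 / brne sec_count
        # a timer overflow interrupt arrives and the handler runs
        overflows += 1
        if overflows == wrap_at:
            overflows = 0
        elapsed += 1
    return elapsed

n = simulate_delay()
print(n, round(n * SECONDS_PER_OVERFLOW, 4))   # 30 0.4915
```

Note that in this model a threshold of 61 or more would never be reached, because the handler resets the count first, so do not call the function with such a value or it will loop forever.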

Exercise 2: Change the overflow_handler routine to the following:

overflow_handler:
   inc overflows
   reti

and run the program. Is anything different? Why or why not?

Step 8: Blink!

Finally let's look at the blink routine:

blink:
   sbi PORTD, 4
   rcall delay
   cbi PORTD, 4
   rcall delay
   rjmp blink

First we turn on PD4, then we rcall our delay subroutine. We use rcall so that when the PC gets to a "ret" statement it will come back to the line following rcall. Then the delay routine delays for 30 counts in the overflows variable as we have seen, and this is almost exactly 1/2 second. Then we turn off PD4, delay another 1/2 second, and go back to the beginning again.

The net result is a blinking LED!

I think you will now agree that "blink" is probably not the best "hello world" program in assembly language.

Exercise 3: Change the various parameters in the program so that the LED blinks at different rates like a second or 4 times a second, etc.

Exercise 4: Change it so that the LED is on and off for different amounts of time. For example on for 1/4 second and then off for 2 seconds or something like that.

Exercise 5: Change the TCCR0B clock select bits to 100 and then continue going up the table. At what point does it become indistinguishable from our "hello.asm" program from tutorial 1?

Exercise 6 (optional): If you have a different crystal oscillator, like a 4 MHz or a 13.5 MHz or whatever, change out your 16 MHz oscillator on your breadboard for the new one and see how that affects the blinking rate of the LED. You should now be able to go through the precise calculation and predict exactly how it will affect the rate.

Step 9: Conclusion

For those of you die-hards who made it this far, Congratulations!

I realize it is pretty hard slogging when you are doing more reading and looking up than you are wiring and experimenting but I hope you have learned the following important things:

  1. How Program Memory works
  2. How SRAM works
  3. How to look up registers
  4. How to look up instructions and know what they do
  5. How to implement interrupts
  6. How the CPU executes the code, how the SREG works, and what happens during interrupts
  7. How to do loops and jumps and bounce around in the code
  8. How important it is to read the datasheet!
  9. How once you know how to do all this for the Atmega328p microcontroller it will be a relative cake walk to learn any new controllers that you are interested in.
  10. How to change CPU time into real time and use it in delay routines.

Now that we have a lot of theory out of the way we are able to write better code and control more complicated things. So the next tutorial we will be doing just that. We will build a more complicated, more interesting, circuit and control it in fun ways.

Exercise 7: "Break" the code in various ways and see what happens! Scientific curiosity baby! Somebody else can wash the dishes right?

Exercise 8: Assemble the code using the "-l" option to generate a list file. I.e. "avra -l blink.lst blink.asm" and take a look at the list file.

Extra Credit: The un-commented code that I gave at the beginning and the commented code that we discuss later differ! There is one line of code that is different. Can you find it? Why doesn't that difference matter?

Hope you had fun! See ya next time...