In the reading “Introduction to ARM Assembly Language,” in the section on Data Directives, we saw how to define an array in AL.  Each of the data directives DCB, DCW, and DCD defines an array of bytes, halfwords, or words, respectively, initialized to the values specified in its operand field.  The number of elements in the array equals the number of initial values (for DCB, each character of a string counts as one value).  An array can also be defined using the % directive, which reserves an array of N bytes, all initialized to zero, where N is the value of the expression in the operand field.  This array can be used as an array of bytes, halfwords, or words, depending on how it is referenced (e.g., with LDRB, LDRH, or LDR).  For example,


        BA DCB "a+b+c",0     ;an array of 6 bytes (a character string is stored
                             ;one character per byte in a byte array)
        HA DCW 31,511,-1     ;an array of 3 halfwords (6 bytes)
        WA DCD 127,438687148 ;an array of 2 words (8 bytes)
        UA %   12            ;an array of 12 bytes, 6 halfwords, or 3 words, with
                             ;every byte initialized to zero


Note that in C (and C++) a string literal is automatically terminated with a null character (the ASCII code for nul is 00₁₆), but in AL we must explicitly include the terminating nul.
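For comparison, here is a minimal C sketch of roughly equivalent definitions.  It is only an analogy, assuming the <stdint.h> types int16_t and int32_t stand in for halfwords and words.  Note that C appends the nul to "a+b+c" automatically, so BA still occupies 6 bytes, and that a static array without an initializer, like UA, starts out zero-filled, much as the bytes reserved by % are.

```c
#include <stdint.h>

/* Byte array: C adds the terminating nul itself, so this is 6 bytes. */
static const unsigned char BA[] = "a+b+c";

/* Halfword array: 3 elements of 2 bytes each = 6 bytes. */
static const int16_t HA[] = {31, 511, -1};

/* Word array: 2 elements of 4 bytes each = 8 bytes. */
static const int32_t WA[] = {127, 438687148};

/* No initializer: static storage starts out zero-filled, like "UA % 12". */
static unsigned char UA[12];
```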


6.1 Constant Indexes and Subscripts


We diagram an array as a sequence of horizontal boxes (leftmost box has the lowest address) or vertical boxes (topmost box has the lowest address).  Each box represents one byte in PM and contains the hex value that is stored in that location.  The array BA defined above would be diagrammed as


        C reference     BA[0]  BA[1]  BA[2]  BA[3]  BA[4]  BA[5]
        PM contents     61     2B     62     2B     63     00
        AL reference    BA     BA+1   BA+2   BA+3   BA+4   BA+5



In C, an array element is referenced by suffixing its name with a subscript enclosed in square brackets.  In AL, an array element is referenced by suffixing its name with a plus sign followed by an index, except that an index of zero can be omitted.  Recall that in AL the value of a variable name is its PM address, so BA+3 is the PM address of the byte that is three bytes away from the byte whose PM address is BA.  The PM address of the first element in an array is called its base address.  The subscript of the first element of a C array is always zero, and the index of the first element of an AL array is also always zero.


It appears that an index is the same as the corresponding subscript, but this is true only for byte arrays.  In C, subscripts count array elements, while in AL, indexes count bytes.  Thus for byte arrays, indexes and subscripts are equal.  In halfword and word arrays, however, each element occupies two bytes and four bytes, respectively.  For example, the array HA defined above would be diagrammed as


        C reference     HA[0]         HA[1]         HA[2]
        PM contents     1F     00     FF     01     FF     FF
        AL reference    HA     HA+1   HA+2   HA+3   HA+4   HA+5



Since each box in an array diagram is one byte, in the diagram for a halfword array each element is a pair of boxes.  The ARM is normally run in little-endian mode, so the first of the two boxes in a pair is the lo byte of the halfword and the second box in the pair is the hi byte.  Thus, the element whose C subscript is 1 occupies the two bytes in PM at addresses HA+2 and HA+3, with its lo byte in HA+2.  In AL, a multi-byte unit is always referenced using the PM address of its lo byte, so the element HA[1] is referenced by HA+2.  The array WA defined above would be diagrammed as





        C reference     WA[0]                       WA[1]
        PM contents     7F     00     00     00     AC     D5     25     1A
        AL reference    WA     WA+1   WA+2   WA+3   WA+4   WA+5   WA+6   WA+7



A word is four bytes, so the PM addresses of successive elements differ by four.  In general, the AL reference corresponding to the C reference A[con], where con is a constant, is A+k, where k is con multiplied by the size in bytes of the elements of A.  In other words, the index corresponding to an array subscript equals the subscript multiplied by the number of bytes in each element of the array.  For example, HA[2] is HA+4 and WA[1] is WA+4.
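The rule can be checked with a small C sketch.  It models PM as a hypothetical byte array pm with HA assumed to start at address 0, and it stores each halfword lo byte first by hand, so the result does not depend on the byte order of the machine running it.  The helper names are illustrative only.

```c
#include <stdint.h>
#include <stddef.h>

/* AL index (byte offset) of element `subscript` in an array whose
   elements are `elem_size` bytes long: subscript times element size. */
static size_t al_index(size_t subscript, size_t elem_size) {
    return subscript * elem_size;
}

/* Store a halfword the little-endian way: lo byte at the lower address. */
static void store_le16(uint8_t *pm, size_t addr, uint16_t value) {
    pm[addr]     = (uint8_t)(value & 0xFF);   /* lo byte */
    pm[addr + 1] = (uint8_t)(value >> 8);     /* hi byte */
}

/* Load a halfword that, as in AL, is referenced by the address of its lo byte. */
static uint16_t load_le16(const uint8_t *pm, size_t addr) {
    return (uint16_t)(pm[addr] | (pm[addr + 1] << 8));
}
```

With HA stored this way, HA[1] (the value 511, or 01FF₁₆) is found at index 2, and pm[2] and pm[3] hold FF and 01, lo byte first.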


6.2 Variable Indexes and Subscripts


One might jump to the incorrect conclusion that the AL reference corresponding to the C reference A[i], where i is a variable, would be A+i.  This does not work because arithmetic expressions in AL are evaluated by the assembler before the program is actually executed.  Thus, every element in an AL expression must be a constant or the name of a constant; recall that a variable name such as i stands for its PM address, which is a constant, not for the value stored there.  The expression A+i in an instruction such as

                        LDR  R0,A+i


would be evaluated by the assembler as the PM address y = add(A)+add(i), the sum of the PM addresses of A and i, and y would be put in the LDR instruction as the address of the word to load into register R0.  It is unlikely that y is the address of any element in the array A.  Furthermore, no matter how the value of i changes during execution of the program, this LDR instruction will always reference the same PM location, y.  To get the effect of a variable subscript during execution of a program, we need to be able to change the address an instruction uses to reference PM.


The simplest mechanism for varying the address used by an instruction is indexed addressing.  In the ARM we put the array's base address in one register, called the base register, and the index in another register, called the index register.  For example, suppose the base address of the array BA, which is defined above, is in R5 and the index is in R6.  Then the instruction


                        LDRB  R0,[R5,R6]


will load the byte whose PM address equals cont(R5)+cont(R6) into the lo byte of R0, clearing the other 24 bits of R0.  So if the contents of R6 are varied from 0 to 5 in increments of one, the LDRB instruction will refer to each of the elements BA[0] through BA[5] in turn.  Sequencing through the elements of HA or WA is essentially the same, except that the contents of the index register must be incremented by two or four, respectively, instead of one (and, of course, the base register must contain the base address of HA or WA).  In general, the amount by which a variable index is incremented (or decremented) equals the number of bytes in each array element.  To get the base address of an array Aname into a register, we use one of the following:


                        ADR   R5,Aname

                        ADRL  R5,Aname

                        LDR   R5,=Aname


ADR can only be used when the base of Aname is within 255 bytes, if not word-aligned, or 1020 bytes, if word-aligned, of the ADR instruction.  The assembler generates one of the following instructions


                        ADD  R5,PC,#con

                        SUB  R5,PC,#con


where con is such that, when appropriately shifted and added to, or subtracted from, the current contents of PC, the result is the base address of Aname.  If the assembler is unable to do this, it will indicate an error, and one of the other forms must be used.  ADRL can only be used when the base of Aname is within 64K bytes, if not word-aligned, or 256K bytes, if word-aligned, of the ADRL instruction.  ADRL always generates two ML instructions: the first adds one suitably shifted 8-bit piece of the distance between the ADRL and the array base to (or subtracts it from) the PC contents, and the second adds or subtracts the other piece, forming the address of Aname.  If neither of these forms works, then


                        LDR  R5,=Aname


must be used.  This form is the slowest, because the address is stored in PM as a 32-bit constant (in a literal pool) that must be fetched with an extra memory access, rather than being constructed from immediate data in the instructions, as it is for both ADR and ADRL.
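The distance limits quoted for ADR come from the way the ARM encodes the immediate operand of ADD and SUB: an 8-bit value rotated right by an even number of bits.  The C sketch below (hypothetical function names) tests whether a given distance has that form.  Every distance up to 255, and every multiple of 4 up to 1020, passes, which is where the guaranteed ranges come from; a few larger distances, such as 256, happen to be encodable too, while others, such as 257, are not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* True if v is an 8-bit value rotated right by an even amount (0, 2, ..., 30),
   the immediate form accepted by ADD and SUB, and so by the ADD or SUB
   that ADR is turned into. */
static bool is_arm_immediate(uint32_t v) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a right rotation by rot; if only the low 8 bits remain, v is encodable. */
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}
```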


6.3 Indexing and Arrays


Suppose A is a byte array, B a halfword array, and C a word array, each with six elements.  The loop can keep the index, rather than the subscript i, in a register and use that register directly in indexed addressing.  For example, to sum the elements in A modulo 2⁸, assuming they are naturals:


        Sum = 0;                                 MOV  R0,#0
        for ( i = 0; i <= 5; i++ )               ADR  R5,A
                                                 MOV  R6,#0
        {                                   AGN  CMP  R6,#5
                                                 BHI  NXT
             Sum = Sum+A[i];                     LDRB R1,[R5,R6]
                                                 ADD  R0,R0,R1
        }                                        ADD  R6,R6,#1
                                                 B    AGN
                                            NXT  STRB R0,Sum


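Note that the index register always holds a full 32-bit value, whatever the size of the array elements.  To sum the elements in B modulo 2¹⁶, assuming they are integers (LDRSH sign-extends each halfword to 32 bits as it loads it):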


        Sum = 0;                                  MOV   R0,#0
        for ( i = 0; i <= 5; i++ )                ADR   R5,B
                                                  MOV   R6,#0
        {                                    AGN  CMP   R6,#10
                                                  BHI   NXT
             Sum = Sum+B[i];                      LDRSH R1,[R5,R6]
                                                  ADD   R0,R0,R1
        }                                         ADD   R6,R6,#2
                                                  B     AGN
                                             NXT  STRH  R0,Sum


and to sum the elements in C modulo 2³²:


        Sum = 0;                                  MOV  R0,#0
        for ( i = 0; i <= 5; i++ )                ADR  R5,C
                                                  MOV  R6,#0
        {                                    AGN  CMP  R6,#20
                                                  BHI  NXT
             Sum = Sum+C[i];                      LDR  R1,[R5,R6]
                                                  ADD  R0,R0,R1
        }                                         ADD  R6,R6,#4
                                                  B    AGN
                                             NXT  STR  R0,Sum


Comparing the three code sequences, the only differences are the amount by which the index register (R6) is incremented (and hence the limit it is compared against), the instructions used to load and store the array elements, and of course the address that is loaded into the base register (R5).
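The observation above can be checked with a C sketch that folds the three loops into one function parameterized by the element size.  It is a model, not ARM code: PM is a hypothetical little-endian byte array pm with the array starting at address 0, and load_elem zero-extends bytes (like LDRB), sign-extends halfwords (like LDRSH), and loads words whole (like LDR).

```c
#include <stdint.h>
#include <stddef.h>

/* Load a `size`-byte element (1, 2, or 4) stored little-endian at pm[addr]:
   bytes are zero-extended (LDRB), halfwords sign-extended (LDRSH),
   and words loaded whole (LDR). */
static uint32_t load_elem(const uint8_t *pm, size_t addr, unsigned size) {
    uint32_t v = 0;
    for (unsigned b = 0; b < size; b++)
        v |= (uint32_t)pm[addr + b] << (8 * b);   /* lo byte at the lowest address */
    if (size == 2 && (v & 0x8000u))
        v |= 0xFFFF0000u;                         /* sign-extend like LDRSH */
    return v;
}

/* Sum `count` elements of `size` bytes each, starting at address 0 of pm.
   Only the index step and the load depend on the element size; the result
   keeps the low 8*size bits, as storing with STRB, STRH, or STR would. */
static uint32_t sum_elems(const uint8_t *pm, size_t count, unsigned size) {
    uint32_t r0 = 0;                                      /* running sum */
    for (size_t r6 = 0; r6 < count * size; r6 += size)   /* index steps by element size */
        r0 += load_elem(pm, r6, size);
    return size == 4 ? r0 : (r0 & ((1u << (8 * size)) - 1u));
}
```

For the arrays BA, HA, and WA defined at the start of this reading, laid out byte by byte, this gives 124, 541, and 438687275, that is, their sums modulo 2⁸, 2¹⁶, and 2³².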


6.4 Register Indirect Addressing


This section applies only to the memory reference in load and store instructions, such as in


                        LDR  Rd,memref

                        STRB Rs,memref


Unless stated otherwise, all comments apply equally to all of the load and store instructions, even though we will use only LDR in our examples.  We divide the load and store instructions into two groups: group LS1 consists of the LDR, LDRB, STR, and STRB instructions, and group LS2 consists of the LDRSB, LDRH, LDRSH, and STRH instructions.  Every reference to PM in AL is some variant of register indirect addressing.  The basic form of memref in register indirect addressing is [Rb], as in


        (1)             LDR  Rd,[Rb]


where Rb contains the PM address of some entity, which is usually a variable or the base of an array.  The address of the referenced memory location is the address in Rb.  So if Rb contains the address of C, the current value of C is loaded into Rd.  One variant of this form is


        (2)             LDR  Rd,[Rb,#±exp]


Here, the assembler evaluates exp, which must result in a constant offset (off) that is less than 2¹² for group LS1 instructions or less than 2⁸ for group LS2 instructions.  This constant is inserted into the LDR instruction, and the address of the referenced PM location is cont(Rb)±off.  The form


        (3)             LDR  Rd,C


is really just register indirect addressing using PC as the base register; that is, writing C is equivalent to writing [PC,#±off(PC,C)], where off(PC,C) is the offset of C from the current contents of PC.  Calculating off(PC,C) is rather tricky, and thankfully the assembler does it for you.  If it is unable to do so, it will signal an error.  In this case, the address of C must be loaded into a register Rb, and then basic register indirect addressing [Rb] is used.  One variant of this form is


        (4)             LDR  Rd,C±exp


Here, writing C±exp is equivalent to writing [PC,#±off(PC,C±exp)]; that is, the assembler simply folds exp into the offset from PC.
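The offset computation the assembler performs can be sketched in C.  The sketch assumes ARM state, where reading PC yields the address of the current instruction plus 8; the function names and the sample addresses in the usage note are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of target from PC, assuming ARM state, where PC reads as the
   address of the current instruction plus 8. */
static int64_t pc_offset(uint32_t instr_addr, uint32_t target) {
    return (int64_t)target - ((int64_t)instr_addr + 8);
}

/* Offset limits from form (2): magnitude below 2^12 (LS1) or 2^8 (LS2). */
static bool fits_ls1(int64_t off) { return off > -4096 && off < 4096; }
static bool fits_ls2(int64_t off) { return off > -256 && off < 256; }
```

For an instruction at 8000₁₆ referring to a variable at 8200₁₆, the offset is 504: acceptable for LDR (group LS1) but too large for LDRH (group LS2), so the assembler would reject the LDRH form.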


In group LS1 instructions, indexed addressing is written as


        (5)             LDR  Rd,[Rb,{±}Rx{,ShiftType#ShiftAmount}]


where ShiftType is one of LSL, LSR, ASR, ROR, or RRX, and ShiftAmount is a constant expression whose value is less than 2⁵ (RRX takes no shift amount).  If a shift type and amount are included, the contents of Rx are shifted before being added to (or subtracted from) the contents of Rb, and the address of the referenced PM location is cont(Rb)±shift(cont(Rx)).  Otherwise Rx is not shifted, and the address of the referenced PM location is cont(Rb)±cont(Rx).  In group LS2 instructions, indexed addressing is written the same as for group LS1 instructions, except that no shifting is permitted.
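The address arithmetic of form (5) can be modeled in C as a sketch.  The enum and function names are hypothetical; shift amounts 0 through 31 are handled, and RRX is left out because its result depends on the carry flag.

```c
#include <stdint.h>

typedef enum { SH_LSL, SH_LSR, SH_ASR, SH_ROR } shift_type;

/* Shift the index register's contents as form (5) does.
   Amounts 0..31 only; RRX (which uses the carry flag) is not modeled. */
static uint32_t shift_index(uint32_t rx, shift_type type, unsigned amount) {
    if (amount == 0)
        return rx;                               /* no shift */
    switch (type) {
    case SH_LSL:
        return rx << amount;
    case SH_LSR:
        return rx >> amount;                     /* zeros enter at the top */
    case SH_ASR:                                 /* copies of the sign bit enter at the top */
        return (rx & 0x80000000u) ? (rx >> amount) | ~(0xFFFFFFFFu >> amount)
                                  : rx >> amount;
    case SH_ROR:
        return (rx >> amount) | (rx << (32 - amount));
    }
    return rx;
}

/* Address referenced by LDR Rd,[Rb,+/-Rx,shift]: cont(Rb) +/- shift(cont(Rx)), modulo 2^32. */
static uint32_t indexed_address(uint32_t rb, uint32_t rx, int subtract,
                                shift_type type, unsigned amount) {
    uint32_t off = shift_index(rx, type, amount);
    return subtract ? rb - off : rb + off;
}
```

For example, with Rb = 1000₁₆ and Rx = 3, the memref [Rb,Rx,LSL#2] refers to 100C₁₆ and [Rb,-Rx,LSL#2] to 0FF4₁₆; scaling by LSL#2 is what turns a word subscript into an index.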


Automatic incrementing or decrementing of the base register may be optionally specified in forms (2) and (5) above.  When this is done, we call it autoincrement addressing or autodecrement addressing, respectively.  This option is specified by appending an exclamation point (!) as a suffix on the memref, as in


        (6)             LDR  Rd,[Rb,#±exp]!

        (7)             LDR  Rd,[Rb,{±}Rx{,ShiftType#ShiftAmount}]!


In (6), after using cont(Rb)±off as the reference address, this value replaces the original contents of Rb, that is, cont(Rb) is incremented or decremented by off.  In (7), after using cont(Rb)±shift(cont(Rx)) as the reference address, this value replaces the original contents of Rb, that is, cont(Rb) is incremented or decremented by shift(cont(Rx)). Autoincrement addressing and autodecrement addressing are also known as update addressing.


All the addressing forms discussed above are pre-indexed addressing, because an index value is added to the base address before the result is used as the PM reference address.  It is also possible to have post-indexed addressing, as in


        (8)             LDR  Rd,[Rb],#±exp

        (9)             LDR  Rd,[Rb],{±}Rx{,ShiftType#ShiftAmount}


In both forms, the address of the referenced PM location is cont(Rb).  After this value is used as the reference address, the contents of Rb are incremented or decremented by off (the value of exp) or by shift(cont(Rx)).  Post-indexed addressing is always update addressing, while pre-indexed addressing may be either update or non-update addressing.
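The three behaviors (pre-indexed without update, pre-indexed with !, and post-indexed) differ only in which address is used and whether Rb changes.  A small C model with hypothetical names makes the difference explicit; the offset passed in is assumed to be already shifted and signed when it comes from a register.

```c
#include <stdint.h>

typedef enum { PRE, PRE_UPDATE, POST } index_mode;

/* Return the address used for the memory reference and update *rb as the
   addressing mode does. `off` is the signed offset: the value of exp, or
   shift(cont(Rx)) with its sign applied. */
static uint32_t reference_address(uint32_t *rb, int32_t off, index_mode mode) {
    uint32_t base  = *rb;
    uint32_t moved = base + (uint32_t)off;     /* cont(Rb) +/- off, modulo 2^32 */
    switch (mode) {
    case PRE:                                  /* [Rb,#off]    Rb unchanged        */
        return moved;
    case PRE_UPDATE:                           /* [Rb,#off]!   Rb <- moved         */
        *rb = moved;
        return moved;
    case POST:                                 /* [Rb],#off    use Rb, then update */
        *rb = moved;
        return base;
    }
    return base;
}
```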


6.5 Operations on Arrays


A few simple AL code patterns are the basis for most operations on arrays.  In Section 6.3 we saw one such pattern that summed all the elements in an array.  A common operation on an array that contains a null terminated string is to determine the length of that string.


        i = 0;                                     MOV  R6,#0
        while ( A[i] != nul )                      ADR  R5,A
        {                                     AGN  LDRB R0,[R5,R6]
                                                   CMP  R0,#0
                                                   BEQ  NXT
             i++;                                  ADD  R6,R6,#1
        }                                          B    AGN
        Leng = i;                             NXT  STR  R6,Leng


There is no error detection in this AL code.  If there is no nul in A, bytes beyond the end of A will be tested.  In fact, if there is no byte anywhere in memory containing nul, the program will loop endlessly.  Why?  Addition in a computer is always modulo, so when the address of A[i] reaches the highest PM address, adding one results in a PM address of zero.  We call this change from the highest PM address to the lowest PM address memory wraparound.  Suppose the length of the array A is N bytes.  Then the string cannot be longer than N-1 characters, since a legal null-terminated string in A must have its nul in one of A[0] through A[N-1].  If on each iteration we also test whether i has reached N, and treat that as an error, we avoid both an infinite loop and a false value for the string's length.
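A C sketch of the safer loop described above follows.  The function name is hypothetical, and returning -1 for a missing terminator is just one possible way to report the error.

```c
#include <stddef.h>

/* Length of the null-terminated string in a[0..n-1], or -1 if no nul
   occurs within the n bytes of the array. Never reads past a[n-1]. */
static long bounded_length(const unsigned char *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (a[i] == 0)
            return (long)i;
    return -1;   /* no terminator inside A: report an error instead of running on */
}
```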


Another common operation on a null terminated string is to make a copy.  Suppose the array S is the same size as the array A that was used above.  AL code that makes a copy of A and puts the copy in S is


        i = 0;                                     MOV  R6,#0
        while (1)                                  ADR  R5,A
                                                   ADR  R7,S
        { S[i] = A[i];                        AGN  LDRB R0,[R5,R6]
                                                   STRB R0,[R7,R6]
             if ( A[i] == nul ) break;             CMP  R0,#0
                                                   BEQ  NXT
             i++;                                  ADD  R6,R6,#1
        }                                          B    AGN
                                              NXT  ---


Here we must always copy at least one byte: even if the string being copied is empty (i.e., has no characters), we still need to copy the terminating nul character.
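The same copy loop in C, as a sketch with a hypothetical function name.  It assumes S is at least as large as the string in A, as stated above, and the comments relate each step to the AL instructions.

```c
#include <stddef.h>

/* Copy the null-terminated string in a into s, nul included, and return
   the number of bytes copied. The body always runs at least once, so an
   empty string still gets its terminating nul copied. */
static size_t copy_string(unsigned char *s, const unsigned char *a) {
    size_t i = 0;
    for (;;) {
        s[i] = a[i];          /* LDRB R0,[R5,R6] then STRB R0,[R7,R6] */
        if (a[i] == 0)        /* CMP R0,#0 and BEQ NXT */
            break;
        i++;                  /* ADD R6,R6,#1 */
    }
    return i + 1;
}
```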


The ability to shift the index before adding it to the base address is useful when dealing with two arrays whose elements are not the same size.  Suppose the byte array A contains N naturals and B is a halfword array with the same number of elements.  The following code squares each element of A and puts the double-length (16-bit) result into the corresponding element of B.


        for ( i = 0; i < N; i++ )                  MOV  R4,#N
                                                   ADR  R5,A
                                                   ADR  R7,B
                                                   MOV  R6,#0
        {                                     AGN  CMP  R6,R4
                                                   BHS  NXT
             B[i] = A[i]*A[i];                     LDRB R0,[R5,R6]
                                                   MUL  R3,R0,R0
                                                   ADD  R2,R7,R6,LSL#1
                                                   STRH R3,[R2]
        }                                          ADD  R6,R6,#1
                                                   B    AGN
                                              NXT  ---


The index values used in the loop, which are in R6, go from 0 to N-1, as does the subscript in the C code on the left.  This is the right sequence for a byte array, but not for a halfword array, where each index must be twice the corresponding subscript.  Because STRH belongs to group LS2, it cannot shift its index register, so the shift is done by the ADD instruction instead: its second operand, R6,LSL#1, is the index multiplied by two, and adding it to the base address in R7 gives the addresses B, B+2, B+4, and so forth.  For a word array, a group LS1 instruction could do the shift itself, as in STR R3,[R7,R6,LSL#2].
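As a rough C model of the index arithmetic, the sketch below keeps A and B in one hypothetical little-endian byte array pm and uses a single index i for both: i itself addresses A, and i << 1 addresses B.  The function name and memory layout are assumptions for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Square the byte naturals A[0..n-1] into the halfword array B[0..n-1].
   Both live in a hypothetical little-endian byte memory pm, at byte
   addresses a_base and b_base. One index i serves both arrays. */
static void square_into(uint8_t *pm, size_t a_base, size_t b_base, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t a = pm[a_base + i];          /* byte array: index = i              */
        uint32_t sq = a * a;                  /* at most 255*255, fits in 16 bits   */
        size_t addr = b_base + (i << 1);      /* halfword array: index = 2i         */
        pm[addr]     = (uint8_t)(sq & 0xFF);  /* lo byte first                      */
        pm[addr + 1] = (uint8_t)(sq >> 8);    /* then hi byte                       */
    }
}
```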


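Update addressing is useful mainly for saving instructions in the body of a loop, and thus for reducing the execution time of a program.  For example, the string-length code shown earlier in this section can be rewritten with post-indexed addressing: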


        i = 0;                                            ADR  R5,A
                                                          ADD  R4,R5,#1
        while ( A[i] != nul )                        AGN  LDRB R0,[R5],#+1
        {                                                 CMP  R0,#0
                                                          BEQ  NXT
             i++;
        }                                                 B    AGN
        Leng = i;                                    NXT  SUB  R5,R5,R4
                                                          STR  R5,Leng


Here the post-indexed addressing, which is always update addressing, adds one to the address in R5 after each use of that address to reference PM.  This eliminates the need for an explicit index, so we need neither R6 nor the ADD instruction that explicitly incremented the index.  When the loop exits, R5 has already been advanced one byte past the nul, which is why R4 is set to the address A+1 rather than A; the difference R5-R4 is then exactly the length of the string.  Adding an instruction outside of the loop does not increase the execution time much, but eliminating one instruction inside the loop results in a significant saving if the loop is executed many times.
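In C, the post-incremented pointer *p++ plays the role of post-indexed addressing.  In this sketch (hypothetical function name) the pointer has already moved one byte past the nul when the loop ends, so the length is measured from a + 1 rather than from a.

```c
#include <stddef.h>

/* String length with a post-incremented pointer instead of an index. */
static size_t length_postinc(const unsigned char *a) {
    const unsigned char *p = a;
    /* Load the byte at p, then advance p, until a nul has been loaded. */
    while (*p++ != 0)
        ;
    /* p now points one byte past the nul, so measure from a + 1. */
    return (size_t)(p - (a + 1));
}
```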


6.6 Stacks


On the ARM, a stack is an array of words, and each item on the stack is one word.  The stack pointer SP (R13) always points to the top of the stack; that is, it contains the PM address of the item currently on the top of the stack.  In PM the stack is "upside down," with the bottom of the stack at the highest address and the top of the stack at the lowest address.  For example, if the bottom of the stack is at 0007FFFC₁₆ and three items have been pushed onto the stack, it looks like


                     PM address   Contents

                      0007FFF0    ????????
                      0007FFF4    3rd item pushed    SP      current top of stack
                      0007FFF8    2nd item pushed    SP+4
                      0007FFFC    1st item pushed    SP+8    bottom of stack
                      00080000    ????????


When the stack is empty, SP contains the address of the word just past the bottom of the stack (four more than the address of the bottom word), here 00080000₁₆.  Two common operations on a stack are PUSH, which puts an item on the top of the stack, and POP, which removes the item at the top of the stack.  On the ARM, AL code to push the contents of R0 onto the stack is


                          STR R0,[SP,#-4]!


and AL code to pop the top of the stack into R1 is


                          LDR R1,[SP],#+4


To get a copy of the word on the top of the stack without popping it, use


                          LDR R1,[SP]


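and to reference the word N positions "down" from the top of the stack (N = 0 gives the top word itself), where N must be a constant because the assembler evaluates 4*N, use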


                          LDR R1,[SP,#4*N]
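The push, pop, and peek patterns can be modeled in C as follows.  This is a sketch under stated assumptions: a 16-word stack whose lowest word is at the made-up address 0007FFC0₁₆, so that the empty stack has SP = 00080000₁₆ as in the example above, and no checks for overflow or underflow.

```c
#include <stdint.h>

/* Hypothetical memory for a 16-word stack; mem[0] is at address LOWEST. */
enum { STACK_WORDS = 16 };
static uint32_t mem[STACK_WORDS];
static const uint32_t LOWEST = 0x0007FFC0u;

/* Empty stack: SP holds the address one word past the bottom, 00080000. */
static uint32_t sp = 0x0007FFC0u + 4u * STACK_WORDS;

/* Map a PM address inside the stack area to the word stored there. */
static uint32_t *word_at(uint32_t addr) { return &mem[(addr - LOWEST) / 4]; }

/* STR R0,[SP,#-4]!  decrement SP by 4, then store at the new SP. */
static void push(uint32_t r0) { sp -= 4; *word_at(sp) = r0; }

/* LDR R1,[SP],#+4   load from SP, then increment SP by 4. */
static uint32_t pop(void) { uint32_t r1 = *word_at(sp); sp += 4; return r1; }

/* LDR R1,[SP,#4*N]  the word N positions down from the top (N = 0 is the top). */
static uint32_t peek(unsigned n) { return *word_at(sp + 4u * n); }
```

Pushing 1, 2, and 3 leaves SP at 0007FFF4₁₆, as in the diagram; peek(2) then returns the first item pushed.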


Adapted from © 2000 Robert M. Graham