Assembly Language

15. Assembly Language

The remainder of this course will involve software as well as hardware structures, both in examples and exercises. In many cases the software is coded in the very simple assembly language used for symbolic representation of Beta instructions in the last chapter. This language choice serves an important pedagogic goal of demystifying the technology underlying software, owing to its direct relation to the binary instructions executed by the Beta processors that we describe.

However, the low level and machine-dependent detail involved in assembly language programming is tedious, occasionally to the point of obscuring the algorithm it is intended to represent. For this reason, we use a high-level language, C, for some examples. C is a compiled language developed in the early 1970s and still widely used for system programming; among compiled languages, its semantics are fairly close to our implementation technology, while presenting a straightforward programming model that is independent of ISA details. We assume sufficient understanding of C on the poart of the student to read and understand our examples. If you are unfamilar with C, an overview is available for your perusal.

Indeed, today's assembly languages are primarily important as a specification of the target to which programs written in higher-level languages are translated -- a process explored further in Chapter 17.
The Assembly Process

15.1. The Assembly Process

An assembler is a program that translates a symbolic assembly language program, contained in a text file, to an initial configuration of a processors main memory. Assemblers for typical machines typically produce as output a binary object file; separately assembled software components are assembled and their object files combined via a linker and loaded into memory.

In the interests of pedagogic minimalism, we have bundled our assembler into an integrated Beta simulator, BSim, which allows assembly language programs to be edited, assembled, and run without explicit linking or loading.
Assembly Language

15.2. Assembly Language

Assembly languages are often specific to an ISA, and may be quite complex. The assembly language for the Beta, named uasm (for "microassembler", translating $\mu$ to an ASCII approximation) is uncharacteristically simple. It is also ISA-independent: its customization to the Beta ISA is done within the assembly language input file itself. A summary of the uasm assembly language is contained in the BSim documentation, which should be referred to as a reference.

The assembly source language defines a sequence of 8-bit binary bytes to be loaded into consecutive byte addresses of memory.
    // Six bytes of data:
    14 15 0b10000 0x11
    3*6 20-1 
A source file may specify a sequence of expressions separated by white space, giving values of consecutive bytes within memory; the fragement to the right, for example, fills six byte locations with consecutive values from 14 through 19. The double slash // marks a comment that extends through the end of the line. Syntax of comments and expressions are largely consistent with those used in C (and other C-like languages).

    X = 2
    Y = X+1
    Z = X*Y
    X+X+Y    // 11
Symbolic names can be given to values, and subsequently used in expressions. The first three lines of the code to the left define values of 2, 3, and 6 respectively for the symbols X, Y, and Z, and generates a single byte value (11) to be placed in the next byte of memory. Note that the first three lines generate no output data: they serve only to associate values with symbols for subsequent use in the program.

      // Start filling from 0x100:
      . = 0x100
      1   2   3   4

      five = .     // Location of 5
      5   6   7   8

      . = .+16    // leave 16 bytes
nine: 9  10  11  12
The special symbol "." (period or "dot") has special significance to the assembler: it refers to the byte address of the next location in memory to be filled. Its value can be referenced or set; setting its value causes the assembler to start filling consecutive byte locations at a new address.

In the example to the left, the program directs the assembler to fill the byte location at address 0x100 with 1, 0x101 with 2, and continuing through 8 at location 0x108. During this sequence, the symbol 'five' is set to the value 0x105, containing the value 5.

The notation . = .+16 advances the value of . by 16, effectively skipping over 16 byte locations in memory. This is a common assembly language idiom for reserving a block of uninitialized memory locations, e.g. to be filled in with an array or other data structure.

The syntax nine: in the above example, called a tag, serves to define the symbol nine as the location in memory of the next byte to be assembled; it is the equivalent of typing nine=., and provides a convenient means for attaching symbolic names to locations of instructions and data.

15.2.1. Macros

The primary abstraction mechanism provided in our assembly language is the ability to define named, parameterized templates to generate sequences of values to be assembled into memory locations.

Consider, for example, our frequent need to fill a 32-bit word in memory with a given 32-bit value. Since by default our assembler fills 8-bit bytes, this need translates into a pattern involving filling four consecutive bytes with appropriate 8-bit segments of the given 32-bit value.
// Assemble a 32-bit word
//    having the value x:
.macro WORD(x) {
    x & 0xFF
    (x>>8) & 0xFF
    (x>>16) & 0xFF
    (x>>24) & 0xFF
We can encapsulate this pattern in the WORD macro whose definition is shown to the right. The first line of the definition starts with the keyword .macro, signifying a macro definition. Note the leading period, marking .macro as one of the reserved keywords in our assembly language. The keyword is followed by the name of the macro being defined, and zero or more parameter names separated by commas. The following curly bracket marks the start of the body of the macro definition, which extends through the closing curly bracket.

Subsequent code may contain invocations of the newly defined macro, each stamping out a parameterized instance of the pattern defined by its body.
// Table of Squares:

The code to the left, for example, shows the beginning of a table of squares constructed from consecutive word locations each containing the square of a natural number, represented as a 32-bit binary quantity. Note that the first location of the table is given the symbolic address squares; thus the $n^{th}$ entry in the table, containing the 32-bit value of $n^2$, will be at address squares+4*n since each entry comprises four consecutive byte locations.

// Alternative table approach:
.macro square(n)  WORD(n*n)

  square(0)  square(1)  square(2)
  square(3)  square(4)  square(5)
We could make our table easier to type (and read) by defining a new ad-hoc macro, square, and rewriting the table as shown to the left. This example uses an alternative single-line syntax for defining the macro square. Its body computes the square, and uses the WORD macro to output the 32-bit value of $n^2$ for the supplied $n$.

For each macro invocation, the assembler evaluates the supplied arguments and binds each resulting 32-bit value to a new temporary symbol whose scope includes the body of the macro. The body then undergoes the normal assembly process, except that each symbolic parameter encountered represents the 32-bit value supplied for that parameter in the macro invocation. Macro bodies can contain invocations of other macros, allowing increasingly complex structures to be defined using layers of defined macros.
Beta Instructions

15.3. Beta Instructions

The astute reader will recognise that the symbolic notation we have been using to represent Beta machine instructions is precisely the syntax of macro invocations: we customize the assembler to the Beta by defining macros to assemble each instruction. The definitions of these macros, along with other definitions relating the the Beta ISA, are collected in an assembler source file beta.uasm which will be logically included in the input for each assembly language program.

r0=0 r1=1 r2=2 ... r31=31 R0=0 R1=1 ... R31=31 ... Among the useful definitions contained in beta.uasm is the definitions of symbols r0 through r31 as the values $0$ through $31$, and similarly for the upper case R0 through R31. This allows use to refer to the numeric "name" of a register as R13 rather than simply the value 13, making our code somewhat more readable.

It is worth noting, however, that the very simple assembly language semantic simply associates an integer value with register names like R13. It will not flag nonsensical expressions like R13*R14 & 23 as errors; instead, it will happily compute a 32-bit binary value and use it for whatever purpose its context demands: as a register number, an instruction-stream constant, or a branch target.
Operate Instructions

15.3.1. Operate Instructions

Since each Beta instruction occupies a single 32-bit word in memory, our WORD macro will be handy for assembling instructions. However, since all Beta instructions share only two formats for fields such as opcode and register numbers, it is useful to define helper macros that build 32-bit instructions from the values in each contained field. We define macros for this purpose as follows:

// Support macros for defining Beta instructions:
// Assemble a format 1 (OP class) instruction:
.macro betaop(OP,RA,RB,RC) {

// Assemble a format 2 (OPC class) instruction:
.macro betaopc(OP,RA,CC,RC) {
Each of these macros assembles a 32-bit Beta instruction from four parameters giving the values to be placed into respective fields. Each field is computed by an expression like ((RC%32)<<21), which masks the value supplied for the parameter RC to five bits via the modulo 32 operation %32, then uses a left shift <<21 to shift it into its proper position within the 32-bit word. Note that the masking can alternatively be performed by the bitwise AND operator, as is done in the CC&0xFFFF term which masks the constant field of an OPC instruction to 16 bits.

.macro ADD(RA, RB, RC) betaop(0x20,RA,RB,RC) .macro ADDC(RA, C, RC) betaopc(0x30,RA,C,RC) Armed with these utility macros, the definition of OP and OPC instructions is straightforward as illustrated to the left. The body of each macro is an invocation of the macro appropriate to the format of the instruction, supplying the opcode (from the table in Section 14.8.7) along with the other passed parameters.

The assembly of a Beta instruction into a sequence of four consectutive byte values can be viewed as a succession of steps, each of which reduces an assembly source expression to an equivalent source expression in more primitive form. ADDC(R31, -3, R1) ⇒ betaopc(0x30, 31, -3,1) ⇒ WORD((0x30<<26)+ ((31%32)<&;t;21)+ ((1%32)%lt;<16)+ (-3&0xFFFF)) ⇒ WORD(0xC0000000+0x1F0000+ 0x200000+0xFFFD) ⇒ WORD(0xC03FFFFD) ⇒ 0xFD 0xFF 0x3F 0xC0 For example, the processing of symbolic instruction ADD(R31, -3, R1) is sketched to the right. The original source expression is recognized as a macro invocation which expands to a betaopc invocation with the supplied parameters. The betaopc expands in turn to a WORD invocation, whose parameter is built from an expression whose terms represent the values of successive fields of the instruction. The WORD expands to a sequence of four 8-bit bytes to be assembled into consecutive memory locations.
Branch Instructions

15.3.2. Branch Instructions

The simplest -- and most general -- instruction for loading the Beta's program counter with a new value is the JMP(Ra,Rc) instruction which loads a 32-bit value from Ra into the PC, and loads the previous incremented PC value into the destination register Rc.

.macro JMP(RA, RC) betaopc(0x1B,RA,0,RC)
.macro JMP(RA)     betaopc(0x1B,RA,0,r31)
Definitions of JMP are shown to the left. The first of these defines the full 2-parameter JMP(RA,RC) form. The second definition has a single parameter, and will be used whenever JMP is invoked with one rather than two parameters. The ability to define several forms of a macro with differing numbers of parameters offers a convenient way to provide default values for omitted parameters. Here, we have arranged that omitting specification of the destination register RC causes it to default to R31, effectively discarding the previous PC value.

The JMP instruction is sufficiently general to serve as our only means to transfer control. An arbitrary address can be computed; the computation might involve computed results and data, allowing conditional transfers. However, coding such computations is tedious, and their execution consumes time and program space; consequently, the easier-to-use conditional branch instructions BEQ and BNE are used for most control transfers in Beta programs.

The branch instructions are defined to compute the 16-bit constant field of the instruction -- the offset of the branch target from the current PC value -- from a supplied target address.
x:    ...
      BNE(R2, y, R31)
      BEQ(R1, x, R31)
y:    ...
This allows branch target locations to be specified as tags in assembly language code (such as the x: and y: to the right) and for the tags to be supplied directly in branches to the tagged locations. Recall from Section 14.8.5 that the value in the 16-bit offset field is a signed integer specifying the number of words -- e.g., full Beta instructions -- between the target address and the incremented PC value. This arithmetic is anticipated by the BETABR helper macro, defined along with the branch instructions:
// Assemble a branch instuction, computing branch offset to LABEL:

// Our two basic conditional branch instructions:
.macro BEQ(RA, LABEL, RC)       BETABR(0x1C,RA,RC,LABEL)
.macro BEQ(RA, LABEL)           BETABR(0x1C,RA,r31,LABEL)
.macro BNE(RA, LABEL, RC)       BETABR(0x1D,RA,RC,LABEL)
.macro BNE(RA, LABEL)           BETABR(0x1D,RA,r31,LABEL)

// Other forms for convenience:
.macro BF(RA, LABEL, RC) BEQ(RA,LABEL,RC)  // Brance on False
.macro BF(RA,LABEL)      BEQ(RA,LABEL)
.macro BT(RA,LABEL,RC)   BNE(RA,LABEL,RC)  // Branch on True
.macro BT(RA,LABEL)      BNE(RA,LABEL)
.macro BR(LABEL,RC)  BEQ(r31, LABEL, RC)  // Unconditional Branches
.macro BR(LABEL)     BR(LABEL, r31)
Following the definition of BETABR are the definitions of BEQ and BNE, again allowing a missing destination register to default to R31. These are followed by some other useful aliases: BF and BT, reflecting our representation of False and True by the values zero and one; and an unconditional branch BR.

It is important to recognise that the range of target locations that may be specified in a branch is limited by the 16-bit signed offset value. A branch target must be within about $2^15$ instructions -- word addresses -- of the location containing the branch, in either direction. While programs encountered in this course are small enough that this limitation will not be a constraint, realistic scale software for the Beta (and similar ISAs) needs to defer to JMP instructions for more general (i.e., non-local) transfers of control.

In practice, branches in both directions are common.
// Take absolute value of R2
       CMPLTC(R2, 0, R0)
       BF(RO, skip)
       // Do this if R2<0:
       SUB(R31, R2, R2)
skip:  ...
The code snipet to the left illustrates the use of a conditional branch to skip over a computation depending on the outcome of a comparison. Note the use of CMPLT to compute a Boolean (truth) value in R0, and the subsequent use of this value in a BF (branch on False) to skip to the tagged instruction if the test fails. The result is that the skipped code will be executed only if the tested condition is true -- in this case, if the contents of R2 are negative.

Backward branches frequently occur at the bottom of program loops,
// R3 modulo 11...
loop:  CMPLTC(R3, 11, R0)
       BT(R0, done)
       SUBC(R3, 11, R3)
done:  ...
such as the unconditional BR in the example to the right. Assuming a positive initial value in R3, this loope computes that value modulo 11 by repeatedly subtracting 11, leaving the result in R3. Note that the conditional branch at the top of the loop serves to exit the loop once the contents of R3 are below eleven.
Load/Store Instructions

15.3.3. Load/Store Instructions

The two essential instructions for loading from and storing to main memory locations, LD(RA, CC, RC) and ST(RC, CC, RA) are straightforward applications of our betaopc macro, with two noteworthy issues:
  1. Argument order: in ST, RA is specified last not first as it is in LD. This reflects our informal assembly-language convention that destinations (of which RA is a part) are specified last in the parameter list.
  2. Two-parameter versions of each macro are defined, such that an omitted RA defaults to R31. Thus for example LD(0x1234, R5) loads the contents of memory location 0x1234 into R5.
Both the LD and ST instructions compute the address of the location in memory to be accessed by adding the contents of the specified RA to the sign-extended value of the constant CC, as discussed in Section 14.8.6. As noted there, this choice reflects a number of commonly-occuring memory reference patterns in which the effected address is the sum of some constant term and a variable component resulting from computation.

We briefly examine two common patterns below.
Array Access Array Access

Arrays are a common data structure in modern programming. In most high-level languages, arrays are an aggregate data structure containing an arbitrary number of components referenced by an integer array index. Typically the elements may differ in value but not in type, meaning that the size and layout of each represented element is similar.
// Histogram array:
    int Hist[100];

    Hist[score] = Hist[score]+1;
As a simple example, consider the fragment of C code to the left. The statement int Hist[100]; declares an array of 100 elements, each of type int which we interpret to be a 32-bit integer. This code fragment doesn't specify initial values for the elements; we might assume that initialization code somewhere in the program initializes the array. In C, the elements would be indexed from 0 to 99. An individual element value may be accessed via the syntax Hist[n], where n is an integer within this range.

A typical runtime convention for representing such an array is to allocate a number of sequential memory locations to store element values, in order of their indices. The diagram to the right depicts a region of contiguous memory, with each 4-byte 32-bit location represented as a horizontal slice. The first (lowest addressed) location of the region reserved for the Hist array, termed the base address of the array, bears the tag Hist in the assembly language program. That location marks the starting location for Hist[0], the first element of the array. It is immediately followed by representations of Hist[1], Hist[2], and so on through the last element Hist[99] of the array.

Since the array elements are stored in adjacent memory locations in order of increasing indices, addresses of consecutive array elements differ by the size (in memory) of a single element. In our example, each element requires one 4-byte integer, so consecutive array elements are stored at locations that differ by 4. Since the first element Hist[0] is stored at the base address of the array, it follows that Hist[n] is stored at an address that is $4 \cdot n$ greater than the base address From our sample C code, the reference to Hist[score] should access location Hist+4*score.

// Histogram array:
hist:  . = .+4*100
   MULC(R1, 4, R2)
   LD(R2,Hist,R0)   // hist[score]
   ADDC(R0,1,R0)    // hist[score]+1
   ST(R0, Hist, R2) // hist[score] = ...
This insight allows us to translate the C code reference to assembly code, as sketched to the right. Assume the value of score is computed by code elsewhere, and left in R1. Per discussion above, this value dictates the offset between the base address of the array and the location to be accessed. The actual offset is computed by multiplying score by 4, the size in bytes of each element. A single LD instruction uses the array's base address as the constant portion of the address calculation, and the computed 4*score as the variable component, properly addressing the location containing Hist[score]. After incrementing the value, the result is stored via an ST using the same address arithmetic.
Component Access Component Access

Another class of commonly occuring data structures involves records or structs, each a template for a localized clump of hetrogeneous components in a fixed order and layout.

// Points in 2-space
struct Point {
  int x; int y;
} *p1, *p2;

int dx, dy, dsq;
    dx = p1->x - p2->x;
    dy = p1->y - p2->y;
    dsq = dx*dx + dy*dy;
For example, the C code to the left contains the definition of a structure called a Point, whose two integer components x and y define an $(x, y)$ point in some two-dimensional space (e.g., a computer screen). The two variables p1 and p2 are declared, by virtue of the asterisk before their names, to be pointers to structures of type Point. The run-time values of these variables will be addresses of data structures representing the points, presumably set by some code (not shown here) that allocates space for each structure and initializes p1 and p2 to point to them.

The declaration specifies that every Point has two integer components, x and y, and implicitly specifies an order in which they appear in the runtime representation of a Point. Every Point is represented as 2 consecutive 32-bit locations within memory, containing x and y values, respectively, for that point. Every pointshares this layout template, but each point can be in an arbitrary location within memory.

Like the arrays of the prior section, finding the address of a component of a structure involves combining a variable address component with a fixed one. In contrast to arrays, however, structure references typically involve a variable base pointer combined with a fixed offset.

// Define offsets for Point structures:
pt.x = 0  // offset of x from base adr
pt.y = 4  // offset of y from base adr
pt.size = 8  // size of a Point struct

// Variables:
dx:  . = .+4
dy:  . = .+4
dsq: . = .+4
      // p1 in R1, p2 in R2:
      LD(R1, pt.x, R3)    // p1->x
      LD(R2, pt.x, R4)    // p2->x
      SUB(R3, R4, R5)
      ST(R5, dx)

      LD(R1, pt.y, R6)    // p1->y
      LD(R2, pt.y, R7)    // p2->y
      SUB(R6, R7, R8)
      ST(R8, dy)

      MUL(R5, R5, R5)    // dx*dx
      MUL(R8, R8, R8)    // dy*dy
      ADD(R5, R8, R0)
      ST(R0, dsq)
The code to the left is a plausible assembly language translation of the C excerpt dealing with points. To aid readability, the assembler symbols pt.x and pt.y are defined to represent the constant offsets to the x and y components of every point. It then uses the familiar . = .+4 idiom to allocate space for the variables dx, dy, and dsq.

The executable code that follows assumes that the values of p1 and p2 represent base addresses of valid Point structures, and have been loaded into R1 and R2. Space for these structures might have been allocated from some dynamic storage pool, or may have been statically allocated by reserving a block of memory of size pt.size for each.

Finally, the code shown accesses the x and y components of the Point structures pointed to by p1 and p2, storing the computed deltas and squared distance into the allocated variables.
A Beta Program A Beta Program

We are now in a position to run simple programs under BSim.

The image to the right shows BSim applied to the primitive factorial program of Section 14.10; clicking on the image will launch BSim and the code shown.

The first line of the program logically includes the file beta.uasm as part of the assembly source; this generates no code, but educates the assembler about Beta-specific instructions and features.

By default, the "current address" variable "." begins at zero; recall from that this location is reserved as the RESET vector, containing the first instruction to be executed when the Beta is powered on or otherwise reset. We put an unconditional BR into location zero, transfering control to the entry point of our program. As we make use of additional Beta interrupt vectors, we will fill locations following zero with similar branches.

The remaining code is substantially the same as that of Section 14.10, followed by a HALT() instruction which, when executed, causes the simulation to halt. Click the image to run BSim on the program; if you click the Assemble button, view will switch to show the inital state of the simulated Beta, with the assembled program loaded into memory. Try clicking single step, and follow the change in cpu state as the Beta executes the next instruction. If you click play ("run"), the program runs to completion; you should see the computed $N!$ as the hex value in R0. Experiment with the program; you might try changing the value of N, or making improvements in the code itself.

15.4. References