Assembly Language
15. Assembly Language
The remainder of this course will involve software as well as hardware structures, both in examples and exercises. In many cases the software is coded in the very simple assembly language used for symbolic representation of Beta instructions in the last chapter. This language choice serves an important pedagogic goal of demystifying the technology underlying software, owing to its direct relation to the binary instructions executed by the Beta processors that we describe. However, the low level and machine-dependent detail involved in assembly language programming is tedious, occasionally to the point of obscuring the algorithm it is intended to represent. For this reason, we use a high-level language, C, for some examples. C is a compiled language developed in the early 1970s and still widely used for system programming; among compiled languages, its semantics are fairly close to our implementation technology, while presenting a straightforward programming model that is independent of ISA details. We assume sufficient understanding of C on the poart of the student to read and understand our examples. If you are unfamilar with C, an overview is available for your perusal. Indeed, today's assembly languages are primarily important as a specification of the target to which programs written in higher-level languages are translated -- a process explored further in Chapter 17.The Assembly Process
15.1. The Assembly Process
An assembler is a program that translates a symbolic assembly language program, contained in a text file, to an initial configuration of a processors main memory.Assembly Language
15.2. Assembly Language
Assembly languages are often specific to an ISA, and may be quite complex. The assembly language for the Beta, named uasm (for "microassembler", translating $\mu$ to an ASCII approximation) is uncharacteristically simple. It is also ISA-independent: its customization to the Beta ISA is done within the assembly language input file itself. A summary of the uasm assembly language is contained in the BSim documentation, which should be referred to as a reference. The assembly source language defines a sequence of 8-bit binary bytes to be loaded into consecutive byte addresses of memory.// Six bytes of data: 14 15 0b10000 0x11 3*6 20-1
X = 2 Y = X+1 Z = X*Y X+X+Y // 11
// Start filling from 0x100: . = 0x100 1 2 3 4 five = . // Location of 5 5 6 7 8 . = .+16 // leave 16 bytes nine: 9 10 11 12
. = .+16
advances the value
of .
by 16, effectively skipping over 16 byte locations in memory.
This is a common assembly language idiom for reserving a block
of uninitialized memory locations, e.g. to be filled in with an
array or other data structure.
The syntax nine:
in the above example, called
a tag, serves to define the symbol nine
as the location in memory of the next byte to be assembled; it
is the equivalent of typing nine=.
, and provides
a convenient means for attaching symbolic names to locations
of instructions and data.
Macros
15.2.1. Macros
The primary abstraction mechanism provided in our assembly language is the ability to define named, parameterized templates to generate sequences of values to be assembled into memory locations. Consider, for example, our frequent need to fill a 32-bit word in memory with a given 32-bit value. Since by default our assembler fills 8-bit bytes, this need translates into a pattern involving filling four consecutive bytes with appropriate 8-bit segments of the given 32-bit value.// Assemble a 32-bit word // having the value x: .macro WORD(x) { x & 0xFF (x>>8) & 0xFF (x>>16) & 0xFF (x>>24) & 0xFF }
WORD
macro
whose definition is shown to the right. The first line of
the definition starts with the keyword .macro
,
signifying a macro definition. Note the leading period, marking
.macro
as one of the reserved keywords in our
assembly language. The keyword is followed by the name of the
macro being defined, and zero or more parameter names separated
by commas. The following curly bracket marks the start of the
body of the macro definition, which extends through
the closing curly bracket.
Subsequent code may contain invocations of the newly
defined macro, each stamping out a parameterized instance of
the pattern defined by its body.
// Table of Squares: squares: WORD(0) WORD(1*1) WORD(2*2) ...
squares
; thus the $n^{th}$ entry in the table,
containing the 32-bit value of $n^2$, will be at address
squares+4*n
since each entry comprises four
consecutive byte locations.
// Alternative table approach: .macro square(n) WORD(n*n) squares: square(0) square(1) square(2) square(3) square(4) square(5) ...
square
,
and rewriting the table as shown to the left. This example
uses an alternative single-line syntax for defining the macro
square
. Its body computes the square, and
uses the WORD
macro to output the 32-bit value
of $n^2$ for the supplied $n$.
For each macro invocation, the assembler evaluates the supplied
arguments and binds each resulting 32-bit value to a new temporary
symbol whose scope includes the body of the macro. The body then
undergoes the normal assembly process, except that each symbolic
parameter encountered represents the 32-bit value supplied for
that parameter in the macro invocation. Macro bodies can contain
invocations of other macros, allowing increasingly complex structures
to be defined using layers of defined macros.
Beta Instructions
15.3. Beta Instructions
The astute reader will recognise that the symbolic notation we have been using to represent Beta machine instructions is precisely the syntax of macro invocations: we customize the assembler to the Beta by defining macros to assemble each instruction. The definitions of these macros, along with other definitions relating the the Beta ISA, are collected in an assembler source file beta.uasm which will be logically included in the input for each assembly language program.r0
through r31
as the values $0$
through $31$, and similarly for the upper case R0
through
R31
. This allows use to refer to the numeric "name" of
a register as R13
rather than simply the value 13
,
making our code somewhat more readable.
It is worth noting, however, that the very simple assembly language
semantic simply associates an integer value with register names like
R13
. It will not flag nonsensical expressions like
R13*R14 & 23
as errors; instead, it will happily compute
a 32-bit binary value and use it for whatever purpose its context
demands: as a register number, an instruction-stream constant, or
a branch target.
Operate Instructions
15.3.1. Operate Instructions
Since each Beta instruction occupies a single 32-bit word in memory, ourWORD
macro will be handy for assembling
instructions. However, since all Beta instructions share only
two formats for fields such as opcode and register numbers, it
is useful to define helper macros that build 32-bit instructions
from the values in each contained field. We define macros for
this purpose as follows:
// Support macros for defining Beta instructions: // Assemble a format 1 (OP class) instruction: .macro betaop(OP,RA,RB,RC) { WORD((OP<<26)+((RC%32)<<21)+((RA%32)%lt;<16)+((RB%32)<<11)) } // Assemble a format 2 (OPC class) instruction: .macro betaopc(OP,RA,CC,RC) { WORD((OP<<26)+((RC%32)<&;t;21)+((RA%32)%lt;<16)+(CC&0xFFFF)) }
((RC%32)<<21)
, which masks the value
supplied for the parameter RC
to five bits via
the modulo 32 operation %32
, then
uses a left shift <<21
to shift it
into its proper position within the 32-bit word. Note
that the masking can alternatively be performed by the
bitwise AND operator, as is done in the
CC&0xFFFF
term which masks the constant field
of an OPC instruction to 16 bits.
ADD(R31, -3, R1)
is sketched to the right. The original source expression is recognized as a macro
invocation which expands to a betaopc
invocation with the supplied
parameters. The betaopc
expands in turn to a WORD
invocation, whose parameter is built from an expression whose terms represent
the values of successive fields of the instruction.
The WORD
expands to a sequence of four
8-bit bytes to be assembled into consecutive memory locations.
Branch Instructions
15.3.2. Branch Instructions
The simplest -- and most general -- instruction for loading the Beta's program counter with a new value is theJMP(Ra,Rc)
instruction
which loads a 32-bit value from Ra
into the PC, and loads
the previous incremented PC value into the destination register Rc
.
.macro JMP(RA, RC) betaopc(0x1B,RA,0,RC) .macro JMP(RA) betaopc(0x1B,RA,0,r31)
JMP
are shown to the left. The
first of these defines the full 2-parameter JMP(RA,RC)
form. The second definition has a single parameter, and will be used
whenever JMP
is invoked with one rather than two parameters.
The ability to define several forms of a macro with differing numbers of
parameters offers a convenient way to provide default values for omitted
parameters. Here, we have arranged that omitting specification of the
destination register RC
causes it to default to R31
,
effectively discarding the previous PC value.
The JMP
instruction is sufficiently general to serve as our
only means to transfer control. An arbitrary address can be
computed; the computation might involve computed results and data,
allowing conditional transfers.
However, coding such computations is tedious, and their execution consumes
time and program space; consequently, the easier-to-use conditional
branch instructions BEQ
and BNE
are used for
most control transfers in Beta programs.
The branch instructions are defined to compute the 16-bit constant
field of the instruction -- the offset of the branch target from
the current PC value -- from a supplied target address.
x: ... ... BNE(R2, y, R31) BEQ(R1, x, R31) ... y: ...
x:
and y:
to the right) and for the tags to be supplied directly in branches to
the tagged locations.
Recall from Section 14.8.5 that the value in the
16-bit offset field is a signed integer specifying the number of
words -- e.g., full Beta instructions -- between the target
address and the incremented PC value. This arithmetic is anticipated
by the BETABR
helper macro, defined along with the branch
instructions:
// Assemble a branch instuction, computing branch offset to LABEL: .macro BETABR(OP,RA,RC,LABEL) { betaopc(OP,RA,((LABEL-(.+4))>>2),RC) } // Our two basic conditional branch instructions: .macro BEQ(RA, LABEL, RC) BETABR(0x1C,RA,RC,LABEL) .macro BEQ(RA, LABEL) BETABR(0x1C,RA,r31,LABEL) .macro BNE(RA, LABEL, RC) BETABR(0x1D,RA,RC,LABEL) .macro BNE(RA, LABEL) BETABR(0x1D,RA,r31,LABEL) // Other forms for convenience: .macro BF(RA, LABEL, RC) BEQ(RA,LABEL,RC) // Brance on False .macro BF(RA,LABEL) BEQ(RA,LABEL) .macro BT(RA,LABEL,RC) BNE(RA,LABEL,RC) // Branch on True .macro BT(RA,LABEL) BNE(RA,LABEL) .macro BR(LABEL,RC) BEQ(r31, LABEL, RC) // Unconditional Branches .macro BR(LABEL) BR(LABEL, r31)
BETABR
are the definitions
of BEQ
and BNE
, again allowing a missing
destination register to default to R31
.
These are followed by some other useful aliases: BF
and BT
, reflecting our representation of False
and True by the values zero and one; and an unconditional
branch BR
.
It is important to recognise that the range of target locations that
may be specified in a branch is limited by the 16-bit signed offset
value. A branch target must be within about $2^15$ instructions --
word addresses -- of the location containing the branch,
in either direction.
While programs encountered in this course are small enough that this
limitation will not be a constraint, realistic scale software for the
Beta (and similar ISAs) needs to defer to JMP
instructions
for more general (i.e., non-local) transfers of control.
In practice, branches in both directions are common.
// Take absolute value of R2 CMPLTC(R2, 0, R0) BF(RO, skip) // Do this if R2<0: SUB(R31, R2, R2) skip: ...
CMPLT
to compute a Boolean (truth) value in R0
, and
the subsequent use of this value in a BF
(branch
on False) to skip to the tagged instruction if the test fails.
The result is that the skipped code will be executed only if
the tested condition is true -- in this case, if the contents
of R2
are negative.
Backward branches frequently occur at the bottom of program
loops,
// R3 modulo 11... loop: CMPLTC(R3, 11, R0) BT(R0, done) SUBC(R3, 11, R3) BR(loop) done: ...
BR
in the example to
the right. Assuming a positive initial value in R3
,
this loope computes that value modulo 11 by repeatedly subtracting
11, leaving the result in R3
.
Note that the conditional branch at the top of the loop serves
to exit the loop once the contents of R3
are below
eleven.
Load/Store Instructions
15.3.3. Load/Store Instructions
The two essential instructions for loading from and storing to main memory locations,LD(RA, CC, RC)
and ST(RC, CC, RA)
are straightforward applications of our betaopc
macro, with
two noteworthy issues:
- Argument order: in
ST
,RA
is specified last not first as it is inLD
. This reflects our informal assembly-language convention that destinations (of whichRA
is a part) are specified last in the parameter list. - Two-parameter versions of each macro are defined, such that an omitted
RA
defaults toR31
. Thus for exampleLD(0x1234, R5)
loads the contents of memory location0x1234
intoR5
.
LD
and ST
instructions compute the
address of the location in memory to be accessed by adding the contents
of the specified RA
to the sign-extended value of the
constant CC
, as discussed in Section 14.8.6.
As noted there, this choice reflects a number of commonly-occuring memory
reference patterns in which the effected address is the sum of some
constant term and a variable component resulting from
computation.
We briefly examine two common patterns below.
Array Access
15.3.3.1. Array Access
Arrays are a common data structure in modern programming. In most high-level languages, arrays are an aggregate data structure containing an arbitrary number of components referenced by an integer array index. Typically the elements may differ in value but not in type, meaning that the size and layout of each represented element is similar.// Histogram array: int Hist[100]; ... Hist[score] = Hist[score]+1; ...
int Hist[100];
declares an array of
100 elements, each of type int
which we interpret to
be a 32-bit integer.
This code fragment doesn't specify initial values for the elements;
we might assume that initialization code somewhere in the program
initializes the array.
In C, the elements would be indexed from 0 to 99.
An individual element value may be accessed via the syntax Hist[n]
,
where n
is an integer within this range.
Hist
array, termed the base address of the array,
bears the tag Hist
in the assembly language program.
That location marks the starting location for Hist[0]
,
the first element of the array.
It is immediately followed by representations of Hist[1]
,
Hist[2]
, and so on through the last element Hist[99]
of the array.
Since the array elements are stored in adjacent memory locations
in order of increasing indices, addresses of consecutive array
elements differ by the size (in memory) of a single element.
In our example, each element requires one 4-byte integer, so
consecutive array elements are stored at locations that differ
by 4.
Since the first element Hist[0]
is stored at the base address
of the array, it follows that Hist[n]
is stored at
an address that is $4 \cdot n$ greater than the base address
From our sample C code, the reference to Hist[score]
should access location Hist+4*score
.
// Histogram array: hist: . = .+4*100 ...MULC(R1, 4, R2) LD(R2,Hist,R0) // hist[score] ADDC(R0,1,R0) // hist[score]+1 ST(R0, Hist, R2) // hist[score] = ... ...
score
is computed by code elsewhere,
and left in R1
. Per discussion above, this value dictates
the offset between the base address of the array and the
location to be accessed.
The actual offset is computed by multiplying score
by 4,
the size in bytes of each element. A single LD
instruction
uses the array's base address as the constant portion of the address
calculation, and the computed 4*score
as the variable
component, properly addressing the location containing Hist[score]
.
After incrementing the value, the result is stored via an ST
using the same address arithmetic.
Component Access
15.3.3.2. Component Access
Another class of commonly occuring data structures involves records or structs, each a template for a localized clump of hetrogeneous components in a fixed order and layout.// Points in 2-space struct Point { int x; int y; } *p1, *p2; int dx, dy, dsq; ... dx = p1->x - p2->x; dy = p1->y - p2->y; dsq = dx*dx + dy*dy; ...
Point
, whose two integer
components x
and y
define an
$(x, y)$ point in some two-dimensional space (e.g., a computer
screen). The two variables p1
and p2
are declared, by virtue of the asterisk before their names,
to be pointers to structures of type Point
.
The run-time values of these variables will be addresses
of data structures representing the points, presumably set by
some code (not shown here) that allocates space for each structure
and initializes
p1
and p2
to point to them.
Point
has two
integer components, x
and y
, and implicitly
specifies an order in which they appear in the runtime representation
of a Point
. Every Point
is represented as
2 consecutive 32-bit locations within memory, containing x
and y
values, respectively, for that point. Every
point
shares this layout template, but each point can
be in an arbitrary location within memory.
Like the arrays of the prior section, finding the address of a
component of a structure involves combining a variable address
component with a fixed one. In contrast to arrays, however,
structure references typically involve a variable base pointer
combined with a fixed offset.
// Define offsets for Point structures: pt.x = 0 // offset of x from base adr pt.y = 4 // offset of y from base adr pt.size = 8 // size of a Point struct // Variables: dx: . = .+4 dy: . = .+4 dsq: . = .+4 ... // p1 in R1, p2 in R2: LD(R1, pt.x, R3) // p1->x LD(R2, pt.x, R4) // p2->x SUB(R3, R4, R5) ST(R5, dx) LD(R1, pt.y, R6) // p1->y LD(R2, pt.y, R7) // p2->y SUB(R6, R7, R8) ST(R8, dy) MUL(R5, R5, R5) // dx*dx MUL(R8, R8, R8) // dy*dy ADD(R5, R8, R0) ST(R0, dsq)
pt.x
and pt.y
are defined to represent the constant offsets to the x
and y
components of every point.
It then uses the familiar . = .+4
idiom to allocate space
for the variables dx
, dy
, and dsq
.
The executable code that follows assumes that the values of p1
and p2
represent base addresses of valid Point
structures, and have been loaded into R1
and R2
.
Space for these structures might have been allocated from some dynamic
storage pool, or may have been statically allocated by reserving a
block of memory of size pt.size
for each.
Finally, the code shown accesses the x
and y
components
of the Point
structures pointed to by p1
and
p2
, storing the computed deltas and squared distance into
the allocated variables.
A Beta Program
15.3.3.3. A Beta Program
We are now in a position to run simple programs under BSim..
" begins
at zero; recall from that this location
is reserved as the RESET vector, containing the first
instruction to be executed when the Beta is powered on or otherwise
reset. We put an unconditional BR
into location zero,
transfering control to the entry point of our program. As we
make use of additional Beta interrupt vectors, we will
fill locations following zero with similar branches.
The remaining code is substantially the same as that of
Section 14.10, followed by a HALT()
instruction which, when executed, causes the simulation to halt.
Click the image to run BSim on the program;
if you click the Assemble button, view will switch to
show the inital state of the simulated Beta, with the assembled
program loaded into memory. Try clicking single step,
and follow the change in cpu state as the Beta executes the next
instruction. If you click play ("run"), the program
runs to completion; you should see the computed $N!$ as the hex
value in R0
. Experiment with the program; you might
try changing the value of N
, or making improvements
in the code itself.
References
15.4. References
- BSim reference: A terse description of the assembly language used in this course, along with instuctions for using the Beta simulator to run and debug Beta assembly language programs.
- C Overview: A brief introduction to the C language used for examples in this course.
- Macros: A summary macros commonly used in Beta assembly language programs.
- beta.uasm: An assembler source file defining Beta-specific macros and symbols. Contains definitions for each Beta instruction, registers R0-R31 (and r0-r31 aliases), as well as some convenience macros.