Compiled Languages

17. Compiled Languages

Although assembly language provides a more convenient interface to the instruction set architecture of our general-purpose computer than the binary instructions it executes, it is a tedious language for most programming tasks. Assembly language is deliberately low level: its elements correspond directly to actual machine instructions, guaranteeing the programmer access to the complete feature set of the machine. As an engineering abstraction, however, assembly language has notable deficiencies; these stem from the fact that assembly language is designed around the capabilities of the hardware, rather than the needs of the programmer.

In this Chapter, we indulge in a brief glimpse of higher-level programming languages.
Programming Language Features

17.1. Programming Language Features

There are many different higher-level languages available for modern computer systems, with differing features and characteristics. Generally these languages are platform independent, meaning that they can be made to run on most any computer with sufficient resources -- in contrast to the ISA-specific assembly language.

Beyond this commonality, general-purpose programming languages may advertise a number of other attractions, such as: Many popular languages like Python, Ruby, and Haskell offer these (and many other) useful features. The C language use in this course, however, eschews such bells and whistles in favor of a minimalist extreme: among popular programming languages, it is the lowest-level high-level language. It offers platform independence, and a substantially less detailed programming model than assembly language, while avoiding features that entail substantial implementation complications.

In the following sections we explore several hard problems that are punted by C.
Dangling References

17.1.1. Dangling References

Although the stack is an efficient and effective storage allocation mechanism for many programming constructs, there are a few patterns that run afoul of the requirements of stack discippline.

int *p; // A Pointer int h(int x) { int y = x*3; p = &y; return 37; } h(10); print(*p); Consider, for example, the little C program to the left. The declaration int *p declares a pointer to an int value: we expect the runtime value of p to be the address of a word in main memory containing a 32-bit integer. Note that this declaration is outside of any procedure definition; consequently, it is a statically-allocated global variable. No initial value is specified; the value of p is to be set by a call to the procedure h, defined subsequently.

The definition of h(x) includes the declaration of a local variable, y. Since y is local to the procedure h, it will be allocated dynamically from the stack on entry to h, and deallocated when h returns.
But the call to h sets p to the address of y -- the using the C syntax p = &y;.

The final two lines include a call to h, which allocates y and initializes it to 30, followed by an attempt to print the value pointed to by the pointer p. What value do we expect to be passed to print?

Unfortunately, the C language provides no guarantees as to the behavior of this program. The problem it raises is that the static pointer p is set to the address of a variable y whose lifetime is temporary. The attempt to dereference p in the call to print is executed after h has returned and y is deallocated: it contains the address in some portion of the unallocated portion of the stack region.

Such problems stem from the fact that the program keeps a pointer p to a value y beyond the lifetime of the latter: stack discipline dictates that y be deallocated when the call to h returns, rendering the address stored in p invalid. More ambitous languages solve this problem by detecting cases where storage must be allocated from a more general storage mechanism (a heap), which avoids the constraints of stack discipline.
Procedure Values and Closures

17.1.2. Procedure Values and Closures

A closely related class of problems involves treating procedures as "first-class" values -- the ability to pass them around like integer or character data, as arguments to (and values returned from) procedures, stored in data structures, etc. // Procedure-generating-procedure: make_addx(int x) { // Define addx(y) = y+3: int addx(int y) { return x+y; } return addx; } add3 = make_addx(3) z = add3(3); The invalid C code to the right shows a reasonable but naive attempt to use procedures in ways unsupported by the C language. The procedure make_addx is intended to return a new procedure, given an integer x, that simply adds x to its argument. It does so by defining a local procedure addx, and returning that procedure as the value of the call to make_addx(x).

The final two lines include a call to make_addx(3), expected to produce a new procedure add3 that simply adds 3 to its supplied integer argument; the call to add3(4), in a more perfect world, should return the value 7.

Again, however, this expectation exceeds the limited model of procedure values supported by C. A problem is that the definition of the returned add3 function involves the argument x, which no longer exists after make_addx(3) returns; thus we have another dangling reference problem. Variants of this implementation challange involve passing procedures as arguments rather than returned values; while avoiding dangling references, access to nonlocal variables in these cases requires mechanism beyond the simple stack-based model assumed by C (and used in this course).

As a consequence, C disallows nested procedure definitions like that in the above example. More ambitious languages solve this problem by representing procedures as heap-allocated data structures called closures, which bundle values of relevent non-local variables into the runtime representation of procedure value.