DA1. Compilers¶
Statement¶
In your own words, describe the steps of compilation described by Niklaus Wirth in his 2005 text “Compiler Construction”. You must include a description of what a ‘Context Free Grammar’ is, and how it relates to the compilation process.
Answer¶
Computers understand binary, while us humans do not. We tend to write code in a high-level english-like language, which is then compiled into a low-level binary language that the computer can understand. This process is called compilation.
A Compiler (and its similar Interpreter) is a program that takes source code written in a high-level language and converts it into a low-level Intermediate Representation (IR) or machine code. The IR is then converted into machine code by an assembler or another compiler.
The language is set of rules for writing programs. The programs themselves consist of sequence of symbols that conforms to the language grammar. The grammar is a set of production or syntactic rules that govern how the symbols can be combined to form valid programs.
Once a program is written, and ordered to run using the appropriate command, the source code is passed to the compiler, the compiler perform few steps on the source until it emits the machine code, which is then executed by the computer. The steps are: Lexical Analysis, Syntax Analysis, Semantic Analysis, Code Generation, and Code Optimization (Wirth, 2017).
1. Lexical Analysis¶
- The input is the source code as a sequence of characters (string).
- The compiler breaks the string into tokens (words) and classifies them into categories (keywords, identifiers, operators, etc).
- The generated set of tokens is then outputted to the next step.
2. Syntax Analysis (Parsing)¶
- The input is the set of tokens generated by the previous step.
- The purpose of this step is to check that order or tokens is correct and conforms to the grammar of the language.
- Any syntax errors are reported to the user at this stage.
- A syntax tree is generated from tokens using a Top-Down or Bottom-Up parsing algorithm.
- The syntax tree is finalized and named an Abstract Syntax Tree (AST), the AST is then passed to the next step.
3. Semantic Analysis (Type Checking)¶
- The input is the AST generated by the previous step.
- There may be numerous steps in this stage, but the main purpose is to check that the program is semantically correct which means confirming that all operations are allowed by checking the types of the operands and the result of the operation.
- Any semantic errors (e.g. summing a string with an integer) are reported to the user at this stage.
- The AST is annotated with type information and passed to the next step.
4. Code Generation¶
- The input is the annotated AST generated by the previous step.
- The purpose of this step is to generate the machine code that the computer can understand.
- If the compiler is able to generate machine code directly, then the AST is converted to an executable and all done.
- If the compiler is not able to generate machine code directly, then the AST is converted into an Intermediate Representation (IR) which is then passed to another compiler (usually assembler) to generate the machine code.
The set of production rules are described as Context Free which is -according to Avaram Noam Chomsky- any LHS (left-hand side) non-terminal can be replaced by any RHS (right-hand side) sequence of terminals and non-terminals; and this replacement should be correct whenever a non-terminal appears in the RHS (or LHS) of a production rule (Wirth, 2017). For example, the following grammar is context free: S -> ab; a -> 1,2,3; b -> 4,5,6;
so that S
can be replaced by ab
and a
can be replaced by 1,2,3
and b
can be replaced by 4,5,6
.
References¶
- Wirth N. (2017). Compiler Construction. https://people.inf.ethz.ch/wirth/CompilerConstruction/CompilerConstruction1.pdf
- Ben-Ari, M. (2006). Understanding programming languages. Weizman Institute of Science. Chapter 3: Programming Environments.