CC-Summer-2006: CKPMcc

Clemens Krainer & Peter Mödlhammer C-Compiler

Project Team

Project Description

We implement a compiler package that accepts a subset of the C programming language.

This package will contain:

  1. a pre-processor that combines source files to compilation units.
  2. a compiler that transforms the compilation units to machine code in object files.
  3. a link editor that combines object files to an executable.
  4. a virtual machine that can load and run these executable files.

Our goal is to design and implement a complete compiler that compiles itself and to meet the ProjectRequirements.

Content

  1. Input Language
  2. Pre-Processor
  3. Compiler
  4. Link Editor
  5. Virtual Machine
  6. Downloads
  7. Quality Assurance
  8. Project Statistics

Project Status

1. Input Language

The input language for the compiler is a subset of the C programming language. No new keywords are introduced to ensure C compatibility.

The compiler suports:

Functions are implemented to support local hiding, i.e. local variables. The compiler supports the corresponding control structures for the keywords if, else, while, break, continue and return. Arbitrarily nested structures are supported also. Initialisation of variables and global variables in general are not supported. See the document CC-EBNF.pdf for a detailed description of the supported language.

2. Pre-Processor

The pre-processor transforms source files into compilation units. It works in the same way as the available C pre-processors and provides the usually commands like if, ifdef, defined elsif, endif, include and define. Additionally, the pre-porcessor supports arithmetic and logical expressions, as well as macro substitutions and comment deletion. Function macros are not supported. To help the compiler to find the correct line number and source file, the pre-processor adds line information in the compilation units. The document PP-EBNF.pdf provides a detailed description of the supported pre-processor commands.

Status of the pre-processor:

3. Compiler

The compiler checks and transforms compilation units into object files for the link editor. The compiler is a recursive descent compiler, but no LL1 type. For parsing the imput language a dynamic look ahead machanism has been implemented that uses a save point concept. By this, save points can be set at arbitrary locations in the token stream and parsing can be reset to such a save point. When reseting to a save point, the code generation rolled back. Directory libc in the distribution package contains the bootstrap code that features the basic string functions and some definitions for file handling. See above for the compiler's imput language.

Status of the compiler:

4. Link Editor

The link editor combines object files to an executable program suitable to be executable by the virtual machine. Both, object files and executable use the same file format. By this, it is possible to compile several source files separately and link the generated object files to one big object file. The link editor relocates the code segments as well as the data segments and combines them to one common code area along with one common data area. The linker does the link fixups as possible and merges the fix-up chains of in the code segment.

Document VM-Specification.pdf describes the object file format in detail.

Status of the link editor:

5. Virtual Machine

The design of the virtual machine was essentially inspired by the design of the JavaTM virtual machine. It is a stack based machine, implemented in C and features file I/O functionality for opening, reading, writing and closing files as well as allocating and de-allocating memory from heap.

The heap is that part of virtual machine memory from which chunks of memory are allocated. The heap is not garbage collected, so allocated blocks have to be freed explicitly. The virtual machine hides the implementation of the functions for memory allocation and de-allocation for two reasons. First, the performance of the code provided by the virtual machine is much higher and second, for easier debugging. Currently the virtual machine manages the heap memory with both, a used and a free list applying the first fit algorithm for memory allocation.

Executables may have unresolved symbols, but the virtual machine is not able to dynamically load code at runtime yet. See the CKPMvm virtual machine specification in document VM-Specification.pdf for a detailled description of the instruction set and the CKLF linking file format.

Status of the virtual machine:

6. Downloads

EBNF of the Preprocessor:Attach:PP-EBNF.pdf(2006-07-13)
EBNF of the Compiler:Attach:CC-EBNF.pdf(2006-07-13)
Virtual Machine Specification:Attach:VM-Specification.pdf(2006-04-23)
Complete source code with documentation:Attach:CKPMcc-src.tar.gz(2006-07-13)
Presentation on 20 June 2006:Attach:CKPMcc-slides.pdf(2006-06-20)

7. Quality Assurance

This project emphasises quality assurance by supporting more than 70 test procedures for pre-processor, compiler, linker and virtual machine. All tests can be performed automatically by the make utility.

8. Current Project Statistics (July 13, 2006)

The lines of code do not include header files and are determined by grep and wc as follows:

grep '[};]' *.c | wc -l

Source Code:
ModuleLines of CodeLines total
C library63594
pre-processor13504075
compiler31377181
link editor4531228
virtual machine6612363
Total566415441
 
Tests Source Code:
ModuleLines of CodeLines total
C library108594
pre-processor1974738
compiler157412941
link editor1291414
virtual machine53234
Total206119921
 
Efford in hours
EBNF72%
C library93%
pre-processor247%
scanner93%
parser7321%
code generator14040%
link editor206%
virtual machine6418%
Total346