Team No Name 3 (Matt B,, John M., Weimeng P., and Todor N.) set out to create a compiler (and language) for our CS153 compiler design class. Here is the report.
For our CS153 class, we decided to design a compiler for the language “LilC” (or little C). LilC was originally intended to be a subset of the C language (c.f. TinyC, smallC, etc); however, it quickly evolved into its own language, both syntactically, and semantically. LilC was implemented in the same manner as the rest of the class’s projects. That is, we used the tooling of ANother Tool for Language Recognition (henceforth referred to as ANTLR) to generate a parser and lexer. Unlike the other teams, however, we designed our compiler in C++ (nonnative to ANTLR) which caused us much strife. Despite this, we were able to reach all requirements (in particular, we handle error-checking by inserting ? marks into the assembly file and printing debugging messages).
We used a source file with the same name as the language as an entry point for the compiler, which itself used two source files to override the default behavior of the “visitor”. (This visitor was generated by ANTLR in order to walk through the parse tree, also generated by ANTLR’s parser). The first pass found and associated types and definitions into a symbol table, which was fed into the second pass of the parse tree. (This symbol table structure was provided by our instructor, Dr. Mak, from previous assignments).
In the first pass, our team also took the initiative to pass a vector (list structure) which allowed us to “hoist” initializations of global variables into the second pass of the parse tree. This was necessary, because initializations could not be made until after writing the assembly to visit the main statement (which occurred in pass2). This allowed LilC to have global initializations, unlike in Pascal’s implementation. In pass2, we also took the initiative to use a map structure which would associate certain identifiers (those entered into the symbol table as variables, rather than fields) with slots (or register numbers). We chose to use the term “field” in the context of our own compiler as global variables (ironically, this is the same terminology as the Jasmin assembler). This map had to carefully reset its slot numbers as it entered and exited functions (similar to the stackframes of fully compiled languages).
The compiler, following LilC’s grammar, was ultimately able to generate code from a source file (which we affably gave the extension .lilc). This was outputted as a Jasmin assembly file (.j). These files could then be assembled by Jasmin to create Java executable files (.class). Notably, the compiler gives type-check warnings for variables when they are assigned to the result of a function, but this has not caused a problem in compilation, assembly, and execution. We also included Dr. Mak’s precompiled PascalRTL library in order to time the execution of our program. As a side note, we subsequently discovered that John’s computer was faster than Matt’s, but that is neither here nor there.
The code generation is where we had the most license to be creative. In C tradition, we kept assignments as expressions rather than statements, so that LilC might be able to chain assignments. Support for this exists in the grammar of the language; however, it is not actually implemented. This is because doing so would necessitate leaving an operand on the top of the JVM’s operand stack at the end of every assignment statement. However, many, if not most, assignments are top-level statements, and are not expected to leave anything on the operand stack. Thus, doing so could quickly “blow out the stack”. Matt believes C’s sequence points are used to wipe excess operands from carrying over between assignments, but this is beyond the scope of our project.
We also differentiated between procedures and functions (both being considered subroutines) in two ways. Firstly, functions are able to return a value, whereas procedures only return void. Secondly, procedures were intended to pass-by-reference, while functions were to pass-by-value. This was not implemented (both of them pass-by-value) due to time constraints and we realized only one or the other was necessary to meet requirements. Speaking of types, the language, despite being both statically and strongly typed, has nearly no references to type in the source code (the exception being subroutine arguments).
Additionally, the language inherits its output (or print) operator from C++ style. Its input is similarly styled, but not implemented (nor required). However, due to time restrictions, the print statement was handled one character at a time, and because LilC only has integers and floating point types implemented, this meant many conversions to ASCII were necessary. Much unlike C, LilC also has operators for exponentiation, ** (as in Python), and // for logarithms analogously. These were implemented with calls to Java’s precompiled libraries, largely because Todor needs more syntactic sugar in his life. The rest of the language and its implementation (the compiler) were relatively unnotable, compared to C.
Grammar for the Source Language (BNF)
Example from mersenne2.j
.method static fnDigitToChar(F)F ;{fpNum=fpNum+48.0;} ;fpNum=fpNum+48.0 ;fpNum+48.0 fload 0 ldc 48.0 fadd fstore 0 fload 0 freturn .limit stack 4 .limit locals 1 .end method
Example from mersenne2.j
;if(arg==count){<<121;<<101;<<115;}else{<<110;<<111;} ; arg==count getstatic prog/arg F getstatic prog/count F fcmpl ifne R2 ldc 1 goto R3 R2: ldc 0 R3: ifeq I0 ;{<<121;<<101;<<115;} ;<<121; getstatic java/lang/System/out Ljava/io/PrintStream; ldc 121 int2char invokevirtual java/io/PrintStream.print(C)V ;<<101; getstatic java/lang/System/out Ljava/io/PrintStream; ldc 101 int2char invokevirtual java/io/PrintStream.print(C)V ;<<115; getstatic java/lang/System/out Ljava/io/PrintStream; ldc 115 int2char invokevirtual java/io/PrintStream.print(C)V goto I1 I0: ;{<<110;<<111;} ;<<110; getstatic java/lang/System/out Ljava/io/PrintStream; ldc 110 int2char invokevirtual java/io/PrintStream.print(C)V ;<<111; getstatic java/lang/System/out Ljava/io/PrintStream; ldc 111 int2char invokevirtual java/io/PrintStream.print(C)V I1: (integerMode? "if_icmpgt R" : (realMode? "ifgt R" : "if??gt R")) (integerMode? "if_icmplt R" : (realMode? "iflt R" : "if??lt R")) (integerMode? "if_icmpge R" : (realMode? "ifge R" : "if??ge R")) (integerMode? "if_icmpne R" : (realMode? "ifne R" : "if??ne R")) (integerMode? "if_icmple R" : (realMode? "ifle R" : "if??le R")) (integerMode? "if_icmp?? R" : (realMode? "if?? R" : "if???? R"))
Essentially, if the instruction uses an integer type, we use the if_icmp. Otherwise, we use the standard ifcmp operators.
Example while statement pulled from catalan2.j
;while(itrBase>1){itrBase=itrBase-1;iBase=iBase*itrBase;} W0: ; itrBase>1 iload 1 ldc 1 if_icmple R0 ldc 1 goto R1 R0: ldc 0 R1: ifeq W1 ;{itrBase=itrBase-1;iBase=iBase*itrBase;} ;itrBase=itrBase-1 ;itrBase-1 iload 1 ldc 1 isub istore 1 ;iBase=iBase*itrBase ;iBase*itrBase iload 0 iload 1 imul istore 0 goto W0 W1: "\n;" << ctx->getText() << endl; // comment "\tW" << to_string(w_label++) << ":" << endl; // while visit(ctx->paren()); // eval conditional “\tifeq W" + to_string(w_label) << endl; // if not 0, goto done visit(ctx->statement()); // eval statement “\tgoto W" + to_string(w_label - 1) << endl; // goto while “"\tW" << to_string(w_label++) << ":" << endl; // done
;{callprompt:arg;while(count<=arg)count=count*2.0;count=count-1.0;if(arg==count){<<121;<<101;<<115;}else{<<110;<<111;}} ;callprompt:arg; getstatic prog/arg F invokestatic prog/prompt(F)V