Spring 2019

Phase 1

Prerequisites

Formal Grammars and Notation

You will need to be familiar with the notion of a formal grammar, and in particular need to be able to work with a specification of a new language detailed in this document. If the notation in this document is challenging to read, we suggest reviewing the foundational concepts of grammars.

Git Repository

You will also need to be familiar with git, as the scaffolding for phase one is hosted at git@github.com:6S081/6s081-sp19.git. In this repository you will find this semester's helper code and test cases. Furthermore, all submissions for the semester will be pushed to your own designated git repository hosted at git@github.com:6s081/sp19-$GITHUB-USERNAME where $GITHUB-USERNAME should be treated as a formal variable evaluating to the text string of your GitHub.com username. If you are getting permission errors while cloning, try cloning the https link instead. If you are still getting permission errors, then make sure you have accepted the invites to both the organization and to your private repo. To setup your repo, perform the following:
  • git clone git@github.com:6s081/sp19-$GITHUB-USERNAME
  • cd sp19-$GITHUB-USERNAME
  • git remote add 6s081 git@github.com:6S081/6s081-sp19.git
  • git pull 6s081 master

Submission

Submission of your phase should be accomplished by pushing your code to a branch with the name a1-submission . The last code pushed to this branch before the due date (see class schedule) is what will count as your final submission, unless you notify the staff otherwise.

Running The Parser

We provide a class
virtual machine for you to test and compile your code on. You don't have to use it, but we strongly recommend it. This is the environment we will be using to grade your code, so please make sure it compiles in the VM. We also highly encourage using the VM for Windows users as it is typically more time consuming to get C++ building and running on Windows. To run the environment, you will need to download VMWare which could be found here. To build the project within the interpreter directory, run
 make parser 
and to run a specific test case run
 cat ../tests/a1/public/bad1.mit | ./mitscript-parser 
after building.

Goals

Parsing MITScript

Your goal for this phase is to create a parser for the MITScript language. The grammar for the language is shown below.

Program ::= Statement* ;; Statement ::= Assignment | CallStatement | Global | IfStatement | WhileLoop | Return;; Global ::= 'global' Name ';' ;; Assignment ::= LHS '=' Expression ';' ;; CallStatement ::= Call ';' ;; Block ::= '{' Statement* '}';; IfStatement ::= 'if' '(' Expression ')' Block ( 'else' Block )? ;; WhileLoop ::= 'while' '(' Expression ')' Block ;; Return ::= 'return' Expression ';' ;; Expression ::= Function | Boolean | Record ;; Function ::= 'fun' '(' (Name (',' Name)*)? ')' Block ;; Boolean ::= Conjunction ( '|' Conjunction )* ;; Conjunction ::= BoolUnit ('&' BoolUnit)* ;; BoolUnit ::= '!'? Predicate ;; Predicate ::= Arithmetic ( ('<' | '>' | '<=' | '>='| '==') Arithmetic)?;; Arithmetic ::= Product ( ('+' | '-') Product)* ;; Product ::= Unit ( ('*' | '/') Unit)* ;; Unit ::= '-'? (LHS | Constant | Call | '(' Boolean ')' );; LHS ::= Name ('.' Name | '[' Expression ']' )* ;; Call ::= LHS '(' (Expression (',' Expression)*)? ')' ;; Record ::= '{' (Name ':' Expression ';')* '}' ;; Constant ::= integer_constant | string_constant | boolean_constant | none_constant;;

Implementation: lexing and parsing

You will create a lexer for MITScript (given by the grammar above) using antlr4 and write your own recursive descent parser for the language by hand.

The Lexer

The antlr4 tool provides a variety of great functionality for dealing with grammars. Moreover, there are many examples and resources on the web that illustrate its behavior. In this phase you will use antlr4's grammar specification language to automatically generate a lexer that maps a string of characters to tokens. The starter code includes a starter antlr4 grammar file named, "MITScript.g" that includes definitions for whitespace, simple integers, multiplication, division, and comments (all of which you may reuse and/or adapt for your project).

Your lexer must be able to recognize the following kinds of tokens in addition to all the keywords and operators listed above in the language specification:
  • integer constants consisting of one or more digits
  • string constants wrapped in double quotes and supporting the following escaped characters: \\ \" \n \t. Escaping any other character should result in error
  • None constant, equivalent to "NULL" in Java
  • Boolean constants 'true' and 'false'
  • Name identifiers that start with a letter or underscore, followed by sequence of letters, underscores and numbers. so x0 is a valid variable name, but 0x is not
  • comments, which start with '//' and end at the next line break
The file "parser.cpp" provides starter code for connecting antlr's automatically generated lexer to standard input and creating the appropriate token stream object for use in your parser. The antlr system has extensive advanced functionality for inspecting and manipulating the lexer. If you'd like to use that advanced functionality (though it is not necessary), please refer to the antlr documentation linked above and other examples on the web.

The Parser

You will develop a recursive descent parser by hand. The file "parser.cpp" includes an example parser that parses the Term language that we cover in lecture (here, Slide 3). This example provides a direct mapping from the lecture's example code to antlr's lexing API.

As output, your parser must produce an Abstract Syntax Tree (AST) with nodes for the following program constructs:

Block ::= [Statement] ;; Global ::= Name ;; Assignment ::= LHS Expression ;; ExpressionStatement ::= Expression ;; IfStatement ::= Condition ThenPart ElsePart ;; WhileLoop ::= Condition Body ;; Return ::= Expression ;; FunctionDeclaration ::= [Arguments] Body ;; BinaryExpression ::= LeftOperand Operator RightOperand ;; UnaryExpression ::= Operand Operator ;; FieldDereference ::= BaseExpression Field ;; IndexExpression ::= BaseExpression Index ;; Call ::= TargetExpression [Arguments] ;; Record ::= Map[String, Expression] ;; IntegerConstant ;; StringConstant ;; NoneConstant ;; BooleanConstant (Optional: not required as not included in original specification. You can alernatively map booleans to integers) ;;

You need to make sure that all arithmetic operators are left associative, so (w+x+y+z) should be parsed as (((w+x)+y)+z)

You are free to implement your parser as you see fit, provided that you deliver an implementation that uses no additional parsing libraries outside of what you develop yourself.

For example, antlr (and many other tools and libraries) can be used to automatically generate parsers from a grammar specification. If you would like to use antlr (or these other tools) to automatically generate a reference, executable parser with which you then compare the outputs of which against that of your manually-developed implementation, then you are free to do so. However, your manually-developed implementation must not use any of the code from these tools in their operation (outside of antlr's lexing capabilies).

Note that this restriction does not prevent you from implementing your own helper code, such as implementing your own parser combinator library (which is one neat way to implement a parser).

As a general note, parser generators are blackbox and unintuitive tools (unless you spend significant time understanding their generation process) and therefore we instead have you develop a recursive descent parser by hand. The advantage of this approach is that you will have an intimate understanding of your parser for later parts of the project when you decide to improve your parser with different functionality (e.g., better error-handling) or when -- inevitably -- you find bugs in your parser.

Deliverables

Your parser should read a program from stdin and attempt to parse it.

If the program is a syntactically correct MITScript program (according to the grammar), then you should pretty print the AST to stdout and return a return code of 0.

If the program is syntactically incorrect, you should return a return code of 1 (indicating that the program has a syntax error).

To develop this parser, your tasks will be:
  • Write the lexer and parser
  • Define all the types for your AST nodes
  • Define a visitor interface
  • Define a PrettyPrinter class that extends the visitor interface and pretty prints the AST
  • Provide ten test cases.
We don't care that much about how well your output is formatted because we will not grade the AST output (see Grading below). We have included output files that demonstrate a valid possible output from your parser, though the output AST does not need to match these output files character for character, they are just a reference.

For unary expressions, the operator should be outside the parenthesis, so for example !x should be pretty printed to !(x). One strict requirement is that your pretty printer should wrap all binary and unary expressions in parenthesis.

You should provide 10 tests named test1.mit, to test10.mit that your parser can parse correctly. Your tests should provide good coverage of the constructs in the language.

Grading (Extra Credit Checkpoint)

The extra credit checkpoint yields at most 10 points of extra credit (out of 100) towards your combined grade for Phase 1 and Phase 2.

For the extra credit checkpoint, we will grade your parser against the full suite of private tests and you will receive extra credit in proportion to the proportion of test cases you get correct (100% correct yields 10 points. 0% yields 0 points).

So please turn in whatever you have and try to get some points. Our goal for the checkpoint is to incentivize an early start and to give you the opportunity to get early feedback.

Methodology: we will grade your parser by running it on a combination of syntactically valid and invalid programs. We will inspect your parser's return code to determine if you have correctly determine if the inputted program is valid or invalid. We will manually sanity check your AST output to provide feedback where appropriate. However, the quality of the output does not factor into your grade.

Implementation notes

For this phase, you do not have to worry about reclaiming memory (we will implement garbage collection in a later phase).

Start small. This is true for every phase. Identify substasks that you can complete end-to-end instead of trying to implement the entire phase for the entirety of MITScript. For example, in 6.031 you built an interpreter for arithmetic. MITScript has arithmetic as well and, therefore, you can check your understanding by implementing an end-to-end implementation of the phase for arithmetic (and whatever limited additional program constructs you need to get it through). For Phase 1, that means first developing a parser for just arithmetic to verify your understanding of the approach.