Skip to main content

Syntax Definition

rascal-0.34.0

Synopsis

Syntax Definitions allow the definition of parsers for programming languages, domain-specific languages and data formats.

Syntax

start syntax Nonterminal = Alternatives;

lexical Nonterminal = Alternatives;

layout Nonterminal = Alternatives;

keyword Nonterminal = Alternatives;

where Start is either start or nothing, and Alternatives are one of:

Tags           Symbols         
Tags left Symbols
Tags right Symbols
Tags non-assoc Symbols
Tags left Name : Symbols
Tags right Name : Symbols
Tags non-assoc Name : Symbols
left ( Alternatives )
right ( Alternatives )
non-assoc ( Alternatives )

Alternatives₁ | Alternatives₂

Alternatives₁ > Alternatives₂

Here Associativity is nothing, or one of assoc, left, right or non-assoc, and Tags are a possibly empty list of Tags.

  • ❶ for binary recursive expression operators, Associativity disambiguates and chooses binding to the left, right, or non-assoc (demanding brackets)
  • ❷ a Name provides the algebraic constructor name for an abstract syntax definition derived from this grammar
  • ❸ an Associativity group is not only a short-hand, but also derives the cross product associativity rule between all members of the group.
  • ❹ this is the normal BNF-like rule alternative combinator
  • ❺ this defines a transitively closed Priority relation for all left- or right-recursive binary or unary expression operators (see Priority )

Description

Rascal supports full context-free grammars for syntax definition. It generates scannerless parsers from these definitions. These parsers produce Parse Trees that can be further processed by Rascal using Concrete Syntax fragments in Patterns and Expressions, or they can be imploded to Algebraic Data Types.

There are four kinds of non-terminals that can be defined with slightly different characteristics.

  • Syntax non-terminals are general context-free non-terminals. This means that left-recursion, right-recursion, any of the regular expression Symbols and all kinds of Disambiguation can be used to define it. It is important to note that in between the Symbols that define a syntax non-terminal the locally defined layout non-terminal will be interleaved. For example, if you define layout ML = [\ ]*; and syntax A = "a" "a", Rascal will modify the definition of A to syntax A = "a" ML "a"; before generating a parser.

  • Lexical non-terminals are very much like syntax non-terminals. However, the definition of a lexical is not modified with interleaved layout non-terminals. And, the structure of lexicals is not traversed by the visit statement and equality is checked between lexicals by checking the characters (not its structure) for equality.

  • Layout non-terminals are just like syntax non-terminals as well. However, they are used to preprocess all syntax definitions in the same module scope (see above).

  • Keyword non-terminals are not like syntax non-terminals. These only allow definition of enumeration of literal symbols and single character classes. Keyword non-terminals play an important role in the semantics of Disambiguation where some disambiguation constructs require finite, non-empty enumeration of strings. The prime example is the definition of reserved keywords.

Each alternative of a syntax definition is defined by a list of Symbols. Each of the Symbols can be labeled or not. The alternative of a defined syntax type may be labeled or not as well. With the label, additional operations are activated on the corresponding parse trees:

  • The is operator is defined for labeled alternatives (see Operators).
  • The has operator is defined for labeled Symbols in the right-hand side (see Operators).
  • Action functions can be written to override the construction of a parse tree, using the label of an alternative as the function name
  • Implode uses labeled alternatives to map to an Algebraic Data Type

Alternatives can be combined in a single Syntax Definition using the |, > and associativity combinators. The latter two represent Disambiguation constructs that you should read more about. The | is a short-hand for not having to repeat syntax A = for every alternative of A.

Alternatives can be named or not. The names are essential only if:

  • you need to implode Parse Trees
  • you need to use the is expression, as in myStatement is ifThenElse instead of using concrete pattern matching.
  • you want to write Actions that triggers on the construction of the alternative.

However, it is generally a good idea to name your rules even if you do not need them. Note that a name may be reused for different alternatives for a single non-terminal, provided that the lists of symbols for these "overloaded" alternatives use different non-terminal symbols. This implies that alternatives for lexicals generally do not use overloaded names because they are often defined only by regular expressions over terminal Symbols (literals and character classes).

The start modifier identifies the start of a grammar. The effect of a start modifier is that Rascal will generate an extra syntax definition before generating a parser that allows layout to before and after the start non-terminal. For example:

layout L = [\ ]*; start Program = Statement*;`

will produce syntax start[Program] = L Program top L;. Note that the start[Program] type is now available in your program, and Parse Trees assigned to variable of that type will allow access to the top field.

Examples

The following example makes use of practically all of the Syntax Definition features, except parse actions.

// layout is lists of whitespace characters
layout MyLayout = [\t\n\ \r\f]*;

// identifiers are characters of lowercase alphabet letters,
// not immediately preceded or followed by those (longest match)
// and not any of the reserved keywords
lexical Identifier = [a-z] !<< [a-z]+ !>> [a-z] \ MyKeywords;

// this defines the reserved keywords used in the definition of Identifier
keyword MyKeywords = "if" | "then" | "else" | "fi";

// here is a recursive definition of expressions
// using priority and associativity groups.
syntax Expression
= id: Identifier id
| null: "null"
| left multi: Expression l "*" Expression r
> left ( add: Expression l "+" Expression r
| sub: Expression l "-" Expression r
)
| bracket "(" Expression ")"
;

Benefits

  • Modular and compositional.
  • No grammar normalization or grammar factoring necessary.
  • Generate a parser for any context-free grammar.
  • Generated parsers are really fast (for general parsers).
  • Powerful disambiguation constructs for common programming language disambiguation patterns.
  • Data-dependent (context-sensitive) disambiguation via arbitrary functions.
  • Embedding of concrete syntax fragments in Rascal programs
  • Syntax Definitions follow the syntax and semantics of Algebraic Data Types quite closely.

Pitfalls