Synopsis
Syntax Definitions allow the definition of parsers for programming languages, domain-specific languages and data formats.

Syntax
  1. Start syntax Nonterminal = Alternatives;
  2. lexical Nonterminal = Alternatives;
  3. layout Nonterminal = Alternatives;
  4. keyword Nonterminal = Alternatives;
where Start is either start or nothing, and Alternatives are one of:
  1. Tags Associativity Symbols
  2. Tags Associativity Name : Symbols
  3. Associativity ( Alternatives )
  4. Alternatives1 | Alternatives2
  5. Alternatives1 > Alternatives2
where Associativity is nothing, or one of assoc, left, right or non-assoc, and Tags are a possibly empty list of tags.
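The following sketch puts these forms together in one module. It is an illustration only: the module name FormsDemo and all non-terminal and label names are invented, tags are omitted, and the !>> follow restriction on the layout is a common convention for longest-match whitespace, not something this page requires.

```rascal
module FormsDemo

import ParseTree;

// Form 3: a layout non-terminal, interleaved between the symbols of every
// syntax rule in this module (the follow restriction makes it longest-match)
layout Whitespace = [\ \t\n]* !>> [\ \t\n];

// Form 4: a keyword non-terminal, a finite enumeration of literals
keyword Reserved = "let" | "in";

// Form 2: lexical non-terminals; no layout is interleaved inside them
lexical Ident = [a-z] !<< [a-z]+ !>> [a-z] \ Reserved;
lexical Nat = [0-9]+ !>> [0-9];

// Form 1 with Start = start; alternatives combined with | and >
start syntax Term
  = var: Ident name                     // Name : Symbols
  | Nat                                 // Symbols only (unlabeled)
  | bracket "(" Term ")"                // bracket modifier, unlabeled
  > left mul: Term lhs "*" Term rhs     // Associativity Name : Symbols
  > left ( add: Term lhs "+" Term rhs   // Associativity ( Alternatives )
         | sub: Term lhs "-" Term rhs
         )
  ;

// start[Term] accepts layout around the term; .top selects the Term itself
test bool bracketBindsFirst()
  = parse(#start[Term], " (a + 1) * b ").top is mul;
```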
Description
Rascal supports full context-free grammars for syntax definition. It generates scannerless parsers from these definitions. These parsers produce ParseTrees that can be further processed by Rascal using ConcreteSyntax fragments in Patterns and Expressions, or they can be imploded to AlgebraicDataTypes.
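As a small sketch of processing a parse tree with ConcreteSyntax fragments, the hypothetical module below parses sums of natural numbers and uses a concrete pattern inside a visit to drop additions of 0. The names ConcreteDemo and simplify are made up for this example; layout is ignored when a concrete pattern is matched, so "5+0" and "5 + 0" are treated alike.

```rascal
module ConcreteDemo

import ParseTree;

layout Spaces = [\ ]* !>> [\ ];
lexical Nat = [0-9]+ !>> [0-9];

syntax Exp
  = number: Nat n
  > left add: Exp lhs "+" Exp rhs
  ;

// Rewrite every "x + 0" into "x". The case uses a concrete syntax pattern;
// the replacement is the parse tree bound to x.
Exp simplify(Exp e) = visit (e) {
  case (Exp) `<Exp x> + 0` => x
};

// Interpolating a parse tree in a string yields its text
test bool dropsZero() = "<simplify(parse(#Exp, "5 + 0"))>" == "5";
```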

There are four kinds of non-terminals that can be defined with slightly different characteristics.
  1. Syntax non-terminals are general context-free non-terminals. This means that left-recursion, right-recursion, any of the regular expression Symbols and all kinds of Disambiguation can be used in their definition. It is important to note that the locally defined layout non-terminal is interleaved between the Symbols that define a syntax non-terminal. For example, if you define layout ML = [\ ]*; and syntax A = "a" "a";, then Rascal modifies the definition of A to syntax A = "a" ML "a"; before generating a parser.
  2. Lexical non-terminals are very much like syntax non-terminals. However, the definition of a lexical is not modified with interleaved layout non-terminals. Moreover, the Visit statement does not traverse the structure of lexicals, and equality between lexicals is checked by comparing their characters, not their structure.
  3. Layout non-terminals are also just like syntax non-terminals. However, they are used to preprocess all syntax definitions in the same module scope (see the interleaving described in item 1).
  4. Keyword non-terminals are not like syntax non-terminals. They only allow the definition of an enumeration of literal symbols and single character classes. Keyword non-terminals play an important role in the semantics of Disambiguation, where some disambiguation constructs require a finite, non-empty enumeration of strings. The prime example is the definition of reserved keywords.
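The difference between syntax and lexical non-terminals in items 1 and 2 can be checked with a sketch like the one below, which reuses the A example from item 1 and adds a lexical B with the same symbols. The helper function parses is hypothetical (it is not part of the ParseTree library); it only reports whether parse throws an exception. The follow restriction on ML is an addition that keeps the layout unambiguous.

```rascal
module LayoutDemo

import ParseTree;

layout ML = [\ ]* !>> [\ ];

// syntax: ML is interleaved, so this behaves like  "a" ML "a"
syntax A = "a" "a";

// lexical: no layout is interleaved, so the two a's must be adjacent
lexical B = "a" "a";

// Hypothetical helper: true when input parses as the non-terminal nt
bool parses(type[&T <: Tree] nt, str input) {
  bool ok = true;
  try { parse(nt, input); }
  catch: { ok = false; }
  return ok;
}

// A accepts a space between the two a's, B does not
test bool layoutOnlyInSyntax() = parses(#A, "a a") && !parses(#B, "a a");
```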
Each alternative of a syntax definition is defined by a list of Symbols, each of which may be labeled or not. The alternative itself may be labeled as well. Labels enable additional operations on the corresponding parse trees:
  • The is operator is defined for labeled alternatives (see Operators).
  • The has operator is defined for labeled Symbols in the right-hand side (see Operators).
  • Action functions can be written to override the construction of a parse tree, using the label of an alternative as the function name
  • implode uses labeled alternatives to map to an AlgebraicDataType
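The sketch below shows the first two operations on a tiny expression grammar: is tests the label of the alternative (add), while has and field selection use the labels of the symbols (lhs, rhs, n). The module, the grammar and the describe function are hypothetical.

```rascal
module LabelDemo

import ParseTree;

layout Spaces = [\ ]* !>> [\ ];
lexical Nat = [0-9]+ !>> [0-9];

syntax Exp
  = number: Nat n
  > left add: Exp lhs "+" Exp rhs
  ;

// "is" checks the alternative label; .lhs, .rhs and .n select labeled
// symbols; interpolating a tree in a string yields its text
str describe(Exp e) {
  if (e is add) {
    return "sum of <e.lhs> and <e.rhs>";
  }
  return "number <e.n>";
}

test bool hasUsesSymbolLabels() = parse(#Exp, "1 + 2") has lhs;
```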
Alternatives can be combined in a single SyntaxDefinition using the |, > and associativity combinators. The latter two are Disambiguation constructs, which are described in more detail under Disambiguation. The | is shorthand that saves repeating syntax A = for every alternative of A.
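A rough sketch of these combinators follows; the grammar and labels are invented. Bool is declared twice, which, following the description of | above, should mean the same as a single declaration with |. Exp uses > so that * binds tighter than + and -, and a left group so that + and - associate to the left together.

```rascal
module CombinatorDemo

import ParseTree;

layout Spaces = [\ ]* !>> [\ ];
lexical Nat = [0-9]+ !>> [0-9];

// Two declarations of Bool, equivalent to
//   syntax Bool = trueLit: "true" | falseLit: "false";
syntax Bool = trueLit: "true";
syntax Bool = falseLit: "false";

// > orders the alternatives by priority (highest first);
// left ( ... ) makes add and sub one left-associative group
syntax Exp
  = number: Nat n
  > left mul: Exp lhs "*" Exp rhs
  > left ( add: Exp lhs "+" Exp rhs
         | sub: Exp lhs "-" Exp rhs
         )
  ;

// 1 + 2 * 3 parses as 1 + (2 * 3), so the root is an add
test bool mulBindsTighter() = parse(#Exp, "1 + 2 * 3") is add;
```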

Alternatives can be named or not. The names are essential only if:
  • you need to implode ParseTrees
  • you need to use the is expression, as in myStatement is ifThenElse instead of using concrete pattern matching.
  • you want to write Actions that trigger on the construction of the alternative.
However, it is generally a good idea to name your rules even if you do not need them. Note that a name may be reused for different alternatives for a single non-terminal, provided that the lists of symbols for these "overloaded" alternatives use different non-terminal symbols. This implies that alternatives for lexicals generally do not use overloaded names because they are often defined only by regular expressions over terminal Symbols (literals and character classes).
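As a sketch of both points, the module below reuses the name decl for two alternatives with different symbols and implodes the resulting trees to an AlgebraicDataType whose constructors carry the same names. The names Decl, ADecl, decl and toAst are hypothetical; the sketch assumes that implode selects the constructor by label and number of arguments, converting the lexicals Id and Nat to the str and int fields the constructors declare.

```rascal
module OverloadDemo

import ParseTree;

layout Spaces = [\ ]* !>> [\ ];
lexical Id = [a-z]+ !>> [a-z];
lexical Nat = [0-9]+ !>> [0-9];

// The name decl is reused: the two alternatives differ in their symbols
syntax Decl
  = decl: "var" Id name ";"
  | decl: "var" Id name "=" Nat init ";"
  ;

// Abstract syntax for implode: one constructor per alternative, same name
data ADecl
  = decl(str name)
  | decl(str name, int init)
  ;

ADecl toAst(str input) = implode(#ADecl, parse(#Decl, input));

// "is" holds for both overloaded alternatives; "has" tells them apart
test bool bothAreDecl()
  = parse(#Decl, "var x;") is decl && parse(#Decl, "var x = 1;") has init;
```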

The start modifier identifies the start of a grammar. Its effect is that, before generating a parser, Rascal generates an extra syntax definition that allows layout before and after the start non-terminal. For example: layout L = [\ ]*; start syntax Program = Statement*; will produce syntax start[Program] = L Program top L;. Note that the start[Program] type is then available in your program, and ParseTrees assigned to variables of that type allow access to the top field.
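The hypothetical module below sketches this: leading and trailing whitespace is accepted when parsing with #start[Program], and the top field selects the Program itself. The Statement rule and the program function are invented for the example, and this layout also accepts newlines and carries a follow restriction, unlike the shorter L above.

```rascal
module StartDemo

import ParseTree;

layout L = [\ \n]* !>> [\ \n];
lexical Id = [a-z]+ !>> [a-z];
syntax Statement = Id ";";
start syntax Program = Statement*;

// Parse with the generated start[Program] type and return the text of
// the top field, i.e. the Program without the surrounding layout
str program(str input) {
  start[Program] p = parse(#start[Program], input);
  return "<p.top>";
}
```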
Examples
The following example makes use of practically all of the SyntaxDefinition features, except parse actions.
// layout is lists of whitespace characters
layout MyLayout = [\t\n\ \r\f]*;

// identifiers are characters of lowercase alphabet letters, 
// not immediately preceded or followed by those (longest match)
// and not any of the reserved keywords
lexical Identifier = [a-z] !<< [a-z]+ !>> [a-z] \ MyKeywords;

// this defines the reserved keywords used in the definition of Identifier
keyword MyKeywords = "if" | "then" | "else" | "fi";

// here is a recursive definition of expressions 
// using priority and associativity groups.
syntax Expression 
  = id: Identifier id
  | null: "null"
  | left multi: Expression l "*" Expression r
  > left ( add: Expression l "+" Expression r
         | sub: Expression l "-" Expression r
         )
  | bracket "(" Expression ")"
  ;
Benefits
  • modular and compositional
  • no grammar normalization or grammar factoring necessary
  • generate a parser for any context-free grammar
  • generated parsers are really fast (for general parsers)
  • powerful disambiguation constructs for common programming language disambiguation patterns
  • data-dependent (context-sensitive) disambiguation via arbitrary functions
  • embedding of concrete syntax fragments in Rascal programs
  • SyntaxDefinitions follow the syntax and semantics of AlgebraicDataTypes quite closely