Navigation Synopsis The ParseTrees module defines an algebraic data-type for parse trees, which all parsers generated from SyntaxDefinitions produce. Description Below is the full definition of Tree and Production and Symbol. A parse tree is a nested tree structure of type Tree. Most internal nodes are applications (appl) of a Production to a list of children Tree nodes. Production is the abstract representation of a SyntaxDefinition rule, which consists of a definition of an alternative for a Symbol by a list of Symbols. The leaves of a parse tree are always characters (char), which have an integer index in the UTF8 table. Some internal nodes encode ambiguity (amb) by pointing to a set of alternative Tree nodes. The Production and Symbol types are an abstract notation for rules in SyntaxDefinitions, while the Tree type is the actual notation for parse trees. Parse trees are called parse forests when they contain amb nodes. You can analyze and manipulate parse trees in three ways: Directly on the Tree level, just like any other AlgebraicDataType Using ConcreteSyntax Using Actions The type of a parse tree is the symbol that it's production produces, i.e. appl(prod(sort("A"),[],{}),[]) has type A. Ambiguity nodes Each such a non-terminal type has Tree as its immediate super-type. @license{ Copyright (c) 2009-2013 CWI All rights reserved. This program and the accompanying materials are made available under the terms of the Eclipse Public License v1.0 which accompanies this distribution, and is available at http://www.eclipse.org/legal/epl-v10.html } @contributor{Jurgen J. Vinju - [email protected] - CWI} @contributor{Bas Basten - [email protected] (CWI)} @contributor{Tijs van der Storm - [email protected]} @contributor{Paul Klint - [email protected] - CWI} @contributor{Arnold Lankamp - [email protected]} @doc{ Synopsis: Library functions for parse trees. Description: A _concrete syntax tree_ or [parse tree](http://en.wikipedia.org/wiki/Parse_tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. In Rascal parse trees, the interior nodes are labeled by rules of the grammar, while the leaf nodes are labeled by terminals (characters) of the grammar. Tree is the universal parse tree data type in Rascal and can be used to represent parse trees for any language. * Tree is a subtype of the type [$Values/Node]. * All [SyntaxDefinition] types (non-terminals) are sub-types of Tree * All [ConcreteSyntax] expressions produce parse trees the types of which are non-terminals * Trees can be annotated in various ways, see [IDEConstruction] features. Most importantly the \loc annotation always points to the source location of any (sub) parse tree. Parse trees are usually analyzed and constructed using [ConcreteSyntax] expressions and patterns. _Advanced users_ may want to create tools that analyze any parse tree, regardless of the [SyntaxDefinition] that generated it, you can manipulate them on the abstract level. In [$ParseTree/Tree] is the full definition of Tree, Production and Symbol. A parse tree is a nested tree structure of type Tree. * Most internal nodes are applications (appl) of a Production to a list of children Tree nodes. Production is the abstract representation of a [SyntaxDefinition] rule, which consists of a definition of an alternative for a Symbol by a list of Symbols. * The leaves of a parse tree are always characters (char), which have an integer index in the UTF8 table. * Some internal nodes encode ambiguity (amb) by pointing to a set of alternative Tree nodes. The Production and Symbol types are an abstract notation for rules in [SyntaxDefinition]s, while the Tree type is the actual notation for parse trees. Parse trees are called parse forests when they contain amb nodes. You can analyze and manipulate parse trees in three ways: * Directly on the Tree level, just like any other [AlgebraicDataType] * Using [ConcreteSyntax] * Using [Action]s The type of a parse tree is the symbol that it's production produces, i.e. appl(prod(sort("A"),[],{}),[]) has type A. Ambiguity nodes Each such a non-terminal type has Tree as its immediate super-type. The ParseTree library provides: Examples: // the following definition syntax A = "a"; // would make the following [Test] succeed: test a() = parse(#A,"a") == appl(prod(sort("A"),[lit("a")],{}),[appl(prod(lit("a"),[\char-class([range(97,97)]),[char(97)])]); // you see that the defined non-terminal A ends up as the production for the outermost node. As the only child is the tree for recognizing the literal a, which is defined to be a single a from the character-class [ a ]. // when we use labels in the definitions, they also end up in the trees: // the following definition lexical A = myA:"a" B bLabel; lexical B = myB:"b"; // would make the following [Test] succeed: test a() = parse(#A,"ab") == appl(prod(label("myA",lex("A")),[lit("a"),sort("bLabel",lex("B"))],{}),[appl(prod(lit("a"),[\char-class([range(97,97)]),[char(97)]),appl(prod(label("myB", lex("B"),[lit("b")],{}),[appl(prod(lit("b"),[\char-class([range(98,98)]),[char(98)])]) ]); // here you see that the alternative name is a label around the first argument of prod while argument labels become labels in the list of children of a prod. Examples: Benefits: Pitfalls: For historical reasons the name of the annotation is "loc" and this interferes with the Rascal keyword loc for the type of [\$Rascal/Expressions/Values/Location]s. Therefore the annotation name has to be escaped as \loc when it is declared or used. Questions: } module ParseTree extend Type; extend Message; extend List; @doc{ Synopsis: The Tree data type as produced by the parser. Description: * A Tree defines the trees normally found after parsing () and additional constructors () that are used in error trees. * A Production () is a rule of a grammar, with a defined non-terminal, a list of terminal and non-terminal symbols and a possibly empty set of attributes. * An Attr (attribute, ) documents additional semantics of a production rule. Neither tags nor brackets are processed by the parser generator. Rather downstream processors are activated by these. Associativity is a parser generator feature though. * Associativity () defines the kinds of associativity. * CharRange () defines a range of characters. * A CharClass () consists of a list of characters ranges. * The start symbol () wraps any symbol to indicate that it is a start symbol of the grammar and may occur at the root of a parse tree. * Symbol defines non-terminals (), terminals () and regular expressions (). * The Conditional wrapper () adds conditions to the existence of an instance of a symbol. * A Condition () on a symbol gives rise to a disambiguation filter. } data Tree = appl(Production prod, list[Tree] args) | cycle(Symbol symbol, int cycleLength) | amb(set[Tree] alternatives) | char(int character) ; data Production = prod(Symbol def, list[Symbol] symbols, set[Attr] attributes) | regular(Symbol def) | error(Production prod, int dot) | skipped() ; data Attr = \assoc(Associativity \assoc) | \bracket() ; data Associativity = \left() | \right() | \assoc() | \non-assoc() ; data CharRange = range(int begin, int end) ; alias CharClass = list[CharRange] ; data Symbol = \start(Symbol symbol) ; // These symbols are the named non-terminals. data Symbol = \sort(str name) | \lex(str name) | \layouts(str name) | \keywords(str name) | \parameterized-sort(str name, list[Symbol] parameters) | \parameterized-lex(str name, list[Symbol] parameters) ; public bool subtype(Symbol::\sort(_), Symbol::\adt("Tree", _)) = true; // These are the terminal symbols. data Symbol = \lit(str string) | \cilit(str string) | \char-class(list[CharRange] ranges) ; // These are the regular expressions. data Symbol = \empty() | \opt(Symbol symbol) | \iter(Symbol symbol) | \iter-star(Symbol symbol) | \iter-seps(Symbol symbol, list[Symbol] separators) | \iter-star-seps(Symbol symbol, list[Symbol] separators) | \alt(set[Symbol] alternatives) | \seq(list[Symbol] symbols) ; data Symbol = \conditional(Symbol symbol, set[Condition] conditions) ; @doc{ Synopsis: constructors for declaring preconditions and postconditions on symbols } data Condition = \follow(Symbol symbol) | \not-follow(Symbol symbol) | \precede(Symbol symbol) | \not-precede(Symbol symbol) | \delete(Symbol symbol) | \at-column(int column) | \begin-of-line() | \end-of-line() | \except(str label) ; @doc{ Here we extend productions with basic combinators allowing to construct ordered and un-ordered compositions, and associativity groups. The intended semantics are that * 'choice' means unordered choice (defined in |Type|) * 'priority' means ordered choice, where alternatives are tried from left to right, * 'assoc' means all alternatives are acceptable, but nested on the declared side * 'others' means '...', which is substituted for a choice among the other definitions * 'reference' means a reference to another production rule which should be substituted there, for extending priority chains and such. } data Production = \priority(Symbol def, list[Production] choices) | \associativity(Symbol def, Associativity \assoc, set[Production] alternatives) | \others(Symbol def) | \reference(Symbol def, str cons) ; @doc{ Synopsis: Nested priority is flattened. } public Production priority(Symbol s, [*Production a, priority(Symbol t, list[Production] b), *Production c]) = priority(s,a+b+c); @doc{ Synopsis: Choice under associativity is flattened. } public Production associativity(Symbol s, Associativity as, {*Production a, choice(Symbol t, set[Production] b)}) = associativity(s, as, a+b); @doc{Nested (equal) associativity is flattened} public Production associativity(Symbol rhs, Associativity a, {associativity(rhs, Associativity b, set[Production] alts), *Production rest}) = associativity(rhs, a, rest + alts) ; Production associativity(Symbol rhs, Associativity a, {prod(rhs, list[Symbol] lhs, set[Attr] as), *Production rest}) = \associativity(rhs, a, rest + {prod(rhs, lhs, as + {\assoc(a)})}) when !(\assoc(_) <- as); Production associativity(Symbol rhs, Associativity a, {prod(label(str l, rhs), list[Symbol] lhs, set[Attr] as), *Production rest}) = \associativity(rhs, a, rest + {prod(label(l, rhs), lhs, as + {\assoc(a)})}) when !(\assoc(_) <- as); @doc{Priority under an associativity group defaults to choice} public Production associativity(Symbol s, Associativity as, {*Production a, priority(Symbol t, list[Production] b)}) = associativity(s, as, a + { e | e <- b}); @doc{ Synopsis: Annotate a parse tree node with a source location. } anno loc [email protected]\loc; @doc{ Synopsis: Parse input text (from a string or a location) and return a parse tree. Description: # Parse a string and return a parse tree. # Parse a string and return a parse tree, origin defines the original location of the input. # Parse the contents of resource input and return a parse tree. Examples: import demo::lang::Exp::Concrete::NoLayout::Syntax; import ParseTree; // Seeing that parse returns a parse tree: parse(#Exp, "2+3"); // Catching a parse error: import IO; try { Exp e = parse(#Exp, "[email protected]"); } catch ParseError(loc l): { println("I found a parse error at line , column "); } } @javaClass{org.rascalmpl.library.Prelude} @reflect{uses information about syntax definitions at call site} public java &T<:Tree parse(type[&T<:Tree] begin, str input); //@experimental //@javaClass{org.rascalmpl.library.Prelude} //@reflect{uses information about syntax definitions at call site} //public java &T<:Tree parse(type[&T<:Tree] begin, map[Production robust, CharClass lookaheads] recovery, str input, loc origin); @javaClass{org.rascalmpl.library.Prelude} @reflect{uses information about syntax definitions at call site} public java &T<:Tree parse(type[&T<:Tree] begin, str input, loc origin); @javaClass{org.rascalmpl.library.Prelude} @reflect{uses information about syntax definitions at call site} public java &T<:Tree parse(type[&T<:Tree] begin, loc input); @doc{ Synopsis: Yield the string of characters that form the leafs of the given parse tree. Description: unparse is the inverse function of [parse], i.e., for every syntactically correct string TXT of type S, the following holds: unparse(parse(#S, TXT)) == TXT Examples: import demo::lang::Exp::Concrete::NoLayout::Syntax; import ParseTree; // First parse an expression, this results in a parse tree. Then unparse this parse tree: unparse(parse(#Exp, "2+3")); } @javaClass{org.rascalmpl.library.Prelude} public java str unparse(Tree tree); @doc{ Synopsis: Save the current object parser to a file. Description: saveParser will save the current object parser (constructed from (imported) syntax declarations) to a file. The name of the parser class is returned, for reference. The saved parser can be used later on by loading the parser class from the JAR file, instantiating it and calling the parse() method. Examples: import ParseTree; // import a grammar import demo::lang::Exp::Concrete::NoLayout::Syntax; // save the parser to a JAR file saveParser(|file:///tmp/Exp.jar|); } @javaClass{org.rascalmpl.library.Prelude} @reflect{uses evaluator's parser interface} public java str saveParser(loc outFile); @javaClass{org.rascalmpl.library.Prelude} public java str printSymbol(Symbol sym, bool withLayout); @javaClass{org.rascalmpl.library.Prelude} @reflect{Uses Evaluator to create constructors in the caller scope (to fire rewrite rules).} @doc{ Synopsis: Implode a parse tree according to a given (ADT) type. Description: Given a grammar for a language, its sentences can be parsed and the result is a parse tree (or more precisely a value of type Tree). For many applications this is sufficient and the results are achieved by traversing and matching them using concrete patterns. In other cases, the further processing of parse trees is better done in a more abstract form. The [abstract syntax](http://en.wikipedia.org/wiki/Abstract_syntax) for a language is a data type that is used to represent programs in the language in an _abstract_ form. Abstract syntax has the following properties: * It is "abstract" in the sense that it does not contain textual details such as parentheses, layout, and the like. * While a language has one grammar (also known as, _concrete syntax_) it may have several abstract syntaxes for different purposes: type analysis, code generation, etc. The function implode bridges the gap between parse tree and abstract syntax tree. Given a parse tree and a Rascal type it traverses them simultaneously and constructs an abstract syntax tree (a value of the given type) as follows: # Literals, layout and empty (i.e. ()) nodes are skipped. # Regular */+ lists are imploded to lists or sets depending on what is expected in the ADT. # Ambiguities are imploded to sets. # If the expected type is str the tree is unparsed into a string. This happens for both lexical and context-free parse trees. # If a tree's production has no label and a single AST (i.e. non-layout, non-literal) argument (for instance, an injection), the tree node is skipped, and implosion continues with the lone argument. The same applies to bracket productions, even if they are labeled. # If a tree's production has no label, but more than one argument, the tree is imploded to a tuple (provided this conforms to the ADT). # Optionals are imploded to booleans if this is expected in the ADT. This also works for optional literals, as shown in the example below. # An optional is imploded to a list with zero or one argument, iff a list type is expected. # If the argument of an optional tree has a production with no label, containing a single list, then this list is spliced into the optional list. # For trees with (cons-)labeled productions, the corresponding constructor in the ADT corresponding to the non-terminal of the production is found in order to make the AST. # If the provided type is node, (cons-)labeled trees will be imploded to untyped nodes. This means that any subtrees below it will be untyped nodes (if there is a label), tuples of nodes (if a label is absent), and strings for lexicals. # Unlabeled lexicals are imploded to str, int, real, bool depending on the expected type in the ADT. To implode lexical into types other than str, the PDB parse functions for integers and doubles are used. Boolean lexicals should match "true" or "false". NB: lexicals are imploded this way, even if they are ambiguous. # If a lexical tree has a cons label, the tree imploded to a constructor with that name and a single string-valued argument containing the tree's yield. An IllegalArgument exception is thrown if during implosion a tree is encountered that cannot be imploded to the expected type in the ADT. As explained above, this function assumes that the ADT type names correspond to syntax non-terminal names, and constructor names correspond to production labels. Labels of production arguments do not have to match with labels in ADT constructors. Finally, source location annotations are propagated as annotations on constructor ASTs. To access them, the user is required to explicitly declare a location annotation on all ADTs used in implosion. In other words, for every ADT type T, add: anno loc [email protected]; Examples: ==Example for rule 5== Given the grammar syntax IDTYPE = Id ":" Type; syntax Decls = decls: "declare" {IDTYPE ","}* ";"; Decls will be imploded as: data Decls = decls(list[tuple[str,Type]]); (assuming Id is a lexical non-terminal). ==Example for rule 6== Given the grammar syntax Formal = formal: "VAR"? {Id ","}+ ":" Type; The corresponding ADT could be: data Formal = formal(bool, list[str], Type); ==Example for rule 8== Given the grammar syntax Tag = "[" {Modifier ","}* "]"; syntax Decl = decl: Tag? Signature Body; In this case, a Decl is imploded into the following ADT: data Decl = decl(list[Modifier], Signature, Body); ==Example for rule 9== Given the grammar syntax Exp = left add: Exp "+" Exp; Can be imploded into: data Exp = add(Exp, Exp); } public java &T<:value implode(type[&T<:value] t, Tree tree); @doc{ Synopsis: Annotate a parse tree node with an (error) message. } public anno Message [email protected]; @doc{ Synopsis: Annotate a parse tree node with a list of (error) messages. } public anno set[Message] [email protected]; @doc{ Synopsis: Annotate a parse tree node with a documentation string. } anno str [email protected]; @doc{ Synopsis: Annotate a parse tree node with documentation strings for several locations. } anno map[loc,str] [email protected]; @doc{ Synopsis: Annotate a parse tree node with the target of a reference. } anno loc [email protected]; @doc{ Synopsis: Annotate a parse tree node with multiple targets for a reference. } anno set[loc] [email protected]; @doc{ Synopsis: Tree search result type for [treeAt]. } public data TreeSearchResult[&T<:Tree] = treeFound(&T tree) | treeNotFound(); @doc{ Synopsis: Select the innermost Tree of a given type which is enclosed by a given location. Description: Select the innermost Tree of type t which is enclosed by location l. } public TreeSearchResult[&T<:Tree] treeAt(type[&T<:Tree] t, loc l, a:appl(_, _)) { if (([email protected]\loc)?, al := [email protected]\loc, al.offset <= l.offset, al.offset + al.length >= l.offset + l.length) { for (arg <- a.args, TreeSearchResult[&T<:Tree] r:treeFound(&T<:Tree _) := treeAt(t, l, arg)) { return r; } if (&T<:Tree tree := a) { return treeFound(tree); } } return treeNotFound(); } public default TreeSearchResult[&T<:Tree] treeAt(type[&T<:Tree] t, loc l, Tree root) = treeNotFound(); public bool sameType(label(_,Symbol s),Symbol t) = sameType(s,t); public bool sameType(Symbol s,label(_,Symbol t)) = sameType(s,t); public bool sameType(Symbol s,conditional(Symbol t,_)) = sameType(s,t); public bool sameType(conditional(Symbol s,_), Symbol t) = sameType(s,t); public bool sameType(Symbol s, s) = true; public default bool sameType(Symbol s, Symbol t) = false; @doc{Determine if the given type is a non-terminal type} public bool isNonTerminalType(Symbol::\sort(str _)) = true; public bool isNonTerminalType(Symbol::\lex(str _)) = true; public bool isNonTerminalType(Symbol::\layouts(str _)) = true; public bool isNonTerminalType(Symbol::\keywords(str _)) = true; public bool isNonTerminalType(Symbol::\parameterized-sort(str _, list[Symbol] _)) = true; public bool isNonTerminalType(Symbol::\parameterized-lex(str _, list[Symbol] _)) = true; public default bool isNonTerminalType(Symbol s) = false; //@doc{Determine the size of a concrete list} //int size(appl(regular(\iter(Symbol symbol)), list[Tree] args)) = size(args); //int size(appl(regular(\iter-star(Symbol symbol)), list[Tree] args)) = size(args); // //int size(appl(regular(\iter-seps(Symbol symbol, list[Symbol] separators)), list[Tree] args)) = size_with_seps(size(args), size(separators)); //int size(appl(regular(\iter-star-seps(Symbol symbol, list[Symbol] separators)), list[Tree] args)) = size_with_seps(size(args), size(separators)); // //int size(appl(prod(Symbol symbol, list[Symbol] symbols , attrs), list[Tree] args)) = // \label(str label, Symbol symbol1) := symbol && [Symbol itersym] := symbols // ? size(appl(prod(symbol1, symbols, attrs), args)) // : size(args[0]); // //default int size(Tree t) { // throw "Size of tree not defined for \"\""; //} // //private int size_with_seps(int len, int lenseps) = (len == 0) ? 0 : 1 + (len / (lenseps + 1));  Examples // the following definition syntax A = "a"; // would make the following [Test] succeed: test a() = parse(#A,"a") == appl(prod(sort("A"),[lit("a")],{}),[appl(prod(lit("a"),[\char-class([range(97,97)]),[char(97)])]); // you see that the defined non-terminal A ends up as the production for the outermost node. As the only child is the tree for recognizing the literal a, which is defined to be a single a from the character-class [ a ].  // when we use labels in the definitions, they also end up in the trees: // the following definition lexical A = myA:"a" B bLabel; lexical B = myB:"b"; // would make the following [Test] succeed: test a() = parse(#A,"ab") == appl(prod(label("myA",lex("A")),[lit("a"),sort("bLabel",lex("B"))],{}),[appl(prod(lit("a"),[\char-class([range(97,97)]),[char(97)]),appl(prod(label("myB", lex("B"),[lit("b")],{}),[appl(prod(lit("b"),[\char-class([range(98,98)]),[char(98)])]) ]); // here you see that the alternative name is a label around the first argument of prod while argument labels become labels in the list of children of a prod.   | [New Subconcept] | [Recompile Course] | [Warnings] Is this page unclear, or have you spotted an error? Please add a comment below and help us to improve it. For all other questions and remarks, visit ask.rascal-mpl.org.