Disambiguation¶
The semantics of SDF3 can be seen as two-staged. First, the grammar generates all possible derivations. Second, the disambiguation constructs remove a number of derivations that are not valid. Note that SDF3 actually performs some disambiguation both when generating the parse table as during parsing.
Rejections¶
Rejections filter derivations.
The semantics of a rejection is that the set of valid derivations for the left-hand side of the production will not contain the construction described on the right-hand side.
In other words, the language defined by the sort on the left-hand side has become smaller, removing all the constructions generated by the rule on the right-hand side.
Disambiguation by reject
occurs at parse time (mostly).
A rule can be marked as rejected by using the attribute {reject}
after the rule:
$Sort = ... {reject}
The {reject}
attribute works well for lexical rejections, especially keyword reservation in the form of productions like:
ID = "keyword" {reject}
Preferences¶
The preferences mechanism is another disambiguation filter that provides a post parse filter to parse forests.
The attributes prefer
and avoid
are the only disambiguation constructs that compare alternative derivations after parsing.
Warning
prefer
and avoid
are deprecated and will be removed in a future version of Spoofax.
The following definition assumes that derivations are represented using parse forests with "packaged ambiguity nodes". This means that whenever in a derivation there is a choice for several sub-derivations, at that point a special choice node (ambiguity constructor) is placed with all alternatives as children. We assume here that the ambiguity constructor is always placed at the location where a choice is needed, and not higher (i.e. a minimal parse forest representation). The preference mechanism compares the top nodes of each alternative:
- All alternative derivations that have
avoid
at the top node will be removed, but only if other alternatives derivations are there that do not haveavoid
at the top node. - If there are derivations that have
prefer
at the top node, all other derivations that do not haveprefer
at the top node will be removed.
The preference attribute can be used to handle the case when two productions can parse the same input. Here is an example:
Exp.FunctionApp = <<Expr> <Expr*>>
Exp.Constructor = <<ID> <Expr>> {prefer}
Priorities¶
Priorities are one of SDF3's most often used disambiguation constructs. A priority section defines the relative priorities between productions. Priorities are a powerful disambiguation construct because it occurs at parse generation time. The idea behind the semantics of priorities is that productions with a higher priority "bind stronger" than productions with a lower priority. The essence of the priority disambiguation construct is that certain parse trees are removed from the "forest" (the set of all possible parse trees that can be derived from a segment of code). The basic priority syntax looks like this:
context-free priorities
$ProductionRef > $ProductionRef
Where $ProductionRef>
can either be $Sort.$Constructor
or the entire production itself.
Several priorities in a priority grammar are separated by commas.
If more productions have the same priority they may be grouped between curly braces on each side of the >
sign.
context-free priorities
{$ProductionRef $ProductionRef}
> $ProductionRef,
$ProductionRef
> $ProductionRef
By default, the priority relation is automatically transitively closed (i.e. if A > B and B > C then A > C).
To specify a non-transitive priority relation it is necessary to include a dot before the > sign (.>
).
SDF3 provides safe disambiguation, meaning that priority relations only remove ambiguous derivations. Furthermore, SDF3 also allows tree filtering by means of indexed priorities such as:
context-free priorities
$ProductionRef $Index > $ProductionRef
where the symbol at position $Index
(starting with 0) in the first production should not derive the second production.
An example defining priorities for the addition, subtraction and multiplication operators is listed below. Because addition and subtraction have the same priority, the are grouped together between brackets.
context-free priorities
{Exp.Times} >
{Exp.Plus Exp.Minus}
Associativity¶
Like with priorities, the essence of the associativity attribute is that certain parse trees are removed from the "forest".
- The
left
associativity attribute on a production P filters all occurrences of P as a direct child of P in the right-most argument. This implies thatleft
is only effective on productions that are recursive on the right (as inA B C -> C
). - The
right
associativity attribute on a production P filters all occurrences of P as a direct child of P in the left-most argument. This implies thatright
is only effective on productions that are recursive on the left ( as inC A B -> C
). - The
non-assoc
associativity attribute on a production P filters all occurrences of P as a direct child of P in any argument. This implement thatnon-assoc
is only effective if a production is indeed recursive (as inA C B -> C
). - The
assoc
attribute means the same asleft
Associativity declarations occur in two places in SDF3. The first is as production attributes. The second is as associativity declarations in priority groups.
An example on how to mention associativity as a production attribute is given below:
Exp.Plus = <<Exp> + <Exp>> {left}
In priority groups, the associativity has the same semantics as the associativity attributes, except that the filter refers to more nested productions instead of a recursive nesting of one production. The group associativity attribute works pairwise and commutative on all combinations of productions in the group. If there is only one element in the group the attribute is reflexive, otherwise it is not reflexive.
context-free priorities
{left: Exp.Times} >
{left: Exp.Plus Exp.Minus}
Restrictions¶
The notion of restrictions enables the formulation of lexical disambiguation strategies. Examples are "shift before reduce" and "longest match". A restriction filters applications of productions for certain non-terminals if the following character (lookahead) is in a certain class. The result is that specific symbols may not be followed by a character from a given character class. A lookahead may consist of more than one character class (multiple lookahead). Restrictions come in two flavors:
- lexical restrictions that apply to lexical non-terminals
- context-free restrictions that apply to context-free non-terminals.
The general form of a restriction is:
$Symbol+ -/- $Lookahead
The semantics of a restriction is to remove all derivations that produce a certain $Symbol
.
The condition for this removal is that the derivation tree for that symbol is followed immediately by something that matches the lookahead declaration.
Note that to be able to check this condition, one must look past derivations that produce the empty language, until the characters to the right of the filtered symbol are found.
Also, for finding multiple lookahead matches, one must ignore nullable sub-trees that may occur in the middle of the matched lookahead.
In case of lexical restrictions $Symbol
may be either a literal or sort.
In case of context-free restrictions only a sort or symbol is allowed.
The restriction operator -/-
should be read as may not be followed by.
Before the restriction operator -/-
a list of symbols is given for which the restriction holds.
As an example, the following restriction rule implements the “longest match” policy: an identifier can not be followed by an alpha-numeric character.
ID -/- [a-zA-Z0-9\_]
Created: October 17, 2024