Mimir:Draft2 Chapter9
Chapter TODOs
 quote
 content
 summary
 key terms
 exercises
Chapter 9: Grammars
"QUOTE HERE"  Author
Introduction to Grammars
Grammars are used to completely specify syntax rules of languages. The structure imposed by a grammar gives a systemic way of processing expressions.
A grammar consists of four main pieces identifiers, terminals, nonterminals, and production rules. A grammar must contain at least one rule and this rule must have an identifier. Identifiers distinguish rules from other rules, while also acting as a reference for the rule. They must begin with either an alpha character or an underscore. Following can be either an alpha, a numeric, or an underscore. Nonterminals are symbols that are used to write out the grammar. They can be expanded and are represented in < >. Terminals are symbols that appear in the language generated by the grammar. They are written exactly as they are, inside single quotes ' '. Production Rules are composed of of a nonterminal followed by ::= followed by a combination terminal &/or nonterminal.The main notation used to represent grammars is BackusNaur Form or BNF for short. This notation looks a lot like HTML and is used to describe the structure of a language.

::= <statements>; <code> < production rule in BNF
<code> ::= <statement>;
<statement> ::= <variable> <operator> <value>;
<variable> ::= <identifier>;
<op> ::= +  =   
<value> ::= <variable>  <number>
<identifier> ::= number ...
<number> ::= ...
<statement> ::= ;
<statement> ::= <if – clause>;
<code  block> ::= {<code>}
<if – statement> ::= if(<expression>) <statement>;
<if – statement> ::= if(<expression>) <block – statement>;
::= means "is defined as"
 means "or"
Identifiers
Identifiers distinguish rules from other rules, while also acting as a reference for the rule.
 <id> ::= <startchar><startchar><seqclass>
 <startchar> ::= <letter><symbol>
 <seq chars> ::= <anychar><anychar><anychar><seqchars>
 <anychar> ::= <letter><symbol><digit>
 <letter> ::= a...zA...Z
 <symbol> ::= ~$
 <digit> ::= 0...9
Terminals
Terminals are symbols that appear in the language generated by the grammar.They cannot be broken down further, e.g. a literal character or digit.
They are written exactly as they are, inside single quotes ' '.
NonTerminals
Nonterminals are symbols that are used to write out the grammar and that can be reduced further by the production rules.They are represented in < >.
Production rules
Production Rules are composed of of a nonterminal followed by ::= followed by a combination terminal &/or nonterminal.
Chomsky's Hierarchy
Type 0: Unrestricted Grammars
These are unrestricted grammars that encompasses all other formal languages. You can also refer to this type as a recursive enumerable language. Any language that is acceptable by a deterministic Turing Machine is considered to be recursive enumerable language. Where the rules given are allowed to at most perform one action for any given situation.
Type 0 grammars have rules of the form a > b where a and b are arbitrary strings over a vocabulary or V, a cannot be empty. a and b are just terminals {a, b} in this grammar. Their are no nonterminals {S, A, B} and no start symbols S.
As an example of how a basic recursive enumerable language is interpreted by a deterministic Turing Machine.
For string : w
Let be a L recursively enumerable language
and the Turing Machine that accepts it M
if w ∈ L then M halts in a final state
if w ∉ L then M halts in a nonfinal state or loops forever
Type 1: Context Sensitive Grammars
These are grammars that can be parsed by a linear bounded nondeterministic Turing machine (The linear bounded portion of this definition means that the tape is bounded by a constant times the length of the input). In contrast to the deterministic Turing Machine the linear bounded nondeterministic Turing machine may have a set of rules that allows for more than one action for a given situation. For example, a nondeterministic Turing machine may have both "If you are in state 2 and you see an 'A', change it to a 'B' and move left" and "If you are in state 2 and you see an 'A', change it to a 'C' and move right" in its rule set.
Type 1 grammars have rules of the form Alpha A Beta > Alpha L Beta Alpha L and Beta are strings of terminals and/or nonterminals. Alpha and Beta can be empty but L must NOT be empty. the rule S > E is allowed but only if it does not show up on the right side of the rule.
Example for Language L
L = {a^n b^n c^n  n >= 1}
let V = {S, B} and set
S → aSBc
abc
cB → Bc
bB → bb
Type 2: Context Free Grammars
With contextfree grammars we are always free to replace a nonterminal in a working string with the righthandside of any of its rules. It doesn't matter what's on either side of that nonterminal in the working string. In other words, the context of the nonterminal doesn't matter. These are defined by the rules Alpha > Lambda with Alpha being a nonterminal and Lambda a string of terminals and nonterminals. These are languages taht can be recognized by a nondeterministic pushdown automation where a stack is formed and the parsing starts from the top and continues down the stack.
Example grammar that generates strings representing arithmetic expressions with the four operators +, , *, /
<expr> > number
<expr> > ( <expr> )
<expr> > <expr> + <expr>
<expr> > <expr>  <expr>
<expr> > <expr> * <expr>
<expr> > <expr> / <expr>
 The nonterminal symbols in this example are the <expr>, which in this case is also the start symbol .The terminal symbols are {+,,*,/,(,),number}
 The first rule (or production) states that an <expr> can be rewritten as (or replaced by) a number. In other words, a number is a valid expression.
 The second rule says that an <expr> in parentheses is also an <expr>. Note that this rule defines an expression in terms of expressions, an example of the use of recursion in the definition of contextfree grammars.
 The remaining rules say that the sum, difference, product, or division of two <expr> is also an expression.
With the above grammar we can run a parse tree for the following expression x + y * z
<expr> <expr>
 
/  \ /  \
<expr>*<expr> <expr> + <expr>
/ \ / \
/  \ z x /  \
<expr> + <expr> <expr> * <expr>
   
x y y z
Type 3: Regular Grammars
These languages are all languages that can be decided by a finite state automaton also known as a state machine. These are known as Regular languages and are commonly utilized to define search patterns and the lexical structure of programming languages.
Regular Expressions
Regular Expressions (usually shortened to regex) is a sequence of characters that form a search pattern. This provides a common way of searching through strings. Regex's were created in the 1950s by an American mathematician named Stephen Kleene. In type three of Chomsky's Hierarchy the regular grammars use a similar system of terminals and nonterminals to Regex's.
Regular Expression Syntax
Some syntax rules for regular expressions are shown below. Note that this list does not include all of the regular expression syntax rules because there are many of them. The great thing about these rules however is that you can combine rules to create some very clever string parsing techniques.
 . matches any single character
 [ ] matches a single character that is inside the brackets
 [^ ] matches a single character that is not contained within the brackets
 ^ matches the starting position within the string
 $ matches the ending position or the position just before a stringending new line
Below is a list of metacharacters that can be used to combine some of the previous syntax rules to make it more flexible.
 ? matches the preceding element zero or one time
 + matches the preceding element one or more times
  the choice operator matches the expression before or after the operator. Essentially a boolean OR
Some examples of these rules in actions are
 ab?c matches only "ac" or "abc"
 [hc]+at matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on, but not "at"
 catdog matches "cat" or "dog"
 .at matches any threecharacter string ending with "at", including "hat", "cat", and "bat"
 ^[hc]at matches "hat" and "cat", but only at the beginning of the string or line
Summary
A quick summary of the chapter should go here
Key Terms
A list of key terms should go here. This should be created using some sort of glossary type plugin.
Problem Sets
A list of practice problems