I ran across the Wikipedia article on this subject today. While I found it fascinating and enthralling, it also took me about half an hour to properly decode the notation being used, so here I have interpreted it for the layman.
What is a formal grammar?
Begin with symbols. A symbol can be anything. One common example is an upper- or lower-case letter of the Roman alphabet.
If we allow other special characters to be symbols, then we could also have:
But our symbols could also be English words. If this were the case, then our symbols would be things like:
A symbol in isolation does not need to mean anything.
A set of symbols is called an alphabet. An alphabet must be finite.
A string is a finite series of symbols. All the symbols must be chosen from the same alphabet. If we use Roman letters as our alphabet, then some example strings would be:
If our alphabet contained special characters, then some more example strings might be:
If, instead, our alphabet consisted of English words, then a string (or sentence) would simply be a series of words, such as
“bucket July embarrass whatsoever isn’t a the”
Although every string must be finite, notice how there are infinitely many possible finite strings.
“bucket bucket bucket”
“bucket bucket bucket bucket”
For every alphabet, there is also a single empty string:
Notice, again, how strings have no meaning ascribed to them.
Given an alphabet, a formal languageL is any set of finite strings over that alphabet.
- A formal language may contain a finite number of strings.
- The smallest possible language is the empty language, containing no strings at all (this is not to be confused with the language containing only the empty string).
- Or, a language may contain infinitely many strings.
- The largest possible language is the one containing every possible finite string.
Strings which are in L can be considered “correct”. Strings which are not in L can be considered “incorrect”. For example, if our alphabet was the set of all English words, then our language could be the set of all grammatically correct English sentences:
“The cat sat on the mat”
“Colourless green ideas sleep furiously”
For a language L to be well-defined, we only need one thing: a process which generates all of the strings in L.
A formal grammar is a finite set of rules for generating “grammatically correct” sentences.
A formal grammar works by starting with a single “unfinished sentence” or “root”. Then, different rules are applied in turn, modifying the sentence each time, until the sentence is deemed to be “finished” and the process terminates.
(By definition,) the set of sentences which can be generated by following the rules of a formal grammar comprise a formal language. The sentences which cannot be generated in this way do not fall in the formal language.
Once again, note that a grammar ascribes no meaning to the language it generates.