Regular Languages

2 min read Updated Fri Apr 24 2026 03:19:45 GMT+0000 (Coordinated Universal Time)

Regular languages are the simplest class of languages in the Chomsky hierarchy.

They are built using simple operations on basic languages.

Basic Languages

For a given alphabet Σ\Sigma, its basic languages are:

  • {a}\set{a} for each symbol aΣa \in Σ
  • \varnothing (empty language)
  • {λ}\set{\lambda} (language containing the null string)

Construction of Regular Languages

Regular languages are obtained from basic languages using:

  • Union
  • Concatenation
  • Kleene star

Using only concatenation gives single-string languages. Union and * allow infinite languages.

Regular Expressions

A regular expression is a symbolic representation of a regular language. Every regular language has a regular expression.

A regular expression describes the typical form of strings in the language.

Example: 1101*10 means any number of 11s followed by 1010.

Notation Rules

  • { } are removed or replaced by ( )
  • Union \cup is written as |
  • Concatenation is written by juxtaposition

Example: 01{0} ∪ {1}010|1

Recursive Definition of Regular Expressions

Let Σ\Sigma be an alphabet.

Regular expressions are defined as follows:

  1. \varnothing is a regular expression and denotes \varnothing
  2. λ\lambda is a regular expression and denotes {λ}\set{\lambda}
  3. For each aΣa \in \Sigma, aa is a regular expression denoting {a}\set{a}
  4. If pp and qq are regular expressions:
    • (pq)(p | q) denotes union
    • (pq)(pq) denotes concatenation
    • (p)(p*) denotes Kleene star

Only expressions formed using these rules are valid.

Operator Precedence

To reduce parentheses:

  • * has highest precedence
  • Concatenation has next highest
  • | has lowest precedence

Examples:

  • a((b)c)a|((b*)c)abca|b*c
  • ((0(1))0)((0(1*))|0)01001*|0

Equivalence of Regular Expressions

2 regular expressions are equal iff they denote the same language.

Examples:

  • 1(1Λ)=11*(1|Λ) = 1*
  • 11=11*1* = 1*
  • 01(01)0* | 1* ≠ (0|1)*
  • (01)=(01)(0*1*)* = (0|1)*

Recognizing a Language

Language recognition means deciding whether a given string belongs to a language.

Recognition is done by reading input left to right in one pass.

The machine must remember limited information.