Bash shell, a quick overview

A quick overview of how the Bash shell processes commands

The shell is the text-based interface that allows us to interact with our computer. A typical Bash shell, which exists on almost all Linux systems, reads the command entered by the user, interprets it, and makes the appropriate system calls to the operating system to execute it.

Shell modes

The Bash shell can operate in a few different modes:

  1. Interactive: This is the default mode when using a terminal. The terminal displays a prompt and waits for the user to type a command.

  2. Non-Interactive: In this mode, the shell reads commands from a script or a file instead of a terminal. This is typically used when executing shell scripts. The user can also run bash -c <COMMAND> to execute a command string in non-interactive mode.

  3. Login: When we log in to the system directly (via SSH, or at the console of the local machine), the shell starts the session in "login" mode. A Bash login shell reads /etc/profile and then the first file it finds among ~/.bash_profile, ~/.bash_login and ~/.profile.

  4. Non-Login: When we are already logged in to the system, typically in a graphical interface, and open a new terminal, the shell starts in "non-login" mode. Interactive non-login shells read the ~/.bashrc file.

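These are two independent distinctions: a shell is interactive or non-interactive and, separately, login or non-login. For example, a shell started by an SSH login is both interactive and login. The main difference between login and non-login shells is that login shells set up the user’s environment from scratch, while non-login shells inherit the environment from their parent process.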
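
As a minimal sketch, the following Bash snippet reports which modes the current shell is running in. It relies on the $- special parameter and the login_shell shell option; the function name describe_shell_mode is a made-up helper used only for illustration.

```shell
#!/usr/bin/env bash
# describe_shell_mode is a hypothetical helper name, not a Bash command.
describe_shell_mode() {
    local interactive login
    # $- holds the current option flags; an "i" means the shell is interactive.
    case $- in
        *i*) interactive="interactive" ;;
        *)   interactive="non-interactive" ;;
    esac
    # login_shell is a read-only option that Bash sets at startup.
    if shopt -q login_shell; then
        login="login"
    else
        login="non-login"
    fi
    echo "$interactive $login"
}

describe_shell_mode
```

Run as a script, this would normally report a non-interactive, non-login shell. Pasting the function into a terminal session should report an interactive shell instead, and whether that shell is a login shell depends on how the terminal was started.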

How Bash processes commands

Here is a generic overview of how the Bash shell processes commands:

  1. User input: The user types the command and presses enter.
  2. Tokenisation: The shell breaks the command into tokens, typically separated by whitespace.
  3. Lexical analysis and parsing: The shell interprets the tokens according to its grammar rules. The first token is typically the command and the rest can be arguments, options, piped commands, etc. The shell recognises keywords, operators and other elements in the user's input. Bash uses a yacc-generated parser for this.
  4. Expansion: This step covers brace expansion, tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, word splitting, and filename (wildcard) expansion.
  5. Command execution: The shell executes the command, either by creating a new process via fork() and running the program in it, or by executing the command directly in its own process if it is a shell built-in command.
  6. Reporting and returning: The command writes its output and error messages to the terminal (or wherever they have been redirected). When it finishes, the shell collects its exit status and, in interactive mode, displays the prompt again. A non-interactive shell exits once it has run its last command.
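
The expansion step can be observed directly. The short sketch below assumes a Bash shell and uses an illustrative variable n. It shows that brace expansion runs before variable expansion, so a variable cannot supply the end of a {start..end} sequence.

```shell
#!/usr/bin/env bash
n=3   # illustrative variable, not part of any Bash API

echo {1..3}     # brace expansion produces: 1 2 3
echo {1..$n}    # the braces are left alone, then $n expands: {1..3}

# Tilde expansion, command substitution and arithmetic expansion on one line.
# The unquoted ~ expands to the current user's home directory.
echo ~ "$(echo nested)" $((2 + 3))
```

When the end of a range has to come from a variable, a common alternative is an arithmetic for loop or the external seq command, since both evaluate the variable before producing the sequence.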

Processing user input

When we type a command or execute a file, Bash reads characters from the terminal or the input file line by line and parses them. In interactive shells, Bash uses the readline library to provide features such as moving the cursor within the command line, editing the command, tab completion, history search, and more.

The readline library

The readline library allows interactive programs with a command line to provide in-line editing and history capabilities. It was created to allow Bash to conform to an optional POSIX requirement for an in-line editor. Readline was initially a single file in the Bash source code, but it soon became an independent project that remains closely associated with Bash.

Breaking into tokens

The shell breaks the input line into tokens: words, operators, reserved words and comments. Unquoted whitespace characters (space, tab or newline) and operators act as delimiters. The first token is generally considered the command to run and the following tokens, if any, are command arguments or options.

  • A word in this context is any sequence of characters that does not contain an unquoted special character or whitespace character (|, &, ;, (, ), <, >, space, tab, newline). Characters enclosed in quotes, single (') or double ("), are treated as a single token, even if they contain whitespace characters. Similarly, an escaped whitespace character (escaped by a backslash \) is not treated as a word break.

  • Special characters, such as |, >, <, ;, (, ), or &, are considered operators.

  • Reserved words are the Bash keywords (for, if, case, while, etc).

  • Comments start at a word that begins with the # character and run to the end of the line.
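
The tokenisation rules above can be checked by counting the words a command receives. In this sketch, count_args is a hypothetical helper that simply prints the number of arguments the shell passed to it.

```shell
#!/usr/bin/env bash
# count_args is a made-up helper: it prints how many words it received.
count_args() {
    echo "$#"
}

count_args one two three      # 3: whitespace separates three words
count_args "one two" three    # 2: double quotes keep the space inside one word
count_args 'one two' three    # 2: single quotes do the same
count_args one\ two three     # 2: the escaped space is not a word break
count_args one two # three    # 2: the rest of the line is a comment
count_args one;count_args two # ; is an operator, so two commands run: 1 and 1
```

Note that the last line needs no spaces around the semicolon, because operators delimit words on their own.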

Aliases, built-in and external commands

Once Bash has determined the name of the command, it decides what to run. If the name contains no slashes, the order of the lookup is generally:

  1. Aliases: If alias expansion is enabled (the default in interactive shells), Bash checks whether the first word matches an existing alias and replaces it with the alias definition. This happens while the command is being read, before any other lookup.
  2. Shell functions: If a function with the same name as the command has been defined, Bash executes it.
  3. Built-in commands: If the command entered is a Bash built-in command, the shell executes it directly in the same process.
  4. Bash hash table: Bash keeps a hash table with the full paths of the external commands it has already run in the current session. If the command is stored in the hash table, Bash uses that path and skips the $PATH search.
  5. $PATH lookup: Finally, Bash searches for the executable in the directories specified by the $PATH environment variable, in the order specified by $PATH. If it finds an executable that matches the command name, it forks a child process, runs the command in it, and remembers the path in the hash table.

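We can check whether a command is an alias, a function, a built-in or an external executable with the type command (which, usually an external program, typically reports only the path it finds in $PATH):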

$ which ls
/bin/ls
$ type ls
ls is aliased to `ls --color=auto'
$ type cd
cd is a shell builtin
$ type cat
cat is /bin/cat
$ type which
which is hashed (/bin/which)
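
For use in scripts, type -t prints a single word (alias, keyword, function, builtin or file) instead of a full sentence. The sketch below assumes cat is installed and not aliased. It defines a throwaway function named cat purely for illustration, to show that a function takes precedence over an external command of the same name.

```shell
#!/usr/bin/env bash
type -t if       # keyword
type -t cd       # builtin

# Illustrative only: a function that shadows the external cat command.
cat() {
    echo "function called"
}
type -t cat             # function
cat                     # runs the function, not the executable
command cat /dev/null   # 'command' skips functions and runs the real cat

# Remove the function so that the name resolves to the executable again.
unset -f cat
type -t cat             # file

# hash -r empties the table of remembered paths, forcing a fresh $PATH search.
hash -r
```

The command built-in is the usual way to reach the underlying executable from inside a wrapper function with the same name.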