Slightly More Advanced Linux Usage

Advanced Topics: Stream Editing

For repeated edits within a file, or on a batch of files, stream editors are a powerful tool. The main stream editor on most Linux systems is sed. It is particularly useful for systematic search-and-replace operations, and since it is programmable, it can also be used for far more complex tasks. See the Sed section of the Advanced Bash Scripting Guide for details.

Important: Pipes

Before using a stream editor, some general remarks about one of the most powerful concepts in Linux systems: pipes. A pipe redirects the output of one command to the input of another command. This way you can compose several commands on the command line to perform a more complex task. As an example, suppose we want to count the number of files in our home directory. First, use cd ~ (or just cd) to go to your home directory. Now, you can list all files in this directory, one per line, using the command ls -l. Another Linux command, wc, counts the number of bytes, words and lines in a text file; with option -l it only counts the number of lines. Composing both commands with the | symbol gives the number of files in the directory plus 1 (because the first line of the ls -l output, starting with total, does not describe a file). In summary, do:


$ cd
$ ls -l | wc -l
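To see where the extra line comes from, here is a minimal sketch that runs both variants in a throw-away temporary directory (the file names are made up, and GNU ls is assumed). The option -1 (the digit one) makes ls print one name per line without the total line, so it gives the exact count. Neither variant counts hidden files, whose names start with a dot.

```shell
#!/bin/bash
# Sketch: count directory entries with and without the "total" line.
# Runs in a temporary directory, so your home directory is untouched.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.txt b.txt c.txt          # three empty example files (made-up names)

with_total=$(ls -l | wc -l)      # "total 0" line + 3 file lines
files_only=$(ls -1 | wc -l)      # -1: one name per line, no "total" line

echo "ls -l | wc -l  -> $with_total"
echo "ls -1 | wc -l  -> $files_only"
```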

Returning to our home page example, we now want to add a sub-title as a slogan to our web page. Examining the index.html file, you will see that the "Website Title" text is a level-1 heading (<h1>). The next line contains a comment mentioning the slogan text. We want to replace this comment line with a level-2 heading containing our slogan. However, there are already level-2 headings (<h2>) in the HTML code, and we need to turn each of them into a level-3 heading. Rather than doing this manually in an editor, we use a stream editor to do it in one go: sed replaces every instance of an h2 element with an h3 element. The sed notation for such a substitution is s/h2>/h3>/g. It substitutes the first string (h2>) by the second string (h3>), with / as the delimiter; the g suffix makes the substitution global, i.e. it replaces every occurrence on a line rather than only the first one. Because the pattern h2> matches both <h2> and </h2>, opening and closing tags are changed together. In summary, do:


$ cat index.html | sed 's/h2>/h3>/g' > index_new.html
$ less index_new.html
$ diff index.html index_new.html
$ cp -b index_new.html index.html

Note that the > notation redirects the output of the sed command to a file, here index_new.html. You can check that the substitution worked as expected by displaying the resulting file with the less command (press q to exit). Alternatively, you can of course view the file in an editor. Another useful tool is diff, which displays the differences between two text files. In this case, you'll see that all <h2> tags have been replaced by <h3> tags. As with all commands, use the man page (e.g. man diff) for details on the command and its output. Once you are satisfied that the substitution worked as expected, copy the new file over the old one using cp. Since we don't want to lose the old version, we specify the option -b, which makes a back-up of any existing destination file before overwriting it (by default the back-up gets the suffix ~, here index.html~).
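The whole sequence can be tried safely on a small stand-in file. This is a minimal sketch assuming GNU sed and GNU cp; the contents of the sample index.html are made up and only mimic the structure described above.

```shell
#!/bin/bash
# Sketch: the h2 -> h3 substitution on a small, made-up index.html.
demo=$(mktemp -d)
cd "$demo" || exit 1
unset SIMPLE_BACKUP_SUFFIX VERSION_CONTROL   # use GNU cp's default backup naming

# A tiny stand-in for the tutorial's index.html (hypothetical content)
cat > index.html <<'EOF'
<h1>Website Title</h1>
<!-- Your website slogan goes here -->
<h2>About me</h2>
<h2>Links</h2>
EOF

# Same pipeline as above: every "h2>" becomes "h3>", covering <h2> and </h2>
cat index.html | sed 's/h2>/h3>/g' > index_new.html

# diff exits with status 1 when the files differ, which is expected here
diff index.html index_new.html || true

# Overwrite the original, keeping the old version as index.html~
cp -b index_new.html index.html
ls
```

GNU sed can also edit a file in place with sed -i, but the explicit new-file, diff, cp -b route shown in the tutorial lets you inspect the result before anything is overwritten.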

Now, start an editor (e.g. type emacs on the command line, or select gedit from the System -> Office -> gedit menu) and load index.html so that you can modify it. Search for the line that contains


 Your website slogan goes here 
Replace the entire line with your own text, wrapped in <h2> </h2> to make it a level-2 heading, for example:

<h2> This is my slogan </h2>
Save the modified file and exit the editor.
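The same edit can also be done non-interactively with sed. This is a minimal sketch that assumes the placeholder line in your file contains the text Your website slogan goes here; the sample file and the slogan are hypothetical. Because the replacement contains a / (in </h2>), the sketch uses | as the delimiter of the s command.

```shell
#!/bin/bash
# Sketch: replace the slogan placeholder line with sed instead of an editor.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical excerpt of index.html; the comment text in your file may differ
cat > index.html <<'EOF'
<h1>Website Title</h1>
<!-- Your website slogan goes here -->
<h3>About me</h3>
EOF

slogan='This is my slogan'
# ".*" on both sides makes the pattern cover the whole line containing the text.
# "|" is the delimiter, so the "/" in "</h2>" needs no escaping.
sed "s|.*Your website slogan goes here.*|<h2> $slogan </h2>|" index.html > index_new.html
cat index_new.html
```

A slogan containing | or & would need escaping, since sed treats those characters specially in this command.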

Now, go back to your web browser (the firefox window) and reload the page to see how your web page has changed. This concludes the basic steps of this tutorial. You can continue to edit your web page, or proceed with the next sections, which cover other useful Linux commands.

Advanced Topics: Other Uses of Pipes

Stream editing is one of the more advanced usages of basic Linux commands. Other useful commands for manipulating files are:

As another example of pipes, you can use the following command to display all files in the current directory, sort them by date (with most recent last) and display only the 10 most recent files:

$ ls -ltr | tail -10
In fact, this is such a useful command that it makes sense to define a short-cut, or alias, for it:

$ alias l='ls --format=long --no-group --sort=time --reverse --color | tail -"${LL:-10}"'
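Put this alias definition in your .profile or .bashrc file in your home directory, so that in every new shell you only have to type l to display the 10 most recent files in the current directory. To show a different number of files, first set the shell variable LL (e.g. LL=20) and then type l.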
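An alias is pure text substitution, so a prefix assignment like LL=3 l does not reach the tail command. A shell function is a common alternative that handles this and also accepts arguments. This is a minimal sketch assuming GNU ls and touch; the function name recent is made up for illustration.

```shell
#!/bin/bash
# Sketch: a shell function as an alternative to the alias "l".
# "recent" is a hypothetical name; LL works as in the alias (default 10 lines).
recent() {
  ls -l --sort=time --reverse "$@" | tail -n "${LL:-10}"
}

# Demonstration in a temporary directory with 12 files of increasing age
demo=$(mktemp -d)
cd "$demo" || exit 1
for i in $(seq 1 12); do
  touch -d "2020-01-$(printf '%02d' "$i")" "file$i"   # file12 is the newest
done

recent        # the 10 newest files, newest last
LL=3 recent   # for a function, LL=3 applies to this call only
```

The function definition can go into .bashrc just like the alias.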

Note: Exercise

Define an alias that lists all your files in long format, sorted by file size (largest last). Test this alias on your command-line and put it into your .bashrc file so that you can use it in every session.

Regular expressions

In this section, assume that you have the source code for a project that uses CVS as its version control system. To do the examples in this section, first extract all files from the tar file below, which contains all sources. The command tar deals with (un-)packing files and directories, similar to zip on Windows systems. The options xfz tell tar to extract (x) the archive file (f) given as argument, decompressing it with gzip (z).


$ # unpack this (compressed) archive  
$ tar xfz /home/msc/public/LinuxIntro/ghc-6.12-eden-gum.tar.gz
$ # go into the sub-directory 'rts' for the runtime-system
$ cd ghc-6.12-eden-gum/rts/
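The same tar options can be tried on a small archive of your own. This is a minimal sketch assuming GNU tar; the directory and archive names are made up. It creates (c) an archive, lists (t) its contents without extracting, and then extracts (x) it again.

```shell
#!/bin/bash
# Sketch: pack, list and unpack a gzip-compressed tar archive.
# demo-project and demo.tar.gz are hypothetical names.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p demo-project/rts
echo 'int main(void) { return 0; }' > demo-project/rts/main.c

tar cfz demo.tar.gz demo-project   # c: create, f: archive file name, z: gzip
tar tfz demo.tar.gz                # t: list the contents without extracting
rm -r demo-project
tar xfz demo.tar.gz                # x: extract into the current directory
ls demo-project/rts
```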

Note: Running the Linux Intro on the MACS VM image

If you are working in the MACS Linux VM image, you first need to download the tar file above. Use this sequence of commands instead of the sequence above:


 $ # download the tar file from a departmental server
 $ scp -q -r `whoami`@jove.macs.hw.ac.uk:/home/msc/public/LinuxIntro/ghc-6.12-eden-gum.tar.gz .
 $ # unpack this (compressed) archive  
 $ tar xfz ghc-6.12-eden-gum.tar.gz
 $ # go into the sub-directory 'rts' for the runtime-system
 $ cd ghc-6.12-eden-gum/rts/
 
The project has several sub-directories and contains a mix of C source and header files. We want to list all such files in the current directory, i.e. all files ending in either .c or .h. We can do this easily by using regular expressions:

$ ls *.[ch]
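To see exactly which names the pattern *.[ch] matches, here is a minimal sketch on a set of made-up file names. The shell expands the pattern before the command runs; [ch] stands for exactly one character, so names such as Stats.cpp or README.ch do not match.

```shell
#!/bin/bash
# Sketch: which names does the shell pattern *.[ch] match?
demo=$(mktemp -d)
cd "$demo" || exit 1
touch Main.c Rts.h Stats.cpp notes.txt README.ch   # hypothetical file names

matches=$(echo *.[ch])   # the shell expands the pattern before echo runs
echo "$matches"
```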

Regular expressions are a powerful concept for describing an entire class of words or filenames. The basic constructs are:

Several tools use regular expressions to specify filenames or text. For example, egrep uses regular expressions to define the text to search for in a file. If you want to search for all instances of slogan or Slogan in your index.html file, you can type the following command:


 $ egrep '[Ss]logan' index.html
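Quoting the pattern stops the shell from treating [Ss]logan as a filename pattern before egrep sees it. The following minimal sketch, assuming GNU grep, compares the bracket expression with the -i option, which ignores case completely; grep -E is the current spelling of egrep. The sample lines are made up.

```shell
#!/bin/bash
# Sketch: case variants via a bracket expression, compared with grep -i.
demo=$(mktemp -d)
cd "$demo" || exit 1
# Made-up lines standing in for index.html
printf '%s\n' '<!-- Your website slogan goes here -->' 'Slogan: tbd' 'SLOGAN' > index.html

bracket=$(grep -E -c '[Ss]logan' index.html)   # matches "slogan" and "Slogan" only
ignore_case=$(grep -i -c 'slogan' index.html)  # also matches "SLOGAN"
echo "[Ss]logan: $bracket lines, -i: $ignore_case lines"
```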

Caution: Regular expression notations

There are different flavours of regular expression notation. The notation used above for filenames is the shell's filename-matching notation. Other notations are more powerful, for example Perl regular expressions, or more easily readable, for example POSIX regular expressions. See these slides for a summary.

We now use regular expressions to select files and search their contents (remember: lines starting with a # symbol are comments, and you only have to cut-and-paste the commands after a $ symbol):


# Regular Expressions

# go to a checkout of ghc/rts
# explain different versions/aliases for ls

# list all C source or header files
$ ls *.[ch]
# search for Flags in these files
$ egrep Flags *.[ch]
# colourise
$ egrep -n --color Flags *.[ch]
# everything that accesses Flags, i.e. a '.' afterwards
$ egrep -n --color "(Par|Gc)Flags[.]" *.[ch]
# three or more dots in a row
$ egrep -n --color "(Par|Gc)Flags[.]{3,}" *.[ch]
# Flags but not with an 'n' after the '.'
$ egrep -n --color "(Par|Gc)Flags[.][^n]" *.[ch]
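To check what each of these patterns matches without the project sources, here is a minimal sketch that applies them to a few made-up C lines, using grep -E (equivalent to egrep) and counting matching lines with -c.

```shell
#!/bin/bash
# Sketch: the three Flags patterns from above, tried on made-up C lines.
demo=$(mktemp -d)
cd "$demo" || exit 1
cat > sample.c <<'EOF'
x = RtsFlags.ParFlags.nNodes;
y = RtsFlags.GcFlags.giveStats;
z = GcFlags...weird;
w = ParFlags;
EOF

any=$(grep -E -c '(Par|Gc)Flags[.]' sample.c)        # ParFlags/GcFlags followed by a dot
dots=$(grep -E -c '(Par|Gc)Flags[.]{3,}' sample.c)   # followed by three or more dots
not_n=$(grep -E -c '(Par|Gc)Flags[.][^n]' sample.c)  # a dot, then anything except n
echo "$any $dots $not_n"
```

Only the first line is excluded by [^n], because its dot is followed by n; the last line never matches because no dot follows ParFlags.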

Another powerful command for finding files is find. It takes as its first argument the directory in which to start the search, followed by a sequence of options that determine what to search for. In our example, we specify the filenames we are interested in, using the same pattern as above; it is quoted so that find, rather than the shell, interprets it. Other search criteria can be the modification time (as in the -mtime example below), permissions, file type etc. See the man and info pages for find for more detail. The -print option prints each successful match. The -exec option specifies a command that is executed for each successful match: the string {} is replaced by each matched filename in turn, and ; terminates the command. Both are written with backslashes (\{\} and \;) so that the shell passes them to find unchanged. In our example, we want to search for the words ParFlags. and GcFlags. (not followed by an n) in all C source or header files, using egrep, which also uses regular expressions. In summary, do:


# recursively in all dirs
$ find . -name "*.[ch]" -print
# do something recursively in all dirs
$ find . -name "*.[ch]" -print -exec egrep -n --color "(Par|Gc)Flags[.][^n]" \{\} \;
# files modified within the last 24 hrs
$ find . -mtime -1 -print
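A variant of the -exec example ends the command with {} + instead of \;, so that find passes many filenames to a single grep call; grep's -H option then prints the filename in front of each match, which makes the separate -print unnecessary. This is a minimal sketch assuming GNU find and grep, on a made-up directory tree.

```shell
#!/bin/bash
# Sketch: find combined with grep, run in a made-up directory tree.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p rts/sm rts/parallel
echo 'RtsFlags.GcFlags.giveStats' > rts/sm/GC.c
echo 'RtsFlags.ParFlags.nNodes'   > rts/parallel/Pack.h
echo 'nothing here'               > rts/notes.txt

# Quote '*.[ch]' so that find, not the shell, interprets the pattern.
# "{} +" collects the matched names into one grep call; -H prints each file name.
find . -name '*.[ch]' -exec grep -E -n -H '(Par|Gc)Flags[.][^n]' {} +
```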

Note: The multi-purpose find command

find is far more than a tool for searching files: it traverses an entire directory tree and can perform actions on every entry it finds. It is worth studying the man page for find to get an idea of what you can achieve with this command.

The next example shows how to find large files in a directory, using pipes and a little bit of shell programming with shell variables such as $z. An introduction to shell programming is given in the Section called Basic Shell Scripting Using bash. The command du -s *.[ch] lists all C source and header files in the current directory, together with their disk usage (in kB by default). The sort command sorts this output numerically (-n) by the first column (-k 1). You can get a list of all these files, sorted by size, with the largest file last, using this command:


 $ du -s *.[ch] | sort -k 1 -n 
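To continue with the example, you can pipe this command into a final while loop that iterates over all lines, binding the file size to variable x and the filename to variable nam, and adding up the total size of all matched files in the loop body. This total is printed at the end. The parentheses group the final two commands (the loop and the echo) so that they run in the same subshell: the loop, as part of a pipeline, runs in a subshell anyway, so without the grouping the echo would run afterwards in the original shell and would not see the value of z. In summary, do: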

# disk usage
$ du -s *.[ch] | sort -k 1 -n | (z=0; while read -r x nam ; do z=$(( z + x )) ; done; echo "Total size of all C files in kB: $z")
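The same total can also be computed with awk, which sums the first column in a single step. This minimal sketch, assuming GNU du and a POSIX awk, runs both the loop from above and the awk variant on made-up files; the exact sizes reported depend on your file system, but the two totals agree.

```shell
#!/bin/bash
# Sketch: two ways to add up the du sizes of C files (made-up files).
demo=$(mktemp -d)
cd "$demo" || exit 1
head -c 5000  /dev/zero > big.c
head -c 20000 /dev/zero > bigger.h
echo 'x' > small.c

# 1) The while loop from above, using bash arithmetic $(( ... ))
loop_total=$(du -s *.[ch] | (z=0; while read -r x nam; do z=$(( z + x )); done; echo "$z"))

# 2) awk adds up the first column and prints the sum at the end
awk_total=$(du -s *.[ch] | awk '{ sum += $1 } END { print sum }')

echo "loop: $loop_total kB, awk: $awk_total kB"
```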

Note: More on regular expressions

Check out this section in the Advanced Bash Scripting Guide for more information on regular expressions.

Security

One of the most basic aspects of security on a Linux system is the set of permissions on your files and directories. These allow you to control who can access your files and who can enter your directories. The set of all users on the system is partitioned into: the owner or user of the file (u), the members of the group the file belongs to (g, by default the owner's primary group), and all other users (o). For each of these 3 classes, you can grant read (r), write (w) and execute (x) permission, permitting the corresponding operations. For directories, a user needs execute permission to enter them. For example, to allow users in your group to enter the directory containing your home page, but to prevent everyone except the owner from modifying or executing the files themselves, you can do the following (note that + adds a permission, - removes a permission and = sets exactly the given permissions):


$ cd ~/public_html
$ chmod g+x .
$ chmod go-wx *
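The effect of symbolic modes can be checked with stat, which prints the same permission string as ls -l. This is a minimal sketch assuming GNU coreutils (stat -c '%A'); the file name is made up, and the file is first set to rwxrwxrwx only so that the changes are easy to see.

```shell
#!/bin/bash
# Sketch: effect of symbolic chmod modes, checked with GNU stat (made-up file).
demo=$(mktemp -d)
cd "$demo" || exit 1
touch page.html
chmod 777 page.html        # start from rwxrwxrwx for the demonstration
chmod go-wx page.html      # group and others lose write and execute
after_minus=$(stat -c '%A' page.html)
chmod g=r,o= page.html     # "=" sets exactly these: group read only, others nothing
after_equals=$(stat -c '%A' page.html)
echo "$after_minus $after_equals"
```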

To print your username, as well as the groups you belong to, type id. To just print the groups, type groups. To just print your username type whoami.

At this point, take a moment to check the permissions on your home directory and on the files in your home directory. These should be set so that only you can list or modify the contents of your home directory. Check this by typing the following commands; you should see a response like this (with your username in place of xyz12 below):


$ cd
$ ls -lad .
drwx--s--x 16 xyz12 xyz12 4096 Sep 12 15:05 .
The first block of characters summarises the permissions for this directory. The leading d indicates that this is a directory. The next 3 characters show the permissions of the owner of this directory: here read (r), write (w) and execute (x) permissions are granted, because you, as the owner, should be allowed to enter the directory, see the files and modify them. The next 3 characters show the permissions of the directory's group: the two - characters (no read, no write) mean that group members cannot list or change the contents, and the s in the third position means execute permission (so they can enter the directory) combined with the set-group-ID bit, which makes new files created in the directory inherit its group. The last 3 characters show the permissions of all other users: here only execute permission is granted (x), so other users can enter the directory but won't be able to list or change its contents. This setting, by the way, is needed so that applications such as the apache web server can enter your home directory and pick up the files in your web directory, to display your web pages.

Warning

If you see different permissions for your home directory, in particular if you see read or write permissions for users other than yourself, you should change the permissions back to the original settings. You can do so with this one-liner:


$ chmod 711 .      
Here 711 is an octal number that encodes all permissions in one go. Each octal digit expands to 3 bits, so 711 becomes the binary value 111 001 001, which in turn is a bitmask for the read, write and execute permissions of the user, the group and others (in that order). Therefore, you read 711 as the permissions rwx--x--x, as displayed by the ls -l command.
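The decoding can be written out as a small bash function: each octal digit is the sum of 4 (r), 2 (w) and 1 (x) and maps to one rwx triple. This is a minimal sketch; the function name octal_to_rwx is made up, it only handles plain 3-digit modes without special bits, and the cross-check assumes GNU stat.

```shell
#!/bin/bash
# Sketch: decode an octal mode such as 711 into rwx notation.
# octal_to_rwx is a hypothetical helper; special bits (setuid etc.) are ignored.
octal_to_rwx() {
  local mode=$1 out="" i
  # Array index = value of one octal digit (4 = r, 2 = w, 1 = x)
  local -a bits=('---' '--x' '-w-' '-wx' 'r--' 'r-x' 'rw-' 'rwx')
  for (( i = 0; i < ${#mode}; i++ )); do
    out+=${bits[${mode:i:1}]}
  done
  echo "$out"
}

octal_to_rwx 711   # user 7 = 4+2+1, group 1, others 1

# Cross-check against chmod and GNU stat on a temporary file
f=$(mktemp)
chmod 711 "$f"
stat -c '%A' "$f"
```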

Most Linux distributions extend this scheme of base permissions with Access Control Lists (ACLs): permissions can also be granted to additional named users and named groups. This lifts the restriction of associating only 1 user (owner) and 1 group with a file or folder. For each named user and named group, its own set of permissions can be defined. The command getfacl displays the current set of permissions for a file and the command setfacl modifies these. For more detail see the section on Access Control Lists (ACLs) on these slides (4up version).

One concrete example is the case where you want the web server user apache to be able to write files. You can grant read, write and execute access on a particular file, here called filename, to the apache user like this:


$ setfacl -m u:apache:rwx filename 
You can get details on named users and groups for the current directory, together with a more verbose listing of the base permissions, by typing:

$ getfacl .
# file: .
# owner: xyz12
# group: msc
# flags: -s-
user:apache:rwx
user::rwx
group::--x
other::--x
Note that, in addition to the user, group and other settings, this command lists all named users and named groups, in this case the user apache, who has been granted read, write and execute permissions on this directory (with a setfacl command like the one above, applied to .). If a file or directory has permissions for named users attached to it, this shows up as a + after the permission string in an ls -l listing.

Since some of these commands need special permissions, you are best off trying them on your own laptop, referring to the slides above for explanations.

Note: More on file security

For more information on file permissions and ACLs, see Chapter four in Sobell's textbook.

Your environment

You can customise your Linux environment by modifying the files .profile and .bashrc in your home directory. The most common customisation is to define an alias, as discussed above. You can also set your prompt (the shell variable PS1), which replaces the $ symbol shown in the examples.

One important example of tuning your environment is to adjust the list of directories that are searched when you type in a command. This list is stored in the environment variable PATH, and by default it contains only system directories. You can check where a command is found using the which command:


$ which ls
/bin/ls
You can add a subdirectory bin in your home directory (represented by the symbol ~) to this search path by typing

$ export PATH=$PATH:~/bin
Now, if you have a shell script hello.sh, as shown in the Section called Hello world, in your bin directory, and it is executable (chmod +x ~/bin/hello.sh), you can call it just like this:

$ hello.sh
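The whole PATH mechanism can be tried without touching your real ~/bin. This is a minimal sketch in which a temporary directory stands in for ~/bin and the hello.sh script is a hypothetical one-liner; command -v, a bash builtin, shows where the shell found the command, much like which.

```shell
#!/bin/bash
# Sketch: make a script callable by name via PATH.
# A temporary directory stands in for ~/bin; hello.sh is a made-up example.
mybin=$(mktemp -d)
cat > "$mybin/hello.sh" <<'EOF'
#!/bin/bash
echo "Hello world"
EOF
chmod +x "$mybin/hello.sh"   # without execute permission the script cannot be run by name

export PATH="$PATH:$mybin"
command -v hello.sh          # prints where the shell found the script
hello.sh
```

The export only affects the current shell and its children; to make it permanent, put the export line into your .bashrc.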
The screenshot below shows modifications made at the end of a new user's .bashrc file:

To test that the changes are in effect, start a new shell by typing bash and then execute one of the new commands, such as his, or execute a file in your ~/bin directory, such as hello.sh, like this:

History and command line editing

The shell keeps a history of all executed commands, which you can display by typing history. On the command line, use UP-ARROW or CTRL-P to show the previous command in the history, and DOWN-ARROW or CTRL-N to show the next command. You can edit the line shown to execute a slightly modified command: use CTRL-A to jump to the beginning and CTRL-E to jump to the end of the line, and use backspace to delete characters. For filename completion, type the first letters of the file you want to refer to, and then press TAB. All key bindings available on the command line are provided by the readline library (for details, see its user manual with info rluserman).

Other useful commands

Some more useful commands are:

Check their man pages and try them out.

Some Exercises

Note: Advanced Exercises

Caution

The exercise below is part of an interactive lab session on Unix shell scripting by Lewis Sharpe, Heriot-Watt University. Follow this link for more information.

As an advanced exercise, modify your command-line prompt so that it shows the user name, the host name, as well as the available disk space and disk utilisation on the current partition. Use a coloured background to highlight the last 2 pieces of information. Once implemented correctly, you should see a prompt as in the picture below.

First, test your solution by directly changing the prompt on the command line using


$ PS1=...
This will display the information, e.g. disk utilisation, as it was at the time you executed this command. In order to always get the most recent information, create a file df-prompt.sh and define a function in this file, i.e. use this structure:

#!/bin/bash
function prompt_command {
  ...
  PS1=...
}
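You need to fill in the ... to look up free disk space information, extract from this info the available disk space and the disk utilisation (just for the current partition), and use this info in the line that starts with PS1=.... To have bash call the function before displaying each prompt, also set PROMPT_COMMAND=prompt_command in this file, and load the file into your current shell with source df-prompt.sh (running it with bash df-prompt.sh would only change the prompt of a child shell). Hint: Check the Section called Other useful commands for some useful commands.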