For repeated edits within a file, or across a batch of files, stream editors are a powerful tool. The main stream editor on most Linux systems is sed. It is particularly useful for systematic replacements and, being programmable, it can also be used for far more complex tasks. See the Sed section of the Advanced Bash Scripting Guide for details.
![]() | Pipes | |
---|---|---|
Before using a stream editor, some general remarks about one of the most powerful concepts in Linux systems: pipes. Pipes allow you to redirect the output of one command to the input of another command. This way you can compose several commands on the command line to perform a more complex task. As an example, we want to count the number of files in our home directory. First, use cd ~ to go to your home directory. Now, you can list all the files in this directory, one per line, using the command ls -l. Another Linux command, wc, counts the number of bytes, words and lines in a text file. With option -l it only counts the number of lines. Composing both commands using the | symbol gives the number of files in the directory plus 1 (because the first line of the ls -l output gives the total). In summary, do:
|
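The pipeline described in the note above can be tried in a scratch directory, where the expected count is known in advance. This is only a minimal sketch: the directory and file names are made up for illustration, and it assumes that mktemp is available on your system.

```shell
# Build a scratch directory with three files (names are arbitrary).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt"

# ls -l prints a "total" line plus one line per file; wc -l counts those lines.
with_total=$(ls -l "$demo_dir" | wc -l)

# ls -1 prints one name per line and no "total" line.
names_only=$(ls -1 "$demo_dir" | wc -l)

echo "ls -l | wc -l gives: $with_total"
echo "ls -1 | wc -l gives: $names_only"
```

In your home directory the corresponding command is simply ls -l | wc -l, typed after cd ~.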
Returning to our home page example, we now want to add a sub-title as a slogan to our web page. Examining the index.html file, you will see that the "Website Title" text is a level-1 heading (<h1>). In the next line, there is a comment mentioning the slogan text. We want to replace this comment line with a level-2 heading containing our slogan. However, there are already level-2 headings (<h2>) in the HTML code, and we need to turn each of them into a level-3 heading. Rather than doing this manually in an editor, we use a stream editor to do it in one go: we use sed to replace every H2 element with an H3 element. The notation for such a substitution in sed is: s/h2>/h3>/g. It replaces the first string by the second string, with / being the delimiter, and the g suffix indicating a global substitution (every match on a line) rather than a one-off substitution (only the first match on a line). In summary, do:
    $ cat index.html | sed 's/h2>/h3>/g' > index_new.html
    $ less index_new.html
    $ diff index.html index_new.html
    $ cp -b index_new.html index.html
Note that the > notation redirects the output of the sed command to a file, here index_new.html. You can check that the substitution worked as expected by displaying the resulting file with the less command (q to exit). Alternatively, you can of course view the file in an editor. Another useful tool is diff, which displays the differences between two text files. In this case, you'll see that every <h2> has been replaced by an <h3>. As with all commands, use the man page (e.g. man diff) for details on the command and its output. Once you are satisfied that the substitution worked as expected, copy the new file over the old one, using cp. In this case we don't want to lose the old file. Therefore, we specify the option -b to back up any existing file rather than simply overwriting it (the back-up will have the suffix ~).
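The effect of the g suffix can be checked on a single line of text before running sed on a whole file. The following is a small sketch; the sample line is invented for illustration and is not taken from index.html.

```shell
# A hypothetical line with two level-2 headings, used only for illustration.
line='<h2>News</h2><h2>Links</h2>'

# Without g, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/h2>/h3>/')

# With g, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/h2>/h3>/g')

echo "without g: $first_only"
echo "with g:    $all_matches"
```

Because the pattern h2> also occurs in the closing tag </h2>, the global substitution rewrites opening and closing tags alike.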
Now, start an editor (e.g. type emacs from the command line or select gedit from the System -> Office -> gedit menu) and load index.html so that you can modify it. Search for the comment line that contains
Your website slogan goes here |
and replace it with a level-2 heading containing your slogan, for example:
<h2> This is my slogan </h2> |
Now, go back to your web browser, the firefox window, and reload the page to see how your web page has changed. This concludes the basic steps of this tutorial. You can continue to edit your web page, or proceed with the next sections, which cover other useful Linux commands.
Stream editing is one of the more advanced usages of basic Linux commands. Other useful commands for manipulating files are:
gawk (a column-based stream editor),
head and tail for extracting the start or the end of the file.
$ ls -ltr | tail -10 |
$ alias l='ls --format=long --no-group --sort=time --reverse --color | tail -"${LL:-10}"' |
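To get a feeling for head, tail and column-based processing, the commands can be tried on a few lines of made-up input. This is a sketch under stated assumptions: the "name size" records are hypothetical, and plain awk is used in place of gawk, since gawk is one implementation of awk and may not be installed everywhere.

```shell
# Five hypothetical "name size" records, one per line.
records=$(printf '%s\n' 'a.c 10' 'b.c 20' 'c.h 30' 'd.h 40' 'e.c 50')

# head and tail extract the start and the end of the input.
first_two=$(printf '%s\n' "$records" | head -n 2)
last_two=$(printf '%s\n' "$records" | tail -n 2)

# awk splits each line into whitespace-separated columns; $2 is the second one.
sizes=$(printf '%s\n' "$records" | awk '{ print $2 }')

# The END block runs after the last line, here printing the column sum.
total=$(printf '%s\n' "$records" | awk '{ sum += $2 } END { print sum }')

echo "first two lines:"; echo "$first_two"
echo "last two lines:";  echo "$last_two"
echo "sum of column 2: $total"
```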
![]() | Exercise |
---|---|
Define an alias that lists all your files in long format, sorted by file size (largest last). Test this alias on your command-line and put it into your .bashrc file so that you can use it in every session. |
In this section, assume that you have the source code for a project that uses CVS as its version control system. To do the examples in this section, first extract all files in the tar-file below, containing all sources. The command tar deals with (un-)packing files and directories, similar to zip on Windows systems. The options xfz specify to extract (x) from the archive file given as argument (f), which is compressed with gzip (z).
    $ # unpack this (compressed) archive
    $ tar xfz /home/msc/public/LinuxIntro/ghc-6.12-eden-gum.tar.gz
    $ # go into the sub-directory 'rts' for the runtime-system
    $ cd ghc-6.12-eden-gum/rts/
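If you do not have access to the archive above, the tar options can still be tried on a small archive that you create yourself. This is a minimal sketch: the project directory and the file main.c are invented for illustration, and it assumes a tar that supports gzip via the z option (as GNU tar does).

```shell
# Scratch directories for the sources and for the archive (names are arbitrary).
src=$(mktemp -d)
out=$(mktemp -d)
mkdir "$src/project"
echo 'int main(void) { return 0; }' > "$src/project/main.c"

# c = create, f = the archive file name follows, z = compress with gzip.
# -C changes into the given directory before adding 'project'.
tar cfz "$out/project.tar.gz" -C "$src" project

# t = list the contents of the archive without unpacking it.
listing=$(tar tfz "$out/project.tar.gz")

# x = extract; here the files are unpacked below the directory given to -C.
tar xfz "$out/project.tar.gz" -C "$out"

echo "$listing"
ls "$out/project"
```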
![]() | Running the Linux Intro on the MACS VM image | |
---|---|---|
If you are working in the MACS Linux VM image, you first need to download the tar file above. Use this sequence of commands instead of the sequence above:
If the scp command gives an error message, you can download this ghc...tar.gz file in a web browser. |
$ ls *.[ch] |
Regular expressions are a powerful concept to enumerate an entire class of words or filenames. The basic constructs are:
. The dot matches any single character (e.g. ab. matches aba, abb, abc, etc).
[ ] A bracket expression matches a single character contained within the bracket (e.g. [abc] matches a, b or c).
[a-z] specifies a range which matches any lowercase letter from a to z (e.g. [abcx-z] matches a, b, c, x, y and z).
[^ ] is the negation of [ ]: it matches a single character not contained within the brackets (e.g. [^abc] matches any character other than a, b or c).
^ matches the starting position of a string.
$ matches the ending position of a string.
* matches the previous element zero or more times (e.g. abc*d matches abd, abcd, abccd, etc).
Several tools use regular expressions to specify filenames or text. For example, egrep uses a regular expression to define the text to search for in a file. If you want to search for all instances of slogan or Slogan in your index.html file, you can type in the following command:
$ egrep '[Ss]logan' index.html |
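The constructs listed above can be tried out on a short word list. The sketch below uses grep -E, which accepts the same extended regular expressions as egrep (recent versions of GNU grep recommend grep -E over egrep). The word list is made up for illustration.

```shell
# A hypothetical word list, one word per line.
words=$(printf '%s\n' abd abcd abccd abx slogan Slogan)

# * : zero or more of the previous element; ^ and $ anchor the whole line.
star=$(printf '%s\n' "$words" | grep -E '^abc*d$')

# [Ss] : either an upper-case or a lower-case s.
either_case=$(printf '%s\n' "$words" | grep -E '^[Ss]logan$')

# [^c] : any single character except c.
not_c=$(printf '%s\n' "$words" | grep -E '^ab[^c]$')

echo "$star"
echo "$either_case"
echo "$not_c"
```

The patterns are quoted so that the shell passes them to grep unchanged instead of trying to expand them as filename patterns.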
![]() | Regular expression notations |
---|---|
There are different flavours of regular expression notation. The notation above is the one used by tools such as egrep. Filename matching in the shell shares the bracket expressions, but there ? matches any single character and * matches any sequence of characters. Other notations are more powerful, for example Perl regular expressions, or more easily readable, for example POSIX regular expressions. See these slides for a summary. |
We now use regular expressions to list only a subset of all files (remember: lines starting with a # symbol are comments, and you only have to cut-and-paste the commands after a $ symbol):
    # Regular Expressions
    # go to a checkout of ghc/rts
    # explain different versions/aliases for ls
    # list all C source or header files
    $ ls *.[ch]
    # search for Flags in these files
    $ egrep Flags *.[ch]
    # colourise
    $ egrep -n --color Flags *.[ch]
    # everything that accesses Flags, ie. a '.' afterwards
    $ egrep -n --color "(Par|Gc)Flags[.]" *.[ch]
    # several dots
    $ egrep -n --color "(Par|Gc)Flags[.]{3,}" *.[ch]
    # Flags but not with an 'n' after the '.'
    $ egrep -n --color "(Par|Gc)Flags[.][^n]" *.[ch]
Another powerful command for finding files is find. Its first argument is the directory in which to start the search. Then a sequence of options is given that determines what to search for. In our example, we specify the filenames we are interested in with -name, using the same pattern as above. Other search criteria include modification time (as in the -mtime example below), permissions, file type etc. See the man and info pages for find for more detail. The -print option prints each successful match. The -exec option specifies a command that is executed for each successful match. The magic string \{\} is replaced by each matched filename in turn, and the string \; is needed to terminate the command. In our example, we want to search for the words ParFlags. and GcFlags. in all C source or header files, using egrep with a regular expression. In summary, do:
    # recursively in all dirs
    $ find . -name "*.[ch]" -print
    # do something recursively in all dirs
    $ find . -name "*.[ch]" -print -exec egrep -n --color "(Par|Gc)Flags[.][^n]" \{\} \;
    # files modified within the last 24 hrs
    $ find . -mtime -1 -print
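The combination of -name and -exec can be tried on a small directory tree that you build yourself, so that the expected matches are known. This is only a sketch: the file names and their contents are invented for illustration and do not come from the ghc sources.

```shell
# A scratch tree with two C files and one unrelated file (all hypothetical).
tree=$(mktemp -d)
mkdir -p "$tree/sub"
printf 'RtsFlags.ParFlags.nNodes\n' > "$tree/a.c"
printf 'RtsFlags.GcFlags.maxHeap\n' > "$tree/sub/b.h"
printf 'nothing here\n' > "$tree/notes.txt"

# -name takes a shell pattern; quoting stops the shell from expanding it first.
found=$(find "$tree" -name '*.[ch]' | sort)

# -exec runs grep once per matching file; {} is replaced by the file name.
# grep -l prints only the names of files that contain a match.
hits=$(find "$tree" -name '*.[ch]' -exec grep -E -l '(Par|Gc)Flags[.][^n]' {} \;)

echo "C files found:"; echo "$found"
echo "files with a match:"; echo "$hits"
```

Only b.h is reported by the second command, because in a.c the character after ParFlags. is an n, which the pattern excludes.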
![]() | The multi-purpose find command |
---|---|
Find is far more powerful than just a tool for searching files: it does a traversal of an entire directory structure. It is worth studying the man page for find to get an idea what you can achieve with this command. |
Another example below shows how to find large files in a subdirectory, using pipes and a little bit of shell programming with shell variables such as $z.
An introduction to shell programming is given in the Section called Basic Shell Scripting Using bash.
The du -s *.[ch] command lists all C source and header files in the current directory, together with their sizes.
The sort command sorts the output numerically (-n) by the first column (-k 1).
You can get a list of all files, sorted by size, with the largest file last, using this command:
$ du -s *.[ch] | sort -k 1 -n |
The while loop iterates over all lines, binding the file size to the variable x and the filename to the variable nam, and calculating the total size of all matched files in the loop body.
This total size is printed at the end. The brackets group the final 2 commands together.
In summary, do:
    # disk usage
    $ du -s *.[ch] | sort -k 1 -n | (while read x nam ; do z=$(( z + x )) ; done; echo "Total size of all C files in kB: $z")
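Why the brackets matter can be seen by running the loop with and without them. The sketch below uses made-up "size name" lines instead of real du output, and it assumes bash with its default settings, where each part of a pipeline runs in its own subshell.

```shell
# Three hypothetical "size name" lines, in the format printed by du -s.
sizes=$(printf '%s\n' '4 a.c' '12 b.c' '8 c.h')

# Grouped: the loop and the echo share one subshell, so z is still set for echo.
grouped=$(printf '%s\n' "$sizes" | (z=0; while read -r x nam; do z=$(( z + x )); done; echo "$z"))

# Ungrouped: the loop runs in its own subshell, so the outer z stays unchanged.
z=0
printf '%s\n' "$sizes" | while read -r x nam; do z=$(( z + x )); done
ungrouped=$z

echo "grouped total:   $grouped"
echo "ungrouped total: $ungrouped"
```

Without the brackets the sum is computed but lost when the subshell running the loop exits, which is why the echo has to sit inside the same group as the loop.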
![]() | More on regular expressions: |
---|---|
Check out this section in the Advanced Bash Scripting Guide for more information on regular expressions. |
One of the most basic aspects of security on a Linux system is the set of permissions on your files and directories. These allow you to control who can access your files and who can enter your directories. The set of all users on the system is partitioned into: the owner or user of the file (u), the members of the group that the file belongs to (g), and all other users (o). For each of these 3 partitions, you can give read (r), write (w) and execute (x) permissions, permitting the corresponding operations. For a directory, a user needs execute permission to enter it. For example, to allow users in your group to enter the directory containing your home page, but to prevent everyone except the owner from modifying or executing the files themselves, you can do the following (note that + indicates adding a permission, - removing a permission and = setting exactly that permission):
    $ cd ~/public_html
    $ chmod g+x .
    $ chmod go-wx *
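The three operators +, - and = can be tried safely on a scratch directory rather than on your real home page. This is a minimal sketch; the file name page.html is hypothetical, and the mode string is taken from the first 10 characters of the ls -l output.

```shell
# A scratch directory with one file (names are arbitrary).
work=$(mktemp -d)
touch "$work/page.html"

# Start from known modes so the result does not depend on your umask.
chmod 700 "$work"
chmod 777 "$work/page.html"

# + adds a permission: members of the group may now enter the directory.
chmod g+x "$work"
dir_mode=$(ls -ld "$work" | cut -c1-10)

# - removes permissions: group and others lose write and execute, but keep read.
chmod go-wx "$work/page.html"
minus_mode=$(ls -l "$work/page.html" | cut -c1-10)

# = sets permissions exactly: here group and others end up with none at all.
chmod go= "$work/page.html"
equals_mode=$(ls -l "$work/page.html" | cut -c1-10)

echo "$dir_mode  $minus_mode  $equals_mode"
```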
To print your username, as well as the groups you belong to, type id. To just print the groups, type groups. To just print your username type whoami.
At this point, take a moment to check the permissions on your home directory and on the files in your home directory. These should be set so that only you can access the home directory. Check this by typing in the following command, and you should see a response like this (you should see your username in place of xyz12 below):
    $ cd
    $ ls -lad .
    drwx--s--x 16 xyz12 xyz12 4096 Sep 12 15:05 .
![]() | If you see different permissions for your home directory, in particular if you see read or write permissions for users other than yourself, you should change the permissions back to the original settings. You can do so with this one-liner:
711 is an octal number that encodes all permissions in one go. The octal value expands to the binary value 111001001, which in turn is a bitmask for the read, write and execute permissions of the user, group and others. Therefore, you read 711 as the permissions rwx--x--x, as displayed by the ls -l command.
|
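The translation from an octal mode to the rwx notation can be written out as a small shell function. This is an illustrative sketch only: the helper names digit_to_rwx and octal_to_rwx are made up here, and the function handles plain 3-digit modes such as 711, not the extra leading digit used for special bits.

```shell
# Translate one octal digit (0-7) into its rwx triple by testing each bit.
digit_to_rwx() {
    d=$1
    r=-; w=-; x=-
    [ $(( d & 4 )) -ne 0 ] && r=r   # bit value 4: read
    [ $(( d & 2 )) -ne 0 ] && w=w   # bit value 2: write
    [ $(( d & 1 )) -ne 0 ] && x=x   # bit value 1: execute
    printf '%s%s%s' "$r" "$w" "$x"
}

# Translate a 3-digit octal mode into the 9-character string shown by ls -l.
octal_to_rwx() {
    mode=$1
    user_digit=${mode%??}     # first digit: owner
    rest=${mode#?}
    group_digit=${rest%?}     # second digit: group
    other_digit=${mode#??}    # third digit: others
    printf '%s%s%s\n' "$(digit_to_rwx "$user_digit")" \
        "$(digit_to_rwx "$group_digit")" "$(digit_to_rwx "$other_digit")"
}

octal_to_rwx 711
octal_to_rwx 644
```

Each octal digit covers exactly three bits, which is why one digit per partition (user, group, others) is enough.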
Most Linux distributions extend this scheme of base permissions: named users and named groups can be created. This lifts the restriction of associating only 1 user (owner) and 1 group with a file or folder. For each named user and named group, a separate set of permissions can be defined. The command getfacl displays the current set of permissions for a file, and the command setfacl modifies them. For more detail see the section on Access Control Lists (ACLs) on these slides (4up version).
One concrete example is the case where you want the web server user apache to be able to write files. You can grant the apache user access (here read, write and execute) to a particular file filename like this:
$ setfacl -m u:apache:rwx filename |
    $ getfacl .
    # file: .
    # owner: xyz12
    # group: msc
    # flags: -s-
    user:apache:rwx
    user::rwx
    group::--x
    other::--x
The output includes an entry for the named user apache, for whom we have set read, write and execute permissions in the command above.
If a file or directory has permissions for named users attached to it, this will show up as a + in an ls -l listing.
Since some of these commands need special permission, it is best to try them on your own laptop, referring to the slides above for explanation.
![]() | More on file security: |
---|---|
For more information on file permissions and ACLs, see Chapter four in Sobell's textbook. |
You can customise your Linux environment by modifying the files .profile and .bashrc in your home directory. The most common customisation is to define an alias, as discussed above. You can also set your prompt, which replaces the $ symbol shown in the examples.
One important example of tuning your environment, is to tune the list of directories that are searched when you type in a command. By default, only system directories are searched. You can check this using the which command:
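    $ which ls
    /bin/ls
To also search a directory of your own, such as ~/bin, append it to the PATH variable (put this line into your .bashrc to make it permanent):
    $ export PATH=$PATH:~/bin
To test that the changes are in effect, start a new shell by typing bash and then execute one of the new commands, such as his, or execute a file in your ~/bin directory, such as hello.sh, like this:
    $ hello.sh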
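The whole sequence can also be tried without touching your real ~/bin directory. The following is a sketch under stated assumptions: a scratch directory stands in for ~/bin, the content of hello.sh is invented for illustration, and the scratch directory must allow running scripts.

```shell
# A scratch directory standing in for ~/bin, with a hypothetical hello.sh.
bin_dir=$(mktemp -d)
printf '#!/bin/sh\necho "Hello from hello.sh"\n' > "$bin_dir/hello.sh"
chmod u+x "$bin_dir/hello.sh"

# Append the directory to the search path, as done for ~/bin above.
PATH=$PATH:$bin_dir
export PATH

# command -v reports where the shell finds the command, much like which.
found_at=$(command -v hello.sh)

# The script can now be run by name, without giving its directory.
greeting=$(hello.sh)

echo "found at: $found_at"
echo "output:   $greeting"
```

Appending to the end of PATH means that system directories are still searched first, so a script of your own cannot accidentally shadow a system command of the same name.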
The shell keeps a history of all executed commands, which you can display by typing history. On the command-line, use UP-ARROW or CTRL-P to show the previous command in the history, and DOWN-ARROW or CTRL-N to show the next command in the history. You can edit the line shown, to execute a slightly modified command. Use CTRL-A to jump to the beginning, and CTRL-E to jump to the end of the line, and use backspace for deleting characters. For filename completion, type the first letters of the file you want to refer to, and then use TAB. All keybindings available on the command line are provided by the readline library (see info readline for more detail: info rluserman).
Some more useful commands are:
du estimate file space usage
df report free disk space
file determine file type
diff compare files line by line
cmp compare two files byte by byte
dd convert and copy a file
pushd adds a directory to the top of the directory stack, making the new top of the stack the current working directory.
popd removes entries from the directory stack.
dirs displays the list of currently remembered directories.
tar A tar (tape archiver) program.
gzip compress or expand files
bzip2 a block-sorting file compressor
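As a quick illustration of two of the commands in this list, the sketch below compares small text files with cmp and diff. The file names and contents are made up for illustration.

```shell
# Three small files in a scratch directory: two identical, one different.
work=$(mktemp -d)
printf 'alpha\nbeta\n'  > "$work/one.txt"
printf 'alpha\nbeta\n'  > "$work/same.txt"
printf 'alpha\ngamma\n' > "$work/other.txt"

# cmp -s prints nothing; its exit status says whether the files are identical.
if cmp -s "$work/one.txt" "$work/same.txt";  then same=yes;    else same=no;    fi
if cmp -s "$work/one.txt" "$work/other.txt"; then differs=no;  else differs=yes; fi

# diff prints the differing lines: "<" from the first file, ">" from the second.
# diff exits with status 1 when the files differ, hence the "|| true".
changes=$(diff "$work/one.txt" "$work/other.txt" || true)

echo "one.txt and same.txt identical: $same"
echo "one.txt and other.txt differ:   $differs"
echo "$changes"
```

cmp is the quicker choice when you only need a yes/no answer, for example in a script, while diff is meant for reading.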
![]() | Advanced Exercises | ||||
---|---|---|---|---|---|
As an advanced exercise, modify your prompt on the command line, so that it shows the user-name, the host-name, as well as the available disk space and disk utilisation on the current partition. Use a coloured background to highlight the last 2 pieces of information. Once implemented correctly, you should see a prompt as in the picture below. First, test your solution by directly changing the prompt on the command-line using
|