mugshot, larkspur

g13n


Gopal Venkatesan's Former Journal

My writings on free/open source software and other technologies


Moved out!
I have moved my journal to my own VPS setup powered by Amazon EC2. Visit g13n.in for my updated journal!

Moved out!
I have moved out to my own VPS setup powered by Amazon EC2 at g13n.in/blog since I was not getting enough SEO. Thanks LiveJournal.

Configuring nginx with daemontools

This content has been updated and moved to a new place.


In this blog post I will show how to set up nginx with daemontools.
Installation of nginx is covered in my previous blog post. I chose daemontools over more recent monitoring tools like monit because of its simplicity in configuring a service. While a lot of posts out there criticise D. J. Bernstein's philosophies, it must be admitted that his work focuses a lot on simplicity, portability and performance. Here is a good post from Aaron Swartz on DJB; it is worth reading.



Installing daemontools



Installing daemontools is extremely easy.



sudo mkdir /package && sudo chmod 1755 /package && cd /package
sudo wget http://cr.yp.to/daemontools/daemontools-0.76.tar.gz && sudo tar -xzpf daemontools-0.76.tar.gz
cd admin/daemontools-0.76 && sudo package/install


If you get an error about "errno: TLS definition in /lib/libc.so.6", follow the steps below. I got the error on my CentOS 5.5 VPS, and the following steps worked for me.



shell$ sudo package/install
Linking ./src/* into ./compile...
Compiling everything in ./compile...
./load envdir unix.a byte.a 
/usr/bin/ld: errno: TLS definition in /lib/libc.so.6 section .tbss mismatches non-TLS reference in envdir.o
/lib/libc.so.6: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [envdir] Error 1
Copying commands into ./command...
cp: cannot stat `compile/svscan': No such file or directory


Edit src/conf-cc in the daemontools source directory (/package/admin/daemontools-0.76/src/conf-cc) and add -include /usr/include/errno.h to the gcc line. Rerun package/install and it should work.
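If you would rather script the conf-cc change than edit it by hand, here is a minimal sketch. patch_conf_cc is a hypothetical helper of my own and is not part of daemontools. It assumes the compiler command sits on the first line of conf-cc, as it does in the 0.76 tarball, and it adds the flag only once.

```shell
#!/bin/sh
# Hypothetical helper (not part of daemontools): append the errno.h
# workaround to the compiler command on the first line of src/conf-cc.
patch_conf_cc() {
    conf="$1"
    # Nothing to do if the workaround is already on the gcc line
    if head -n 1 "$conf" | grep -q -- '-include /usr/include/errno.h'; then
        return 0
    fi
    # Rewrite line 1 with the extra flag; the remaining lines are copied as-is
    tmp="$conf.tmp.$$"
    sed '1s|$| -include /usr/include/errno.h|' "$conf" > "$tmp" && mv "$tmp" "$conf"
}
```

Run it as root from the daemontools-0.76 directory as patch_conf_cc src/conf-cc, then rerun package/install.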
package/install also arranges for svscan to start at boot, for example by adding svscanboot to /etc/inittab on SysV systems. On a BSD system it uses /etc/rc.local instead, so a reboot is necessary before svscan runs.



Configuring nginx



daemontools cannot supervise daemons that put themselves into the background. According to D. J. Bernstein, the author of daemontools, backgrounding is bad software design, which makes most daemons bad, including nginx! Fortunately nginx can run in the foreground: just add daemon off; to its configuration.
So, after adding daemon off; at the top level of /opt/nginx/conf/nginx.conf, it is fairly simple to have nginx supervised by daemontools.
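A minimal sketch of making that change safely repeatable is below. ensure_daemon_off is a hypothetical helper. It assumes a stock-style nginx.conf, where a line appended at the end of the file lands in the main (top-level) context. If a daemon directive already exists, the helper rewrites it to off rather than adding a second one, because nginx rejects duplicate directives.

```shell
#!/bin/sh
# Hypothetical helper: ensure nginx.conf has "daemon off;" in the main
# context so that supervise(8) can keep nginx in the foreground.
ensure_daemon_off() {
    conf="$1"
    if grep -q '^[[:space:]]*daemon[[:space:]]' "$conf"; then
        # An existing daemon directive: rewrite it to "off"
        tmp="$conf.tmp.$$"
        sed 's/^\([[:space:]]*\)daemon[[:space:]].*$/\1daemon off;/' "$conf" > "$tmp" &&
            mv "$tmp" "$conf"
    else
        # No directive yet: append it after the last (top-level) block
        printf '\ndaemon off;\n' >> "$conf"
    fi
}
```

Run it as root against /opt/nginx/conf/nginx.conf. Running it again leaves the file unchanged.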
Create a directory nginx under /service and create a shell script named run under it with the following contents.



#!/bin/sh
echo starting nginx
exec /opt/nginx/sbin/nginx


Make the run script executable:



shell$ sudo chmod 1755 /service/nginx/run


That's it. Within a few seconds svscan notices the new directory and starts supervise, which then starts and monitors nginx.
To see the status of nginx, just run svstat /service/nginx.
After changing the nginx configuration you could simply kill nginx with pkill nginx and let supervise restart it. A better option is to send nginx a HUP signal through svc -h, which makes it re-read its configuration without a full restart:



shell$ svc -h /service/nginx

nginx php-fpm: using nginx

This content has been updated and moved to a new place.


This blog post will be a quick installation and configuration guide to nginx (pronounced Engine-X) with PHP.


Many blog posts out there claim that nginx outpaces Apache on performance, but in my tests Apache 2 and nginx gave the same throughput. Where nginx is clearly better than Apache 2.2 is memory usage. Since this article is not about comparing nginx and Apache, you will not find a comparative study in this post.


What is PHP-FPM?


PHP-FPM is an alternative PHP FastCGI implementation that includes process management, adaptive process spawning and a whole lot of other features. It plays a role similar to Apache's mod_php module.


Compiling and installing nginx


Assuming you have the development tools installed in your distribution, you can download nginx, extract the tarball, compile and install. At the time of this writing, the latest nginx version is 0.9.7. In spite of being a development version it is very stable, so you can go ahead and use it at your own risk ;-).


shell$ wget http://nginx.org/download/nginx-0.9.7.tar.gz && tar -xzf nginx-0.9.7.tar.gz && cd nginx-0.9.7
shell$ ./configure --prefix=/opt/nginx && make && sudo make install

The stock configuration that comes with the nginx installation is good enough. The only change required is when gzip is turned on: then worker_processes needs to be doubled.


Compiling and installing PHP-FPM


Once you have downloaded the PHP distribution, apart from the regular configuration options add --enable-fpm option. That's it.


shell$ wget -O php-5.3.6.tar.bz2 http://in2.php.net/get/php-5.3.6.tar.bz2/from/us.php.net/mirror && tar -xjf php-5.3.6.tar.bz2 && cd php-5.3.6
shell$ ./configure --prefix=/opt/php --enable-fpm && make && sudo make install

Configuring nginx with PHP-FPM


Now we need to modify both nginx.conf and php-fpm.conf.


Edit /opt/nginx/conf/nginx.conf and uncomment/add the following lines under the server section.


location ~ \.php$ {
    root           html;
    fastcgi_pass   unix:/tmp/php-fpm.socket;
    fastcgi_index  index.php;
    fastcgi_param  SCRIPT_FILENAME  /opt/nginx/html$fastcgi_script_name;
    include        fastcgi_params;
}

Edit /opt/php/etc/php-fpm.conf and set the listen directive to /tmp/php-fpm.socket, so that it matches the fastcgi_pass line above.
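Here is a minimal sketch of that edit as a script. It assumes the PHP 5.3 INI-style php-fpm.conf, where the directive is a line such as listen = 127.0.0.1:9000. set_fpm_listen is a hypothetical helper. It rewrites only the active listen line and leaves commented lines and related keys such as listen.backlog alone.

```shell
#!/bin/sh
# Hypothetical helper: point php-fpm's "listen = ..." line at a Unix socket.
set_fpm_listen() {
    conf="$1"
    sock="$2"
    tmp="$conf.tmp.$$"
    # Matches "listen = ..." but not "listen.owner = ..." or ";listen = ..."
    sed "s|^listen[[:space:]]*=.*\$|listen = $sock|" "$conf" > "$tmp" && mv "$tmp" "$conf"
}
```

For this setup it would be run as root as set_fpm_listen /opt/php/etc/php-fpm.conf /tmp/php-fpm.socket.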


Start nginx and php-fpm (stop any instances that are already running first) and you're done.


shell$ sudo /opt/nginx/sbin/nginx
shell$ sudo /opt/php/sbin/php-fpm

Using a Unix socket instead of the default TCP configuration (port 9000) avoids the overhead of the TCP stack, so you can squeeze out a little more throughput.


Enabling auto-completion of hostnames in Bash

This content has been updated and moved to a new place.


Of the popular GNU/Linux distributions, only Ubuntu supports auto-completion of hostnames in Bash out of the box. I use Fedora at home and Mac OS X at work, and neither has this capability by default. So I started to RTFM on programming bash(1) auto-completion, and to my surprise it takes only a few lines of shell code to enable auto-completion for any command.

Providing a function to list the options

For bash(1) to provide the auto-completion, it expects the list of options that can be matched against the (partial) user input. For example, consider a directory containing the following filenames under it:

shell$ ls
file1
file2
file3

Under this directory, when the user types “ls f” and hits TAB, bash(1) appends “ile” (all three names share the prefix “file”) and waits for more input. The words on the current command line are available to programmable completion in the array variable COMP_WORDS, and COMP_CWORD is the index into COMP_WORDS of the word under the cursor. Putting the two together, the word that needs to be completed is ${COMP_WORDS[COMP_CWORD]}.


For bash(1) to complete the word, it needs to match the partially typed word against the list of possible options. In our “ls” example, the possible options are the names of files and directories in the current directory (or in some other directory if the user has typed a path). Fortunately for us, bash(1) provides a built-in called compgen that generates the list of possible completions for a word. The various options and detailed usage of compgen are beyond the scope of this blog post, so you can read bash(1) at your leisure.


Finally, to tell bash(1) which function completes a given command, we use the complete built-in (complete -F function command ...). The function hands its list of completions back to bash(1) in the COMPREPLY array variable.


Auto-complete hostnames for ssh and scp


Without any more delay let us look at our nifty function to complete hostnames for ssh(1). Here it is:

# Complete ssh and scp
_ssh()
{
    local cur opts

    # the current partially completed word
    cur="${COMP_WORDS[COMP_CWORD]}"
    # the list of possible options - the host names found in known_hosts
    # (first field of each line, minus the part after the last comma)
    opts=$(sed -e 's/^\([^ ]*\) .*$/\1/' -e 's/^\(.*\),.*$/\1/' "$HOME/.ssh/known_hosts" 2>/dev/null)
    # return the possible completions as a list ("--" ends compgen's options)
    COMPREPLY=($(compgen -W "${opts}" -- "${cur}"))
}

complete -F _ssh ssh scp

That’s it! You can either install the above shell function as a file under /etc/bash_completion.d/ssh and source it from your $HOME/.bashrc, or put the entire content in your $HOME/.bash_profile if you are not the system administrator.
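To check the completion function without pressing TAB, you can fake the variables bash(1) would set, call the function, and print COMPREPLY. The script below is a bash sketch under those assumptions. It repeats the _ssh function, using the portable sed -e form and compgen's -- separator, and it uses a throw-away HOME with made-up known_hosts entries.

```shell
#!/bin/bash
# Exercise the completion function outside an interactive shell.
_ssh()
{
    local cur opts
    cur="${COMP_WORDS[COMP_CWORD]}"
    opts=$(sed -e 's/^\([^ ]*\) .*$/\1/' -e 's/^\(.*\),.*$/\1/' "$HOME/.ssh/known_hosts" 2>/dev/null)
    COMPREPLY=($(compgen -W "${opts}" -- "${cur}"))
}

HOME=$(mktemp -d)            # throw-away home directory for the demo
mkdir -p "$HOME/.ssh"
cat > "$HOME/.ssh/known_hosts" <<'EOF'
web1.example.com,192.0.2.10 ssh-rsa AAAAB3Nza...
web2.example.com ssh-rsa AAAAB3Nza...
db1.example.com ssh-rsa AAAAB3Nza...
EOF

COMP_WORDS=(ssh we)          # the user has typed "ssh we"
COMP_CWORD=1                 # ...and the cursor is on the second word
_ssh
printf '%s\n' "${COMPREPLY[@]}"
```

With these entries the script prints web1.example.com and web2.example.com, the two known hosts that start with “we”.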


PHP 5.3.4 and MySQL 5.5.8 GA (libmysql)

This content has been updated and moved to a new place.

As you are probably aware, PHP 5.3.4 does not compile with MySQL 5.5 GA; the details can be seen in the MySQL bug tracker. The problem boils down to an incorrect installation of the MySQL headers. The MySQL 5.5 build system does not install the headers under the include-prefix/mysql directory, but directly under the include-prefix directory itself. So when the PHP build system looks for the MySQL headers, it cannot find <mysql/psi/mysql_thread.h> and so forth.

What is the fix?

The MySQL dev team has committed patches to fix this issue, and they are due to go out in the next release. If you can't wait for the next release, grab the patches, apply them to the MySQL (5.5.8) source tree and recompile MySQL. Apply patch 1 first, followed by patch 2. The basic steps are:

  1. If MySQL server is running, bring it down. shell$ sudo /path/to/mysql/bin/mysqladmin -u root shutdown
  2. Extract the source tarball (5.5.8) and apply the patches
  3. Configure and build the system.
    1. MySQL 5.5 and above uses CMake instead of the traditional autotools.
    2. shell$ cmake -DDEFAULT_CHARSET=utf8 -DDEFAULT_COLLATION=utf8_unicode_ci -LH .
    3. shell$ make
    4. shell$ sudo make install
  4. Start the MySQL server. shell$ sudo /path/to/mysql/bin/mysqld_safe --user=mysql &

Configuring PHP

Now that we have MySQL "fixed", we can configure PHP with the traditional configure --with-mysqli=/path/to/mysql/bin/mysql_config followed by make && sudo make install and you're good to go!


Downloading, compiling, and installing MySQL Server from source code

This content has been updated and moved to a new place.


If you are running a GNU/Linux server operating system like RHEL 5 or CentOS 5, you probably install the MySQL server that comes with the operating system packages, either during the initial setup or later using yum(8). The advantage is that packages can be added or removed with the GUI package manager or with rpm(8) and yum(8). Fair enough. Unfortunately, the MySQL package (mysql-server) bundled with RHEL 5.5 and CentOS 5.5 is fairly old (5.0.77). What if you want to install the latest stable version of MySQL and still be able to remove or re-install it using rpm(8)?


In this blog post, I will walk you through compiling MySQL from source code while still installing it through rpm(8). That way we can tune and configure the software for the target machine and still uninstall it using the Red Hat package manager.


Compiling and Installing MySQL using rpmbuild(8)


First make sure you have sudo(8) access and the machine has rpmbuild(8) installed.


shell> rpm -q rpm-build

If rpm(8) reports the package is not installed, install it using yum(8).


shell> sudo yum install rpm-build

Assuming the installation is successful, let us prepare the machine by installing the dependencies for MySQL.


shell> sudo yum install gperf

You have to download the source RPM (SRPM) for MySQL, since we will be building from source rather than installing a pre-built binary package.


shell> wget -O MySQL-community-5.1.53-1.rhel5.src.rpm http://dev.mysql.com/get/Downloads/MySQL-5.1/MySQL-community-5.1.53-1.rhel5.src.rpm/from/http://mysql.mirrors.pair.com/

Before we begin, you need to set up the RPM build environment. This is covered in the CentOS wiki and at several other places around the web. Basically, it boils down to installing the rpm-build package and following the steps below.


shell> mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
shell> echo '%_topdir %(echo $HOME)/rpmbuild' >~/.rpmmacros
# In case gcc(1) and make(1) are not installed ...
shell> sudo yum install gcc make

Now we are set to start the build process. If you have ever compiled from source code, you know that you can configure the software by passing command line arguments to the configure script. Unfortunately, when building with rpmbuild(8), the parameters have to be edited by hand in the corresponding SPEC file.


# Note, no sudo(8)
shell> rpm -ivh MySQL-community-5.1.53-1.rhel5.src.rpm

The above command extracts the source tarball into $HOME/rpmbuild/SOURCES and the SPEC file into $HOME/rpmbuild/SPECS. Normally there is no need to edit the SPEC file. However, by default MySQL uses latin1 as the server character set and latin1_swedish_ci as the default collation. If your data is not limited to Latin-1 characters, UTF-8 is the better choice, and it is a better default anyway. So, edit the $HOME/rpmbuild/SPECS/mysql-5.1.53.rhel5.spec file and add the following lines to the configure invocation.


            --with-charset=utf8 \
            --with-collation=utf8_unicode_ci \
            --enable-profiling \

Here is the patch if you want to directly apply it using the patch(1) command.


--- mysql-5.1.53.rhel5.spec 2010-12-08 09:01:38.000000000 +0530
+++ mysql-5.1.53.rhel5.spec.new 2010-12-08 09:01:27.000000000 +0530
@@ -552,6 +552,9 @@
            --with-unix-socket-path=/var/lib/mysql/mysql.sock \
            --with-pic \
            --prefix=/ \
+ --with-charset=utf8 \
+ --with-collation=utf8_unicode_ci \
+ --enable-profiling \
 %if %{CLUSTER_BUILD}
            --with-extra-charsets=all \
 %else
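If you would rather not keep a patch file around, the same edit can be made with awk. add_spec_options is a hypothetical helper and a minimal sketch. It assumes the configure invocation in the SPEC file contains a line ending in --prefix=/ \ (as in the hunk above), and it inserts the three options after every such line. If the options are already present, it leaves the file alone.

```shell
#!/bin/sh
# Hypothetical alternative to patch(1): insert the extra configure options
# right after the "--prefix=/ \" line of the MySQL SPEC file.
add_spec_options() {
    spec="$1"
    tmp="$spec.tmp.$$"
    # Already edited? Then there is nothing to do.
    grep -q -- '--with-charset=utf8' "$spec" && return 0
    awk '
        { print }
        /--prefix=\/ \\$/ {
            print "            --with-charset=utf8 \\"
            print "            --with-collation=utf8_unicode_ci \\"
            print "            --enable-profiling \\"
        }
    ' "$spec" > "$tmp" && mv "$tmp" "$spec"
}
```

Usage would be add_spec_options ~/rpmbuild/SPECS/mysql-5.1.53.rhel5.spec, run as the same (non-root) user that owns ~/rpmbuild.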

Now everything is in place to trigger the build. The build takes quite some time, given the number of tests that are run after the server code is built. Trigger the build using the rpmbuild(8) command. The --define 'community 1' parameter is required; without it the build will not succeed.


shell> rpmbuild --define 'community 1' -bb ~/rpmbuild/SPECS/mysql-5.1.53.rhel5.spec

Once the build is successful, the binaries are placed under the RPMS directory under the appropriate architecture directory. My machine is a 64-bit machine and hence my build target is x86_64. You can use rpm(8) to install them.


shell> cd ~/rpmbuild/RPMS/x86_64
shell> sudo rpm -ivh MySQL-*

MySQL server will automatically be started after the install is successful. However note that my.cnf is not installed by default. Based on your installation and purpose you can copy one of the small, medium, large configurations as the default configuration. On my machine I chose the medium as the default.


shell> sudo cp /usr/share/mysql/my-medium.cnf /etc/my.cnf

This completes the MySQL server installation with the default datadir and other options. If you want to move your datadir to a directory other than /var/lib/mysql because of space constraints, I will reserve that for my next post :-)


shell> mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.53-community-log MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SELECT VERSION();
+----------------------+
| VERSION()            |
+----------------------+
| 5.1.53-community-log |
+----------------------+

1 row in set (0.01 sec)

mysql>

Restoring SLES boot

This content has been updated and moved to a new place.


My home (desktop) computer runs three operating systems, viz. Windows 7, RHEL 5.5 (evaluation copy) and SLES 11 SP1 (evaluation copy), with the RHEL GRUB boot loader. I ran into quite a few problems setting up SLES, which I’ll explain later in this post. Everything worked fine until yesterday, when selecting SLES (SuSE Linux Enterprise Server) started giving a very weird error from GRUB: “Error 2: Bad file or directory type”.

What does the GRUB error “Error 2: Bad file or directory type” mean?

I looked up the error description and found that it is reported by GRUB stage 2. It means the file (here, the kernel) is not a regular file but something like a symbolic link, directory, or FIFO. The partition in question was a small “/boot” partition for SLES. When I mounted the partition from RHEL, it mounted without reporting any errors or warnings. A file system check (fsck) after unmounting it didn’t report any errors either!

I searched the Internet for this problem and came across some solutions suggesting reinstalling GRUB. I tried that, but it didn’t work either!

How I solved the problem

I tried repairing the system from the SLES install DVD and had almost given up. I made up my mind to reinstall the OS from scratch after one more try. I mounted the file system from RHEL, copied all the files off it, recreated the file system and copied all the files back into it. That worked!

shell$ sudo mount /dev/sda6 /mnt
shell$ sudo mkdir suse_boot
shell$ sudo cp -R -p /mnt/* suse_boot
shell$ sudo umount /mnt
shell$ sudo mke2fs -j /dev/sda6
shell$ sudo mount /dev/sda6 /mnt
shell$ sudo cp -R -p suse_boot/* /mnt
shell$ sudo umount /mnt

My experience with SLES


From my personal experience with both RHEL and SLES, I feel that RHEL is far ahead when it comes to a rock-solid server/enterprise operating system. Right from the installation, SLES makes you feel you are not installing a GNU/Linux or Unix-like operating system but a desktop very much like Windows or Mac OS X. Most of the options an experienced Unix administrator looks for are hidden, including the boot loader configuration. Unless you are careful on those screens, you will almost certainly overwrite your existing boot loader, as I did. After installation the OS doesn’t register properly with Novell; you have to run the “suse_register” command to register the evaluation copy. I’ll reserve a separate blog post for my experience with RHEL and SLES.



A “well defined” undefined behaviour in C++

This content has been updated and moved to a new place.


I ran into this one first-hand very recently, and the behaviour of C++ (compilers) described below quite stumped me. It surprised me because I understood C++ to be much stricter than its blood brother C. And C++ is indeed stricter than C, but there are places where the standard leaves the behaviour undefined, and what actually happens is then up to the implementation.

The “well defined” undefined behaviour I’m referring to in this post is about return values from functions. We all know that in C++ the value returned from a function (or method) must match its declared return type. But it’s true that most (not all) C++ compilers don’t verify that every code path in a function returns a value. Isn’t this surprising? Take a look at the following code, which generates a Fibonacci series.

#include <iostream>
#include <vector>

using std::cout;
using std::endl;
using std::vector;

vector<int> fibo(const int max);

int main()
{
        vector<int> f = fibo(30);
        for (vector<int>::iterator i = f.begin(); i != f.end(); ++i)
                cout << *i << ' ';
        cout << endl;
        return 0;
}

vector<int> fibo(const int max)
{
        vector<int> rv;

        int i = 1, sum, j;

        sum = j = 0;
        rv.push_back(i);

        while (sum <= max) {
                sum = i + j;
                if (sum <= max)
                        rv.push_back(sum);
                j = i;
                i = sum;
        }
}

 


If you look closely, you’ll notice that I left out the “return” statement in the “fibo” function. Whether compiling this produces an error depends on the compiler you use. On most Unix and Unix-like systems (including GNU/Linux), the default system compiler is the GNU project C++ compiler. With it you get at most a warning (depending on the version and the compiler flags), and it still produces the binary! With GCC, -Wall enables -Wreturn-type, which warns about this case, and -Werror=return-type turns that warning into an error.


Before you conclude that the GNU C++ compiler sucks: the Sun Studio Express C++ compiler behaves the same way, i.e., it emits a warning but still produces “a.out”.


I was stumped because I didn’t expect this. Of course, you don’t realize the mistake until you run the program and see disastrous results. On almost all of the systems I tried, the program ran for a few seconds before it “segfaulted”!


What is the reason for this?


The C++ standard says:


Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function.


Most compilers don’t enforce this, because proving that every code path returns a value is hard in general: the compiler cannot always tell which paths can actually be reached. So they issue at most a warning.


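Java takes the opposite approach. The Java Language Specification requires a compile-time error if a method declared with a return type can complete normally, so the Java compiler checks every code path. That makes the check a little brittle, because the analysis is conservative. For example, the compiler does not know that System.exit() never returns, so it will sometimes reject code that can never actually fall off the end and annoy you along the way. The following example will not compile with a Java compiler.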

  1: public class ReturnTest {
  2:     public static void main(String[] args) {
  3:         if (args.length == 1) {
  4:             String greeting = (new ReturnTest()).greet(args[0]);
  5:             System.out.println(greeting);
  6:         }
  7:     }
  8: 
  9:     public String greet(final String name) {
 10:         if (name.length() < 3) {
 11:             System.err.println("Name should be at least 3 characters");
 12:             System.exit(1);
 13:         } else {
 14:             return "Hello, " + name;
 15:         }
 16:     }
 17: }

 


Admittedly it’s a contrived program, but it illustrates the point. Even though control can never reach line 16, the Java compiler refuses to compile this program until you add a dummy return statement.


A better C++ compiler?


As I said, not all compilers behave the same way. Although I rarely use Windows for developing software or trying out such things, I do have the free Visual C++ Express Edition installed. If you compile the above program with the Visual C++ compiler, it fails with a fatal error reporting that “fibo must return a value”.


Hadoop and Pig on your laptop or personal computer

This content has been updated and moved to a new place.

I'm writing this article a second time because there have been some changes in Hadoop, and I'll also show you a better way to install it.

I'll use two popular server operating systems, CentOS and FreeBSD, for this installation tutorial. Yes, Hadoop runs on FreeBSD without any glitches, albeit unsupported according to its documentation. This blog post assumes that you're running either CentOS 5.4 or FreeBSD 8.0-RELEASE.

By convention, shell> refers to commands entered as a normal user and shell# to commands entered as the system supervisor (root). I'll assume that you're running a Bourne-compatible shell (Bash or Ksh); lines beginning with # are comments for reference.

To check your version on CentOS, run:


shell> cat /etc/redhat-release

CentOS release 5.4 (Final)

On FreeBSD, you can run:


shell> uname -mrs

FreeBSD 8.0-RELEASE i386

System Requirements

A computer with at least a gigabyte (1 GB) of RAM is recommended.

On the software side, you need the following:

  • Java version 1.6 (Sun JDK preferred)
  • sshd (installed and running)
  • rsync

On CentOS the pre-requisites can be met by installing the packages using yum(1):


shell> sudo yum install java-1.6.0-openjdk openssh-server rsync

# After the installation is successful ...

shell> sudo /sbin/service sshd start

On FreeBSD it is best to install them from ports.

At the time of writing, the stable version of Pig is 0.5.0, and it works with the latest Hadoop version, 0.20.1.

Installing Hadoop

We'll create a special user that runs Hadoop.

On CentOS you can use useradd(8) command to add a new user.


shell# groupadd hadoop
shell# useradd -g hadoop -s /bin/sh -m hadoop
shell# /usr/bin/passwd hadoop
# Set some password for the user

On FreeBSD you can use the pw(8) command to add a new user.


shell# pw groupadd hadoop
shell# pw useradd hadoop -g hadoop -s /bin/sh -m
shell# /usr/bin/passwd hadoop
# Set some password for the user

Download and unpack Hadoop version 0.20.1 under a directory, say /usr/local/sfw, so that it is installed under /usr/local/sfw/hadoop-0.20.1. Henceforth I'll refer to /usr/local/sfw/hadoop-0.20.1 as $HADOOP_PATH, the hadoop user as $HADOOP_USER and the hadoop group as $HADOOP_GROUP.

Set up the necessary permissions for $HADOOP_USER.


shell# mkdir $HADOOP_PATH/logs
shell# chown $HADOOP_USER:$HADOOP_GROUP $HADOOP_PATH/logs

Edit $HADOOP_PATH/conf/hadoop-env.sh and set the JAVA_HOME environment variable.
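Here is a minimal sketch of that edit as a script. set_java_home is a hypothetical helper. It assumes hadoop-env.sh contains an export JAVA_HOME= line, possibly commented out as # export JAVA_HOME=..., and rewrites that line. If no such line exists, it appends one. The JDK/JRE path you pass in depends on your system.

```shell
#!/bin/sh
# Hypothetical helper: set JAVA_HOME in Hadoop's conf/hadoop-env.sh.
set_java_home() {
    envfile="$1"
    jhome="$2"
    tmp="$envfile.tmp.$$"
    # "# export JAVA_HOME=..." or "export JAVA_HOME=..." -> "export JAVA_HOME=<jhome>"
    sed "s|^#* *export JAVA_HOME=.*\$|export JAVA_HOME=$jhome|" "$envfile" > "$tmp" &&
        mv "$tmp" "$envfile"
    # No such line at all? Append one.
    grep -q '^export JAVA_HOME=' "$envfile" ||
        printf 'export JAVA_HOME=%s\n' "$jhome" >> "$envfile"
}
```

On CentOS with the OpenJDK package used later in this post, that would be set_java_home $HADOOP_PATH/conf/hadoop-env.sh /usr/lib/jvm/jre-1.6.0-openjdk.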

The Hadoop configuration has been split into multiple files now. The older hadoop-site.xml has been deprecated in favor of these new files.

Edit $HADOOP_PATH/conf/core-site.xml and overwrite it with the following contents:


<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Edit $HADOOP_PATH/conf/hdfs-site.xml and overwrite it with the following contents:


<?xml version="1.0"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Edit $HADOOP_PATH/conf/mapred-site.xml and overwrite it with the following contents:


<?xml version="1.0"?>
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
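The three files above share the same single-property shape, so they can also be generated. write_site_xml is a hypothetical helper and a minimal sketch. It overwrites the target file and does no XML escaping, so it assumes names and values without characters like & or <.

```shell
#!/bin/sh
# Hypothetical helper: write a Hadoop *-site.xml holding one property.
write_site_xml() {
    file="$1"
    name="$2"
    value="$3"
    cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>$name</name>
        <value>$value</value>
    </property>
</configuration>
EOF
}
```

For the configuration above, it would be called three times:

shell# write_site_xml $HADOOP_PATH/conf/core-site.xml fs.default.name hdfs://localhost:9000
shell# write_site_xml $HADOOP_PATH/conf/hdfs-site.xml dfs.replication 1
shell# write_site_xml $HADOOP_PATH/conf/mapred-site.xml mapred.job.tracker localhost:9001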

Starting Hadoop

Check if you can ssh to your machine without a passphrase as $HADOOP_USER.


shell> su - $HADOOP_USER
shell> ssh localhost

If not, create a passphraseless ssh key using ssh-keygen(1).


shell> whoami
hadoop
shell> ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
shell> cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

If you still cannot ssh without a passphrase, the permissions of the files under $HOME/.ssh may be incorrect. Unfortunately, on some systems ssh(1) fails silently without any warning.


# As user $HADOOP_USER
shell> chmod 711 $HOME/.ssh
shell> chmod 600 $HOME/.ssh/*

Format a new distributed file-system:


# As user $HADOOP_USER
shell> PATH=$PATH:$HADOOP_PATH/bin; export PATH
shell> hadoop namenode -format

In case you get an exception like:


INFO util.MetricsUtil: Unable to obtain hostName java.net.UnknownHostException

Check whether your hostname has an entry in the /etc/hosts file, and add one if it doesn't.

shell> grep `hostname` /etc/hosts
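Here is a minimal sketch of the "add it if missing" step. ensure_host_entry is a hypothetical helper. It looks for the name as a whole field on a non-comment line and appends an entry only when none is found. Mapping the machine's own name to 127.0.0.1 is an assumption that suits this single-machine setup; use the machine's real address if other hosts need to reach it.

```shell
#!/bin/sh
# Hypothetical helper: make sure a hosts file lists the given host name.
ensure_host_entry() {
    hosts="$1"
    name="$2"
    # Found as a whole field after the address (comment lines skipped)?
    if awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hosts"; then
        return 0
    fi
    printf '127.0.0.1\t%s\n' "$name" >> "$hosts"
}
```

As root, it would be run as ensure_host_entry /etc/hosts "$(hostname)".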

Now that you're done configuring, you're ready to start Hadoop.


shell> $HADOOP_PATH/bin/start-all.sh

If everything went well, you can list the files in the newly created HDFS. You'll see output similar to the one below:


shell> $HADOOP_PATH/bin/hadoop fs -ls /

Found 1 items
drwxr-xr-x - hadoop supergroup 0 2009-09-05 22:10 /tmp

Starting Hadoop on boot

To start Hadoop every time the machine boots, I have created rc(8) scripts for CentOS and FreeBSD. They can be downloaded from my GitHub samples repository. The CentOS version has the suffix -centos54 in its name, whereas the FreeBSD version has the suffix -freebsd8.

Installing Hadoop boot script on CentOS

As root copy the downloaded hadoop_rc_script-centos54 to /etc/rc.d/init.d/hadoop.

Carefully review and set the HADOOP_PATH and HADOOP_USER values appropriately in the script and save it.

Next run chkconfig(8) to add the script to the run-level.


shell# /sbin/chkconfig --add hadoop

That's it, Hadoop will be automatically started upon boot. If you want to start it manually, you can use service(8).


shell# /sbin/service hadoop start

Installing Hadoop boot script on FreeBSD

As root copy the downloaded hadoop_rc_script-freebsd8 to /etc/rc.d/hadoop.

Carefully review and set the HADOOP_PATH and HADOOP_USER values appropriately in the script and save it.

If you want to start it manually, use the following command:


shell# /etc/rc.d/hadoop start

To start Hadoop upon boot, edit /etc/rc.conf and add hadoop_enable="YES".

Installing Pig

Download and unpack Pig version 0.5.0 under a directory, say /usr/local/sfw, so that it is installed under /usr/local/sfw/pig-0.5.0. Henceforth I'll refer to /usr/local/sfw/pig-0.5.0 as $PIG_PATH.

To use Pig with the installed Hadoop cluster, PIG_CLASSPATH must include the installed Hadoop's configuration directory. So I have the following function in $HADOOP_USER's $HOME/.profile to invoke Pig with the appropriate environment.


# JAVA_HOME and PIG_CLASSPATH are set only for the pig command itself
pig() {
    HADOOP_PATH=/usr/local/sfw/hadoop-0.20.1
    PIG_PATH=/usr/local/sfw/pig-0.5.0
    JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk \
    PIG_CLASSPATH=$PIG_PATH/pig-0.5.0-core.jar:$HADOOP_PATH/conf \
    "$PIG_PATH/bin/pig" "$@"
}

To test, you can run the ls command in the grunt prompt to check if things are fine.

shell> pig
2010-02-03 22:32:51,685 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/home/hadoop/pig_1265216571684.log
2010-02-03 22:32:52,325 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2010-02-03 22:32:52,867 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
grunt> ls /
hdfs://localhost:9000/tmp	
grunt>

Getting started with Mutt and IMAP

This content has been updated and moved to a new place.

All mail clients suck. This one just sucks less.

That is what the mail client Mutt claims, and yes, it is very true. Even though I use Thunderbird at work, I enjoy using Mutt at home. It gives me complete control using just the keyboard, much like my text editor.

Installing Mutt

Though almost all Unices and Unix-like operating systems bundle Mutt, I recommend that you download the source and build it yourself. The bundled version is most likely an older one, or the stable version (1.4.x), whereas the development version (1.5.x) has loads of good features, including built-in SMTP support. Now, if you are a true Unix geek, you would argue that a program should do only one thing. By that logic a mail client (MUA) should only be used to read e-mail, not to fetch it, compose it or send it. I do agree, and I'll come to that in my next blog post. In this post, let me introduce configuring Mutt to fetch, read and send e-mail.

As with most free software, the standard way to build Mutt is to run sh configure && make && sudo make install. But to build Mutt with SMTP and IMAP support, pass the following arguments to the configure script.

sh configure --enable-imap --with-ssl --enable-smtp --enable-hcache && \
make && \
sudo make install

Setting up Mutt and IMAP

Now that Mutt has been installed, we'll set up a minimum configuration to effectively use Mutt with IMAP.

The Mutt configuration file is by default stored in your home directory as .muttrc. So let us begin by creating this file, and I'll explain the use of each line in the configuration file through comments. Like many Unix configuration files, # is recognized as the comment character in muttrc also.

# Instead of beginning from scratch, start with the installed configuration
source /usr/local/etc/Muttrc

# Inform Mutt where to fetch your mails from?
# If the server (specified by hostname) doesn't support SSL, change imaps to imap below
set spoolfile = imaps://username@hostname:port/INBOX

# Set the folder so as to use shortcuts while specifying draft and sent folders
# If the draft and sent folders are below the INBOX directory, append INBOX below
set folder = imaps://username@hostname:port/

# Inform Mutt where to save a copy of the outgoing mail
# Typically for Exchange mail, the sent folder is "Sent Items" parallel to the INBOX folder!
set record = "=Sent Items"

# Set the drafts folder
set postponed = "=Drafts"

# Tell Mutt which folders to query for new mails
# The following says only the $spoolfile folder (INBOX) needs to be queried
mailboxes !

Setting Mutt with SMTP

Starting with version 1.5.x, Mutt supports SMTP as well. This means you don't have to configure Sendmail to relay local mail to a remote server, nor do you need to install a lightweight MTA like msmtp.

# Tell Mutt how to send mails
# If your SMTP server doesn't support SSL or TLS, use smtp over smtps
# If your SMTP server doesn't support authentication, remove the username@
set smtp_url = smtps://username@hostname:port/

Putting it all together

That's it, you're done configuring Mutt with IMAP. However, with the above configuration Mutt fetches the mail from the server every time, which can be slow depending on the server and the number of messages on it. Thankfully, Mutt can be configured to use a local cache to speed things up. You may remember that while configuring Mutt I passed --enable-hcache to the configure script; this option builds Mutt with support for caching. Add the following to the Mutt configuration file and you're done.

# Cache the headers locally under $HOME/mail/headers folder
# Create this folder manually before starting Mutt
set header_cache = "~/mail/headers"

# For better performance cache the message body as well
# Create this folder manually before starting Mutt
set message_cachedir = "~/mail/messages"

Additional goodies - using an LDAP address book

If you're using Mutt in an organization where the address book is available through LDAP, you can ask Mutt to query it for addresses. Mutt itself cannot query remote address books, so I have written a small shell script that wraps ldapsearch, which is bundled with OpenLDAP. You can download and build OpenLDAP yourself or grab the pre-built package from here.

Once OpenLDAP is installed, you can download my shell script and use it with Mutt. Here's how you can use it assuming you have installed my script under some directory which is in $PATH:

# Substitute $BASEDN, $HOST and $PORT with your organization's LDAP settings
set query_command = "ldap_query -b $BASEDN -h $HOST -p $PORT %s"

When composing a mail, you can hit Control-T (^T) to query the LDAP server.
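For the curious, here is a minimal sketch of the formatting half of such a wrapper. It is not the script linked above. Mutt shows the first line of query_command output as a status message and reads each following line as an address, a tab, and a name. ldif_to_mutt is a hypothetical helper that converts ldapsearch's LDIF output into that shape. It assumes plain cn and mail values; base64-encoded (cn::) values and folded LDIF lines are not handled.

```shell
#!/bin/sh
# Hypothetical sketch: convert LDIF from "ldapsearch -LLL ... cn mail"
# into Mutt's query format ("status line", then "address<TAB>name" lines).
ldif_to_mutt() {
    awk '
        BEGIN     { print "LDAP query results" }
        /^dn:/    { if (mail != "") print mail "\t" cn; mail = ""; cn = "" }
        /^cn: /   { cn = substr($0, 5) }
        /^mail: / { mail = substr($0, 7) }
        END       { if (mail != "") print mail "\t" cn }
    '
}
```

A wrapper could then pipe something like ldapsearch -x -LLL -h "$HOST" -p "$PORT" -b "$BASEDN" "(cn=*$1*)" cn mail into ldif_to_mutt, with the same placeholder settings as in the query_command line above.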

Hope that was useful. In upcoming posts I'll write about configuring Mutt with multiple accounts.