Computer Algebra on Cloud Infrastructures

CALcIUM

VENUS-C

Edinburgh:



Project News:

 

 

First Part of the Project

Date Descriptions Comments
1.09.2011  Project start (kick-off)

Week 01

General review of Amazon and Windows cloud computing. Windows Azure installation and testing.

Week 02

General review of Amazon and Windows cloud computing. Windows Azure installation and testing.

Week 03

SymGrid-Par installation on 64-bit (lxpara3, bwlf01-32) and 32-bit (lxpara02, desktop VM) machines.

Private cloud installation for three clusters on one machine. The tools used are VMware, Ubuntu, OpenNebula and MPI.

Week 04

SymGrid-Par under a VM and some MPI tests.

------------------------------------------------------------------------------------------------

During the MPI installation under Ubuntu:

We tested a "hello" message-passing program on 2 to 6 processes. These are the steps we followed to run the test:

mpicc -Wall -O -o hello hello.c

mpirun -np 2 ./hello        # -np sets the number of processes (2 to 4)

If mpirun cannot connect to a local mpd daemon, it reports two possible causes:

1. no mpd is running on this host

2. an mpd is running but was started without a "console" (-n option)

In case 1, you can start an mpd on this host with:

mpd &

We therefore ran:

$ mpd &

$ cd $HOME

$ touch .mpd.conf

$ chmod 600 .mpd.conf

$ vi hostnames        # add the cluster node names, e.g. cluster01 ... cluster0n

$ mpirun -machinefile hostnames ./hello

$ mpirun -np 2 ./hello        # 2 to 4 processes

$ telnet cluster03 80        # connectivity check against cluster03 (port 80)

-------------------------------------------------------------------------------------------------------------------------

During the SymGrid-Par installation, we first have to know whether the machine is 32-bit or 64-bit.

On both kinds of machine the installation ran into problems because not all files were installed (possibly because some Ubuntu packages were missing).

1- We upgraded Ubuntu to the latest version.

2- These files were missing, and we had to copy them to the following paths:

~/SGP_v0.3.2/bin/sgp_admin.sh

~/SGP_v0.3.2/etc/sample-sgprc

~/SGP_v0.3.2/bin/CoordinationServer_pp

3- We updated line 156 in the CoordinationServer_pp file so that it holds the right path to LINUX/hwloidl=CoordinationServer_pp:

~/SGPDir/SGP_v0.3.2/bin$ vi CoordinationServer_pp

#$executable = '/u1/staff/hwloidl/BUILDS/SGP/tx32b/SGP_v0.3.2_BUILDS/pvm3/bin/LINUX/hwloidl=CoordinationServer_pp';

$executable = '/var/lib/one/SGPDir/SGP_v0.3.2_BUILDS/pvm3/bin/LINUX/hwloidl=CoordinationServer_pp';

4- These files were missing too:

/SGP_v0.3.2_BUILDS/pvm3/bin/LINUX/hwloidl=CoordinationServer_pp

/SGP_v0.3.2/bin/testClient

/SGP_v0.3.2_BUILDS/SymGrid-Par-v0.3.2/SCSCP/testClient

5- At this stage everything runs as expected. The following tests were done:

-          Sequential Fibonacci computation on the GAP server.

-          Polynomial multiplication: this uses the Karatsuba algorithm to perform (sequential) polynomial multiplication on 2 random polynomials of degree 10.

-          Skeletons (parMapFold): a range of skeletons is supported by the Coordination Server (see Deliverable 5.8 for details). This example computes sumEuler over the list [87,88,89], with 0 as the neutral element.

Comments: Ok; Ok

Week 05

-          GAP client test

-          Timings test

-          Parallel sumEuler test

Comments: Suspended
Week 06

-          We tried to create and deploy some applications in Windows Azure, using Windows XP SP2 and Visual Studio 2010. Unfortunately, we could not enable the Azure tools because of the Windows XP operating system. The Microsoft Windows Azure homepage lists Windows XP SP2 among the operating systems that Windows Azure supports, but in practice Windows XP SP2 is not supported.

-          Installing a private cloud on three physical machines (3 clusters) using VMware Workstation, Ubuntu and OpenNebula. We installed the OpenNebula package on the host machine (cluster01), and the opennebula-node package on cluster02 and cluster03. During the test stage we could not establish a connection between them, even after changing the network connection setting from NAT to bridged.

-          We think this is a routing issue. It may not be possible to route from a VMware guest to a VMware guest on a different machine. Routing between three nodes (clusters) in VMware does work, but only within one physical machine.

-          Deploying a VMware Workstation image to the VM role in Windows Azure. The VMware image format is not supported (some pages say that VMware conflicts with Windows Azure). VHD (virtual hard disk) is therefore the only image format supported in Windows Azure. A VHD can be created with Windows Server 2008 or Windows 7, or with the Virtual PC 2007 software under another Microsoft OS (e.g. Windows XP).

-          We also applied for access to the VM Role Beta Program in Windows Azure 3 days ago, but we have not received any feedback yet. For the time being the status shows as pending.

 

VM Role

Description: The VM Role Beta Program includes a new Windows Azure role that allows you to upload a custom virtual hard disk image of a Windows Server 2008 R2 virtual machine and run it in Windows Azure.  By checking the box to opt-in to the VM Role Beta program, you accept the license terms for your use of the Windows Server 2008 R2 software in the VM Role Beta Program.

 

Status      Pending

Comments:

-          Failed

-          Ok, now solved (see Week 10-11)

-          Ok, solved (see Week 10-11)

-          Ok, this issue is solved now

 

 

Week 07

-        Creating a VHD using the Virtual PC 2007 software. We are now going to test installing Ubuntu and MPI as one image (VHD), which we will then upload to the VM role.

-        Moving to Windows 7 server.

-        Creating a private cloud on three physical machines, without using VMware.

Comments:

-        PASS

-        Not supported for VM role

-        See Week 10-11

Week 8-9

During this time we successfully created two virtual hard disks (VHDs), one using Windows 7 server (64-bit) and the other using Microsoft's free Virtual PC 2007 (VPC2007). Each VHD contains several virtual nodes (clusters). Each cluster runs Ubuntu as its OS, and an MPI user defined on the main node (Node0) is accessible from every cluster. All the nodes communicate with each other successfully. All these nodes (clusters) were created as one all-in-one image (VHD).

-          We created the required thumbprint and the certificates needed (e.g. X.509, X.511) to deploy a VM role or any other role in Windows Azure.

-          After creating the certificates, we deployed the two VHD images, each as an all-in-one image, to the Windows Azure platform (VM role).

-          We deployed the two images successfully, using this command from the Windows Azure toolkit command line:

C:\Program Files\Windows Azure SDK\v1.5>csupload Add-VMImage -Connection "SubscriptionId=c4dee5b0-4b46-497a-a208-0ba23c76f851; CertificateThumbprint=3BCB0B8F7C2F9CAB7085D16BC001A3C39BA0DB2E" -Description "CALcIUM VHD" -LiteralPath "D:\My Virtual Machines\Virtual Machines-HardDisks\VHDubu.vhd" -Location "North Europe" -TempLocation %TEMP% -SkipVerify

 

-          The message below shows that the VHD was uploaded successfully.

 

Created new VM image VHDubu.vhd in location North Europe.

Creating new page blob of size 1259920896...

Elapsed time for upload: 00:01:34

Successfully uploaded and committed the VM image.

Name                       : VHDubu.vhd

Label                        : VHDubu.vhd

Description               : CALcIUM VHD

Location                    : North Europe

Status                       : Committed

Uuid                          : 8e550cc6-0b06-e111-818d-aed357bc6aa1

Timestamp                : 2011-11-03T11:05:43Z

MountSizeInBytes     : 8388608000

CompressedSizeInBytes : 1259920896

 

-          Although both VHDs were uploaded successfully, the Windows Azure portal (Platform explorer) shows our VM role with status Committed and In use set to False.

 

Issues and proposed solutions:

-          During the one-day training (Windows Azure Bootcamp - Powered by Tech.Days, on 11th November 2011 at the John McIntyre Conference, Edinburgh First, Pollock Halls), I discussed the aims of the CALcIUM pilot for the VENUS-C project with Mr. Planky, the Windows Azure Bootcamp presenter (instructor), who is a cloud computing expert from Microsoft in London (http://plankytronixx.com/aboutus.aspx). I also described our plan, mentioned above, for deploying a VHD that contains virtual clusters with Ubuntu as the OS of each node. He said that the idea (our proposed Ubuntu VHD) is not possible in Windows Azure, because Windows Azure does not support any Linux OS or applications. He suggested trying the Amazon or VMware clouds. I have already contacted the Amazon and VMware cloud support teams for suggestions.

-          We also received an answer (by email) from the Microsoft Windows Azure support team regarding the VM role issue (status is Committed, and In use is False). The support engineer, "jaganathan", said:

     "Currently there is no support of Linux VHDs. The product team is looking at enhancing VM role and I will forward them your request for supporting linux VHD."    jaganathan.

-          Our guess is that a VHD created with Hyper-V under Windows Server 2008 R2 may work, but we do not know how reliable or fast it would be. We suggest this as a last attempt with the VM role in Windows Azure.

Comments: Ok

Week 10-11

Finally, we have successfully installed our private cloud on 3 physical machines. The issue we faced during Week 6, where we could not get the virtual nodes (clusters) to communicate, was mainly caused by the 3 networking modes of the virtual machine software (VMware and VPC2007). Both VMware and VPC2007 support 3 modes for the network adapter:

1-    Local Only: This mode lets the virtual nodes communicate with each other if they are all installed (created) on one physical machine. The nodes cannot make any outside connection (e.g. no Internet access, no access to any other server) unless the mode is changed, for instance to NAT.

2-    Intel(R) 82567LM-3 Gigabit Networking Connections: This mode lets the virtual nodes communicate with each other when each node is installed (created) on a different physical machine. As in Local Only mode, the nodes cannot make any outside connection (e.g. no Internet access, no access to any other server) unless the mode is changed, for instance to NAT.

3-    Shared Networking (NAT): This is the default mode when a node (cluster) boots. This mode enables Internet access and access to other servers (e.g. root@cluster1$ ssh user@macs.hw.ac.uk).

Note: Changing the networking mode from any other mode to NAT and back does not affect the network or the communication between the nodes (clusters), provided that NAT is the mode each node boots in. Otherwise, the cluster (virtual node) is given a different IP address, which causes a conflict in the network and makes communication between the nodes (clusters) impossible (e.g. root@cluster1$ ssh mpiuser@cluster2 -> unknown host name error, or timeout error).

-          At this stage, we can confirm that we can go ahead with our private cloud for the CALcIUM project. We also still hope to continue using the Windows Azure platform if we can create VHDs using Hyper-V and Windows Server 2008 R2; this is what we suggest for Week 12.

-          We also need to discuss the possibility of using an MPI library for C# to manage clustering in Microsoft cloud computing, since C# is a .NET language and .NET is supported by the Windows Azure platform and toolkit.

Comments: Ok

Week 12-13

- Since the Windows Azure support team told us that the only tool that creates a VHD supported by Windows Azure is Hyper-V on Windows Server 2008 R2, we downloaded and installed both Windows Server 2008 R2 and Hyper-V (160 days free trial version) for testing. The installation succeeded.

Note: Important: for use in Windows Azure VM role instances, the operating system installed on the base VHD must be an English edition of one of the following: Windows Server 2008 R2 Standard, Windows Server 2008 R2 HPC Edition, or Windows Server 2008 R2 Enterprise. For more details see this link: http://technet.microsoft.com/en-us/library/hh184311(WS.10).aspx

- Creating a virtual machine or VHD with Hyper-V under Windows Server 2008: we are still facing a problem in creating either a VM or a VHD.

- Starting applications in our private cloud: we mainly focus on finite state automata (NFA), and on understanding and installing the software packages needed for them.

Comments:

- Ubuntu, or any other Windows OS, is not supported

- In progress

- In progress

Week 14 - 16

 

- We tested a parallel sum of Euler's totient function (http://en.wikipedia.org/wiki/Euler%27s_totient_function). These are some of the results:

root@Cluster0:~/C+MPI# mpirun.openmpi -np 3 ParallelTotientRange3 1 10000
Test  for 3  Processors:
----------------------------------------------------
Sum of Totients  between [1..10000] is 30397485
Time: 5.523674 seconds
----------------------------------------------------

root@Cluster0:~/C+MPI# mpirun.openmpi -np 5 ParallelTotientRange3 1 10000
Test  for 5 processors:
----------------------------------------------------
Sum of Totients  between [1..10000] is 30397485
Time: 5.509558 seconds
----------------------------------------------------

root@Cluster0:~/C+MPI# mpirun.openmpi -np 7 ParallelTotientRange3 1 10000
Test for 7 Processors:
----------------------------------------------------
Sum of Totients  between [1..10000] is 30397485
Time: 5.491629 seconds
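
For reference, the quantity these runs report can be sketched sequentially. The following is a minimal illustration in Python, not the source of ParallelTotientRange3 (which is not shown in this log); the function names are hypothetical. It uses the usual convention that the totient of 1 is 1; an implementation that only counts coprimes strictly below n gives 0 for n = 1, and its total is smaller by one.

```python
from math import gcd


def totient(n: int) -> int:
    """Euler's totient: how many k in 1..n are coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(n, k) == 1)


def sum_totients(lower: int, upper: int) -> int:
    """Sum of totient(n) for every n in the closed range [lower, upper]."""
    return sum(totient(n) for n in range(lower, upper + 1))


if __name__ == "__main__":
    # Small range so the naive gcd-based version finishes instantly.
    print(sum_totients(1, 10))
```

Each totient(n) is independent of the others, which is why the range lends itself to being split among MPI processes.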
                              

- We successfully created a virtual machine and a VHD with Hyper-V under Windows Server 2008, and uploaded the VHD with csupload:

C:\Program Files\Windows Azure SDK\v1.5>csupload Add-VMImage -Connection "SubscriptionId=c4dee5b0-4b46-497a-a208-0ba23c76f851; CertificateThumbprint=3BCB0B8F7C2F9CAB7085D16BC001A3C39BA0DB2E" -Description "CALcIUM VHD" -LiteralPath "D:\My Virtual Machines\HyperV Vm.vhd"  -Location "North Europe" -TempLocation %TEMP% -SkipVerify

Windows(R) Azure(TM) Upload Tool version 1.5.0.0

for Microsoft(R) .NET Framework 3.5

Copyright (c) Microsoft Corporation. All rights reserved.

 

Using image name 'HyperVVm.vhd'

Using temporary directory C:\Users\macsadmin\AppData\Local\Temp...

Preparing VHD D:\My Virtual Machines\HyperV Vm.vhd...

The mounted size of the VM image is 20 GB. This image can be used with the following Windows Azure VM sizes: Small, Medium, Large, ExtraLarge

 

Windows(R) Azure(TM) VHD Preparation Tool. version 1.5.0.0

for Microsoft(R) .NET Framework 3.5

Copyright (c) Microsoft Corporation. All rights reserved.

 

Created new VM image HyperVVm.vhd in location North Europe.

Creating new page blob of size 4460148224...

Elapsed time for upload: 00:05:08

Successfully uploaded and committed the VM image.

Name                  : HyperV Vm.vhd

Label                 : HyperV Vm.vhd

Description           : CALcIUM VHD

Location              : North Europe

Status                : Committed

Uuid                  : bbaa01d7-14ab-4eb9-869f-e866eeefb31b

Timestamp             : 2011-12-22T20:23:58Z

MountSizeInBytes      : 21474836480

CompressedSizeInBytes : 4460148224

 

C:\Program Files\Windows Azure SDK\v1.5>
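
The two csupload reports in this log give enough figures to estimate how much each image shrank and the effective upload rate. The sketch below is only that arithmetic; it assumes the reported elapsed time covers just the transfer of the compressed blob, and the helper name is hypothetical.

```python
def upload_stats(mounted_bytes, compressed_bytes, elapsed_seconds):
    """Return (compression ratio, upload rate in MB/s) for one upload."""
    ratio = mounted_bytes / compressed_bytes
    rate_mb_per_s = compressed_bytes / elapsed_seconds / 1e6
    return ratio, rate_mb_per_s


# Figures copied from the two csupload reports:
# (MountSizeInBytes, CompressedSizeInBytes, elapsed time in seconds)
UPLOADS = {
    "VHDubu.vhd": (8388608000, 1259920896, 1 * 60 + 34),
    "HyperV Vm.vhd": (21474836480, 4460148224, 5 * 60 + 8),
}

if __name__ == "__main__":
    for name, figures in UPLOADS.items():
        ratio, rate = upload_stats(*figures)
        print(f"{name}: {ratio:.2f}x smaller than mounted, {rate:.1f} MB/s")
```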

 

Comments: Ok; Ok

Week 17 - 18

 

Break

 

Second Part of the Project

Date Descriptions Comments

09.01.2012  Start date of the second part of the project

Week 19

- Testing the first stand-alone project in the Windows Azure VM role with 2 instances.

Comments: Ok
Week 20-21

After the first test of our cloud in Week 14-16 (totient function), we decided to improve its performance:

- We re-installed our private cloud network, moving it from the MACS network to a separate network. This network is a switch connecting 3 machines (Cluster0, Cluster1 and Cluster3), all sharing one monitor, keyboard and mouse through a Belkin Omnicube KVM 4 Port Switch for PS/2 Compone  http://www.amazon.com/Belkin-Omnicube-Port-Switch-Components/dp/B00004Z84S.

- Images of our network (Image1, Image2)

- In the previous network all the clusters shared one hard disk (the H:\ drive). In the new network, each cluster is installed on a separate hard disk (D:\).

Setting up an MPICH2 cluster in our private cloud: we followed the Ubuntu Community documentation: https://help.ubuntu.com/community/MpichCluster

Steps for setting up MPICH2 clusters in Ubuntu:

1- Defining hostnames in /etc/hosts on each cluster:       $ nano /etc/hosts

127.0.0.1 localhost

192.168.131.65 Cluster0

192.168.133.66 Cluster1

192.168.133.67 Cluster2

 

Note: if the virtual machine's IP address was not set correctly, use the following command to set it to the address given for it in the /etc/hosts file:

Ex:    root@cluster:~$  ifconfig ethx 192.168.131.65  netmask 255.255.255.0

where ethx is the network interface (x is 0, 1, 2, etc.). If the cluster's IP address is set with this command (ifconfig), the cluster may not be able to connect to the Internet. If Internet access is needed, change the network mode in Virtual PC 2007 to NAT and then reboot (log out and log in again).
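
With the 255.255.255.0 netmask used in the ifconfig example, nodes reach each other directly only if their addresses fall in the same /24 network. A small check like the one below, a sketch that uses only Python's standard ipaddress module with hypothetical helper names, groups the /etc/hosts entries listed above by subnet so that a mistyped address stands out.

```python
import ipaddress

# Entries as listed in /etc/hosts above, and the netmask from the ifconfig example.
HOSTS = {
    "Cluster0": "192.168.131.65",
    "Cluster1": "192.168.133.66",
    "Cluster2": "192.168.133.67",
}
NETMASK = "255.255.255.0"


def subnet_of(address, netmask):
    """Network that an address belongs to under the given netmask."""
    return ipaddress.ip_network(f"{address}/{netmask}", strict=False)


def group_by_subnet(hosts, netmask):
    """Map each subnet (as text) to the host names whose address lies in it."""
    groups = {}
    for name, address in hosts.items():
        groups.setdefault(str(subnet_of(address, netmask)), []).append(name)
    return groups


if __name__ == "__main__":
    for subnet, names in group_by_subnet(HOSTS, NETMASK).items():
        print(subnet, names)
```

If the hosts end up in more than one group, either a router must connect the networks or one of the addresses needs correcting.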

2- Installing NFS: NFS (Network File System) allows us to create a folder (/mirror) on the master node (Cluster0) and share it with all the other nodes. This folder can be used to store programs. To install NFS, run this in the master node's terminal:

root@Cluster0:~$ sudo apt-get install nfs-kernel-server

3- Sharing the master folder: make a folder (/mirror) on all nodes; we will store our data and programs in this folder.

root@Cluster0:~$ sudo mkdir /mirror

root@Cluster1:~$ sudo mkdir /mirror

root@Cluster2:~$ sudo mkdir /mirror

Then we share the contents of this folder on the master node with all the other nodes. To do this, we first edit the /etc/exports file on the master node so that it contains the additional line

/mirror *(rw,sync)

This can be done using vim or by issuing this command (the line is quoted so that the shell does not interpret the * and the parentheses, and tee performs the append with root privileges):

root@Cluster0:~$ echo '/mirror *(rw,sync)' | sudo tee -a /etc/exports

Note: we then store our data and programs only on the master node, and the other nodes access them through NFS.

4- Defining a user for running MPI programs: we define a user with the same name and the same user ID on every node:

root@Cluster0:~$ sudo adduser mpiuser

root@Cluster1:~$ sudo adduser mpiuser

root@Cluster2:~$ sudo adduser mpiuser

Notes:

 1- We gave the user the same password, 12345, on all the clusters (easy to remember).

 2- Make mpiuser the owner of its own home directory to solve the privileges issue (~mpiuser needs to be owned by mpiuser):

     root@Cluster0:~$ sudo chown -R mpiuser ~mpiuser

 3- We also need to change the owner of /mirror to mpiuser:

     root@Cluster0:~$ sudo chown mpiuser  /mirror

5- Mounting /mirror on the nodes: now all we need to do is mount the folder on the other nodes. This can be done manually each time like this:

root@Cluster1:~$ sudo mount Cluster0:/mirror /mirror

During this step we received this error:

mount: wrong fs type, bad option, bad superblock on Cluster0:/mirror,
missing codepage or helper program, or other error
(for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program)
In some cases useful info is found in syslog - try
dmesg | tail  or so

To solve this issue, we followed the "Setting up NFS" how-to in the Ubuntu Community documentation: https://help.ubuntu.com/community/SettingUpNFSHowTo

Steps for setting up NFS in Ubuntu:

Install the required packages.

1- NFSv4 server: install the required packages for the server:

root@Cluster0:~$ sudo apt-get install nfs-kernel-server   # already done

NFSv4 exports exist in a single pseudo filesystem, where the real directories are mounted with the --bind option. Let's say we want to export our files in the /mirror directory. First we create the export filesystem:

root@Cluster0:~$ mkdir -p /export/mirror

and mount the real users directory with:

root@Cluster0:~$ mount --bind /home/mpiuser /export/mirror

To save us from retyping this after every reboot, we add the following line to /etc/fstab (only on Cluster0):

/mirror  /export/mirror   none    bind  0  0

There are three configuration files that relate to an NFSv4 server: /etc/default/nfs-kernel-server, /etc/default/nfs-common and /etc/exports.

Those config files in our example would look like this:

In /etc/default/nfs-kernel-server we set:

NEED_SVCGSSD=no # no is the default

because we are not activating NFSv4 security this time.

In /etc/default/nfs-common we set:

NEED_IDMAPD=yes
NEED_GSSD=no # no is the default

because we want UID/GID to be mapped from names.

For the ID names to be mapped automatically, both the clients (Cluster1, Cluster2) and the server (Cluster0) need an /etc/idmapd.conf file with the same contents and the correct domain names. Furthermore, this file should have the following lines in the Mapping section:

[Mapping]

Nobody-User = nobody
Nobody-Group = nogroup

Now restart the service:

root@Cluster0:~$ /etc/init.d/nfs-kernel-server restart    # on Cluster0

Note: you can also use the stop, start and restart commands to check NFS on the server.

2- NFSv4 client: install the required packages on the clients (Cluster1 and Cluster2):

root@Cluster1:~$ apt-get install nfs-common    # on Cluster1

root@Cluster2:~$ apt-get install nfs-common    # on Cluster2

The client needs the same changes to /etc/default/nfs-common to connect to an NFSv4 server.

In /etc/default/nfs-common we set:

NEED_IDMAPD=yes
NEED_GSSD=no # no is the default

On the client we can mount the complete export tree with one command:

root@Cluster2:~$ mount -t nfs4 -o proto=tcp,port=2049 nfs-server:/ /mnt

Load the nfs module with the command

root@Cluster0:~$ modprobe nfs

root@Cluster1:~$ modprobe nfs

root@Cluster2:~$ modprobe nfs

To make sure that the module is loaded at each boot, simply add nfs as the last line of /etc/modules.

Mounting /mirror on the nodes: now we can mount the folder on the other nodes.

root@Cluster0:~$ sudo mount Cluster0:/mirror /mirror

root@Cluster1:~$ sudo mount Cluster0:/mirror /mirror

root@Cluster2:~$ sudo mount Cluster0:/mirror /mirror

 

6- Installing the SSH server: if the SSH server (remote login program) is not installed, run this on all nodes (Cluster0, Cluster1 and Cluster2) to install the OpenSSH server:

root@Cluster0:~$ sudo apt-get install openssh-server

7- Setting up SSH with no passphrase for communication between the nodes (clusters): first we log in as our new user (mpiuser):

root@Cluster0:~$ su - mpiuser              # change to the mpiuser account

Then we generate a DSA key for mpiuser:

mpiuser@Cluster0:~$ ssh-keygen -t dsa

Leave the passphrase empty.

Now we need to add this key to the authorized keys, that is, copy the user's public key from the id_dsa.pub file to the authorized_keys file. Both files are in the hidden .ssh directory.

mpiuser@Cluster0:~$ cd .ssh
mpiuser@Cluster0:~/.ssh$ cat id_dsa.pub >> authorized_keys

As the home directory of mpiuser is the same on all nodes (/mirror/mpiuser), there is no need to run these commands on all nodes.

Note: we copied the key of each cluster, found in its own id_dsa.pub file, into the authorized_keys file of every cluster. So the authorized_keys files on all the clusters are identical and contain the id_dsa.pub contents of all the clusters. This lets us log in remotely from one cluster to another.

To test SSH run:    mpiuser@Cluster0:~$ ssh Cluster1 hostname

It should print the remote hostname and return to the local prompt (mpiuser@Cluster0:~$) without asking for a passphrase (no password prompt).

Comments:

- The network was reconfigured and tested.

- The average performance was improved.

- OK

- Ok, this issue was solved later.

Week 22-23

8- Installing the required packages (GCC and MPICH2): install the build-essential and mpich2 packages. Make sure that the network mode in Virtual PC 2007 is set to NAT, so that the node can connect to the Internet.

mpiuser@Cluster0:~$ sudo apt-get install build-essential

mpiuser@Cluster0:~$ sudo apt-get install mpich2

Note: as described previously, if the cluster's IP address was set with ifconfig, the cluster may not be able to connect to the Internet. If Internet access is needed, change the network mode in Virtual PC 2007 to NAT and then reboot (log out and log in again). Now you can install the required packages.

To test our installation we ran the following commands (the output of each is shown after the arrow):

mpiuser@Cluster0:~$ which mpd             ->      /usr/bin/mpd
mpiuser@Cluster0:~$ which mpiexec         ->      /usr/bin/mpiexec
mpiuser@Cluster0:~$ which mpirun          ->      /usr/bin/mpirun

9- Setting up MPD: we created an mpd.hosts file in mpiuser's home directory with the node names:

Cluster0
Cluster1
Cluster2

Then we ran:

root@Cluster0:~$ echo secretword=test  >> ~/.mpd.conf
root@Cluster0:~$ chmod 600 ~/.mpd.conf

10- Some settings before running MPI programs: before testing any MPI programs, we have to run the following commands:

mpiuser@Cluster0:~$ mpd &
mpiuser@Cluster0:~$ mpdtrace
mpiuser@Cluster0:~$ mpdallexit

After that, start the mpd daemons on all nodes:

mpiuser@Cluster0:~$ mpdboot -f mpd.hosts -n 3

where 3 is the number of clusters.

mpiuser@Cluster0:~$ mpdtrace

It should return all the cluster hostnames without errors:

Cluster0
Cluster1
Cluster2

11- Running MPI programs: now we are ready to test some MPI programs:

============================================ ( hello2.c ) ==============================

mpiuser@Cluster0:/mirror$ mpicc -Wall -O -o hello2 hello2.c
mpiuser@Cluster0:/mirror$ mpirun -np 3 hello2
 

mpiexec_Cluster0: cannot connect to local mpd (/tmp/mpd2.console_mpiuser); possible causes:

1. no mpd is running on this host

2. an mpd is running but was started without a "console" (-n option)

In case 1, you can start an mpd on this host with:

mpd &

and you will be able to run jobs just on this host.

For more details on starting mpds on a set of hosts, see

the MPICH2 Installation Guide.

 

mpiuser@Cluster0:/mirror$ mpd &
[1] 1477
 
mpiuser@Cluster0:/mirror$ mpirun -np 3 hello2
 
problem with execution of hello2 on Cluster0: [Errno 2] No such file or directory 
problem with execution of hello2 on Cluster0: [Errno 2] No such file or directory 
problem with execution of hello2 on Cluster0: [Errno 2] No such file or directory 
 
mpiuser@Cluster0:/mirror$ mpirun -np 3 /mirror/hello2
 
Hello, I am 0 of 3 (hostname is Cluster0)
Hello, I am 1 of 3 (hostname is Cluster0)
Hello, I am 2 of 3 (hostname is Cluster0)
 
mpiuser@Cluster0:/mirror$ cd
mpiuser@Cluster0:~$ mpdboot -f mpd.hosts -n 3
[1]+ Done mpd (wd: /mirror)
(wd now: ~)
 
mpiuser@Cluster0:~$ mpdboot -f mpd.hosts -n 3
 
mpiuser@Cluster0:~$ mpirun -np 3 /mirror/hello2
 
Hello, I am 0 of 3 (hostname is Cluster0)
Hello, I am 2 of 3 (hostname is Cluster2)
Hello, I am 1 of 3 (hostname is Cluster1)
 

 =============================== ( ParallelTotientRange ) ========================

mpiuser@Cluster0:/mirror/C+MPI$ ls
ParallelTotientRange2.c ParallelTotientRange3.c ParallelTotientRange.c
 
mpiuser@Cluster0:/mirror/C+MPI$ mpicc -Wall -O -o ParallelTotientRange3 ParallelTotientRange3.c
mpiuser@Cluster0:/mirror/C+MPI$ mpdboot -f mpd.hosts -n 3
 
unable to open (or read) hostsfile mpd.hosts

mpiuser@Cluster0:/mirror/C+MPI$ cd

mpiuser@Cluster0:~$ mpdboot -f mpd.hosts -n 3   

mpiuser@Cluster0:~$ mpdtrace

 

Cluster0
Cluster2
Cluster1
 

mpiuser@Cluster0:~$ cd /mirror/C+MPI

mpiuser@Cluster0:/mirror/C+MPI$ mpirun -np 2 /mirror/C+MPI/ParallelTotientRange3 1 10000

 

(hostname ... Cluster0)
(hostname ... Cluster2)
 
----------------------------------------------------
Sum of Totients  between [1..10000] is 30397485
Time: 4.959099 seconds
----------------------------------------------------

mpiuser@Cluster0:/mirror/C+MPI$ mpirun -np 3 /mirror/C+MPI/ParallelTotientRange3 1 10000

 

(hostname ... Cluster0)
(hostname ... Cluster2)
(hostname ... Cluster1)
 
----------------------------------------------------
Sum of Totients  between [1..10000] is 30397485
Time: 3.317835 seconds
----------------------------------------------------
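
The timings in this log can be turned into speedup figures. The sketch below does only that arithmetic on the reported wall-clock times; the function and constant names are hypothetical, and single runs like these are of course subject to noise.

```python
def speedup(baseline_seconds, seconds):
    """How many times faster a run is than the baseline run."""
    return baseline_seconds / seconds


# Times reported for ParallelTotientRange3 1 10000.
T_2_PROCS = 4.959099        # Week 22-23, -np 2
T_3_PROCS = 3.317835        # Week 22-23, -np 3
T_3_PROCS_OLD = 5.523674    # Week 14-16, -np 3, previous network

if __name__ == "__main__":
    relative = speedup(T_2_PROCS, T_3_PROCS)
    ideal = 3 / 2  # going from 2 to 3 processes
    print(f"2 -> 3 processes: {relative:.2f}x (ideal {ideal:.2f}x)")
    print(f"relative efficiency: {relative / ideal:.1%}")
    print(f"new vs old network, 3 processes: {speedup(T_3_PROCS_OLD, T_3_PROCS):.2f}x")
```

Going from 2 to 3 processes gives a speedup of about 1.49, close to the ideal 1.5, whereas the Week 14-16 times barely changed between 3, 5 and 7 processes.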

 

Week 24-25

Preparing the cloud for running the NFA->DFA program: to run the NFA->DFA program, we need MPI (MPICH2 is preferable) and Roomy (http://roomy.sourceforge.net/). The README files in the two tarballs explain in detail how to compile and run the code. There are also 3 small sample inputs in parser/input. We also need a Java installation (the parser is written in Java). So we first have to install MPICH2, then Roomy, then run the parser on a slightly modified input file (the parser/README file shows how this needs to be done), and then feed the file output by the parser in as the input of the nfa-mindfa program (the nfa-mindfa/README file shows how to do this).
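
To make the task of the nfa-mindfa program concrete, the sketch below shows the textbook subset construction that turns an NFA into an equivalent DFA. It is a small in-memory illustration only: it is not Roomy's disk-based implementation, it does no minimization, it assumes an NFA without epsilon moves, and whether nfa-mindfa proceeds exactly this way is an assumption. All names are hypothetical.

```python
from collections import deque


def nfa_to_dfa(alphabet, transitions, start, finals):
    """Subset construction for an NFA without epsilon moves.

    transitions maps (state, symbol) to a set of next states.
    Each DFA state is a frozenset of NFA states.
    Returns (dfa_transitions, dfa_start, dfa_finals).
    """
    dfa_start = frozenset([start])
    dfa_transitions = {}
    dfa_finals = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        if current & finals:  # contains at least one accepting NFA state
            dfa_finals.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt
                for state in current
                for nxt in transitions.get((state, symbol), ())
            )
            dfa_transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_transitions, dfa_start, dfa_finals


def accepts(dfa, word):
    """Run the DFA on a word and report whether it ends in a final state."""
    transitions, state, finals = dfa
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in finals


# Example NFA over {a, b} that accepts the words ending in "ab".
ALPHABET = "ab"
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}

if __name__ == "__main__":
    dfa = nfa_to_dfa(ALPHABET, NFA, start=0, finals={2})
    for word in ["ab", "aab", "aba", ""]:
        print(repr(word), accepts(dfa, word))
```

The number of DFA states can grow exponentially with the number of NFA states, which is the reason a disk-based library such as Roomy is used for large inputs.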

Steps for preparing the cloud to run the NFA->DFA program:

1-     Install Roomy. Roomy uses MPI for interprocess communication. Currently, Roomy has only been tested with MPICH2. http://sourceforge.net/apps/trac/roomy/wiki/RoomyInstall

2-     Download and install the parser. In general, a parser is a program that breaks large units of data into smaller, more easily interpreted pieces. For example, a web browser reads documents prepared with a markup language (such as HTML). The markup language identifies the parts of the document (such as document headings, bulleted lists, or body text), but says nothing about how those portions of the document should appear on-screen. The parser reads the tagged text and formats the various portions of the document for on-screen display. Here, the parser is a simple program that translates a GAP-format file containing an NFA into a format that can be passed as input to the Roomy-based NFA to min-DFA program. The implementation uses the Java commons-lang-2.5 package.

3-     Install Java. The parser is written in Java.

Running and testing the NFA->DFA program:

Compile the parser:

make

To run the parser, for example with input file inputs/a1.g and output file a1.out:

make run IN=inputs/a1.g OUT=a1.out

Then a1.out can be passed as input to the Roomy-based NFA to min-DFA program:

1. cd to the nfa-mindfa folder

2. make ROOMY=/usr/local/

3. modify params.in

4. mpd &

5. mpdboot -n 3 -f machines

6. mpdtrace

which shows the 3 clusters successfully:

Cluster0

Cluster1

Cluster2

7. run the program (example1) in nfa-mindfa

mpiuser@Cluster0:/mirror/nfa-mindfa$ mpiexec -n 3 /mirror/nfa-mindfa/example a1.out

all Files In Dir couldn't open /mirror/roomy-data/roomy/locks/

error: : No such file or directory

rank 2 in job 1 Cluster0_44661 caused collective abort of all ranks

exit status of rank 2: killed by signal 9

=====================================================

mpiuser@Cluster0:/mirror/nfa-mindfa$ mpiexec -n 3 /mirror/nfa-mindfa/example a1.out

Tue Feb 7 13:33:07 2012: NFA to Expanded DFA

Could not create directory: /mirror/roomy-data/roomy/locks/

rank 2 in job 2 Cluster0_44661 caused collective abort of all ranks

exit status of rank 2: killed by signal 9

Fatal error in MPI_Recv: Other MPI error, error stack:

 

We solved this issue by setting SHARED_DISK to 1 in the params.in file.

SHARED_DISK
should be 0 if running on a cluster of separate disks, and 1 if running on a shared disk with multiple MPI processes.
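
The parameter change can also be scripted. The following is a minimal Python sketch, not part of Roomy: it assumes that params.in holds one `PARAM <name> <value>` entry per line (this layout is inferred from how the parameters are written on this page, so check it against your own params.in), and the helper name `set_param` is hypothetical.

```python
def set_param(params_text, name, value):
    """Return params_text with `PARAM <name> <value>` set.

    Assumption: each parameter sits on its own line as `PARAM NAME VALUE`.
    If the parameter is not present, the line is appended at the end.
    """
    new_line = f"PARAM {name} {value}"
    out, found = [], False
    for line in params_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "PARAM" and fields[1] == name:
            out.append(new_line)   # replace the existing entry
            found = True
        else:
            out.append(line)       # keep every other line untouched
    if not found:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Example on an in-memory copy of a (hypothetical) params.in
sample = "PARAM SHARED_DISK 0\nPARAM DISK_DATA_PATH /mirror/roomy-data\n"
print(set_param(sample, "SHARED_DISK", "1"), end="")
```

To apply it to a real file, read params.in into a string, call `set_param`, and write the result back.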

 

mpiuser@Cluster0:/mirror/nfa-mindfa$ mpiexec -n 3 /mirror/nfa-mindfa/example a1.out

Tue Feb 7 15:52:36 2012: minimal DFA summary:

Tue Feb 7 15:52:36 2012: Total states: 412

Tue Feb 7 15:52:36 2012: Number initial states: 1

Tue Feb 7 15:52:36 2012: Number final states: 1

Tue Feb 7 15:52:37 2012: Minimal DFA :

Tue Feb 7 15:52:37 2012 [rank 0] 187

Tue Feb 7 15:52:37 2012: Size DFA: 412



Roomy Statistics

--------------------------------

Total wall clock run time: 25 m, 42 s, 306 ms, 265 us

Total number of syncs: 760

Barrier time for each process:

rank 0: 17 m, 34 s, 930 ms, 800 us

rank 1: 6 m, 22 s, 215 ms, 758 us

rank 2: 6 m, 5 s, 234 ms, 591 us

Remote write wait time for each process:

rank 0: 30 s, 524 ms, 250 us

rank 1: 3 m, 32 s, 322 ms, 172 us

rank 2: 6 m, 5 s, 224 ms, 782 us

Comments:
- All the steps completed successfully, but the program run reported some errors. The errors are not the same on every run.
- The problem was solved by setting PARAM SHARED_DISK to 1.
- 412 is the size of the DFA for the a1 example.

Week 25 -26

Experimental Results:

Parallel Disk-based Computations

Parallel disk-based computations were carried out on our private cloud, which contains 3 nodes.

Each node's processor is an Intel Pentium dual-core CPU E2160 @ 1.8 GHz. Nodes

had 1.2 GB of RAM with 300 MB free, and 8 GB of hard disk, with < 1 GB of hard disk free in Node0, and

4.7 GB free in Nodes 2 and 3. The 3 nodes ran Ubuntu 10.04 under Virtual PC 2007 (VM).

 

This table shows the sizes of the NFA and the minimal DFA for examples A1 and A2. All nodes have access to

a shared folder on the main node (Node 0) (/home/mirror/roomy-data).

 

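 -------------------------------------------------------------------------
 Example  NFA size  Min DFA  Real run time  Real run time  Real run time
                             1 Node         2 Nodes        3 Nodes
 -------------------------------------------------------------------------
 A1       802       412      3m11s          19m25s         36m42s
 -------------------------------------------------------------------------
 A2       3541      1236     27m27s         60m32s         64m47s
 -------------------------------------------------------------------------
 Notes: As the number of processors increases, the runtime increases as well.
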

Experimental Results Log File Output:

 

NFA->DFA Results-Log
================ a1 1 Cluster ===========================
Wed Feb 22 12:38:24 2012: minimal DFA summary:
Wed Feb 22 12:38:24 2012: Total states: 412
Wed Feb 22 12:38:24 2012: Number initial states: 1
Wed Feb 22 12:38:24 2012: Number final states: 1
Wed Feb 22 12:38:24 2012: Minimal DFA :
Wed Feb 22 12:38:24 2012 [rank 0] 284
Wed Feb 22 12:38:24 2012: Size DFA: 412
Roomy Statistics
--------------------------------
Total wall clock run time: 3 m, 10 s, 962 ms, 185 us
Total number of syncs: 760
Barrier time for each process:
rank 0: 290 ms, 155 us
Remote write wait time for each process:
rank 0: 23 ms, 867 us

real 3m11.699s
user 0m0.640s
sys 0m0.720s

 

 

 

================ a1 2 Clusters ==========================
Wed Feb 22 14:02:21 2012: minimal DFA summary:
Wed Feb 22 14:02:21 2012: Total states: 412
Wed Feb 22 14:02:21 2012: Number initial states: 1
Wed Feb 22 14:02:21 2012: Number final states: 1
Wed Feb 22 14:02:21 2012: Minimal DFA :
Wed Feb 22 14:02:21 2012 [rank 0] 350
Wed Feb 22 14:02:21 2012: Size DFA: 412



Roomy Statistics
--------------------------------
Total wall clock run time: 19 m, 24 s, 334 ms, 40 us
Total number of syncs: 760
Barrier time for each process:
rank 0: 15 m, 3 s, 966 ms, 474 us
rank 1: 42 s, 245 ms, 802 us
Remote write wait time for each process:
rank 0: 14 s, 727 ms, 200 us
rank 1: 38 s, 219 ms, 477 us

real 19m25.314s
user 0m1.784s
sys 0m2.624s

================ a1 3 Clusters ==========================
Wed Feb 22 15:14:08 2012: minimal DFA summary:
Wed Feb 22 15:14:08 2012: Total states: 412
Wed Feb 22 15:14:08 2012: Number initial states: 1
Wed Feb 22 15:14:08 2012: Number final states: 1
Wed Feb 22 15:14:08 2012: Minimal DFA :
Wed Feb 22 15:14:08 2012 [rank 0] 187
Wed Feb 22 15:14:08 2012: Size DFA: 412

Roomy Statistics
--------------------------------
Total wall clock run time: 36 m, 42 s, 396 ms, 279 us
Total number of syncs: 760
Barrier time for each process:
rank 0: 29 m, 15 s, 354 ms, 847 us
rank 1: 18 m, 26 s, 44 ms, 404 us
rank 2: 6 m, 59 s, 560 ms, 706 us
Remote write wait time for each process:
rank 0: 25 s, 494 ms, 74 us
rank 1: 1 m, 55 s, 834 ms, 412 us
rank 2: 20 m, 20 s, 226 ms, 914 us

real 36m42.396s
user 0m3.616s
sys 0m5.296s

================ a2 1 Cluster ===========================
Fri Feb 22 19:01:00 2002: minimal DFA summary:
Fri Feb 22 19:01:00 2002: Total states: 1236
Fri Feb 22 19:01:00 2002: Number initial states: 1
Fri Feb 22 19:01:00 2002: Number final states: 1
Fri Feb 22 19:01:00 2002: Minimal DFA :
Fri Feb 22 19:01:00 2002 [rank 0] 214
Fri Feb 22 19:01:00 2002: Size DFA: 1236



Roomy Statistics
--------------------------------
Total wall clock run time: 27 m, 26 s, 926 ms, 111 us
Total number of syncs: 809
Barrier time for each process:
rank 0: 280 ms, 863 us
Remote write wait time for each process:
rank 0: 30 ms, 718 us



real 27m27.966s
user 0m1.184s
sys 0m1.692s
================ a2 2 Cluster ===========================
Wed Feb 22 20:20:13 2012: Total states: 1236
Wed Feb 22 20:20:13 2012: Number initial states: 1
Wed Feb 22 20:20:13 2012: Number final states: 1
Wed Feb 22 20:20:13 2012: Minimal DFA :
Wed Feb 22 20:20:14 2012 [rank 0] 1106
Wed Feb 22 20:20:14 2012: Size DFA: 1236

Roomy Statistics
--------------------------------
Total wall clock run time: 60 m, 32 s, 905 ms, 980 us
Total number of syncs: 809
Barrier time for each process:
rank 0: 41 m, 2 s, 221 ms, 819 us
rank 1: 1 m, 4 s, 915 ms, 694 us
Remote write wait time for each process:
rank 0: 15 s, 985 ms, 651 us
rank 1: 38 s, 58 ms, 568 us

real 60m32.905s
user 0m2.888s
sys 0m7.760s
================ a2 3 Cluster ===========================
Thu Feb 23 12:15:31 2012: minimal DFA summary:
Thu Feb 23 12:15:31 2012: Total states: 1236
Thu Feb 23 12:15:31 2012: Number initial states: 1
Thu Feb 23 12:15:31 2012: Number final states: 1
Thu Feb 23 12:15:32 2012: Minimal DFA :
Thu Feb 23 12:15:32 2012: Size DFA: 1236
Thu Feb 23 12:18:20 2012 [rank 2] 578

Roomy Statistics
--------------------------------
Total wall clock run time: 64 m, 47 ms, 458 us
Total number of syncs: 809
Barrier time for each process:
rank 0: 40 m, 38 s, 711 ms, 143 us
rank 1: 10 m, 45 s, 360 ms, 441 us
rank 2: 7 m, 36 s, 353 ms, 129 us
Remote write wait time for each process:
rank 0: 27 s, 928 ms, 816 us
rank 1: 3 m, 18 s, 737 ms, 479 us
rank 2: 8 m, 11 s, 924 ms, 484 us

 

real 64m47.458s
user 0m3.732s
sys 0m12.557s

 

 

Week 26 -27

 

Improving the speedup of the (NFA->DFA) program: During the test stage we ran different examples 
and found that as the number of processors increases, the runtime increases as well. 
The runtime increased because of the barrier time (overhead): the time it takes for all 
processes to synchronize for some operations (e.g. a parallel update or a parallel access).
There are 760 syncs for example A1 and 809 syncs for A2. A sync means that all 
MPI processes wait until every process has reached a certain point in the code 
(like an MPI_Barrier).
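
To see how much of each run is spent waiting, the Roomy statistics from the a1 log files can be converted to seconds and compared. This is a small Python sketch of that arithmetic; the helper names are our own, and the only units handled are the ones that appear in the logs on this page (m, s, ms, us).

```python
# Seconds per unit, for the units printed in the Roomy statistics above.
UNIT_SECONDS = {"m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}


def roomy_seconds(text):
    """Convert a Roomy duration such as '19 m, 24 s, 334 ms, 40 us' to seconds."""
    total = 0.0
    for part in text.split(","):
        value, unit = part.split()
        total += int(value) * UNIT_SECONDS[unit]
    return total


def barrier_share(wall, barrier):
    """Fraction of the wall clock time that a rank spent waiting in barriers."""
    return roomy_seconds(barrier) / roomy_seconds(wall)


# (total wall clock time, barrier time of rank 0) from the a1 logs of 22 Feb 2012
a1_runs = {
    1: ("3 m, 10 s, 962 ms, 185 us", "290 ms, 155 us"),
    2: ("19 m, 24 s, 334 ms, 40 us", "15 m, 3 s, 966 ms, 474 us"),
    3: ("36 m, 42 s, 396 ms, 279 us", "29 m, 15 s, 354 ms, 847 us"),
}

for nodes, (wall, barrier) in sorted(a1_runs.items()):
    share = barrier_share(wall, barrier)
    print(f"{nodes} node(s): rank 0 spent {share:.1%} of the run in barriers")
```

With one node the barrier share is a fraction of a percent; with two and three nodes rank 0 spends well over three quarters of the run waiting, which is consistent with the explanation above.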
 
We overcame this issue by using non-shared disks, so each node can read (parallel access)
and write (parallel update) without waiting for the other nodes to finish their local operations.
 
1- Set SHARED_DISK to 0 in the params.in file (0 is the value for a cluster of separate disks).
2- Create a folder with the same name on each node (non-shared folder),
    /home/mpiuser/roomy-data
3- Set the data path in the params.in file:  PARAM DISK_DATA_PATH   /home/mpiuser/roomy-data
 
 

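 ---------------------------------------------------------------------------------------------------------------------
 Example  NFA size  Min DFA  Real run time  Real run time  Real run time  Notes
                             1 Node         2 Nodes        3 Nodes
 ---------------------------------------------------------------------------------------------------------------------
 A2       3541      1236     27m27s         60m32s         64m47s         All nodes have access to a shared namespace
 ---------------------------------------------------------------------------------------------------------------------
 A2       3541      1236     43m2           25m32          21m            Each node has its own local disk
 ---------------------------------------------------------------------------------------------------------------------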

 

 

Week 28 -29

 

Moving to a real, public cloud: 
At this stage we are ready to move to an OpenNebula-based cloud. Since we
found that Windows Azure is not the right choice for our application,
we applied for access to the KTH (OpenNebula-based) cloud.
At the same time we are studying COMPSs. It is a framework in the 
VENUS-C Platform enabling e-Science applications on the cloud. 
COMP Superscalar is a new version of GRID Superscalar which aims to 
ease the development of Grid applications. COMP Superscalar exploits
the inherent parallelism of applications when running them on the Grid.
 
COMP Superscalar:
 
 
COMPSs manual and other documentation

 

Week 30 -32

 

Visualizing NFA and DFA: 
It would be useful to tie the NFA->DFA program together with the 
visualisation tool into a COMPSs job. That would show our usage
of COMPSs and provide a nice way to visualise the result of the
NFA->DFA program.
 
We found that Graphviz is a good tool to visualise the NFA->DFA
results. Therefore, we decided to draw a simple map for tying
the NFA->DFA program together with the visualisation tool.
 
At the beginning, we have a GAP-format file (e.g. a1.g); we used the
parser to convert it to another, readable format (e.g. a1.in).
 
We have written a Java program (AutomataToGraphviz.java) that reads 
NFA and DFA files and converts them to Graphviz scripts. 
These scripts are used to generate gif files representing the 
NFA and DFA states by using a Graphviz tool.
 
Parameters:
a1.g   (NFA in a GAP format  ; complex, and unreadable format)
a1.in  (NFA in a simple, readable format)
a1.dot (NFA in a Graphviz script)
a1.gif (NFA in a gif format for visualization)
 
mindfa.g   (DFA in a GAP format  ; complex, and unreadable format)
mindfa.out (DFA in a simple, readable format)
mindfa.dot (DFA in a Graphviz script)
mindfa.gif (DFA in a gif format for visualization)
 
 
Programs:
nfa.mindfa (NFA->DFA program )
AutomataToGraphviz.java (NFA, and DFA files --> to Graphviz scripts)
 
Our Flow Map:
 
1- a1.g --> Parser --> a1.in --> nfa.mindfa --> mindfa.g 
   mindfa.g --> Parser --> mindfa.out
   
2- a1.in --> AutomataToGraphviz.java --> a1.dot 
   a1.dot --> Graphviz ---> a1.gif
 
3- mindfa.out --> AutomataToGraphviz.java --> mindfa.dot 
   mindfa.dot--> Graphviz ---> mindfa.gif
 
 
At this stage, we can present the states of any NFA or DFA as an image
(visualisation format).
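
The conversion step from an automaton to a Graphviz script can be sketched as follows. This is an illustrative Python sketch only, not the AutomataToGraphviz.java program: the in-memory representation (a list of `(source, symbol, target)` transitions plus sets of initial and final states) and the function name `automaton_to_dot` are assumptions, and the real a1.in / mindfa.out file formats are not reproduced here.

```python
def automaton_to_dot(name, transitions, initial, finals):
    """Build a Graphviz DOT script for an automaton.

    transitions: iterable of (source_state, symbol, target_state)
    initial, finals: sets of state numbers
    """
    lines = [f"digraph {name} {{", "  rankdir=LR;", "  node [shape=circle];"]
    for state in sorted(finals):
        # final (accepting) states are drawn with a double circle
        lines.append(f"  {state} [shape=doublecircle];")
    for index, state in enumerate(sorted(initial)):
        # an invisible point with an arrow marks each initial state
        lines.append(f"  start{index} [shape=point];")
        lines.append(f"  start{index} -> {state};")
    for source, symbol, target in transitions:
        lines.append(f'  {source} -> {target} [label="{symbol}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"


# A tiny made-up NFA with two states over the alphabet {a, b}
dot_text = automaton_to_dot(
    "nfa",
    [(0, "a", 0), (0, "a", 1), (1, "b", 0)],
    initial={0},
    finals={1},
)
print(dot_text, end="")
```

The resulting text can be saved as a .dot file and rendered with Graphviz, for example with `dot -Tgif a1.dot -o a1.gif`.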
 

 

Week  32-34

Getting started on KTH PDC2 with the Command Line Interface (CLI).

 
- We applied for a PDC2 account on 26.Jan.2012.
- We got account access on 25.April.2012. 
- After getting the PDC2 account, we started to launch our instances.
 
PDC2 offers two client interfaces:
1- A web interface, where users only need a browser.
2- A Command Line Interface, where the user has more control but has 
   to install client libraries.
General requirements:
 1- PuTTY (on Windows)
 2- SSH (on Linux)
 
After creating a cloud instance, we need private/public keys 
in place to log in with admin access, for security reasons, as with other
cloud providers (e.g. Amazon).
 
Guide for the Command Line Interface (CLI) for the PDC Cloud (PDC2). 
We followed the steps provided by the PDC centre in this link:
 
1- RUBY installation 
2- PDC2  client libraries  
3- Setting up end point
4- Setting up credentials 
 
PDC2 client libraries:
 
Create a new user (oneuser) on our local PC (the client end point):
 
  root@cluster0:~$ sudo adduser oneuser   # or replace with any other name 
  root@cluster0:~$ su - oneuser 
  oneuser@cluster0:~$ pwd 
  /home/oneuser 
  oneuser@cluster0:~$ wget http://www.pdc.kth.se/resources/computers/pdc-cloud/pdc2client.tar.gz 
  oneuser@cluster0:~$ tar zxvf pdc2client.tar.gz  
  oneuser@cluster0:~$ cd pdc2client  
  oneuser@cluster0:~/pdc2client$ ls   
   bin  etc  include  lib  share examples
 
  oneuser@cluster0:~/pdc2client$ cd examples 
  oneuser@cluster0:~/pdc2client/examples$ ls
   base-centos5.one  sercdev-centos5.one  sercservices-centos5.one
 
Setting up the end point:
oneuser@cluster0:~$ export ONE_XMLRPC=http://front.pdc2.pdc.kth.se:2633/RPC2
oneuser@cluster0:~$ export ONE_LOCATION=/home/oneuser/pdc2client  
oneuser@cluster0:~$ export PATH=$ONE_LOCATION/bin:$PATH
 
Setting up credentials:
Client libraries read credentials from the ~/.one/one_auth 
file by default. 
 
root@Cluster0:~$ mkdir ~/.one 
root@Cluster0:~$ cat ~/.one/one_auth   
skloul:urpass        (note: user name and password are separated by a colon)
 
Note: The user name and password are given by KTH PDC once the PDC
      cloud account application form has been accepted.
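
The credentials file can also be written from a script. A minimal Python sketch, assuming the single-line `user:password` layout shown above; the function name `write_one_auth` and the restrictive file permissions are our own choices and are not taken from the PDC guide. The demo writes into a temporary directory so that it does not touch a real ~/.one.

```python
import os
import tempfile


def write_one_auth(one_dir, user, password):
    """Write '<user>:<password>' to <one_dir>/one_auth and return the file path."""
    os.makedirs(one_dir, mode=0o700, exist_ok=True)
    path = os.path.join(one_dir, "one_auth")
    # Create the file readable and writable by the owner only (our own precaution).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{user}:{password}\n")
    return path


# Demo in a throw-away directory standing in for the home directory
demo_home = tempfile.mkdtemp()
auth_path = write_one_auth(os.path.join(demo_home, ".one"), "skloul", "urpass")
with open(auth_path) as handle:
    print(handle.read(), end="")
```

For real use, pass `os.path.expanduser("~/.one")` as the directory and the credentials received from KTH PDC.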
 
                                 
After a successful installation, we are able to execute the PDC 
virtual machine commands:
 
A- The first command (onecluster list) shows clusters in KTH PDC.
oneuser@Cluster0$ onecluster list
  ID     NAME
  0    default
  1      pdc
 
 
B- The 2nd command (oneimage list) shows the images and OSes 
   available in KTH PDC.
 
oneuser@Cluster0$ oneimage list
ID USER       NAME            TYPE REGTIME            PUB PER STAT #VMS
38 livenson cdmi-template      OS  Jun 03, 2011 09:22 No  No disa  0
13 oneadmin base-centos5       OS  Feb 25, 2011 09:04 Yes No used  5
14 oneadmin serc-services      OS  Feb 25, 2011 09:50 Yes No rdy   0
34 chgustaf RoboCloud_Tomcat   OS  May 17, 2011 11:55 Yes No rdy   0
39 oneadmin Debian-Squeeze     OS  Jun 07, 2011 14:12 No  No rdy   0
28 oneadmin Ubuntu-10.04       OS  May 04, 2011 14:47 No  No rdy   0
26 oneadmin Ubuntu-Hardy       OS  Apr 12, 2011 14:14 Yes No used  1
33 chgustaf RoboCloud_HAProxy  OS  May 17, 2011 11:42 Yes No used  1
40 oneadmin debiansqueeze      OS  Jun 07, 2011 15:09 Yes No used  5
41 oneadmin cdmi-v1            OS  Jun 10, 2011 10:02 Yes No rdy   0
42 livenson cdmi-v2            OS  Jul 01, 2011 10:27 Yes No used  1
43 oneadmin VenusC_Debian_Base OS  Nov 28, 2011 14:52 Yes No rdy   0
47 oneadmin venuscdebianbase   OS  Mar 01, 2012 14:35 Yes No rdy   0
12 oneadmin sercdev-centos5    OS  Feb 25, 2011 08:46 Yes No rdy   0
48 oneadmin ttylinux           OS  Mar 21, 2012 13:15 Yes No rdy   0
52 oneadmin venuscdebian2      OS  Apr 19, 2012 17:48 No  No rdy   0
 
 
SSH keys for the cloud instance: 
We need private/public keys to log in to a PDC Cloud (PDC2) instance.
 - Generate ssh keys for login on the PDC2 instance: 
 root@cluster0$ su oneuser      
 oneuser@Cluster0$ ssh-keygen 
 Generating public/private rsa key pair.
 Enter file in which to save the key (/home/oneuser/.ssh/id_rsa):  
  Note: # Hit enter or write an alternate path
 Enter passphrase (empty for no passphrase):    
  Note: # choose a strong but easily memorisable passphrase 
After that:
  oneuser@Cluster0$ cat /home/oneuser/.ssh/id_rsa.pub   
 
  Send the output of the above command to PDC support along with your PDC2 username.

																									
Creating the 1st instance:
oneuser@Cluster0:~/pdc2client$ cd examples 
oneuser@Cluster0:~/pdc2client/examples$ ls
base-centos5.one  baseubuntu.one  list.txt  onevm-list 
sercservices-centos5.one  base-debian.one  error.txt 
list-txt.txt  sercdev-centos5.one
 
We need to edit the base-debian.one file and make the following
changes for each instance before creating it:
instance 1:
NAME = Cluster0
MEMORY = 1024
PDC"USER = skloul
 
oneuser@Cluster0:~/pdc2client/examples$ onevm create base-debian.one
 
instance 2:
NAME = Cluster1
MEMORY = 1024
PDC"USER = skloul
 
oneuser@Cluster0:~/pdc2client/examples$ onevm create base-debian.one
 
instance 3:
NAME = Cluster2
MEMORY = 1024
PDC"USER = skloul
 
oneuser@Cluster0:~/pdc2client/examples$ onevm create base-debian.one
 
 
C- The 3rd command (onevm list m) shows our instances in KTH PDC.
 
oneuser@Cluster0$ onevm list m
ID   USER    NAME   STAT CPU  MEM      HOSTNAME  TIME 
545 skloul Cluster0 runn 99  1020.3M   nebula11  06 05:33:52
573 skloul Cluster1 runn 0   1020.3M   nebula10  02 01:05:06
576 skloul Cluster2 runn 99  1020.3M   nebula6   00 01:36:15

 

Week  34-35

Prepare 3 nodes on the KTH Cloud:
 
1- Create 3 instances (Cluster0, 1, 2) using Debian OS.
2- Prepare each node (Cluster0, 1, 2) individually.
   - Define a new user (cmpiuser) on each node.
     root@Cluster0$ sudo adduser cmpiuser
     root@Cluster1$ sudo adduser cmpiuser
     root@Cluster2$ sudo adduser cmpiuser
 
   - Create a new directory (/mirror) on each node.
     root@Cluster0$ sudo mkdir /mirror
     root@Cluster1$ sudo mkdir /mirror
     root@Cluster2$ sudo mkdir /mirror
 
   - Copy all MPI examples into the /mirror directory on each node.
        root@Cluster0$ ls /mirror
 
        root@Cluster1$ ls /mirror
        
        root@Cluster2$ ls /mirror
 
 
   - Generate a DSA key for cmpiuser on each node (ssh-keygen -t dsa).
   - Append the public key (id_dsa.pub) to the node's authorized_keys.
         
     cmpiuser@Cluster0:~$ ssh-keygen -t dsa
     Leave the passphrase empty. Next we add this key to the authorized keys: 
     cmpiuser@Cluster0:~$ cd .ssh
     cmpiuser@Cluster0:~/.ssh$ cat id_dsa.pub >> authorized_keys
 
     cmpiuser@Cluster1:~$ cd .ssh
     cmpiuser@Cluster1:~/.ssh$ cat id_dsa.pub >> authorized_keys
 
     cmpiuser@Cluster2:~$ cd .ssh
     cmpiuser@Cluster2:~/.ssh$ cat id_dsa.pub >> authorized_keys
 
   - Copy the public key (id_dsa.pub) of each node's cmpiuser to the
     main node's authorized_keys (cmpiuser@Cluster0:~/.ssh$), so that it
     contains one entry per node:
 
     cmpiuser@Cluster0:~/.ssh$ cat authorized_keys
     ssh-dss AAAAB3Nza ... etc Yr1FSq0s= cmpiuser@Cluster0
     ssh-dss AAAAB3Nza ... etc  lxm4JCpG cmpiuser@Cluster1
     ssh-dss AAAAB3Nza ... etc quTk9uQ== cmpiuser@Cluster2
 
   - Install the build-essential package on each node (Cluster0, 1, and 2).
     cmpiuser@Cluster0:~$ sudo apt-get install build-essential
     cmpiuser@Cluster1:~$ sudo apt-get install build-essential
     cmpiuser@Cluster2:~$ sudo apt-get install build-essential
 
   - Install the MPICH2 package on each node (Cluster0, 1, and 2).
     cmpiuser@Cluster0:~$ sudo apt-get update
     cmpiuser@Cluster0:~$ sudo apt-get install mpich2
     cmpiuser@Cluster1:~$ sudo apt-get update
     cmpiuser@Cluster1:~$ sudo apt-get install mpich2
     cmpiuser@Cluster2:~$ sudo apt-get update
     cmpiuser@Cluster2:~$ sudo apt-get install mpich2
 
   - Define the hostnames in /etc/hosts: 
     root@Cluster0:~$ nano /etc/hosts
     Add the node addresses to Cluster0's hosts file:
     root@Cluster0:~$ cat /etc/hosts
        127.0.0.1 localhost
        130.237.221.253 repo.pdc2.pdc.kth.se
        192.168.2.220 nfscloud
        130.237.221.240 Cluster0
        130.237.221.253 Cluster1
        130.237.221.234 Cluster2

        # The following lines are desirable for IPv6 capable hosts
          ::1 ip6-localhost ip6-loopback
          fe00::0 ip6-localnet
          ff00::0 ip6-mcastprefix
          ff02::1 ip6-allnodes
          ff02::2 ip6-allrouters
          ff02::3 ip6-allhosts
 
3- Test the installation on each node:
     cmpiuser@Cluster0:~$ which mpd
     cmpiuser@Cluster0:~$ which mpiexec
     cmpiuser@Cluster0:~$ which mpirun
 
     cmpiuser@Cluster1:~$ which mpd
     cmpiuser@Cluster1:~$ which mpiexec
     cmpiuser@Cluster1:~$ which mpirun
    
     cmpiuser@Cluster2:~$ which mpd
     cmpiuser@Cluster2:~$ which mpiexec
     cmpiuser@Cluster2:~$ which mpirun
 
4- Setting up MPD:
   - Create mpd.hosts in cmpiuser's home directory with the node names:
       cmpiuser@Cluster0:~$ cat mpd.hosts
       Cluster0
       Cluster1
       Cluster2
 
   - Then run: 
       cmpiuser@Cluster0:~$ echo secretword=skloul >> ~/.mpd.conf
       cmpiuser@Cluster0:~$ chmod 600 ~/.mpd.conf
 
5- Test MPD by typing the following commands:
          
     cmpiuser@Cluster0:~$ mpd &
     cmpiuser@Cluster0:~$ mpdtrace 
     The output should be the current hostname.
     cmpiuser@Cluster0:~$ mpdallexit 

After that, start the mpd daemons on all nodes:

    cmpiuser@Cluster0:~$ mpdboot -n 3  
 
We should be in the home directory, where the mpd.hosts file exists;
otherwise, use mpdboot -n 3 -f mpd.hosts 
    cmpiuser@Cluster0:~$ mpdtrace
 
The output should be the names of all nodes. 
 
If this doesn't succeed, try running the following Linux commands 
to list any mpd currently running on any other node. 
mpd should be started (once) from the main node only (Cluster0).
 
ps aux | grep mpd  ; run this on all hosts, 
kill -9  4323      ; then use kill to stop any running mpd, 
                   ; where 4323 is the process ID of the running mpd

cmpiuser@Cluster0:~$ which mpd
/usr/bin/mpd
cmpiuser@Cluster0:~$ which mpicc
/usr/bin/mpicc
cmpiuser@Cluster0:~$ which mpirun
/usr/bin/mpirun
cmpiuser@Cluster0:~$ mpdtrace

mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_cmpiuser); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
In case 1, you can start an mpd on this host with:
mpd &
and you will be able to run jobs just on this host.
For more details on starting mpds on a set of hosts, see
the MPICH2 Installation Guide.

 
Note: This error happens when several mpd processes are already running, or when the 
host name (Cluster0) is not defined in /etc/hosts.
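
The second cause can be checked before starting mpd. A minimal Python sketch, assuming the usual /etc/hosts layout (an address followed by one or more names, `#` starting a comment) and one host name per line in mpd.hosts; the helper names are hypothetical, and the sample data is copied from the hosts file shown above.

```python
def known_hostnames(hosts_text):
    """Collect every host name and alias from /etc/hosts-style text (lower-cased)."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, split on whitespace
        names.update(name.lower() for name in fields[1:])  # fields[0] is the address
    return names


def missing_hosts(mpd_hosts_text, hosts_text):
    """Return the names listed in mpd.hosts that have no entry in the hosts text."""
    known = known_hostnames(hosts_text)
    wanted = [line.strip() for line in mpd_hosts_text.splitlines() if line.strip()]
    return [host for host in wanted if host.lower() not in known]


HOSTS = """\
127.0.0.1 localhost
130.237.221.240 Cluster0
130.237.221.253 Cluster1
130.237.221.234 Cluster2
"""

print(missing_hosts("Cluster0\nCluster1\nCluster2\n", HOSTS))
```

An empty list means every node named in mpd.hosts can be resolved through the hosts file; any name that is printed needs an entry in /etc/hosts.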

cmpiuser@Cluster0:/mirror$ ssh cluster1

 

 

Week  35-36

Run MPI programs: Now we are ready to test some MPI programs on the KTH Cloud:

   

1- The first program we tested: hello.c (compiled to hello2)
cmpiuser@Cluster0:/mirror$ ls

hello.c    mpd.hosts   ParallelTotientRange3.c 
cmpiuser@Cluster0:/mirror$ mpicc -Wall -O -o hello2 hello.c
 
Instead of compiling hello.c on each node, we can copy the binary 
file (hello2) to the other nodes (Cluster1 and Cluster2).
 
cmpiuser@Cluster0:/mirror$ scp hello2  cmpiuser@Cluster1:/mirror
cmpiuser@Cluster0:/mirror$ scp hello2  cmpiuser@Cluster2:/mirror
 
Run hello2 on all the nodes in parallel, from the main node (Cluster0):
 
cmpiuser@Cluster0:/mirror$ mpirun -n 1 ./hello2
Hello, I am 0 of 1 (hostname is Cluster0)

cmpiuser@Cluster0:/mirror$ mpirun -n 2 ./hello2
Hello, I am 0 of 2 (hostname is Cluster0)
Hello, I am 1 of 2 (hostname is Cluster2)

cmpiuser@Cluster0:/mirror$ mpirun -n 3 ./hello2
Hello, I am 0 of 3 (hostname is Cluster0)
Hello, I am 1 of 3 (hostname is Cluster2)
Hello, I am 2 of 3 (hostname is Cluster1)

cmpiuser@Cluster0:/mirror$ 
 
2- The second program we tested: ParallelTotientRange3.c

cmpiuser@Cluster0:/mirror$ mpirun -np 1 ./ParallelTotientRange3 1 1000
(hostname ... Cluster0)

----------------------------------------------------
Sum of Totients between [1..1000] is 304191
Time: 0.055068 seconds
----------------------------------------------------

cmpiuser@Cluster0:/mirror$ mpirun -np 2 ./ParallelTotientRange3 1 1000
(hostname ... Cluster0)
(hostname ... Cluster2)

----------------------------------------------------
Sum of Totients between [1..1000] is 304191
Time: 0.028609 seconds
----------------------------------------------------
cmpiuser@Cluster0:/mirror$ mpirun -np 3 ./ParallelTotientRange3 1 1000
(hostname ... Cluster0)
(hostname ... Cluster2)
(hostname ... Cluster1)

Summary of KTH PDC2 Cloud speedup tests:
 
These are the run times of the TotientRange function on 1, 2, and 3 instances
(VMs) on the KTH PDC2 Cloud.

 Table (1) Summary (run times in seconds):
 -----------------------------------------------------------------
 Instances    Range    Range    Range    Range    Range    Range
  (VMs)     1-10000   1-15000  1-20000  1-25000  1-30000  1-50000
 -----------------------------------------------------------------
    1        7.6       18.1     33.5     54.0     79.7     236.5
 -----------------------------------------------------------------
    2        3.9       9.4      17.3     27.9     41.2     122.1
 -----------------------------------------------------------------
    3        2.6       6.1      11.4     18.4     27.1     80.6
 -----------------------------------------------------------------
Note: The above results show that the KTH PDC2 cloud (with 3 nodes) is faster
than our private cloud (with 3 nodes).
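
The speedup behind these run times can be worked out directly. The following Python sketch uses the usual definitions (speedup = time on 1 VM / time on n VMs, efficiency = speedup / n) and the 1-50000 column of Table (1); the function name `speedup_table` is our own.

```python
def speedup_table(times_by_vms):
    """Return (vms, speedup, efficiency) rows relative to the 1-VM run time."""
    base = times_by_vms[1]
    rows = []
    for vms in sorted(times_by_vms):
        speedup = base / times_by_vms[vms]
        rows.append((vms, speedup, speedup / vms))
    return rows


# Run times in seconds for the range 1-50000, taken from Table (1)
range_50000 = {1: 236.5, 2: 122.1, 3: 80.6}

for vms, speedup, efficiency in speedup_table(range_50000):
    print(f"{vms} VM(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

For this column the speedup stays close to the number of VMs, i.e. the Totient computation scales almost linearly on up to 3 instances.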
 

 

Week 

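 KTH public cloud vs. our private cloud:

 NFA->DFA application speedup:
 ---------------------------------------------------------------------------------------
 Cloud        Example  NFA size  Min DFA  Real run time  Real run time  Real run time
                                          1 Node         2 Nodes        3 Nodes
 ---------------------------------------------------------------------------------------
 Our Private  A2       3541      1236     43m2           25m32          21m
 ---------------------------------------------------------------------------------------
 KTH          A2       3541      1236     39m48          10m54          6m
 ---------------------------------------------------------------------------------------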

 
 
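Total and available memory and storage:
 -----------------------------------------------------------------------------------
 Cloud        Total main  Used main  Free main  Total disk  Used disk  Free disk
              memory      memory     memory     size        size       size
              (MB)        (MB)       (MB)       (GB)        (GB)       (GB)
 -----------------------------------------------------------------------------------
 Our Private  1024        403        621        7.4         4.2        2.7
 -----------------------------------------------------------------------------------
 KTH          1024        73         947        4.0         2.4        1.2
 -----------------------------------------------------------------------------------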

 

 

 

 

 

 

Week