COMPSs manual and other documentation
Project News:
|
First Part of the Project |
| Date | Descriptions | Comments |
| 1.09.2011 | Project kick-off | √ |
| Week 01 |
Overall review of Amazon and Windows cloud computing. Windows Azure installation and testing. |
√ |
| Week 02 |
Overall review of Amazon and Windows cloud computing. Windows Azure installation and testing. |
√ |
| Week 03 |
SymGrid-Par installation on 64-bit (lxpara3, bwlf01-32) and 32-bit (lxpara02, desktop VM) machines |
√ |
|
Private cloud installation for three clusters on one machine. The tools used are VMware, Ubuntu, OpenNebula and MPI. |
√ |
|
Week 04 |
SymGrid-Par under a VM and some MPI tests.
------------------------------------------------------------------------------------------------
During the MPI installation under Ubuntu, we tested a "hello" message-passing program on 2 to 6 processes. These are the steps we followed to run that test:
mpicc -Wall -O -o hello hello.c
mpirun -np 2 ./hello    <-- 2 to 4 processes
If mpirun cannot connect to the local mpd, it reports two possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
In case 1, you can start an mpd on this host with: mpd &
$ mpd &
$ cd $HOME
$ touch .mpd.conf
$ chmod 600 .mpd.conf
vi hostnames    // add the cluster node names, e.g. cluster01 ... cluster0n
mpirun -machinefile hostnames ./hello
mpirun -np 2 ./hello    <-- 2 to 4 processes
telnet cluster03 80
-------------------------------------------------------------------------------------------------------------------------
During the SymGrid-Par installation, we first have to know whether the machine is 32-bit or 64-bit. On both kinds of machine we faced some issues, because not all the files were installed properly (possibly because some Ubuntu packages were not yet installed).
1- We upgraded Ubuntu to the latest version.
2- These files were missing, and we had to copy them to their paths:
~/SGP_v0.3.2/bin/sgp_admin.sh
~/SGP_v0.3.2/etc/sample-sgprc
~/SGP_v0.3.2/bin/CoordinationServer_pp
3- We updated line 156 in the CoordinationServer_pp file (the right path to LINUX/hwloidl=CoordinationServer_pp):
~/SGPDir/SGP_v0.3.2/bin$ vi CoordinationServer_pp
#$executable = '/u1/staff/hwloidl/BUILDS/SGP/tx32b/SGP_v0.3.2_BUILDS/pvm3/bin/LINUX/hwloidl=CoordinationServer_pp';
$executable = '/var/lib/one/SGPDir/SGP_v0.3.2_BUILDS/pvm3/bin/LINUX/hwloidl=CoordinationServer_pp';
4- These files were missing too:
/SGP_v0.3.2_BUILDS/pvm3/bin/LINUX/hwloidl=CoordinationServer_pp
/SGP_v0.3.2/bin/testClient
/SGP_v0.3.2_BUILDS/SymGrid-Par-v0.3.2/SCSCP/testClient
5- Finally, at this stage everything is running as we expect. The following tests were done:
- Sequential Fibonacci computation on the GAP server.
- Sequential polynomial multiplication of 2 random polynomials of degree 10, using the Karatsuba algorithm.
- Skeletons: parMapFold. A range of skeletons is supported by the Coordination Server (see Deliverable 5.8 for details). This example computes sumEuler over the list [87,88,89], with 0 as the neutral element. |
√ Ok
√ Ok
|
| Week5 |
- GAP client test - Timings test - Parallel sumEuler test |
Suspended |
| Week6 |
Trying to create and deploy some applications in Windows Azure, using Windows XP SP2 and Visual Studio 2010. Unfortunately, we could not enable the Azure tools because of the Windows XP operating system. The Microsoft Windows Azure homepage lists Windows XP SP2 as one of the supported operating systems, but in practice it is not supported.
- Installing a private cloud on three physical machines (3 clusters) using VMware Workstation, Ubuntu and OpenNebula. We installed the OpenNebula package on the host machine (cluster01), and the opennebula-node package on cluster02 and cluster03. During the test stage we could not establish a connection, even after changing the network connection setting from NAT to bridged.
- We think this is a routing issue. It might not be possible to route from one VMware instance to a VMware instance on a different machine. We could route between three nodes (clusters) in VMware, but only within one physical machine.
- Deploying a VMware Workstation image to the VM role in Windows Azure. The VMware image format is not supported (some pages say that VMware conflicts with Windows Azure). VHD (virtual hard disk) is therefore the only image format supported in Windows Azure. A VHD can be created with Windows Server 2008, Windows 7, or the Virtual PC 2007 software under another Microsoft operating system (e.g. Windows XP).
- We applied for access to the VM Role Beta Program in Windows Azure 3 days ago, but have not received any feedback yet. At the moment the status shows as pending.
VM Role Description: The VM Role Beta Program includes a new Windows Azure role that allows you to upload a custom virtual hard disk image of a Windows Server 2008 R2 virtual machine and run it in Windows Azure. By checking the box to opt-in to the VM Role Beta program, you accept the license terms for your use of the Windows Server 2008 R2 software in the VM Role Beta Program.
Status Pending |
Failed
OK, now solved; see week 10-11
OK, solved; see week 10-11
√ OK, this issue is now solved
|
| Week7 |
- Creating a VHD using the Virtual PC 2007 software. At the moment we are going to test installing Ubuntu and MPI as one image (VHD), and then upload it to the VM role.
- Moving to Windows 7 server.
- Creating a private cloud on three physical machines, without using VMware.
|
PASS Not supported for Vm role See week 10-11 |
|
Week 8-9
|
During this time, we successfully created two virtual hard disks (VHDs), one using Windows 7 server x64, and the other using the free Microsoft Virtual PC 2007 (VPC2007). Each VHD includes some virtual nodes (clusters). Each cluster has Ubuntu as its O.S., plus an MPI user on the main node (Node0) that is accessible from every cluster. All the nodes communicate with each other successfully. All these nodes (clusters) were created as an all-in-one image (VHD).
- We created the required thumbprint and the needed certificates (e.g. X. 509, X. 511) for deploying a VM role or any other role in Windows Azure.
- After creating the needed certificates, we deployed the two VHD images, each as an all-in-one image, to the Windows Azure platform (VM role).
- We have deployed the two images successfully, using this command from the Windows Azure toolkit command line:
C:\Program Files\Windows Azure SDK\v1.5>csupload Add-VMImage -Connection "SubscriptionId=c4dee5b0-4b46-497a-a208-0ba23c76f851; CertificateThumbprint=3BCB0B8F7C2F9CAB7085D16BC001A3C39BA0DB2E" -Description "CALcIUM VHD" -LiteralPath "D:\My Virtual Machines\Virtual Machines-HardDisks\VHDubu.vhd" -Location "North Europe" -TempLocation %TEMP% -SkipVerify
- The message below shows that the VHD is deployed successfully.
Created new VM image VHDubu.vhd in location North Europe.
Creating new page blob of size 1259920896...
Elapsed time for upload: 00:01:34
Successfully uploaded and committed the VM image.
Name : VHDubu.vhd
Label : VHDubu.vhd
Description : CALcIUM VHD
Location : North Europe
Status : Committed
Uuid : 8e550cc6-0b06-e111-818d-aed357bc6aa1
Timestamp : 2011-11-03T11:05:43Z
MountSizeInBytes : 8388608000
CompressedSizeInBytes : 1259920896
- Although both VHDs were deployed successfully, the Windows Azure portal (Platform explorer) shows our VM role status as Committed and In use as False.
Issues and proposed solutions:
- During the one-day training (Windows Azure Bootcamp - Powered by Tech.Days, on 11th November 2011 at John McIntyre Conference, Edinburgh First, Pollock Halls), I discussed the aims of the CALcIUM pilot for the VENUS-C project, and our plan to deploy a VHD containing virtual clusters with Ubuntu as the main O.S. of each node (as described above), with Mr. Planky, the presenter (instructor) of the Windows Azure Bootcamp training. He is a cloud computing expert from Microsoft in London (http://plankytronixx.com/aboutus.aspx). He said that our proposed Ubuntu VHD is not possible in Windows Azure, since Windows Azure does not support any Linux O.S. or applications. He suggested trying the Amazon or VMware clouds. I have already contacted the Amazon and VMware cloud support teams for suggestions.
- We also received an answer (email) from the Microsoft Windows Azure support team regarding the VM role issue (status is Committed, and In use is False). Jaganathan wrote: "Currently there is no support of Linux VHDs. The product team is looking at enhancing VM role and I will forward them your request for supporting linux VHD."
- Our guess is that a VHD created with Hyper-V under Windows Server 2008 R2 might work, but we do not know how reliable and fast it would be. We suggest that as a last try with the VM role in Windows Azure.
|
√ Ok |
| Week 10-11 |
Finally, we have successfully installed our private cloud on 3 physical machines. The issue we faced during week 6, where we could not manage communication between the virtual nodes (clusters), was mainly caused by the 3 networking modes of the virtual machine software (VMware and VPC2007). Both VMware and VPC2007 support 3 modes for the network adapter.
1- Local Only: This mode lets the virtual nodes communicate with each other if they are all installed (created) on one physical machine. The nodes cannot make any outside connection (e.g. no Internet access, no access to any other server) unless the mode is changed, for instance to NAT.
2- Intel(R) 82567LM-3 Gigabit Networking Connections: This mode lets the virtual nodes communicate with each other when each node is installed (created) on a different physical machine. As in Local Only mode, the nodes cannot make any outside connection (e.g. no Internet access, no access to any other server) unless the mode is changed, for instance to NAT.
3- Shared Networking (NAT): This is the default mode at node (cluster) boot time. It enables access to the Internet and to other servers (e.g. root@cluster1$ ssh user@macs.hw.ac.uk).
Note: Changing the networking mode from any other mode to NAT and back does not affect the network or the communication between the nodes (clusters), provided that NAT is the mode in use when the cluster boots. Otherwise the cluster (virtual node) is given a different IP address, which causes a conflict in the network and makes communication between the nodes (clusters) impossible (e.g. root@cluster1$ ssh mpiuser@cluster2 -> unknown host name error, or time-out error).
- At this stage, we can confirm that we can go ahead with our private cloud for the CALcIUM project. We also still hope to continue using the Windows Azure platform, if we can create VHDs using Hyper-V and Windows Server 2008 R2. This is what we suggest for week 12.
- We also need to discuss the possibility of using an MPI library for C# to manage clustering in Microsoft cloud computing, since C# is part of .NET, which is supported by the Windows Azure platform and toolkit. |
√ Ok |
| Week12 - 13 |
- Since the Windows Azure support team told us that the only tool for creating a VHD "supported by Windows Azure" is Hyper-V on Windows Server 2008 R2, we downloaded and installed both Windows Server 2008 R2 and Hyper-V (160 days free trial version) for testing. The installation succeeded.
Note: Important. For use in Windows Azure VM role instances, the operating system that is installed on the base VHD must be an English edition of one of the following: Windows Server 2008 R2 Standard, Windows Server 2008 R2 HPC Edition, or Windows Server 2008 R2 Enterprise. For more details see this link: http://technet.microsoft.com/en-us/library/hh184311(WS.10).aspx
- Creating a virtual machine or VHD with Hyper-V under Windows Server 2008. We are still facing a problem in creating both a VM and a VHD.
- Starting applications in our private cloud. We mainly focus on finite state automata (NFA), and on understanding and installing the packages (software) needed for NFA. |
Ubuntu, or any other Windows OS, is not supported
In Progress
In Progress |
|
Week 14 - 16
|
- We have tested the totient function (http://en.wikipedia.org/wiki/Euler%27s_totient_function). These are some of the results:
- We have successfully created a virtual machine and a VHD with Hyper-V under Windows Server 2008.
C:\Program Files\Windows Azure SDK\v1.5>csupload Add-VMImage -Connection "SubscriptionId=c4dee5b0-4b46-497a-a208-0ba23c76f851; CertificateThumbprint=3BCB0B8F7C2F9CAB7085D16BC001A3C39BA0DB2E" -Description "CALcIUM VHD" -LiteralPath "D:\My Virtual Machines\HyperV Vm.vhd" -Location "North Europe" -TempLocation %TEMP% -SkipVerify
Windows(R) Azure(TM) Upload Tool version 1.5.0.0
for Microsoft(R) .NET Framework 3.5
Copyright (c) Microsoft Corporation. All rights reserved.
Using image name 'HyperVVm.vhd'
Using temporary directory C:\Users\macsadmin\AppData\Local\Temp...
Preparing VHD D:\My Virtual Machines\HyperV Vm.vhd...
The mounted size of the VM image is 20 GB. This image can be used with the following Windows Azure VM sizes: Small, Medium, Large, ExtraLarge
Windows(R) Azure(TM) VHD Preparation Tool. version 1.5.0.0
for Microsoft(R) .NET Framework 3.5
Copyright (c) Microsoft Corporation. All rights reserved.
Created new VM image HyperVVm.vhd in location North Europe.
Creating new page blob of size 4460148224...
Elapsed time for upload: 00:05:08
Successfully uploaded and committed the VM image.
Name : HyperV Vm.vhd
Label : HyperV Vm.vhd
Description : CALcIUM VHD
Location : North Europe
Status : Committed
Uuid : bbaa01d7-14ab-4eb9-869f-e866eeefb31b
Timestamp : 2011-12-22T20:23:58Z
MountSizeInBytes : 21474836480
CompressedSizeInBytes : 4460148224
C:\Program Files\Windows Azure SDK\v1.5>
|
Ok
Ok
|
|
Week 17 - 18
|
Break |
|
| Second Part of The Project |
| Date | Descriptions | Comments |
| 09.01.2012 |
The project 2nd Part start date |
√ |
| Week 19 |
- Testing the first stand alone project in windows azure Vm role with 2 instances. |
Ok |
| Week 20-21 |
After the first test of our cloud in week 14-16 (totient function), we decided to improve our cloud's performance:
- We re-installed our private cloud network, moving from the MACS network to a separate network. This network is a switch connecting 3 machines (Cluster0, Cluster1 and Cluster3), all sharing one monitor, keyboard and mouse through a Belkin Omnicube KVM 4 Port Switch for PS/2 Compone http://www.amazon.com/Belkin-Omnicube-Port-Switch-Components/dp/B00004Z84S.
- Images of our network (Image1, Image2)
- In the previous network all the clusters shared one hard disk (H:\ drive). In the new network, we installed each cluster on a separate hard disk (D:\).
Setting up an MPICH2 cluster in our private cloud:
We followed the Ubuntu Community documentation: https://help.ubuntu.com/community/MpichCluster
1- Defining hostnames in /etc/hosts on each cluster:
$ nano /etc/hosts
127.0.0.1 localhost
192.168.131.65 Cluster0
192.168.133.66 Cluster1
192.168.133.67 Cluster2
Note: if the virtual machine's IP address was not set correctly, use the following command to set it to its address in the /etc/hosts file, e.g.:
root@cluster:~$ ifconfig ethx 192.168.131.65 netmask 255.255.255.0
where ethx is eth0, eth1, eth2, etc. If the cluster's IP address is set by this command (ifconfig), the cluster may not be able to connect to the Internet. If Internet access is needed, change the network mode in Virtual PC 2007 to NAT and then reboot (log out and log in again).
2- Installing NFS: NFS (Network File System) allows us to create a folder (/mirror) on the master node (Cluster0) and have it synced on all the other nodes. This folder can be used to store programs. To install NFS, run this in the master node's terminal:
root@Cluster0:~$ sudo apt-get install nfs-kernel-server
3- Sharing the master folder: Make a folder (/mirror) on all nodes; we will store our data and programs in this folder.
root@Cluster0:~$ sudo mkdir /mirror
root@Cluster1:~$ sudo mkdir /mirror
root@Cluster2:~$ sudo mkdir /mirror
Then we share the contents of this folder on the master node with all the other nodes. To do this we first edit the /etc/exports file on the master node to contain the additional line
/mirror *(rw,sync)
This can be done using vim or by issuing this command (the line is quoted and piped through sudo tee so that the append itself runs with root privileges):
root@Cluster0:~$ echo '/mirror *(rw,sync)' | sudo tee -a /etc/exports
Note: We then store our data and programs only on the master node, and the other nodes access them through NFS.
4- Defining a user for running MPI programs: We define a user with the same name and the same user ID on all nodes:
root@Cluster0:~$ sudo adduser mpiuser
root@Cluster1:~$ sudo adduser mpiuser
root@Cluster2:~$ sudo adduser mpiuser
Notes:
1- We gave the user the same password (12345) on all the clusters, so that it is easy to remember.
2- Make mpiuser's home directory owned by mpiuser, to solve the privileges issue (~mpiuser needs to be owned by mpiuser):
root@Cluster0:~$ sudo chown -R mpiuser ~mpiuser
3- We also need to change the owner of /mirror to mpiuser:
root@Cluster0:~$ sudo chown mpiuser /mirror
5- Mounting /mirror on the nodes: Now all we need to do is mount the folder on the other nodes. This can be done manually each time like this:
root@Cluster1:~$ sudo mount Cluster0:/mirror /mirror
During this step we received this error:
mount: wrong fs type, bad option, bad superblock on Cluster0:/mirror,
To solve this issue, we followed the Ubuntu Community documentation on setting up NFS: https://help.ubuntu.com/community/SettingUpNFSHowTo
Steps for setting up NFS in Ubuntu:
1- NFSv4 server: Install the required packages for the server:
root@Cluster0:~$ sudo apt-get install nfs-kernel-server ;already done
NFSv4 exports exist in a single pseudo filesystem, where the real directories are mounted with the --bind option. Let's say we want to export our files in the /mirror directory. First we create the export filesystem:
root@Cluster0:~$ mkdir -p /export/mirror
and mount the real users directory with:
root@Cluster0:~$ mount --bind /home/mpiuser /export/mirror
To save us from retyping this after every reboot, we add the following line to /etc/fstab (only on Cluster0):
/mirror /export/mirror none bind 0 0
There are three configuration files that relate to an NFSv4 server: /etc/default/nfs-kernel-server, /etc/default/nfs-common and /etc/exports.
In /etc/default/nfs-kernel-server we set:
NEED_SVCGSSD=no # no is the default
because we are not activating NFSv4 security this time. In /etc/default/nfs-common we set:
NEED_IDMAPD=yes
NEED_GSSD=no # no is the default
because we want UID/GID to be mapped from names. For the ID names to be mapped automatically, both the clients (Cluster1, Cluster2) and the server (Cluster0) require the /etc/idmapd.conf file to have the same contents with the correct domain names. Furthermore, this file should have the following lines in the Mapping section:
[Mapping]
Nobody-User = nobody
Nobody-Group = nogroup
Now restart the service:
root@Cluster0:~$ /etc/init.d/nfs-kernel-server restart ;Cluster0
Note: You can also use the stop, start and restart commands to check NFS on the server.
2- NFSv4 client: Install the required packages on the clients (Cluster1 and Cluster2):
root@Cluster1:~$ apt-get install nfs-common ;Cluster1
root@Cluster2:~$ apt-get install nfs-common ;Cluster2
The client needs the same changes to /etc/default/nfs-common to connect to an NFSv4 server. In /etc/default/nfs-common we set:
NEED_IDMAPD=yes
NEED_GSSD=no # no is the default
On the client we can mount the complete export tree with one command:
root@Cluster2:~$ mount -t nfs4 -o proto=tcp,port=2049 nfs-server:/ /mnt
Load the nfs module with the command:
root@Cluster0:~$ modprobe nfs
root@Cluster1:~$ modprobe nfs
root@Cluster2:~$ modprobe nfs
To make sure that the module is loaded at each boot, add nfs as the last line of /etc/modules.
Mounting /mirror on the nodes: Now we can mount the folder on the other nodes.
root@Cluster0:~$ sudo mount Cluster0:/mirror /mirror
root@Cluster1:~$ sudo mount Cluster0:/mirror /mirror
root@Cluster2:~$ sudo mount Cluster0:/mirror /mirror
6- Installing the SSH server: If the SSH server (remote login program) is not installed, run this on all nodes (Cluster0, Cluster1 and Cluster2) to install the OpenSSH server:
root@Cluster0:~$ sudo apt-get install openssh-server
7- Setting up SSH with no passphrase for communication between nodes (clusters): First we log in as our new user (mpiuser):
root@Cluster0:~$ su - mpiuser ;change to the mpiuser account
Then we generate a DSA key for mpiuser:
mpiuser@Cluster0:~$ ssh-keygen -t dsa
Leave the passphrase empty. Now we need to add this key to the authorized keys (copy the user's key from the id_dsa.pub file to the authorized_keys file). These files are in the .ssh directory (a hidden directory).
mpiuser@Cluster0:~$ cd .ssh
mpiuser@Cluster0:~/.ssh$ cat id_dsa.pub >> authorized_keys
As the home directory of mpiuser on all nodes is the same (/mirror/mpiuser), there is no need to run these commands on all nodes.
Note: We copied each cluster's key from its own id_dsa.pub file and pasted it into every cluster's authorized_keys file. The authorized_keys files on all the clusters are therefore the same and contain all the clusters' public keys. This lets us log in remotely from one cluster to another.
To test SSH run:
mpiuser@Cluster0:~$ ssh Cluster1 hostname
It should print the remote hostname (Cluster1) without asking for a passphrase (without a password). |
The Network reconfigured and tested
The Performance average was improved OK
Ok, this issue is solved later
|
| Week 22 - 23 |
8- Installing the required packages (GCC and MPICH2): Make sure that the network mode in Virtual PC 2007 is set to NAT, so that the node can connect to the Internet. Then install the build-essential and mpich2 packages:
mpiuser@Cluster0:~$ sudo apt-get install build-essential
mpiuser@Cluster0:~$ sudo apt-get install mpich2
Note: if the cluster's IP address was set by the ifconfig command as described previously, the cluster may not be able to connect to the Internet. If Internet access is needed, change the network mode in Virtual PC 2007 to NAT and then reboot (log out and log in again). Now you can install the required packages. To test our installation we ran the following commands; each one prints the path of the installed program:
mpiuser@Cluster0:~$ which mpd
/usr/bin/mpd
mpiuser@Cluster0:~$ which mpiexec
/usr/bin/mpiexec
mpiuser@Cluster0:~$ which mpirun
/usr/bin/mpirun
9- Setting up MPD: We created an mpd.hosts file in mpiuser's home directory with the node names:
Cluster0
root@Cluster0:~$ echo secretword=test >> ~/.mpd.conf
root@Cluster0:~$ chmod 600 ~/.mpd.conf
10- Some settings before running MPI programs: Before testing MPI programs, we run the following commands:
mpiuser@Cluster0:~$ mpd &
mpiuser@Cluster0:~$ mpdtrace
mpiuser@Cluster0:~$ mpdallexit
After that, we start the mpd daemons:
mpiuser@Cluster0:~$ mpdboot -f mpd.hosts -n 3
where 3 is the number of clusters.
mpiuser@Cluster0:~$ mpdtrace
It should return all the clusters' hostnames without errors:
Cluster0
Cluster1
Cluster2
11- Running MPI programs: Now we are ready to test some MPI programs:
============================================ ( hello2.c ) ==============================
mpiuser@Cluster0:/mirror$ mpicc -Wall -O -o hello2 hello2.c
mpiuser@Cluster0:/mirror$ mpirun -np 3 hello2
mpiexec_Cluster0: cannot connect to local mpd (/tmp/mpd2.console_mpiuser); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
In case 1, you can start an mpd on this host with:
mpd &
and you will be able to run jobs just on this host.
For more details on starting mpds on a set of hosts, see the MPICH2 Installation Guide.
mpiuser@Cluster0:/mirror$ mpd &
[1] 1477
mpiuser@Cluster0:/mirror$ mpirun -np 3 hello2
problem with execution of hello2 on Cluster0: [Errno 2] No such file or directory
problem with execution of hello2 on Cluster0: [Errno 2] No such file or directory
problem with execution of hello2 on Cluster0: [Errno 2] No such file or directory
mpiuser@Cluster0:/mirror$ mpirun -np 3 /mirror/hello2
Hello, I am 0 of 3 (hostname is Cluster0)
Hello, I am 1 of 3 (hostname is Cluster0)
Hello, I am 2 of 3 (hostname is Cluster0)
mpiuser@Cluster0:/mirror$ cd
mpiuser@Cluster0:~$ mpdboot -f mpd.hosts -n 3
[1]+ Done mpd (wd: /mirror)
(wd now: ~)
mpiuser@Cluster0:~$ mpdboot -f mpd.hosts -n 3
mpiuser@Cluster0:~$ mpirun -np 3 /mirror/hello2
Hello, I am 0 of 3 (hostname is Cluster0)
Hello, I am 2 of 3 (hostname is Cluster2)
Hello, I am 1 of 3 (hostname is Cluster1)
=============================== ( ParallelTotientRange ) ========================
mpiuser@Cluster0:/mirror/C+MPI$ ls
ParallelTotientRange2.c ParallelTotientRange3.c ParallelTotientRange.c
mpiuser@Cluster0:/mirror/C+MPI$ mpicc -Wall -O -o ParallelTotientRange3 ParallelTotientRange3.c
mpiuser@Cluster0:/mirror/C+MPI$ mpdboot -f mpd.hosts -n 3
unable to open (or read) hostsfile mpd.hosts
mpiuser@Cluster0:/mirror/C+MPI$ cd
mpiuser@Cluster0:~$ mpdboot -f mpd.hosts -n 3
mpiuser@Cluster0:~$ mpdtrace
Cluster0
Cluster2
Cluster1
mpiuser@Cluster0:~$ cd /mirror/C+MPI
mpiuser@Cluster0:/mirror/C+MPI$ mpirun -np 2 /mirror/C+MPI/ParallelTotientRange3 1 10000
(hostname ... Cluster0)
(hostname ... Cluster2)
----------------------------------------------------
Sum of Totients between [1..10000] is 30397485
Time: 4.959099 seconds
----------------------------------------------------
mpiuser@Cluster0:/mirror/C+MPI$ mpirun -np 3 /mirror/C+MPI/ParallelTotientRange3 1 10000
(hostname ... Cluster0)
(hostname ... Cluster2)
(hostname ... Cluster1)
----------------------------------------------------
Sum of Totients between [1..10000] is 30397485
Time: 3.317835 seconds
----------------------------------------------------
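The two runs above sum Euler's totient function over a range on 2 and 3 processes. As a rough illustration of that computation, here is a minimal sequential sketch in Python. It is not the source of ParallelTotientRange3.c, which is not shown here. The function names and the block-wise split of the range are assumptions made for this example; the real program may divide the work differently.

```python
from math import gcd


def totient(n):
    """Euler's totient: how many k in 1..n are coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(n, k) == 1)


def sum_totients(lower, upper):
    """Sum of totient(n) for lower <= n <= upper."""
    return sum(totient(n) for n in range(lower, upper + 1))


def split_range(lower, upper, parts):
    """Split [lower..upper] into contiguous blocks, one per process
    (hypothetical scheme; the MPI program may split differently)."""
    base, extra = divmod(upper - lower + 1, parts)
    blocks = []
    start = lower
    for rank in range(parts):
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size - 1))
        start += size
    return blocks


def blockwise_sum(lower, upper, parts):
    """Add up the per-block partial sums, as a reduction step would."""
    return sum(sum_totients(lo, hi)
               for lo, hi in split_range(lower, upper, parts)
               if lo <= hi)


if __name__ == "__main__":
    print(split_range(1, 10, 3))
    print(blockwise_sum(1, 10, 3))
    # Relative speed-up between the two timed runs reported above.
    time_2_procs, time_3_procs = 4.959099, 3.317835
    print(round(time_2_procs / time_3_procs, 2))
```

This sketch counts totient(1) as 1; a program that treats totient(1) as 0 would report a total that is one lower. The brute-force totient is slow for large ranges and is meant only to make the partial-sum structure visible.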
|
|
| Week 24 -25 |
Preparing the cloud for running the NFA->DFA program: To run the NFA->DFA program, we need MPI (MPICH2 is preferable) and Roomy (http://roomy.sourceforge.net/). The README files in the two tarballs explain in detail how to compile and run the code. There are also 3 small sample inputs in parser/input. We also need a Java installation (the parser is written in Java). So we first have to install MPICH2, then Roomy, then run the parser on a slightly modified input file (the parser/README file shows how this needs to be done), and then feed the file output by the parser in as the input of the nfa-mindfa program (the nfa-mindfa/README file shows how to do this).
Steps for preparing the cloud to run the NFA->DFA program:
1- Install Roomy. Roomy uses MPI for interprocess communication. Currently, Roomy has only been tested with MPICH2. http://sourceforge.net/apps/trac/roomy/wiki/RoomyInstall
2- Download and install the parser. A parser is a program that breaks large units of data into smaller, more easily interpreted pieces. For example, a web browser reads documents prepared with a markup language (such as HTML). The markup language identifies the parts of the document (such as document headings, bulleted lists, or body text), but says nothing about how those portions of the document should appear on-screen. The parser reads the tagged text and formats the various portions of the document for on-screen display. See Hypertext Markup Language (HTML). Our parser is a simple one that translates a GAP-format file containing an NFA into a format that can be passed as input to the Roomy-based NFA to min-DFA program. The implementation uses the Java commons-lang-2.5 package.
3- Install Java. The parser is written in Java.
Run and test the NFA->DFA program:
Compile the parser:
make
To run the parser, for example for input file inputs/a1.g and output
file a1.out:
make run IN=inputs/a1.g OUT=a1.out
Then a1.out can be passed as input to the Roomy-based NFA to min-DFA
program.
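To make the NFA to DFA step more concrete, the following is a minimal in-memory sketch of the classic subset construction in Python. It is not the Roomy-based program, which is disk-based and parallel and also minimises the result; this sketch only determinises and does not minimise. The transition-table layout, the function names and the small example automaton are assumptions chosen for illustration.

```python
from collections import deque


def nfa_to_dfa(alphabet, transitions, start, accepting):
    """Subset construction for an NFA without epsilon moves.

    transitions maps (state, symbol) -> set of next states.
    Each DFA state is a frozenset of NFA states.
    """
    start_set = frozenset([start])
    dfa_transitions = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            target = frozenset(
                nxt
                for state in current
                for nxt in transitions.get((state, symbol), ())
            )
            dfa_transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & set(accepting)}
    return seen, dfa_transitions, start_set, dfa_accepting


def accepts(dfa, word):
    """Run the DFA on a word and report whether it ends in an accepting state."""
    _, dfa_transitions, current, dfa_accepting = dfa
    for symbol in word:
        current = dfa_transitions[(current, symbol)]
    return current in dfa_accepting


if __name__ == "__main__":
    # Hypothetical NFA over {a, b} that accepts words ending in "ab".
    example = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
    dfa = nfa_to_dfa("ab", example, start=0, accepting={2})
    print(len(dfa[0]), "DFA states")
    print(accepts(dfa, "aab"), accepts(dfa, "aba"))
```

In the worst case the number of DFA states grows exponentially with the number of NFA states, which is one reason a disk-based library such as Roomy is used for large inputs.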
1. cd to the nfa-mindfa folder
2. make ROOMY=/usr/local/
3. modify params.in
4. mpd &
5. mpdboot -n 3 -f machines
6. mpdtrace shows the 3 clusters successfully:
Cluster0
Cluster1
Cluster2
7. run the program (example1) in nfa-mindfa:
mpiuser@Cluster0:/mirror/nfa-mindfa$ mpiexec -n 3 /mirror/nfa-mindfa/example a1.out
The first runs produced errors (see Comments). We solved this issue by setting SHARED_DISK to 1 in the params.in file.
mpiuser@Cluster0:/mirror/nfa-mindfa$ mpiexec -n 3 /mirror/nfa-mindfa/example a1.out
Tue Feb 7 15:52:36 2012: minimal DFA summary: |
All the steps completed successfully, but I received some errors. The errors are not the same on every run.
The problem was solved by setting PARAM SHARED_DISK to 1.
412 is the size of the DFA for the a1 example
|
|
Week 25 -26 |
Experimental Results: Parallel Disk-based Computations
Parallel disk-based computations were carried out on our private cloud, which contains 3 nodes, each with an Intel Pentium dual-core CPU E2160 @ 1.8 GHz. The nodes had 1.2 GB of RAM with 300 MB free, and 8 GB of hard disk, with < 1 GB of hard disk free on Node0 and 4.7 GB free on Nodes 2 and 3. The 3 nodes ran Ubuntu 10.04 under Virtual PC 2007 (VM).
This table shows the sizes of the NFA and min-DFA for examples A1 and A2. All nodes have access to a shared folder on the main node (Node 0) (/home/mirror/roomy-data)
Experimental Results Log File Output:
NFA->DFA
Results-Log
================ a1 2 Clusters ==========================
real
64m47.458s |
|
|
Week 26 -27 |
Improving the speed-up of the NFA->DFA program: During the test stage we ran different examples and found that as the number of processors increases, the runtime increases as well. The runtime increased because of the barrier time (overhead): the time it takes for all processes to synchronize for some operations (e.g. a parallel update or parallel access).
There are 760 syncs for example A1 and 809 syncs for A2. A sync means that all MPI processes wait for the local operation to reach a certain point in the code (like an MPI_Barrier).
We overcame this issue by using a non-shared disk, so each node can read (parallel access) and write (parallel update) without waiting for the other nodes to finish their local operations.
1- Modifying SHARED_DISK to 1 in the params.in file.
2- Creating a folder with the same name on each node (non-shared folder), /home/mpiuser/roomy-data
3- Modifying the params.in file:
PARAM DISK_DATA_PATH /home/mpiuser/roomy-data
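The effect described above, where adding processes makes the run slower, can be illustrated with a toy cost model in Python. This is only a sketch: the formula, the compute time and the per-sync cost are hypothetical and were not measured on our cloud. Only the sync count for A1 (760) comes from the text above.

```python
def modelled_runtime(compute_seconds, processes, syncs, seconds_per_sync_per_peer):
    """Toy model: the compute part divides across processes, while each
    sync costs time proportional to the number of peers to wait for."""
    compute_part = compute_seconds / processes
    barrier_part = syncs * seconds_per_sync_per_peer * (processes - 1)
    return compute_part + barrier_part


if __name__ == "__main__":
    SYNCS_A1 = 760  # sync count reported above for example A1
    # Hypothetical values, chosen only to illustrate the trend.
    COMPUTE_SECONDS = 600.0
    SECONDS_PER_SYNC_PER_PEER = 1.0
    for processes in (1, 2, 3):
        total = modelled_runtime(
            COMPUTE_SECONDS, processes, SYNCS_A1, SECONDS_PER_SYNC_PER_PEER
        )
        print(processes, "process(es):", round(total, 1), "seconds (modelled)")
```

Under these assumed numbers the barrier term dominates as soon as a second process is added, which matches the direction of what we observed: lowering the cost of each sync matters more than adding processes.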
|
|
|
Week 28 -29 |
Moving to real, public cloud:
At this stage we are ready to move to an OpenNebula-based cloud. Since we found that Windows Azure is not the right choice for our application, we applied for access to the KTH (OpenNebula-based) cloud. At the same time we are studying COMPSs, a framework in the VENUS-C platform for enabling e-Science applications on the cloud. COMP Superscalar is a new version of GRID Superscalar, which aims to ease the development of Grid applications. COMP Superscalar exploits the inherent parallelism of applications when running them on the Grid. COMP Superscalar:
COMPSs manual and other documentation
|
|
|
Week 30 -32 |
Visualizing NFA and DFA:
It would be useful to tie the NFA->DFA program together with the visualisation tool into a COMPSs job; that would show our usage of COMPSs and provide a nice way to visualise the result of the NFA->DFA program. We found that Graphviz is a good tool for visualising the NFA->DFA results, so we decided to draw a simple map for tying the NFA->DFA program together with the visualisation tool. We start with a GAP-format file (e.g. a1.g) and use the parser to convert it to another, readable format (e.g. a1.in). We wrote a Java program (AutomataToGraphviz.java) that reads NFA and DFA files and converts them to Graphviz scripts. Those scripts are used to generate gif files representing the NFA and DFA states with a Graphviz tool.
Parameters:
a1.g (NFA in GAP format; complex, unreadable format)
a1.in (NFA in a simple, readable format)
a1.dot (NFA as a Graphviz script)
a1.gif (NFA in gif format, for visualisation)
mindfa.g (DFA in GAP format; complex, unreadable format)
mindfa.out (DFA in a simple, readable format)
mindfa.dot (DFA as a Graphviz script)
mindfa.gif (DFA in gif format, for visualisation)
Programs:
nfa.mindfa (NFA->DFA program)
AutomataToGraphviz.java (NFA and DFA files --> Graphviz scripts)
Our flow map:
1- a1.g --> Parser --> a1.in --> nfa.mindfa --> mindfa.g
   mindfa.g --> Parser --> mindfa.out
2- a1.in --> AutomataToGraphviz.java --> a1.dot
   a1.dot --> Graphviz --> a1.gif
3- mindfa.out --> AutomataToGraphviz.java --> mindfa.dot
   mindfa.dot --> Graphviz --> mindfa.gif
At this stage, we can present the states of any NFA or DFA as an image (visualisation format). |
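As an illustration of the "automaton file to Graphviz script" step in the flow map, here is a minimal sketch in Python. It is not AutomataToGraphviz.java, whose source and input format are not shown here. The function name, the in-memory transition table and the example automaton are assumptions; only the general shape of a DOT digraph is standard Graphviz.

```python
def automaton_to_dot(name, transitions, start, accepting):
    """Render an automaton as Graphviz DOT text.

    transitions maps (state, symbol) -> set of next states, so the same
    function covers an NFA (several targets) and a DFA (one target).
    """
    states = {start} | set(accepting)
    for (source, _symbol), targets in transitions.items():
        states.add(source)
        states.update(targets)

    lines = ["digraph %s {" % name, "    rankdir=LR;"]
    # An invisible point node gives the start state an incoming arrow.
    lines.append("    __start [shape=point];")
    for state in sorted(states):
        shape = "doublecircle" if state in accepting else "circle"
        lines.append('    "%s" [shape=%s];' % (state, shape))
    lines.append('    __start -> "%s";' % start)
    for (source, symbol), targets in sorted(transitions.items()):
        for target in sorted(targets):
            lines.append('    "%s" -> "%s" [label="%s"];' % (source, target, symbol))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical NFA over {a, b} that accepts words ending in "ab".
    example = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
    print(automaton_to_dot("a1", example, start=0, accepting={2}))
```

If the printed text is saved as a1.dot, a Graphviz command such as `dot -Tgif a1.dot -o a1.gif` should produce the image, assuming Graphviz is installed.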
|
|
Week 32-34 |
Start KTH PDC2 with the Command Line Interface (CLI).
- We applied for a PDC2 account on 26 January 2012.
- We got account access on 25 April 2012.
- After getting the PDC2 account, we started to launch our instances.
PDC2 offers two client interfaces:
1- A web interface, where users only need a browser.
2- A Command Line Interface, where the user has more control but has to install client libraries.
General requirements:
1- PuTTY (on Windows)
2- SSH (on Linux)
After creating a cloud instance, we need private/public keys in place to log in with admin access. As with other cloud providers (e.g. Amazon), this is required for security reasons.
Guide for the Command Line Interface (CLI) of the PDC Cloud (PDC2).
We followed the steps provided by the PDC centre:
1- Ruby installation
2- PDC2 client libraries
3- Setting up the end point
4- Setting up credentials
PDC2 client libraries:
Create a new user oneuser on our (PC) end point:
root@cluster0:~$ sudo adduser oneuser   # or replace with any other name
root@cluster0:~$ su oneuser
oneuser@cluster0:~$ pwd
/home/oneuser
oneuser@cluster0:~$ wget http://www.pdc.kth.se/resources/computers/pdc-cloud/pdc2client.tar.gz
oneuser@cluster0:~$ tar zxvf pdc2client.tar.gz
oneuser@cluster0:~$ cd pdc2client
oneuser@cluster0:~/pdc2client$ ls
bin etc include lib share examples
oneuser@cluster0:~/pdc2client$ cd examples
oneuser@cluster0:~/pdc2client/examples$ ls
base-centos5.one sercdev-centos5.one sercservices-centos5.one
Setting up the end point:
oneuser@cluster0:~$ export ONE_XMLRPC=http://front.pdc2.pdc.kth.se:2633/RPC2
oneuser@cluster0:~$ export ONE_LOCATION=/home/oneuser/pdc2client
oneuser@cluster0:~$ export PATH=$ONE_LOCATION/bin:$PATH
Setting up credentials:
Client libraries read credentials from the ~/.one/one_auth file by default.
root@Cluster0:~$ mkdir ~/.one
root@Cluster0:~$ cat ~/.one/one_auth
skloul:urpass
(Note: the user name and password are separated by a colon.)
Note: The user name and password are provided by KTH PDC once the PDC cloud account application has been accepted.
After a successful installation, we are able to execute the PDC virtual machine commands:
A- The first command (onecluster list) shows the clusters in KTH PDC.
oneuser@Cluster0$ onecluster list
ID NAME
0 default
1 pdc
B- The second command (oneimage list) shows the images and operating systems available in KTH PDC.
oneuser@Cluster0$ oneimage list
ID USER     NAME               TYPE REGTIME            PUB PER STAT #VMS
38 livenson cdmi-template      OS   Jun 03, 2011 09:22 No  No  disa 0
13 oneadmin base-centos5       OS   Feb 25, 2011 09:04 Yes No  used 5
14 oneadmin serc-services      OS   Feb 25, 2011 09:50 Yes No  rdy  0
34 chgustaf RoboCloud_Tomcat   OS   May 17, 2011 11:55 Yes No  rdy  0
39 oneadmin Debian-Squeeze     OS   Jun 07, 2011 14:12 No  No  rdy  0
28 oneadmin Ubuntu-10.04       OS   May 04, 2011 14:47 No  No  rdy  0
26 oneadmin Ubuntu-Hardy       OS   Apr 12, 2011 14:14 Yes No  used 1
33 chgustaf RoboCloud_HAProxy  OS   May 17, 2011 11:42 Yes No  used 1
40 oneadmin debiansqueeze      OS   Jun 07, 2011 15:09 Yes No  used 5
41 oneadmin cdmi-v1            OS   Jun 10, 2011 10:02 Yes No  rdy  0
42 livenson cdmi-v2            OS   Jul 01, 2011 10:27 Yes No  used 1
43 oneadmin VenusC_Debian_Base OS   Nov 28, 2011 14:52 Yes No  rdy  0
47 oneadmin venuscdebianbase   OS   Mar 01, 2012 14:35 Yes No  rdy  0
12 oneadmin sercdev-centos5    OS   Feb 25, 2011 08:46 Yes No  rdy  0
48 oneadmin ttylinux           OS   Mar 21, 2012 13:15 Yes No  rdy  0
52 oneadmin venuscdebian2      OS   Apr 19, 2012 17:48 No  No  rdy  0
SSH keys for the cloud instance:
We need private/public keys to log in to a PDC Cloud (PDC2) instance.
- Generate SSH keys for login on the PDC2 instance:
root@cluster0$ su oneuser
oneuser@Cluster0$ ssh-keygen
Generating public/private rsa key pair.
Enter a file in which to save the key (/home/oneuser/.ssh/id_rsa):
Note: # Press Enter or type an alternative path
Enter passphrase (empty for no passphrase):
Note: # Choose a strong but easily memorable passphrase
After that:
oneuser@Cluster0$ cat /home/oneuser/.ssh/id_rsa.pub
Send the output of the above command to PDC support along with your PDC2 username.
Creating the instances:
oneuser@Cluster0:~/pdc2client$ cd examples
oneuser@Cluster0:~/pdc2client/examples$ ls
base-centos5.one baseubuntu.one list.txt onevm-list sercservices-centos5.one
base-debian.one error.txt list-txt.txt sercdev-centos5.one
We need to edit the base-debian.one file and make the following changes for each instance:
Instance 1:
NAME = Cluster0
MEMORY = 1024
PDC"USER = skloul
oneuser@Cluster0:~/pdc2client/examples$ onevm create base-debian.one
Instance 2:
NAME = Cluster1
MEMORY = 1024
PDC"USER = skloul
oneuser@Cluster0:~/pdc2client/examples$ onevm create base-debian.one
Instance 3:
NAME = Cluster2
MEMORY = 1024
PDC"USER = skloul
oneuser@Cluster0:~/pdc2client/examples$ onevm create base-debian.one
C- The third command (onevm list m) shows our instances in KTH PDC.
oneuser@Cluster0$ onevm list m
 ID USER   NAME     STAT CPU MEM     HOSTNAME TIME
545 skloul Cluster0 runn 99  1020.3M nebula11 06 05:33:52
573 skloul Cluster1 runn 0   1020.3M nebula10 02 01:05:06
576 skloul Cluster2 runn 99  1020.3M nebula6  00 01:36:15 |
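When several instances are launched, it can be convenient to check the listing automatically instead of reading it by eye. The following is a minimal Python sketch that parses text in the layout of the onevm listing above and reports instances that are not in the runn state. It assumes the listing has already been captured as text and that the TIME column is printed as a day count followed by hh:mm:ss, as in the output above; parse_onevm_list and not_running are hypothetical helper names, not part of the PDC2 client.

```python
SAMPLE = """\
 ID USER   NAME     STAT CPU MEM     HOSTNAME TIME
545 skloul Cluster0 runn 99  1020.3M nebula11 06 05:33:52
573 skloul Cluster1 runn 0   1020.3M nebula10 02 01:05:06
576 skloul Cluster2 runn 99  1020.3M nebula6  00 01:36:15
"""


def parse_onevm_list(text):
    """Parse an `onevm list` style table into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        # TIME is printed as "<days> <hh:mm:ss>", so the trailing
        # fields are joined back into a single value.
        fixed = fields[:len(header) - 1]
        time_value = " ".join(fields[len(header) - 1:])
        rows.append(dict(zip(header, fixed + [time_value])))
    return rows


def not_running(rows):
    """Return the names of instances whose state is not 'runn'."""
    return [row["NAME"] for row in rows if row["STAT"] != "runn"]


if __name__ == "__main__":
    instances = parse_onevm_list(SAMPLE)
    print([row["NAME"] for row in instances])
    print("not running:", not_running(instances))
```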
|
|
Week 34-35 |
Prepare 3 nodes on KTH Cloud:
1- Create 3 instances (Cluster0, 1, 2) using Debian OS.
2- Prepare each node (Cluster0, 1, 2) individually.
- Define a new user (cmpiuser) on each node.
root@Cluster0$ sudo adduser cmpiuser
root@Cluster1$ sudo adduser cmpiuser
root@Cluster2$ sudo adduser cmpiuser
- Create a new directory (/mirror) on each node.
- Copy all MPI examples into the /mirror directory on each node.
root@Cluster0$ ls /mirror
hello2.c ParallelTotientRange3
- Generate a DSA key for cmpiuser on each node (ssh-keygen -t dsa).
- Append the public key (id_dsa.pub) to the node's own authorized_keys.
cmpiuser@Cluster0:~$ ssh-keygen -t dsa
Leave the passphrase empty. Next, we add this key to the authorized keys:
cmpiuser@Cluster0:~$ cd .ssh
cmpiuser@Cluster0:~/.ssh$ cat id_dsa.pub >> authorized_keys
cmpiuser@Cluster1:~$ cd .ssh
cmpiuser@Cluster1:~/.ssh$ cat id_dsa.pub >> authorized_keys
cmpiuser@Cluster2:~$ cd .ssh
cmpiuser@Cluster2:~/.ssh$ cat id_dsa.pub >> authorized_keys
- Copy the public key (id_dsa.pub) of cmpiuser on each node to the main node's authorized_keys (cmpiuser@Cluster0:~/.ssh). The file then holds one key per node:
cmpiuser@Cluster0:~/.ssh$ cat authorized_keys
ssh-dss AAAAB3Nza ... etc Yr1FSq0s= cmpiuser@Cluster0
ssh-dss AAAAB3Nza ... etc lxm4JCpG cmpiuser@Cluster1
ssh-dss AAAAB3Nza ... etc quTk9uQ== cmpiuser@Cluster2
- Install the build-essential package on each node (Cluster0, 1, and 2).
cmpiuser@Cluster0:~$ sudo apt-get install build-essential
cmpiuser@Cluster1:~$ sudo apt-get install build-essential
cmpiuser@Cluster2:~$ sudo apt-get install build-essential
- Install the MPICH2 package on each node (Cluster0, 1, and 2).
cmpiuser@Cluster0:~$ sudo apt-get update
cmpiuser@Cluster0:~$ sudo apt-get install mpich2
cmpiuser@Cluster1:~$ sudo apt-get update
cmpiuser@Cluster1:~$ sudo apt-get install mpich2
cmpiuser@Cluster2:~$ sudo apt-get update
cmpiuser@Cluster2:~$ sudo apt-get install mpich2
- Define the host names in /etc/hosts.
root@Cluster0:~$ nano /etc/hosts
Add the node addresses to Cluster0's hosts file:
root@Cluster0:~$ cat /etc/hosts
127.0.0.1 localhost
130.237.221.253 repo.pdc2.pdc.kth.se
192.168.2.220 nfscloud
130.237.221.240 Cluster0
130.237.221.253 Cluster1
130.237.221.234 Cluster2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
3- Test the installation on each node:
cmpiuser@Cluster0:~$ which mpd
cmpiuser@Cluster0:~$ which mpiexec
cmpiuser@Cluster0:~$ which mpirun
cmpiuser@Cluster1:~$ which mpd
cmpiuser@Cluster1:~$ which mpiexec
cmpiuser@Cluster1:~$ which mpirun
cmpiuser@Cluster2:~$ which mpd
cmpiuser@Cluster2:~$ which mpiexec
cmpiuser@Cluster2:~$ which mpirun
4- Setting up MPD:
- Create mpd.hosts in cmpiuser's home directory with the node names:
cmpiuser@Cluster0:~$ cat mpd.hosts
Cluster0
Cluster1
Cluster2
- Then run :
cmpiuser@Cluster0:~$ echo secretword=skloul >> ~/.mpd.conf
cmpiuser@Cluster0:~$ chmod 600 ~/.mpd.conf
5- Test MPD by typing the following commands:
cmpiuser@Cluster0:~$ mpd &
cmpiuser@Cluster0:~$ mpdtrace
The output should be the current host name.
cmpiuser@Cluster0:~$ mpdallexit
Finally, start the mpd daemons on all nodes:
cmpiuser@Cluster0:~$ mpdboot -n 3
We should be in the home directory, where the mpd.hosts file exists. Otherwise, use:
mpdboot -n 3 -f mpd.hosts
cmpiuser@Cluster0:~$ mpdtrace
The output should be the names of all nodes.
If this does not succeed, run the following Linux commands to find and stop any mpd already running on any node. mpd should be started (once) on the main node only (Cluster0).
ps aux | grep mpd ; run this on all hosts
kill -9 4323 ; then use kill to stop any running mpd, where 4323 is the process ID of the running mpd
Note: This error occurs when several mpd processes are running, or when the host name (Cluster0) is not defined in /etc/hosts.
cmpiuser@Cluster0:/mirror$ ssh cluster1
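The check that mpdtrace lists every node can be sketched in Python. This is a minimal sketch, assuming mpd.hosts contains one host name per line and that the output of mpdtrace has been captured as text with one host name per line. The helper missing_nodes is hypothetical and not part of MPICH2. Names are compared case-insensitively because this log uses both Cluster1 and cluster1.

```python
def missing_nodes(mpd_hosts_text, mpdtrace_text):
    """Return the hosts listed in mpd.hosts that mpdtrace did not report."""
    expected = {line.strip().lower()
                for line in mpd_hosts_text.splitlines() if line.strip()}
    running = {line.strip().lower()
               for line in mpdtrace_text.splitlines() if line.strip()}
    return sorted(expected - running)


if __name__ == "__main__":
    hosts = "Cluster0\nCluster1\nCluster2\n"
    trace = "Cluster0\nCluster2\n"   # example: the ring is missing Cluster1
    print("missing from the mpd ring:", missing_nodes(hosts, trace))
```

An empty result means that every node in mpd.hosts has joined the mpd ring.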
|
|
|
Week 35-36 |
Run MPI programs:
Now we are ready to test some MPI programs on the KTH Cloud:
1- The first program we tested is hello2 (built from hello.c).
cmpiuser@Cluster0:/mirror$ ls
hello.c mpd.hosts ParallelTotientRange3.c
cmpiuser@Cluster0:/mirror$ mpicc -Wall -O -o hello2 hello.c
Instead of compiling hello.c on each node, we can copy the binary file (hello2) to the other nodes (Cluster1 and Cluster2).
cmpiuser@Cluster0:/mirror$ scp hello2 cmpiuser@Cluster1:/mirror
cmpiuser@Cluster0:/mirror$ scp hello2 cmpiuser@Cluster2:/mirror
Run hello2 on all nodes in parallel from the main node (Cluster0):
cmpiuser@Cluster0:/mirror$ mpirun -n 1 ./hello2
Hello, I am 0 of 1 (hostname is Cluster0)
cmpiuser@Cluster0:/mirror$ mpirun -n 2 ./hello2
Hello, I am 0 of 2 (hostname is Cluster0)
Hello, I am 1 of 2 (hostname is Cluster2)
cmpiuser@Cluster0:/mirror$ mpirun -n 3 ./hello2
Hello, I am 0 of 3 (hostname is Cluster0)
Hello, I am 1 of 3 (hostname is Cluster2)
Hello, I am 2 of 3 (hostname is Cluster1)
cmpiuser@Cluster0:/mirror$
2- The second program we tested is ParallelTotientRange3.c
cmpiuser@Cluster0:/mirror$ mpirun -np 1 ./ParallelTotientRange3 1 1000
(hostname ... Cluster0)
----------------------------------------------------
Sum of Totients between [1..1000] is 304191
Time: 0.055068 seconds
----------------------------------------------------
cmpiuser@Cluster0:/mirror$ mpirun -np 2 ./ParallelTotientRange3 1 1000
(hostname ... Cluster0)
(hostname ... Cluster2)
----------------------------------------------------
Sum of Totients between [1..1000] is 304191
Time: 0.028609 seconds
----------------------------------------------------
cmpiuser@Cluster0:/mirror$ mpirun -np 3 ./ParallelTotientRange3 1 1000
(hostname ... Cluster0)
(hostname ... Cluster2)
(hostname ... Cluster1)
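For reference, the computation behind these runs can be reproduced sequentially. The source of ParallelTotientRange3.c is not shown in this log, so the following Python code is only a sketch of the general idea: each process sums the totients of one part of the range and the partial sums are added up. Two assumptions are made here. First, euler(n) counts the values k in [1, n-1] that are coprime to n, so euler(1) is 0; this convention is chosen because it reproduces the sum 304191 reported above. Second, the range is split into contiguous chunks; the real program may distribute the work differently.

```python
from math import gcd


def euler(n):
    """Count k in [1, n-1] with gcd(n, k) == 1 (so euler(1) == 0)."""
    return sum(1 for k in range(1, n) if gcd(n, k) == 1)


def sum_totients(lower, upper):
    """Sum euler(n) for all n in [lower, upper]."""
    return sum(euler(n) for n in range(lower, upper + 1))


def split_range(lower, upper, parts):
    """Split [lower, upper] into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(upper - lower + 1, parts)
    chunks, start = [], lower
    for i in range(parts):
        end = start + size + (1 if i < extra else 0) - 1
        chunks.append((start, end))
        start = end + 1
    return chunks


if __name__ == "__main__":
    # Each chunk stands for the share of one MPI process.
    partial_sums = [sum_totients(a, b) for a, b in split_range(1, 1000, 3)]
    print("Sum of Totients between [1..1000] is", sum(partial_sums))
```

Contiguous chunks are the simplest choice but do not balance the load evenly, because larger values of n take longer to process.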
Summary of KTH PDC2 Cloud speedup tests:
These are the execution times of the TotientRange function on 1, 2, and 3 instances (VMs) on the KTH PDC2 Cloud.
Table (1) Summary (times in seconds):
-----------------------------------------------------------------
Instances  Range    Range    Range    Range    Range    Range
(VMs)      1-10000  1-15000  1-20000  1-25000  1-30000  1-50000
-----------------------------------------------------------------
1          7.6      18.1     33.5     54.0     79.7     236.5
-----------------------------------------------------------------
2          3.9      9.4      17.3     27.9     41.2     122.1
-----------------------------------------------------------------
3          2.6      6.1      11.4     18.4     27.1     80.6
-----------------------------------------------------------------
Note: The above results show that the KTH PDC2 cloud (with 3 nodes) is faster than our private cloud (with 3 nodes).
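The speedup implied by these timings can be derived directly from the table. The sketch below uses the usual definitions, speedup = T(1) / T(n) and efficiency = speedup / n, with the times copied from Table (1). The names TIMES, RANGES, speedup and efficiency are illustrative.

```python
# Execution times in seconds from Table (1), keyed by number of instances.
RANGES = ["1-10000", "1-15000", "1-20000", "1-25000", "1-30000", "1-50000"]
TIMES = {
    1: [7.6, 18.1, 33.5, 54.0, 79.7, 236.5],
    2: [3.9, 9.4, 17.3, 27.9, 41.2, 122.1],
    3: [2.6, 6.1, 11.4, 18.4, 27.1, 80.6],
}


def speedup(times, n):
    """Speedup on n instances relative to one instance, per range."""
    return [t1 / tn for t1, tn in zip(times[1], times[n])]


def efficiency(times, n):
    """Parallel efficiency (speedup divided by instance count), per range."""
    return [s / n for s in speedup(times, n)]


if __name__ == "__main__":
    for n in (2, 3):
        for label, s, e in zip(RANGES, speedup(TIMES, n), efficiency(TIMES, n)):
            print(f"{n} instances, range {label}: "
                  f"speedup {s:.2f}, efficiency {e:.0%}")
```

For every range in the table, the speedup lies between 1.9 and 2.0 on two instances and between 2.9 and 3.0 on three instances, which is close to linear.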
|
|
|
Week |
KTH public cloud vs. our private cloud:
NFA->DFA application speedup:
Total, and available Storage:
|
|
|
Week |
|
|