{"id":2289,"date":"2023-12-18T15:03:54","date_gmt":"2023-12-18T06:03:54","guid":{"rendered":"https:\/\/gotocloud.co.kr\/?p=2289"},"modified":"2023-12-18T15:03:54","modified_gmt":"2023-12-18T06:03:54","slug":"build-your-own-hpc-cluster-based-on-ubuntu-22-04","status":"publish","type":"post","link":"https:\/\/gotocloud.co.kr\/?p=2289","title":{"rendered":"Build Your Own HPC Cluster based on Ubuntu 22.04"},"content":{"rendered":"<h1>Typical HPC configuration<\/h1>\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/image-1-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"491\" src=\"https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/image-1-1-1024x491.png\" alt=\"\" class=\"wp-image-2290\" srcset=\"https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/image-1-1-1024x491.png 1024w, https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/image-1-1-300x144.png 300w, https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/image-1-1-768x368.png 768w, https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/image-1-1.png 1302w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n<h1>Test Configuration<\/h1>\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/\ud654\uba74-\ucea1\ucc98-2023-12-18-150022.png\"><img loading=\"lazy\" decoding=\"async\" width=\"425\" height=\"193\" data-id=\"2294\" src=\"https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/\ud654\uba74-\ucea1\ucc98-2023-12-18-150022.png\" alt=\"\" class=\"wp-image-2294\" srcset=\"https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/\ud654\uba74-\ucea1\ucc98-2023-12-18-150022.png 425w, 
https:\/\/gotocloud.co.kr\/wp-content\/uploads\/2023\/12\/\ud654\uba74-\ucea1\ucc98-2023-12-18-150022-300x136.png 300w\" sizes=\"(max-width: 425px) 100vw, 425px\" \/><\/a><\/figure>\n<\/figure>\n\n\n<ul>\n<li>All nodes have two NICs (on AWS, a single NIC is used)\n<ul>\n<li>eth0: compute network for MPI and NFS<\/li>\n<li>eth1: Internet access for package updates<\/li>\n<\/ul>\n<\/li>\n<li>node001: login node, NFS server, and compute node<\/li>\n<li>OS: Ubuntu 22.04 Server, minimal installation<\/li>\n<\/ul>\n<h1>Launch instances in AWS<\/h1>\n<ul>\n<li>Launch Ubuntu 22.04 instances with public IP addresses<\/li>\n<li>Edit the inbound rules of the security group\n<ul>\n<li>Allow ssh access from the Internet<\/li>\n<li>Allow all inbound traffic from the cluster&#8217;s private IP address range<\/li>\n<\/ul>\n<\/li>\n<li>Record the hostname and IP addresses of each instance\n<table>\n<thead>\n<tr>\n<th style=\"text-align: center;\">hostname<\/th>\n<th style=\"text-align: center;\">private IP<\/th>\n<th style=\"text-align: center;\">public IP<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"text-align: center;\">node001<\/td>\n<td style=\"text-align: center;\">&#8211;<\/td>\n<td style=\"text-align: center;\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">node002<\/td>\n<td style=\"text-align: center;\">&#8211;<\/td>\n<td style=\"text-align: center;\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/li>\n<\/ul>\n<h1>Headnode setup<\/h1>\n<h2>Enable password login for <em>root<\/em><\/h2>\n<ul>\n<li>Log in to the head node as <em>ubuntu<\/em> and set a password for <em>root<\/em>\n<pre><code>$ sudo passwd<\/code><\/pre>\n<\/li>\n<li>Install a few tools, switch to <em>root<\/em>, and enable password login for <em>root<\/em>\n<pre><code>$ sudo apt-get -y install vim csh\n$ su -\n\n# vi \/etc\/ssh\/sshd_config\nPasswordAuthentication yes\nPermitRootLogin yes\n\n# systemctl restart sshd<\/code><\/pre>\n<\/li>\n<li>Check that <em>root<\/em> can log in to the head node using <strong>ssh<\/strong><\/li>\n<\/ul>\n<h2><strong>btools<\/strong> for Cluster 
management<\/h2>\n<ul>\n<li>Install the <strong>btools<\/strong> scripts on the head node\n<ul>\n<li><a href=\"https:\/\/github.com\/zachsnoek\/btools\"><strong>btools<\/strong><\/a> is a set of scripts that automate running commands on all cluster nodes from the head node\n<pre><code>$ su -\nroot@node001:~# apt-get -y install git\nroot@node001:~# cd \/root\nroot@node001:~# git clone https:\/\/github.com\/zachsnoek\/btools\nroot@node001:~# cd btools\nroot@node001:~\/btools# .\/install-btools.sh\nroot@node001:~\/btools# cd \/usr\/local\/sbin\nroot@node001:\/usr\/local\/sbin# sed -i 's|\/bin\/sh|\/bin\/bash|g' *<\/code><\/pre>\n<\/li>\n<li>On Ubuntu, <code>\/bin\/sh<\/code> is <strong>dash<\/strong> rather than <strong>bash<\/strong>, and the <strong>btools<\/strong> scripts do not work with the <code>#!\/bin\/sh<\/code> line, so the <strong>sed<\/strong> command above changes it to <code>#!\/bin\/bash<\/code>.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Hostname setup<\/h2>\n<ul>\n<li>Add the hostnames of all compute nodes to <strong>\/usr\/local\/sbin\/bhosts<\/strong>\n<pre><code>root@node001:~# vi \/usr\/local\/sbin\/bhosts\n\nnode002\nnode003\n...<\/code><\/pre>\n<\/li>\n<li>Append the IP addresses of all nodes to <strong>\/etc\/hosts<\/strong>\n<pre><code>root@node001:~# vi \/etc\/hosts\n\n127.0.0.1 localhost\n192.168.200.1 node001\n192.168.200.2 node002\n<\/code><\/pre>\n<\/li>\n<\/ul>\n<h2><strong>root<\/strong> login without a password<\/h2>\n<ul>\n<li>Create an ssh key and copy it to all compute nodes so that <em>root<\/em> can log in without a password\n<pre><code>root@node001:~# ssh-keygen -t rsa\n\nroot@node001:~# ssh-copy-id root@node002\nroot@node001:~# ssh-copy-id root@node003<\/code><\/pre>\n<\/li>\n<li>Check that <strong>btools<\/strong> commands now run without asking for the <em>root<\/em> password\n<pre><code>root@node001:~# bexec hostname\n\n***** node002 *****\nnode002\n***** node003 *****\nnode003<\/code><\/pre>\n<\/li>\n<\/ul>\n<h2>NFS server setup<\/h2>\n<ul>\n<li>The head node&#8217;s <strong>\/home<\/strong> is shared with all compute nodes over NFS<\/li>\n<li>Install the NFS server package and start the NFS service on the 
head node\n<pre><code>root@node001:~# apt install -y nfs-kernel-server nfs-common\nroot@node001:~# systemctl enable nfs-server\nroot@node001:~# systemctl start nfs-server\nroot@node001:~# systemctl status nfs-server\n\u25cf nfs-server.service - NFS server and services<\/code><\/pre>\n<\/li>\n<li>Export <strong>\/home<\/strong> to all compute nodes\n<pre><code>root@node001:~# vi \/etc\/exports\n\/home 192.168.200.0\/24(rw,no_root_squash)\nroot@node001:~# exportfs -a<\/code><\/pre>\n<p><em>192.168.200.0\/24<\/em> is the IP address range of the NFS network; replace it with your own range.<\/p>\n<\/li>\n<\/ul>\n<h1>Compute nodes setup<\/h1>\n<h2>Sync head node files to compute nodes<\/h2>\n<ul>\n<li>The <strong>bpush<\/strong> command copies a file from the head node to all compute nodes\n<pre><code>bpush &lt;headnode file&gt; &lt;destination directory&gt;<\/code><\/pre>\n<\/li>\n<li>Copy the head node&#8217;s <strong>\/etc\/hosts<\/strong> file to all compute nodes using <strong>bpush<\/strong>\n<pre><code>root@node001:~# bpush \/etc\/hosts \/etc\/\n***** node002 *****\n***** node003 *****\n***** node004 *****<\/code><\/pre>\n<\/li>\n<li>Use <strong>bexec<\/strong> to check that <strong>\/etc\/hosts<\/strong> has been synced to all compute nodes\n<pre><code>root@node001:~# bexec \"cat \/etc\/hosts\"<\/code><\/pre>\n<\/li>\n<\/ul>\n<h2>NFS client setup<\/h2>\n<ul>\n<li>Install the NFS client package on all compute nodes using <strong>bexec<\/strong>\n<pre><code>root@node001:~# bexec \"apt-get install -y nfs-common\"<\/code><\/pre>\n<\/li>\n<li>Test the NFS setup by mounting the head node&#8217;s <strong>\/home<\/strong> on all compute nodes\n<pre><code>root@node001:~# bexec \"mount -t nfs node001:\/home \/home\"\nroot@node001:~# bexec \"df | grep home\"\n***** node002 *****\nnode001:\/home  3844551680        0 3649184768   0% \/home\n***** node003 *****\nnode001:\/home  3844551680        0 3649184768   0% \/home<\/code><\/pre>\n<\/li>\n<li>Add an entry to <strong>\/etc\/fstab<\/strong> on all compute nodes so that <strong>\/home<\/strong> is mounted at boot time, using 
<strong>bexec<\/strong>\n<pre><code>root@node001:~# bexec \"echo 'node001:\/home \/home nfs defaults 0 0' &gt;&gt; \/etc\/fstab\"<\/code><\/pre>\n<\/li>\n<\/ul>\n<h1>Additional setup<\/h1>\n<ul>\n<li>On Ubuntu, <strong>\/etc\/bash.bashrc<\/strong> stops early for non-interactive shells, so the rest of the file is not applied to commands run remotely over ssh\n<ul>\n<li>As a result, <strong>mpirun<\/strong> cannot run processes on the compute nodes<\/li>\n<li>The line <code>[ -z \"$PS1\" ] &amp;&amp; return<\/code> in <strong>\/etc\/bash.bashrc<\/strong> should be commented out<\/li>\n<li>Comment it out on the head node and on all compute nodes so that remote commands can be executed\n<pre><code>root@node001:~# sed -i '\/&amp;&amp; return\/s\/^\/#\/' \/etc\/bash.bashrc\nroot@node001:~# bexec \"sed -i '\/&amp;&amp; return\/s\/^\/#\/' \/etc\/bash.bashrc\"<\/code><\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li>Disable <strong>StrictHostKeyChecking<\/strong> on all nodes so that ssh does not prompt to confirm unknown host keys\n<pre><code>root@node001:~# vi \/etc\/ssh\/ssh_config\nStrictHostKeyChecking no\nroot@node001:~# bpush \/etc\/ssh\/ssh_config \/etc\/ssh\/<\/code><\/pre>\n<\/li>\n<\/ul>\n<hr \/>\n<h1>Final setup<\/h1>\n<ul>\n<li>Update and install packages on all nodes\n<pre><code>root@node001:~# apt-get -y update\nroot@node001:~# apt-get -y install net-tools iputils-ping wget git vim build-essential flex libz-dev csh rsync\nroot@node001:~# bexec \"apt-get update\"\nroot@node001:~# bexec \"apt-get -y install net-tools iputils-ping wget git vim build-essential flex libz-dev csh rsync\"<\/code><\/pre>\n<\/li>\n<li>Install <a href=\"https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/tools\/oneapi\/toolkits.html#gs.1mrz63\">Intel oneAPI<\/a> (compilers and MPI) on all nodes\n<pre><code># wget -O- https:\/\/apt.repos.intel.com\/intel-gpg-keys\/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \\\n| gpg --dearmor | sudo tee \/usr\/share\/keyrings\/oneapi-archive-keyring.gpg &gt; \/dev\/null\n# echo \"deb [signed-by=\/usr\/share\/keyrings\/oneapi-archive-keyring.gpg] https:\/\/apt.repos.intel.com\/oneapi all main\" | sudo tee \/etc\/apt\/sources.list.d\/oneAPI.list\n# apt update\n# apt install -y intel-basekit intel-hpckit<\/code><\/pre>\n<ul>\n<li>Run the same commands on the compute nodes using <strong>bexec<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h1>User Creation<\/h1>\n<ul>\n<li>Create a user account on the head node and assign an initial password\n<pre><code>root@node001:~# adduser nextfoam<\/code><\/pre>\n<\/li>\n<li>Sync the account information to all compute nodes using <strong>bsync<\/strong>\n<pre><code>root@node001:~# bsync<\/code><\/pre>\n<\/li>\n<li>Create an MPI hostfile and copy it to the user&#8217;s home directory\n<pre><code># vi \/root\/mpihosts\nnode001:32\nnode002:32\n# cp \/root\/mpihosts \/home\/nextfoam\n# chown nextfoam:nextfoam \/home\/nextfoam\/mpihosts<\/code><\/pre>\n<\/li>\n<li>Send the account name and initial password to the user (by e-mail, for example)<\/li>\n<\/ul>\n<h2>What users need to do after logging in<\/h2>\n<ul>\n<li>Change the initial password\n<pre><code>nextfoam@node001:~$ passwd\nChanging password for nextfoam.\nCurrent password: \nNew password: \nRetype new password: \npasswd: password updated successfully<\/code><\/pre>\n<\/li>\n<li>Create an ssh key for accessing the compute nodes\n<pre><code>nextfoam@node001:~$ ssh-keygen -t rsa<\/code><\/pre>\n<\/li>\n<li>Copy the public key <strong>id_rsa.pub<\/strong> to <strong>authorized_keys<\/strong> so that ssh does not ask for a password; because <strong>\/home<\/strong> is shared over NFS, the same file is used on every compute node\n<pre><code>nextfoam@node001:~$ cp ~\/.ssh\/id_rsa.pub ~\/.ssh\/authorized_keys<\/code><\/pre>\n<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Typical HPC configuration Test Configuration \u00a0 All nodes have two NICs (In AWS, one NIC is used) eth0: compute network for MPI and NFS eth1: 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2294,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[405],"tags":[413,414,415],"_links":{"self":[{"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=\/wp\/v2\/posts\/2289"}],"collection":[{"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2289"}],"version-history":[{"count":3,"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=\/wp\/v2\/posts\/2289\/revisions"}],"predecessor-version":[{"id":2295,"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=\/wp\/v2\/posts\/2289\/revisions\/2295"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=\/wp\/v2\/media\/2294"}],"wp:attachment":[{"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2289"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2289"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gotocloud.co.kr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2289"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}