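Continued from the previous post:
14. Let's Build an MPI Cluster! - Running qn24b - 電子計算記
Speaking of supercomputers, there's TOP500!
Home | TOP500 Supercomputer Sites
And speaking of TOP500, there's Linpack!
So let's run HPL, the MPI-based parallel implementation of Linpack, and produce a TOP500-style benchmark score.
Doing this seriously is a bottomless pit, so here I'll show a quick and easy way to get it running.
Matrix operations are the core of Linpack, and HPL relies on BLAS as the library for them, so we start by preparing BLAS.
If you really want performance you would build BLAS for your own environment, but to save time we use the standard Ubuntu package.
Among recent open-source BLAS implementations, OpenBLAS and ATLAS are the usual choices. Ubuntu 16.04 has packages for both, so either works; this post uses OpenBLAS.
First, using compute-1 as the build machine, install OpenBLAS.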
root@compute-1:~# apt install libopenblas-dev -y
That alone installs a ready-to-use OpenBLAS library under /usr/lib.
With the groundwork done, let's build HPL. First, download the source from the official site.
mpiuser@compute-1:~$ wget http://www.netlib.org/benchmark/hpl/hpl-2.2.tar.gz
mpiuser@compute-1:~$ tar zxf hpl-2.2.tar.gz -C /nfs/
mpiuser@compute-1:~$ cd /nfs/hpl-2.2/
Sample build Makefiles for various environments are in the setup directory. Here we take Make.Linux_PII_CBLAS_gm, which needs the fewest changes, as the base and edit it.
mpiuser@compute-1:/nfs/hpl-2.2$ cp ./setup/Make.Linux_PII_CBLAS_gm ./
mpiuser@compute-1:/nfs/hpl-2.2$ vi Make.Linux_PII_CBLAS_gm
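The lines to edit are:
Line 70: set TOPdir to the directory where HPL was extracted
TOPdir = /nfs/hpl-2.2
Line 95: set LAdir to the directory where the BLAS library is installed
LAdir = /usr/lib
Line 97: set LAlib to the BLAS library file itself
LAlib = $(LAdir)/libopenblas.a
That's all. (This assumes OpenMPI and gfortran were already installed with apt in the earlier posts.)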
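The same three edits can also be applied without opening an editor. The following is only a sketch: the function name patch_hpl_makefile is hypothetical, it assumes GNU sed (the default on Ubuntu) for the -i option, and the demo runs against a throwaway file with placeholder values rather than the real sample Makefile.

```shell
#!/bin/sh
# Hypothetical helper (not part of HPL): rewrite TOPdir, LAdir and LAlib
# in a Make.<arch> file. Assumes GNU sed for in-place editing (-i).
patch_hpl_makefile() {
    makefile=$1   # path to the Make.<arch> file
    topdir=$2     # directory where HPL was extracted
    ladir=$3      # directory that holds the BLAS library
    lalib=$4      # BLAS library file name inside $ladir

    # Match each variable by name instead of by line number, so the
    # edit still applies if the line numbers differ in another version.
    sed -i \
        -e "s|^TOPdir[[:space:]]*=.*|TOPdir       = ${topdir}|" \
        -e "s|^LAdir[[:space:]]*=.*|LAdir        = ${ladir}|" \
        -e "s|^LAlib[[:space:]]*=.*|LAlib        = \$(LAdir)/${lalib}|" \
        "$makefile"
}

# Demo on a temporary file; the initial values are placeholders.
demo=$(mktemp)
cat > "$demo" <<'EOF'
TOPdir       = /placeholder/hpl
LAdir        = /placeholder/blas
LAinc        =
LAlib        = $(LAdir)/placeholder.a
EOF
patch_hpl_makefile "$demo" /nfs/hpl-2.2 /usr/lib libopenblas.a
cat "$demo"
rm -f "$demo"
```

Against the real file the call would be patch_hpl_makefile Make.Linux_PII_CBLAS_gm /nfs/hpl-2.2 /usr/lib libopenblas.a, run from /nfs/hpl-2.2.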
That may be hard to follow, so here is the whole file.
/nfs/hpl-2.2/Make.Linux_PII_CBLAS_gm
#
#  -- High Performance Computing Linpack Benchmark (HPL)
#     HPL - 2.2 - February 24, 2016
#     Antoine P. Petitet
#     University of Tennessee, Knoxville
#     Innovative Computing Laboratory
#     (C) Copyright 2000-2008 All Rights Reserved
#
#  -- Copyright notice and Licensing terms:
#
#  Redistribution and use in source and binary forms, with or without
#  modification, are permitted provided that the following conditions
#  are met:
#
#  1. Redistributions of source code must retain the above copyright
#  notice, this list of conditions and the following disclaimer.
#
#  2. Redistributions in binary form must reproduce the above copyright
#  notice, this list of conditions, and the following disclaimer in the
#  documentation and/or other materials provided with the distribution.
#
#  3. All advertising materials mentioning features or use of this
#  software must display the following acknowledgement:
#  This product includes software developed at the University of
#  Tennessee, Knoxville, Innovative Computing Laboratory.
#
#  4. The name of the University, the name of the Laboratory, or the
#  names of its contributors may not be used to endorse or promote
#  products derived from this software without specific written
#  permission.
#
#  -- Disclaimer:
#
#  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
#  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
#  OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
#  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
#  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#  DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
#  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = Linux_PII_CBLAS_gm
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = /nfs/hpl-2.2
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        =
MPinc        =
MPlib        =
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /usr/lib
LAinc        =
LAlib        = $(LAdir)/libopenblas.a
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore (Suns,
#                       Intel, ...),                           [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string
#                       location on the stack, and the string length is
#                       then passed as an F77_INTEGER after all explicit
#                       stack arguments,                       [default]
# -DStringStructPtr   : The address of a structure is passed by a
#                       Fortran 77 string, and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each Fortran
#                       77 string, and the structure is of the form:
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for Cray machines, which uses
#                       Cray fcd (fortran character descriptor) for
#                       interoperation.
#
F2CDEFS      =
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
#
HPL_OPTS     = -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           = mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = mpif77
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------
Once the Makefile is ready, all that's left is to run make.
mpiuser@compute-1:/nfs/hpl-2.2$ make arch=Linux_PII_CBLAS_gm
mpiuser@compute-1:/nfs/hpl-2.2$ cd bin/Linux_PII_CBLAS_gm/
mpiuser@compute-1:/nfs/hpl-2.2/bin/Linux_PII_CBLAS_gm$ ls -alh
total 22M
drwxrwxr-x 2 mpiuser mpiuser 4.0K Jan 10 00:35 .
drwxrwxr-x 3 mpiuser mpiuser 4.0K Jan 10 00:35 ..
-rw-r--r-- 1 mpiuser mpiuser 1.2K Jan 10 00:35 HPL.dat
-rwxrwxr-x 1 mpiuser mpiuser  22M Jan 10 00:35 xhpl
If the build succeeds, a binary named xhpl is created under bin/(arch)/.
Finally, let's test it as described in the README.
mpiuser@compute-1:/nfs/hpl-2.2/bin/Linux_PII_CBLAS_gm$ mpirun -np 4 ./xhpl
(output omitted)
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2R2          35     4     4     1               0.00              4.775e-02
HPL_pdgesv() start time Wed Jan 10 00:38:25 2018
HPL_pdgesv() end time   Wed Jan 10 00:38:25 2018
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0247304 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2R4          35     4     4     1               0.00              4.915e-02
HPL_pdgesv() start time Wed Jan 10 00:38:25 2018
HPL_pdgesv() end time   Wed Jan 10 00:38:25 2018
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0199397 ...... PASSED
================================================================================
Finished    864 tests with the following results:
            864 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
The test needs at least four processes, so start by running it inside a single node.
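Both the four-process requirement and the 864 tests in the summary follow from the parameter lists in HPL.dat. The sketch below works through the arithmetic. The post does not show HPL.dat, so the values are assumptions based on the stock file as I recall it (Ps "2 1 4", Qs "2 4 1", and the counts of each swept parameter); check them against your own copy. The function names are hypothetical.

```shell
#!/bin/sh
# Assumed stock HPL.dat values; verify against your own HPL.dat.

# Largest P x Q over the listed process grids, i.e. the number of MPI
# processes needed to run every grid. $1 = Ps line, $2 = Qs line.
max_procs() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { n = split($0, p, " ") }
        NR == 2 { split($0, q, " ")
                  for (i = 1; i <= n; i++)
                      if (p[i] * q[i] > m) m = p[i] * q[i]
                  print m }'
}

# HPL runs one test per combination of swept parameters, so the total is
# the product of how many values each parameter has.
count_tests() {
    total=1
    for n in "$@"; do
        total=$((total * n))
    done
    echo "$total"
}

max_procs "2 1 4" "2 4 1"        # prints 4
# Ns NBs grids PFACTs NBMINs NDIVs RFACTs BCASTs DEPTHs
count_tests 4 4 3 3 2 1 3 1 1    # prints 864
```

Under these assumptions every grid is 2x2, 1x4 or 4x1, which is why fewer than four processes is not enough, and the product of the parameter counts matches the 864 tests reported above.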
If the summary reports no failed or skipped tests, the run succeeded.
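With 864 tests scrolling past, it can be easier to let a script read the summary. This is a minimal sketch: check_hpl_summary is a hypothetical helper, and it assumes the summary wording matches the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: read HPL output on stdin and succeed only when the
# summary reports zero failed and zero skipped tests.
check_hpl_summary() {
    awk '
        /tests completed and failed residual checks/    { failed = $1;  seen++ }
        /tests skipped because of illegal input values/ { skipped = $1; seen++ }
        # Fail if either summary line is missing or either count is non-zero.
        END { exit !(seen == 2 && failed == 0 && skipped == 0) }'
}

# Demo using the summary lines from the run above.
check_hpl_summary <<'EOF' && echo "all tests passed"
Finished    864 tests with the following results:
            864 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
EOF
```

On the cluster this could be used as mpirun -np 4 ./xhpl | tee hpl.log | check_hpl_summary, which keeps the full log in hpl.log (a file name chosen only for illustration) while checking the result.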
Even on a Light.S1 instance it should finish in about one second.
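The Time column shows 0.00 because each test problem is tiny. As a rough check, the sketch below recovers the run time from N and the reported Gflops, assuming HPL counts 2/3 N^3 + 3/2 N^2 floating-point operations per solve (the formula as I understand it from the HPL source; treat it as an assumption). The function names are hypothetical.

```shell
#!/bin/sh
# Assumed HPL operation count for an N x N solve: 2/3 N^3 + 3/2 N^2.
hpl_flops() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", (2.0 / 3.0) * n * n * n + 1.5 * n * n }'
}

# Estimated wall time in seconds. $1 = N, $2 = Gflops as printed by HPL.
hpl_seconds() {
    awk -v n="$1" -v g="$2" 'BEGIN {
        flops = (2.0 / 3.0) * n * n * n + 1.5 * n * n
        printf "%.2e\n", flops / (g * 1e9)
    }'
}

hpl_flops 35                # prints 30420.8
hpl_seconds 35 4.775e-02    # prints 6.37e-04
```

For the WR00R2R2 line above (N=35, 4.775e-02 Gflops) this gives well under a millisecond per test, so the arithmetic itself is negligible at these problem sizes.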
However, when the same test is run across multiple nodes, on four Light.S1 instances, it takes more than 30 minutes to finish.
So next time we'll dig into tuning HPL's run parameters.