電子計算記

Personal experiments

15. Let's Build an MPI Cluster! - Running HPL

Continued from the previous post:
14. Let's Build an MPI Cluster! - Running qn24b - 電子計算記

When you think of supercomputers, you think of TOP500!

Home | TOP500 Supercomputer Sites

And when you think of TOP500, you think of Linpack!

LINPACK - Wikipedia

So let's run HPL, the MPI-parallel implementation of Linpack, and produce a TOP500-style benchmark score.

Doing this seriously is a bottomless pit, though, so this post shows a quick and simple way to get it running.


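Matrix operations are the core of Linpack, and HPL uses BLAS as the library for them, so we start by preparing that.
If you really want performance you would build BLAS for your own environment, but to save time we use the standard Ubuntu package here.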

Among recent open-source BLAS implementations, OpenBLAS and ATLAS are the usual choices. Ubuntu 16.04 has packages for both, so either one works; this post goes with OpenBLAS.

First, using compute-1 as the build machine, install OpenBLAS.

root@compute-1:~# apt install libopenblas-dev -y

That alone installs a ready-to-use OpenBLAS library under /usr/lib.

With the groundwork done, let's build HPL. First, download the source from the official site.

mpiuser@compute-1:~$ wget http://www.netlib.org/benchmark/hpl/hpl-2.2.tar.gz
mpiuser@compute-1:~$ tar zxf hpl-2.2.tar.gz -C /nfs/
mpiuser@compute-1:~$ cd /nfs/hpl-2.2/

Sample build Makefiles for various environments are in the setup directory. Here we start from Make.Linux_PII_CBLAS_gm, which needs the fewest changes, and edit it.

mpiuser@compute-1:/nfs/hpl-2.2$ cp ./setup/Make.Linux_PII_CBLAS_gm ./
mpiuser@compute-1:/nfs/hpl-2.2$ vi Make.Linux_PII_CBLAS_gm

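The edits are:
Line 70: set TOPdir to the directory where HPL was extracted
TOPdir = /nfs/hpl-2.2
Line 95: set LAdir to the directory where the BLAS library is installed
LAdir = /usr/lib
Line 97: set LAlib to the BLAS library file itself
LAlib = $(LAdir)/libopenblas.a
That's all. (This assumes OpenMPI and gfortran were already installed with apt in the earlier posts.)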
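If you would rather script the three edits than open vi, here is a minimal sketch. The helper name patch_hpl_makefile is made up for this post, and it assumes GNU sed (the one shipped with Ubuntu) for the -i option. It matches on the variable names instead of line numbers, so it does not depend on the lines sitting at exactly 70, 95 and 97.

```shell
# Hypothetical helper: rewrite TOPdir, LAdir and LAlib in an HPL Make.<arch> file.
# Assumes GNU sed for in-place editing with -i.
patch_hpl_makefile() {
    makefile=$1   # path to the Make.<arch> file, edited in place
    topdir=$2     # directory where HPL was extracted
    ladir=$3      # directory that contains libopenblas.a
    sed -i \
        -e "s|^TOPdir[[:space:]]*=.*|TOPdir       = ${topdir}|" \
        -e "s|^LAdir[[:space:]]*=.*|LAdir        = ${ladir}|" \
        -e 's|^LAlib[[:space:]]*=.*|LAlib        = $(LAdir)/libopenblas.a|' \
        "$makefile"
}
```

With the paths used in this post, the call would be: patch_hpl_makefile Make.Linux_PII_CBLAS_gm /nfs/hpl-2.2 /usr/lib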

In case that is hard to follow, here is the whole file.
/nfs/hpl-2.2/Make.Linux_PII_CBLAS_gm

#  
#  -- High Performance Computing Linpack Benchmark (HPL)                
#     HPL - 2.2 - February 24, 2016                          
#     Antoine P. Petitet                                                
#     University of Tennessee, Knoxville                                
#     Innovative Computing Laboratory                                 
#     (C) Copyright 2000-2008 All Rights Reserved                       
#                                                                       
#  -- Copyright notice and Licensing terms:                             
#                                                                       
#  Redistribution  and  use in  source and binary forms, with or without
#  modification, are  permitted provided  that the following  conditions
#  are met:                                                             
#                                                                       
#  1. Redistributions  of  source  code  must retain the above copyright
#  notice, this list of conditions and the following disclaimer.        
#                                                                       
#  2. Redistributions in binary form must reproduce  the above copyright
#  notice, this list of conditions,  and the following disclaimer in the
#  documentation and/or other materials provided with the distribution. 
#                                                                       
#  3. All  advertising  materials  mentioning  features  or  use of this
#  software must display the following acknowledgement:                 
#  This  product  includes  software  developed  at  the  University  of
#  Tennessee, Knoxville, Innovative Computing Laboratory.             
#                                                                       
#  4. The name of the  University,  the name of the  Laboratory,  or the
#  names  of  its  contributors  may  not  be used to endorse or promote
#  products  derived   from   this  software  without  specific  written
#  permission.                                                          
#                                                                       
#  -- Disclaimer:                                                       
#                                                                       
#  THIS  SOFTWARE  IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,  INCLUDING,  BUT NOT
#  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
#  OR  CONTRIBUTORS  BE  LIABLE FOR ANY  DIRECT,  INDIRECT,  INCIDENTAL,
#  SPECIAL,  EXEMPLARY,  OR  CONSEQUENTIAL DAMAGES  (INCLUDING,  BUT NOT
#  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#  DATA OR PROFITS; OR BUSINESS INTERRUPTION)  HOWEVER CAUSED AND ON ANY
#  THEORY OF LIABILITY, WHETHER IN CONTRACT,  STRICT LIABILITY,  OR TORT
#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
# ######################################################################
#  
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = Linux_PII_CBLAS_gm
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = /nfs/hpl-2.2
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a 
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        =
MPinc        =
MPlib        =
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /usr/lib
LAinc        =
LAlib        = $(LAdir)/libopenblas.a
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section  if and only if  you are not planning to use
# a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
# necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
# options.  **One and only one**  option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore  (Suns,
#                       Intel, ...),                           [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string loca-
#                       tion on the stack, and the string length is then
#                       passed as  an  F77_INTEGER  after  all  explicit
#                       stack arguments,                       [default]
# -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
#                       Fortran 77  string,  and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each  Fortran
#                       77 string,  and  the  structure is  of the form:
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
#                       Cray  fcd  (fortran  character  descriptor)  for
#                       interoperation.
#
F2CDEFS      =
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip  library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
#
HPL_OPTS     = -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           = mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#
# On some platforms,  it is necessary  to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = mpif77
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------

Once the Makefile is ready, all that is left is to run make.

mpiuser@compute-1:/nfs/hpl-2.2$ make arch=Linux_PII_CBLAS_gm
mpiuser@compute-1:/nfs/hpl-2.2$ cd bin/Linux_PII_CBLAS_gm/
mpiuser@compute-1:/nfs/hpl-2.2/bin/Linux_PII_CBLAS_gm$ ls -alh
total 22M
drwxrwxr-x 2 mpiuser mpiuser 4.0K Jan 10 00:35 .
drwxrwxr-x 3 mpiuser mpiuser 4.0K Jan 10 00:35 ..
-rw-r--r-- 1 mpiuser mpiuser 1.2K Jan 10 00:35 HPL.dat
-rwxrwxr-x 1 mpiuser mpiuser  22M Jan 10 00:35 xhpl

If the build succeeds, a binary named xhpl appears in bin/(arch)/.
Finally, let's test it as described in the README.

mpiuser@compute-1:/nfs/hpl-2.2/bin/Linux_PII_CBLAS_gm$ mpirun -np 4 ./xhpl
(output omitted)
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2R2          35     4     4     1               0.00              4.775e-02
HPL_pdgesv() start time Wed Jan 10 00:38:25 2018

HPL_pdgesv() end time   Wed Jan 10 00:38:25 2018

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0247304 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2R4          35     4     4     1               0.00              4.915e-02
HPL_pdgesv() start time Wed Jan 10 00:38:25 2018

HPL_pdgesv() end time   Wed Jan 10 00:38:25 2018

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0199397 ...... PASSED
================================================================================

Finished    864 tests with the following results:
            864 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
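As a side note, the Gflops column is not measured directly. As far as I know, HPL derives it from N and the elapsed time using the operation count 2/3*N^3 + 3/2*N^2. Here is a small sketch of that arithmetic. The helper name hpl_gflops is made up, and the formula is my understanding of HPL, not something shown in the output above.

```shell
# Hypothetical helper: estimate HPL's Gflops figure from N and the wall time.
# Assumes the operation count 2/3*N^3 + 3/2*N^2 (an assumption, not taken from the log).
hpl_gflops() {
    n=$1         # problem size N
    seconds=$2   # elapsed time in seconds
    awk -v n="$n" -v t="$seconds" 'BEGIN {
        flops = (2.0 / 3.0) * n * n * n + 1.5 * n * n
        printf "%.3e\n", flops / t / 1e9
    }'
}

# Example: N=1000 solved in 1 second
hpl_gflops 1000 1
```

Under that assumption, N=35 amounts to only about 30,000 operations, so each run finishes in under a millisecond, which would explain why the Time column shows 0.00.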

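The test needs at least four processes, so first try running it within a single node.
If nothing is reported as failed or skipped, it succeeded.
Even on a Light.S1 it should finish in about a second.
However, running the same test across four Light.S1 nodes takes more than 30 minutes to finish.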
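The "nothing failed, nothing skipped" check can also be automated. The sketch below reads the xhpl output on standard input and succeeds only when both counts in the summary are zero. The name hpl_all_passed is made up, and it relies on the summary wording shown in the output above.

```shell
# Hypothetical helper: succeed only if the HPL summary reports
# zero failed tests and zero skipped tests.
hpl_all_passed() {
    awk '
        /tests completed and failed residual checks/    { failed = $1;  seen++ }
        /tests skipped because of illegal input values/ { skipped = $1; seen++ }
        # Exit 0 only when both summary lines were found and both counts are 0.
        END { exit !(seen == 2 && failed == 0 && skipped == 0) }
    '
}
```

It could be used as: mpirun -np 4 ./xhpl | hpl_all_passed && echo "HPL OK"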

So next time, we will dig into tuning HPL's run parameters.

fujish.hateblo.jp