13. Let's Build an MPI Cluster! - Running the Himeno Benchmark a Little More

This continues from the previous post:
12. Let's Build an MPI Cluster! - Getting the Himeno Benchmark to Run for Real This Time - 電子計算記

The Himeno benchmark is tricky.
Last time I used the C + MPI, static allocate version;
this time let's try the Fortran90 + MPI version.

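The installation steps so far have not included a Fortran runtime or build environment, so start there. Either install it on every node, or install it on one machine and clone the others from a template.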

root@compute-1:~# apt install gfortran -y

Download and extract the source the same way as for the C version.

mpiuser@compute-1:~$ wget http://accc.riken.jp/wp-content/uploads/2015/07/f90_xp_mpi.zip
mpiuser@compute-1:~$ unzip f90_xp_mpi.zip 
mpiuser@compute-1:~$ lha xw=/nfs/himeno-f90 f90_xp_mpi.lzh

For compilation, following what was done for the C version, only the -O3 optimization flag is used here.

mpiuser@compute-1:~$ cd /nfs/himeno-f90/
mpiuser@compute-1:/nfs/himeno-f90$ mpif90 -O3 himenoBMTxpr.f90

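Running it is also the same as for the C version. However, since this is not the static allocate version, the grid size and DDM pattern are entered after launch. This example uses size M with 4 parallel processes.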

mpiuser@compute-1:/nfs/himeno-f90$ mpirun -np 4 --hostfile ~/my_hosts /nfs/himeno-f90/a.out 
 For example:
 Grid-size= 
            XS  (64x32x32)
            S   (128x64x64)
            M   (256x128x128)
            L   (512x256x256)
            XL  (1024x512x512)
  Grid-size = 
M

 For example: 
 DDM pattern= 
      1 1 2
      i-direction partitioning : 1
      j-direction partitioning : 1
      k-direction partitioning : 2
  DDM pattern = 
1 2 2

 Sequential version array size
  mimax=         257  mjmax=         129  mkmax=         129
 Parallel version  array size
  mimax=         257  mjmax=          66  mkmax=          66
  imax=         256  jmax=          65  kmax=          65
  I-decomp=            1  J-decomp=            2  K-decomp=            2

  Start rehearsal measurement process.
  Measure the performance in 3 times.
   MFLOPS:   9023.0592584404149        time(s):   4.5584917068481445E-002   1.70304556E-03
 Now, start the actual measurement process.
 The loop will be excuted in        3948  times.
 This will take about one minute.
 Wait for a while.
  Loop executed for         3948  times
  Gosa :   2.27055614E-04
  MFLOPS:   7828.2245009397411        time(s):   69.146085023880005     
  Score based on Pentium III 600MHz :   94.4981308
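
The figures in the log above can be cross-checked with a little arithmetic. The sketch below is only a sanity check, and it relies on constants that are not stated in this post: 34 floating-point operations per interior grid point per iteration, 82.84 MFLOPS as the Pentium III 600MHz reference value, and a 60-second target for the main loop estimated from the 3 rehearsal iterations. The function names are hypothetical.

```shell
#!/bin/sh
# Cross-check of the Himeno log with awk.
# Assumed constants (not from the post): 34 flops per interior point,
# 82.84 MFLOPS reference score, 60 s target for the measured loop.

# himeno_loops REHEARSAL_SECONDS : loop count chosen after the rehearsal
himeno_loops() {
    awk -v t="$1" 'BEGIN {
        target = 60; reps = 3          # assumed target time and rehearsal count
        printf "%d\n", int(target / (t / reps))
    }'
}

# himeno_mflops IMAX JMAX KMAX LOOPS SECONDS : global grid, whole job
himeno_mflops() {
    awk -v i="$1" -v j="$2" -v k="$3" -v n="$4" -v t="$5" 'BEGIN {
        flop = (i - 2) * (j - 2) * (k - 2) * 34   # flops per iteration
        printf "%.1f\n", flop * n / t / 1e6
    }'
}

# himeno_score MFLOPS : ratio to the assumed Pentium III 600MHz reference
himeno_score() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 82.84 }'
}

# Values copied from the log above (size M is 256x128x128).
himeno_loops 4.5584917068481445E-002              # prints 3948
himeno_mflops 256 128 128 3948 69.146085023880005 # prints 7828.2
himeno_score 7828.2245009397411                   # prints 94.50
```

Under these assumptions the three values agree with the loop count, MFLOPS, and score printed by the benchmark, which suggests the MFLOPS figure is for the whole 4-process job rather than per process.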

As in the previous post, here is a graph of the results with 16 HighCPU.M4 machines lined up.

[Figure: graph of Himeno benchmark results on 16 HighCPU.M4 machines]
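
Collecting data points for a graph like this means repeating the run with different process counts, and typing the grid size and DDM pattern by hand each time gets tedious. The following is a minimal sketch of a helper that prints those two answers in the order the benchmark asks for them. `himeno_input` is a hypothetical name, and the check that the three partition counts multiply to the process count is an assumption based on the 4-process run with pattern 1 2 2.

```shell
#!/bin/sh
# Hypothetical helper: print the answers to the two prompts
# (grid size, then DDM pattern) for a given process count.
# Usage: himeno_input NP SIZE I J K
himeno_input() {
    np=$1 size=$2 i=$3 j=$4 k=$5
    # Assumption: i * j * k partitions must equal the number of processes.
    if [ $((i * j * k)) -ne "$np" ]; then
        echo "DDM pattern $i $j $k does not match -np $np" >&2
        return 1
    fi
    printf '%s\n%s %s %s\n' "$size" "$i" "$j" "$k"
}

# Same answers as typed in the interactive run: size M on 4 processes.
himeno_input 4 M 1 2 2
```

If the MPI launcher forwards standard input to rank 0 (Open MPI's mpirun does this by default; other launchers may differ), the output could be piped straight into the run, for example `himeno_input 4 M 1 2 2 | mpirun -np 4 --hostfile ~/my_hosts /nfs/himeno-f90/a.out`.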