
How do I install SHOC for GPU?

INSTALL SHOC 1.1.5

For details on installing OpenMPI and CUDA, see the corresponding instructions.

$ module add openmpi/v4.0.3
$ module add cuda/v10.1
$ git clone https://github.com/vetter/shoc.git
$ cd shoc
$ cat config/conf-test.sh
  #!/bin/sh
  sh ./configure \
  CPPFLAGS="-I/nfs/software/cuda/v10.1/include" \
  CUDA_CPPFLAGS="-gencode=arch=compute_70,code=sm_70"
$ sh ./config/conf-test.sh
$ make
$ make install
$ perl tools/driver.pl -cuda -s 4 -d 0
  --- Welcome To The SHOC Benchmark Suite version 1.1.5 ---
  Hostname: hostname
  Platform selection not specified, default to platform #0
  Number of available platforms: 1
  Number of available devices on platform 0 : 4
  Device 0: 'Tesla V100-SXM2-32GB'
  Device 1: 'Tesla V100-SXM2-32GB'
  Device 2: 'Tesla V100-SXM2-32GB'
  Device 3: 'Tesla V100-SXM2-32GB'
  Specified 4 device IDs: 0
  Using size class: 4

  --- Starting Benchmarks ---
  Running benchmark BusSpeedDownload
      result for bspeed_download:                 12.3180 GB/sec
  Running benchmark BusSpeedReadback
      result for bspeed_readback:                 13.1676 GB/sec
  Running benchmark MaxFlops
      result for maxspflops:                    15548.4000 GFLOPS
      result for maxdpflops:                    7837.9000 GFLOPS
  Running benchmark DeviceMemory
      result for gmem_readbw:                    790.3370 GB/s
      result for gmem_readbw_strided:            469.8600 GB/s
      result for gmem_writebw:                   726.6530 GB/s
      result for gmem_writebw_strided:            53.4400 GB/s
      result for lmem_readbw:                   9527.5400 GB/s
      result for lmem_writebw:                  10578.7000 GB/s
      result for tex_readbw:                    1580.6200 GB/sec
  Skipping non-cuda benchmark KernelCompile
  Skipping non-cuda benchmark QueueDelay
  Running benchmark FFT
      result for fft_sp:                        2299.0700 GFLOPS
      result for fft_dp:                        1146.0300 GFLOPS
  Running benchmark GEMM
      result for sgemm_n:                       14587.1000 GFlops
      result for dgemm_n:                       6432.5500 GFlops
  Running benchmark MD
      result for md_sp_flops:                    938.3610 GFLOPS
      result for md_dp_flops:                    846.9970 GFLOPS
  Running benchmark MD5Hash
      result for md5hash:                         34.5492 GHash/s
  Running benchmark Reduction
      result for reduction:                      303.6210 GB/s
      result for reduction_dp:                   513.4440 GB/s
  Running benchmark Scan
      result for scan:                           174.1060 GB/s
      result for scan_dp:                        175.7370 GB/s
  Running benchmark Sort
      result for sort:                            20.0892 GB/s
  Running benchmark Spmv
      result for spmv_csr_scalar_sp:              63.3591 Gflop/s
      result for spmv_csr_vector_sp:             151.2000 Gflop/s
      result for spmv_ellpackr_sp:                80.3836 Gflop/s
      result for spmv_csr_scalar_dp:              45.5877 Gflop/s
      result for spmv_csr_vector_dp:             111.4730 Gflop/s
      result for spmv_ellpackr_dp:                65.9179 Gflop/s
  Running benchmark Stencil2D
      result for stencil:                        643.0910 GFLOPS
      result for stencil_dp:                     522.7100 GFLOPS
  Running benchmark Triad
      result for triad_bw:                        16.3517 GB/s
  Running benchmark S3D
      result for s3d:                            428.6150 GFLOPS
      result for s3d_dp:                         225.6930 GFLOPS
  Running benchmark QTC
      result for qtc:                              5.5839 s
      result for qtc_kernel:                       4.8583 s

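The numbers below refer to the lines of the listing above, counted from the first command.

1-2. Load the required OpenMPI and CUDA modules (module add).
3-4. Download SHOC and change into its directory (git clone, cd).
5. Create the configuration file config/conf-test.sh (displayed here with cat; lines 6-9 are its contents), which specifies the GPU's Compute Capability. For the Tesla V100 this is 7.0, hence compute_70 and sm_70.
10-12. Configure, build, and install SHOC (conf-test.sh, make, make install).
13. Run SHOC (driver.pl with the CUDA benchmarks, size class 4, device 0).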
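
The listing only displays config/conf-test.sh with cat and does not show how the file was written. A minimal sketch of one way to create it, assuming it is run from the top-level shoc directory: the variables SHOC_CUDA_ROOT and SHOC_GPU_ARCH are hypothetical helpers introduced here for illustration (they are not SHOC or CUDA options), and their defaults reproduce the values from the listing.

```shell
#!/bin/sh
# Sketch: write config/conf-test.sh with a here-document.
# SHOC_CUDA_ROOT and SHOC_GPU_ARCH are illustrative names, not part of SHOC.
# Defaults match the listing: CUDA v10.1 and Compute Capability 7.0 (Tesla V100).
SHOC_CUDA_ROOT="${SHOC_CUDA_ROOT:-/nfs/software/cuda/v10.1}"
SHOC_GPU_ARCH="${SHOC_GPU_ARCH:-70}"

# The directory already exists in a SHOC checkout; -p makes this a no-op there.
mkdir -p config

# In an unquoted here-document, "\\" yields a single trailing backslash
# (line continuation in the generated script) and ${...} is expanded now.
cat > config/conf-test.sh <<EOF
#!/bin/sh
sh ./configure \\
CPPFLAGS="-I${SHOC_CUDA_ROOT}/include" \\
CUDA_CPPFLAGS="-gencode=arch=compute_${SHOC_GPU_ARCH},code=sm_${SHOC_GPU_ARCH}"
EOF

# Show the result so it can be compared with lines 6-9 of the listing.
cat config/conf-test.sh
```

For a GPU with a different Compute Capability, setting SHOC_GPU_ARCH before running the sketch (for example SHOC_GPU_ARCH=60) would change both the compute_ and sm_ values; whether a given value is accepted depends on the installed CUDA version.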