ぴあぴあゆーとぴあ

A miscellaneous blog

Setting up the GPU version of TensorFlow

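Continuing from the previous post, I'd like to write more about my Python development environment.
This time I'll cover installing TensorFlow.
That said, most of it is borrowed from other sources...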

Machine specs

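I'm allowed to use my lab's GPU machine.
Product list | UNIV | Custom-built PCs for universities and research institutions
I don't know the details very well, but it appears to be equivalent to the MAS-XE5-SV1U/4X K80 model.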
OS Ubuntu 14.04 x86_64
CUDA V8.0.44
cuDNN 5

Installing TensorFlow

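I installed the GPU version of TensorFlow on the machine above.
Python: Speeding up Keras/TensorFlow training with a GPU (Ubuntu 16.04 LTS) - CUBE SUGAR CONTAINER
Basically, you just follow the page above together with the official site.
The part where I got stuck: after running "pip3 install tensorflow", you run "pip3 install --upgrade 'URL'" with the URL of a specific package,
and I had a hard time finding a URL for a build that supports cuDNN 5.
Upgrading to cuDNN 6 looked like the better option, but changing that on a shared GPU machine seemed like a hassle, so I gave up on it for now.
After searching the official site a bit more, I found it on the page below.
Download and Setup  |  TensorFlow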
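
Once the install finishes, it helps to confirm that TensorFlow actually sees the GPU before running anything heavy. The snippet below is a minimal sketch, not part of the original setup steps: the helper name list_gpu_devices is made up for illustration, and it assumes that device_lib.list_local_devices() from tensorflow.python.client is available in the installed TensorFlow version. If TensorFlow cannot be imported at all, it reports that instead of crashing.

```python
def list_gpu_devices():
    """Return names of GPU devices visible to TensorFlow.

    Returns None when TensorFlow cannot be imported (hypothetical helper,
    written for this post only).
    """
    try:
        # device_lib lives in TensorFlow's Python client package.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


gpus = list_gpu_devices()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPU devices visible to TensorFlow:", gpus)
else:
    print("TensorFlow is installed, but no GPU device is visible")
```

If the list comes back empty on a machine that does have a GPU, a CPU-only package or a CUDA/cuDNN version mismatch is a likely suspect.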

TensorFlow performance comparison

I compared MNIST training time between the GPU machine and my own MacBook Pro (2.8 GHz Intel Core i7).
https://raw.githubusercontent.com/fchollet/keras/master/examples/mnist_cnn.py
This is the program I used. It's Keras, but the backend is TensorFlow.
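
A side note for shared machines: the GPU log in this post shows TensorFlow creating devices on both GPUs. One common way to limit that is the CUDA_VISIBLE_DEVICES environment variable, which has to be set before TensorFlow is imported. This is only a sketch of the idea and not something the runs in this post did; the helper name restrict_visible_gpus is hypothetical.

```python
import os


def restrict_visible_gpus(device_ids):
    """Limit which GPUs CUDA programs started from this process can see.

    Must be called before importing TensorFlow, because the variable is
    read when CUDA is initialized. An empty list hides every GPU, which
    forces the run onto the CPU. (Hypothetical helper for illustration.)
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Use only the first GPU for this process.
visible = restrict_visible_gpus([0])
print("CUDA_VISIBLE_DEVICES =", repr(visible))

# import tensorflow  # import only after the variable has been set
```

The same effect can be had by setting the variable in the shell when launching the script, which avoids touching the code at all.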

CPU

Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2017-10-26 15:09:42.378154: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 15:09:42.378197: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 15:09:42.378206: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 15:09:42.378213: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
60000/60000 [==============================] - 96s - loss: 0.3160 - acc: 0.9054 - val_loss: 0.0741 - val_acc: 0.9760
Epoch 2/12
60000/60000 [==============================] - 94s - loss: 0.1083 - acc: 0.9682 - val_loss: 0.0501 - val_acc: 0.9835
Epoch 3/12
60000/60000 [==============================] - 101s - loss: 0.0821 - acc: 0.9750 - val_loss: 0.0439 - val_acc: 0.9856
Epoch 4/12
60000/60000 [==============================] - 100s - loss: 0.0695 - acc: 0.9787 - val_loss: 0.0387 - val_acc: 0.9870
Epoch 5/12
60000/60000 [==============================] - 97s - loss: 0.0598 - acc: 0.9815 - val_loss: 0.0389 - val_acc: 0.9871
Epoch 6/12
60000/60000 [==============================] - 100s - loss: 0.0549 - acc: 0.9837 - val_loss: 0.0322 - val_acc: 0.9889
Epoch 7/12
60000/60000 [==============================] - 108s - loss: 0.0490 - acc: 0.9852 - val_loss: 0.0306 - val_acc: 0.9900
Epoch 8/12
60000/60000 [==============================] - 97s - loss: 0.0441 - acc: 0.9872 - val_loss: 0.0334 - val_acc: 0.9894
Epoch 9/12
60000/60000 [==============================] - 98s - loss: 0.0422 - acc: 0.9872 - val_loss: 0.0286 - val_acc: 0.9902
Epoch 10/12
60000/60000 [==============================] - 98s - loss: 0.0406 - acc: 0.9882 - val_loss: 0.0283 - val_acc: 0.9903
Epoch 11/12
60000/60000 [==============================] - 98s - loss: 0.0381 - acc: 0.9885 - val_loss: 0.0288 - val_acc: 0.9898
Epoch 12/12
60000/60000 [==============================] - 97s - loss: 0.0361 - acc: 0.9889 - val_loss: 0.0271 - val_acc: 0.9908
Test loss: 0.0271335351378
Test accuracy: 0.9908

real    20m0.348s
user    98m0.099s
sys     14m46.822s

GPU

Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: Tesla P40
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:81:00.0
Total memory: 22.38GiB
Free memory: 949.94MiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x5440c20
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties: 
name: Tesla P40
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:82:00.0
Total memory: 22.38GiB
Free memory: 960.94MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P40, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P40, pci bus id: 0000:82:00.0)
60000/60000 [==============================] - 8s - loss: 0.3210 - acc: 0.9032 - val_loss: 0.0768 - val_acc: 0.9752
Epoch 2/12
60000/60000 [==============================] - 5s - loss: 0.1149 - acc: 0.9661 - val_loss: 0.0543 - val_acc: 0.9824
Epoch 3/12
60000/60000 [==============================] - 5s - loss: 0.0842 - acc: 0.9750 - val_loss: 0.0441 - val_acc: 0.9852
Epoch 4/12
60000/60000 [==============================] - 5s - loss: 0.0710 - acc: 0.9788 - val_loss: 0.0426 - val_acc: 0.9851
Epoch 5/12
60000/60000 [==============================] - 5s - loss: 0.0622 - acc: 0.9815 - val_loss: 0.0371 - val_acc: 0.9871
Epoch 6/12
60000/60000 [==============================] - 5s - loss: 0.0554 - acc: 0.9835 - val_loss: 0.0331 - val_acc: 0.9882
Epoch 7/12
60000/60000 [==============================] - 5s - loss: 0.0513 - acc: 0.9848 - val_loss: 0.0347 - val_acc: 0.9875
Epoch 8/12
60000/60000 [==============================] - 5s - loss: 0.0480 - acc: 0.9858 - val_loss: 0.0325 - val_acc: 0.9887
Epoch 9/12
60000/60000 [==============================] - 5s - loss: 0.0448 - acc: 0.9871 - val_loss: 0.0339 - val_acc: 0.9889
Epoch 10/12
60000/60000 [==============================] - 5s - loss: 0.0424 - acc: 0.9875 - val_loss: 0.0309 - val_acc: 0.9901
Epoch 11/12
60000/60000 [==============================] - 5s - loss: 0.0379 - acc: 0.9884 - val_loss: 0.0329 - val_acc: 0.9896
Epoch 12/12
60000/60000 [==============================] - 5s - loss: 0.0380 - acc: 0.9883 - val_loss: 0.0302 - val_acc: 0.9904
Test loss: 0.0302223094942
Test accuracy: 0.9904

real    1m14.527s
user    1m37.863s
sys     0m22.903s

Thoughts

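What took about 20 minutes on the CPU finishes in a little over a minute on the GPU (20m0.348s vs. 1m14.527s of wall-clock time, roughly 16x faster).
This should make a lot of things go more smoothly.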
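
To put numbers on the comparison, here is a small sketch that parses the "real" line of the two `time` outputs and also sums the per-epoch seconds from the Keras progress lines. The values are copied from the logs in this post; the function name parse_real_seconds is just a name chosen for illustration. Keras prints whole seconds per epoch, so the per-epoch figures are approximate.

```python
import re

# `time` output copied from the CPU and GPU runs in this post.
CPU_TIME = "real    20m0.348s\nuser    98m0.099s\nsys     14m46.822s"
GPU_TIME = "real    1m14.527s\nuser    1m37.863s\nsys     0m22.903s"

# Per-epoch durations in seconds, read off the Keras progress lines.
CPU_EPOCHS = [96, 94, 101, 100, 97, 100, 108, 97, 98, 98, 98, 97]
GPU_EPOCHS = [8, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]


def parse_real_seconds(time_output):
    """Return the wall-clock ('real') time in seconds from `time` output."""
    match = re.search(r"^real\s+(\d+)m([\d.]+)s", time_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'real' line found")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


cpu_seconds = parse_real_seconds(CPU_TIME)
gpu_seconds = parse_real_seconds(GPU_TIME)
wall_speedup = cpu_seconds / gpu_seconds

cpu_train = sum(CPU_EPOCHS)
gpu_train = sum(GPU_EPOCHS)
train_speedup = cpu_train / gpu_train

print("wall clock: %.1fs vs %.1fs (%.1fx)" % (cpu_seconds, gpu_seconds, wall_speedup))
print("epochs only: %ds vs %ds (%.1fx)" % (cpu_train, gpu_train, train_speedup))
# Time spent outside the epochs (imports, data loading, device setup, evaluation).
print("overhead: %.1fs vs %.1fs" % (cpu_seconds - cpu_train, gpu_seconds - gpu_train))
```

The gap is somewhat larger when only the epochs are counted, because startup and other fixed costs make up a bigger share of the short GPU run.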
All in all, the setup took about three days, so I'm relieved it's done.
Next time I might write a post about Jupyter Notebook.