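Setting up GPU-enabled TensorFlow
Continuing from the last post, I'm going to keep writing about my Python development environment.
This time it's about installing TensorFlow.
That said, most of this is borrowed from other people's write-ups...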
Machine specs
I get to use the GPU machine in my lab.
Product list | UNIV | Custom-built PCs for universities and research institutions
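I don't know the details very well, but it seems to be equivalent to the MAS-XE5-SV1U/4X K80 model (although the TensorFlow log further down reports two Tesla P40 cards).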
OS Ubuntu 14.04 x86_64
CUDA V8.0.44
cuDNN 5
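When looking for a TensorFlow build that matches the libraries already on a machine, it helps to confirm the CUDA toolkit version first. Here is a minimal sketch, assuming nvcc is on PATH and prints a token such as V8.0.44 in its version banner. The function names are my own and not part of any library.

```python
import re
import shutil
import subprocess


def parse_cuda_version(nvcc_output):
    """Extract a version such as '8.0.44' from `nvcc --version` output, or None."""
    match = re.search(r"\bV(\d+(?:\.\d+)+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_version():
    """Return the CUDA toolkit version reported by nvcc, or None if nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return parse_cuda_version(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit:", installed_cuda_version())
```

On a machine without the CUDA toolkit this simply prints None instead of failing.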
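Installing TensorFlow
I installed the GPU build of TensorFlow on the machine above.
Python: Speeding up Keras/TensorFlow training with a GPU (Ubuntu 16.04 LTS) - CUBE SUGAR CONTAINER
Basically you just follow the page above along with the official site.
The part that tripped me up: after running "pip3 install tensorflow",
you run "pip3 install --upgrade 'URL'" with the URL of a specific build, and I had a hard time finding a URL for a build that supports cuDNN 5.
Upgrading to cuDNN 6 looked like the better option, but changing that on a shared GPU machine seemed like it would be a hassle, so I gave up on it for now.
After digging around the official site a bit more, I found the URL on the page below.
Download and Setup | TensorFlow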
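Once the install finishes, it is worth checking that TensorFlow actually sees the GPUs. The sketch below assumes a TensorFlow 1.x-era install where tensorflow.python.client.device_lib is importable. gpu_names and main are names I made up for illustration.

```python
def gpu_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def main():
    try:
        # Imported lazily so the script still runs where TensorFlow is absent.
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed in this environment")
        return
    gpus = gpu_names(device_lib.list_local_devices())
    print("GPUs visible to TensorFlow:", gpus or "none")


if __name__ == "__main__":
    main()
```

If the list comes back empty, the CPU-only package was probably installed, or the CUDA/cuDNN libraries could not be loaded.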
TensorFlow performance comparison
I compared MNIST training time between the GPU build and my own MacBook Pro (2.8 GHz Intel Core i7).
https://raw.githubusercontent.com/fchollet/keras/master/examples/mnist_cnn.py
This is the program I used. It's Keras, but the backend is TensorFlow.
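The real/user/sys figures in the logs come from the shell's time command. If you would rather measure wall-clock time from Python, a rough equivalent might look like this. time_command is a hypothetical helper, and unlike time it only reports elapsed time, not user/sys CPU time.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (exit_code, wall_clock_seconds)."""
    start = time.perf_counter()
    exit_code = subprocess.call(argv)
    return exit_code, time.perf_counter() - start


if __name__ == "__main__":
    # Usage: python this_file.py mnist_cnn.py
    # Falls back to a trivial command when no script is given.
    script_args = sys.argv[1:] or ["-c", "pass"]
    code, seconds = time_command([sys.executable] + script_args)
    print("exit code %d, wall clock %.1fs" % (code, seconds))
```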
CPU
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2017-10-26 15:09:42.378154: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 15:09:42.378197: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 15:09:42.378206: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 15:09:42.378213: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
60000/60000 [==============================] - 96s - loss: 0.3160 - acc: 0.9054 - val_loss: 0.0741 - val_acc: 0.9760
Epoch 2/12
60000/60000 [==============================] - 94s - loss: 0.1083 - acc: 0.9682 - val_loss: 0.0501 - val_acc: 0.9835
Epoch 3/12
60000/60000 [==============================] - 101s - loss: 0.0821 - acc: 0.9750 - val_loss: 0.0439 - val_acc: 0.9856
Epoch 4/12
60000/60000 [==============================] - 100s - loss: 0.0695 - acc: 0.9787 - val_loss: 0.0387 - val_acc: 0.9870
Epoch 5/12
60000/60000 [==============================] - 97s - loss: 0.0598 - acc: 0.9815 - val_loss: 0.0389 - val_acc: 0.9871
Epoch 6/12
60000/60000 [==============================] - 100s - loss: 0.0549 - acc: 0.9837 - val_loss: 0.0322 - val_acc: 0.9889
Epoch 7/12
60000/60000 [==============================] - 108s - loss: 0.0490 - acc: 0.9852 - val_loss: 0.0306 - val_acc: 0.9900
Epoch 8/12
60000/60000 [==============================] - 97s - loss: 0.0441 - acc: 0.9872 - val_loss: 0.0334 - val_acc: 0.9894
Epoch 9/12
60000/60000 [==============================] - 98s - loss: 0.0422 - acc: 0.9872 - val_loss: 0.0286 - val_acc: 0.9902
Epoch 10/12
60000/60000 [==============================] - 98s - loss: 0.0406 - acc: 0.9882 - val_loss: 0.0283 - val_acc: 0.9903
Epoch 11/12
60000/60000 [==============================] - 98s - loss: 0.0381 - acc: 0.9885 - val_loss: 0.0288 - val_acc: 0.9898
Epoch 12/12
60000/60000 [==============================] - 97s - loss: 0.0361 - acc: 0.9889 - val_loss: 0.0271 - val_acc: 0.9908
Test loss: 0.0271335351378
Test accuracy: 0.9908
real 20m0.348s
user 98m0.099s
sys 14m46.822s
GPU
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla P40
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:81:00.0
Total memory: 22.38GiB
Free memory: 949.94MiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x5440c20
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla P40
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:82:00.0
Total memory: 22.38GiB
Free memory: 960.94MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P40, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P40, pci bus id: 0000:82:00.0)
60000/60000 [==============================] - 8s - loss: 0.3210 - acc: 0.9032 - val_loss: 0.0768 - val_acc: 0.9752
Epoch 2/12
60000/60000 [==============================] - 5s - loss: 0.1149 - acc: 0.9661 - val_loss: 0.0543 - val_acc: 0.9824
Epoch 3/12
60000/60000 [==============================] - 5s - loss: 0.0842 - acc: 0.9750 - val_loss: 0.0441 - val_acc: 0.9852
Epoch 4/12
60000/60000 [==============================] - 5s - loss: 0.0710 - acc: 0.9788 - val_loss: 0.0426 - val_acc: 0.9851
Epoch 5/12
60000/60000 [==============================] - 5s - loss: 0.0622 - acc: 0.9815 - val_loss: 0.0371 - val_acc: 0.9871
Epoch 6/12
60000/60000 [==============================] - 5s - loss: 0.0554 - acc: 0.9835 - val_loss: 0.0331 - val_acc: 0.9882
Epoch 7/12
60000/60000 [==============================] - 5s - loss: 0.0513 - acc: 0.9848 - val_loss: 0.0347 - val_acc: 0.9875
Epoch 8/12
60000/60000 [==============================] - 5s - loss: 0.0480 - acc: 0.9858 - val_loss: 0.0325 - val_acc: 0.9887
Epoch 9/12
60000/60000 [==============================] - 5s - loss: 0.0448 - acc: 0.9871 - val_loss: 0.0339 - val_acc: 0.9889
Epoch 10/12
60000/60000 [==============================] - 5s - loss: 0.0424 - acc: 0.9875 - val_loss: 0.0309 - val_acc: 0.9901
Epoch 11/12
60000/60000 [==============================] - 5s - loss: 0.0379 - acc: 0.9884 - val_loss: 0.0329 - val_acc: 0.9896
Epoch 12/12
60000/60000 [==============================] - 5s - loss: 0.0380 - acc: 0.9883 - val_loss: 0.0302 - val_acc: 0.9904
Test loss: 0.0302223094942
Test accuracy: 0.9904
real 1m14.527s
user 1m37.863s
sys 0m22.903s
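To compare the two runs epoch by epoch rather than by eye, the per-epoch durations can be pulled out of the Keras progress lines. This is a small sketch that assumes the "- 96s - loss:" format seen in the logs above. Newer Keras versions print the timing differently, so the pattern would need adjusting there.

```python
import re

# Matches the "] - 96s - loss:" part of a Keras progress line.
EPOCH_TIME = re.compile(r"\] - (\d+)s - loss:")


def epoch_seconds(log_text):
    """Return the per-epoch durations (in seconds) found in a Keras training log."""
    return [int(seconds) for seconds in EPOCH_TIME.findall(log_text)]


# Two lines copied from the CPU log in this post.
sample = """
60000/60000 [==============================] - 96s - loss: 0.3160 - acc: 0.9054 - val_loss: 0.0741 - val_acc: 0.9760
Epoch 2/12
60000/60000 [==============================] - 94s - loss: 0.1083 - acc: 0.9682 - val_loss: 0.0501 - val_acc: 0.9835
"""

times = epoch_seconds(sample)
print(times)                    # [96, 94]
print(sum(times) / len(times))  # 95.0
```

Fed the full logs, this works out to an average of about 98.7s per epoch on the CPU and 5.25s on the GPU.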
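Thoughts
A run that took about 20 minutes on the CPU finished in a little over a minute on the GPU (1m14s), roughly a 16x speedup in wall-clock time.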
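For the record, here is the arithmetic behind that comparison, using the two "real" values from the time output above. time_to_seconds is just a helper I wrote for this; it assumes the NmS.SSSs format that time prints and nothing longer than minutes.

```python
import re


def time_to_seconds(value):
    """Convert a `time`-style duration such as '20m0.348s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError("unexpected duration format: %r" % value)
    return int(match.group(1)) * 60 + float(match.group(2))


cpu_real = time_to_seconds("20m0.348s")  # CPU run, MacBook Pro
gpu_real = time_to_seconds("1m14.527s")  # GPU run, lab machine
print("speedup: %.1fx" % (cpu_real / gpu_real))  # speedup: 16.1x
```

Note that the GPU figure includes start-up and device initialization, so the per-epoch gap is somewhat larger than the overall one.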
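This should make a lot of things go much faster.
All in all, the environment setup took about three days, so I'm relieved it's finally done.
Next I might write a post about Jupyter Notebook.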