Not able to see GPUs in the UI while training


#1

/home/infra# docker logs -f acf2319f6ef3
Starting redis-server: redis-server.

 * Starting nginx nginx
    …done.
    Checking GPU support…GPU supported
    skipping cifar dataset download!
    skipping IMDB dataset download!
    skipping MNIST dataset download!
    skipping reuters dataset download!
    No changes detected
    Using MXNet backend.
    /usr/local/lib/python3.5/dist-packages/allauth/account/templatetags/account_tags.py:4: DeprecationWarning: {% load account_tags %} is deprecated, use {% load account %}
    DeprecationWarning)
    /usr/local/lib/python3.5/dist-packages/allauth/socialaccount/templatetags/socialaccount_tags.py:4: DeprecationWarning: {% load socialaccount_tags %} is deprecated, use {% load socialaccount %}
    " {% load socialaccount %}", DeprecationWarning)
    Using MXNet backend.
    /usr/local/lib/python3.5/dist-packages/allauth/account/templatetags/account_tags.py:4: DeprecationWarning: {% load account_tags %} is deprecated, use {% load account %}
    DeprecationWarning)
    /usr/local/lib/python3.5/dist-packages/allauth/socialaccount/templatetags/socialaccount_tags.py:4: DeprecationWarning: {% load socialaccount_tags %} is deprecated, use {% load socialaccount %}
    " {% load socialaccount %}", DeprecationWarning)
    Operations to perform:
    Apply all migrations: account, admin, auth, authtoken, automl, contenttypes, environments, project, projects, reversion, sessions, sites, socialaccount
    Running migrations:
    No migrations to apply.
    ln: failed to create symbolic link ‘/var/www/files/data’: File exists
    [2019-01-22 11:34:51 +0000] [211] [INFO] Starting gunicorn 19.6.0
    [2019-01-22 11:34:51 +0000] [211] [INFO] Listening at: http://127.0.0.1:8000 (211)
    [2019-01-22 11:34:51 +0000] [211] [INFO] Using worker: threads
    [2019-01-22 11:34:51 +0000] [226] [INFO] Booting worker with pid: 226
    [I 11:34:52.396 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.5/dist-packages/jupyterlab
    [I 11:34:52.396 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
    [W 11:34:52.399 LabApp] JupyterLab server extension not enabled, manually loading…
    [I 11:34:52.399 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.5/dist-packages/jupyterlab
    [I 11:34:52.399 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
    [I 11:34:52.401 LabApp] Serving notebooks from local directory: /data/1
    [I 11:34:52.402 LabApp] The Jupyter Notebook is running at:
    [I 11:34:52.402 LabApp] http://(acf2319f6ef3 or 127.0.0.1):8888/?token=…
    [I 11:34:52.402 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [W 11:34:52.402 LabApp] No web browser found: could not locate runnable browser.
    Using MXNet backend.
 * Running on http://127.0.0.1:6666/ (Press CTRL+C to quit)

^C

root@acf2319f6ef3:/home/app# nvidia-smi
Tue Jan 22 11:40:24 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2…    Off  | 00000000:62:00.0 Off |                    0 |
| N/A   41C    P0    53W / 300W |    510MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2…    Off  | 00000000:89:00.0 Off |                    0 |
| N/A   41C    P0    58W / 300W |    510MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+


#2

Some more observations.

This looks more like a UI bug. When the host NIC is named eno1, the UI shows the available GPUs. When the NIC is named br0 or enp3s0f1, no GPUs are shown.

However, the GPUs are visible inside the container (see the nvidia-smi output above) and from the Jupyter notebook.
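
In case it helps anyone reproduce this, here is a minimal sketch of a script that prints the interface names and the GPUs the driver reports, so both can be pasted into a report side by side. It is not part of the product: the function names `parse_gpu_csv`, `list_gpus` and `list_nics` are made up for illustration, and it assumes `nvidia-smi` is on the PATH inside the container. It sticks to syntax that should work on the Python 3.5 shown in the logs.

```python
import csv
import io
import socket
import subprocess


def parse_gpu_csv(text):
    """Parse output of
    `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader`.
    """
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        {"index": int(r[0]), "name": r[1], "memory_total": r[2]}
        for r in rows
        if len(r) == 3  # skip blank or malformed lines
    ]


def list_gpus():
    """Return the GPUs the driver reports, or [] if nvidia-smi cannot be run."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=index,name,memory.total",
        "--format=csv,noheader",
    ]
    try:
        out = subprocess.check_output(cmd, universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(out)


def list_nics():
    """Return the network interface names the kernel reports (Unix only)."""
    return [name for _, name in socket.if_nameindex()]


if __name__ == "__main__":
    print("NICs:", list_nics())
    for gpu in list_gpus():
        print("GPU {index}: {name} ({memory_total})".format(**gpu))
```

Running it once on a host where the NIC is eno1 and once where it is br0 or enp3s0f1 should show whether the GPU list itself changes or only what the UI displays.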