[solved] DLS 2.0.7 GPU not supported, DLS does not train models and internal Error

Hello everybody, I want to congratulate you on this wonderful deep learning application. However, I have some problems running it, and I need your help. I have installed Anaconda, tensorflow-gpu (via pip), CUDA 9 and cuDNN, and I can run deep learning code in Python, but I can't execute anything in Deep Learning Studio. First, the log file says “GPU not supported”. On top of that, when I try to train a model from the samples provided with the installation, I get a popup message that says “internal error”. Before this popup appears, a message says “Connecting to compute server …”. I use Windows 10.

Any help getting started with Deep Learning Studio would be much appreciated, because I find it very useful for teaching myself deep learning.

Thank you in advance.

Regarding the “GPU not supported” message, a couple of other users have reported the same issue. We have a new release coming in the next few days which should address it.

In the meantime, you can download the following zip file.

MxNet.zip

Extract it and copy the two DLL files included in this zip to <DLS_INSTALL_LOCATION>\mxnet.

Regarding the second issue, it seems the compute server process didn’t start. Can you provide the logs from the log tab?

Hello rajendra1,

Thank you for the quick reply! I used MxNet.zip and pasted the two files into the respective location. The first issue is resolved, which means I now have GPU support. However, the problem with the popup error continues: I still get the message “Connecting to compute server …”.

Just a quick note: in the past I used a previous version, I think it was 1.5.1, with CUDA 8 support. That version worked seamlessly. But a couple of weeks ago I formatted my PC and downloaded version 2.0.7, and I stumbled across these two issues. It is good that GPU support is back :slight_smile:

I provide the log file as well:

Starting Deep Learning Studio…
Checking GPU support…
GPU supported
skipping cifar dataset download!
skipping IMDB dataset download!
skipping MNIST dataset download!
skipping reuters dataset download!
Using MXNet backend.
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\account\templatetags\account_tags.py:4: DeprecationWarning: {% load account_tags %} is deprecated, use {% load account %}
DeprecationWarning)
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\socialaccount\templatetags\socialaccount_tags.py:4: DeprecationWarning: {% load socialaccount_tags %} is deprecated, use {% load socialaccount %}
" {% load socialaccount %}", DeprecationWarning)
No changes detected
Operations to perform:
Apply all migrations: account, admin, auth, authtoken, automl, contenttypes, environments, project, projects, reversion, sessions, sites, socialaccount
Running migrations:
No migrations to apply.
Using MXNet backend.
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\account\templatetags\account_tags.py:4: DeprecationWarning: {% load account_tags %} is deprecated, use {% load account %}
DeprecationWarning)
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\socialaccount\templatetags\socialaccount_tags.py:4: DeprecationWarning: {% load socialaccount_tags %} is deprecated, use {% load socialaccount %}
" {% load socialaccount %}", DeprecationWarning)
Using MXNet backend.
[I 09:01:41.363 NotebookApp] Serving notebooks from local directory: C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\data\1
[I 09:01:41.363 NotebookApp] 0 active kernels
[I 09:01:41.366 NotebookApp] The Jupyter Notebook is running at:
[I 09:01:41.366 NotebookApp] http://127.0.0.1:8886/?token=
[I 09:01:41.367 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Starting server worker…

Using MXNet backend.

  • Running on http://127.0.0.1:6666/ (Press CTRL+C to quit)
    Server worker has been started. [ID: 1 | PID: 13588]

Server worker is ready to accept messages. [ID: 1 | PID: 13588]

[2018-05-12T06:01:46.631Z] INFO: Theia/13588 on DESKTOP-FBN3V48: Theia app listening on http://0.0.0.0:8899. []

Received message which is neither a response nor a notification message:
“8899”

[2018-05-12T06:01:47.493Z] INFO: Theia/13588 on DESKTOP-FBN3V48:
[nsfw-watcher: 10328] Started watching: c:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\home\theia\examples\browser\package.json
[]

[12/May/2018:09:01:49] ENGINE Bus STARTING
[12/May/2018:09:01:50] ENGINE Serving on http://0.0.0.0:8000
[12/May/2018:09:01:50] ENGINE Bus STARTED

After a fresh install and pasting in the two MxNet DLL files, I still get the same error: Connecting to compute server …

Here is the log from the first run of Deep Learning Studio 2.0.7:

Starting Deep Learning Studio…
Checking GPU support…
GPU supported
[19:55:54] downloading CIFAR dataset…
[19:59:47] Downloaded CIFAR dataset.
[19:59:48] downloading IMDB dataset…
[19:59:56] Downloaded IMDB dataset.
initiated datasets repo at: C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\home\app.pydataset/
Generated dataset: titanic
Generated dataset: iris
Pydatasets installed
[20:00:30] downloading MNIST dataset…
[20:03:42] Downloaded MNIST dataset.
[20:03:47] downloading reuters dataset…
[20:03:50] Downloaded reuters dataset.
Using MXNet backend.
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\account\templatetags\account_tags.py:4: DeprecationWarning: {% load account_tags %} is deprecated, use {% load account %}
DeprecationWarning)
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\socialaccount\templatetags\socialaccount_tags.py:4: DeprecationWarning: {% load socialaccount_tags %} is deprecated, use {% load socialaccount %}
" {% load socialaccount %}", DeprecationWarning)
Migrations for ‘projects’:
projects\migrations\0001_initial.py:
- Create model Project
Migrations for ‘project’:
project\migrations\0001_initial.py:
- Create model dataModel
- Create model deepModel
- Create model importModel
- Create model paramsModel
- Create model testModel
- Create model trainingModel
Migrations for ‘environments’:
environments\migrations\0001_initial.py:
- Create model EnvironmentModel
Migrations for ‘automl’:
automl\migrations\0001_initial.py:
- Create model AutoMLModel
Operations to perform:
Apply all migrations: account, admin, auth, authtoken, automl, contenttypes, environments, project, projects, reversion, sessions, sites, socialaccount
Running migrations:
Applying contenttypes.0001_initial… OK
Applying auth.0001_initial… OK
Applying account.0001_initial… OK
Applying account.0002_email_max_length… OK
Applying admin.0001_initial… OK
Applying admin.0002_logentry_remove_auto_add… OK
Applying contenttypes.0002_remove_content_type_name… OK
Applying auth.0002_alter_permission_name_max_length… OK
Applying auth.0003_alter_user_email_max_length… OK
Applying auth.0004_alter_user_username_opts… OK
Applying auth.0005_alter_user_last_login_null… OK
Applying auth.0006_require_contenttypes_0002… OK
Applying auth.0007_alter_validators_add_error_messages… OK
Applying auth.0008_alter_user_username_max_length… OK
Applying authtoken.0001_initial… OK
Applying authtoken.0002_auto_20160226_1747… OK
Applying automl.0001_initial… OK
Applying environments.0001_initial… OK
Applying project.0001_initial… OK
Applying projects.0001_initial… OK
Applying reversion.0001_squashed_0004_auto_20160611_1202… OK
Applying sessions.0001_initial… OK
Applying sites.0001_initial… OK
Applying sites.0002_alter_domain_unique… OK
Applying socialaccount.0001_initial… OK
Applying socialaccount.0002_token_max_lengths… OK
Applying socialaccount.0003_extra_data_default_dict… OK
Using MXNet backend.
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\account\templatetags\account_tags.py:4: DeprecationWarning: {% load account_tags %} is deprecated, use {% load account %}
DeprecationWarning)
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\socialaccount\templatetags\socialaccount_tags.py:4: DeprecationWarning: {% load socialaccount_tags %} is deprecated, use {% load socialaccount %}
" {% load socialaccount %}", DeprecationWarning)
loading initial db
Using MXNet backend.
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\account\templatetags\account_tags.py:4: DeprecationWarning: {% load account_tags %} is deprecated, use {% load account %}
DeprecationWarning)
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\socialaccount\templatetags\socialaccount_tags.py:4: DeprecationWarning: {% load socialaccount_tags %} is deprecated, use {% load socialaccount %}
" {% load socialaccount %}", DeprecationWarning)
Installed 2 object(s) from 1 fixture(s)
Using MXNet backend.
Using MXNet backend.

Server worker has been started. [ID: 1 | PID: 13580]

[12/May/2018:20:05:34] ENGINE Bus STARTING
[12/May/2018:20:05:34] ENGINE Serving on http://0.0.0.0:8000
[12/May/2018:20:05:34] ENGINE Bus STARTED
[I 20:05:39.677 NotebookApp] Writing notebook server cookie secret to C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\home\app.jupyter\data\runtime\notebook_cookie_secret
Server worker is ready to accept messages. [ID: 1 | PID: 13580]

[2018-05-12T17:05:46.615Z] INFO: Theia/13580 on DESKTOP-FBN3V48: Theia app listening on http://0.0.0.0:8899. []

Received message which is neither a response nor a notification message:

“8899”

[2018-05-12T17:05:47.722Z] INFO: Theia/13580 on DESKTOP-FBN3V48:
[nsfw-watcher: 2600] Started watching: c:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\home\theia\examples\browser\package.json
[]

[I 20:05:51.053 NotebookApp] Serving notebooks from local directory: C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\data\1
[I 20:05:51.054 NotebookApp] 0 active kernels
[I 20:05:51.054 NotebookApp] The Jupyter Notebook is running at:
[I 20:05:51.054 NotebookApp] http://127.0.0.1:8896/?token=
[I 20:05:51.055 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\account\templatetags\account_tags.py:4: DeprecationWarning: {% load account_tags %} is deprecated, use {% load account %}
DeprecationWarning)
C:\Users\praxitelis\AppData\Local\Programs\DeepLearningStudio\conda3\lib\site-packages\allauth\socialaccount\templatetags\socialaccount_tags.py:4: DeprecationWarning: {% load socialaccount_tags %} is deprecated, use {% load socialaccount %}
" {% load socialaccount %}", DeprecationWarning)

The log is not showing any error.

Can you type http://127.0.0.1:8891/status in the browser (assuming you are running Deep Learning Studio on port 8890) and check if there is any output received?

This will confirm whether the compute server process is up or down.
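The same check can be scripted instead of typed into a browser. What follows is a minimal Python sketch and not part of Deep Learning Studio: the helper name `probe_status` is made up for illustration, and the URL assumes DLS runs on port 8890 so that the status endpoint sits on 8891, as described above. It only reports whether anything answered and with which HTTP status code.

```python
import urllib.error
import urllib.request

# Ignore any system-wide HTTP proxy: the target is on the local machine.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_status(url, timeout=5.0):
    """GET `url` and return (ok, detail).

    ok is True only when the request completes without an HTTP error.
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return True, "HTTP {}".format(resp.status)
    except urllib.error.HTTPError as exc:
        # Something answered (possibly a proxy in front of the compute
        # server) but with an error code such as 504.
        return False, "HTTP {}".format(exc.code)
    except (urllib.error.URLError, OSError) as exc:
        # Nothing answered at all: connection refused, timed out, etc.
        return False, "no response ({})".format(exc)


if __name__ == "__main__":
    # Assumed URL: DLS on port 8890, status endpoint on 8891.
    ok, detail = probe_status("http://127.0.0.1:8891/status")
    print("compute server reachable:", ok, "-", detail)
```

An error code means something is listening on that port but the request could not be served, while "no response" means nothing accepted the connection at all.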

Hello Rajendra, and thanks again for the support!

Unfortunately, when I typed the address 127.0.0.1:8891/status into my Chrome browser, I got a message that says: 504 Gateway Time-out, nginx/1.13.7. Deep Learning Studio runs on port 8890 :frowning:

OK, this confirms the compute server is not running. You can try running it manually to see what error you get.

To run the compute server manually:

  1. Start a bash shell (double-click bash.exe in <DLS_INSTALL_FOLDER>\usr\bin).

  2. Export a few environment variables:

    export PATH=/usr/bin:/conda3:$PATH
    export PYTHONHOME=/conda3
    export GPU_ENABLED=1
    export HOME=/home/app
    export PYTHONPATH=/home/app

  3. Start the compute server:
    cd /home/app
    ./app.so &

Hello Rajendra, thank you again for the help, I owe you one! I followed the steps you described above and started the compute server, and this message appeared in bash:
"$ (10224) wsgi starting up on http://0.0.0.0:5000".

I kept bash open and started the Deep Learning Studio application, but unfortunately I still can’t train a model. The same error appears and I am stuck at the phase where it says "Connecting to compute Server".

Moreover, the link http://127.0.0.1:8891/status still provides the same message:
504 Gateway Time-out nginx/1.13.7

Thanks for trying out the steps. So there is no error when starting the server. I am wondering if it is silently exiting after some time.

Could you do the following:

  1. Run the following in the bash terminal to check whether app.so is still running:

    ps ax | grep app.so

  2. If it is running, go to http://127.0.0.1:5000/status in the browser to check its status.
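A process can be alive and still unreachable, so it may also help to check whether the ports themselves accept connections. The sketch below is an illustration only: `port_is_open` is a hypothetical helper, and the two port numbers are simply the ones mentioned in this thread (5000 for the manually started server, 8891 for the status URL).

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from this thread; adjust them to your own setup.
    for name, port in [("compute server (manual start)", 5000),
                       ("status URL", 8891)]:
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print("{:<32} port {}: {}".format(name, port, state))
```

If app.so shows up in the process list but its port reports closed, something between the process and the client (for example security software) may be interfering.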

Dear Rajendra, I finally found the culprit! It was Avast Free Antivirus all along. Somehow Avast was silently blocking the “compute Server”. I deactivated Avast’s shields, uninstalled and reinstalled DLS, and now I can use Deep Learning Studio! I trained some models and they work fine now! :smiley:

I deeply apologize for the inconvenience and for ruining your weekend :blush:

There is one small issue :slight_smile: I pasted the MxNet .dll files, and the log file shows that GPU support is activated. However, the “Training” tab shows that the NVIDIA card’s memory is reserved for this process but the GPU load is 0%. Do you think my GPU is actually being used? The NVIDIA GPU tab in Task Manager also shows 0% load, but its compute_0 window shows 60-70% load. Do you believe this may be a driver issue? Should I update my NVIDIA driver?

I attach for help an image during training with DLS:

I am glad you have resolved your problem. And you didn’t ruin my weekend. I love helping out and understanding what issues our users are facing so that we can do a better job.

On the next issue, do you see the number-of-GPUs dropdown on the training tab?

A couple of possibilities:

  1. The training load is small, causing GPU usage to be low. GPU load can be increased by increasing the batch size.

  2. Check in Windows Task Manager whether GPU usage is shown correctly there.

  3. It is possible the driver is not reporting the usage properly. You can either update your driver or use a utility which can monitor GPU load better.
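One independent way to read the GPU load is NVIDIA's own `nvidia-smi` command-line tool. The sketch below assumes `nvidia-smi` is installed with the driver and is on the PATH (on Windows it may live in the driver's install folder instead); the function names are made up for illustration, and nothing here is taken from Deep Learning Studio itself. Some driver modes may report a non-numeric placeholder instead of a percentage, which the parser turns into `None`.

```python
import shutil
import subprocess

QUERY = "index,utilization.gpu,memory.used"


def _to_int(text):
    # Non-numeric placeholders (e.g. "[N/A]") become None.
    return int(text) if text.isdigit() else None


def parse_gpu_rows(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue  # skip anything that is not an index/util/memory row
        rows.append({"index": _to_int(fields[0]),
                     "util_percent": _to_int(fields[1]),
                     "memory_used_mib": _to_int(fields[2])})
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + QUERY,
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        for row in query_gpus():
            print(row)
```

Running it a few times while a model is training shows whether utilization rises above zero, independently of what the training tab displays.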

Hello Rajendra, long time no see! :smiley:

I want to thank you again for all the help and support. I can finally say for certain that my GPU works with Deep Learning Studio 2.0.7. I downloaded two monitoring programs, GPU-Z and Open Hardware Monitor, and both show that the GPU is actually used by the application during training. I also updated my NVIDIA driver to the latest release.

I attach the results from both monitoring tools:

Only DLS 2.0.7 itself does not show that the GPU is used: its reading is stuck at 0.00%. Maybe it is just a minor bug.

Despite the bug, I have to congratulate the community of people behind the development of Deep Learning Studio! Keep up the exceptional work.

Praxitelis-Nikolaos Kouroupetroglou.

Thanks for your comments!

Glad to hear that the GPU issue is resolved. Regarding the usage not being shown, it could be that the library we are using (provided by NVIDIA) has an issue (it is a little outdated). We have not seen this issue ourselves on the latest GPUs.