Problem with dataset


#1

[Image: Problema dataset]

Hello, as you can see, I’m having a problem using a custom dataset.

I followed the instructions on YouTube for creating a custom dataset. Then I created a project and selected the new dataset, and when I switch between tabs in the project, this error pops up.

I have checked the file in the file browser and I see no problem. :frowning:


#2

Hi Chema,

From the error message, it looks like you are using ; as the separator in the CSV file. You need to use , as the field separator in your CSV file.
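If the file was exported with ; as the separator, one way to convert it is a small script like the one below. This is only a sketch using Python's standard `csv` module; the function name and the file names in the usage note are hypothetical, not something the tool provides.

```python
import csv


def convert_separator(src_path, dst_path, old_sep=";", new_sep=","):
    """Copy a CSV file, changing the field separator from old_sep to new_sep."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=old_sep)
        writer = csv.writer(dst, delimiter=new_sep)
        for row in reader:
            # csv.writer quotes any field that itself contains the new
            # separator, so paths or labels with commas stay intact.
            writer.writerow(row)
```

For example, `convert_separator("train_semicolon.csv", "train.csv")` would write a comma-separated copy (both file names are placeholders).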

Regards
Rajendra


#3

Ok, that was it :sweat_smile:

Now I’m facing another problem when I try to train the project:

[Image: Problema project]

The model I’m using is based on the MNIST project.


#4

Are all the images the same size? Could you try enabling the resize option (in the Data tab, click on the image column to see the resize option)?

The resize option makes sure that all images are resized to the given size before they are fed to the network.
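To check offline whether all images really have the same size, a sketch like the following could help. It assumes the images are BMP files with the common 40-byte (or larger) info header and reads width and height straight from the file header with the standard library; the function names are hypothetical.

```python
import struct
from collections import Counter
from pathlib import Path


def bmp_size(path):
    """Return (width, height) read from a BMP file header."""
    with open(path, "rb") as f:
        header = f.read(26)
    if len(header) < 26 or header[:2] != b"BM":
        raise ValueError(f"{path} does not look like a BMP file")
    # Offset 14: DIB header size, offset 18: width, offset 22: height
    # (all little-endian; height is negative for top-down bitmaps).
    dib_size, width, height = struct.unpack_from("<Iii", header, 14)
    if dib_size < 40:
        raise ValueError(f"{path} uses an older BMP header not handled here")
    return width, abs(height)


def size_counts(dataset_root):
    """Count how many .bmp files under dataset_root have each (width, height)."""
    return Counter(bmp_size(p) for p in Path(dataset_root).rglob("*.bmp"))
```

If `size_counts(...)` returns more than one key, the dataset contains images of different sizes.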


#5

All the images are 780x450. Anyway, I enabled the resize option and it didn’t work.

These are my data and model tabs:
[Image: Problema project2]
[Image: Problema project3]


#6

I don’t see any issue with your config. What is your batch size? Is there any error message in the Logs tab when you start the training?

Also, let me know your system specs (RAM and GPU).

Could you try resizing to a small value (let’s say 32x32) to rule out the system running out of resources?


#7

My batch size is the default I think, 32:

[Image: Problema project4]

CPU: Intel Core i7-6700 @ 3.4 GHz
RAM: 8GB
GPU: Nvidia GeForce GT 730

I tried with 32x32 size and it still fails.

I don’t know where the Logs tab is.


#8

I found the Logs tab :sweat_smile:, but no message appears after training starts.


#9

Is it possible for you to upload your project (export/import model) and dataset to the cloud version? That would help us investigate this issue better.


#10

OK, I already did, and the same error happens. I also tried the AutoML model, with no success.


#11

Hi Chema,

It seems that your train.csv has file paths that do not exist in the filesystem. For example, the following file mentioned in train.csv is not found in the dataset.

“./9001 39-5 Izquierda/Suela702 caramelo 39-5 3 4.bmp”

Can you make sure each and every file mentioned in the train.csv is uploaded/present in the filesystem?
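A quick way to check this before uploading is a script along these lines. It is a sketch only: it assumes the file path is in the first column of train.csv, that the file has a header row, and that paths are relative to the dataset folder. The function name and parameters are hypothetical.

```python
import csv
from pathlib import Path


def missing_files(csv_path, dataset_root, path_column=0, has_header=True):
    """Return the paths listed in the CSV that do not exist under dataset_root."""
    root = Path(dataset_root)
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            rel_path = row[path_column].strip()
            # Paths such as "./folder/file.bmp" are resolved against the root.
            if not (root / rel_path).is_file():
                missing.append(rel_path)
    return missing
```

An empty list means every file named in the CSV was found. Note that file names with spaces or different letter case are a common source of mismatches.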

Regards
Rajendra


#12

Hi,

It seems my project is finally working. Thanks for the help.

I get a 0.9 accuracy after training, which is OK given that my dataset is still very small.

image

However, in the dataset inference I always get the same prediction :frowning:

image

Any idea?


#13

You may want to use a CNN-based network. Either design it yourself or use AutoML/pretrained models. Your images are large, so you may have to resize them to fit on the GPU.

Enable augmentation, since you have a small dataset. Also turn on shuffle, since it seems your validation dataset has mostly different classes than the training dataset.
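To illustrate why shuffling matters, here is a sketch of a shuffled, per-class split done outside the tool. It is not how the shuffle option is implemented internally; it only shows the idea that training and validation sets should both contain every class. The function name and the row layout are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, val_fraction=0.2, seed=0):
    """Shuffle rows and split them so each class appears in both sets."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, val = [], []
    for group in by_label.values():
        rng.shuffle(group)
        # Keep at least one validation sample per class when possible.
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])

    rng.shuffle(train)
    rng.shuffle(val)
    return train, val
```

With rows like `("some_image.bmp", "left")`, calling `stratified_split(rows, lambda r: r[1])` keeps roughly 20% of each class for validation.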


#14

OK, by enabling augmentation and using a pretrained preprocess, I can get a wider range of results. This makes sense, because my current dataset is, in fact, too small.

So the question is: is the training accuracy value useless when my dataset is small?


#15

For a small dataset, a network can learn to memorize all the inputs (if the network is big enough), so you will get very good training accuracy but bad accuracy on data the network has not seen. That is what is happening in your case.

In general, you should look at validation accuracy as the measure of whether the network has learned the features or not.
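As a toy illustration of the point above (not a real network), the sketch below uses a hypothetical `Memorizer` "model" that simply stores every training sample in a lookup table. It scores perfectly on the training data and falls back to a single default answer for anything it has not seen, which is why the validation accuracy is the number to watch.

```python
def accuracy(predict, samples):
    """Fraction of (input, label) samples for which predict(input) == label."""
    correct = sum(1 for x, y in samples if predict(x) == y)
    return correct / len(samples)


class Memorizer:
    """Toy 'model' that memorizes training pairs and learns no real features."""

    def __init__(self, default_label):
        self.table = {}
        self.default_label = default_label

    def fit(self, samples):
        self.table = dict(samples)

    def predict(self, x):
        # Unseen inputs always get the same default prediction.
        return self.table.get(x, self.default_label)


# Made-up data: the label is the parity of the input number.
train = [(x, x % 2) for x in range(0, 20)]
val = [(x, x % 2) for x in range(20, 30)]

model = Memorizer(default_label=0)
model.fit(train)
print("training accuracy:", accuracy(model.predict, train))   # 1.0
print("validation accuracy:", accuracy(model.predict, val))   # 0.5
```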


#16

I see… that explains it all. I’m new to this world d: