IDC Breast cancer data upload to csv

I’m working through Favio’s IDC breast cancer example using deepcognition.ai. I have downloaded the image dataset to my desktop, but I am not sure how to load it into the DLS desktop software. The 1.48 GB dataset has many folders. Do I need to create a train.csv file and manually enter the relative path of each image?

Hi @Just4jcgeorge

I prepared the csv file with a Python script; this is a common scenario in ML practice. Take a look here:

  1. Listing files with Python
    https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory
  2. Working with csv files in Python
    https://docs.python.org/3/library/csv.html

DLS installs Python 3.5.2 on your system; under Win10 it is located here:
C:\users\YOUR_USERNAME\AppData\Local\Programs\DeepLearningStudio\conda3\python.exe

Consider a dataset with two label classes, named by the strings ‘LabelClassA’ and ‘LabelClassB’:

dataset root folder MyDataset:

file: train.csv
folder: Images:

  • folder A:
    • img000.jpg
    • img001.jpg
  • folder B:
    • folder C000:
      • img111.jpg
      • img432.jpg
    • folder D:
      • 4.jpg
      • image.99.jpg

train.csv content:

Image,Label
./Images/A/img000.jpg,LabelClassA
./Images/A/img001.jpg,LabelClassB
./Images/B/C000/img111.jpg,LabelClassA
./Images/B/C000/img432.jpg,LabelClassB
./Images/B/D/4.jpg,LabelClassB
./Images/B/D/image.99.jpg,LabelClassA

So the correct csv file is placed in the dataset’s root folder and encodes the relative path to each image, however deep the sub-directory structure and whatever the filenames.
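A minimal sketch of such a script, using only the standard library, might look like the following. It walks the Images folder and writes train.csv into the dataset root with the Image,Label header shown above. The names `write_train_csv`, `label_for`, and `IMAGE_EXTENSIONS` are made up for this illustration, and the labelling rule (the label is the name of the folder that directly contains the image) is only an assumption; replace `label_for` with whatever rule matches your data.

```python
import csv
import os

# Assumption: only these extensions count as images; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def label_for(image_path):
    """Hypothetical labelling rule: use the name of the folder holding the image."""
    return os.path.basename(os.path.dirname(image_path))


def write_train_csv(dataset_root, images_dir="Images", csv_name="train.csv"):
    """Walk dataset_root/images_dir and write dataset_root/csv_name."""
    rows = []
    for folder, _subdirs, files in os.walk(os.path.join(dataset_root, images_dir)):
        for name in files:
            # Skip anything that is not an image file
            if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
                continue
            full_path = os.path.join(folder, name)
            # Path relative to the dataset root, written with forward slashes
            rel_path = os.path.relpath(full_path, dataset_root).replace(os.sep, "/")
            rows.append(("./" + rel_path, label_for(full_path)))
    rows.sort()  # stable, reproducible row order

    csv_path = os.path.join(dataset_root, csv_name)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Image", "Label"])
        writer.writerows(rows)
    return csv_path
```

To use it, call `write_train_csv` with the path to your own dataset root folder (the MyDataset folder in the example above) and run the script with the python.exe that DLS installed. The code avoids f-strings so that it should also run on Python 3.5.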
