IDC Breast cancer data upload to csv

I’m working through the IDC Breast cancer example by Favio, but after downloading the image dataset I am not sure how to load it into the DLS desktop software. The 1.48 GB dataset has many folders. Do I need to create a train.csv file and manually enter the relative path of each image?

Hi @Just4jcgeorge

I prepared the csv file with a Python script; this is a common scenario in ML practice. Take a look here:

  1. Listing files with Python
  2. Working with csv files in Python

DLS installs Python 3.5.2 on your system; it is located (under Win10) here:

Consider a dataset whose labels belong to two classes with the string names ‘LabelClassA’ and ‘LabelClassB’:

dataset root folder MyDataset:

file: train.csv
folder: Images:

  • folder A:
    • img000.jpg
    • img001.jpg
  • folder B:
    • folder C000:
      • img111.jpg
      • img432.jpg
    • folder D:
      • 4.jpg
      • image.99.jpg

train.csv content:


So the correct csv file is placed in the dataset’s root folder and encodes the relative path to each image, however deep the sub-directory structure and whatever the filenames.
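For illustration, here is a minimal sketch of how such a train.csv could be generated with a script instead of typing the paths by hand. It is not the exact script I used. Several things in it are assumptions: the header names `image` and `label`, the mapping of folder A to ‘LabelClassA’ and folder B to ‘LabelClassB’, the use of forward slashes in paths, and the helper names `collect_rows` and `write_train_csv`. Check what your DLS project actually expects and adjust accordingly. It sticks to syntax that should run on Python 3.5.

```python
import csv
import os

# Assumption: the top-level folder under Images/ decides the class label.
LABEL_BY_FOLDER = {"A": "LabelClassA", "B": "LabelClassB"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_rows(dataset_root, images_dir="Images"):
    """Return sorted (relative_path, label) pairs for all images under images_dir."""
    rows = []
    images_root = os.path.join(dataset_root, images_dir)
    # os.walk visits every sub-directory, however deeply nested.
    for current_dir, _subdirs, filenames in os.walk(images_root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
                continue  # skip files that are not images
            full_path = os.path.join(current_dir, name)
            # First path component below Images/ selects the label.
            top_folder = os.path.relpath(full_path, images_root).split(os.sep)[0]
            label = LABEL_BY_FOLDER.get(top_folder)
            if label is None:
                continue  # folder without a known label
            # Path relative to the dataset root, where train.csv lives.
            rel_path = os.path.relpath(full_path, dataset_root)
            rows.append((rel_path.replace(os.sep, "/"), label))
    return sorted(rows)


def write_train_csv(dataset_root, header=("image", "label")):
    """Write train.csv into the dataset root and return its path."""
    csv_path = os.path.join(dataset_root, "train.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # assumed column names
        writer.writerows(collect_rows(dataset_root))
    return csv_path
```

Calling `write_train_csv("MyDataset")` from the folder that contains MyDataset would write MyDataset/train.csv with one row per image, each holding a path such as Images/A/img000.jpg and the label assumed for its top-level folder.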
