Question - Does dataset splitting respect class representation in the dataset?


#1

After setting the train/validation/test split parameter to, say, 80% - 10% - 10%, does the dataset get split in a way that makes sure the n classes in the dataset are represented with roughly the same percentage (100/n) in each set? Same question if the Shuffle Data option is on.


#2

We do not do any balancing when splitting the dataset. So 80% - 10% - 10% means the first 80% of the dataset becomes the training data, the next 10% becomes the validation data, and the remaining 10% becomes the test data.

With the Shuffle Data option on, the indexes are shuffled first and then divided in the same way.
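A minimal Python sketch of the behaviour described above, assuming the split is a plain cut over the (optionally shuffled) list of row indexes. `split_indices` and its parameters are hypothetical names for illustration, not the tool's actual code. The usage lines show why this matters: if the rows happen to be sorted by class and shuffling is off, the validation and test sets can end up containing a single class.

```python
import random


def split_indices(n, train=0.8, val=0.1, shuffle=False, seed=None):
    """Hypothetical helper: cut range(n) into train/val/test index lists.

    No class balancing is done. If shuffle is True, the indexes are
    shuffled first and then cut at the same proportions.
    """
    indices = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(indices)
    n_train = round(n * train)
    n_val = round(n * val)
    train_idx = indices[:n_train]                 # first 80%
    val_idx = indices[n_train:n_train + n_val]    # next 10%
    test_idx = indices[n_train + n_val:]          # remaining 10%
    return train_idx, val_idx, test_idx


# Example: a dataset whose rows are sorted by class.
labels = ["cat"] * 50 + ["dog"] * 50
tr, va, te = split_indices(len(labels))
print({labels[i] for i in te})  # {'dog'}: the test set has only one class

# With shuffling, each class usually appears in every set, but the
# proportions are only approximately preserved, not guaranteed.
tr, va, te = split_indices(len(labels), shuffle=True, seed=0)
```

Shuffling avoids the worst case of sorted data, but for small datasets with many classes a random cut can still leave some classes under-represented or missing from the smaller sets.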


#3

Well, maybe that should be an option, because class imbalance is likely to impact training on small datasets with many classes.
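If such an option were added, one common approach is a stratified split: group the indexes by class, shuffle and cut each group at the same ratios, then merge the pieces. The sketch below is a hypothetical illustration of that technique (`stratified_split` is not an existing option in the tool), assuming each row has a single class label.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train=0.8, val=0.1, seed=None):
    """Hypothetical helper: split indexes so each class keeps its proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx, test_idx = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                 # shuffle within each class
        n = len(idxs)
        n_train = round(n * train)
        n_val = round(n * val)
        train_idx += idxs[:n_train]
        val_idx += idxs[n_train:n_train + n_val]
        test_idx += idxs[n_train + n_val:]

    # Interleave classes so each set is not ordered by class.
    for part in (train_idx, val_idx, test_idx):
        rng.shuffle(part)
    return train_idx, val_idx, test_idx


labels = ["cat"] * 50 + ["dog"] * 50
tr, va, te = stratified_split(labels, seed=0)
print(Counter(labels[i] for i in te))  # 5 cats and 5 dogs
```

Per-class rounding means very small classes may get an empty validation or test share. For a two-way split, scikit-learn's `train_test_split` offers the same idea through its `stratify` argument.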