Question - Does the dataset splitting respect class representation in the dataset?


After setting the train/validation/test split parameter to, say, 80% - 10% - 10%, does the dataset get split in a way that ensures each of the n classes is represented with roughly the same percentage in each set (100/n)? Same question if the Shuffle Data option is on.


We do not do any class balancing when splitting the dataset. So 80% - 10% - 10% means the first 80% of the dataset is the training data, the next 10% is the validation data, and the remaining 10% is the test data.

With the Shuffle Data option on, the indexes are shuffled before they are divided.
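
To make the behavior described above concrete, here is a minimal sketch in Python. It is not the tool's actual code: the function name `split_indices`, its parameters, and the use of `round` to turn percentages into counts are all assumptions made for illustration.

```python
import random

def split_indices(n_samples, fractions=(0.8, 0.1, 0.1), shuffle=False, seed=None):
    """Hypothetical helper: split sample indexes sequentially, with no class balancing."""
    indices = list(range(n_samples))
    if shuffle:
        # Shuffle the indexes first, then cut them into consecutive chunks.
        random.Random(seed).shuffle(indices)
    # Assumption: chunk sizes are rounded; the last chunk takes whatever is left.
    n_train = round(n_samples * fractions[0])
    n_val = round(n_samples * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

# A dataset stored in class order: 80 samples of class "a", then 20 of class "b".
labels = ["a"] * 80 + ["b"] * 20
train, val, test = split_indices(len(labels), shuffle=False)
train_classes = {labels[i] for i in train}
test_classes = {labels[i] for i in test}
```

With shuffling off and a dataset stored in class order, `train_classes` contains only "a" and `test_classes` only "b", which is the worst case for class representation. With shuffling on, the class mix in each set is random, so it tends toward the overall class proportions but is not guaranteed to match them, especially on small datasets.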


Well, maybe that should be an option, because class imbalance is likely to impact training on small datasets with many classes.
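
For reference, the kind of option being asked for is usually called a stratified split. A rough sketch of the idea in Python follows. This is only an illustration of the technique, not something the tool provides, and `stratified_split_indices` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict

def stratified_split_indices(labels, fractions=(0.8, 0.1, 0.1), seed=None):
    """Hypothetical helper: apply the train/val/test fractions within each class."""
    rng = random.Random(seed)
    # Group sample indexes by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val, test = [], [], []
    for class_indices in by_class.values():
        # Shuffle within the class, then cut that class 80/10/10 on its own.
        rng.shuffle(class_indices)
        n_train = round(len(class_indices) * fractions[0])
        n_val = round(len(class_indices) * fractions[1])
        train += class_indices[:n_train]
        val += class_indices[n_train:n_train + n_val]
        test += class_indices[n_train + n_val:]
    return train, val, test

labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
train, val, test = stratified_split_indices(labels, seed=0)
```

Each set then keeps the dataset's own class proportions (here 50/30/20), rather than forcing every class to 100/n. Very small classes can still end up with zero samples in the validation or test set after rounding, so that case would need separate handling.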