Using mixed data sets

Is there a way to input a set of images and texts to the same model? That is, each input data point would be an image plus a sentence corresponding to that image.

It is not currently possible to input image and text together. Multiple input/output support is on our roadmap. With that feature, it will be possible to use different types of data as input to DLS.