Importing Embedding matrices


#1

Hi,

I wanted to ask whether there is any way to import embedding matrices, e.g. ones created from GloVe or fastText embeddings. More generally, is it possible to import data into DLS in formats other than .csv (e.g. npy, npz, etc.)?

Thanks in advance.

Kind regards,
Theodore.


#2

Hi Theodore,

If your dataset cannot be imported with the normal method (a .csv with values/filenames), we support a catch-all mechanism that takes your dataset in numpy format.

First convert your dataset to numpy format and save each row's data in its own numpy file (using the code below). Then create a train.csv with one column of numpy filenames and another column for the output data.

e.g.
input_data, output_data
a.npz, label1
b.npz, label2

The following sample code shows how we save numpy data and load it in DLS.

import numpy as np

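# Save one row of data to a compressed .npz file.
a = np.zeros((2, 1))
np.savez_compressed("a.npz", a)

# Load it back; an array saved without a name is the first entry in .files.
npzfile = np.load("a.npz")
x = npzfile[npzfile.files[0]]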
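
To put the two steps together, here is a minimal sketch that writes one .npz file per row plus the matching train.csv. The helper name export_rows and the sample_<i>.npz filenames are made up for illustration; only the input_data/output_data column names come from the example above, and whether DLS needs relative or absolute paths in the CSV is an assumption you should check.

```python
import csv
import os
import tempfile

import numpy as np


def export_rows(samples, labels, out_dir):
    """Save each sample to its own .npz file and list the files in train.csv.

    Hypothetical helper: the file naming scheme is an assumption, not a DLS rule.
    """
    os.makedirs(out_dir, exist_ok=True)
    csv_path = os.path.join(out_dir, "train.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header mirrors the two-column example from the post.
        writer.writerow(["input_data", "output_data"])
        for i, (sample, label) in enumerate(zip(samples, labels)):
            name = f"sample_{i}.npz"
            # One compressed numpy file per row of the dataset.
            np.savez_compressed(os.path.join(out_dir, name), np.asarray(sample))
            writer.writerow([name, label])
    return csv_path


if __name__ == "__main__":
    # Tiny demo written to a temporary directory.
    demo_dir = tempfile.mkdtemp()
    data = [np.zeros((2, 1)), np.ones((2, 1))]
    print(export_rows(data, ["label1", "label2"], demo_dir))
```

Each row can then be read back exactly as in the snippet above, with np.load and npzfile.files[0].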

Does this work for your use case?


#3

Thanks Rajendra,

This makes the whole process easier, for certain models at least.

Will give it a try.

Kind regards,
Theodore.


#4

Hi Theodore, were you able to import pretrained word embeddings like GloVe/Word2Vec? Please let me know the process.


#5

Yes, for that either save your embedding vectors as a numpy array or save them into a csv file with the values separated by semicolons (;).
Yes, you can import npy files: just store the paths of the numpy files in the csv, similar to what we do for images.
Thanks
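
As a rough sketch of both options, the code below parses GloVe-style text lines (a word followed by space-separated floats), then writes the vectors either as a numpy array or as a CSV whose vector values are joined by semicolons. The function names and the word/vector column names are hypothetical, and the exact layout DLS expects for embedding columns is an assumption not confirmed in this thread.

```python
import csv
import io

import numpy as np


def parse_glove_text(lines):
    """Parse lines of the form 'word v1 v2 ... vn' into (words, matrix)."""
    words, vectors = [], []
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        vectors.append([float(v) for v in parts[1:]])
    return words, np.asarray(vectors, dtype=np.float32)


def write_semicolon_csv(words, matrix, fileobj):
    """Write one row per word, with the vector values joined by ';'.

    The column names are illustrative assumptions.
    """
    writer = csv.writer(fileobj)
    writer.writerow(["word", "vector"])
    for word, vec in zip(words, matrix):
        writer.writerow([word, ";".join(f"{v:.6g}" for v in vec)])


if __name__ == "__main__":
    sample = ["the 0.1 0.2 0.3", "cat 0.4 0.5 0.6"]
    words, matrix = parse_glove_text(sample)

    # Option 1: keep the whole matrix as a single numpy array file.
    np.save("embeddings.npy", matrix)

    # Option 2: semicolon-separated values inside a CSV column.
    with open("embeddings.csv", "w", newline="") as f:
        write_semicolon_csv(words, matrix, f)
```

Semicolons keep the vector in a single CSV field, so the comma stays free as the column delimiter.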


#6

Hi, can you give the link to the example for “import npy file just store the path of the numpy files in the csv similar we do for images”?


#7

Yes


In this article I save each 3-D image into its own numpy file.
This is the zip file structure

This dataset has 10 categories, so there are 10 different folders.
Inside every folder I have one numpy file per row (sample).

If you follow this file structure, DLS will automatically generate a csv file for your dataset, using the folder names as the labels.

Below is the code to save each sample as a numpy file.


In this code, labels is the list of the 10 folder names
and X is the array in which the dataset is stored.
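
For reference, a minimal sketch of what such saving code could look like is given here. It is not the code from the article: the function name save_by_label, the y array of per-sample label indices, the sample_<i>.npy filenames, and the zipping step are all assumptions added for illustration.

```python
import os
import shutil
import tempfile

import numpy as np


def save_by_label(X, y, labels, root):
    """Write one .npy file per sample into a folder named after its label.

    X: array of samples, y: index into `labels` for each sample (assumed),
    labels: list of folder names, root: output directory.
    """
    for name in labels:
        os.makedirs(os.path.join(root, name), exist_ok=True)
    for i, (sample, label_idx) in enumerate(zip(X, y)):
        path = os.path.join(root, labels[label_idx], f"sample_{i}.npy")
        np.save(path, sample)
    # Zip the folder tree so the label folders sit at the top of the archive.
    return shutil.make_archive(root, "zip", root_dir=root)


if __name__ == "__main__":
    # Demo: 4 small 3-D "images" in 2 categories, written to a temp directory.
    out = os.path.join(tempfile.mkdtemp(), "dataset")
    X = np.zeros((4, 2, 2, 2))
    y = [0, 1, 0, 1]
    print(save_by_label(X, y, ["cat", "dog"], out))
```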
Regards
Rajat