Data format for input to RNN layer / Embedding layer


#1

Hello. I have a dataset of 1500 time series, each with 900 time steps and 300 features per time step. I would like to use the data for a network that starts with an embedding layer that then feeds into an RNN for sequence learning (option 1).

Alternatively (option 2), if the first option is not possible, I could train a standard NLP model with an embedding layer and then feed its output into a different network with RNN layers. Would you be able to describe what shape/format the data needs to have for this to work?

Many thanks


#2

Hi tmontana,

Were you able to resolve this? If so, would you mind sharing your approach so others can benefit?

If it is still unresolved, here is one possible way to format your data:

  1. Load the data with Pandas into a NumPy array. Current shape = (15000, 300)

  2. Divide your time-series data into sub-sequences (let's say sequences of X time steps). Shape = (15000//X, X, 300)

  3. Save each sub-sequence using numpy.savez_compressed()

  4. Create a train.csv file listing all the .npz files

  5. While modeling, you can use the following layers:
    a. Input (shape=(None, X, 300))
    b. Flatten (shape=(None, X*300))
    c. Embedding (shape=(None, X*300, Y))
    d. Reshape (shape=(None, X, 300*Y))
    e. LSTM (shape=(None, Z))
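
To make the shapes concrete, here is a rough NumPy-only sketch of the steps above. It is not a definitive implementation. The sizes are scaled down so it runs quickly, and the file names, the `vocab_size` value, and the random stand-in data are all made up for illustration. One assumption matters in particular: an embedding layer looks up rows of a table by index, so the sketch assumes the features are non-negative integer IDs. If your 300 features are continuous values, the embedding step does not apply as written. Step 5 is emulated with plain array operations only to trace the shapes; it is not a trained model.

```python
import os
import tempfile

import numpy as np

# Hypothetical sizes, scaled down from the thread's (15000, 300) example.
n_rows, n_features = 1500, 30
X = 50            # time steps per sub-sequence (the "X" in step 2)
vocab_size = 20   # assumed number of distinct integer IDs in the data
Y = 4             # embedding width (the "Y" in step 5c)

rng = np.random.default_rng(0)

# Step 1: stand-in for the array loaded via pandas (integer IDs are assumed).
data = rng.integers(0, vocab_size, size=(n_rows, n_features))

# Step 2: drop any remainder rows, then split into sub-sequences.
n_seq = n_rows // X
sequences = data[: n_seq * X].reshape(n_seq, X, n_features)

# Steps 3-4: one compressed .npz per sub-sequence, plus a CSV index of the paths.
out_dir = tempfile.mkdtemp()
paths = []
for i, seq in enumerate(sequences):
    path = os.path.join(out_dir, f"seq_{i:05d}.npz")
    np.savez_compressed(path, x=seq)
    paths.append(path)
with open(os.path.join(out_dir, "train.csv"), "w") as f:
    f.write("path\n" + "\n".join(paths) + "\n")

# Step 5 (shapes only): emulate Flatten -> Embedding -> Reshape with NumPy.
batch = np.stack([np.load(p)["x"] for p in paths[:8]])        # (8, X, n_features)
flat = batch.reshape(len(batch), -1)                           # (8, X * n_features)
table = rng.normal(size=(vocab_size, Y))                       # embedding lookup table
embedded = table[flat]                                         # (8, X * n_features, Y)
rnn_input = embedded.reshape(len(batch), X, n_features * Y)    # (8, X, n_features * Y)

print(batch.shape, flat.shape, embedded.shape, rnn_input.shape)
```

The final array has the (batch, time steps, features) layout that an LSTM layer expects as input, which is the purpose of the Reshape in step 5d.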

I hope this is useful.

Regards
Rajendra