Hello. I have a dataset of 1500 time series, each with 900 time steps and 300 features per time step. I would like to use the data in a network that starts with an embedding layer and then feeds into an RNN for sequence learning (option 1).
Alternatively (option 2), if the first option is not possible, I could train a standard NLP model with an embedding layer and then feed its output into a separate network with RNN layers. Would you be able to describe what shape/format the data needs to have for this to work?
Many thanks
Hi tmontana,
Were you able to resolve this? If so, would you mind sharing your approach so that others can benefit?
If it is still unresolved, here is one possible way to format your data:
1. Load the data using Pandas into a NumPy array. Current shape = (15000, 300)
2. Divide your time-series data into sub-sequences (let's say sequences of X time steps). Shape = (15000//X, X, 300)
3. Save each sub-sequence using numpy.savez_compressed()
4. Create a train.csv file listing all the .npz files
5. While modeling, you can use the following layers:
a. Input (shape=(None, X, 300))
b. Flatten (shape=(None, X*300))
c. Embedding (shape=(None, X*300, Y))
d. Reshape(shape=(None, X, 300*Y))
e. LSTM (shape=(None, Z))
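The steps above can be sketched roughly as follows. This is only an illustration under some assumptions, not a tested pipeline for your data: it uses random integers as a stand-in for the loaded array, and it assumes the 300 features are integer category ids in [0, V), since an embedding lookup needs integer indices. The sub-sequence length X, vocabulary size V, embedding width Y, the `seq_*.npz` file names and the temporary output folder are all made up for the example. Layers a to d are traced with plain NumPy so the shapes are easy to check; the LSTM itself is left out.

```python
import csv
import tempfile
from pathlib import Path

import numpy as np

# Sizes from the steps above; X, V and Y are assumed values for illustration.
T, F = 15000, 300      # total time steps and features (step 1)
X = 50                 # sub-sequence length (assumed)
V, Y = 20, 4           # vocabulary size and embedding width (assumed)

rng = np.random.default_rng(0)
# Stand-in for the loaded data: integer category ids in [0, V).
data = rng.integers(0, V, size=(T, F), dtype=np.int64)

# Step 2: drop any remainder, then split into (T // X, X, F).
n_seq = T // X
sequences = data[: n_seq * X].reshape(n_seq, X, F)

# Steps 3 and 4: one compressed .npz per sub-sequence plus a CSV manifest.
out_dir = Path(tempfile.mkdtemp())
manifest = out_dir / "train.csv"
with manifest.open("w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["path"])
    for i, seq in enumerate(sequences[:5]):  # only the first few, for brevity
        path = out_dir / f"seq_{i:05d}.npz"
        np.savez_compressed(path, x=seq)
        writer.writerow([path.name])

# Step 5, layers a to d, traced on one batch with plain NumPy.
batch = sequences[:8]                                # Input     (8, X, F)
flat = batch.reshape(len(batch), X * F)              # Flatten   (8, X*F)
table = rng.normal(size=(V, Y))                      # random stand-in weights
embedded = table[flat]                               # Embedding (8, X*F, Y)
rnn_input = embedded.reshape(len(batch), X, F * Y)   # Reshape   (8, X, F*Y)

print(sequences.shape, flat.shape, embedded.shape, rnn_input.shape)
```

After the reshape, each time step holds the Y-dimensional embeddings of its 300 features side by side, which is the 3-D (batch, time steps, features) layout an LSTM layer expects.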
I hope this is useful.
Regards
Rajendra