I have sequence data like this:

```
X1, X2, X3, X4, X5 -> y1, y2, y3, y4, y5
X6, X7, X8 -> y6, y7, y8
```

Each Xi is an m x n matrix. n is the number of columns (the features in my data), and it is the same for every Xi. m is the number of rows, and it varies. Each yi is a single class label (size 1 x 1). I tried to illustrate the problem with a diagram, in which X is a 4D input array and Y is a 2D output array.

To me this looks like a many-to-one architecture, where each matrix Xi produces one class label yi (e.g. X1 -> y1). The sequence length also varies: the first sequence has length 5 (X1, X2, X3, X4, X5) and the second has length 3 (X6, X7, X8). The number of examples (rows) also differs between the elements of a sequence. For example, X1 has 20 rows, X2 has 30 rows, and so on.
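To make the nesting concrete, here is a small sketch that builds toy data with the same structure using plain Python lists. The feature count `n_features = 4`, the random values, the labels, and every row count other than 20 and 30 are placeholders chosen for illustration. They are not taken from real data.

```python
import random

# Hypothetical toy data: each sequence is a list of matrices (lists of rows).
# Every row has n_features values, but the number of rows m differs per matrix.
n_features = 4  # assumed value of n, for illustration only

def make_matrix(m, n):
    """Return an m x n matrix of random floats as a list of rows."""
    return [[random.random() for _ in range(n)] for _ in range(m)]

# Sequence 1 is X1..X5 and sequence 2 is X6..X8. Counts other than 20 and 30 are placeholders.
row_counts = [[20, 30, 12, 25, 8], [15, 40, 22]]
X = [[make_matrix(m, n_features) for m in seq] for seq in row_counts]
# One class label per matrix: y1..y5 and y6..y8 (placeholder labels).
y = [[0, 1, 1, 0, 2], [2, 0, 1]]

for i, (seq, labels) in enumerate(zip(X, y)):
    shapes = [(len(mat), len(mat[0])) for mat in seq]
    print(f"sequence {i}: length {len(seq)}, matrix shapes {shapes}, labels {labels}")
```

The result is ragged at two levels: both the number of matrices per sequence and the number of rows per matrix vary. Only the feature dimension is fixed.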

My questions are:

1- Do I need to make all sequences the same length? If the longest sequence has length 5, should I pad every sequence to length 5 with zeros? Can this be solved using DLS?
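Zero padding here would have to happen at two levels: time steps (matrices per sequence) and rows per matrix. Here is a minimal pure-Python sketch of that idea. The function name `pad_nested`, the label padding value `-1`, and the returned step mask are assumptions made for illustration. The mask keeps track of which steps are real, so padded steps can be ignored later even if genuine data happens to contain zeros.

```python
def pad_nested(X, y, pad_value=0.0, label_pad=-1):
    """Hypothetical helper: pad variable-length sequences of variable-height matrices.

    X: list of sequences; each sequence is a list of matrices (lists of rows).
    y: list of label lists, one label per matrix.
    Returns padded X, padded y, and a step mask (1 = real matrix, 0 = padding).
    """
    n_features = len(X[0][0][0])  # assumes every row has the same width n
    max_len = max(len(seq) for seq in X)                  # longest sequence
    max_rows = max(len(mat) for seq in X for mat in seq)  # tallest matrix
    zero_row = [pad_value] * n_features

    X_pad, y_pad, mask = [], [], []
    for seq, labels in zip(X, y):
        padded_seq = []
        for mat in seq:
            # Pad rows: append zero rows until the matrix has max_rows rows.
            rows = [list(r) for r in mat]
            rows += [list(zero_row) for _ in range(max_rows - len(mat))]
            padded_seq.append(rows)
        # Pad time steps: append all-zero matrices until the sequence has max_len steps.
        for _ in range(max_len - len(seq)):
            padded_seq.append([list(zero_row) for _ in range(max_rows)])
        X_pad.append(padded_seq)
        y_pad.append(list(labels) + [label_pad] * (max_len - len(seq)))
        mask.append([1] * len(seq) + [0] * (max_len - len(seq)))
    return X_pad, y_pad, mask


# Tiny example: two sequences of lengths 2 and 1, matrices with 1 or 2 rows, n = 2.
X = [[[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]], [[[7.0, 8.0]]]]
y = [[0, 1], [2]]
X_pad, y_pad, mask = pad_nested(X, y)
print(y_pad)  # [[0, 1], [2, -1]]
print(mask)   # [[1, 1], [1, 0]]
```

For the simpler single-level case, Keras provides a `pad_sequences` utility. The second level (rows inside each matrix) would still need handling along these lines, or a step that removes that dimension altogether.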

2- What should the shapes of the input and output be for the LSTM?
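A Keras `LSTM` layer expects a 3D input of shape `(batch, timesteps, features)`. Once padded, this data is 4D, `(batch, timesteps, rows, features)`, which matches the 4D X described above. One label per step gives a 2D target `(batch, timesteps)`, or 3D if the labels are one-hot encoded. The sketch below only computes those shapes. The function name `padded_shapes` and the class count `num_classes = 3` are hypothetical. The last shape assumes each matrix is first reduced to a single length-n vector.

```python
def padded_shapes(X, num_classes):
    """Hypothetical helper: return the array shapes the padded data would have.

    X: list of sequences; each sequence is a list of matrices (lists of rows).
    """
    batch = len(X)
    timesteps = max(len(seq) for seq in X)
    rows = max(len(mat) for seq in X for mat in seq)
    features = len(X[0][0][0])
    x_shape = (batch, timesteps, rows, features)    # 4D input after two-level padding
    y_shape = (batch, timesteps)                    # 2D output: one class index per step
    y_onehot_shape = (batch, timesteps, num_classes)  # if labels are one-hot encoded
    # 3D shape a standard LSTM consumes if each matrix is reduced to one n-vector first.
    lstm_in_shape = (batch, timesteps, features)
    return x_shape, y_shape, y_onehot_shape, lstm_in_shape


# Row counts beyond 20 and 30 are placeholders; n = 4 is assumed.
X = [[[[0.0] * 4 for _ in range(m)] for m in (20, 30, 12, 25, 8)],
     [[[0.0] * 4 for _ in range(m)] for m in (15, 40, 22)]]
print(padded_shapes(X, num_classes=3))
# ((2, 5, 40, 4), (2, 5), (2, 5, 3), (2, 5, 4))
```

A plain LSTM cannot take the 4D shape directly. Something has to map each `(rows, features)` matrix to a fixed-length vector before the recurrent layer.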

3- Since the data is sequential, I thought an LSTM would be the most suitable choice. Can this problem be handled with an LSTM alone, or do I need to use other architectures as well?
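One common option, though not the only one, is to summarize each variable-height Xi as a fixed-length vector before the LSTM. The sketch below uses mean pooling over rows as a stand-in for that step. Mean pooling is a technique substituted here for illustration, not something the question prescribes, and the function names are hypothetical. The output has the `(timesteps, features)` layout an LSTM expects, although sequences still differ in length and need time-step padding.

```python
def mean_pool_rows(matrix):
    """Collapse an m x n matrix (list of rows) into one length-n vector by averaging rows."""
    m = len(matrix)
    n = len(matrix[0])
    return [sum(row[j] for row in matrix) / m for j in range(n)]


def encode_sequences(X):
    """Turn each sequence of matrices into a sequence of fixed-length vectors."""
    return [[mean_pool_rows(mat) for mat in seq] for seq in X]


# Two sequences: lengths 2 and 1, matrices with 1 to 3 rows, n = 2.
X = [[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]],
     [[[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]]]
print(encode_sequences(X))  # [[[2.0, 3.0], [5.0, 6.0]], [[2.0, 4.0]]]
```

Mean pooling ignores row order. If the rows inside each Xi are themselves ordered, a learned per-step encoder could be used instead, for example a Keras `TimeDistributed` wrapper around an inner `LSTM` or a small dense network. Predicting one label per step would then typically use `LSTM(..., return_sequences=True)` followed by `TimeDistributed(Dense(num_classes, activation="softmax"))`. This is a sketch of one possible design rather than the only correct architecture.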