Cardiac arrhythmias are a group of pathological conditions related to abnormalities in the cardiac electrical conduction system, and they are usually detected by analysing the 12-lead ECG signal. ECG analysis is typically performed visually by an expert physician. However, interpretation is time-consuming and challenging, not only because of inter-individual variability but also because of multiple sources of noise, such as breathing and motion artefacts. Many automatic algorithms relying on handcrafted features and traditional machine learning classifiers have been developed to recognize cardiac diseases. However, these approaches exploit substantial a priori knowledge about the characteristics of the signals and perform feature extraction, selection and classification as separate steps. To overcome these limitations and achieve higher decoding performance, deep neural networks (DNNs) have recently been designed, providing an end-to-end framework in which the most relevant features are learned automatically. In this study, we designed a lightweight DNN comprising a convolutional module that learns spatio-temporal features with convolutional layers optimized to limit the number of trainable parameters, a single-layer gated recurrent unit (GRU), an attention module and a fully-connected layer that performs the final 9-way classification. Different convolutional modules were investigated, including single- and multi-branch modules that extract single- and multi-scale temporal features from the input. The proposed architectures were trained and tested on the dataset of the PhysioNet/Computing in Cardiology Challenge 2020. The training dataset provided by the organizers was divided into a training set (80%) and a validation set (20%), the latter used for early stopping, while preserving the class frequencies of the original data distribution.
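The 80%/20% class-preserving split can be illustrated with a minimal sketch. It assumes one label per recording, which is a simplification (a recording may carry more than one diagnosis), and the function name `stratified_split`, its parameters and the toy labels are hypothetical, not taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices into training and validation sets so that
    each class contributes about `val_fraction` of its samples to the
    validation set (assumes a single label per recording)."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Number of validation samples taken from this class.
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy, imbalanced label list (illustrative only).
toy_labels = ["AF"] * 50 + ["Normal"] * 100 + ["PVC"] * 10
train_idx, val_idx = stratified_split(toy_labels)
```

Splitting within each class rather than over the pooled data keeps rare classes represented in the validation set, which matters when validation loss drives early stopping.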
In the unofficial phase of the competition, the best-performing architecture included a single-branch convolutional module and achieved an F2 score of 0.782 and a G2 score of 0.566. Thus, the proposed approach represents a good compromise between complexity and performance for 12-lead ECG classification.
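The abstract does not define the F2 and G2 scores. The sketch below assumes the per-class definitions F_β = (1+β²)·TP / ((1+β²)·TP + FP + β²·FN) and G_β = TP / (TP + FP + β·FN) with β = 2, averaged over classes; any per-recording weighting applied by the challenge scoring is omitted. The function names and the example counts are hypothetical.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true-positive, false-positive and false-negative counts."""
    denom = (1 + beta**2) * tp + fp + beta**2 * fn
    # Convention assumed here: return 0.0 when the score is undefined.
    return (1 + beta**2) * tp / denom if denom else 0.0


def g_beta(tp, fp, fn, beta=2.0):
    """Generalized Jaccard-style score that weights false negatives by beta."""
    denom = tp + fp + beta * fn
    return tp / denom if denom else 0.0


def macro_average(score_fn, counts, beta=2.0):
    """Average a per-class score over a list of (tp, fp, fn) tuples."""
    return sum(score_fn(tp, fp, fn, beta) for tp, fp, fn in counts) / len(counts)


# Illustrative (tp, fp, fn) counts for three classes; not data from the study.
example_counts = [(8, 2, 1), (5, 0, 5), (0, 3, 4)]
mean_f2 = macro_average(f_beta, example_counts)
mean_g2 = macro_average(g_beta, example_counts)
```

With β = 2 both scores penalize false negatives more heavily than false positives, so missed diagnoses cost more than false alarms.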