Aims: ResNet, ResNeXt, ResNeSt, selective kernel (SK) blocks and squeeze-and-excitation (SE) blocks are convolutional neural network (CNN) architectures and building blocks that have exhibited robust performance on several image classification tasks. This study aims to design an ensemble classifier for arrhythmia classification based on these architectures. Methods: 12-lead ECG data for arrhythmia classification were retrieved from the China Physiological Signal Challenge (CPSC) 2018. Each ECG recording was labeled according to its clinical diagnosis as normal sinus rhythm or one of 8 abnormal types (atrial fibrillation, first-degree atrioventricular block, left bundle branch block, premature atrial complex, premature ventricular complex, right bundle branch block, ST-segment depression, and ST-segment elevation). Three 1-dimensional CNNs, based on ResNet, ResNeXt and ResNeSt backbones respectively, were designed, incorporating SE and SK blocks. Each CNN was trained and validated on the CPSC dataset under different hyper-parameter settings. Finally, an XGBoost classifier was trained to ensemble the outputs of all CNNs. The performance of the ensemble classifier was evaluated on a hidden dataset obtained from the PhysioNet Challenge 2020. Results: The overall scores (geometric mean of the F-2 and G-2 scores) of the ResNet-based, ResNeXt-based and ResNeSt-based CNNs were 0.7021, 0.7195 and 0.7265, respectively, on the 10%-split testing set from CPSC. The best-performing ensemble classifier achieved overall scores of 0.744 and 0.708 on the 10%-split testing set from CPSC and on the hidden dataset from the PhysioNet Challenge 2020, respectively. Conclusion: Our study highlights the potential of applying image-based architectures, such as ResNet, ResNeXt, ResNeSt, SE blocks and SK blocks, to arrhythmia classification.
Moreover, the proposed ensemble classifier exhibited superior performance in arrhythmia classification, compared to the performance of each component CNN.
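For readers unfamiliar with the overall score reported above, the following is a minimal sketch of how a geometric mean of F-2 and G-2 scores could be computed from per-class counts. The function names, the macro-averaging over classes, and the exact form of the G-beta score (a Jaccard-like measure that weights false negatives by beta) are assumptions made for illustration; the scoring code actually used in the study may differ, for example in per-recording normalization.

```python
import math

def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives, false negatives."""
    b2 = beta * beta
    denom = (1 + b2) * tp + fp + b2 * fn
    return (1 + b2) * tp / denom if denom else 0.0

def g_beta(tp, fp, fn, beta=2.0):
    """Jaccard-like G-beta score (assumed form): false negatives weighted by beta."""
    denom = tp + fp + beta * fn
    return tp / denom if denom else 0.0

def overall_score(counts, beta=2.0):
    """Geometric mean of macro-averaged F-beta and G-beta.

    counts: list of (tp, fp, fn) tuples, one per class (hypothetical input format).
    """
    f = sum(f_beta(tp, fp, fn, beta) for tp, fp, fn in counts) / len(counts)
    g = sum(g_beta(tp, fp, fn, beta) for tp, fp, fn in counts) / len(counts)
    return math.sqrt(f * g)

if __name__ == "__main__":
    # Illustrative counts for two classes; these are not data from the study.
    example = [(8, 2, 2), (5, 1, 4)]
    print(round(overall_score(example), 4))
```

With beta = 2, both scores penalize a missed diagnosis (false negative) more heavily than a false alarm, which is why the geometric mean of the two is a stricter summary than accuracy alone.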