Aims: Catheter ablation is widely used in clinical practice to treat patients with ventricular tachycardia (VT), but the procedure requires the site of the abnormal potential to be located precisely beforehand. This study aimed to develop a novel encoder-decoder architecture that identifies the exit site of VT using only 12-lead ECGs. Methods: We used 1D convolution layers as the encoder to extract and encode local feature representations of the ECG signals. A bi-directional long short-term memory (LSTM) layer then decoded these local features along the global temporal dimension to obtain global features. We next combined the local and global features and applied a self-attention module to add non-local information. Finally, a multi-layer perceptron mapped the decoded non-local ECG features to the three-dimensional cardiac coordinate system to predict the coordinates of the VT exit site. Results: A dataset of 16,000 records was collected from 40 patients with scar-related VT at 1,000 distinct pacing sites on the left ventricular endocardium during routine pace-mapping. Each record contains a 12-lead ECG spanning one QRS interval and the coordinates of the corresponding exit site. The dataset was split into training, validation and test sets in a 4:3:3 ratio. On the test set, the coordinate localization performance (95% confidence interval) was a mean absolute bias of 5.21 ± 0.23 mm and a mean Euclidean distance error of 9.65 ± 0.31 mm. Conclusion: By combining a CNN and an RNN with an attention mechanism to extract local, global and non-local information, this work provides insights into the non-invasive identification of VT exit sites from 12-lead ECGs alone.
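
The two localization metrics reported in the Results can be illustrated with a minimal Python sketch. The abstract does not define "mean absolute bias", so the definition used here (absolute error averaged over all samples and the three coordinate axes) is an assumption, as are the function names and the sample coordinates, which are hypothetical and not taken from the study.

```python
import math

def mean_euclidean_error(pred, true):
    """Mean Euclidean distance (mm) between predicted and true 3D exit sites."""
    return sum(math.dist(p, t) for p, t in zip(pred, true)) / len(pred)

def mean_absolute_bias(pred, true):
    """Assumed definition: absolute error averaged over samples and the x, y, z axes."""
    total = sum(abs(pc - tc) for p, t in zip(pred, true) for pc, tc in zip(p, t))
    return total / (3 * len(pred))

# Hypothetical coordinates in mm, for illustration only (not data from the study).
true_sites = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
pred_sites = [(3.0, 4.0, 0.0), (10.0, 6.0, 8.0)]

med = mean_euclidean_error(pred_sites, true_sites)  # per-site distances are 5 mm and 10 mm
mab = mean_absolute_bias(pred_sites, true_sites)    # per-axis errors sum to 21 mm over 6 values
print(f"mean Euclidean distance error: {med:.2f} mm")
print(f"mean absolute bias: {mab:.2f} mm")
```

Under this assumed definition, the Euclidean error is always at least as large as the per-axis absolute error, which is consistent with the ordering of the two reported values.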