Aims: Cardiac auscultation is arguably the most cost-effective screening method for a number of heart diseases. Valuable diagnostic information, such as indicators of pulmonary arterial hypertension, can be retrieved by separating the second heart sound (S2) into its two constituent components: one generated by the closure of the aortic valve (A2) and one generated by the closure of the pulmonary valve (P2). However, separating these components is very challenging because of their large time-frequency overlap and their similar morphological signatures. In contrast with previous approaches in the literature, which rely on predetermined waveform models to separate A2 and P2, the proposed approach learns the shapes of A2 and P2 from training data and uses this knowledge to separate the constituents of unseen S2 sounds. Methods: The proposed method is based on the observation of multiple S2 sounds from different heartbeats of a single recording. A joint Gaussian mixture model (GMM) for the A2 and P2 components is learnt from training data. Separation of the two components is then made possible by their different dynamical behavior: A2 is static across heartbeats, whereas the onset of P2 shifts with the respiration phase. This implies that the marginal GMMs modeling A2 and P2 have class-conditioned covariances with different structures, thus allowing efficient separation via a closed-form conditional mean estimator. Results: The proposed approach was tested on synthetic data and compared with a recent approach that estimates A2 by averaging over different S2 sounds in a given recording (Tang et al. (2017)). A reduction of the normalized root mean-squared error exceeding 20% was observed across different operating regimes. The robustness of the proposed method was also tested against deviations from the trained signal model and synchronization inaccuracies in the acquisition of S2 sounds.
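As an illustration of the closed-form conditional mean estimator mentioned in the Methods, the following is a minimal Python sketch, not the authors' implementation. It assumes that the stacked observation is y = a + p + noise, that A2 and P2 are independent Gaussians given the mixture class, and that P2 is independent across beats while A2 is identical in every beat. These modeling choices, as well as the names `separate_gmm_sum` and `stacked_covariances` and all numerical values, are hypothetical and chosen only to show how differently structured covariances make the separation possible.

```python
import numpy as np


def separate_gmm_sum(y, weights, mu_a, mu_p, cov_a, cov_p, noise_var=0.0):
    """Conditional-mean estimates of a and p given y = a + p + noise.

    Assumption (not from the source): given class k, a and p are independent
    Gaussians N(mu_a[k], cov_a[k]) and N(mu_p[k], cov_p[k]), and the noise is
    white with variance noise_var.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    num_classes = len(weights)
    log_post = np.empty(num_classes)
    a_hat_k = np.empty((num_classes, n))
    p_hat_k = np.empty((num_classes, n))
    for k in range(num_classes):
        cov_y = cov_a[k] + cov_p[k] + noise_var * np.eye(n)
        resid = y - mu_a[k] - mu_p[k]
        solved = np.linalg.solve(cov_y, resid)
        _, logdet = np.linalg.slogdet(cov_y)
        # Log posterior of class k, up to a constant shared by all classes.
        log_post[k] = np.log(weights[k]) - 0.5 * (logdet + resid @ solved)
        # Per-class linear (Wiener-type) estimates of each component.
        a_hat_k[k] = mu_a[k] + cov_a[k] @ solved
        p_hat_k[k] = mu_p[k] + cov_p[k] @ solved
    # Normalize class posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-weighted combination of the per-class estimates.
    return post @ a_hat_k, post @ p_hat_k, post


def stacked_covariances(c_a, c_p, num_beats):
    """Toy covariance structure for num_beats stacked S2 segments.

    Assumption: A2 is identical in every beat (fully correlated blocks),
    while P2 is treated as independent across beats (block diagonal).
    """
    cov_a = np.kron(np.ones((num_beats, num_beats)), c_a)
    cov_p = np.kron(np.eye(num_beats), c_p)
    return cov_a, cov_p


# Toy usage with one class, 3 beats, and 4 samples per beat.
rng = np.random.default_rng(0)
num_beats, seg_len = 3, 4
c_a = 2.0 * np.eye(seg_len)
c_p = 0.5 * np.eye(seg_len)
cov_a, cov_p = stacked_covariances(c_a, c_p, num_beats)
n = num_beats * seg_len
a_true = np.tile(rng.normal(scale=np.sqrt(2.0), size=seg_len), num_beats)
p_true = rng.normal(scale=np.sqrt(0.5), size=n)
y = a_true + p_true
a_hat, p_hat, post = separate_gmm_sum(
    y, [1.0], [np.zeros(n)], [np.zeros(n)], [cov_a], [cov_p]
)
```

In this single-class toy setting the A2 estimate is the same in every beat and equals a scaled average of the stacked beats, while the P2 estimate absorbs the beat-to-beat variation. With several classes, the class posteriors weight the per-class estimates.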