Research on human speech production depends heavily on information about the position and movement of the speech articulators. Dynamic magnetic resonance imaging (MRI) has been the main tool for obtaining this information. With this technique, image sequences can be acquired during speech, which makes it possible to identify vocal tract shapes in real time. However, the spatial and temporal resolution requirements are not known a priori and are expected to vary with the speech task. Several available approaches enhance resolution either by changing the acquisition process of current devices or by replacing the acquisition devices with more powerful ones. Both solutions involve additional hardware costs. In this paper, we propose an evolution of an approach that enhances the spatio-temporal resolution of MRI sequences of the vocal tract using only digital image processing techniques. On the one hand, temporal resolution is increased by generating intermediate images according to the movement present in an observed sequence. On the other hand, spatial resolution is increased by applying a novel approach to super-resolution image reconstruction based on the Wiener filter. To evaluate the proposed approach, we processed a sequence of five simulated low-resolution images. Comparison with available methods provides evidence of the effectiveness of the proposed method.
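
For readers unfamiliar with the Wiener filter mentioned above, the following is a minimal sketch of classical frequency-domain Wiener restoration of a single blurred image. It is not the super-resolution method proposed in this paper, which the abstract does not detail. The function names `psf_to_otf` and `wiener_restore`, the 3x3 box blur, and the constant `noise_to_signal` are all assumptions made for illustration, and the blur is assumed to be a circular convolution.

```python
import numpy as np


def psf_to_otf(psf, shape):
    """Zero-pad a small blur kernel to `shape` and return its 2-D FFT.

    The kernel centre is rolled to index (0, 0) so that filtering in the
    frequency domain does not shift the image.
    """
    padded = np.zeros(shape, dtype=float)
    kh, kw = psf.shape
    padded[:kh, :kw] = psf
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def wiener_restore(observed, psf, noise_to_signal=1e-2):
    """Estimate the sharp image from `observed` with a Wiener filter.

    `noise_to_signal` is a hypothetical constant standing in for the ratio
    of noise power to signal power, which is normally estimated per frequency.
    """
    H = psf_to_otf(psf, observed.shape)
    G = np.fft.fft2(observed)
    # Wiener filter: W = conj(H) / (|H|^2 + noise-to-signal ratio)
    W = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)
    return np.real(np.fft.ifft2(W * G))


# Illustrative data: a random 32x32 "image" blurred by a 3x3 box kernel.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
psf = np.full((3, 3), 1.0 / 9.0)
blurred = np.real(np.fft.ifft2(np.fft.fft2(image) * psf_to_otf(psf, image.shape)))
restored = wiener_restore(blurred, psf, noise_to_signal=1e-6)
```

In this noise-free toy setting, a small `noise_to_signal` makes the filter behave almost like an inverse filter. With real, noisy MRI data a larger value would be needed to avoid amplifying noise at frequencies where the blur response is weak.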