The level of participation in social interactions has been shown to affect various health outcomes, and it also reflects overall wellbeing. In the health sciences, the standard practice for measuring the amount of social activity relies on periodic self-reports, which are affected by memory dependence, recall bias, and the respondent's current mood. Sensor-based detection of social interactions has the potential to overcome these limitations of self-reporting methods, which have been used for decades in the health-related sciences. However, current systems have mainly relied either on external infrastructure, which confines them to specific locations, or on specialized devices that are typically not available off the shelf. Mobile phone based solutions, on the other hand, are often limited in accuracy or in their ability to capture social interactions that occur on small temporal and spatial scales. The work presented in this paper relies on widely available mobile sensing technologies, namely smartphones, used to recognize the spatial settings between subjects, and the accelerometer, used to identify speech activity. We evaluate the two sensing modalities both separately and in fusion, demonstrating high accuracy in detecting social interactions on a small spatio-temporal scale.