This work presents an approach to modelling daily life contexts from web-collected audio data. Available in vast quantities from many different sources, audio data from the web provides heterogeneous training data for constructing recognition systems. Crowd-sourced textual descriptions (tags) associated with individual sound samples were used in a configurable recognition system to model 23 sound context categories. We analysed our approach with different outlier filtering techniques, using dedicated recordings of all 23 categories and a study comprising 230 hours of full-day recordings collected from 10 participants using smartphones. Depending on the outlier filtering technique, our system achieved recognition accuracies between 51% and 80%.