Many of today's large-scale scientific projects attempt to collect data from a diverse set of sources. The traditional campaign-style approach to "synthesis" efforts gathers data through a single concentrated effort, and the data contributors know in advance exactly who will use their data and why. At even moderate scales, the cost and time required to find, gather, collate, normalize, and customize data in order to build a synthesis dataset can quickly outweigh the value of the resulting dataset. By explicitly identifying and addressing the different requirements for each data role (author, publisher, curator, and consumer), our data management architecture for large-scale shared scientific data enables the creation of such synthesis datasets that continue to grow and evolve with new data, data annotations, participants, and use rules. We show the effectiveness of our approach in the context of the FLUXNET Synthesis Dataset, one of the largest ongoing biogeophysical experiments.