We propose a framework, illustrated in Fig. 1, to support researchers in identifying, extensively comparing and benchmarking multiple workflows composed from individual bioinformatics tools. The framework is most useful when a well-annotated tool collection yields workflows that give similar but not identical results, and it makes it possible to identify operational or functional bottlenecks in proteomics data analysis, and more generally in any use case involving multiple bioinformatics operations. Concretely, we used the PROPHETS automatic workflow composition platform [1], [2] to explore the workflows that can be composed from a selection of public-domain analysis tools listed on ms-utils.org (https://ms-utils.org/) or registered in the ELIXIR Tools and Data Services Registry (https://bio.tools) [3]. Composition was facilitated by the tools' semantic annotation with terms from the EDAM ontology [4].
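The principle behind constraint-driven composition can be sketched in a few lines of Python. The following is a simplified illustration under our own assumptions, not the actual PROPHETS interface: each tool carries EDAM-style operation and format annotations, and candidate workflows are chains whose adjacent tools have compatible output and input formats and whose final step performs the requested operation. All tool names, formats and operation terms below are placeholders.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Tool:
    name: str
    operation: str        # EDAM operation term (simplified to a plain string)
    inputs: frozenset     # EDAM format terms the tool accepts
    outputs: frozenset    # EDAM format terms the tool produces

# Purely illustrative annotations; names, formats and operations are placeholders.
TOOLS = [
    Tool("search_engine", "Peptide identification",
         frozenset({"mzML"}), frozenset({"mzIdentML"})),
    Tool("converter", "Format conversion",
         frozenset({"mzIdentML"}), frozenset({"Tabular"})),
    Tool("quantifier", "Protein quantification",
         frozenset({"Tabular"}), frozenset({"Tabular"})),
]

def compose(start_format: str, goal_operation: str, max_len: int = 3):
    """Enumerate tool chains that start from `start_format`, connect via
    compatible output/input formats, and end with the requested operation."""
    chains = []
    for n in range(1, max_len + 1):
        for candidate in permutations(TOOLS, n):
            fmt, ok = start_format, True
            for tool in candidate:
                if fmt not in tool.inputs:
                    ok = False
                    break
                fmt = next(iter(tool.outputs))  # simplification: one output format
            if ok and candidate[-1].operation == goal_operation:
                chains.append([t.name for t in candidate])
    return chains

print(compose("mzML", "Protein quantification"))
# -> [['search_engine', 'converter', 'quantifier']]
```

In the real setting, PROPHETS performs this search over the annotated tool collection and user-specified constraints rather than over hard-coded placeholders as in this sketch.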
Fig. 1. Schematic outline of the workflow composition: PROPHETS suggests workflows for a selection of tools from ms-utils.org and bio.tools, annotated with terms from the EDAM ontology, under constraints such as requested operations or input and output formats. The resulting workflows are implemented and tested on public data.
To demonstrate the practical use of our framework, we implemented, executed and compared a number of logically and semantically equivalent workflows addressing four use cases that represent frequent tasks in MS-based proteomics: peptide retention time prediction, protein identification and enrichment analysis, localization of phosphorylation, and protein quantitation using isotopic labeling. For all use cases, the different workflows produced results that differed at least slightly. Our assessment of reproducibility tested the robustness of both the data (do different experimental measurements lead to the same end results?) and the analysis (do different tool combinations give similar end results?). This strongly suggests that as many workflows as feasible should be benchmarked on “ground truth” data to identify optimal tool combinations. With the approach presented here, it will now be possible to compare many new pipeline instances to commonly used workflows on benchmarking data sets, and thereby to identify the best-suited alternatives for specific groups of operations as well as for specific data types (e.g. from different MS instruments). Benchmarking results may, however, differ between data types, for example between experimental setups or biological sources. Substantial community efforts will therefore still be required to create diverse sets of ground truth data that allow for generalized conclusions.
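As a rough illustration of how semantically equivalent workflows can be compared, one can quantify the overlap between their result sets, for example the protein accessions reported by two identification workflows on the same input data. The snippet below is a minimal sketch under our own assumptions (placeholder accessions, no format-specific parsing of mzIdentML or tabular reports), not the comparison procedure used in [5].

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index of two result sets: 1.0 = identical, 0.0 = disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Placeholder protein accessions reported by two workflows for the same input.
workflow_a = {"P12345", "P67890", "Q9Y6K9"}
workflow_b = {"P12345", "P67890", "O75475"}

print(f"Result overlap (Jaccard): {jaccard(workflow_a, workflow_b):.2f}")  # 0.50
```

The same kind of overlap measure, computed against a ground truth data set rather than between workflows, is one simple way to rank tool combinations in a benchmark.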
The work is described in greater detail in [5]. The project files and workflows are available from [6].