Zoom*UserViews: Objectives
Scientific experiments are becoming increasingly large and complex, with a commensurate increase in the amount and complexity of data generated. Data, both intermediate and final results, are derived by chaining and nesting multiple database searches and analytical tools. In many cases, the means by which the data were produced are not known, making the data difficult to interpret and the experiment impossible to reproduce. Provenance in scientific workflows is thus of paramount importance.
Objectives
- Zooming in on provenance through user views
- Constructing relevant user views
- Designing a workflow generator
- Computing workflow difference
Zooming in on provenance through user views
We present a formal model of provenance for scientific workflows which is general (i.e., it can be used with existing workflow systems such as Kepler, myGrid and Chimera) and sufficiently expressive to answer the provenance queries encountered in a number of case studies. Our model not only takes into account the chained and complex structure of scientific workflows, but also allows users to see the workflow at different levels of abstraction by means of user views. User views can be used to vary the level of detail presented in response to provenance queries. Based on this model, we have developed a prototype, ZOOM*UserViews. We used this prototype in the first Provenance Challenge, where we discussed the design and implementation of ZOOM in the context of the queries posed by the challenge and showed how user views affect the level of granularity at which provenance information can be seen and reasoned about.
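To make the idea of varying granularity concrete, here is a minimal sketch in Python. It is not the formal ZOOM model: the run, the step names, the `COARSE_VIEW` mapping and both helper functions are hypothetical, and a user view is simplified to a mapping from steps to composite modules. The same provenance query is answered once on the raw run and once on the run seen through the view.

```python
from collections import defaultdict

# Hypothetical run: each edge is (producing step, data item, consuming step).
RUN_EDGES = [
    ("input", "d1", "format"),
    ("format", "d2", "align"),
    ("align", "d3", "reformat"),
    ("reformat", "d4", "build_tree"),
    ("build_tree", "d5", "output"),
]

# Hypothetical user view: steps mapped to the composite module that hides them.
COARSE_VIEW = {
    "format": "Alignment",
    "align": "Alignment",
    "reformat": "Tree",
    "build_tree": "Tree",
}


def apply_view(edges, view):
    """Collapse steps into composites; data passed inside one composite is hidden."""
    collapsed = []
    for src, data, dst in edges:
        view_src, view_dst = view.get(src, src), view.get(dst, dst)
        if view_src != view_dst:
            collapsed.append((view_src, data, view_dst))
    return collapsed


def upstream_steps(edges, data_item):
    """Return every step that contributed, directly or transitively, to data_item."""
    produced_by = {data: src for src, data, _ in edges}
    inputs_of = defaultdict(set)
    for _, data, dst in edges:
        inputs_of[dst].add(data)
    seen, stack = set(), [data_item]
    while stack:
        step = produced_by.get(stack.pop())
        if step is None or step in seen:
            continue
        seen.add(step)
        stack.extend(inputs_of[step])
    return seen


fine = upstream_steps(RUN_EDGES, "d5")
coarse = upstream_steps(apply_view(RUN_EDGES, COARSE_VIEW), "d5")
```

In this toy run, `fine` lists every individual step behind `d5`, while `coarse` lists only the composite modules, so the formatting steps no longer appear in the answer.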
Constructing relevant user views
In this project, our goal is to help scientists construct user views in which reformatting tasks within the workflow are hidden and the tasks they are interested in (the “relevant” tasks) remain visible. Provenance information can then be examined centered around relevant tasks. We have developed a notion of what a good user view is with respect to a given set of relevant tasks within a workflow, and an algorithm for generating a good user view.
This algorithm is also included in the Zoom*UserViews prototype.
Designing a workflow generator
Evaluating techniques involving scientific workflows is challenging since it requires realistic workflow specifications and runs on which to base the experiments. However, as with database schemas, scientific workflow specifications are often confidential and shared only with a small group of collaborators. Furthermore, there are no incentives to motivate scientists to share their specifications outside of publications (text) which describe the scientific result and loosely describe the means by which results were obtained. It is consequently rare to find a publicly available, well-defined scientific workflow. We have therefore collected and analyzed roughly 30 workflow specifications to extract common patterns (such as looping or sequential execution), and used these patterns to develop a synthetic generator which allows the user to generate arbitrarily complex scientific workflow specifications.
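As an illustration of pattern-based generation, the following Python sketch composes a specification recursively from the two patterns named above, sequential execution and looping. It is only a toy under stated assumptions and not the generator shipped with the prototype: the nested-tuple representation, the function names `generate_spec` and `task_names`, and the `depth` parameter are all hypothetical.

```python
import random
from itertools import count


def generate_spec(rng, depth, ids=None):
    """Recursively compose a specification from 'task', 'seq' and 'loop' patterns.

    Assumption: a specification is a nested tuple, either ("task", name),
    ("seq", [child, ...]) or ("loop", child).
    """
    ids = ids if ids is not None else count(1)
    if depth == 0:
        return ("task", f"t{next(ids)}")
    pattern = rng.choice(["seq", "loop"])
    if pattern == "seq":
        length = rng.randint(2, 4)
        return ("seq", [generate_spec(rng, depth - 1, ids) for _ in range(length)])
    return ("loop", generate_spec(rng, depth - 1, ids))


def task_names(spec):
    """List the names of all atomic tasks in a specification, left to right."""
    kind, body = spec
    if kind == "task":
        return [body]
    if kind == "loop":
        return task_names(body)
    return [name for child in body for name in task_names(child)]


# A seeded generator makes the synthetic specification reproducible.
spec = generate_spec(random.Random(0), depth=3)
names = task_names(spec)
```

Increasing `depth` yields more deeply nested specifications, and passing a seeded `random.Random` keeps an experiment repeatable.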
The workflow generator is available from the Tools menu of the prototype.
Computing workflow difference
Since a scientific workflow is an in-silico experiment, there are typically two phases to its use: an initial phase in which the workflow specification evolves, and a second phase in which the specification is stable but many runs are made using different inputs and parameters. In both phases, it is important to understand the difference between two runs (i.e., which parts of the executions differed, as well as the difference in parameters) in order to understand why results differed between the runs. We are developing a notion of edit operations between runs, and algorithms to find the smallest edit script between two runs.
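To illustrate what a smallest edit script between two runs can look like, here is a simplified Python sketch. It is not the project's algorithm: it assumes that a run can be flattened into a sequence of step labels (real runs are graphs) and substitutes the classic insert/delete/replace string edit distance for the edit operations under development. The function name `edit_script` and the example runs are hypothetical.

```python
def edit_script(run_a, run_b):
    """Return a minimum-length list of edits turning run_a into run_b.

    Assumption: each run is a list of step labels, and the allowed edits are
    ("insert", step), ("delete", step) and ("replace", old_step, new_step).
    """
    n, m = len(run_a), len(run_b)
    # cost[i][j] = fewest edits turning run_a[:i] into run_b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = run_a[i - 1] == run_b[j - 1]
            cost[i][j] = min(
                cost[i - 1][j] + 1,                      # delete a step of run_a
                cost[i][j - 1] + 1,                      # insert a step of run_b
                cost[i - 1][j - 1] + (0 if same else 1), # keep or replace
            )
    # Walk back through the table to recover one optimal script.
    script, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and run_a[i - 1] == run_b[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + 1:
            script.append(("replace", run_a[i - 1], run_b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            script.append(("delete", run_a[i - 1]))
            i -= 1
        else:
            script.append(("insert", run_b[j - 1]))
            j -= 1
    script.reverse()
    return script


# Two hypothetical runs that differ in one parameter and one extra step.
run_1 = ["fetch", "align(gap=5)", "build_tree"]
run_2 = ["fetch", "align(gap=8)", "filter", "build_tree"]
script = edit_script(run_1, run_2)
```

Several scripts of the same minimal length may exist, and this sketch returns only one of them. A notion of difference designed for workflow runs would presumably choose operations that respect the structure of the workflow rather than treating steps as a flat sequence.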
People
Current team
- Zhuowei Bao
- Olivier Biton
- Shirley Cohen
- Sarah Cohen Boulakia
- Susan Davidson
- Anat Eyal
- Carmem Hara
- Sanjeev Khanna
Previous members
- Thunyarat (Bam) Amornpetchkul
Sponsors

This work is supported by the National Science Foundation* under Grant No. 0612177.
*Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.