University of Massachusetts, Amherst
Mount Holyoke College
Technology continues to change the way that scientists work. Ubiquitous sensors and wireless networks enable the collection of vast quantities of data at a very fast rate. Scientific programs, ranging from Excel spreadsheets to supercomputer applications, manipulate the collected data to produce scientific results. Scientists can then disseminate both the raw and processed data quickly and to a broad, unknown audience by publishing it on their websites.
Good science requires more than results. It requires reproducibility, verifiability and authentication. Reproducibility is necessary to ensure that the results are not an accidental outcome, but the result of genuine, carefully performed experimentation and analysis. Verifiability is necessary to ensure that the results really did derive from the data, even if reproducing the experiment is not a viable option. Finally, authentication is necessary to establish that the raw data used in the scientific work is itself valid. Without confidence on these three points, data posted on the Internet is no more credible than a typical Wikipedia article. Given the pace at which sensors produce data and programs manipulate it, documenting the data's provenance must itself be automated.
We are working with researchers at Harvard Forest in Petersham, Massachusetts to explore how to capture and query the data they collect from a variety of sensors. These sensors allow hydrologists to measure the movement of water through an ecosystem, accounting for precipitation, evaporation and stream flow. We are using Little-JIL, a process language, to improve the credibility of scientific data on the Internet. A scientific process is described in Little-JIL, and information is captured during execution of the process to document the data's provenance: where the data came from, how it was manipulated prior to its dissemination, who was involved, and when.
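To make the idea of captured provenance concrete, the following Python sketch records, for each processing step, which data went in, what came out, who ran the step, and when. It is only an illustration under stated assumptions: it is not Little-JIL and does not reflect the project's actual implementation, and every name in it (ProvenanceLog, run_step, the step functions, and the data labels) is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class ProvenanceRecord:
    """One entry answering: what was done, by whom, to which data, and when."""
    step: str
    agent: str
    inputs: List[str]
    output: str
    timestamp: str


@dataclass
class ProvenanceLog:
    """Hypothetical log that records provenance as a side effect of running steps."""
    records: List[ProvenanceRecord] = field(default_factory=list)

    def run_step(self, step: str, agent: str, func: Callable[..., Any],
                 inputs: Dict[str, Any], output_name: str) -> Any:
        # Run the step on the named inputs, then record how the output was derived.
        result = func(*inputs.values())
        self.records.append(ProvenanceRecord(
            step=step,
            agent=agent,
            inputs=list(inputs.keys()),
            output=output_name,
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))
        return result


def drop_missing(readings):
    """Remove readings the sensor failed to report (illustrative quality-control step)."""
    return [r for r in readings if r is not None]


def mean(values):
    """Average the remaining readings."""
    return sum(values) / len(values)


log = ProvenanceLog()
raw = [2.0, None, 4.0]  # made-up stream flow readings, one of them missing
cleaned = log.run_step("drop_missing", "qc_script", drop_missing,
                       {"raw_stream_flow": raw}, "cleaned_stream_flow")
average = log.run_step("mean", "analyst", mean,
                       {"cleaned_stream_flow": cleaned}, "mean_stream_flow")

for record in log.records:
    print(record)
```

Because the log is filled in by the same call that runs each step, the documentation cannot drift from what was actually executed, which is the motivation for automating provenance capture.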
A wireless sensor network is currently under development for measuring real-time ecosystem water flux at the Harvard Forest Long-Term Ecological Research (LTER) site in Petersham, Massachusetts, USA. This system will integrate ongoing meteorological, hydrological, eddy flux, and tree physiological measurements. Simultaneous measurements in adjoining small watersheds will enable researchers to study variations in water flux caused by differences in topography, soils, vegetation, land use, and natural disturbance history. Frequent sampling will enable study of water flux dynamics at a wide range of temporal scales, from minutes to observe the response of evapotranspiration to light, to days to observe the response of ground water to precipitation and snow melt, to years to observe the response of an ecosystem to climate, reforestation, land use, and natural disturbance.
Little-JIL, a process programming language developed in the LASER research lab at the University of Massachusetts, Amherst, is being used to provide the coordination of the various people and software tools involved in the collection, processing and dissemination of the sensor data. Little-JIL is a graphical language designed to integrate tools and people working in a distributed computing environment, with strong support for abstraction, exception handling and resource management. Support for the collection and querying of provenance data is underway. With this support, the data published to the Internet can be backed up with provenance data that can be examined to further enhance and validate the scientific results.
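As a rough illustration of what querying provenance data could look like, the Python sketch below stores derivation relationships as a small graph and answers the question "which raw data does this published result depend on?". The graph contents, the data labels, and the functions lineage and raw_sources are all assumptions made for this example; they are not the project's query interface.

```python
from collections import deque

# Hypothetical derivation graph: each data item maps to the items it was computed from.
derived_from = {
    "published_water_flux": ["calibrated_stream_flow", "gap_filled_precipitation"],
    "calibrated_stream_flow": ["raw_stream_flow"],
    "gap_filled_precipitation": ["raw_precipitation"],
    "raw_stream_flow": [],
    "raw_precipitation": [],
}


def lineage(item, graph):
    """Return every item that `item` was directly or indirectly derived from."""
    seen = set()
    order = []
    queue = deque(graph.get(item, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # an item reachable by two paths is reported once
        seen.add(current)
        order.append(current)
        queue.extend(graph.get(current, []))
    return order


def raw_sources(item, graph):
    """Return the ancestors of `item` that have no inputs of their own."""
    return sorted(a for a in lineage(item, graph) if not graph.get(a))


print(lineage("published_water_flux", derived_from))
print(raw_sources("published_water_flux", derived_from))
```

A reader who doubts a published value can follow the same traversal by hand: start at the published item and walk backwards until only raw sensor data remains.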
API for working with DDGs
Barbara Lerner, Emery Boose, Leon Osterweil, Aaron Ellison and Lori Clarke, "Provenance and Quality Control in Sensor Networks", Environmental Information Management 2011 Conference, Santa Barbara, California, September 2011. (Abstract) (Paper (pdf))
Xiang Zhao, Barbara Lerner, Leon Osterweil, Emery Boose, Aaron Ellison, "Provenance Support for Rework", 4th USENIX Workshop on the Theory and Practice of Provenance (TaPP '12), Cambridge, Massachusetts, June 2012. (Abstract) (Paper (pdf))
This project was presented at the New England Undergraduate Computing Symposium (NEUCS'10).
Corietta L. Teshera-Sterne, A Software Engineering Approach to Scientific Data Management, May 2010.
Sofiya Taskova, Capturing, Persisting and Querying the Provenance of Scientific Data, Honors Thesis, May 2012.
Miruna Oprescu, Visualization Tools for Digital Dataset Derivation Graphs, Summer 2012 REU Student
Yujia Zhou, Trees and Bugs in Computers, Summer 2012 REU Student
Snickers, The Blog of an Ecologist Dog, Summer 2012 REU Mascot
If you are an undergraduate interested in an interdisciplinary project involving computer science and ecology, join us for the 2012 REU at Harvard Forest!
This material is based upon work supported by the National Science Foundation under Awards No. CCR-0205575, CCR-0427071, and IIS-0705772, the National Science Foundation REU grant DBI-0452254 and also the Mount Holyoke Center for the Environment Summer 2009 Leadership Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or Mount Holyoke College.