Instructors: Dr. Laura Bright
Course Description:
Scientists today face an avalanche of data. Oceanographers generate terabytes of data from daily forecasts of temperature, elevation, and velocity. Astronomers acquire hundreds of millions of images from increasingly powerful telescopes. Physicists are already discussing petabyte-scale datasets collected from particle accelerators. Biologists have sequenced the human genome, itself a large dataset, and are now describing the complex interactions among all 20,000 to 80,000 protein-encoding genes, not to mention the interactions among the proteins they encode. In all cases, scientists' ability to collect data has outpaced their ability to manage it. Add non-standard data types, extreme performance demands, and ever-changing requirements, and you have one of the major data management challenges of today.
What do these applications have in common, and what new challenges do they present? In this course, we will investigate these questions from the perspective of modern database research. We will survey the literature in this area and work with practical tools such as the Kepler workflow system, the Visualization Toolkit, and relational databases.
The instructors will emphasize case studies from specific domains. We will work with online resources such as the Sloan Digital Sky Survey (SDSS), the Northwest Association of Networked Ocean Observing Systems (NANOOS), and biological databases such as Swiss-Prot/TrEMBL.
The instructors will also emphasize overarching computer science themes that stand out in each of these domains: relational, object-relational, and non-relational databases; grid computing; workflow systems; data models; visualization; and metadata management.