Thursday, 15 December 2011

Scientific Data Management

Instructors:  Dr. Laura Bright

Course Description:

Scientists today face an avalanche of data. Oceanographers generate terabytes of data with daily forecasts of temperature, elevation, and velocity. Astronomers acquire hundreds of millions of images from increasingly powerful telescopes. Physicists are already discussing petabyte-scale datasets collected from particle accelerators. Biologists have sequenced the human genome, itself a large dataset, and are now describing the complex interactions among all 20,000 to 80,000 protein-encoding genes, not to mention the interactions among the proteins they encode. In all cases, scientists' ability to collect data has outpaced their ability to manage it. Add non-standard data types, extreme performance demands, and ever-changing requirements, and you have one of the major data management challenges of today.
What do these applications have in common, and what new challenges do they present? In this course, we will investigate this question from the perspective of modern database research. We will survey the literature in this area and work with practical tools, such as the Kepler workflow system, the Visualization Toolkit, and relational databases.
The instructors will emphasize case studies from specific domains. We will work with online resources such as the Sloan Digital Sky Survey (SDSS), the Northwest Association of Networked Ocean Observing Systems (NANOOS), and biological databases such as Swiss-Prot/TrEMBL.
The instructors will also emphasize overarching computer science themes that stand out in each of these domains: relational, object-relational, and non-relational databases; grid computing; workflow systems; data models; visualization; metadata management.

Schedule (subject to change!)

Week/Day | Lecture | Reading Due | Assignment Handed Out | Assignment Due
1/Thur: 7/6 | Lecture 1 (ppt) | None | None | None
2/Tue: 7/11 | Intro to Relational Databases (ppt) | Ramakrishnan and Gehrke 3.1, 3.2, 3.4, 4.1, 4.2, 5.1, 5.2 (Handout) | Homework 1 (sample answers) | None
2/Thur: 7/13 | Relational 2 (ppt); Lecture 3 (ppt) | Working with Scientists; Section 9 from SkyServer | None | Study Questions 1
3/Tue: 7/18 | Web Services (ppt) | Service-Oriented Science | Paper Milestone 1 | Homework 1 (sample answers)
3/Thur: 7/20 | Introduction to XML (ppt) | XML Tutorial; XSLT vs XQuery | - | -
4/Tue: 7/25 | Geographic Information Systems and Spatial Databases (ppt) | A Survey on Multidimensional Access Methods | - | Study Questions 2
4/Thur: 7/27 | Introduction to Computational Biology (ppt); Homework 1 review (ppt) | O'Reilly BLAST C2 (handout) | Paper Milestone 2 | Paper Milestone 1
5/Tue: 8/1 | Biology Applications with Web Services (ppt) | O'Reilly BLAST C3, C5 (handout) | - | Study Questions 3
5/Thur: 8/3 | Scientific Workflows (ppt) | Kepler User Guide | Homework 2 (Kepler Exercise) (html) | -
6/Tue: 8/8 | Scientific Workflows (ppt) | Kepler | - | Study Questions 4
6/Thur: 8/10 | Grid Computing (ppt) | Computational Grids | Kepler Mini-Project (html) | Homework 2
7/Tue: 8/15 | Data Grids (ppt) | Taxonomy | - | Study Questions 5
7/Thur: 8/17 | Images and Multidimensional Arrays | Iterator-based Prefetching OR Tiling Arrays | Paper Milestone 3 | Paper Milestone 2
8/Tues: 8/22 | Scientific Visualization (ppt) | VisTrails | Homework 3 (Visualization with VTK and Python) | Study Questions 6
8/Thurs: 8/24 | High-dimensional Data (3MB) (ppt - 13MB) | HD Access Methods | - | Kepler Mini-project
9/Tue: 8/29 | Resource Description Framework (RDF) | - | - | -
9/Thur: 8/31 | Ontologies in Science (ppt) | Science and the Semantic Web; Brief OWL Intro | - | Homework 3
10/Tue: 9/5 | Metadata | - | Final Paper; Homework 4 (Protege Exercise) | Paper Milestone 3
10/Thur: 9/7 | Provenance and Lineage | Provenance Survey | - | Study Questions 7
11/Tue: 9/12 | TBA | TBA | TBA | Homework 4
11/Thur: 9/14 | TBA | TBA | - | Final Paper

 

 
