TOP READERS PPT: Scientific Data Management

Instructors: Dr. Laura Bright

Course Description:

Scientists today face an avalanche of data. Oceanographers generate terabytes with daily forecasts of temperature, elevation, and velocity. Astronomers acquire hundreds of millions of images from increasingly powerful telescopes. Physicists are already discussing petabyte-scale datasets collected from particle accelerators. Biologists have sequenced the human genome, itself a large dataset, and are now describing the complex interactions between all 20,000 to 80,000 protein-encoding genes, not to mention the interactions between the proteins they encode. In all cases, scientists' ability to collect data has outpaced their ability to manage it. Complicate matters with non-standard data types, extreme performance demands, and ever-changing requirements, and you have one of the major data management challenges of today.
What do these applications have in common, and what new challenges do they present? In this course, we will investigate this question from the perspective of modern database research. We will survey the literature in this area and work with practical tools, such as the Kepler workflow system, the Visualization Toolkit, and relational databases.
The instructors will emphasize case studies from specific domains. We will work with online resources such as the Sloan Digital Sky Survey (SDSS), the Northwest Association of Networked Ocean Observing Systems (NANOOS), and biological databases such as Swiss-Prot/TrEMBL.
The instructors will also emphasize overarching computer science themes that stand out in each of these domains: relational, object-relational, and non-relational databases; grid computing; workflow systems; data models; visualization; metadata management.

Schedule (subject to change!)

Week/Day	Lecture	Reading Due	Assignment Handed Out	Assignment Due
1/Thur: 7/6	Lecture 1 (ppt)	None	None	None
2/Tue: 7/11	Intro to Relational Databases (ppt)	Ramakrishnan and Gehrke 3.1, 3.2, 3.4, 4.1, 4.2, 5.1, 5.2 (Handout)	Homework 1 (sample answers)	None
2/Thur: 7/13	Relational 2 (ppt) Lecture 3 (ppt)	Working with Scientists; Section 9 from SkyServer	None	Study Questions 1
3/Tue: 7/18	Web Services (ppt)	Service-Oriented Science	Paper Milestone 1	Homework 1 (sample answers)
3/Thur: 7/20	Introduction to XML (ppt)	XML Tutorial, XSLT vs XQuery
4/Tue: 7/25	Geographic Information Systems and Spatial Databases (ppt)	A Survey on Multidimensional Access Methods		Study Questions 2
4/Thur: 7/27	Introduction to Computational Biology (ppt) Homework 1 review (ppt)	O'Reilly BLAST C2 (handout)	Paper Milestone 2	Paper Milestone 1
5/Tue: 8/1	Biology Applications with Web Services (ppt)	O'Reilly BLAST C3,C5 (handout)		Study Questions 3
5/Thur: 8/3	Scientific Workflows (ppt)	Kepler User Guide	Homework 2 (Kepler Exercise) (html)
6/Tue: 8/8	Scientific Workflows (ppt)	Kepler		Study Questions 4
6/Thur: 8/10	Grid Computing (ppt)	Computational Grids	Kepler Mini-Project (html)	Homework 2
7/Tue: 8/15	Data Grids (ppt)	Taxonomy		Study Questions 5
7/Thur: 8/17	Images and Multidimensional Arrays	Iterator-based Prefetching OR Tiling Arrays	Paper Milestone 3	Paper Milestone 2
8/Tue: 8/22	Scientific Visualization (ppt)	VisTrails	Homework 3 (Visualization with VTK and Python)	Study Questions 6
8/Thur: 8/24	High-dimensional Data (3MB) (ppt - 13MB)	HD Access Methods		Kepler Mini-Project
9/Tue: 8/29	Resource Description Framework (RDF)
9/Thur: 8/31	Ontologies in Science (ppt)	Science and the Semantic Web; Brief OWL Intro		Homework 3
10/Tue: 9/5	Metadata		Final Paper; Homework 4 (Protege Exercise)	Paper Milestone 3
10/Thur: 9/7	Provenance and Lineage	Provenance Survey		Study Questions 7
11/Tue: 9/12	TBA	TBA	TBA	Homework 4
11/Thur: 9/14	TBA	TBA		Final Paper

Thursday, 15 December 2011
