This course will introduce graduate students in molecular biology and the life sciences to major concepts in statistical analysis and machine learning as applied to large biological datasets. We will cover algorithms for clustering, learning binary classifiers, regression, and testing for enrichment of gene/protein features within experimentally defined gene lists. The class will be organized around four problem sets, each covering one of these four topics. These problem sets will require programming in the R statistical language; students without programming experience are advised to take "A practical course in programming for biologists". Classes and problem sets will focus on the practical aspects of applying these algorithms to data: in most cases students will use existing implementations of these algorithms rather than re-implementing them. However, some theoretical concepts will be discussed in class.
Evaluation: Students will be graded based on four problem sets.
Coordinators: Dr. Quaid Morris and Dr. Alan Moses
Enrollment: Class size is limited to 24 students (12 from Molecular Genetics and 12 from Cell & Systems Biology)
Prerequisites: Programming experience (or willingness to learn R programming outside of class) and a course in statistics