The biological sciences are becoming increasingly information-intensive and data-rich. For example, the growing availability of DNA sequence data and human clinical measurements promises a better understanding of important questions in biology. However, the complexity and high dimensionality of these data make it difficult to extract underlying mechanisms from them. Machine learning techniques are well suited to such questions because they provide a mathematical framework for analyzing large, complex biological datasets. In turn, the unique computational and mathematical challenges posed by biological data may ultimately advance the field of machine learning as well.
This course will cover the basics of the Python programming language, as well as the pandas and sklearn (scikit-learn) Python libraries for data wrangling and machine learning.
By the end of this course, participants will understand:
- How to input and clean data in Python using the pandas library
- How to perform exploratory data analysis in Python
- How to use the sklearn library in Python for machine learning workflows
- How to choose an appropriate machine learning model for the task
- How to use supervised machine learning models (support vector machines, decision trees, neural networks, etc.) for classification tasks
- How to use unsupervised machine learning models for clustering tasks
- How to evaluate machine learning models and interpret their results
This course is intended to give participants a conceptual overview of machine learning algorithms and an intuition for the mathematics underlying them, equipping them to choose and implement appropriate models for biological datasets.
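As a rough illustration of the kind of workflow the learning objectives describe, the sketch below walks through cleaning, exploring, classifying, and clustering a small dataset with pandas and sklearn. It is not course material: the data are synthetic (generated with `make_classification`), the `gene_*` column names are hypothetical, and the choice of an RBF-kernel SVM and k-means with two clusters is an assumption made only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a biological dataset (assumption: 200 samples,
# 10 numeric features, binary label). Column names are hypothetical.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)
feature_names = [f"gene_{i}" for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=feature_names)
df["label"] = y

# Data cleaning: simulate a few missing measurements, then drop those rows.
df.loc[[3, 17, 42], "gene_0"] = np.nan
df = df.dropna()

# Exploratory data analysis: per-feature summary statistics.
summary = df[feature_names].describe()
print(summary.loc[["mean", "std"]].round(2))

# Supervised learning: hold out a test set, scale features, fit an SVM.
X_train, X_test, y_train, y_test = train_test_split(
    df[feature_names],
    df["label"],
    test_size=0.25,
    random_state=0,
    stratify=df["label"],
)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

# Evaluation: accuracy on the held-out samples.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")

# Unsupervised learning: cluster the same (scaled) features without labels.
scaled = StandardScaler().fit_transform(df[feature_names])
cluster_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(pd.Series(cluster_labels).value_counts().sort_index())
```

Wrapping the scaler and classifier in a single pipeline means the scaling parameters are learned from the training split only, which avoids leaking information from the test set into the model.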