Live Online Course – 1st Edition

Introduction to Machine Learning with R

February 7th-17th, 2023

Live sessions will be recorded

Course overview & Programme

This course introduces students to some of the most advanced tools in Artificial Intelligence (AI): the machine learning methods that make data mining and data processing a fascinating topic.

Obtaining and analyzing data is now a well-developed field in computer science. Finding patterns in these data, or processing the information they contain, is less straightforward and is sometimes subject to bias. Data Mining has recently given way to Process Mining, in which powerful statistical and software tools are used in combination to detect patterns, make reliable classifications (for example, of customers or products) and make accurate predictions. These tools are among the most advanced computing techniques for classification and prediction.

This course offers a practical introduction to Machine Learning and to Process and Data Mining, oriented to all sciences. From the first class, students will learn to use these information-managing tools on their own computers. After completing the course, students will be prepared to understand the patterns hidden in any database, regardless of its size and complexity.

  • Introduction to Machine Learning. Practicum: an introduction to R:
    • Introducing students to Big Data and the various ways data are generated and handled. Describing data volume, velocity, and veracity. Differentiating between data acquisition, Data Mining and Data Processing. Introduction to R: vectors, matrices, data frames and data classes. Inspection and preparation of samples.
  • Simple prediction. Practicum: Simple regression:
    • Seeking measurable patterns in variables. Differentiating among variable types, covariance and correlation. How to estimate the influence of variables on each other and predict values of one dependent variable from an explanatory variable.
  • Complex prediction. Practicum: Multiple regression:
    • Extending the prediction of one dependent variable to a set of multiple explanatory variables. Analyzing covariance and interactions between variables. Combining different types of explanatory variables. Students will analyze profit predictions for a company based on its investment in several types of advertising media.
  • Big data prediction (I). Practicum: Regression trees:
    • Teaching a very powerful analytical tool (trees), which can use combinations of various types of variables and does not require data to follow any specific distribution. Trees are powerful for numeric prediction and allow the use of a very large number of variables. Students will learn how to predict numerical target variables.
  • Big data prediction and classification on categorical and mixed sets (II). Practicum: Decision Trees:
    • Decision trees and rule learners identify patterns that can be used for predictive classification. Information is structured in logical trees, which result in all-purpose classifiers for categorical dependent variables. Students will learn how to apply these tools to a wide range of examples, using powerful algorithms such as C5.0, the one-rule algorithm (OneR) and error-reducing algorithms such as RIPPER.
  • Big data classification on categorical and mixed sets (III). Practicum: Mixture Discriminant Analysis:
    • Introducing students to a powerful machine learning method for identifying associations among items through reduced dimensionality.
  • Identifying associations among objects and patterns. Practicum: K-means clustering:
    • To teach methods to address the machine learning task of clustering, which consists of finding natural groupings of data. This method is used for knowledge discovery instead of prediction. It provides powerful insights into groupings found in natural data.
  • Big data classification on categorical and mixed sets (IV). Practicum: Naïve Bayes:
    • This machine learning method uses principles of probability for classification. It easily provides the estimated probability for any given prediction. 
  • Introduction to the powerful classification and predictive algorithms I: K-nearest neighbour:
    • Modelling of cases and variables using K-nearest neighbour (KNN) algorithms, which perform regression and classification using distance matrices. These are some of the more advanced procedures for data mining. Students will apply these techniques to a variety of paleobiological databases.
  • Introduction to the powerful classification and predictive algorithms II: Partial Least Squares Discriminant Analysis:
    • Modelling of cases and variables using Partial Least Squares Discriminant Analysis (PLS-DA) algorithms, which build upon regression and classification. These are some of the more advanced procedures for data mining. Students will apply these techniques to a variety of databases.
  • Introduction to the powerful classification and predictive algorithms III: Support Vector Machines:
    • Support Vector Machines (SVM) are some of the most advanced non-linear classifiers and can be used for dichotomous target variables or multi-group categorical variables. They are used for classification and prediction and are considered among the most powerful classifiers in ML. Students will apply these techniques to a variety of databases.
  • Introduction to the powerful classification and predictive algorithms IV: Neural Networks:
    • Neural networks are the most computationally demanding algorithms, but also some of the most advanced in detecting features and generating both predictions and classifications. Students will learn the basic structure and concepts of neural networks and perceptrons, as well as additional methods for controlling model training and learning rates.
  • Boosting, Bagging and Cross-Validation:
    • Introducing students to methods for assessing and improving the reliability of inference, aiming at high confidence (>95% of cases) in the classification of data or in numeric predictions. Students will use several of the previous databases and others from James et al., An Introduction to Statistical Learning (Springer).
  • Random Forests:
    • In this section, students will apply some of the previous analyses in a more intuitive way. They will also learn how to build Random Forests, which combine bagging with random selection of variables in regression and decision trees, and which help identify the variables that contribute most to an accurate classification or prediction.
  • Machine learning white-box or glass-box methods:
    • This module will focus on practical applications of all the software tools learnt, with several data-mining cases. Students will work on a supervised personal project with the data sets best suited to their professional interests. Taphonomic data sets on bone surface modifications (BSM) and on bone breakage can both be used.
  • Ensemble learning:
    • This last module will focus on practical applications of all the software tools learnt, with several data-mining cases. Students will learn how all the previous algorithms can be integrated into multi-layered ensemble methods that are more powerful in discriminating classes and in predicting outcomes. Special attention will be paid to stacking ensembles and majority-voting techniques.
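
As a rough illustration of the kind of practicum listed in the programme above, the following R sketch fits a simple regression, estimates its prediction error by cross-validation, and runs a K-means clustering. It uses only base R and the built-in cars and iris data sets. The data sets, the number of folds, the seed and the number of clusters are assumptions chosen for illustration and are not taken from the course material.

```r
# Built-in 'cars' data: stopping distance (dist) as a function of speed.
data(cars)

# Simple regression: one dependent variable, one explanatory variable.
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# 5-fold cross-validation of the same model.
set.seed(1)                                    # arbitrary seed, for reproducibility
k <- 5                                         # number of folds (assumption)
fold <- sample(rep(1:k, length.out = nrow(cars)))
rmse <- sapply(1:k, function(i) {
  train <- cars[fold != i, ]                   # fit on all folds except fold i
  test  <- cars[fold == i, ]                   # evaluate on the held-out fold
  m <- lm(dist ~ speed, data = train)
  sqrt(mean((test$dist - predict(m, newdata = test))^2))
})
print(mean(rmse))                              # average held-out prediction error

# K-means clustering: look for natural groupings, with no target variable.
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 10)
print(table(km$cluster, iris$Species))         # compare clusters with known species
```

The cross-validation loop refits the model on each training subset, so the reported error reflects performance on data the model has not seen.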

Basic knowledge of R is strongly recommended. If you are not familiar with R, you can learn it through the swirl package.
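
Getting started with swirl can be sketched as follows. This assumes that R is already installed and that the package is fetched from CRAN. It is an illustration only, not part of the official course material.

```r
# Install swirl from CRAN (one-time step; requires an internet connection).
install.packages("swirl")

# Load the package and start the interactive lessons in the R console.
library(swirl)
swirl()
```

swirl then guides you through its lessons interactively at the R prompt.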

All participants must have a personal computer (Windows or Macintosh), with a webcam and headphones if possible, and access to a good internet connection.

Required textbook:

James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning with applications in R. Springer.
A PDF version is available for free (http://www-bcf.usc.edu/~gareth/ISL/getbook.html).

Instructor

Dr. Manuel Domínguez-Rodrigo
Complutense University
Spain

Dates & Schedule

February 7th-17th, 2023

14:00-18:00 (Madrid time zone)

Online live sessions on February 7th, 9th, 13th, 15th and 17th; 14:00 to 18:00 (Madrid time zone).

The remaining course hours consist of recorded classes and assignments, to be completed between the live sessions.

Total course hours: 40

20 hours of online live lessons, plus 20 hours of assignments.

This course is equivalent to 2 ECTS (European Credit Transfer System) at the Life Science Zurich Graduate School.

The recognition of ECTS by other institutions depends on each university or school.

Language

English

This course will be delivered live online

This course will be taught using a combination of live (synchronous) sessions on Zoom and tasks on the Slack platform, to be completed between live sessions.

Live sessions will be recorded. Recordings will be made available to participants for a limited period of time. However, attendance at the live sessions is required.

Places

Places are limited to 15 participants and will be allocated strictly in order of registration.

Participants who have completed the course will receive a certificate at the end.

Coordinators

Dr. Haris Saslis
Transmitting Science
Greece

Dr. Soledad De Esteban-Trivigno
Transmitting Science
Spain

Fees & Discounts

  • Early bird (until December 31st, 2022): 560 € (448 € for Ambassador Institutions)
  • Regular (after December 31st, 2022): 670 € (536 € for Ambassador Institutions)
  • Prices include VAT.

After registration you will receive confirmation of your acceptance on the course. Payment is not required during registration.

We offer discounts on the Course Fee.

Discounts are not cumulative. Participants receive the highest appropriate discount.

We also offer the possibility of paying in two instalments. Please contact us to request this.

Former participants of Transmitting Science courses receive a 5% discount on the Course Fee.

A 20% discount on the Course Fee is offered to members of certain organisations (Ambassador Institutions). If you wish to apply for this discount, please indicate it in the Registration form (proof will be requested later). If you would like your institution to become a Transmitting Science Ambassador Institution, please contact us at communication@transmittingscience.com

Unemployed scientists, as well as PhD students without any grant or scholarship to fund their PhD, can benefit from a 40% discount on the Course Fee. This applies only to participants based in Spain. If you wish to request this discount, please contact us. The discount is available for a maximum of 2 places, which will be allocated strictly in order of registration.

Registration

Terms and Conditions and Privacy Policy: In order to process the registration, management and maintenance of your profile as course participant on our website, we need your personal data. If you want more information about the processing of your data and your rights, we recommend that you read our privacy policy carefully (below).