Online Course - 1st Edition

Introduction to Machine Learning applied to Palaeontology and Archaeology

September 7th-11th, 2020

Palaeontology and Archaeology

This course will be delivered ONLINE: 20 hours of live online lessons, plus 10 hours of recorded classes and assignments. A good internet connection is required to follow the course.

Introduction to Machine Learning and Deep Learning applied to Taphonomy

Course overview

This course introduces students to the most advanced tools in Artificial Intelligence (AI): machine learning methods that make data mining and data processing a fascinating topic.

Obtaining and analyzing data is currently a very well developed field in computer science. Finding patterns in these data, or processing this information, is less straightforward and is sometimes subject to bias. Data Mining has recently given way to Process Mining, in which powerful statistical and software tools are combined to detect patterns correctly, classify customers or products reliably, and make accurate predictions. For Paleobiology, these tools provide the most advanced computing techniques for accurate classification and prediction.

This course offers a practical introduction to Machine Learning applied to Palaeontology and Archaeology. From the first class, students will learn to use these information-managing tools on their own computers. After completing the course, students will be prepared to understand the patterns hidden in any database, regardless of its size and complexity. For practical demonstration, examples from two taphonomic fields will be provided.

The study of bone surface modifications (BSM) has been one of the most difficult and controversial areas in taphonomic research. Only AI has provided a way to understand the subtleties of this type of analysis, yielding systematic identification of BSM with accuracy above 90%. This constitutes a major revolution in this field.

The second taphonomic field is biometric. As a practicum, metric properties of broken bones will be used to discern process (dry and green breaking) and agency (human or carnivore) in bone fragmentation.

Teaching will be done using R. In the last module involving computer vision and deep learning, both R and Python will be used.

Requirements

Basic knowledge of R is strongly recommended. If you are not familiar with R, you can learn it using the swirl package.

Although students will benefit from prior knowledge of statistics (namely, univariate and bivariate or multivariate statistics), the teaching system will not require any statistical background. Concepts will be explained from their basic foundations so that they are fully understood by students with different backgrounds.

All participants must have a personal computer (Windows, Macintosh), with webcam if possible, and a good internet connection.

Required textbook: James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning with Applications in R. Springer. A free PDF version is available online.

Contact

courses@transmittingscience.com

LOCATION

This course will be delivered online.

Please check the schedule for the live online part, and note that all times are given in GMT+1.

DATE

September 7th-11th, 2020

LANGUAGE

English

COURSE LENGTH & ECTS

30 hours online

This course is equivalent to 1 ECTS (European Credit Transfer System) at the Life Science Zurich Graduate School.

The recognition of ECTS by other institutions depends on each university or school.

PLACES

Places are limited to 15 participants and will be allocated strictly in order of registration. If the course fills up, there will be an assistant instructor to help during the practice time.

Participants who have completed the course will receive a certificate at the end of it.

Instructor

Dr. Manuel Domínguez-Rodrigo
Complutense University
Spain

Coordinator

Dr. Ana Rosa Gómez-Cano
Transmitting Science
Spain

Coordinator

Dr. Soledad De Esteban-Trivigno
Transmitting Science
Spain

Program

Monday

  • Microscopic characteristics of Bone Surface Modifications (BSM):
    • Compiling all the microscopic characteristics that identify the different types of BSM: tooth marks (by all bone-modifying biotic agents), percussion marks, stone-tool and metal marks, trampling marks, biochemical marks. Practicum: microscopic observation of reference collections.
  • Comparison of traditional techniques to identify and quantify BSM:
    • Showing the advantages and disadvantages of all the BSM tallying methods. Practicum: microscopic observation of reference collections II.
  • Introduction to Machine Learning. Practicum: an introduction to R:
    • Introducing students to Big Data and the various ways data are generated and handled. Describing data volume, velocity, and veracity. Differentiating between data acquisition, data mining and data processing. Introduction to R: vectors, matrices, data frames and data classes.
  • Simple prediction. Practicum: Simple regression:
    • Seeking measurable patterns in variables. Differentiating among variable types, covariance and correlation. How to estimate the influence of variables on each other and predict values of a dependent variable from an explanatory variable. We will start using paleobiological examples.
  • Complex prediction. Practicum: Multiple regression:
    • Extend the prediction of one dependent variable to a set of multiple explanatory variables. Analyze covariance and interactions between variables. Combine different types of explanatory variables. We will continue with paleobiological examples. Students will also analyze profit predictions for one company based on investment in several types of advertising media.
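
As a rough illustration of the simple regression topic above, the sketch below fits a least-squares line with the Python standard library only. The course practicum uses R; Python is used here purely for illustration. The function name `fit_simple_regression` and the fragment measurements are hypothetical and invented for this example.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data (invented for illustration): bone fragment length and width in mm.
# The widths were constructed to lie exactly on width = 2 + 0.5 * length.
length_mm = [10.0, 20.0, 30.0, 40.0, 50.0]
width_mm = [7.0, 12.0, 17.0, 22.0, 27.0]

intercept, slope = fit_simple_regression(length_mm, width_mm)

# Predict the width of a 60 mm fragment from the fitted line
predicted_width = intercept + slope * 60.0
print(f"width = {intercept:.2f} + {slope:.2f} * length; prediction at 60 mm: {predicted_width:.1f}")
```

Multiple regression follows the same idea with several explanatory variables, which in practice is usually left to a statistics library rather than written by hand.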

Tuesday

  • Advanced techniques to identify BSM: 3D and geometric morphometric (GM) approaches:
    • Learn the metric approach to the study of BSM and how it compares with other methods. Use of GM to identify not only the type of BSM, but also variability according to tool type and raw material type. Practicum: Learning the technique of image capturing and the use of specific software for 3D reconstruction. Use of GM statistics.
  • Big data prediction (I). Practicum: Regression trees:
    • Teach a very powerful analytical tool (trees), which can use combinations of various types of variables and does not require data to follow any specific distribution. Trees are powerful for numeric prediction and allow the use of a very large number of variables. Students will learn how to predict numerical target variables.
  • Big data prediction and classification on categorical and mixed sets (II). Practicum: Decision Trees:
    • These two machine learning methods identify patterns that can be used for predictive classification. Information is structured in logical trees, which result in all-purpose classifiers. They use categorical dependent variables. Students will learn how to apply these tools to a large array of examples. Students will apply powerful algorithms such as C5.0, one-rule algorithms (such as OneR) and error-reducing algorithms such as RIPPER.
  • Big data classification on categorical and mixed sets (III). Practicum: Mixture Discriminant Analysis:
    • Introduce students to a powerful machine learning method for identifying associations among items through reduced dimensionality. Paleobiological examples will be used for practice.
  • Identifying associations among objects and behavioural patterns. Practicum: K-means clustering:
    • Teach methods for the machine learning task of clustering, which consists of finding natural groupings in data. This method is used for knowledge discovery rather than prediction. It provides powerful insights into groupings found in natural data.
  • Big data classification on categorical and mixed sets (IV). Practicum: Naïve Bayes:
    • This machine learning method uses principles of probability for classification. It easily provides the estimated probability for any given prediction. Paleobiological examples will be used for practice.
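
To make the clustering idea above concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions, not the course material: the course practicum uses R, the function names `assign` and `kmeans` are hypothetical, the starting centroids are chosen by hand, and the points are invented.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New centroid is the coordinate-wise mean of the cluster members
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty
                updated.append(centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical two-dimensional measurements (invented for illustration)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumption: the first point of each visible group is used as a starting centroid
labels, centroids = kmeans(points, [points[0], points[3]])
print(labels)
print(centroids)
```

No class labels are supplied: the groups emerge from the distances alone, which is why the method suits knowledge discovery rather than prediction.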

Wednesday 

  • Introduction to the powerful classification and predictive algorithms I: K-nearest neighbour:
    • Modelling of cases and variables using K-nearest neighbour (KNN) algorithms that build upon regression and classification using distance matrices. These are some of the more advanced procedures for data mining. Students will apply these techniques on a diversity of paleobiological databases.
  • Introduction to the powerful classification and predictive algorithms II: Partial Least Square Discriminant Analysis:
    • Modelling of cases and variables using Partial Least Square Discriminant Analysis (PLSDA) algorithms that build upon regression and classification. These are some of the more advanced procedures for data mining. Students will apply these techniques on a diversity of paleobiological databases.
  • Introduction to the powerful classification and predictive algorithms III: Support Vector Machines:
    • Support Vector Machines (SVM) are some of the most advanced non-linear classifiers that can be used for dichotomous target variables or multi-group categorical variables. They are used for classification and prediction and are one of the three most powerful classifiers in ML. Students will apply these techniques on a diversity of paleobiological databases.
  • Introduction to the powerful classification and predictive algorithms IV: Neural Networks:
    • Neural networks are the most computationally demanding algorithms, but also some of the most advanced in detecting features and generating both predictions and classifications. The basic structure and concepts of neural networks and perceptrons will be learnt, as well as additional methods for controlling model training and learning rates.
  • Boosting, Bagging and Cross-Validation:
    • Introduce students to inference reliability methods that help ensure correct, high-confidence (>95% of cases) classification of data or numeric prediction. Students will use several of the previous databases and others from James et al., An Introduction to Statistical Learning (Springer).
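
The sketch below combines two of the ideas from this day: a K-nearest neighbour classifier and leave-one-out cross-validation to estimate how well it classifies unseen cases. It is a minimal Python illustration under stated assumptions, not the course code (the practicum uses R). The function names, the two features and the "green"/"dry" labels attached to them are hypothetical and invented for this example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=3):
    """Hold out each case in turn, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            hits += 1
    return hits / len(xs)


# Hypothetical two-feature measurements per fragment (invented for illustration)
xs = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3),
    (5.0, 5.0), (5.2, 4.8), (4.9, 5.3), (5.1, 5.1),
]
ys = ["green", "green", "green", "green", "dry", "dry", "dry", "dry"]

accuracy = leave_one_out_accuracy(xs, ys, k=3)
new_case = knn_predict(xs, ys, (1.0, 1.2), k=3)
print(f"leave-one-out accuracy: {accuracy:.2f}; new case classified as: {new_case}")
```

Because every case is predicted by a model that never saw it, the resulting accuracy is a fairer estimate than accuracy measured on the training data itself. The invented groups here are deliberately well separated, so real data would normally score lower.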

Thursday

  • Rattle and Random Forests:
    • In this section, students will use Rattle, a GUI for R, to apply some of the previous analyses in a more intuitive way. They will also learn how to build Random Forests, which combine bagging with random variable selection across many regression or decision trees and help identify the variables that contribute most to the right classification or prediction.
  • Introduction to H2O and caret:
    • Here, a special mono-thematic session will be devoted to two of the most advanced R libraries for machine learning: H2O and caret. Comparative exercises will be carried out with previous algorithms to show the power of each of them in solving the same problems.

Friday 

  • Introduction to Deep Learning and Computer Vision: Convolutional Neural Networks:
    • Provide all the theoretical tools to understand the most powerful mathematical algorithms that exist for prediction and classification, with a clear focus on image detection and classification. Neural networks will be explained and some of their most advanced algorithms, like convolutional neural networks, will be used. The sessions in this last module will require learning some basics of Python, and for that purpose Anaconda and Jupyter notebooks will be used. Neural networks will be applied using both R and Python. The depth of this module, by far the most complex of the course, will depend on the learning rate of students.
  • Practicum:
    • This last module will focus on practical applications of all the software tools learnt, with several cases for data mining. Students will work on a supervised personal project with the data sets best suited to their professional interests. Both taphonomic data sets, generated on BSM and on bone breakage, can be used.

Fees

  • Course Fee
  • Early bird (until June 30th, 2020): 596 € * (476.8 € for Ambassador Institutions)
  • Regular (after June 30th, 2020): 725 € * (580 € for Ambassador Institutions)
  • This includes course material (VAT included).
    * Participants from companies/industry will have an extra charge of 100 €.

You can check the list of Ambassador Institutions. If you want your institution to become a Transmitting Science Ambassador please contact us at communication@transmittingscience.com

Schedule

  • Monday to Friday (GMT+1):
    • 14:00 to 18:00 online live lessons.

The remaining hours will be covered by recorded classes and assignments, to be completed between the live sessions.

Funding

Discounts are not cumulative and apply only to the Course Fee. We offer the possibility of paying in two instalments (contact courses@transmittingscience.com).

Former participants will have a 5 % discount on the Course Fee.

A 20 % discount on the Course Fee is offered to members of some organizations (Ambassador Institutions). If you want to apply for this discount, please indicate it in the Registration form (proof will be requested later).

Unemployed scientists living in the country where the course will be held, as well as PhD students based in that country without any grant or scholarship to develop their PhD, may benefit from a 40 % discount on the Course Fee. If you want to ask for this discount, please contact the course coordinator. This applies to a maximum of 2 places, which will be allocated strictly in order of registration.

Registration

Terms and Conditions and Privacy Policy: In order to process the registration, management and maintenance of your profile as course participant on our website, we need your personal data. If you want more information about the processing of your data and your rights, we recommend that you read our privacy policy carefully (below).