Scientific Write-a-thon - Russel Hodge, MDC
A day devoted to scientific communication and scientific writing. At the end of the workshop, which will include a great deal of hands-on group work, you will have an up-to-date project description for the HEIBRiDS webpage.
Research Data Management - Sara El-Gebali, MDC
This one-day workshop introduces FAIR data, FAIR software and metadata, discusses what separates good code from bad code, and offers tips on how to create the perfect README file.
In this workshop you will be shown how to create and run Docker containers, how to publish them to a registry, and how to use them on HPC systems. You will also learn about common pitfalls and how to avoid them. The day will end with a tutorial on how to package your own application into a container.
With this single-day introduction, we want to take your HPC cluster skills to the next level. We plan to introduce automated pipelines and parallelization at a level suited to our learners. We assume that learners are able to submit single jobs to a SLURM-based scheduler and have a basic understanding of the UNIX shell and Python. For parallelization, we aim to provide a thorough introduction to implementing data parallelism in Python. To do so, we will apply shared-memory parallelization with multiprocessing and distributed-memory parallelization with the Message Passing Interface (MPI) to a compute-intensive problem. The day will conclude with an introduction to automating pipelines on a cluster, which are typically needed for data-intensive tasks. All teaching will be hands-on, using a custom cluster provided to the students.
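As a rough illustration of the shared-memory data parallelism mentioned above, the following minimal sketch splits a compute-intensive task across worker processes with Python's multiprocessing module. It is not taken from the workshop material: the prime-counting problem and the function names `count_primes`, `split_range` and `parallel_count_primes` are assumptions chosen only for this example.

```python
from multiprocessing import Pool


def count_primes(bounds):
    """Count the primes in the half-open range [start, stop) by trial division."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(stop, n_chunks):
    """Split [0, stop) into at most n_chunks contiguous half-open ranges."""
    step = max(1, -(-stop // n_chunks))  # ceiling division, at least 1
    return [(lo, min(lo + step, stop)) for lo in range(0, stop, step)]


def parallel_count_primes(stop, n_workers=4):
    """Data parallelism: each worker process counts primes in its own chunk."""
    chunks = split_range(stop, n_workers)
    with Pool(processes=n_workers) as pool:
        # pool.map sends one chunk to each worker and gathers the partial counts
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print(parallel_count_primes(100_000))
```

Equal-sized chunks are the simplest split, but the higher ranges cost more per number here, so a real workload might benefit from more, smaller chunks than workers to balance the load.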
Data Cleaning - Ihab Francis Ilyas, University of Waterloo, Canada
This half-day seminar on data cleaning will include topics such as: outlier detection, data deduplication and transformation, data quality rule definition and discovery, rule-based data cleaning, ML and probabilistic data cleaning.
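To give a flavour of two of the listed topics, the sketch below shows a very simple form of data deduplication and outlier detection using only the Python standard library. It is an illustrative assumption, not the seminar's method: the helper names `normalize`, `deduplicate` and `mad_outliers` are hypothetical, and the outlier rule is the common modified z-score based on the median absolute deviation, with the conventional 3.5 cut-off.

```python
from statistics import median


def normalize(record):
    """Lower-case every field and collapse whitespace so near-identical rows match."""
    return tuple(" ".join(str(field).lower().split()) for field in record)


def deduplicate(records):
    """Keep the first occurrence of each record, comparing normalized forms."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median: the score is undefined
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


if __name__ == "__main__":
    rows = [("Ada Lovelace", "London"), ("ada  lovelace", "LONDON "), ("Alan Turing", "Wilmslow")]
    print(deduplicate(rows))
    print(mad_outliers([10, 11, 9, 10, 12, 10, 95]))
```

Exact matching after normalization only catches trivial duplicates; the rule-based and probabilistic approaches named in the seminar description address the harder cases.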