What is needed to become a bioinformatics Data Scientist?
This is a page in progress; I’ll update and complete it as time goes on.
On this page, I’ll share a roadmap of what it takes to get ready to work and do science in bioinformatics from the data analysis side. Here it is:
Beginner level
Skills
- Surface-level knowledge of the biological molecule zoo (DNA, RNA, proteins and metabolites)
- Knowledge of the standard sequence file formats (FASTA / FASTQ)
- Basic Python knowledge
- Basic Pandas, NumPy and Matplotlib skills
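To make the FASTQ format and the Pandas/NumPy skills above a bit more concrete, here is a minimal sketch that parses a small FASTQ string into a Pandas DataFrame and computes per-read GC content and mean Phred quality. It assumes the common 4-line FASTQ layout (header, sequence, `+` separator, quality string) and Phred+33 quality encoding; the function names and the two toy reads are hypothetical, made up for illustration. For real data, a dedicated parser such as Biopython's `Bio.SeqIO` is usually the safer choice.

```python
import io

import numpy as np
import pandas as pd

# Toy FASTQ data (hypothetical reads) in the 4-line-per-record layout:
# @header, sequence, '+' separator, quality string (Phred+33 encoded).
FASTQ_TEXT = """@read1
ACGTACGTGG
+
IIIIIHHHGG
@read2
ATATATATCC
+
!!!!!IIIII
"""


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of input
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual  # drop the leading '@'


def phred_scores(qual):
    """Convert a Phred+33 quality string into a NumPy array of integer scores."""
    return np.array([ord(c) - 33 for c in qual], dtype=int)


records = []
for read_id, seq, qual in parse_fastq(io.StringIO(FASTQ_TEXT)):
    records.append({
        "read_id": read_id,
        "length": len(seq),
        "gc_content": (seq.count("G") + seq.count("C")) / len(seq),
        "mean_quality": phred_scores(qual).mean(),
    })

# One row per read: the usual starting point for Pandas/Matplotlib analysis.
df = pd.DataFrame(records)
print(df)
```

Each row of `df` is one read, so from here the usual Pandas and Matplotlib workflow applies, for example plotting `df["mean_quality"].hist()` to eyeball the quality distribution.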
Guides, books and tutorials
- Python Data Science Handbook, by Jake VanderPlas. This is an interactive book written as Jupyter Notebooks, and a gold-standard reference for most analytical tasks in Python: https://jakevdp.github.io/PythonDataScienceHandbook/
- Courses 1 and 2 of the Bioinformatics Specialization on Coursera. They cover many biological essentials, as well as the general data structures involved and the challenges of working with them. https://www.coursera.org/specializations/bioinformatics#courses
Tools
- QIIME2 - A widely used platform for building analysis pipelines on microbiome sequencing data. Useful for cleaning and preparing data before handing it over to more generalist analysts. https://qiime2.org/
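To give a feel for what "cleaning" sequencing data means, here is a toy sketch of one such step: trimming low-quality bases from the 3' end of each read and discarding reads that become too short. This is not QIIME2's implementation or API; it only illustrates the idea in plain Python. The thresholds (`MIN_QUALITY`, `MIN_LENGTH`), the function names and the example reads are hypothetical choices for illustration, and Phred+33 quality encoding is assumed.

```python
# Hypothetical thresholds chosen only for illustration.
MIN_QUALITY = 20  # trim trailing bases whose Phred score is below this
MIN_LENGTH = 5    # discard reads shorter than this after trimming


def phred_scores(qual):
    """Convert a Phred+33 quality string into a list of integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_3prime(seq, qual, min_quality=MIN_QUALITY):
    """Cut bases from the 3' end until the last remaining base reaches min_quality."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def quality_filter(reads, min_quality=MIN_QUALITY, min_length=MIN_LENGTH):
    """Trim each (read_id, seq, qual) read and keep only those still long enough."""
    kept = []
    for read_id, seq, qual in reads:
        seq, qual = trim_3prime(seq, qual, min_quality)
        if len(seq) >= min_length:
            kept.append((read_id, seq, qual))
    return kept


reads = [
    ("read1", "ACGTACGTGG", "IIIIIIII##"),  # two poor-quality bases at the 3' end
    ("read2", "ATATATATCC", "II########"),  # mostly poor quality
]
print(quality_filter(reads))
```

Here `read1` survives with its two trailing low-quality bases removed, while `read2` shrinks to two bases and is dropped. Real pipelines add many more steps (adapter removal, denoising, chimera checks), which is exactly why a packaged toolkit is worth learning.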