The purpose of this post is to guide researchers through a basic analysis of microbiome data using the R packages DADA2 and Phyloseq.
library(caret)       # The caret package
library(tidymodels)  # Suite of packages for tidymodeling (e.g. parsnip, recipes, yardstick, etc.)
library(tidyverse)   # Suite of packages for tidy data science
library(skimr)       # Package for summary stats on datasets
library(cowplot)     # For making multi-paneled plots
options(width = 100) # Ensure skim results fit on one line

I will be relying heavily on this website. My goal will be to record basic vignettes for common machine learning algorithms using caret…so that I don’t have to keep looking it up every time I re-try something 😜.
I’m gearing up to write a scientific manuscript and I want to try R Markdown for this. I’ve never used anything but Microsoft Word to write my manuscripts, and EndNote to manage my references. While I’m excited to try this out, I think the biggest challenge is going to be sharing my drafts with collaborators so that they can make edits (typically with track changes in Microsoft Word). Here are some of the key points that need to be addressed as I start researching methods for drafting manuscripts in R Markdown:
First…here is the finished plot!
Keep reading if you want to make this plot yourself in R!
Biologists love heatmaps, like they REALLY REALLY like heatmaps!! When I was in graduate school, I think my number one Google search was “how do I make a heatmap in R”. There are many fantastic tutorials out there that really helped me…and my goal is to create another R heatmap tutorial for the newest of R users.
To start my first ever blog post, I should mention that I am not a data scientist. I’m a microbiologist by training. But like many scientific fields, microbiology is becoming more data-centric. Every day (it seems), new technologies that generate tons of data emerge, including genome sequencing and microarrays. About 7 months ago, I was hired by a research group that had contracted out numerous samples to be sequenced and analyzed, and they were sitting on the data because they had zero clue how to analyze it.