Hello, I am new to DESeq2 and was able to run RUVSeq on my samples to help account for batch effects. When I run plotPCA() on the set after doing RUVg, the data clusters nicely, how I want it to.
I then moved the data into a `dds` object for DESeq2 using the code below:
dds <- DESeqDataSetFromMatrix(countData = counts(set2),
                              colData = pData(set2),
                              design = ~ W_1 + genotype)
dds <- DESeq(dds)
Next I wanted to look at PCA plots of the variance-stabilized data from `vst()`, so I ran the following code:
vsd <- vst(dds, blind=FALSE)
vsd_nobatch <- removeBatchEffect(assay(vsd),
                                 design = model.matrix(~ set2@phenoData@data$genotype),
                                 covariates = set2@phenoData@data$W_1)
plotPCA(vsd_nobatch, col=colors[metadata$genotype])
What I get is a PCA plot that clusters as expected, with the data points in the graph being the sample names (the column names of `vsd_nobatch`), colored by genotype. However, what I would like is a PCA plot where the data points are plain points, still colored by genotype, with different labels from the metadata (not just the sample name) overlaid on them.
The issue is that `removeBatchEffect()` returns a matrix rather than a DESeq2 object, so I can't edit the labels the way I can before running `removeBatchEffect()`. Before `removeBatchEffect()`, I can run `plotPCA()` on `vsd`, group by `genotype`, and then add a label based on `id` on top of each point, as shown in the code below.
plotPCA(vsd, intgroup="genotype") +
  geom_text(aes(label=metadata$id), color = "black")
Is there a way I can get the same result after running `removeBatchEffect()`? When I try the code below, it still runs, but it's as though the whole `geom_text` line is ignored. The plot looks the same whether or not that line is included:
plotPCA(vsd_nobatch, intgroup="genotype")+
  geom_text(aes(label=threeDSSmetadata$id), color = "black")
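To make the goal concrete, here is a minimal sketch of the kind of plot described above, built by hand from a plain matrix rather than through `plotPCA()`. It is only an illustration under stated assumptions: `mat` and `meta` are simulated stand-ins (not my real data), the top-variance gene filter imitates what I understand DESeq2's `plotPCA()` to do by default, and the plotting step runs only if ggplot2 is installed.

```r
# Toy stand-in for a batch-corrected matrix (genes x samples); values are simulated.
set.seed(1)
mat <- matrix(rnorm(200 * 6), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

# Hypothetical metadata: one row per sample, in the same order as the columns of mat.
meta <- data.frame(id = paste0("id", 1:6),
                   genotype = rep(c("WT", "KO"), each = 3),
                   row.names = colnames(mat))

# Keep the most variable genes (assumption: top 500, capped at the number of genes).
ntop <- min(500, nrow(mat))
rv <- apply(mat, 1, var)
keep <- order(rv, decreasing = TRUE)[seq_len(ntop)]

# PCA on samples: prcomp() expects samples in rows, so transpose the matrix.
pca <- prcomp(t(mat[keep, ]))
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2))

# Combine the first two components with the metadata so any column can be a label.
pca_df <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], meta)

# Points colored by genotype, with the id column drawn as a text label above each point.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pca_df, aes(PC1, PC2, color = genotype)) +
    geom_point(size = 3) +
    geom_text(aes(label = id), color = "black", vjust = -1) +
    xlab(paste0("PC1: ", pct_var[1], "% variance")) +
    ylab(paste0("PC2: ", pct_var[2], "% variance"))
  print(p)
}
```

With the real data, `mat` would be `vsd_nobatch` and `meta` the sample metadata, assuming its rows are in the same order as the columns of the matrix.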
Thank you very much for your help! Please let me know if anything isn't clear from my question.
sessionInfo( )
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] ggplot2_3.3.3 pheatmap_1.0.12
[3] RColorBrewer_1.1-2 RUVSeq_1.20.0
[5] edgeR_3.28.1 limma_3.42.2
[7] EDASeq_2.20.0 ShortRead_1.44.3
[9] GenomicAlignments_1.22.1 Rsamtools_2.2.3
[11] Biostrings_2.54.0 XVector_0.26.0
[13] dplyr_1.0.5 DESeq2_1.26.0
[15] SummarizedExperiment_1.16.1 DelayedArray_0.12.3
[17] BiocParallel_1.20.1 matrixStats_0.58.0
[19] Biobase_2.46.0 GenomicRanges_1.38.0
[21] GenomeInfoDb_1.22.1 IRanges_2.20.2
[23] S4Vectors_0.24.4 BiocGenerics_0.32.0
Ah I got it now, thank you very much!