1
0
Entering edit mode
@sofiagreen72211-13691
Last seen 22 months ago
United States

Hi All,

I'm trying to draw scaled PCA plot for my sequencing data while using DESeq2. The code, I am using is:

ds <- DESeqDataSetFromMatrix(countData=counts, colData=samples, design=~condition)
ds$condition <- relevel(ds$condition, "Ctrl")
ds <- ds[rowSums(counts(ds)>1) >= 2, ] #PCA plot

# transformation
rld_ds <- rlog(ds, blind=FALSE) #regularized log-transformation

#PCA plot1
plotPCA(rld_ds, intgroup = c("SampleName"))

# PCA plot 2 using ggplot2
(pcadata <- plotPCA(rld_ds, intgroup = c( "condition", "SampleName"), returnData=TRUE))
(percentVar <- round(100 * attr(pcadata, "percentVar"))

ggplot(pcadata, aes(PC1, PC2, color=condition, label=SampleName)) +
geom_point(size=3) +
geom_text(aes(label=SampleName),hjust=.5, vjust=-1, size=3) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) #plot 2

Now, my question:

Are these PCA plots already scaled due to rld transformation? if not, how can I do that?

Thanks,

Sofia

deseq2 • 2.4k views
1
Entering edit mode
@steve-lianoglou-2771
Last seen 4 weeks ago
United States

You can take a look at the source for plotPCA by fetching it like so:

library(DESeq2)
getMethod("plotPCA", "DESeqTransform")


You will find in there the call to prcomp that does the PCA, namely:

pca <- prcomp(t(assay(object)[select, ]))


and a call to ?prcomp will show you the default values for the center and scale. paramenters (TRUE, and FALSE, respecitvely).

If you want to change the behavior, you can make your own pcaPlot function by copying the the code of that function into your own and tweaking accordingly.

0
Entering edit mode

Hi Steve:

Thank you for your quick response.

As I am passing rlog transformation object (rld_ds) to plotPCA which stabilizes variance applied to the raw data, so do I still need to scale my PCAplot with:

pca <- prcomp(rld_ds, center=T, scale=T)

Sofia

1
Entering edit mode

I think it's best not to scale the data (make all the genes have same variance). I've posted an argument on the support site before about this, but it's impossible to find again. Basically, the rlog or vst put the data on a scale so there are no systematic differences due to the dynamic range. Forcing each gene to have variance one makes it worse (reduces signal and amplifies noise).

0
Entering edit mode

Hi Michael:

Many thanks,

Sofia

0
Entering edit mode

I've tried to find that post several times now, too! Sadly I've had no luck with it either :-/

1
Entering edit mode
0
Entering edit mode

Yes! Thanks, bookmarked...

0
Entering edit mode

How did you divine that post?

1
Entering edit mode

I utilized my l33t Google skillz:

pca plot scale love site:support.bioconductor.org

the search on this site is worthless, so I always just use Google directly with the site qualifier. It's the first hit on the second page (at least based on what Google thinks I want).

0
Entering edit mode

nice move -- strangely(?), it's also the it's also the first on the second page for me, too.

Maybe the search box on this site should just urlencode a google query with site:support.bioconductor.org appended to the end and bypass the internal search functionality entirely.

0
Entering edit mode