Question: PCA from DESeq2 plotPCA and R's prcomp differ
tonja.r (United Kingdom) wrote, 21 months ago:

I stabilize my data with rlog and then plot a PCA using the method proposed by DESeq2.

library(DESeq2)
library(ggplot2)

dds <- DESeqDataSetFromMatrix(countData=histone_m, colData=Design, design=~condition)
dds <- estimateSizeFactors(dds)
met <- rlog(dds)  # regularized log transformation
# returnData=TRUE returns the PC coordinates instead of drawing the plot
data <- plotPCA(met, intgroup=c("condition"), returnData=TRUE)
percentVar <- round(100 * attr(data, "percentVar"))
myPlot <- ggplot(data, aes(PC1, PC2, color=name)) +
        geom_point(size=3) +
        xlab(paste0("PC1: ",percentVar[1],"% variance")) +
        ylab(paste0("PC2: ",percentVar[2],"% variance"))

      
        
 
Then I decided to do a standard PCA with R's prcomp:

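library(lattice)  # for splom()

met <- assay(met)       # extract the rlog-transformed matrix (genes x samples)
pca <- prcomp(t(met))   # PCA on all genes, with samples as rows
screeplot_percent(pca)  # defined below
col <- c("red","pink","black","blue")
idx <- seq_len(3)       # first three principal components
print(splom(pca$x[,idx], col=col, pch=19))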

 

Function for screeplot:

screeplot_percent <- function(x, npcs = min(10, length(x$sdev)), ...) {
  idx <- seq_len(npcs)
  sum_var <- sum(x$sdev ^ 2)
  vars <- 100 * (x$sdev[idx] ^ 2 / sum_var)
  cumvar <- cumsum(vars)
  
  barplot(vars, width = 0.9, space = 0.1, names.arg = idx, ylim = c(0, 100),
          xlab = "Principal Component", ylab = "Percent Variance",
          xaxp = c(1, npcs, npcs - 1), las = 1)
  lines(x = idx - 0.5, y = cumvar, type = "b", lty = 2)
  legend("bottomright", legend = c("Proportion", "Cumulative"), lty = c(1, 2),
         pch = c(19, 1))
}

 

red -> WEN1
pink -> WEN3
black -> WNN1
blue -> WNN3

 

The PCA plots differ a bit and the scree plots differ a lot, and I do not understand why.
As you can see, WEN3 and WNN3 are much further apart in the second plot than in the first. The scales of the x- and y-axes also differ. What is the reason for this?

Also, my scree plot tells me that PC1 explains approximately 75% of the variance, whereas the ggplot claims 87%.

 

Michael Love (United States) wrote, 21 months ago:
Whenever you have a question about a function in Bioconductor, a good place to start is the help page for that function. In ?plotPCA for DESeq2, you'll see there is an extra filtering step that keeps only the top high-variance genes.
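
To illustrate why that filtering step changes both the coordinates and the percent variance, here is a minimal base-R sketch. It runs prcomp once on all genes and once on only the most variable genes, which is roughly what plotPCA does through its ntop argument (default 500). The matrix mat is simulated and only stands in for assay(met); the helper names pca_top and percent_var are made up for this example and are not part of DESeq2.

```r
set.seed(1)

# Hypothetical stand-in for the rlog matrix: 2000 genes x 4 samples
samples <- c("WEN1", "WEN3", "WNN1", "WNN3")
mat <- matrix(rnorm(2000 * 4), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), samples))
# Give the first 100 genes a strong sample-specific shift (one value per column)
mat[1:100, ] <- mat[1:100, ] + rep(c(-3, 3, -3, 3), each = 100)

# PCA restricted to the ntop genes with the highest variance across samples
pca_top <- function(mat, ntop = 500) {
  rv <- apply(mat, 1, var)
  select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  prcomp(t(mat[select, , drop = FALSE]))
}

# Percent of total variance carried by each principal component
percent_var <- function(pca) round(100 * pca$sdev^2 / sum(pca$sdev^2))

pca_all <- prcomp(t(mat))            # all genes, as in the prcomp call above
pca_500 <- pca_top(mat, ntop = 500)  # top 500 genes only

percent_var(pca_all)[1:2]
percent_var(pca_500)[1:2]
```

Because the low-variance genes mostly add noise to the total variance, dropping them typically raises the share attributed to PC1 and shrinks the axis ranges, which would be consistent with the 87% versus 75% discrepancy. To make the two approaches agree on real data, one option is to pass ntop = nrow(met) to plotPCA so that no genes are filtered out.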