Guest User
@guest-user-4897
Hi mailing list,
I have a question regarding the plotPCA function in DESeq.
Looking at the plotPCA code, I realised that the PCA only takes into
account 500 genes (ntop = 500; 500 is just an example, as this number
can be adjusted). Am I correct in understanding that these 500 genes
are the most variable genes?
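plotPCA = function(x, intgroup, ntop=500)
{
  require("lattice")
  require("genefilter")
  require("RColorBrewer")  # provides brewer.pal
  # variance of each gene (row) across samples
  rv = rowVars(exprs(x))
  # indices of the ntop genes with the highest variance
  select = order(rv, decreasing=TRUE)[seq_len(ntop)]
  # PCA on the samples, using only the selected genes
  pca = prcomp(t(exprs(x)[select,]))
  # one factor level per combination of the intgroup columns
  # (uses the argument x rather than the global vsdFull)
  fac = factor(apply(pData(x)[, intgroup, drop=FALSE], 1, paste, collapse=" : "))
  colours = brewer.pal(nlevels(fac), "Paired")
  pcafig = xyplot(PC2 ~ PC1, groups=fac, data=as.data.frame(pca$x),
                  pch=16, cex=2,
                  aspect = "iso", col=colours,
                  main = draw.key(key = list(
                    rect = list(col = colours),
                    text = list(levels(fac)),
                    rep = FALSE)))
}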
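To make the question concrete, here is a minimal sketch of what I understand the selection step to do. It uses base R only, on a made-up matrix: the toy data and the names mat and top_genes are my own assumptions and not part of DESeq, and apply(mat, 1, var) is standing in for rowVars from genefilter.

```r
# Toy matrix: 6 genes (rows) x 4 samples (columns); all values are made up.
set.seed(1)
mat <- matrix(rnorm(24), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# Give two genes a much larger spread so they are clearly the most variable.
mat["gene2", ] <- c(-10, 10, -10, 10)
mat["gene5", ] <- c(-5, -5, 5, 5)

# Per-gene variance across samples (base R stand-in for genefilter::rowVars).
rv <- apply(mat, 1, var)

# Indices of the ntop genes with the largest variance, as in plotPCA.
ntop <- 2
select <- order(rv, decreasing = TRUE)[seq_len(ntop)]
top_genes <- rownames(mat)[select]

# PCA on the samples (hence the transpose), using only the selected genes.
pca <- prcomp(t(mat[select, ]))
```

If I read this correctly, "most variable" here simply means the rows with the largest variance across all samples, and the points in the resulting plot are samples, not genes.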
---
Specifically, what is actually meant by "most variable genes", and why
would one use the most variable genes in a PCA plot?
Would a valid conclusion be: if the 500 most variable genes cluster
together (as seen in the PCA plot [figure 17] in the DESeq vignette),
our expression data are good, because even the most variable genes
group together?
More generally (not DESeq specific): if the purpose of doing a PCA is
to get a general overview of the data, would it be best to do a PCA on
all of the genes rather than a subset (say 500)?
I would appreciate any insight into this matter, as I am new to R and RNA-seq.
Many thanks
Zaki
-- output of sessionInfo():
not relevant
--
Sent via the guest posting facility at bioconductor.org.