DESeq2 Principal Component Analysis with different gene biotypes (protein coding, miRNAs, lncRNAs, etc.)
Hi,

I would like to generate several Principal Component Analyses using DESeq2, after rlog transformation: one PCA for protein-coding genes, one for miRNAs, one for lncRNAs, etc. (one for each gene_biotype in my reference annotation). I have count files from 33 RNA-seq samples (11 conditions). Here is the code I used to generate a PCA from all my data (all gene_biotypes):

directory <- "/mydirectory"

directory

sampleFiles <- grep("htseqcount",list.files(directory),value=TRUE)

sampleCondition=c("condition1","condition1","condition1","condition2","condition2","condition2","condition3","condition3","condition3",[...]"condition11","condition11","condition11")

sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)

library("DESeq2")

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= ~ condition)

dds <- DESeq(ddsHTSeq)

rld <- rlog(dds, blind=FALSE)

plotPCA(rld, intgroup=c("condition"))

How can I adapt this code to generate all the PCAs I want? I could extract all the genes of one biotype (protein-coding genes, for example), run the code above on that subset, and then do the same for the other biotypes. But I think I should normalize the data once with all biotypes together and only then generate the PCAs. Do you have any advice on how to generate these PCAs?

Thanks a lot!

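You should add an indexing step before calling plotPCA. That way the rlog transformation is computed once on all genes, and only the PCA itself is restricted to the biotype of interest: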

idx <- ... # identify the relevant gene sub-types

plotPCA(rld[idx,], ...)
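
To make the indexing step concrete, here is a minimal sketch of one way to loop over biotypes. It is not taken from the original thread: the toy dataset (built with makeExampleDESeqDataSet), the randomly assigned `biotype` vector, and the `pca_plots` name are all assumptions for illustration. With real data, `rld` would be the object from the question and `biotype` would come from the reference annotation.

```r
library(DESeq2)

# Toy data standing in for the real experiment (ASSUMPTION): 1000 genes,
# 12 samples, with a two-level 'condition' column in colData.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Transform once, on all genes, so every biotype shares the same normalization.
rld <- rlog(dds, blind = FALSE)

# HYPOTHETICAL annotation: a named character vector mapping gene ID -> biotype.
# Here the biotypes are assigned at random purely so the example runs.
biotype <- setNames(
  sample(c("protein_coding", "lncRNA", "miRNA"), nrow(rld),
         replace = TRUE, prob = c(0.7, 0.2, 0.1)),
  rownames(rld)
)

# One PCA per biotype, each drawn from the same transformed object.
pca_plots <- lapply(split(names(biotype), biotype), function(genes) {
  idx <- rownames(rld) %in% genes   # logical index for this biotype
  plotPCA(rld[idx, ], intgroup = "condition")
})

# pca_plots is a named list with one plot per biotype,
# e.g. print(pca_plots$miRNA)
```

With a real annotation, the gene-to-biotype mapping could be read from the GTF used for counting, for example with rtracklayer::import, which exposes the GTF attributes as metadata columns (the attribute is typically named gene_biotype in Ensembl files and gene_type in GENCODE files). Note that plotPCA uses only the most variable genes by default (its ntop argument), so for large biotypes the PCA is still based on a subset.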
Thanks a lot for your answer! It helps me a lot!