I would like to generate several principal component analyses (PCAs) with DESeq2 after rlog transformation: one PCA for protein-coding genes, one for miRNAs, one for lncRNAs, and so on (one for each gene_biotype in my reference annotation). I have count files from 33 RNA-seq samples (11 conditions). Here is the code I used to generate a PCA from all my data (all gene_biotypes):
directory <- "/mydirectory"
directory
sampleFiles <- grep("htseqcount", list.files(directory), value=TRUE)
sampleCondition <- c("condition1","condition1","condition1",
                     "condition2","condition2","condition2",
                     "condition3","condition3","condition3",[...]
                     "condition11","condition11","condition11")
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
library("DESeq2")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design = ~ condition)
dds <- DESeq(ddsHTSeq)
rld <- rlog(dds, blind=FALSE)
plotPCA(rld, intgroup=c("condition"))
How can I adapt this code to generate all the PCAs I want? One option is to extract the genes of a single biotype (protein-coding genes, for example), run the code above on that subset, and then repeat for each of the other biotypes. However, I think I should normalize the data once, using all biotypes together, and only then generate the PCAs. Do you have any advice on how to generate these PCAs?
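To make the second idea concrete, here is a minimal sketch of what I imagine: transform once on all genes, then subset the transformed object per biotype before plotting. It is only a sketch under assumptions, not something I have validated. In particular, `gene_biotypes.tsv` is a hypothetical two-column table (`gene_id`, `gene_biotype`) that I would extract from the same annotation used for counting, with unique gene IDs matching the row names of `rld`; the output file name is also made up.

```r
library("DESeq2")
library("ggplot2")

# ASSUMPTION: `rld` is the rlog-transformed object created above from ALL genes.
# ASSUMPTION: gene_biotypes.tsv is a hypothetical tab-separated file with a
# header line and the columns gene_id and gene_biotype.
biotypes <- read.delim("gene_biotypes.tsv", stringsAsFactors = FALSE)

# Keep only annotated genes that are actually present in the transformed object.
biotypes <- biotypes[biotypes$gene_id %in% rownames(rld), ]

# Group gene IDs by biotype: one character vector of gene IDs per biotype.
genes_by_biotype <- split(biotypes$gene_id, biotypes$gene_biotype)

# Draw one PCA per biotype, one page per plot (hypothetical output file name).
pdf("pca_by_biotype.pdf")
for (bt in names(genes_by_biotype)) {
  genes <- genes_by_biotype[[bt]]
  if (length(genes) < 2) next  # skip biotypes with too few genes for a PCA
  # Subset the already-transformed data; nothing is re-normalized here.
  p <- plotPCA(rld[genes, ], intgroup = "condition")
  print(p + ggtitle(bt))
}
dev.off()
```

As far as I understand, `plotPCA` uses only the most variable genes by default (`ntop = 500`), so for large biotypes such as protein-coding genes the plot would be based on the top 500 genes of that subset unless `ntop` is changed.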
Thanks a lot!