Hi there, I'm a complete newbie to working in R with data for my own project (data that hasn't been cleaned up and streamlined for a university course), so I wanted to ask whether anyone has experience with, or thoughts on, using the TCGA's RNA Seq V2 RSEM data for differential expression analysis with voom and limma.
After scouring the forums, it seems the general consensus is to filter out genes with low counts and then run voom-limma.
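For reference, one common way to do the low-count filtering is to convert counts to counts per million (CPM) and keep genes above a CPM cutoff in a minimum number of samples. The sketch below does this in base R on a hypothetical 4-gene toy matrix; the 1 CPM cutoff and `min_samples = 2` are illustrative assumptions, not values from this post.

```r
# Toy count matrix (hypothetical numbers): 4 genes x 4 samples
counts <- rbind(
  geneA = c(0, 0, 0, 0),    # never detected
  geneB = c(10, 12, 8, 9),  # low but consistently expressed
  geneC = c(5, 0, 0, 0),    # detected in a single sample only
  geneD = c(0, 0, 0, 0)     # filled in below
)
colnames(counts) <- paste0("s", 1:4)
# Make geneD a highly expressed gene so every library totals exactly 1e6 reads
counts["geneD", ] <- 1e6 - colSums(counts)

# Counts per million: divide each column by its library size
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Keep genes with CPM > 1 in at least min_samples libraries (assumed thresholds)
min_samples <- 2
keep <- rowSums(cpm > 1) >= min_samples
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
# [1] "geneB" "geneD"
```

With edgeR, the same logical vector can be applied to an existing DGEList with `d0 <- d0[keep, , keep.lib.sizes = FALSE]`, and `filterByExpr()` chooses the thresholds from the design instead of fixed numbers.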
My question is whether the presence of a mutation in a tumour leads to significant up- or downregulation of genes compared with tumours without the mutation. However, when I run this on the TCGA RSEM data, almost all of the genes come out as significantly upregulated in both groups. Any thoughts or advice would be appreciated. The code below is pieced together from what I've been able to learn online through papers, forums, and YouTube videos.
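It may help to look at what `model.matrix(~0 + group)` parameterizes. The base-R sketch below uses one hypothetical gene whose mean log-expression is identical in both groups (the values are made up for illustration). Without an intercept, each coefficient is a group mean and is tested against zero, so both come out highly "significant"; with an intercept, the second coefficient is the Yes minus No difference, which is the quantity the mutation question is actually about.

```r
# One hypothetical gene's log-expression in 3 "No" and 3 "Yes" samples
expr <- c(8.0, 8.2, 7.8, 8.1, 7.9, 8.0)
group <- factor(c("No", "No", "No", "Yes", "Yes", "Yes"))

# No-intercept design: one column per group, so each coefficient is a group mean
fit_means <- lm(expr ~ 0 + group)
coef(fit_means)  # groupNo and groupYes are both 8
# Each p-value tests "is this group's mean log-expression different from 0?"
summary(fit_means)$coefficients[, "Pr(>|t|)"]

# Intercept design: groupYes is the Yes - No difference (essentially 0 here)
fit_diff <- lm(expr ~ group)
coef(fit_diff)
```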
I can supply more code if it would help describe the problem better. Thanks so much for reading!
```r
library(edgeR)  # DGEList()
library(limma)  # voom(), lmFit(), eBayes(), treat(), decideTests()

# mrna is a data frame of 20531 genes x 230 samples
counts <- as.matrix(mrna)
counts[is.na(counts)] <- 0

# mut is a variable of 'Yes' and 'No' indicating whether the patient harbours the mutation of interest
group <- as.factor(sample$mut)

# Filter low counts *before* building the DGEList so the filter actually reaches voom
keep <- rowSums(counts) > 20
counts <- counts[keep, ]

d0 <- DGEList(counts)
d0$samples$group <- group

design <- model.matrix(~0 + group)
colnames(design) <- gsub("group", "", colnames(design))

v <- voom(d0, design, plot = TRUE)
vfit <- lmFit(v, design)
efit <- eBayes(vfit)
summary(decideTests(efit))

tfit <- treat(vfit, lfc = 1)
dt <- decideTests(tfit)
summary(dt)
# Output: for both groups (with and without the mutation), almost all of the
# ~20,300 genes that were included are called "Up"
```
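A likely cause, assuming the design above is what was run: with `~0 + group`, the "No" and "Yes" columns of `design` are per-group mean log-CPM values, so `eBayes()` and `treat()` on `vfit` test whether each group's expression differs from zero rather than whether the groups differ from each other. Comparing mutant with non-mutant tumours needs an explicit contrast. The sketch below uses simulated Poisson counts (hypothetical data, not TCGA) with no real group difference, and only limma; `YesVsNo` is an illustrative contrast name.

```r
library(limma)

set.seed(1)
n_genes <- 1000
n_samples <- 20
# Hypothetical counts standing in for the real count matrix (no true DE genes)
counts <- matrix(rpois(n_genes * n_samples, lambda = 50),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
group <- factor(rep(c("No", "Yes"), each = n_samples / 2))

design <- model.matrix(~0 + group)
colnames(design) <- levels(group)  # "No", "Yes"

v <- voom(counts, design)
fit <- lmFit(v, design)

# Testing the group-mean coefficients directly: nearly every gene is "Up"
summary(decideTests(eBayes(fit)))

# Testing the Yes - No contrast: the comparison the question is about
contr <- makeContrasts(YesVsNo = Yes - No, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contr))
summary(decideTests(fit2))
topTable(fit2, coef = "YesVsNo", number = 5)

# treat() also works on the contrast fit, e.g. treat(contrasts.fit(fit, contr), lfc = 1)
```

Because no difference was simulated, the contrast should call few or no genes, while the group-mean test flags almost all of them, which mirrors the symptom described above.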