I am performing an EdgeR analysis in TCGA samples. I have a subset of the 83 patients I want to check their differential expression in comparison to the 41 samples of normal tissue. Checking for the differential expression of known critical genes (basically almost all the genes taking part in signaling pathways involved in cancer) I see that almost all of the 297 genes have a logFC < 0.8.
In general, the results include
Genes with counts > 1 : 21251
Significant DEGs FDR < 0.05 16222
Significant DEGs FDR < 0.05 with logFC > 2 1640
Significant DEGs FDR < 0.05 with logFC < -2 1664
I would assume that not all, but at least some, would show a high differential expression. I am concerned that the results are kind of "diluted" or the problem comes from the fact that the samples are not paired (normal samples come from other patients).
I have performed this analyis in 4 different subset of patients and the results are almost the same when it comes to the known critical genes.
This is the code I used during my analysis.
query = GDCquery(project = "TCGA-COAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type = "HTSeq - Counts", barcode = c(subtype1.patients$IDs,Normal$IDs)) GDCdownload(query) # Save in matrix form exp_subtype1_NORMAL_SE = GDCprepare(query=query, save=TRUE, save.filename = "exp_subtype1_NORMAL.rda", summarizedExperiment = FALSE) exp_subtype1_NORMAL_SE = exp_subtype1_NORMAL_SE[-c(1:5),] rownames(exp_subtype1_NORMAL_SE) = exp_subtype1_NORMAL_SE$X1 exp_subtype1_NORMAL_SE = exp_subtype1_NORMAL_SE[,-1] #Define groups and model matrix groups = factor(c(rep("subtype1", 83), rep("normal",41))) groups = relevel(groups, ref = "normal") design = model.matrix(~groups, y$samples) #Convert to DGEList y1 <- DGEList(counts=exp_subtype1_NORMAL_SE,genes = rownames(exp_subtype1_NORMAL_SE), group = groups) # Filter out low expressed genes keep <- rowSums(cpm(y1)>1) >= 2 y1 <- y1[keep, , keep.lib.sizes=FALSE] y1 <- calcNormFactors(y1) y1 <- estimateDisp(y1) et1 <- exactTest(y1) tt1 <- topTags(et, n=nrow(y), p.value=0.05)
Here is the first three columns of the expression matrix used in the start of the analysis.
head(exp_subtype1_NORMAL_SE)[,1:3] TCGA-AA-3815-01A-01R-1022-07 TCGA-AA-A02R-01A-01R-A00A-07 TCGA-CK-4951-01A-01R-1410-07 ENSG00000000003.13 2449 1040 6513 ENSG00000000005.5 6 0 8 ENSG00000000419.11 487 450 959 ENSG00000000457.12 269 223 307 ENSG00000000460.15 177 174 246 ENSG00000000938.11 331 415 324
Hope I have provided everything needed.
Thank you in advance.