Hi,
I am dealing with RNASeq data that render ludicrously hig number of DEGs in most of the contrasts I have tested. In the attached image file I reproduced the summary table of the analysis. The table shows percentages over the whole set, “NS” meaning non-significant (i.e. BH>0.05) and “UP” or “DOWN” being the percentages of up or downregulated genes on every comparison (in columns).
You’ll soon realize that for many cases the DEGs (i.e. 100-NS) are around 70 to 80 percent of the genes. Reassuringly enough, the UP and DOWN numbers are balanced, but I still wonder if the numbers of DEGS make the analysis dubious. I have had big numbers of DEGs in the past, but this seems too much. I should add that only 45 genes are not differentially expressed under any of the conditions.
Do you have any views on the subject you’d be willing to share? I’d be grateful if I could to get any feedback on this.
Best
PS:
For some reason the Imgur link was not acceped. Please find below the above-mentioned table.
|
GP_G |
GP_Gpl |
Gpl_G |
T2T_T1T |
P3T_T1T |
A3T_T1T |
P4T_T1T |
J4T_T1T |
A4T_T1T |
J5T_T1T |
A5T_T1T |
J4H_P4H |
A4H_P4H |
J5F_J5H |
A5H_J5H |
A5FS_J5H |
A5F_J5H |
DOWN |
37 |
34 |
35 |
12 |
35 |
23 |
36 |
37 |
38 |
36 |
37 |
30 |
30 |
36 |
29 |
36 |
33 |
UP |
40 |
34 |
36 |
13 |
27 |
22 |
36 |
37 |
38 |
36 |
37 |
36 |
37 |
37 |
24 |
37 |
34 |
NS |
23 |
32 |
29 |
75 |
38 |
55 |
28 |
25 |
25 |
28 |
26 |
33 |
34 |
27 |
47 |
27 |
32 |
Well, it's hard to tell, because you don't post any code showing what you've done, or the experimental set-up.
Hi Aaron, thanks for your quick reply. THe column correspond to the contrast I've looked at. They are simple comparisons between two conditions at a time: FOr instance GP vs G, GP vs Gpl etc. I attach here below the same table as above, but transposed, which looks better. For every condition there weree three replicates.
Code? And do you really only have 100 genes in total? Are those actually percentages?
Yes, they are percentages. Please find below the real numbers and some code further below. Just note that what I call design factor corresponds to the modalities you may see on the contrasts and that reps, which I have implemented in the model, correspond to the replicate effect. Thanks again for your help.
Code:
design=model.matrix(~0+design.factor+reps)
colnames(design)=sub("design.factor|reps","",colnames(design))
rownames(design)=colnames(counts)
dge <- estimateGLMCommonDisp(dge,design,verbose=TRUE)
# Disp = 0.00196 , BCV = 0.0443
dge <- estimateGLMTrendedDisp(dge,design)
dge <- estimateGLMTagwiseDisp(dge,design)
fit <- glmFit (dge,design)
my.contrasts=makeContrasts(
GP_G = GP-G,
GP_Gpl = GP-Gplus,
Gpl_G = Gplus-G,
T2T_T1T = T2T-T1T,
P3T_T1T = P3T-T1T,
A3T_T1T = A3T-T1T,
P4T_T1T = P4T-T1T,
J4T_T1T = J4T-T1T,
A4T_T1T = A4T-T1T,
J5T_T1T = J5T-T1T,
A5T_T1T = A5T-T1T,
J4H_P4H = J4H-P4H,
A4H_P4H = A4H-P4H,
J5F_J5H = J5F-J5H,
A5H_J5H = A5H-J5H,
A5FS_J5H = A5FS-J5H,
A5F_J5H = A5F-J5H,
levels=design)
lrt=as.list(c(1:ncol(my.contrasts)))
names(lrt)=colnames(my.contrasts)
for (i in 1:ncol(my.contrasts))
lrt[[i]] <- glmLRT(fit, contrast=my.contrasts[,i])