Hi!
I'm doing some DE analysis via edgeR. There is one part I have some trouble to understand, and I suspect it may have to do with poor mathematical understanding on my part.
Below I show an MA-plot. The points (genes) are labeled according to some gene lists. The genes are marked red if they are considered significantly deferentially expressed. Or this is my goal anyway. In order to do this, I make use of the following function within edgeR:
decideTests(efit, adjust.method="BH")!=0
(0 in this case is genes that are not flagged as significant)
my understanding is that "efit" have a 5% p-adjusted threshold as default
I do however find this plot somewhat confusing. Differential expression is more pronounced at the extreme in the y-axis, and futher to the right of the x-axis would mean that we have more "confident"/"support" of our differential expression since we have a higher number of counts (i.e. more data to rely on). Hence, would it not be more logical if the red dots appear in the top right / buttom right corner of the graph. Some genes are marked black eventhough they have a higher |FC| AND higher mean of expression.
Is my reasoning wrong here?
Furthermore, there might be something I have got wrong about the structure of the MArrayLM object, since when I do the following
table(efit$F.p.value<0.05)
I get a very much higher number of hits. Shouldn't these be the same? Or is this some other padjusted value?
Any explanation would be highly appreciated!
Thanks!
/JB
Please post the code that you used to perform the DE analyses and to generate this plot.
#**** Reading in my count files ****** dgeobj = readDGE(files, columns=c(1,2), header=FALSE) #**** Just adding some names ***** samplenames = substring(colnames(dgeobj), 0 ,3) colnames(dgeobj) = samplenames #**** Group according to treatment **** group = factor(c("Medium", "Polyl", "Medium", "Polyl"), levels = c("Medium", "Polyl")) cell = as.factor(c("D68", "D68", "D69", "D69")) dgeobj$samples$group = group dgeobj$samples$cell = cell #**** geneIDs are stored in rownames of DGEList object ***** geneid = rownames(dgeobj) genes = select(Homo.sapiens, keys=geneid, columns=c("SYMBOL"), keytype="ENSEMBL") #**** Check and remove duplicate geneIDs **** dup = genes$ENSEMBL[duplicated(genes$ENSEMBL)] mat = match(geneid, genes$ENSEMBL) genes = genes[mat,] dgeobj$genes = genes