Hi everybody,
I performed a differential gene expression (DE) analysis of RNA-seq data with edgeR, testing for DE genes with glmLRT. For each locality (8 in the experiment), treatment was compared with control, with 3 replicates per group (e.g. controls_locality1 vs treatments_locality1; controls_locality2 vs treatments_locality2).
For some genes the raw counts are zero in both groups of the same locality, yet a high log fold-change (LFC) was estimated (adjusted p-value > 0.05).
After consulting several posts and the edgeR manual, my basic understanding is that these LFC values come from the internal transformations and normalization (pseudo-count addition, library-size scaling) that edgeR applies to the raw counts, so that zero counts become values larger than zero and a fold change can be estimated.
a) Is that correct?
b) Since I calculated DE between sample groups of the same locality, it is hard for me to understand how a gene with zero counts in all six replicates can be highly up-regulated (e.g. logFC = 8.3) in locality1.
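To make the pseudo-count idea in (a) concrete, here is a minimal base R sketch of what I have in mind. It is a simplification and not edgeR's actual computation: `simple_lfc` is a hypothetical helper of mine, the count of 40 is made up for illustration, and 0.125 is used because, as far as I can tell, it is the default `prior.count` of `glmFit` (edgeR scales the prior by library size and applies it inside the GLM fit, which this ignores).

```r
# Simplified illustration only, NOT edgeR's real algorithm.
# simple_lfc is a hypothetical helper; prior = 0.125 is an assumption.
simple_lfc <- function(treat, ctrl, prior = 0.125) {
  # log2 ratio of group means after adding a small prior count to both
  log2((mean(treat) + prior) / (mean(ctrl) + prior))
}

# All six replicates are zero: numerator and denominator are identical
lfc_all_zero <- simple_lfc(c(0, 0, 0), c(0, 0, 0))

# Hypothetical counts: expressed in treatment, zero in control
lfc_one_sided <- simple_lfc(c(40, 40, 40), c(0, 0, 0))

print(lfc_all_zero)
print(lfc_one_sided)
```

Under this simplified formula the all-zero case gives a logFC of exactly 0, and only the one-sided case gives a logFC above 8, which is why the value I see for a gene with six zeros puzzles me.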
Thank you very much for your help.
Hello, thank you so much for your response. In fact, I kept only those genes that have a CPM of 1 or greater in at least three samples (the size of the smallest group of replicates) and removed the rest:
dgefilt <- rowSums(cpm(dge)>=1) >= 3
I expected that those genes with counts below 1 CPM in 3 or more replicates would be removed.
TRINITY_DN12592_c0_g1 0.00 0.00 0.00 0.00 0.00 0.00 57.04 37.41 0.00 0.00 0.00 54.99 0.00 0.00 36.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.81 0.00 3.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
The T_SAR vs C_SAR comparison estimated logFC = 8.3 for this gene.
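To check why the filter kept this gene, here is a small base R sketch that applies the same rule to the row pasted above. It assumes the pasted values are CPMs in sample order (if they are counts instead, the real CPMs depend on the library sizes). The variable names are mine and are only for illustration.

```r
# Values pasted above for TRINITY_DN12592_c0_g1 (one value per sample).
# Assumption: these are CPMs; the column-to-sample mapping is not shown here.
vals <- scan(text = "0.00 0.00 0.00 0.00 0.00 0.00 57.04 37.41 0.00 0.00 0.00 54.99 0.00 0.00 36.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.81 0.00 3.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
             quiet = TRUE)

# One-row matrix so the same rowSums() expression as in my filter can be used
cpm_mat <- matrix(vals, nrow = 1,
                  dimnames = list("TRINITY_DN12592_c0_g1", NULL))

# Number of samples (across ALL localities) with CPM >= 1
n_pass <- rowSums(cpm_mat >= 1)

# Same rule as my filter: keep the gene if at least 3 samples pass
keep <- n_pass >= 3

print(n_pass)
print(keep)
```

Because `rowSums()` counts over all samples at once, the rule is evaluated across every locality together, not within a single treatment vs control pair, so a gene can pass the filter even when all six replicates of one comparison are zero.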
Thanks for your help.