Question

edgeR_robust: where are moderately expressed genes?

0

Entering edit mode

marek_koter • 0

@marek_koter-8946

Last seen 8.5 years ago

Poland

Dear all,

I'm running this edgeR-robust code to analyse my miRNA-seq DE sequences. But I receive only highly/lowly expressed miRNAs with logFC values higher than 4/lower than -4. But nothing in between... Should I change some parametres to get the whole logFC spectrum???

Thank you, Marek D. Koter

My code:

rm(list = ls())
library(edgeR)
y <- read.delim("3DPI_All_bez_K2_I2.txt", row.names="seq", header = T)
group <- factor(c(1,1,1,2,2,2))
keep<-rowSums(cpm(y)>5) >=1
y<-y[keep, ]
y <- DGEList(counts=y,group=group)
design <- model.matrix(~group)
y <- calcNormFactors(y)
y <- estimateGLMRobustDisp(y, design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit,coef=2)
topTags(lrt, n = 50)

Fragment of my results:

	logFC	logCPM	LR	PValue	FDR
CTACCCCTGAACTTATTTTAT	-5,28003	2,902426	4,407249	0,035787	0,362601
TTTCGTTACTCTGAGGCACA	-4,83913	2,484342	3,876514	0,048966	0,362601
AAAGAGACATAACTTTGGGCT	4,173604	1,771509	3,934848	0,047295	0,362601
AACAGGACACTCTAAATCGGG	4,173604	1,771509	3,934848	0,047295	0,362601

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250   
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.1250    
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] edgeR_3.10.5  limma_3.24.15
loaded via a namespace (and not attached):
[1] tools_3.2.2

edger rnaseq microrna • 1.7k views

ADD COMMENT • link updated 8.5 years ago by Aaron Lun ★ 28k • written 8.5 years ago by marek_koter • 0

0

Entering edit mode

You're using an old version of Bioconductor. I suggest you upgrade to BioC 3.2 and edgeR 3.12.0 and use that for your analyses, to make sure this behaviour isn't present in the latest version.

ADD REPLY • link 8.5 years ago Aaron Lun ★ 28k

score 2 · Answer 1 · 2015-11-05

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 20 hours ago

The city by the bay

Obviously, if you take the top set of DE genes with topTags, you'll be looking at the genes that have the strongest differences between conditions. So it's no surprise that you get lots of genes with extreme log-fold changes. If you want to get all the genes (including those with smaller log-FCs), you should set n=Inf when running topTags. This will report all genes in the data set, ordered by p-value.

As an aside, it seems that none of your top genes are significant at a FDR of 5%, even though they have strong log-fold changes. I'd suggest having a closer look at the counts and seeing if your large fold changes are being driven by an outlier sample. This would also inflate the dispersion and reduce the significance of any DE. If there is an outlier, you could try removing it to see whether it improves DE detection. Of course, the other possibility is that your data is just inherently variable, in which case there's not much that can be done.

ADD COMMENT • link 8.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

You forgot to mention that there is something you can do with variable data! The OP could try using voomWithQualityWeights!

ADD REPLY • link 8.5 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

While that's true, I was thinking more of a situation where you have high biological variability across all samples, rather than variability induced by a number of poor-quality samples in each condition. In such cases, I don't think sample-based weighting would provide any benefit, as each sample would be of the same (relative) quality such that the weights would all be the same. Besides, you'd have to switch from edgeR to voom, which may or may not be a problem for miRNA data depending on the size of the counts, the mean-variance relationship, etc. I've never worked with that stuff, so I wouldn't know the details - I suspect that voom would be okay for miRNA counts, but the proof is in the pudding.

ADD REPLY • link 8.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks Steve :-)

ADD REPLY • link 8.5 years ago marek_koter • 0

0

Entering edit mode

Many thanks Aaron :-)

ADD REPLY • link 8.5 years ago marek_koter • 0