Limma - how to manage huge differences in library sizes
2
0
Entering edit mode
baus87 • 0
@baus87-13757
Last seen 6.0 years ago

Dear Bioconductor community,

i'm performing a differential expression analysis (classic approach: treatment vs control condition) in which library sizes of biological replicates of control thesis exceed of 100 times those of treatment condition. The software used, limma, is impaired in DEGs detection despite filtering threshold was set up on the basis of the lowest library size. In particular, limma found up-/down-regulated genes which actually are not (this is clearly showed up by looking at raw counts of such genes).

How can I manage it?

Thanks in advance

limma • 2.6k views
ADD COMMENT
0
Entering edit mode

Dear Wolfgang,

thanks for your reply. Actually it's a consequence of experimental design. We're talking about fungal pure culture (control condition) vs. an in planta fungal interaction at the very early stage of fungal growth. Regrettably, we do not have negative control. Also, the failure in proper DEGs identification it is not a consequence of a mistaken limma workflow, since i verified that this drawback does not occur when taking into account theses having similar library sizes.

ADD REPLY
2
Entering edit mode
@gordon-smyth
Last seen 17 minutes ago
WEHI, Melbourne, Australia

The fact that the sequencing depths are so radically different is certainly undesirable, especially when library size is confounded with treatment condition. However, limma does not give DE results just because the library sizes are different.

I assume that you have RNA-seq data and you are using limma-voom. voom is specifically designed to handle library sizes that vary by an order of magnitude or more. Are you using robust=TRUE when you call eBayes()? That may help. How many replicates do you have?

There is only one situation in which limma can give too much DE, which is when you have a lot of genes with total count equal to zero in one of the treatment conditions (i.e., all the replicate treatment condition samples have count=0, but the control counts are not zero). This could easily be occurring in your study if the treated samples have small library sizes. Otherwise limma should control the error rate correctly. I don't believe that limma is so dumb that you can determine the true status of the genes yourself without any doubt by just eyeballing the counts whereas limma can't figure it out correctly. If you wish to claim this, you need to give some evidence.

I myself use edgeR instead of limma in extreme count situations when there are lots of zeros. The edgeR quasi-likelihood pipeline (described here:)

    https://www.bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html

    https://f1000research.com/articles/5-1438

is similar to limma, but should control the error rate correctly even when the counts are all zero on one group. This issue is discussed here:

    No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data

ADD COMMENT
0
Entering edit mode

Dear Gordon,

the situation in my experiment was actually the opposite: the "treatment" condition is full of genes having 0 counts in all libraries (4). In such situation, by using voom with quality weights, I got some up-regulated genes (contrast matrix: T-C) having 0 counts in all libraries. 
However, I managed by using the edgeR QL as you suggested. 
Thank you so much,
best regards

ADD REPLY
0
Entering edit mode

Ah, yes, I had misread which of the "treatment" or "control" samples had the smaller library sizes. I've edited my answer to fix that.

Glad that edgeR QL worked out for you.

ADD REPLY
1
Entering edit mode
@wolfgang-huber-3550
Last seen 17 days ago
EMBL European Molecular Biology Laborat…

Is this a data quality problem or an expected consequence of the experimental design?

If it's the former, then perhaps it's best to stop here.

If it's the latter, then what is your approach to deconfound true biological effects on library size from technical variations? Do you have spike-in controls, known 'negative control' genes, etc.?

ADD COMMENT

Login before adding your answer.

Traffic: 960 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6