Can I ignore the warnings in limma show " Coefficients not estimable" after batch correct ?
1
1
Entering edit mode
xingxd16 ▴ 20
@xingxd16-20156
Last seen 2.6 years ago

Hi all :

• When I use limma to identify differentrial expression genes , I want to correct batch effect also.
• My command and data information is :
> table(cluster)
cluster
control   treat
368     957
> table(Exp)
Exp
p3   p4 LC05 LC09 LC10 LC11 LC12
370  587   21  210   19   69   49
> table(cluster,Exp)
Exp
cluster    p3  p4 LC05 LC09 LC10 LC11 LC12
control   0   0   21  210   19   69   49
treat   370 587    0    0    0    0    0
> design <- model.matrix(~ cluster+Exp)
> fit <- lmFit(selectExpressionSetObject, design)
Coefficients not estimable: ExpLC12
Warning message:
Partial NA coefficients for 6399 probe(s)
> fit <- eBayes(fit, trend=TRUE)
Warning messages:
1: In out$var.prior[is.na(out$var.prior)] <- 1/out\$s2.prior :
被替换的项目不是替换值长度的倍数(*Chinese here should translate to The replaced item is not a multiple of the length of the replacement value*)
2: In .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim,  :
Estimation of var.prior failed - set to default value
> res <- decideTests(fit, p.value=adjPvalueCutoff, lfc = logFCcutoff)
> summary(res)
(Intercept) clustertreat Expp4 ExpLC05 ExpLC09 ExpLC10 ExpLC11
Down             0          106    63       0      21       4       9
NotSig        5239         6259  6151    6387    6344    6374    6356
Up            1160           34   185      12      34      21      34
ExpLC12
Down         0
NotSig       0
Up           0
>
> DEgeneSets <- topTable(fit, coef="clustertreat", number=Inf,
gene_short_name Database     logFC   AveExpr         t      P.Value
SFTPC             SFTPC  emsembl -1.740316 0.4582606 -19.04131 4.347876e-72
SCGB3A1         SCGB3A1  emsembl -1.336138 0.5532937 -17.45647 6.572995e-62
H3F3A             H3F3A  emsembl -1.706766 0.6291855 -16.61615 9.704449e-57
HSPA1A           HSPA1A  emsembl  2.857433 1.6038324  14.99185 3.061836e-47
ZFP36L2         ZFP36L2  emsembl -2.080686 0.7430561 -14.98351 3.411612e-47
HSPA1B           HSPA1B  emsembl  1.978068 0.9748275  12.42446 1.060178e-33
SFTPC   2.782206e-68 151.97484
SCGB3A1 2.103030e-58 129.01848
H3F3A   2.069959e-53 117.36082
HSPA1A  4.366181e-44  95.94128
ZFP36L2 4.366181e-44  95.83537
HSPA1B  1.130679e-30  65.43102

• I know the reason may that the control samples are in 2 batch , the treat samples are in another 5 batch. So my matrix's columns are dependent and this causes the technical problem. I can not know that the different between control and treat are the biology difference or batch drived. Am I right ?
• Even there are warnings, I found that I can still get the results by " topTable(fit, coef="clustertreat") " see above . So my question is can I ignore the warnings and still use the results ? Or in this situation correct batch is wrong ? And I indeed found that some genes are logFC >0 without batch correct, but it turn to logFC <0 after batch correct .

Best

limma batchcorrect warnings • 673 views
0
Entering edit mode
2. "I found that I can still get the results by ." Well, by what?

A sloppy question deserves a sloppy answer. So don't be sloppy.

0
Entering edit mode

Sure, sorry for that. I have edit my question. Hope you reply !

0
Entering edit mode

I'm not sure that's much better. You know there is specific formatting for code, right? Looks like this:

this_is_some_code
# This is a comment that you should use for code output


0
Entering edit mode

3
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 1 hour ago
The city by the bay

I can not know that the different between control and treat are the biology difference or batch drived. Am I right ?

Correct.

So my question is can I ignore the warnings and still use the results ?

No.

Or in this situation correct batch is wrong ?

Yes. The experimental design was broken to begin with, and the warnings are a manifestation of that. Whenever you have coefficients that cannot be estimated, it usually means that something is confounded with something else. On occasion, this can be worked around, but not in this case. You don't have enough information to estimate all of the coefficients so limma just drops one of them to make the model mathematically solvable. However, removing one coefficient changes the meaning of all other coefficients such that your treatment term no longer represents the average treatment effect. It probably instead represents the difference between LC05 and p3 alone.

If the batches are replicates of each other, I would suggest:

1. Summing the counts across cells within each batch.
2. Pretending the summed counts are pseudo-bulk RNA-seq samples.
3. Analyzing the result in edgeR or voom as a two-group comparison.

There's no need to account for batch, because each batch is a single replicate in a pseudo-bulk analysis; trying to block on each replicate would make no sense. See here for an example of how this could be done.

0
Entering edit mode

Good explainations and good adives !!! Sincere thanks.