Question

p-adjusted values in multiple comparisons

0

Entering edit mode

ocanorocabayera • 0

@ac2327d8

Last seen 5 months ago

Spain

Dear community,

I have a question regarding the p-value adjustments in multiple testing in RNA-seq. I exposed fish from the same species but from different regions (3 sites per region) to nitrite in the lab and want to see different responses between Regions to the exposure to nitrite in the gene expression of liver with RNA-seq. I used 5 - 7 replicates per condition (control - exposed) and Site (110 samples in total), and I think that this experimental design fits the statistical methodology of the 'block' and 'correlation' parameters in the lmFit function in limma.

I filtered low count genes ('filterByExpr'), normalised them ('calcNormFactors' with method "TMM"), and I obtain with lmFit a few p-values (420/13000) lower than 0.05 in a given factor, but when applying the p-value adjustment for multiple testing of B-H all p-values become > 0.99. I am aware that I am probably going short of statistical power and/or the physiological effects are mild because the exposure was to environmentally relevant concentrations of nitrite instead of extreme exposures, but with other biomarkers I did see significant changes in respirometry rates, egg sizes, gene expression in gills in a few genes with RT-qPCR, and blood parameters. Therefore, I think the penalisation of using so many multiple testing in RNA-seq is driving my results. Is there an alternative approach in these cases (e.g. pathway analysis as a whole)? Thank you for your consideration.

```

StatisticalMethod limma RnaSeqSampleSizeData • 669 views

ADD COMMENT • link updated 5 months ago by Josh • 0 • written 6 months ago by ocanorocabayera • 0

score 2 · Answer 1 · 2025-08-16

2

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

If your data was completely random, with no truly differentially expressed (DE) genes, then you would expect to get nearly 5% of the p-values < 0.05. You have only 3.3% p-values < 0.05, suggesting that there is no true DE at all, so multiple testing adjustment doesn't seem to be the issue. If you want to follow this up further, limma provides the propTrueNull() function to estimate the proportion of truly DE genes in your data independently of multiple testing adjustments.

Anyway, the limma functions fry and camera do allow you to do pathway analysis on the data as a whole, without selecting DE genes, so that could be a way forward.

limma provides a number of analysis refinements to squeeze more power out of poor quality data. You can look for outliers with plotMDS(), turn on sample weights with voomLmFit(), and set robust=TRUE when running eBayes(). Trouble-shooting the data is generally a better route than trying to avoid multiple testing adjustment. Checking whether the fitted model seems correct for your experiment is also part of the trouble-shooting process.

ADD COMMENT • link 6 months ago Gordon Smyth 53k

0

Entering edit mode

Thank you Gordon for your thorough reply. It seems when MDS-plotting the obtained object from 'voom', that the points do not cluster together according to the 'treatment' group. When turning on sample weights with voomLmFit the percentage of uncorrected p-values <0.05 improves a little bit, it raises to 5.2% for Region 1 and 7% for Region 2 and 5% for Region 3. The 3.3% value from my previous message was from Region 2. Anyway when applying the B-H procedure all become > 0.05. Therefore, I guess that any sublethal effect in the exposed fish is not being captured by my experimental design.....

ADD REPLY • link 6 months ago ocanorocabayera • 0

1

Entering edit mode

plotMDS() is best run on the original DGEList object, i.e., on the input to voom rather than on the output. See the example analyses in the limma documentation and associated workflows.

ADD REPLY • link 6 months ago Gordon Smyth 53k

0

Entering edit mode

Yes, using lmFit with block and duplicateCorrelation is appropriate; mobile data collection can illustrate blocking importance. First estimate correlation with duplicateCorrelation(block=Site), then fit with lmFit(..., block=Site, correlation=cor) and call eBayes. For multiple testing, control false discoveries with Benjamini-Hochberg FDR (topTable adjust.method="BH"). With 5-7 replicates per condition and 110 samples you have reasonable power. Include Region, Exposure and their interaction in the model, test contrasts, and consider surrogate variables for unwanted variation. Report adjusted p-values and effect sizes for biological relevance.

ADD REPLY • link 5 months ago Josh • 0