p-adjusted values in multiple comparisons
1
0
Entering edit mode
@ac2327d8
Last seen 3 days ago
Spain

Dear community,

I have a question regarding the p-value adjustments in multiple testing in RNA-seq. I exposed fish from the same species but from different regions (3 sites per region) to nitrite in the lab and want to see different responses between Regions to the exposure to nitrite in the gene expression of liver with RNA-seq. I used 5 - 7 replicates per condition (control - exposed) and Site (110 samples in total), and I think that this experimental design fits the statistical methodology of the 'block' and 'correlation' parameters in the lmFit function in limma.

I filtered low count genes ('filterByExpr'), normalised them ('calcNormFactors' with method "TMM"), and I obtain with lmFit a few p-values (420/13000) lower than 0.05 in a given factor, but when applying the p-value adjustment for multiple testing of B-H all p-values become > 0.99. I am aware that I am probably going short of statistical power and/or the physiological effects are mild because the exposure was to environmentally relevant concentrations of nitrite instead of extreme exposures, but with other biomarkers I did see significant changes in respirometry rates, egg sizes, gene expression in gills in a few genes with RT-qPCR, and blood parameters. Therefore, I think the penalisation of using so many multiple testing in RNA-seq is driving my results. Is there an alternative approach in these cases (e.g. pathway analysis as a whole)? Thank you for your consideration.

```

StatisticalMethod limma RnaSeqSampleSizeData • 244 views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 17 hours ago
WEHI, Melbourne, Australia

If your data was completely random, with no truly differentially expressed (DE) genes, then you would expect to get nearly 5% of the p-values < 0.05. You have only 3.3% p-values < 0.05, suggesting that there is no true DE at all, so multiple testing adjustment doesn't seem to be the issue. If you want to follow this up further, limma provides the propTrueNull() function to estimate the proportion of truly DE genes in your data independently of multiple testing adjustments.

Anyway, the limma functions fry and camera do allow you to do pathway analysis on the data as a whole, without selecting DE genes, so that could be a way forward.

limma provides a number of analysis refinements to squeeze more power out of poor quality data. You can look for outliers with plotMDS(), turn on sample weights with voomLmFit(), and set robust=TRUE when running eBayes(). Trouble-shooting the data is generally a better route than trying to avoid multiple testing adjustment. Checking whether the fitted model seems correct for your experiment is also part of the trouble-shooting process.

ADD COMMENT
0
Entering edit mode

Thank you Gordon for your thorough reply. It seems when MDS-plotting the obtained object from 'voom', that the points do not cluster together according to the 'treatment' group. When turning on sample weights with voomLmFit the percentage of uncorrected p-values <0.05 improves a little bit, it raises to 5.2% for Region 1 and 7% for Region 2 and 5% for Region 3. The 3.3% value from my previous message was from Region 2. Anyway when applying the B-H procedure all become > 0.05. Therefore, I guess that any sublethal effect in the exposed fish is not being captured by my experimental design.....

ADD REPLY
1
Entering edit mode

plotMDS() is best run on the original DGEList object, i.e., on the input to voom rather than on the output. See the example analyses in the limma documentation and associated workflows.

ADD REPLY

Login before adding your answer.

Traffic: 627 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6