limma: Genes Associated with Continuous Predictor in Subsets of Samples
1
0
Entering edit mode
@33bd02b2
Last seen 3 months ago
United Kingdom

Hello!

I have been using limma to analyse a relatively large multiomics dataset (RNA-Seq and 850k methylation array) of paired-samples. The paired pairwise-comparisons have been going smoothly (using the duplicateCorrelations approach), however, I now want to find genes whose expression is associated with a continuous predictor (Waist-to-Hip ratio). I have done this successfully before for all my samples together, but I am now struggling to do this within some groups of interest (male abdominal, male gluteal, female abdominal, female gluteal). To be clear: I want to find genes whose expression is significantly associated with waist-to-hip ratio in abdominal and gluteal tissues of males and females, each considered separately. My code is shown below (where targets$sex.depot is one of male.abdominal, male.gluteal, female.abdominal, or female.gluteal); design.WHR <- model.matrix(~new.WHR:sex.depot + machine + age.at.Biopsy + Cell.type1 + Cell.type2 + Cell.type3 + Cell.type4 + Cell.type5, data = targets) counts_voom.whr <- voomWithQualityWeights(myDGEList.filtered.norm,design.WHR) fit.whr <- lmFit(counts_voom.whr, design.WHR) ebFit.whr <- eBayes(fit.whr, robust = TRUE) results.whr <- decideTests(ebFit.whr, method="separate", adjust.method="BH", p.value=0.001, lfc=0) topTable(ebFit.whr, adjust ="BH", coef="new.WHR:sex.depotabdominal.female", number=40000, sort.by="none")  This code produces unusual results. Shown below are three of the top 15 most significant results from female abdominal: adjusted p values < 10^-25. Clearly something has not worked. However, after this top 15 or so, the subsequent results all look proper, as shown below. Furthermore, the results are broadly consistent with those I get when I simply correlate expression with WHR in each tissue. So, why are my top hits so weird looking? What have I done wrong? Your guidance is greatly appreciated! Thank you in advance. P.S Apologies if I made any mistakes - this is my first post! RNASeq limma • 251 views ADD COMMENT 0 Entering edit mode @gordon-smyth Last seen 8 hours ago WEHI, Melbourne, Australia The code doesn't show any statistical test or gene ranking, so it isn't clear what you've done. Did you run topTable and with what contrast? Anyway, that's not the problem. The design matrix will not actually allow you to find genes that depend on the covariate in specific groups anyway. Have a look at Section 8.2 here, which shows how to make covariate trends for different groups: https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html#covariates I am also a bit worried that you have removed duplicateCorrelation from the model. The samples are still paired even though you are examining a covariate associated with the subjects. ADD COMMENT 0 Entering edit mode Thanks for your clarification Gordon. Yes you are right, I missed a line of code calling topTable (to pull out female abdominal results), which I've now included in my original post. Thanks for directing me to 8.2 in your design matrix guide. Following that scheme, my model would look like; design.WHR <- model.matrix(~0 + sex.depot + new.WHR:sex.depot, data = targets) ... topTable(ebFit.whr, adjust ="BH", coef="new.WHR:sex.depotabdominal.female") # To find genes associated with WHR in Female Abdominal  This seems to have worked i.e. all significant results look like real associations when plotted in the correct sex:tissue, with no funky stuff. Thank you! You raise a good point about sample pairings, which I had planned to ask as a follow-up. I have added duplicateCorrelation values block=targets$individual, correlation=corfit\$consensus back into voomWithQualityWeights in the normal way.

After reading Limma time series spline help, and given I have decent sample sizes (~32 for each sex.depot block), I am wondering if you think exploring using the spline method,

X <- ns(WHR, df=5)
design.whr <- model.matrix(~0 + new.WHR + sex.depot:X)


might help me to to capture even more meaningful associations. What do you think?

Thanks again,

Dan

0
Entering edit mode

No, judging from the plots you show, a 5-parameter spline would be overkill. You only have vague correlations and anything more than 2 parameters seems unnecessary.

You can't fit a spline and include WHR in your model as well.

0
Entering edit mode

Thanks again Gordon. I've now produced some results that appear to be sensical and I'm happy with.

I was hoping I could steal one final answer from you about the interpretability of these results. I am aware that (e.g. from your answer here: https://www.biostars.org/p/383626/) that the logFC for a continuous predictor is the FC in log expression per unit increase of the predictor e.g. per 1 year, for age. However, my predictor is a ratio (waist circumference in cm/hip circumference in cm), ranging from ~0.7 to ~1. Some of my logFC values are large (e.g. 10) and I get the impression limma is basing this off a change in WHR of 1. Does this sound correct? If so I plan to divide by 10 to increase interpretability but thought I'd clarify first.

Best wishes, Dan