Question

Removing Effect of Covariate in Only a Subset of Samples in a Model Containing All Samples Loses Full Rank if Factor but not Numerical

1

Entering edit mode

danphillips28 ▴ 10

@33bd02b2

Last seen 4 months ago

United Kingdom

I previously made a post asking about how to model my data using limma (limma: Genes Associated with Continuous Predictor in Subsets of Samples). In brief, I have samples from ~130 people, approximately half male and half female. For most people, there is a sample from tissue 1 and a sample for tissue 2. One primary focus of our experiment is to find genes associated with fat distribution. Because males and female differ in regards to this fatness trait, and so too do the two tissues, I thought it appropriate to model gene expression in each of the four tissues separately (male-tissue 1, female-tissue 1, male-tissue 2, female-tissue 2). I have also corrected for some covariates: age, sequencing batch, overall fatness. I also accounted for sample pairings using duplicateCorrelation. The code is shown below.


design.AGR.fmi <- model.matrix(~0 + 
                             sex.depot + # A factor combining sex.tissue
                             sex.depot:Android.Gynoid.Ratio + 
                             machine + 
                             age.at.Biopsy + 
                             Fatmass.Index, 
                           data = targets.multiomics) 

corfit.agr.fmi <- duplicateCorrelation(log2.cpm.filtered.norm, design.AGR.fmi, block = targets.multiomics$Individual.ID)

counts_voom.agr.fmi <- voomWithQualityWeights(myDGEList.filtered.norm, design.AGR.fmi)

corfit.agr.fmi <- duplicateCorrelation(counts_voom.agr.fmi, 
                               design.AGR.fmi, 
                               block=targets.multiomics$OBB.ID,
                               corfit.agr.fmi$consensus)

counts_voom.agr.fmi <- voomWithQualityWeights(myDGEList.filtered.norm, 
                                          design.AGR.fmi, 
                                          block=targets.multiomics$Individual.ID,
                                          correlation=corfit.agr.fmi$consensus)

fit.agr.fmi <- lmFit(counts_voom.agr.fmi, design.AGR.fmi)

ebFit.agr.fmi <- eBayes(fit.agr.fmi, robust = TRUE)

results.agr.fmi <- decideTests(ebFit.agr.fmi, 
                           method="separate", 
                           adjust.method="BH", 
                           p.value=pval_threshold, 
                           lfc=0)

The above has been working well. However, the age distribution of the females covered likely pre-, peri-, and post-menopausal ages. We tested FSH levels for the women and have now confidently labelled each women as either pre-, peri-, or post-menopausal. However, when I include menopausal status (as a factor) as a covariate, I receive a warning letting me know the matrix is no longer full rank. This is the case whether males are all labelled as "male" or by their individual ID code. What especially confused me is that if I include the FSH measurement as a continuous value, the matrix remains full rank. I'm not exactly sure how to deal with this as the correction needs to be done using menopausal status, which is somehow loses full rank.

Greatly appreciate some wisdom here! I am a little out of my depth with this project I feel.

Thanks in advance

limma • 1.6k views

ADD COMMENT • link updated 5 months ago by James W. MacDonald 68k • written 5 months ago by danphillips28 ▴ 10

score 1 · Answer 1 · 2024-10-09

1

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 6 hours ago

United States

Do you end up with a column of all zeros when you include menopausal status as a factor? Also, please note that voomLmFit is the preferred method rather than iterating between voomWithQualityWeights and duplicateCorrelation.

ADD COMMENT • link 5 months ago James W. MacDonald 68k