limma::contrast.fit() potential bugs when weights from voom() included
2
0
Entering edit mode
Zhenyu • 0
@66362ebb
Last seen 18 months ago
United States

Dear Limma authors,

Thank you very much for this amazing package. In the recent use of the package, I found a potential issue when analyzing RNASeq data. After doing some investigation, I found the issue only happens using contrast.fit() when weight is included from voom()

To illustrate the issue, I use GSE137430 data: there are 3 time points (BL, W4 and W16) and 2 treatments (Secukinumab and Placebo) in the data set. I only used LS tissue type. Let's only focus on change from baseline by treatment

I generate grp = paste(Treatment, Timepoint, sep = "_"), and output the result for gene ID ENSG00000000003.14. Please note that all the results are before eBayes(). The issue is all the standard errors are a little bit different when the design matrix change or reference level of treatment change in the same design matrix (2 & 3 below), given all the coefficients/estimates are the same.

  1. design matrix - model.matrix(~0+grp, data = info1) contrasts.fit() output

And origSE is the standard error before eBayes()

enter image description here

  1. design matrix - model.matrix(~1+Treatment*Timepoint, data=info1) treatment reference level is Placebo, and different design matrix from 1, but the model is practically the same. contrasts.fit() output

enter image description here

  1. design matrix - model.matrix(~1+Treatment*Timepoint, data=info2) treatment reference level is Secukinumab, the design matrix is the same as 2, the only different is Secukinumab is the reference level. contrasts.fit() output

enter image description here

As you can see all the coefficients across these three scenarios are the same, but SE and p values are slightly different.

The reason I suspect the difference caused by contrast.fit() is that lmFit() output seems good for the common terms among the first two models

  1. design matrix - model.matrix(~0+grp, data = info1) lmFit() output

enter image description here

  1. design matrix - model.matrix(~1+Treatment*Timepoint, data=info1) placebo as reference lmFit() output enter image description here

The intercept from model 2 is identical as grpPlacebo_BL output, and the intercept is the reference level of treatment and Timepoint (Placebo and Baseline). lmFit() output is probably correct.

Another interesting finding is the green circle part of the results are the same as the green circle part results from contrasts.fit(). The contrast for this output only has one term (detail code below), if lmFit() gives correct results, the result from contrasts.fit() with only one term is also correct, but issues happen when there more than one term.

  1. design matrix - model.matrix(~1+Treatment*Timepoint, data=info2) Secukinumab as reference lmFit() output enter image description here

similar to previous model, blue circle info are the same as blue circle in contrasts.fit() output

The codes below show these example outputs. I did some further investigation and found that when the samples were independent, similar issues happened when weight is included. Without using weight, all the output of contrasts.fit() are identical among different model or different reference levels.

Please note that the difference of the p-values from contrasts.fit() are small - as you can see from the examples.

Happy to share the csv data sets used. Thank you very much.

Best, Zhenyu

library(data.table)
library(plyr)
library(tidyverse)
library(limma)
library(edgeR)


dir_in1 <- "C:/Users/wangz260/OneDrive - Pfizer/Downloads"

info0 <- fread(file.path(dir_in1, "GSE137430_Sample_Annotation.txt"))
info1 <- info0 %>% filter(tissuetype=="LS") %>%
  mutate(grp = paste(Treatment, Timepoint, sep = "_"))

#Count table was generated by GSE137430 fastq files.
ct0 <- fread(file.path(dir_in1, "GSE137430_Gene-count-table.txt")) 
ct1 <- ct0 %>% filter(gene_type=="protein_coding") 
ct2 <- ct1 %>% select(-gene, -gene_name, -gene_type, -chr, -start, -end, -strand, -length)
rownames(ct2) = ct1$gene
ct3 <- as.matrix(ct2 %>% select(info1$Sample_ID))

#sanity check
colnames(ct3)==info1$Sample_ID

rownames(ct3) <- rownames(ct2)
dlist1 <- DGEList(counts=ct3)
#filter out low read genes
keep <- filterByExpr(dlist1, group = info1$grp, min.count=20, min.total.count=100) 
sum(keep)
dlist2 <- dlist1[keep, , keep.lib.sizes=TRUE]
tmm1 <- calcNormFactors(dlist2,method="TMM")

desmat1 <- model.matrix(~0+grp, data = info1)
vres1 <- voom(tmm1, desmat1, plot=TRUE)
vcorfit1 <- duplicateCorrelation(vres1,desmat1,block=info1$Patient_ID)
#skip voomWithQualityWeights() 

vfit1 <- lmFit(vres1, desmat1, block=info1$Patient_ID, correlation=vcorfit1$consensus, plot = TRUE)
vcont1 <- makeContrasts(
  chg_Secu_W4 = grpSecukinumab_W4 - grpSecukinumab_BL,
  chg_Secu_W16 = grpSecukinumab_W16 - grpSecukinumab_BL,
  chg_Pbo_W4 = grpPlacebo_W4 - grpPlacebo_BL,
  chg_Pbo_W16 = grpPlacebo_W16 - grpPlacebo_BL,
  levels = desmat1)

vfit1cont1 <- contrasts.fit(vfit1, vcont1)

# get the lmFit() model output
modelcoef1 <- vfit1$coef
modelcoef2 <- modelcoef1%>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelcoef")
modelSE1 <- vfit1$stdev.unscaled * vfit1$sigma
modelSE2 <- modelSE1 %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelSE")

modelt1 <- vfit1$coef / vfit1$stdev.unscaled / vfit1$sigma
modelt2 <- modelt1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelt")
modelp1 <- 2 * pt(-abs(modelt1), df = vfit1$df.residual)
modelp2 <- modelp1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelp")
modelcoef3 <- modelcoef2 %>% left_join(modelSE2) %>%left_join(modelp2) 

# get contrasts.fit() output before eBays()
SE1 <- vfit1cont1$stdev.unscaled * vfit1cont1$sigma
SE2 <- SE1 %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "origSE")

t1 <- vfit1cont1$coef / vfit1cont1$stdev.unscaled / vfit1cont1$sigma
t2 <- t1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "origt")
p1 <- 2 * pt(-abs(t1), df = vfit1cont1$df.residual)
p2 <- p1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "origp")

coef1 <- vfit1cont1$coef
coef2 <- coef1%>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "coef")
coef3 <- coef2 %>% full_join(SE2) %>% left_join(t2) %>% left_join(p2) 

#save the info 
inter_0_model <- modelcoef3
inter_0_contrast <- coef3




#####
#in order to show the issue, a different design matrix is needed.
#For the current info1, the reference level of treatment is Placebo
#####
desmat2 <- model.matrix(~1+Treatment*Timepoint, data=info1)
colnames(desmat2) <- gsub(":", "_", colnames(desmat2))
colnames(desmat2) <- gsub("\\(|\\)", "", colnames(desmat2))


vfit2 <- lmFit(vres1, desmat2, block=info1$Patient_ID, correlation=vcorfit1$consensus, plot = TRUE)
vcont2 <- makeContrasts(
  chg_Secu_W4 = TreatmentSecukinumab_TimepointW4 + TimepointW4,
  chg_Secu_W16 = TreatmentSecukinumab_TimepointW16 + TimepointW16,
  chg_Pbo_W4 = TimepointW4,
  chg_Pbo_W16 = TimepointW16,
  levels = desmat2)

vfit2cont1 <- contrasts.fit(vfit2, vcont2)


# get the lmFit() model output
modelcoef1 <- vfit2$coef
modelcoef2 <- modelcoef1%>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelcoef")
modelSE1 <- vfit2$stdev.unscaled * vfit2$sigma
modelSE2 <- modelSE1 %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelSE")

modelt1 <- vfit2$coef / vfit2$stdev.unscaled / vfit2$sigma
modelt2 <- modelt1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelt")
modelp1 <- 2 * pt(-abs(modelt1), df = vfit2$df.residual)
modelp2 <- modelp1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelp")
modelcoef3 <- modelcoef2 %>% left_join(modelSE2) %>%left_join(modelp2) 

# get contrasts.fit() output before eBays()
SE1 <- vfit2cont1$stdev.unscaled * vfit2cont1$sigma
SE2 <- SE1 %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "origSE")

t1 <- vfit2cont1$coef / vfit2cont1$stdev.unscaled / vfit2cont1$sigma
t2 <- t1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "origt")
p1 <- 2 * pt(-abs(t1), df = vfit2cont1$df.residual)
p2 <- p1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "origp")

coef1 <- vfit2cont1$coef
coef2 <- coef1%>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "coef")
coef3 <- coef2 %>% full_join(SE2) %>% left_join(t2) %>% left_join(p2) 

#save the info 
ref_pbo_model <- modelcoef3
ref_pbo_contrast <- coef3

#####
#Change the reference level of treatment to Secukinumab
#####
info2 <- info1 %>% mutate(Treatment=factor(Treatment, levels = c("Secukinumab", "Placebo")))
desmat3 <- model.matrix(~1+Treatment*Timepoint, data=info2)
colnames(desmat3) <- gsub(":", "_", colnames(desmat3))
colnames(desmat3) <- gsub("\\(|\\)", "", colnames(desmat3))


vfit3 <- lmFit(vres1, desmat3, block=info2$Patient_ID, correlation=vcorfit1$consensus, plot = TRUE)
vcont3 <- makeContrasts(
  chg_Secu_W4 = TimepointW4, 
  chg_Secu_W16 = TimepointW16,
  chg_Pbo_W4 = TreatmentPlacebo_TimepointW4 + TimepointW4,
  chg_Pbo_W16 = TreatmentPlacebo_TimepointW16 + TimepointW16,
  levels = desmat3)

vfit3cont1 <- contrasts.fit(vfit3, vcont3)

# get the lmFit() model output
modelcoef1 <- vfit3$coef
modelcoef2 <- modelcoef1%>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelcoef")
modelSE1 <- vfit3$stdev.unscaled * vfit3$sigma
modelSE2 <- modelSE1 %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelSE")

modelt1 <- vfit3$coef / vfit3$stdev.unscaled / vfit3$sigma
modelt2 <- modelt1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelt")
modelp1 <- 2 * pt(-abs(modelt1), df = vfit3$df.residual)
modelp2 <- modelp1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "modelp")
modelcoef3 <- modelcoef2 %>% left_join(modelSE2) %>%left_join(modelp2) 

# get contrasts.fit() output before eBays()
SE1 <- vfit3cont1$stdev.unscaled * vfit3cont1$sigma
SE2 <- SE1 %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "origSE")

t1 <- vfit3cont1$coef / vfit3cont1$stdev.unscaled / vfit3cont1$sigma
t2 <- t1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "origt")
p1 <- 2 * pt(-abs(t1), df = vfit3cont1$df.residual)
p2 <- p1  %>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "origp")

coef1 <- vfit3cont1$coef
coef2 <- coef1%>% data.frame %>% mutate(gene_id = rownames(.)) %>%
  pivot_longer(., -gene_id, names_to = "term", values_to = "coef")
coef3 <- coef2 %>% full_join(SE2) %>% left_join(t2) %>% left_join(p2) 

#save the info 
ref_secu_model <- modelcoef3
ref_secu_contrast <- coef3


#####
#Compare different model output, use gene ID ENSG00000000003.14 as example
#####

inter_0_contrast %>% filter(gene_id=="ENSG00000000003.14") %>% data.frame
ref_pbo_contrast %>% filter(gene_id=="ENSG00000000003.14") %>% data.frame
ref_secu_contrast %>% filter(gene_id=="ENSG00000000003.14") %>% data.frame

inter_0_model %>% filter(gene_id=="ENSG00000000003.14") %>% data.frame
ref_pbo_model %>% filter(gene_id=="ENSG00000000003.14") %>% data.frame
ref_secu_model %>% filter(gene_id=="ENSG00000000003.14") %>% data.frame


sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] edgeR_3.32.1      limma_3.52.2      forcats_0.5.0     stringr_1.4.0     dplyr_1.0.2       purrr_0.3.4       readr_1.3.1      
 [8] tidyr_1.1.2       tibble_3.0.3      ggplot2_3.3.2     tidyverse_1.3.0   plyr_1.8.6        data.table_1.14.0

loaded via a namespace (and not attached):
 [1] statmod_1.4.34      tinytex_0.26        locfit_1.5-9.4      tidyselect_1.1.0    xfun_0.17           splines_4.0.2      
 [7] haven_2.3.1         lattice_0.20-41     colorspace_1.4-1    vctrs_0.3.8         generics_0.0.2      utf8_1.1.4         
[13] blob_1.2.1          rlang_0.4.11        pillar_1.4.6        glue_1.4.2          withr_2.4.3         DBI_1.1.0          
[19] bit64_4.0.5         BiocGenerics_0.36.0 dbplyr_1.4.4        modelr_0.1.8        readxl_1.3.1        lifecycle_1.0.1    
[25] munsell_0.5.0       gtable_0.3.0        cellranger_1.1.0    rvest_0.3.6         Biobase_2.50.0      parallel_4.0.2     
[31] fansi_0.4.1         broom_0.7.4         Rcpp_1.0.5          backports_1.1.10    scales_1.1.1        jsonlite_1.7.1     
[37] bit_4.0.4           fs_1.5.0            hms_0.5.3           stringi_1.5.3       grid_4.0.2          cli_3.1.0          
[43] tools_4.0.2         magrittr_1.5        crayon_1.3.4        pkgconfig_2.0.3     ellipsis_0.3.1      Matrix_1.2-18      
[49] xml2_1.3.2          NOISeq_2.34.0       reprex_0.3.0        lubridate_1.7.9     assertthat_0.2.1    httr_1.4.2         
[55] rstudioapi_0.13     R6_2.4.1            compiler_4.0.2
bugs limma • 1.5k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 24 minutes ago
WEHI, Melbourne, Australia

This behavior is explained in the limma documentation: see the "Note" section of help("contrasts.fit"). It has been discussed several times on this help forum.

If you use the group-means parametrization of the design matrix model.matrix(~0+group), and lmFit doesn't define blocks, then the issue does not arise.

Later

I've posted a new answer below to give a more detailed guide for when contrasts.fit will be exact and when it is not.

ADD COMMENT
0
Entering edit mode

Thank you for the response, Gordon. I have a followup question: If model.matrix(~0+grp, data = info1) gives exact standard error from contrasts.fit(), should we expect the contrasts.fit() results (first screenshot) are the same as lmFit() outputs in green circle and blue circles (last two screenshot from the OP)? I think lmFit() gives exact standard error. Thank you.

ADD REPLY
0
Entering edit mode

The results from lmFit() are always exact regardless of design matrix or weights.

Sorry I cannot comment on your green and blue circles. The output you show is not standard limma output and I don't have time to go through your complex code to figure out what the screenshots represent.

ADD REPLY
0
Entering edit mode

Hi Gordon, Sorry for the complex code. I wanted to output the results of contrasts.fit() before eBayes() so that we can compare lmFit() output with contrasts.fit()'s. The code was to get the coefficient, SE, test statistics and p-value from the contrasts.fit() (before eBays()). If there's a easier way to get those info, please let me know and I'll rewrite my code. Thank you very much.

P.S. I was able to replicate the results (yellow and blue circles) using nlme::gls() and lsmeans() and contrast() from emmean package. with the consensus correlation and weights from voom().

ADD REPLY
0
Entering edit mode

There is no reason for doing what you are doing for any real data analysis so it is outside the scope of what I provide advice on. If you want to check whether the contrasts.fit output is exact, you only need to look at fit$stdev.unscaled.

lmFit has always been exact so I'm not sure why you are trying to replicate it using other packages.

ADD REPLY
0
Entering edit mode

Thank you, Gordon. I used fit$stdev.unscaled * fit$sigma to check the contrasts.fit SE output, and that's when I found the SE discrepancies (contrasts.fit() from model.matrix(~0+grp) vs lmFit() output from model.matrix(~1+Treatment*Timepoint)). And that's why I used other packages to see which results I can replicate. If you have more free time, it'll be great if you can take another look at my OP. Thanks again for providing the information.

ADD REPLY
0
Entering edit mode

I have no free time, but I still do my best to help people if I can.

I have already answered your questions, and the answers are in the limma documentation anyway. If you use ~0+grp (and there are no blocks shared between the groups) then contrasts.fit must agree exactly with lmFit because both are exact. Otherwise they won't be. What else do you need to know? If you want more help then please:

  1. Start a new question.
  2. Use limma in the way that it is intended to be used rather than trying to bypass the empirical Bayes or producing your own summary tables.
  3. Provide code and corresponding output rather than screen shots. At the moment your screen shots are disembodied from the code and it is very hard to figure out which code lines correspond to each screen shot.
  4. You would make it easier for people if you used basic R instead of so much tidyverse, which is not standard Bioconductor style.
ADD REPLY
0
Entering edit mode

Hi Gordon, thank you again for taking the time to reply my post. I'll start a new post with your suggestions. I found that contrasts.fit() results from model.matrix(~0+grp) is not exact and they are different from lmFit() output, and I tried to demonstrate that. In order to show you the comparison, I have to show the result before the eBayes().

ADD REPLY
0
Entering edit mode

OK, I notice that you are using duplicateCorrelation and blocks. The text of your question didn't mention using blocks but I see the use of duplicateCorrelation in the code.

The reason why contrasts.fit doesn't agree exactly with lmFit in your case, even with the group-means parametrization, will presumably be because there are blocks shared between the groups. (I can't confirm because you haven't shown your data but it must be so.) That will cause the group means being compared to be not independent and hence to contrasts.fit not being exact.

I have now added a new answer below to show you in more detail when contrasts.fit will be exact and when it will not.

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 24 minutes ago
WEHI, Melbourne, Australia

As I said in my previous answer, the fact that contrasts.fit() sometimes gives approximate results with precision weights is explained in the documentation. This behavior is due to the way that fitted models are stored in limma and it is not a bug.

I will explain more here how to know when contrasts.fit() is exact and when it is not.

The contrasts.fit help page says "the results from contrasts.fit are always exact if the coefficients being compared are statistically independent". You can check whether coefficients are independent by using fit$cov.coefficients.

Group-means parametrization without blocking

The main case when the coefficients are independent is the oneway layout without blocking when the group-means parametrization (~0+Group)) is used. Here is a simple reproducible example to show that contrast.fit agrees exacly with lmFit for the same contrasts.

First we fit a linear model with 4 groups and precision weights.

> library(limma)
> set.seed(20220914)
> 
> ngenes <- 1000
> nsamples <- 20
> Group <- gl(4,5,20)
> y <- matrix(rnorm(ngenes*nsamples),ngenes,nsamples)
> w <- matrix(runif(ngenes*nsamples),ngenes,nsamples)
> design1 <- model.matrix(~Group)
> fit1 <- lmFit(y,design1,weights=w)

The coefficients in fit1 use the standard parametrization with Group1 as the reference. Now estimate the same coefficients using a group-means fit and contrasts:

> design2 <- model.matrix(~0+Group)
> fit2 <- lmFit(y,design2,weights=w)
> cont.matrix <- cbind(
+   Group2=c(-1,1,0,0),
+   Group3=c(-1,0,1,0),
+   Group4=c(-1,0,0,1))
> fit2c <- contrasts.fit(fit2,cont.matrix)

We can check that the fit1 and fit2c results agree by examing the coefficients and unscaled standard deviations:

> all.equal(fit2c$stdev.unscaled, fit1$stdev.unscaled[,-1])
[1] TRUE
> all.equal(fit2c$coefficients, fit1$coefficients[,-1])
[1] TRUE

The reason why contrasts.fit and lmFit agree exactly in this case is that the fit2 coefficients are independent:

> fit2$cov.coefficients
       Group1 Group2 Group3 Group4
Group1    0.2    0.0    0.0    0.0
Group2    0.0    0.2    0.0    0.0
Group3    0.0    0.0    0.2    0.0
Group4    0.0    0.0    0.0    0.2

Group-means parametrization with blocking

Now let's rerun the same model fit with correlated blocks:

> Block <- gl(7,3,20)
> fit1 <- lmFit(y,design1,weights=w,block=Block,correlation=0.1)
> design2 <- model.matrix(~0+Group)
> fit2 <- lmFit(y,design2,weights=w,block=Block,correlation=0.1)
> fit2c <- contrasts.fit(fit2,cont.matrix)

In this case, the Group4 coefficient is independent of the others but the Group1, Group2 and Group3 coefficients are correlated:

> fit2$cov.coefficients
         Group1  Group2   Group3 Group4
Group1 0.230188 0.00941 0.000401  0.000
Group2 0.009412 0.22118 0.009412  0.000
Group3 0.000401 0.00941 0.230188  0.000
Group4 0.000000 0.00000 0.000000  0.232

Hence the Group4-Group1 contrast will be exact but the others will be approximate:

> all.equal(fit2c$stdev.unscaled[,1], fit1$stdev.unscaled[,2])
[1] "Mean relative difference: 0.00635"
> all.equal(fit2c$stdev.unscaled[,2], fit1$stdev.unscaled[,3])
[1] "Mean relative difference: 0.000346"
> all.equal(fit2c$stdev.unscaled[,3], fit1$stdev.unscaled[,4])
[1] TRUE
ADD COMMENT
0
Entering edit mode

Thank you very much for your explanation. As you noticed, I obviously missed the note of the help page of contrasts.fit(). I also checked the independent case before, and found the same issue, but now I know it's due to I had a continuous variable in the model.

I changed your simulation a little bit.


> set.seed(20220914)
> 
> ngenes <- 1000
> nsamples <- 20
> Group <- gl(4,5,20)
> y <- matrix(rnorm(ngenes*nsamples),ngenes,nsamples)
> w <- matrix(runif(ngenes*nsamples),ngenes,nsamples)
> contvar <- rnorm(nsamples)
> 
> design1 <- model.matrix(~Group+contvar)
> fit1 <- lmFit(y,design1,weights=w)
> 
> design2 <- model.matrix(~0+Group+contvar)
> fit2 <- lmFit(y,design2,weights=w)
> cont.matrix <- cbind(
+   Group2=c(-1,1,0,0,0),
+   Group3=c(-1,0,1,0,0),
+   Group4=c(-1,0,0,1,0))
> fit2c <- contrasts.fit(fit2,cont.matrix)
> 
> all.equal(fit2c$stdev.unscaled, fit1$stdev.unscaled[,2:4])
[1] "Mean relative difference: 0.02338285"
> all.equal(fit2c$coefficients, fit1$coefficients[,2:4])
[1] TRUE

And here is the correlation matrix:

> fit2$cov.coefficients
              Group1       Group2       Group3       Group4     contvar
Group1   0.203845904 -0.007157168  0.006945059  0.006091855 -0.01308200
Group2  -0.007157168  0.213319381 -0.012924648 -0.011336849  0.02434540
Group3   0.006945059 -0.012924648  0.212541614  0.011000871 -0.02362390
Group4   0.006091855 -0.011336849  0.011000871  0.209649408 -0.02072169
contvar -0.013081998  0.024345399 -0.023623901 -0.020721693  0.04449895
ADD REPLY

Login before adding your answer.

Traffic: 731 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6