Fitting linear model with different design matrix for each gene in limma
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hi, I have expression data from a large number of subjects with replicates. I am interested in fitting a linear model to understand the effect of a simple factor with two levels on the expression of each gene. I realize that if my design is as follows: Subject Factor 1 REF 1 REF 2 REF 2 REF 3 REF 3 REF 4 PLUS 4 PLUS 5 PLUS 5 PLUS ... I can fit a linear model with lmFit using design <- model.matrix(~targets$Factor) corfit <- duplicateCorrelation(data,design,block=targets$Subject) lmFit (data, design, weights=weights, block=targets$Subject, correlation=corfit$consensus) However, in my case the design is different for each gene. In other words, the subjects that belong to the "REF" or "PLUS" level of the factor changes for each gene. I am wondering if there is any possibility to include the changing design for each gene. I tried applying lmFit to each gene separately using the correct design with an appropriate apply function. However, I found Dr. Smyth's answer to an earlier question (https://stat.ethz.ch/pipermail/bioconductor/2010-August/034794.html) suggesting that this is not a good alternative. Thanks for any advice -- output of sessionInfo(): R version 2.14.1 (2011-12-22) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] nlme_3.1-103 edgeR_2.4.6 limma_3.10.3 loaded via a namespace (and not attached): [1] grid_2.14.1 lattice_0.20-6 sva_3.0.3 tools_2.14.1 -- Sent via the guest posting facility at bioconductor.org.
• 1.4k views
ADD COMMENT
0
Entering edit mode
@ryan-c-thompson-5618
Last seen 7 months ago
Scripps Research, La Jolla, CA
I'm pretty sure the post that you link to gives you your answer: if you want to fit a different model for each gene, then you should simply call "lm" on each gene individually instead of lmFit. You won't be able to get the empirical Bayes squeezing that limma performs, of course. However, I can't think of an experimental design where doing this makes logical sense. Are you sure you don't need to make a single design matrix with one coefficient for each "REF vs PLUS" partitioning, and then fit this single design for all genes using lmFit? On Wed Dec 18 11:55:57 2013, Elif Sarinay [guest] wrote: > > Hi, > > I have expression data from a large number of subjects with replicates. I am interested in fitting a linear model to understand the effect of a simple factor with two levels on the expression of each gene. I realize that if my design is as follows: > Subject Factor > 1 REF > 1 REF > 2 REF > 2 REF > 3 REF > 3 REF > 4 PLUS > 4 PLUS > 5 PLUS > 5 PLUS > ... > > I can fit a linear model with lmFit using > design <- model.matrix(~targets$Factor) > corfit <- duplicateCorrelation(data,design,block=targets$Subject) > lmFit (data, design, weights=weights, block=targets$Subject, correlation=corfit$consensus) > > However, in my case the design is different for each gene. In other words, the subjects that belong to the "REF" or "PLUS" level of the factor changes for each gene. I am wondering if there is any possibility to include the changing design for each gene. > > I tried applying lmFit to each gene separately using the correct design with an appropriate apply function. However, I found Dr. Smyth's answer to an earlier question (https://stat.ethz.ch/pipermail/bioconductor/2010-August/034794.html) suggesting that this is not a good alternative. > > > Thanks for any advice > > -- output of sessionInfo(): > > R version 2.14.1 (2011-12-22) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] nlme_3.1-103 edgeR_2.4.6 limma_3.10.3 > > loaded via a namespace (and not attached): > [1] grid_2.14.1 lattice_0.20-6 sva_3.0.3 tools_2.14.1 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Thanks a lot for the quick answer. However, lm cannot incorporate the fact that there are multiple measurements of the same subject as well as measurements from different subjects for a given factor level (REF or PLUS). I am leaning towards using a linear mixed effects model incorporating the factor of interest as a fixed effect and subject as a random effect. I think I can use lme from nlme package for this. Would you have any recommendations about which of the two models to use: fit <- lm(data$expr ~ data$Factor + data$Subject) vs fit2 <- lme(expr ~ Factor, data, random = ~ 1 | subject) Thanks again! ----- Original Message ----- From: "Ryan" <rct@thompsonclan.org> To: "Elif Sarinay [guest]" <guest at="" bioconductor.org=""> Cc: bioconductor at r-project.org, ccenik at stanford.edu Sent: Wednesday, December 18, 2013 4:06:09 PM Subject: Re: [BioC] Fitting linear model with different design matrix for each gene in limma I'm pretty sure the post that you link to gives you your answer: if you want to fit a different model for each gene, then you should simply call "lm" on each gene individually instead of lmFit. You won't be able to get the empirical Bayes squeezing that limma performs, of course. However, I can't think of an experimental design where doing this makes logical sense. Are you sure you don't need to make a single design matrix with one coefficient for each "REF vs PLUS" partitioning, and then fit this single design for all genes using lmFit? On Wed Dec 18 11:55:57 2013, Elif Sarinay [guest] wrote: > > Hi, > > I have expression data from a large number of subjects with replicates. I am interested in fitting a linear model to understand the effect of a simple factor with two levels on the expression of each gene. I realize that if my design is as follows: > Subject Factor > 1 REF > 1 REF > 2 REF > 2 REF > 3 REF > 3 REF > 4 PLUS > 4 PLUS > 5 PLUS > 5 PLUS > ... > > I can fit a linear model with lmFit using > design <- model.matrix(~targets$Factor) > corfit <- duplicateCorrelation(data,design,block=targets$Subject) > lmFit (data, design, weights=weights, block=targets$Subject, correlation=corfit$consensus) > > However, in my case the design is different for each gene. In other words, the subjects that belong to the "REF" or "PLUS" level of the factor changes for each gene. I am wondering if there is any possibility to include the changing design for each gene. > > I tried applying lmFit to each gene separately using the correct design with an appropriate apply function. However, I found Dr. Smyth's answer to an earlier question (https://stat.ethz.ch/pipermail/bioconductor/2010-August/034794.html) suggesting that this is not a good alternative. > > > Thanks for any advice > > -- output of sessionInfo(): > > R version 2.14.1 (2011-12-22) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] nlme_3.1-103 edgeR_2.4.6 limma_3.10.3 > > loaded via a namespace (and not attached): > [1] grid_2.14.1 lattice_0.20-6 sva_3.0.3 tools_2.14.1 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
If you think that limma's duplicateCorrelation is the correct method to use except for the requirement to use multiple genes, then I would recommend you read the source of that function and incorporate its approach into your gene-by-gene pipeline. It looks like it uses mixedModel2Fit from the statmod package. On Wed Dec 18 17:30:34 2013, Can Cenik wrote: > Thanks a lot for the quick answer. > > However, lm cannot incorporate the fact that there are multiple measurements of the same subject as well as measurements from different subjects for a given factor level (REF or PLUS). I am leaning towards using a linear mixed effects model incorporating the factor of interest as a fixed effect and subject as a random effect. I think I can use lme from nlme package for this. Would you have any recommendations about which of the two models to use: > > fit <- lm(data$expr ~ data$Factor + data$Subject) > vs > fit2 <- lme(expr ~ Factor, data, random = ~ 1 | subject) > > Thanks again! > > ----- Original Message ----- > From: "Ryan" <rct at="" thompsonclan.org=""> > To: "Elif Sarinay [guest]" <guest at="" bioconductor.org=""> > Cc: bioconductor at r-project.org, ccenik at stanford.edu > Sent: Wednesday, December 18, 2013 4:06:09 PM > Subject: Re: [BioC] Fitting linear model with different design matrix for each gene in limma > > I'm pretty sure the post that you link to gives you your answer: if you > want to fit a different model for each gene, then you should simply > call "lm" on each gene individually instead of lmFit. You won't be able > to get the empirical Bayes squeezing that limma performs, of course. > However, I can't think of an experimental design where doing this makes > logical sense. Are you sure you don't need to make a single design > matrix with one coefficient for each "REF vs PLUS" partitioning, and > then fit this single design for all genes using lmFit? > > On Wed Dec 18 11:55:57 2013, Elif Sarinay [guest] wrote: >> >> Hi, >> >> I have expression data from a large number of subjects with replicates. I am interested in fitting a linear model to understand the effect of a simple factor with two levels on the expression of each gene. I realize that if my design is as follows: >> Subject Factor >> 1 REF >> 1 REF >> 2 REF >> 2 REF >> 3 REF >> 3 REF >> 4 PLUS >> 4 PLUS >> 5 PLUS >> 5 PLUS >> ... >> >> I can fit a linear model with lmFit using >> design <- model.matrix(~targets$Factor) >> corfit <- duplicateCorrelation(data,design,block=targets$Subject) >> lmFit (data, design, weights=weights, block=targets$Subject, correlation=corfit$consensus) >> >> However, in my case the design is different for each gene. In other words, the subjects that belong to the "REF" or "PLUS" level of the factor changes for each gene. I am wondering if there is any possibility to include the changing design for each gene. >> >> I tried applying lmFit to each gene separately using the correct design with an appropriate apply function. However, I found Dr. Smyth's answer to an earlier question (https://stat.ethz.ch/pipermail/bioconductor/2010-August/034794.html) suggesting that this is not a good alternative. >> >> >> Thanks for any advice >> >> -- output of sessionInfo(): >> >> R version 2.14.1 (2011-12-22) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] nlme_3.1-103 edgeR_2.4.6 limma_3.10.3 >> >> loaded via a namespace (and not attached): >> [1] grid_2.14.1 lattice_0.20-6 sva_3.0.3 tools_2.14.1 >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 769 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6