edgeR GLM using factor that varies for each gene
Guest User (@guest-user-4897)
Hi,

after going over the User's Guide and searching this mailing list, I'm still not clear on how best to address my specific situation.

I'd like to test differential "expression" of specific splicing events between a mutant and the wild type in a replicated design. To do so, I've counted the reads that are specific to each splicing event of each gene, e.g.:

    event          AS.type       mutant.line1.rep1  mutant.line1.rep2  mutant.line2.rep1  mutant.line2.rep2  wt.rep1  wt.rep2
    S102-F_10.883  alt_donor                     4                  7                  4                  7        0        1
    S102-F_12.884  alt_donor                     0                  1                  0                  1        0        2
    S102-F_10.887  alt_donor                     0                  0                  0                  0       30       33
    S102-F_10.886  alt_acceptor                  0                  0                  0                  0       22       21
    S102-F_11.890  alt_donor                     0                  0                  0                  0        0        0
    S102-F_11.889  alt_acceptor                  0                  0                  0                  0        0        0
    S102-F_10.891  alt_acceptor                  0                  0                  0                  0        0        0
    S103-R_3.901   alt_acceptor                  4                  5                  4                  5       10       11
    S103-R_2.904   skipped_exon                  2                  4                  2                  4       33       28
    S103-R_2.902   alt_acceptor                  4                  5                  4                  5        0        0
    S103-R_1.906   alt_acceptor                  0                  1                  0                  1        1        0

It's not apparent from this small example, but overall the abundance and noise levels differ between types of alternative splicing. I'd like to correct for this, but also assess it, within the GLM. So, ideally, I'd like to find splicing events that are differentially abundant between the mutant and the wild type, irrespective of line and biological replicate.

As far as I understand the User's Guide and the Reference Manual, the design always refers to factors describing the libraries the counts are derived from.

If I were using an ordinary GLM, what I want would look like:

    glm(count ~ AS.type + genotype + line + biological.replicate)

Can I accomplish this with edgeR without splitting the events into separate data sets per splice type?

Any advice on this would be greatly appreciated.
Best,
Daniel

-- output of sessionInfo():

    R version 3.0.1 (2013-05-16)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
     [3] LC_TIME=de_DE.utf8        LC_COLLATE=en_US.utf8
     [5] LC_MONETARY=de_DE.utf8    LC_MESSAGES=en_US.utf8
     [7] LC_PAPER=C                LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

--
Sent via the guest posting facility at bioconductor.org.
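To reproduce the example, the table in the question can be entered into R as a count matrix, with the splice type kept as a separate per-event annotation rather than as a column of counts. This is a sketch only; the object names `counts`, `annotation`, `event` and `libraries` are illustrative choices, not taken from the post.

```r
# Example counts from the question: one row per splicing event,
# one column per library (sequenced sample).
event <- c("S102-F_10.883", "S102-F_12.884", "S102-F_10.887", "S102-F_10.886",
           "S102-F_11.890", "S102-F_11.889", "S102-F_10.891", "S103-R_3.901",
           "S103-R_2.904", "S103-R_2.902", "S103-R_1.906")
libraries <- c("mutant.line1.rep1", "mutant.line1.rep2",
               "mutant.line2.rep1", "mutant.line2.rep2", "wt.rep1", "wt.rep2")

counts <- matrix(
  c(4, 7, 4, 7,  0,  1,
    0, 1, 0, 1,  0,  2,
    0, 0, 0, 0, 30, 33,
    0, 0, 0, 0, 22, 21,
    0, 0, 0, 0,  0,  0,
    0, 0, 0, 0,  0,  0,
    0, 0, 0, 0,  0,  0,
    4, 5, 4, 5, 10, 11,
    2, 4, 2, 4, 33, 28,
    4, 5, 4, 5,  0,  0,
    0, 1, 0, 1,  1,  0),
  nrow = 11, byrow = TRUE,
  dimnames = list(event, libraries))

# The splice type describes the rows (events), not the libraries,
# so it is stored alongside the matrix instead of inside it.
annotation <- data.frame(
  AS.type = c("alt_donor", "alt_donor", "alt_donor", "alt_acceptor",
              "alt_donor", "alt_acceptor", "alt_acceptor", "alt_acceptor",
              "skipped_exon", "alt_acceptor", "alt_acceptor"),
  row.names = event)

# Number of events of each splice type in the example
table(annotation$AS.type)
```

For this small example the table shows 6 alt_acceptor, 4 alt_donor and 1 skipped_exon event.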
Gordon Smyth (@gordon-smyth)
WEHI, Melbourne, Australia
Dear Daniel,

I don't see any need for a gene-specific factor.

Simply give all the count rows (for all genes and all splicing events) to edgeR. The design matrix is:

    genotype <- factor(c("mutant1","mutant1","mutant2","mutant2","wt","wt"))
    genotype <- relevel(genotype,ref="wt")
    design <- model.matrix(~genotype)

If you want to find differentially abundant events between the mutants and wt, you can run glmLRT() with coef=2 to examine mutant1, coef=3 to examine mutant2, and contrast=c(0,0.5,0.5) to average the two mutant lines.

Best wishes
Gordon

> Date: Thu, 8 May 2014 00:33:05 -0700 (PDT)
> From: "Daniel Lang [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, daniel.lang at biologie.uni-freiburg.de
> Subject: [BioC] edgeR GLM using factor that varies for each gene
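A minimal sketch of the answer above, assuming a current edgeR installation. The first part needs only base R: it rebuilds the design and checks, with an arbitrary hypothetical coefficient vector `beta`, that `contrast=c(0,0.5,0.5)` equals the average of the two mutant-versus-wt effects. `fit_events()` is a hypothetical helper, not an edgeR function; the normalization (`calcNormFactors`) and dispersion (`estimateDisp`) steps before `glmFit()`/`glmLRT()` are assumed from the standard edgeR GLM workflow rather than stated in the thread.

```r
# Design from the answer: wt is the reference level, so the columns are
# "(Intercept)" (wt baseline), "genotypemutant1" and "genotypemutant2"
# (each mutant line versus wt, on the log scale).
genotype <- factor(c("mutant1", "mutant1", "mutant2", "mutant2", "wt", "wt"))
genotype <- relevel(genotype, ref = "wt")
design <- model.matrix(~genotype)
print(colnames(design))

# Contrast that averages the two mutant-vs-wt coefficients.
avg_mutant <- c(0, 0.5, 0.5)

# Sanity check with a hypothetical coefficient vector: for this balanced
# design (2 libraries per group) the contrast equals the mean mutant linear
# predictor minus the mean wt linear predictor.
beta <- c(2, 1, 3)
eta <- drop(design %*% beta)  # linear predictor for each of the 6 libraries
by_contrast <- sum(avg_mutant * beta)
by_groups <- mean(eta[genotype != "wt"]) - mean(eta[genotype == "wt"])
stopifnot(isTRUE(all.equal(by_contrast, by_groups)))

# Hypothetical helper (not part of edgeR) chaining standard edgeR GLM calls.
# All count rows, for every gene and splicing event, go into one DGEList;
# per-event information such as AS.type can be carried along as 'annotation'.
fit_events <- function(counts, design, annotation = NULL) {
  y <- edgeR::DGEList(counts = counts, genes = annotation)
  y <- edgeR::calcNormFactors(y)        # TMM normalization (assumed step)
  y <- edgeR::estimateDisp(y, design)   # dispersion estimation (assumed step)
  fit <- edgeR::glmFit(y, design)
  list(
    mutant1    = edgeR::glmLRT(fit, coef = 2),
    mutant2    = edgeR::glmLRT(fit, coef = 3),
    avg_mutant = edgeR::glmLRT(fit, contrast = c(0, 0.5, 0.5))
  )
}
```

With a count matrix `counts` whose columns are in the same order as `genotype`, `res <- fit_events(counts, design)` followed by `edgeR::topTags(res$avg_mutant)` would rank events by evidence for a change in the average of the two mutant lines relative to wt. Because the design describes only the six libraries, AS.type never enters the model, and the events do not need to be split by splice type.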
Dear Gordon,

thank you so much for your prompt and helpful answer. You're right, I was overcomplicating things :-)

Best,
Daniel

On 09.05.2014 06:12, Gordon K Smyth wrote:
> Dear Daniel,
>
> I don't see any need for a gene-specific factor.