Question

edgeR GLM using factor that varies for each gene



Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hi,

after going over the user guide and searching this mailing list I'm not quite clear on how to best address my specific situation:

I'd like to test differential "expression" of specific splicing events between a mutant and the wild type in a replicated design. To do so, I've counted the reads that are specific to a certain splicing event for each gene.

e.g.

event          AS.type       mutant.line1.rep1  mutant.line1.rep2  mutant.line2.rep1  mutant.line2.rep2  wt.rep1  wt.rep2
S102-F_10.883  alt_donor     4                  7                  4                  7                  0        1
S102-F_12.884  alt_donor     0                  1                  0                  1                  0        2
S102-F_10.887  alt_donor     0                  0                  0                  0                  30       33
S102-F_10.886  alt_acceptor  0                  0                  0                  0                  22       21
S102-F_11.890  alt_donor     0                  0                  0                  0                  0        0
S102-F_11.889  alt_acceptor  0                  0                  0                  0                  0        0
S102-F_10.891  alt_acceptor  0                  0                  0                  0                  0        0
S103-R_3.901   alt_acceptor  4                  5                  4                  5                  10       11
S103-R_2.904   skipped_exon  2                  4                  2                  4                  33       28
S103-R_2.902   alt_acceptor  4                  5                  4                  5                  0        0
S103-R_1.906   alt_acceptor  0                  1                  0                  1                  1        0

It's not clear from this example, but overall the specific types of alternative splicing differ in abundance and noise level. I'd like to correct for this difference, but also assess it using a GLM. Thus, ideally I'd like to find differentially abundant splicing events between the mutant and the wild type irrespective of line and biological replicate.

As far as I understood the UserGuide and the ReferenceManual, the design always refers to factors describing the libraries/experiments the counts are derived from.

If I were using a "normal" GLM, what I want to do would look like glm(count ~ AS.type + genotype + line + biological.replicate).

Can I accomplish this with edgeR without splitting up the events into different data sets per splice type?

Any advice on this would be greatly appreciated.

Best,
Daniel

-- output of sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=de_DE.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=de_DE.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

--
Sent via the guest posting facility at bioconductor.org.

edgeR • 843 views

updated 10.0 years ago by Gordon Smyth 50k • written 10.0 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2014-05-09

Dear Daniel,

I don't see any need for a gene-specific factor. Simply give all the count rows (for all genes and all splicing events) to edgeR. The design matrix is:

    genotype <- factor(c("mutant1","mutant1","mutant2","mutant2","wt","wt"))
    genotype <- relevel(genotype,ref="wt")
    design <- model.matrix(~genotype)

If you want to find differentially abundant events between the mutants and wt, you can run glmLRT() with coef=2 to examine mutant1, coef=3 to examine mutant2, and contrast=c(0,0.5,0.5) to average the two mutant lines.

Best wishes
Gordon

> Date: Thu, 8 May 2014 00:33:05 -0700 (PDT)
> From: "Daniel Lang [guest]" <guest at bioconductor.org>
> Subject: [BioC] edgeR GLM using factor that varies for each gene
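
For readers who want to see how the pieces of this answer fit together, here is a minimal sketch. It is not part of Gordon's reply. The design matrix is the one given above; the rest is assumed for illustration: the count matrix is simulated (hypothetical data, not the asker's), and the steps before glmLRT() (DGEList, calcNormFactors, estimateDisp, glmFit) are the usual edgeR GLM pipeline, which the answer does not spell out. The edgeR part only runs if the package is installed.

```r
# Sample-level factor: one entry per library (column of the count table).
genotype <- factor(c("mutant1", "mutant1", "mutant2", "mutant2", "wt", "wt"))
genotype <- relevel(genotype, ref = "wt")   # wt becomes the baseline level
design <- model.matrix(~genotype)

# The column order of the design matrix is what coef=2, coef=3 and the
# contrast vector refer to.
print(colnames(design))
# "(Intercept)" "genotypemutant1" "genotypemutant2"

# Contrast that averages the mutant1-vs-wt and mutant2-vs-wt effects.
avg_mutant <- c(0, 0.5, 0.5)

if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  # ASSUMPTION: simulated counts for 500 hypothetical events x 6 libraries,
  # standing in for the real event-level count table.
  samples <- c("mutant.line1.rep1", "mutant.line1.rep2",
               "mutant.line2.rep1", "mutant.line2.rep2",
               "wt.rep1", "wt.rep2")
  counts <- matrix(rnbinom(500 * 6, mu = 20, size = 5), nrow = 500,
                   dimnames = list(paste0("event", 1:500), samples))

  y <- edgeR::DGEList(counts = counts)
  keep <- rowSums(counts) > 0               # drop events with no reads at all
  y <- y[keep, , keep.lib.sizes = FALSE]
  y <- edgeR::calcNormFactors(y)
  y <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmFit(y, design)

  lrt_mut1 <- edgeR::glmLRT(fit, coef = 2)              # mutant1 vs wt
  lrt_mut2 <- edgeR::glmLRT(fit, coef = 3)              # mutant2 vs wt
  lrt_avg  <- edgeR::glmLRT(fit, contrast = avg_mutant) # average mutant vs wt
  print(edgeR::topTags(lrt_avg, n = 5))
}
```

With a real data set, the simulated matrix would be replaced by the numeric columns of the event table, with the event identifiers as row names. The AS.type column is an annotation of the rows, not of the libraries, so it does not enter the design matrix.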