edgeR : input and design matrix help

0

Entering edit mode

KJ Lim ▴ 420

@kj-lim-5288

Last seen 3.6 years ago

Finland

Dear edgeR users, I am not a statistician nor R programming geek, please forgive me if I ask stupid question. I have RNA-seq data for 2 different genotype of trees with different time points 0hour(control),3hr,24hours,and 48hours. Each time point has two replicates. The experiment design like following: Sample harvested after treatment at Tree H1 Ctrl 3hrs 24hrs 48hrs Tree H2 Ctrl 3hrs 24hrs 48hrs Tree L1 Ctrl 3hrs 24hrs 48hrs Tree L2 Ctrl 3hrs 24hrs 48hrs I would like to study genes that are differentially expressed (DE) throughout the time points in these 2 genotype of trees with edgeR. I read from the edgeR user guide, the suitable DE analysis method for my expriment is GLM likelihood ratio test. After read case study in the user guide, I have the RNA-Seq counts in a file as below in order to input into the edgeR package. Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H L2_3H L1_1D L2_1D L1_2D L2_2D AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 I used the following commands to input the counts file into edgeR: rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, stringsAsFactor=FALSE) trees <- factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS"," LS","LS","LS","LS")) treat <- factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2"," 3h1","3h2","24h1","24h2","48h1","48h2")) I'm not good in R programming, thus, having this file input into the edgeR and assign the factors as well as design-matrix is a challenge. I'm stuck how to tell the edgeR about my design matrix. Could you guys kindly help me on this? Have I input my RNA-Seq counts correctly? Please correct me if there is any mistake I have done in my edgeR input. I appreciate very much for your help and time. Have a nice day. Best regards, KJ Lim [[alternative HTML version deleted]]

edgeR ASSIGN edgeR ASSIGN • 3.4k views

ADD COMMENT • link 11.9 years ago KJ Lim ▴ 420

0

Entering edit mode

Mark Robinson ▴ 880

@mark-robinson-4908

Last seen 5.5 years ago

Hi KJ Lim, > trees <- > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) This one seems ok, although could be written more compactly and I'll give it a different name: geno <- factor( rep(c("HS","LS"),each=8)) > treat <- > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) Maybe it's better to call this 'time' and you'll need to change this to something like: time <- factor( rep(c("C","3h","24h","48h"), each=4) ) These two factor vectors match the columns of your count table, so you should be ok there. If I understand your description, you are interested primarily in the differences in genotypes. I would suggest starting with: design <- model.matrix(~time+geno) > design (Intercept) time3h time48h timeC genoLS 1 1 0 0 1 0 2 1 0 0 1 0 3 1 0 0 1 0 4 1 0 0 1 0 5 1 1 0 0 0 6 1 1 0 0 0 7 1 1 0 0 0 8 1 1 0 0 0 9 1 0 0 0 1 10 1 0 0 0 1 11 1 0 0 0 1 12 1 0 0 0 1 13 1 0 1 0 1 14 1 0 1 0 1 15 1 0 1 0 1 16 1 0 1 0 1 attr(,"assign") [1] 0 1 1 1 2 attr(,"contrasts") attr(,"contrasts")$time [1] "contr.treatment" attr(,"contrasts")$geno [1] "contr.treatment" Then, you can follow the usual steps as per the edgeR user's guide -- estimate dispersion, fit the glm with glmFit() and do the LR testing with glmLRT() and so on. Hope that gets you started. Best, Mark On 18.05.2012, at 05:15, KJ Lim wrote: > Dear edgeR users, > > I am not a statistician nor R programming geek, please forgive me if I ask > stupid question. > > I have RNA-seq data for 2 different genotype of trees with different time > points 0hour(control),3hr,24hours,and 48hours. Each time point has two > replicates. The experiment design like following: > > Sample harvested after treatment at > Tree H1 Ctrl 3hrs 24hrs 48hrs > Tree H2 Ctrl 3hrs 24hrs 48hrs > > Tree L1 Ctrl 3hrs 24hrs 48hrs > Tree L2 Ctrl 3hrs 24hrs 48hrs > > I would like to study genes that are differentially expressed (DE) > throughout the time points in these 2 genotype of trees with edgeR. I read > from the edgeR user guide, the suitable DE analysis method for my expriment > is GLM likelihood ratio test. > > After read case study in the user guide, I have the RNA-Seq counts in a > file as below in order to input into the edgeR package. > > Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H > L2_3H L1_1D L2_1D L1_2D L2_2D > AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 > AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 > AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 > AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 > > I used the following commands to input the counts file into edgeR: > > rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, > stringsAsFactor=FALSE) > trees <- > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > treat <- > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > > I'm not good in R programming, thus, having this file input into the edgeR > and assign the factors as well as design-matrix is a challenge. I'm stuck > how to tell the edgeR about my design matrix. > > Could you guys kindly help me on this? Have I input my RNA-Seq counts > correctly? Please correct me if there is any mistake I have done in my > edgeR input. > > I appreciate very much for your help and time. Have a nice day. > > Best regards, > KJ Lim > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ---------- Prof. Dr. Mark Robinson Bioinformatics Institute of Molecular Life Sciences University of Zurich Winterthurerstrasse 190 8057 Zurich Switzerland v: +41 44 635 4848 f: +41 44 635 6898 e: mark.robinson at imls.uzh.ch o: Y11-J-16 w: http://tiny.cc/mrobin

ADD COMMENT • link 11.9 years ago Mark Robinson ▴ 880

0

Entering edit mode

KJ Lim ▴ 420

@kj-lim-5288

Last seen 3.6 years ago

Finland

Dear Prof Mark Robinson, Good day. > treat <- > > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > > Maybe it's better to call this 'time' and you'll need to change this to > something like: > time <- factor( rep(c("C","3h","24h","48h"), each=4) ) > I assigned the time factor as suggested and it showed me as: > time [1] C C C C 3h 3h 3h 3h 24h 24h 24h 24h 96h 96h 96h 96h Levels: 24h 3h 96h C However, may I ask, does this time factor vector matches the read counts of time in the input file? The read counts of time in the input file is: C C 3h 3h 24h 24h 96h 96h C C 3h 3h 24h 24h 96h 96h Should I make some adjustment in my input file in order to suit the time factor vector? Please forgive me, if this is a stupid question. If I understand your description, you are interested primarily in the > differences in genotypes. I would suggest starting with: > > design <- model.matrix(~time+geno) > I would like to study the differentially expressed genes in both 2 HS and LS genotypes of tress across time course experiment. If I would like to study the differentially expressed genes in HS genotype tress across time course, can I have the design matrix as below? design <- model.matrix(~time, data=rawdata) Thank you very much for your time and help. Best regards, KJ Lim On 18 May 2012 11:32, Mark Robinson <mark.robinson@imls.uzh.ch> wrote: > Hi KJ Lim, > > > trees <- > > > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > > This one seems ok, although could be written more compactly and I'll give > it a different name: > > geno <- factor( rep(c("HS","LS"),each=8)) > > > treat <- > > > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > > Maybe it's better to call this 'time' and you'll need to change this to > something like: > > time <- factor( rep(c("C","3h","24h","48h"), each=4) ) > > These two factor vectors match the columns of your count table, so you > should be ok there. > > If I understand your description, you are interested primarily in the > differences in genotypes. I would suggest starting with: > > design <- model.matrix(~time+geno) > > > design > (Intercept) time3h time48h timeC genoLS > 1 1 0 0 1 0 > 2 1 0 0 1 0 > 3 1 0 0 1 0 > 4 1 0 0 1 0 > 5 1 1 0 0 0 > 6 1 1 0 0 0 > 7 1 1 0 0 0 > 8 1 1 0 0 0 > 9 1 0 0 0 1 > 10 1 0 0 0 1 > 11 1 0 0 0 1 > 12 1 0 0 0 1 > 13 1 0 1 0 1 > 14 1 0 1 0 1 > 15 1 0 1 0 1 > 16 1 0 1 0 1 > attr(,"assign") > [1] 0 1 1 1 2 > attr(,"contrasts") > attr(,"contrasts")$time > [1] "contr.treatment" > > attr(,"contrasts")$geno > [1] "contr.treatment" > > Then, you can follow the usual steps as per the edgeR user's guide -- > estimate dispersion, fit the glm with glmFit() and do the LR testing with > glmLRT() and so on. > > Hope that gets you started. > > Best, > Mark > > > > > On 18.05.2012, at 05:15, KJ Lim wrote: > > > Dear edgeR users, > > > > I am not a statistician nor R programming geek, please forgive me if I > ask > > stupid question. > > > > I have RNA-seq data for 2 different genotype of trees with different time > > points 0hour(control),3hr,24hours,and 48hours. Each time point has two > > replicates. The experiment design like following: > > > > Sample harvested after treatment at > > Tree H1 Ctrl 3hrs 24hrs 48hrs > > Tree H2 Ctrl 3hrs 24hrs 48hrs > > > > Tree L1 Ctrl 3hrs 24hrs 48hrs > > Tree L2 Ctrl 3hrs 24hrs 48hrs > > > > I would like to study genes that are differentially expressed (DE) > > throughout the time points in these 2 genotype of trees with edgeR. I > read > > from the edgeR user guide, the suitable DE analysis method for my > expriment > > is GLM likelihood ratio test. > > > > After read case study in the user guide, I have the RNA-Seq counts in a > > file as below in order to input into the edgeR package. > > > > Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H > > L2_3H L1_1D L2_1D L1_2D L2_2D > > AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 > > AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 > > AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 > > AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 > > > > I used the following commands to input the counts file into edgeR: > > > > rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, > > stringsAsFactor=FALSE) > > trees <- > > > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > > treat <- > > > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > > > > I'm not good in R programming, thus, having this file input into the > edgeR > > and assign the factors as well as design-matrix is a challenge. I'm stuck > > how to tell the edgeR about my design matrix. > > > > Could you guys kindly help me on this? Have I input my RNA-Seq counts > > correctly? Please correct me if there is any mistake I have done in my > > edgeR input. > > > > I appreciate very much for your help and time. Have a nice day. > > > > Best regards, > > KJ Lim > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > ---------- > Prof. Dr. Mark Robinson > Bioinformatics > Institute of Molecular Life Sciences > University of Zurich > Winterthurerstrasse 190 > 8057 Zurich > Switzerland > > v: +41 44 635 4848 > f: +41 44 635 6898 > e: mark.robinson@imls.uzh.ch > o: Y11-J-16 > w: http://tiny.cc/mrobin > > [[alternative HTML version deleted]]

ADD COMMENT • link 11.9 years ago KJ Lim ▴ 420

0

Entering edit mode

On 18.05.2012, at 12:11, KJ Lim wrote: > Dear Prof Mark Robinson, > > Good day. > >> treat <- > >> >> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2 ","3h1","3h2","24h1","24h2","48h1","48h2")) >> >> Maybe it's better to call this 'time' and you'll need to change this to >> something like: >> time <- factor( rep(c("C","3h","24h","48h"), each=4) ) >> > > I assigned the time factor as suggested and it showed me as: > >> time > [1] C C C C 3h 3h 3h 3h 24h 24h 24h 24h 96h 96h 96h 96h > Levels: 24h 3h 96h C > > However, may I ask, does this time factor vector matches the read counts of > time in the input file? > The read counts of time in the input file is: C C 3h 3h 24h 24h 96h 96h > C C 3h 3h 24h 24h 96h 96h > > Should I make some adjustment in my input file in order to suit the time > factor vector? Please forgive me, if this is a stupid question. Apologies, my mistake ? change that to: time <- factor( rep(rep(c("C","3h","24h","48h"),each=2),2) ) Best, Mark > If I understand your description, you are interested primarily in the >> differences in genotypes. I would suggest starting with: >> >> design <- model.matrix(~time+geno) >> > > I would like to study the differentially expressed genes in both 2 HS and > LS genotypes of tress across time course experiment. > > If I would like to study the differentially expressed genes in HS genotype > tress across time course, can I have the design matrix as below? > > design <- model.matrix(~time, data=rawdata) > > Thank you very much for your time and help. > > Best regards, > KJ Lim > > > > On 18 May 2012 11:32, Mark Robinson <mark.robinson at="" imls.uzh.ch=""> wrote: > >> Hi KJ Lim, >> >>> trees <- >>> >> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS ","LS","LS","LS","LS")) >> >> This one seems ok, although could be written more compactly and I'll give >> it a different name: >> >> geno <- factor( rep(c("HS","LS"),each=8)) >> >>> treat <- >>> >> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2 ","3h1","3h2","24h1","24h2","48h1","48h2")) >> >> Maybe it's better to call this 'time' and you'll need to change this to >> something like: >> >> time <- factor( rep(c("C","3h","24h","48h"), each=4) ) >> >> These two factor vectors match the columns of your count table, so you >> should be ok there. >> >> If I understand your description, you are interested primarily in the >> differences in genotypes. I would suggest starting with: >> >> design <- model.matrix(~time+geno) >> >>> design >> (Intercept) time3h time48h timeC genoLS >> 1 1 0 0 1 0 >> 2 1 0 0 1 0 >> 3 1 0 0 1 0 >> 4 1 0 0 1 0 >> 5 1 1 0 0 0 >> 6 1 1 0 0 0 >> 7 1 1 0 0 0 >> 8 1 1 0 0 0 >> 9 1 0 0 0 1 >> 10 1 0 0 0 1 >> 11 1 0 0 0 1 >> 12 1 0 0 0 1 >> 13 1 0 1 0 1 >> 14 1 0 1 0 1 >> 15 1 0 1 0 1 >> 16 1 0 1 0 1 >> attr(,"assign") >> [1] 0 1 1 1 2 >> attr(,"contrasts") >> attr(,"contrasts")$time >> [1] "contr.treatment" >> >> attr(,"contrasts")$geno >> [1] "contr.treatment" >> >> Then, you can follow the usual steps as per the edgeR user's guide -- >> estimate dispersion, fit the glm with glmFit() and do the LR testing with >> glmLRT() and so on. >> >> Hope that gets you started. >> >> Best, >> Mark >> >> >> >> >> On 18.05.2012, at 05:15, KJ Lim wrote: >> >>> Dear edgeR users, >>> >>> I am not a statistician nor R programming geek, please forgive me if I >> ask >>> stupid question. >>> >>> I have RNA-seq data for 2 different genotype of trees with different time >>> points 0hour(control),3hr,24hours,and 48hours. Each time point has two >>> replicates. The experiment design like following: >>> >>> Sample harvested after treatment at >>> Tree H1 Ctrl 3hrs 24hrs 48hrs >>> Tree H2 Ctrl 3hrs 24hrs 48hrs >>> >>> Tree L1 Ctrl 3hrs 24hrs 48hrs >>> Tree L2 Ctrl 3hrs 24hrs 48hrs >>> >>> I would like to study genes that are differentially expressed (DE) >>> throughout the time points in these 2 genotype of trees with edgeR. I >> read >>> from the edgeR user guide, the suitable DE analysis method for my >> expriment >>> is GLM likelihood ratio test. >>> >>> After read case study in the user guide, I have the RNA-Seq counts in a >>> file as below in order to input into the edgeR package. >>> >>> Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H >>> L2_3H L1_1D L2_1D L1_2D L2_2D >>> AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 >>> AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 >>> AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 >>> AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 >>> >>> I used the following commands to input the counts file into edgeR: >>> >>> rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, >>> stringsAsFactor=FALSE) >>> trees <- >>> >> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS ","LS","LS","LS","LS")) >>> treat <- >>> >> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2 ","3h1","3h2","24h1","24h2","48h1","48h2")) >>> >>> I'm not good in R programming, thus, having this file input into the >> edgeR >>> and assign the factors as well as design-matrix is a challenge. I'm stuck >>> how to tell the edgeR about my design matrix. >>> >>> Could you guys kindly help me on this? Have I input my RNA-Seq counts >>> correctly? Please correct me if there is any mistake I have done in my >>> edgeR input. >>> >>> I appreciate very much for your help and time. Have a nice day. >>> >>> Best regards, >>> KJ Lim >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> ---------- >> Prof. Dr. Mark Robinson >> Bioinformatics >> Institute of Molecular Life Sciences >> University of Zurich >> Winterthurerstrasse 190 >> 8057 Zurich >> Switzerland >> >> v: +41 44 635 4848 >> f: +41 44 635 6898 >> e: mark.robinson at imls.uzh.ch >> o: Y11-J-16 >> w: http://tiny.cc/mrobin >> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 11.9 years ago Mark Robinson ▴ 880

0

Entering edit mode

Thanks Prof Mark Robinson. You guys are doing a great job to help out the community! Thanks. On 18 May 2012 23:31, Mark Robinson <mark.robinson@imls.uzh.ch> wrote: > > > On 18.05.2012, at 12:11, KJ Lim wrote: > > > Dear Prof Mark Robinson, > > > > Good day. > > > >> treat <- > > > >> > >> > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > >> > >> Maybe it's better to call this 'time' and you'll need to change this to > >> something like: > >> time <- factor( rep(c("C","3h","24h","48h"), each=4) ) > >> > > > > I assigned the time factor as suggested and it showed me as: > > > >> time > > [1] C C C C 3h 3h 3h 3h 24h 24h 24h 24h 96h 96h 96h 96h > > Levels: 24h 3h 96h C > > > > However, may I ask, does this time factor vector matches the read counts > of > > time in the input file? > > The read counts of time in the input file is: C C 3h 3h 24h 24h 96h 96h > > C C 3h 3h 24h 24h 96h 96h > > > > Should I make some adjustment in my input file in order to suit the time > > factor vector? Please forgive me, if this is a stupid question. > > > Apologies, my mistake â¦ change that to: > > time <- factor( rep(rep(c("C","3h","24h","48h"),each=2),2) ) > > Best, > Mark > > > > > > > > If I understand your description, you are interested primarily in the > >> differences in genotypes. I would suggest starting with: > >> > >> design <- model.matrix(~time+geno) > >> > > > > I would like to study the differentially expressed genes in both 2 HS and > > LS genotypes of tress across time course experiment. > > > > If I would like to study the differentially expressed genes in HS > genotype > > tress across time course, can I have the design matrix as below? > > > > design <- model.matrix(~time, data=rawdata) > > > > Thank you very much for your time and help. > > > > Best regards, > > KJ Lim > > > > > > > > On 18 May 2012 11:32, Mark Robinson <mark.robinson@imls.uzh.ch> wrote: > > > >> Hi KJ Lim, > >> > >>> trees <- > >>> > >> > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > >> > >> This one seems ok, although could be written more compactly and I'll > give > >> it a different name: > >> > >> geno <- factor( rep(c("HS","LS"),each=8)) > >> > >>> treat <- > >>> > >> > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > >> > >> Maybe it's better to call this 'time' and you'll need to change this to > >> something like: > >> > >> time <- factor( rep(c("C","3h","24h","48h"), each=4) ) > >> > >> These two factor vectors match the columns of your count table, so you > >> should be ok there. > >> > >> If I understand your description, you are interested primarily in the > >> differences in genotypes. I would suggest starting with: > >> > >> design <- model.matrix(~time+geno) > >> > >>> design > >> (Intercept) time3h time48h timeC genoLS > >> 1 1 0 0 1 0 > >> 2 1 0 0 1 0 > >> 3 1 0 0 1 0 > >> 4 1 0 0 1 0 > >> 5 1 1 0 0 0 > >> 6 1 1 0 0 0 > >> 7 1 1 0 0 0 > >> 8 1 1 0 0 0 > >> 9 1 0 0 0 1 > >> 10 1 0 0 0 1 > >> 11 1 0 0 0 1 > >> 12 1 0 0 0 1 > >> 13 1 0 1 0 1 > >> 14 1 0 1 0 1 > >> 15 1 0 1 0 1 > >> 16 1 0 1 0 1 > >> attr(,"assign") > >> [1] 0 1 1 1 2 > >> attr(,"contrasts") > >> attr(,"contrasts")$time > >> [1] "contr.treatment" > >> > >> attr(,"contrasts")$geno > >> [1] "contr.treatment" > >> > >> Then, you can follow the usual steps as per the edgeR user's guide -- > >> estimate dispersion, fit the glm with glmFit() and do the LR testing > with > >> glmLRT() and so on. > >> > >> Hope that gets you started. > >> > >> Best, > >> Mark > >> > >> > >> > >> > >> On 18.05.2012, at 05:15, KJ Lim wrote: > >> > >>> Dear edgeR users, > >>> > >>> I am not a statistician nor R programming geek, please forgive me if I > >> ask > >>> stupid question. > >>> > >>> I have RNA-seq data for 2 different genotype of trees with different > time > >>> points 0hour(control),3hr,24hours,and 48hours. Each time point has two > >>> replicates. The experiment design like following: > >>> > >>> Sample harvested after treatment at > >>> Tree H1 Ctrl 3hrs 24hrs 48hrs > >>> Tree H2 Ctrl 3hrs 24hrs 48hrs > >>> > >>> Tree L1 Ctrl 3hrs 24hrs 48hrs > >>> Tree L2 Ctrl 3hrs 24hrs 48hrs > >>> > >>> I would like to study genes that are differentially expressed (DE) > >>> throughout the time points in these 2 genotype of trees with edgeR. I > >> read > >>> from the edgeR user guide, the suitable DE analysis method for my > >> expriment > >>> is GLM likelihood ratio test. > >>> > >>> After read case study in the user guide, I have the RNA-Seq counts in a > >>> file as below in order to input into the edgeR package. > >>> > >>> Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H > >>> L2_3H L1_1D L2_1D L1_2D L2_2D > >>> AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 > >>> AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 > >>> AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 > >>> AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 > >>> > >>> I used the following commands to input the counts file into edgeR: > >>> > >>> rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, > >>> stringsAsFactor=FALSE) > >>> trees <- > >>> > >> > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > >>> treat <- > >>> > >> > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > >>> > >>> I'm not good in R programming, thus, having this file input into the > >> edgeR > >>> and assign the factors as well as design-matrix is a challenge. I'm > stuck > >>> how to tell the edgeR about my design matrix. > >>> > >>> Could you guys kindly help me on this? Have I input my RNA-Seq counts > >>> correctly? Please correct me if there is any mistake I have done in my > >>> edgeR input. > >>> > >>> I appreciate very much for your help and time. Have a nice day. > >>> > >>> Best regards, > >>> KJ Lim > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor@r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > >> ---------- > >> Prof. Dr. Mark Robinson > >> Bioinformatics > >> Institute of Molecular Life Sciences > >> University of Zurich > >> Winterthurerstrasse 190 > >> 8057 Zurich > >> Switzerland > >> > >> v: +41 44 635 4848 > >> f: +41 44 635 6898 > >> e: mark.robinson@imls.uzh.ch > >> o: Y11-J-16 > >> w: http://tiny.cc/mrobin > >> > >> > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]

ADD REPLY • link 11.9 years ago KJ Lim ▴ 420

Login before adding your answer.