edgeR : input and design matrix help
2
0
Entering edit mode
KJ Lim ▴ 420
@kj-lim-5288
Last seen 4.3 years ago
Finland
Dear edgeR users, I am not a statistician nor R programming geek, please forgive me if I ask stupid question. I have RNA-seq data for 2 different genotype of trees with different time points 0hour(control),3hr,24hours,and 48hours. Each time point has two replicates. The experiment design like following: Sample harvested after treatment at Tree H1 Ctrl 3hrs 24hrs 48hrs Tree H2 Ctrl 3hrs 24hrs 48hrs Tree L1 Ctrl 3hrs 24hrs 48hrs Tree L2 Ctrl 3hrs 24hrs 48hrs I would like to study genes that are differentially expressed (DE) throughout the time points in these 2 genotype of trees with edgeR. I read from the edgeR user guide, the suitable DE analysis method for my expriment is GLM likelihood ratio test. After read case study in the user guide, I have the RNA-Seq counts in a file as below in order to input into the edgeR package. Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H L2_3H L1_1D L2_1D L1_2D L2_2D AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 I used the following commands to input the counts file into edgeR: rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, stringsAsFactor=FALSE) trees <- factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS"," LS","LS","LS","LS")) treat <- factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2"," 3h1","3h2","24h1","24h2","48h1","48h2")) I'm not good in R programming, thus, having this file input into the edgeR and assign the factors as well as design-matrix is a challenge. I'm stuck how to tell the edgeR about my design matrix. Could you guys kindly help me on this? Have I input my RNA-Seq counts correctly? Please correct me if there is any mistake I have done in my edgeR input. I appreciate very much for your help and time. Have a nice day. Best regards, KJ Lim [[alternative HTML version deleted]]
edgeR ASSIGN edgeR ASSIGN • 3.7k views
ADD COMMENT
0
Entering edit mode
Mark Robinson ▴ 880
@mark-robinson-4908
Last seen 6.1 years ago
Hi KJ Lim, > trees <- > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) This one seems ok, although could be written more compactly and I'll give it a different name: geno <- factor( rep(c("HS","LS"),each=8)) > treat <- > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) Maybe it's better to call this 'time' and you'll need to change this to something like: time <- factor( rep(c("C","3h","24h","48h"), each=4) ) These two factor vectors match the columns of your count table, so you should be ok there. If I understand your description, you are interested primarily in the differences in genotypes. I would suggest starting with: design <- model.matrix(~time+geno) > design (Intercept) time3h time48h timeC genoLS 1 1 0 0 1 0 2 1 0 0 1 0 3 1 0 0 1 0 4 1 0 0 1 0 5 1 1 0 0 0 6 1 1 0 0 0 7 1 1 0 0 0 8 1 1 0 0 0 9 1 0 0 0 1 10 1 0 0 0 1 11 1 0 0 0 1 12 1 0 0 0 1 13 1 0 1 0 1 14 1 0 1 0 1 15 1 0 1 0 1 16 1 0 1 0 1 attr(,"assign") [1] 0 1 1 1 2 attr(,"contrasts") attr(,"contrasts")$time [1] "contr.treatment" attr(,"contrasts")$geno [1] "contr.treatment" Then, you can follow the usual steps as per the edgeR user's guide -- estimate dispersion, fit the glm with glmFit() and do the LR testing with glmLRT() and so on. Hope that gets you started. Best, Mark On 18.05.2012, at 05:15, KJ Lim wrote: > Dear edgeR users, > > I am not a statistician nor R programming geek, please forgive me if I ask > stupid question. > > I have RNA-seq data for 2 different genotype of trees with different time > points 0hour(control),3hr,24hours,and 48hours. Each time point has two > replicates. The experiment design like following: > > Sample harvested after treatment at > Tree H1 Ctrl 3hrs 24hrs 48hrs > Tree H2 Ctrl 3hrs 24hrs 48hrs > > Tree L1 Ctrl 3hrs 24hrs 48hrs > Tree L2 Ctrl 3hrs 24hrs 48hrs > > I would like to study genes that are differentially expressed (DE) > throughout the time points in these 2 genotype of trees with edgeR. I read > from the edgeR user guide, the suitable DE analysis method for my expriment > is GLM likelihood ratio test. > > After read case study in the user guide, I have the RNA-Seq counts in a > file as below in order to input into the edgeR package. > > Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H > L2_3H L1_1D L2_1D L1_2D L2_2D > AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 > AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 > AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 > AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 > > I used the following commands to input the counts file into edgeR: > > rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, > stringsAsFactor=FALSE) > trees <- > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > treat <- > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > > I'm not good in R programming, thus, having this file input into the edgeR > and assign the factors as well as design-matrix is a challenge. I'm stuck > how to tell the edgeR about my design matrix. > > Could you guys kindly help me on this? Have I input my RNA-Seq counts > correctly? Please correct me if there is any mistake I have done in my > edgeR input. > > I appreciate very much for your help and time. Have a nice day. > > Best regards, > KJ Lim > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ---------- Prof. Dr. Mark Robinson Bioinformatics Institute of Molecular Life Sciences University of Zurich Winterthurerstrasse 190 8057 Zurich Switzerland v: +41 44 635 4848 f: +41 44 635 6898 e: mark.robinson at imls.uzh.ch o: Y11-J-16 w: http://tiny.cc/mrobin
ADD COMMENT
0
Entering edit mode
KJ Lim ▴ 420
@kj-lim-5288
Last seen 4.3 years ago
Finland
Dear Prof Mark Robinson, Good day. > treat <- > > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > > Maybe it's better to call this 'time' and you'll need to change this to > something like: > time <- factor( rep(c("C","3h","24h","48h"), each=4) ) > I assigned the time factor as suggested and it showed me as: > time [1] C C C C 3h 3h 3h 3h 24h 24h 24h 24h 96h 96h 96h 96h Levels: 24h 3h 96h C However, may I ask, does this time factor vector matches the read counts of time in the input file? The read counts of time in the input file is: C C 3h 3h 24h 24h 96h 96h C C 3h 3h 24h 24h 96h 96h Should I make some adjustment in my input file in order to suit the time factor vector? Please forgive me, if this is a stupid question. If I understand your description, you are interested primarily in the > differences in genotypes. I would suggest starting with: > > design <- model.matrix(~time+geno) > I would like to study the differentially expressed genes in both 2 HS and LS genotypes of tress across time course experiment. If I would like to study the differentially expressed genes in HS genotype tress across time course, can I have the design matrix as below? design <- model.matrix(~time, data=rawdata) Thank you very much for your time and help. Best regards, KJ Lim On 18 May 2012 11:32, Mark Robinson <mark.robinson@imls.uzh.ch> wrote: > Hi KJ Lim, > > > trees <- > > > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > > This one seems ok, although could be written more compactly and I'll give > it a different name: > > geno <- factor( rep(c("HS","LS"),each=8)) > > > treat <- > > > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > > Maybe it's better to call this 'time' and you'll need to change this to > something like: > > time <- factor( rep(c("C","3h","24h","48h"), each=4) ) > > These two factor vectors match the columns of your count table, so you > should be ok there. > > If I understand your description, you are interested primarily in the > differences in genotypes. I would suggest starting with: > > design <- model.matrix(~time+geno) > > > design > (Intercept) time3h time48h timeC genoLS > 1 1 0 0 1 0 > 2 1 0 0 1 0 > 3 1 0 0 1 0 > 4 1 0 0 1 0 > 5 1 1 0 0 0 > 6 1 1 0 0 0 > 7 1 1 0 0 0 > 8 1 1 0 0 0 > 9 1 0 0 0 1 > 10 1 0 0 0 1 > 11 1 0 0 0 1 > 12 1 0 0 0 1 > 13 1 0 1 0 1 > 14 1 0 1 0 1 > 15 1 0 1 0 1 > 16 1 0 1 0 1 > attr(,"assign") > [1] 0 1 1 1 2 > attr(,"contrasts") > attr(,"contrasts")$time > [1] "contr.treatment" > > attr(,"contrasts")$geno > [1] "contr.treatment" > > Then, you can follow the usual steps as per the edgeR user's guide -- > estimate dispersion, fit the glm with glmFit() and do the LR testing with > glmLRT() and so on. > > Hope that gets you started. > > Best, > Mark > > > > > On 18.05.2012, at 05:15, KJ Lim wrote: > > > Dear edgeR users, > > > > I am not a statistician nor R programming geek, please forgive me if I > ask > > stupid question. > > > > I have RNA-seq data for 2 different genotype of trees with different time > > points 0hour(control),3hr,24hours,and 48hours. Each time point has two > > replicates. The experiment design like following: > > > > Sample harvested after treatment at > > Tree H1 Ctrl 3hrs 24hrs 48hrs > > Tree H2 Ctrl 3hrs 24hrs 48hrs > > > > Tree L1 Ctrl 3hrs 24hrs 48hrs > > Tree L2 Ctrl 3hrs 24hrs 48hrs > > > > I would like to study genes that are differentially expressed (DE) > > throughout the time points in these 2 genotype of trees with edgeR. I > read > > from the edgeR user guide, the suitable DE analysis method for my > expriment > > is GLM likelihood ratio test. > > > > After read case study in the user guide, I have the RNA-Seq counts in a > > file as below in order to input into the edgeR package. > > > > Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H > > L2_3H L1_1D L2_1D L1_2D L2_2D > > AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 > > AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 > > AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 > > AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 > > > > I used the following commands to input the counts file into edgeR: > > > > rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, > > stringsAsFactor=FALSE) > > trees <- > > > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > > treat <- > > > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > > > > I'm not good in R programming, thus, having this file input into the > edgeR > > and assign the factors as well as design-matrix is a challenge. I'm stuck > > how to tell the edgeR about my design matrix. > > > > Could you guys kindly help me on this? Have I input my RNA-Seq counts > > correctly? Please correct me if there is any mistake I have done in my > > edgeR input. > > > > I appreciate very much for your help and time. Have a nice day. > > > > Best regards, > > KJ Lim > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > ---------- > Prof. Dr. Mark Robinson > Bioinformatics > Institute of Molecular Life Sciences > University of Zurich > Winterthurerstrasse 190 > 8057 Zurich > Switzerland > > v: +41 44 635 4848 > f: +41 44 635 6898 > e: mark.robinson@imls.uzh.ch > o: Y11-J-16 > w: http://tiny.cc/mrobin > > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
On 18.05.2012, at 12:11, KJ Lim wrote: > Dear Prof Mark Robinson, > > Good day. > >> treat <- > >> >> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2 ","3h1","3h2","24h1","24h2","48h1","48h2")) >> >> Maybe it's better to call this 'time' and you'll need to change this to >> something like: >> time <- factor( rep(c("C","3h","24h","48h"), each=4) ) >> > > I assigned the time factor as suggested and it showed me as: > >> time > [1] C C C C 3h 3h 3h 3h 24h 24h 24h 24h 96h 96h 96h 96h > Levels: 24h 3h 96h C > > However, may I ask, does this time factor vector matches the read counts of > time in the input file? > The read counts of time in the input file is: C C 3h 3h 24h 24h 96h 96h > C C 3h 3h 24h 24h 96h 96h > > Should I make some adjustment in my input file in order to suit the time > factor vector? Please forgive me, if this is a stupid question. Apologies, my mistake ? change that to: time <- factor( rep(rep(c("C","3h","24h","48h"),each=2),2) ) Best, Mark > If I understand your description, you are interested primarily in the >> differences in genotypes. I would suggest starting with: >> >> design <- model.matrix(~time+geno) >> > > I would like to study the differentially expressed genes in both 2 HS and > LS genotypes of tress across time course experiment. > > If I would like to study the differentially expressed genes in HS genotype > tress across time course, can I have the design matrix as below? > > design <- model.matrix(~time, data=rawdata) > > Thank you very much for your time and help. > > Best regards, > KJ Lim > > > > On 18 May 2012 11:32, Mark Robinson <mark.robinson at="" imls.uzh.ch=""> wrote: > >> Hi KJ Lim, >> >>> trees <- >>> >> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS ","LS","LS","LS","LS")) >> >> This one seems ok, although could be written more compactly and I'll give >> it a different name: >> >> geno <- factor( rep(c("HS","LS"),each=8)) >> >>> treat <- >>> >> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2 ","3h1","3h2","24h1","24h2","48h1","48h2")) >> >> Maybe it's better to call this 'time' and you'll need to change this to >> something like: >> >> time <- factor( rep(c("C","3h","24h","48h"), each=4) ) >> >> These two factor vectors match the columns of your count table, so you >> should be ok there. >> >> If I understand your description, you are interested primarily in the >> differences in genotypes. I would suggest starting with: >> >> design <- model.matrix(~time+geno) >> >>> design >> (Intercept) time3h time48h timeC genoLS >> 1 1 0 0 1 0 >> 2 1 0 0 1 0 >> 3 1 0 0 1 0 >> 4 1 0 0 1 0 >> 5 1 1 0 0 0 >> 6 1 1 0 0 0 >> 7 1 1 0 0 0 >> 8 1 1 0 0 0 >> 9 1 0 0 0 1 >> 10 1 0 0 0 1 >> 11 1 0 0 0 1 >> 12 1 0 0 0 1 >> 13 1 0 1 0 1 >> 14 1 0 1 0 1 >> 15 1 0 1 0 1 >> 16 1 0 1 0 1 >> attr(,"assign") >> [1] 0 1 1 1 2 >> attr(,"contrasts") >> attr(,"contrasts")$time >> [1] "contr.treatment" >> >> attr(,"contrasts")$geno >> [1] "contr.treatment" >> >> Then, you can follow the usual steps as per the edgeR user's guide -- >> estimate dispersion, fit the glm with glmFit() and do the LR testing with >> glmLRT() and so on. >> >> Hope that gets you started. >> >> Best, >> Mark >> >> >> >> >> On 18.05.2012, at 05:15, KJ Lim wrote: >> >>> Dear edgeR users, >>> >>> I am not a statistician nor R programming geek, please forgive me if I >> ask >>> stupid question. >>> >>> I have RNA-seq data for 2 different genotype of trees with different time >>> points 0hour(control),3hr,24hours,and 48hours. Each time point has two >>> replicates. The experiment design like following: >>> >>> Sample harvested after treatment at >>> Tree H1 Ctrl 3hrs 24hrs 48hrs >>> Tree H2 Ctrl 3hrs 24hrs 48hrs >>> >>> Tree L1 Ctrl 3hrs 24hrs 48hrs >>> Tree L2 Ctrl 3hrs 24hrs 48hrs >>> >>> I would like to study genes that are differentially expressed (DE) >>> throughout the time points in these 2 genotype of trees with edgeR. I >> read >>> from the edgeR user guide, the suitable DE analysis method for my >> expriment >>> is GLM likelihood ratio test. >>> >>> After read case study in the user guide, I have the RNA-Seq counts in a >>> file as below in order to input into the edgeR package. >>> >>> Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H >>> L2_3H L1_1D L2_1D L1_2D L2_2D >>> AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 >>> AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 >>> AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 >>> AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 >>> >>> I used the following commands to input the counts file into edgeR: >>> >>> rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, >>> stringsAsFactor=FALSE) >>> trees <- >>> >> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS ","LS","LS","LS","LS")) >>> treat <- >>> >> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2 ","3h1","3h2","24h1","24h2","48h1","48h2")) >>> >>> I'm not good in R programming, thus, having this file input into the >> edgeR >>> and assign the factors as well as design-matrix is a challenge. I'm stuck >>> how to tell the edgeR about my design matrix. >>> >>> Could you guys kindly help me on this? Have I input my RNA-Seq counts >>> correctly? Please correct me if there is any mistake I have done in my >>> edgeR input. >>> >>> I appreciate very much for your help and time. Have a nice day. >>> >>> Best regards, >>> KJ Lim >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> ---------- >> Prof. Dr. Mark Robinson >> Bioinformatics >> Institute of Molecular Life Sciences >> University of Zurich >> Winterthurerstrasse 190 >> 8057 Zurich >> Switzerland >> >> v: +41 44 635 4848 >> f: +41 44 635 6898 >> e: mark.robinson at imls.uzh.ch >> o: Y11-J-16 >> w: http://tiny.cc/mrobin >> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Thanks Prof Mark Robinson. You guys are doing a great job to help out the community! Thanks. On 18 May 2012 23:31, Mark Robinson <mark.robinson@imls.uzh.ch> wrote: > > > On 18.05.2012, at 12:11, KJ Lim wrote: > > > Dear Prof Mark Robinson, > > > > Good day. > > > >> treat <- > > > >> > >> > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > >> > >> Maybe it's better to call this 'time' and you'll need to change this to > >> something like: > >> time <- factor( rep(c("C","3h","24h","48h"), each=4) ) > >> > > > > I assigned the time factor as suggested and it showed me as: > > > >> time > > [1] C C C C 3h 3h 3h 3h 24h 24h 24h 24h 96h 96h 96h 96h > > Levels: 24h 3h 96h C > > > > However, may I ask, does this time factor vector matches the read counts > of > > time in the input file? > > The read counts of time in the input file is: C C 3h 3h 24h 24h 96h 96h > > C C 3h 3h 24h 24h 96h 96h > > > > Should I make some adjustment in my input file in order to suit the time > > factor vector? Please forgive me, if this is a stupid question. > > > Apologies, my mistake … change that to: > > time <- factor( rep(rep(c("C","3h","24h","48h"),each=2),2) ) > > Best, > Mark > > > > > > > > If I understand your description, you are interested primarily in the > >> differences in genotypes. I would suggest starting with: > >> > >> design <- model.matrix(~time+geno) > >> > > > > I would like to study the differentially expressed genes in both 2 HS and > > LS genotypes of tress across time course experiment. > > > > If I would like to study the differentially expressed genes in HS > genotype > > tress across time course, can I have the design matrix as below? > > > > design <- model.matrix(~time, data=rawdata) > > > > Thank you very much for your time and help. > > > > Best regards, > > KJ Lim > > > > > > > > On 18 May 2012 11:32, Mark Robinson <mark.robinson@imls.uzh.ch> wrote: > > > >> Hi KJ Lim, > >> > >>> trees <- > >>> > >> > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > >> > >> This one seems ok, although could be written more compactly and I'll > give > >> it a different name: > >> > >> geno <- factor( rep(c("HS","LS"),each=8)) > >> > >>> treat <- > >>> > >> > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > >> > >> Maybe it's better to call this 'time' and you'll need to change this to > >> something like: > >> > >> time <- factor( rep(c("C","3h","24h","48h"), each=4) ) > >> > >> These two factor vectors match the columns of your count table, so you > >> should be ok there. > >> > >> If I understand your description, you are interested primarily in the > >> differences in genotypes. I would suggest starting with: > >> > >> design <- model.matrix(~time+geno) > >> > >>> design > >> (Intercept) time3h time48h timeC genoLS > >> 1 1 0 0 1 0 > >> 2 1 0 0 1 0 > >> 3 1 0 0 1 0 > >> 4 1 0 0 1 0 > >> 5 1 1 0 0 0 > >> 6 1 1 0 0 0 > >> 7 1 1 0 0 0 > >> 8 1 1 0 0 0 > >> 9 1 0 0 0 1 > >> 10 1 0 0 0 1 > >> 11 1 0 0 0 1 > >> 12 1 0 0 0 1 > >> 13 1 0 1 0 1 > >> 14 1 0 1 0 1 > >> 15 1 0 1 0 1 > >> 16 1 0 1 0 1 > >> attr(,"assign") > >> [1] 0 1 1 1 2 > >> attr(,"contrasts") > >> attr(,"contrasts")$time > >> [1] "contr.treatment" > >> > >> attr(,"contrasts")$geno > >> [1] "contr.treatment" > >> > >> Then, you can follow the usual steps as per the edgeR user's guide -- > >> estimate dispersion, fit the glm with glmFit() and do the LR testing > with > >> glmLRT() and so on. > >> > >> Hope that gets you started. > >> > >> Best, > >> Mark > >> > >> > >> > >> > >> On 18.05.2012, at 05:15, KJ Lim wrote: > >> > >>> Dear edgeR users, > >>> > >>> I am not a statistician nor R programming geek, please forgive me if I > >> ask > >>> stupid question. > >>> > >>> I have RNA-seq data for 2 different genotype of trees with different > time > >>> points 0hour(control),3hr,24hours,and 48hours. Each time point has two > >>> replicates. The experiment design like following: > >>> > >>> Sample harvested after treatment at > >>> Tree H1 Ctrl 3hrs 24hrs 48hrs > >>> Tree H2 Ctrl 3hrs 24hrs 48hrs > >>> > >>> Tree L1 Ctrl 3hrs 24hrs 48hrs > >>> Tree L2 Ctrl 3hrs 24hrs 48hrs > >>> > >>> I would like to study genes that are differentially expressed (DE) > >>> throughout the time points in these 2 genotype of trees with edgeR. I > >> read > >>> from the edgeR user guide, the suitable DE analysis method for my > >> expriment > >>> is GLM likelihood ratio test. > >>> > >>> After read case study in the user guide, I have the RNA-Seq counts in a > >>> file as below in order to input into the edgeR package. > >>> > >>> Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H > >>> L2_3H L1_1D L2_1D L1_2D L2_2D > >>> AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 > >>> AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1 > >>> AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0 > >>> AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0 > >>> > >>> I used the following commands to input the counts file into edgeR: > >>> > >>> rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE, > >>> stringsAsFactor=FALSE) > >>> trees <- > >>> > >> > factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS" ,"LS","LS","LS","LS")) > >>> treat <- > >>> > >> > factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2" ,"3h1","3h2","24h1","24h2","48h1","48h2")) > >>> > >>> I'm not good in R programming, thus, having this file input into the > >> edgeR > >>> and assign the factors as well as design-matrix is a challenge. I'm > stuck > >>> how to tell the edgeR about my design matrix. > >>> > >>> Could you guys kindly help me on this? Have I input my RNA-Seq counts > >>> correctly? Please correct me if there is any mistake I have done in my > >>> edgeR input. > >>> > >>> I appreciate very much for your help and time. Have a nice day. > >>> > >>> Best regards, > >>> KJ Lim > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor@r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > >> ---------- > >> Prof. Dr. Mark Robinson > >> Bioinformatics > >> Institute of Molecular Life Sciences > >> University of Zurich > >> Winterthurerstrasse 190 > >> 8057 Zurich > >> Switzerland > >> > >> v: +41 44 635 4848 > >> f: +41 44 635 6898 > >> e: mark.robinson@imls.uzh.ch > >> o: Y11-J-16 > >> w: http://tiny.cc/mrobin > >> > >> > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 349 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6