Reading GFF using Starr

Feseha Abebe-Akele ▴ 30

Hello everyone,

I am trying to analyze tiling array data using the Starr package, and I am stuck at reading the GFF files for the 7 genomic sequences of C. elegans. In the example that comes with the vignette, a single primordial GFF file (20 lines?) is used, which is nowhere near the size of my GFF files (56 MB combined).

My question is: how do I read in multiple GFF files for analysis? Among other things, I have tried reading them like this:

    gffs <- c(file.path(dataPath,"chrI.gff"),
              file.path(dataPath,"chrII.gff"), file.path(dataPath,"chrIII.gff"),
              file.path(dataPath,"chrIV.gff"), file.path(dataPath,"chrV.gff"),
              file.path(dataPath,"chrX.gff"))

    transcriptAnno <- read.gffAnno(gffs, feature="transcript")

But none of my attempts worked.

I would appreciate any help in getting my analysis to the next level.

FYI: I am trying to analyze differential expression between TEST and CONTROL on the C. elegans Tiling Array 1.0 chips.

Thanks

Updated 13.1 years ago by zacher@lmb.uni-muenchen.de • written 13.2 years ago by Feseha Abebe-Akele

Wolfgang Huber ★ 13k

Dear Feseha,

I am not sure whether this will answer your question, but have you tried

    cat chrI.gff chrII.gff chrIII.gff chrIV.gff chrV.gff chrX.gff > all.gff

(on the OS command line) and then

    transcriptAnno = read.gffAnno("all.gff", feature="transcript")

(in R)? Alternatively, if you are so unfortunate as to work with an operating system that does not have 'cat', you could also use, e.g., R's readLines and writeLines.

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
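
For readers without 'cat', the readLines/writeLines route mentioned above could be sketched in base R roughly as follows. This is only a sketch and not part of Starr: the helper name 'concat_gff' and the two demo records are made up for illustration, and the files are joined verbatim, exactly as 'cat' would do, so any header lines in the individual files are carried over unchanged.

```r
# Hypothetical helper (not a Starr function): join several GFF files
# into one file, line by line, in the order given -- like `cat`.
concat_gff <- function(paths, out) {
  stopifnot(all(file.exists(paths)))
  # Read each file as a character vector of lines and flatten the list.
  lines <- unlist(lapply(paths, readLines), use.names = FALSE)
  writeLines(lines, out)
  invisible(out)
}

# Demo with two tiny made-up files in a temporary directory.
dataPath <- tempfile("gff_demo_")
dir.create(dataPath)
writeLines("chrI\tdemo\tgene\t1\t100\t.\t+\t.\tID=g1",
           file.path(dataPath, "chrI.gff"))
writeLines("chrII\tdemo\tgene\t5\t50\t.\t-\t.\tID=g2",
           file.path(dataPath, "chrII.gff"))

gffs <- file.path(dataPath, c("chrI.gff", "chrII.gff"))
allGff <- concat_gff(gffs, file.path(dataPath, "all.gff"))

# The combined file holds the lines of both inputs.
length(readLines(allGff))
```

The combined file can then be passed to read.gffAnno as a single path, as in the reply above.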

13.2 years ago by Wolfgang Huber

Dear Wolfgang,

"cat" indeed helped with reading the GFF. However, I am still unclear about the feature="transcript" parameter. In the example that shipped with the package, all entries are "transcript". In the GFF I downloaded from NCBI, the same column is populated by things like CDS, gene, tRNA, etc. Am I supposed to convert entries like CDS, gene, mRNA, tRNA, snRNA, ... (which appear in the 4th column of the GFF) into a generic "transcript" entry, or would Starr take them in as is with the feature="transcript" parameter and use them?

Thanks a lot.

Feseha

13.2 years ago by Feseha Abebe-Akele

Dear Feseha,

I would suggest omitting the 'feature' argument in your call to 'read.gffAnno' and then selecting the rows that you care about yourself.

The 'Starr' maintainer might be able to provide more details in the function's manual page, or to allow 'feature' to be a vector or a regular expression.

Best wishes
Wolfgang
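
Selecting rows by hand, as suggested above, might look like the sketch below. It deliberately avoids Starr and uses only base R, so it makes no claim about what 'read.gffAnno' returns; the column names, the 'wanted' vector, and the three demo records are assumptions for illustration. In the standard nine-column GFF layout, the feature type sits in the third column.

```r
# Column names for the standard nine-column GFF layout (names chosen here).
gff_cols <- c("seqname", "source", "feature", "start", "end",
              "score", "strand", "frame", "attributes")

# Made-up demo records; a real file would be read from disk instead.
demo <- c(
  "chrI\tdemo\tgene\t1\t100\t.\t+\t.\tID=g1",
  "chrI\tdemo\tCDS\t10\t90\t.\t+\t.\tParent=g1",
  "chrII\tdemo\ttRNA\t5\t50\t.\t-\t.\tID=t1"
)
gff_file <- tempfile(fileext = ".gff")
writeLines(demo, gff_file)

# comment.char = "#" skips GFF header lines (this assumes no '#' occurs
# inside a record); quote = "" stops quote characters in the attributes
# column from being treated as string delimiters.
gff <- read.delim(gff_file, header = FALSE, comment.char = "#", quote = "",
                  col.names = gff_cols, stringsAsFactors = FALSE)

# See which feature types the file contains and how often each occurs.
table(gff$feature)

# Keep only the feature types of interest (hypothetical choice).
wanted <- c("gene", "tRNA")
selected <- gff[gff$feature %in% wanted, ]
nrow(selected)
```

Tabulating the feature column first shows which values a given file actually uses before deciding what to keep.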

13.2 years ago by Wolfgang Huber

zacher@lmb.uni-muenchen.de ▴ 50

Dear Feseha,

Sorry for the late reply; I am currently on holiday for some weeks. I am going to make the documentation clearer regarding what is meant by the "feature" argument. I hope everything works now. Please contact me if you have any further questions about Starr.

Best,
Benedikt

13.1 years ago by zacher@lmb.uni-muenchen.de
