Reading GFF using Starr
@feseha-abebe-akele-4412
Hello everyone,

I am trying to analyze tiling array data using the Starr package, and I am stuck at reading the GFF files for the 7 genomic sequences of C. elegans. In the example that comes with the vignette, a single very small GFF file (20 lines?) is used, which is nowhere near the size of my 56 MN (combined) GFF files.

My question is: how do I read in multiple GFF files for analysis? Among other things, I have tried reading them like this:

    gffs <- c(file.path(dataPath, "chrI.gff"),
              file.path(dataPath, "chrII.gff"),
              file.path(dataPath, "chrIII.gff"),
              file.path(dataPath, "chrIV.gff"),
              file.path(dataPath, "chrV.gff"),
              file.path(dataPath, "chrX.gff"))

    transcriptAnno <- read.gffAnno(gffs, feature="transcript")

But none of my attempts worked.

I would appreciate any help in getting my analysis to the next level.

FYI: I am trying to analyze TEST vs CONTROL differential expression on the C. elegans Tiling Array 1.0 chips.

Thanks
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
Dear Feseha,

I am not sure whether this will solve your problem, but have you tried

    cat chrI.gff chrII.gff chrIII.gff chrIV.gff chrV.gff chrX.gff > all.gff

(on the OS command line) and then

    transcriptAnno = read.gffAnno("all.gff", feature="transcript")

(in R)? Alternatively, if you are so unfortunate as to work with an operating system that does not have 'cat', you could also use, e.g., R's readLines and writeLines.

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
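For readers without 'cat', the readLines/writeLines alternative mentioned above could look roughly like the following. This is only a minimal sketch in base R: `combine_gff` is a hypothetical helper name (it is not part of Starr), and it simply appends the files in the order given, comment lines included, as 'cat' would.

```r
# Hypothetical helper (not part of Starr): concatenate several GFF files
# into one file, in the order given, the way `cat` would on the command line.
combine_gff <- function(files, out) {
  stopifnot(length(files) > 0, all(file.exists(files)))
  # Read every file as a character vector of lines and join them end to end.
  lines <- unlist(lapply(files, readLines), use.names = FALSE)
  writeLines(lines, out)
  invisible(out)
}

# Usage sketch; dataPath and the chr*.gff names are taken from the question.
# gffs <- file.path(dataPath,
#                   paste0("chr", c("I", "II", "III", "IV", "V", "X"), ".gff"))
# combine_gff(gffs, file.path(dataPath, "all.gff"))
```

The combined file can then be passed as a single path to read.gffAnno, as in the answer above. Any header or comment lines are copied unchanged, so each input file's header will appear in the output.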
Dear Wolfgang,

"cat" indeed helped with reading the GFF. However, I am still unclear about the feature="transcript" parameter. In the example that shipped with the package, all entries are "transcript". In the GFF I downloaded from NCBI, the same column is populated by things like CDS, gene, tRNA, etc. Am I supposed to convert entries like CDS, gene, mRNA, tRNA, snRNA ..., which appear in the 4th column of the GFF, into a generic "transcript" entry, or would Starr take them in as is with the feature="transcript" parameter and use them?

Thanks a lot.

Feseha
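Before deciding what to pass as 'feature', it may help to see which feature types the combined file actually contains. The sketch below uses base R only and assumes the standard tab-separated, nine-column GFF layout, in which the feature type is the third column; `count_gff_types` is a hypothetical helper name, not a Starr function.

```r
# Hypothetical helper (not part of Starr): tabulate the feature types
# found in a tab-separated GFF file, most frequent first.
count_gff_types <- function(path) {
  gff <- read.delim(path, header = FALSE, comment.char = "#", quote = "",
                    colClasses = "character")
  # Column 3 holds the feature type in the standard GFF layout (assumption).
  sort(table(gff[[3]]), decreasing = TRUE)
}

# Usage sketch, with the combined file from earlier in the thread:
# count_gff_types("all.gff")
```

If "transcript" does not appear in the resulting table, that would suggest a filter on feature="transcript" has nothing to match in this file.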
Dear Feseha,

I would suggest omitting the 'feature' argument in your call to 'read.gffAnno' and then selecting the rows that you care about yourself.

The 'Starr' maintainer might be able to provide more details in the function's manual page, or to allow 'feature' to be a vector or a regular expression.

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
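The idea of selecting the rows yourself can be illustrated with base R alone. The sketch below does not use read.gffAnno (the columns of its return value are not described in this thread); instead it reads the file directly, assuming the standard tab-separated, nine-column GFF layout. `read_gff_features`, `gff_cols`, and the default `keep` values are hypothetical names and choices made for illustration, and whether the resulting data frame is accepted by Starr's downstream functions is not established here.

```r
# Column names for the standard nine-column GFF layout (an assumption
# about the input file, not something defined by Starr).
gff_cols <- c("seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes")

# Hypothetical helper (not part of Starr): read a GFF file with base R
# and keep only the rows whose feature type is listed in `keep`.
read_gff_features <- function(path, keep = c("gene", "mRNA", "CDS")) {
  gff <- read.delim(path, header = FALSE, comment.char = "#", quote = "",
                    col.names = gff_cols,
                    colClasses = c("character", "character", "character",
                                   "integer", "integer", "character",
                                   "character", "character", "character"))
  # `keep` may list several feature types at once.
  gff[gff$type %in% keep, , drop = FALSE]
}

# Usage sketch, with the combined file from earlier in the thread:
# anno <- read_gff_features("all.gff", keep = c("CDS", "tRNA"))
```

Filtering with `%in%` accepts several feature types in one call, which is the kind of vector-valued selection suggested above for the 'feature' argument.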
@zacherlmbuni-muenchende-3726
Dear Feseha,

Sorry for the late reply; I am currently on holiday for some weeks. I am going to make the documentation clearer regarding what is meant by the "feature" argument. I hope everything works now. Please contact me if you have any further questions on Starr.

Best,
Benedikt
