Reading gff files in R
Naira Naouar ▴ 140
@naira-naouar-2394
Last seen 10.2 years ago
Dear all,

I was wondering whether there are already tools to read GFF files in R? I am looking for a fast way to extract gene coordinates from a GFF file.

Regards,
Naira

--
Naïra Naouar
Tel:+32 (0)9 331 38 63
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
nanao at psb.ugent.be http://www.psb.ugent.be
Tobias Straub ▴ 430
@tobias-straub-2182
Last seen 10.2 years ago
Hi Naira,

Just parse the GFF file with

    gff <- read.delim("gff_file.gff", header=FALSE, comment.char="#")

and you get a table that you can filter for gene entries, something like

    gff.genes <- gff[gff[,3]=="gene",]

depending on where and how the gene is specified as a gene. In the standard GFF layout the feature type is in column 3 (column 2 holds the source), so adjust the column index or the label if your file differs.

T.

(In reply to Naira Naouar's message of Oct 21, 2008.)

--
Tobias Straub ++4989218075439 Adolf-Butenandt-Institute, München D
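To make the read.delim approach concrete, here is a minimal sketch that runs on a toy GFF file. The file contents (sequence names, coordinates, IDs) are invented for illustration, and it assumes the standard nine-column layout with the feature type in column 3.

```r
# Toy GFF content; sequence names, coordinates and IDs are made up
gff_lines <- c(
  "##gff-version 3",
  "chr1\texample\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=abc",
  "chr1\texample\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=gene1",
  "chr2\texample\tgene\t50\t900\t.\t-\t.\tID=gene2;Name=xyz"
)
gff_file <- tempfile(fileext = ".gff")
writeLines(gff_lines, gff_file)

# Header lines start with "#", so comment.char drops them
gff <- read.delim(gff_file, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)

# Keep only rows whose feature type (column 3) is "gene"
genes <- gff[gff[, 3] == "gene", ]

# Columns 1, 4, 5 and 7 are sequence name, start, end and strand
gene_coords <- genes[, c(1, 4, 5, 7)]
colnames(gene_coords) <- c("seqname", "start", "end", "strand")
print(gene_coords)
```

Setting quote = "" is a precaution: attribute fields sometimes contain quote characters, which read.delim would otherwise try to interpret.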
There used to be a package for this, by Oleg Sklyar, but it was deprecated. There is a nice function in the davidTiling experimental data package for extracting attributes from GFF files (probably taken from the gff code base?). I'll copy and paste it here, since davidTiling is a large download.

    getAttributeField <- function (x, field, attrsep = ";") {
      # Split each attribute string into its "key=value" pairs
      s = strsplit(x, split = attrsep, fixed = TRUE)
      sapply(s, function(atts) {
        a = strsplit(atts, split = "=", fixed = TRUE)
        # Find the requested key among the pairs
        m = match(field, sapply(a, "[", 1))
        if (!is.na(m)) {
          rv = a[[m]][2]
        } else {
          rv = as.character(NA)
        }
        return(rv)
      })
    }

And here is a quick parser:

    gffRead <- function(gffFile, nrows = -1) {
      cat("Reading ", gffFile, ": ", sep="")
      gff = read.table(gffFile, sep="\t", as.is=TRUE, quote="",
                       header=FALSE, comment.char="#", nrows = nrows,
                       colClasses=c("character", "character", "character",
                                    "integer", "integer", "character",
                                    "character", "character", "character"))
      colnames(gff) = c("seqname", "source", "feature", "start", "end",
                        "score", "strand", "frame", "attributes")
      cat("found", nrow(gff), "rows with classes:",
          paste(sapply(gff, class), collapse=", "), "\n")
      # Every feature must have start and end coordinates
      stopifnot(!any(is.na(gff$start)), !any(is.na(gff$end)))
      return(gff)
    }

Now you can do stuff like

    gff <- gffRead(gfffile)
    gff$Name <- getAttributeField(gff$attributes, "Name")
    gff$ID <- getAttributeField(gff$attributes, "ID")

where gfffile is just an object holding the file name.

Kasper
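As a small illustration of what getAttributeField returns, the sketch below applies it to a few invented attribute strings. The function body is a compact copy of the one posted above, repeated only so the block runs on its own. It assumes GFF3-style "key=value" pairs separated by ";"; GFF2/GTF files write attributes as key "value" and would need a different split.

```r
# Compact copy of getAttributeField from the post above
getAttributeField <- function(x, field, attrsep = ";") {
  s <- strsplit(x, split = attrsep, fixed = TRUE)
  sapply(s, function(atts) {
    a <- strsplit(atts, split = "=", fixed = TRUE)
    m <- match(field, sapply(a, "[", 1))
    # Return the value for the key, or NA when the key is absent
    if (!is.na(m)) a[[m]][2] else NA_character_
  })
}

# Invented attribute strings, one per GFF row
attrs <- c("ID=gene1;Name=abc", "ID=exon1;Parent=gene1", "ID=gene2;Name=xyz")

gene_ids   <- getAttributeField(attrs, "ID")
gene_names <- getAttributeField(attrs, "Name")
print(gene_ids)
print(gene_names)
```

Rows that lack the requested key (here the exon has no Name) come back as NA, so the result always has one entry per input row and can be assigned straight to a data frame column.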
@michael-lawrence-2759
Last seen 10.2 years ago
The rtracklayer package can do this. Just:

    track <- import(file)

and the attributes become columns in the featureData slot of 'track'.
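A hedged sketch of the rtracklayer route follows. It assumes a recent Bioconductor release, where, as far as I know, import() returns a GRanges object and the GFF attributes appear as metadata columns rather than in a featureData slot as described in the 2008 reply. The toy file contents are invented, and the block does nothing if rtracklayer is not installed.

```r
# Runs only when rtracklayer (a Bioconductor package) is available
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  suppressPackageStartupMessages(library(rtracklayer))

  # Toy GFF3 content; names and coordinates are made up
  gff_lines <- c(
    "##gff-version 3",
    "chr1\texample\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=abc",
    "chr1\texample\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=gene1",
    "chr2\texample\tgene\t50\t900\t.\t-\t.\tID=gene2;Name=xyz"
  )
  gff_file <- tempfile(fileext = ".gff3")
  writeLines(gff_lines, gff_file)

  # The file extension tells import() which format to expect
  track <- import(gff_file)

  # Assumed: feature type is exposed as the 'type' metadata column
  genes <- track[track$type == "gene"]

  gene_coords <- data.frame(
    seqname = as.character(seqnames(genes)),
    start   = start(genes),
    end     = end(genes),
    strand  = as.character(strand(genes)),
    ID      = genes$ID
  )
  print(gene_coords)
}
```

Compared with the hand-rolled parsers, this leaves the attribute parsing to the package, at the cost of a Bioconductor dependency.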
Thank you very much for all your answers :)

Best regards,
Naira