Question: Reading GFF files into R and GenomeGraphs
0
gravatar for Michael Dondrup
10.3 years ago by
Michael Dondrup140 wrote:
Hi, it took a while to figure out how to read genome annotation files with and without rtracklayer, thanks Michael, and also how to plot bacterial chromosomes using the GenomeGraphs package. I think, this information can be useful for others, so I made a tiny howto. I think some of this could also be added to the documentation examples of the GenomeGraphs package. I would be glad if someone would tell me if that fits or if there are some comments. Thank you again for this package. 1) Howto plot genomic annotation from a GFF file using rtracklayer: require("GenomeGraphs") require("rtracklayer") # read in the gff, example from import.gff: # import a GFF V2 file gff <- import.gff(system.file("tests", "v2.gff", package = "rtracklayer"), version = "2") # the gff object contains an IRanges list with the intervals # we need a function to convert to an AnnotationTrack from an IRanges object: makeAnnotationTrackFromIRanges = function (iranges, dp=DisplayPars(ranges = "yellow", plotId=TRUE )) { iranges.names = if (is.null(names(iranges))) {1:length(iranges)} else {names(iranges)} # make some IDs annotation = data.frame(start=iranges at start, end=iranges at start+iranges@width-1, feature=paste("ranges"), # there is no more feature information here group=c(1:length(iranges)), # put every region in a different group ID=iranges.names) makeAnnotationTrack(regions=annotation, dp=dp) } aTrack = makeAnnotationTrackFromIRanges(gff at ranges[[1]]) # there is a RangesList inside the gff gdPlot(aTrack, minBase=1, maxBase=11000) 2) Howto plot arbitrary chromosome data contained in a data.frame: I wasn't fully content with the output from the import.gff function, I have additional information like gene name & reading frame which I didn't get this way. Assume there is a data.frame "cds" containing annotation with columns containing at least columns: "GeneID", "Start", "Stop" Then, the following function can make an AnnotationTrack from it: makeSingleAnnotationTrack = function (cds, dp=DisplayPars(orf="blue", plotId=T)) { annotation = data.frame(start=cds$Start, end=cds$Stop, feature=rep.int("orf", nrow(cds)), # this can also be taken from some annotation field group=c(1:nrow(cds)), ID=cds$GeneID ) makeAnnotationTrack(annotation, dp=dp) } 3) Add coverage data to the annotation data: # convert an iranges object into a coverage track # I think this is a bit inefficient for large iranges objects, is there a better way? makeBaseTrackCoverageFromIRanges = function(iranges, ...) { baseCoverage = as.numeric(coverage(iranges)) makeBaseTrack(base=1:length(baseCoverage), value=baseCoverage, ...) } # and then plot this x = IRanges(start=c(2, 0, NA), end=c(NA, NA, 14), width=11:0) plotlist = list(makeAnnotationTrackFromIRanges(x), makeBaseTrackCoverageFromIRanges(x)) gdPlot(plotlist,min=1, max=20) This is maybe not very nice, but hope this helps..... best Michael Am 30.07.2009 um 13:24 schrieb Michael Lawrence: > On Thu, Jul 30, 2009 at 12:52 AM, Michael Dondrup < > Michael.Dondrup at bccs.uib.no> wrote: > >> Dear BioC list, >> >> I'm trying to use the package GenomeGraphs to visualise custom >> genome data >> (genome not in public databases). In the corresponding publication to >> genomegraphs (Durinck et al. BMC Bioinformatics, 2009 ) I found the >> partial >> clue: "... genomic annotation encoded in GFF files can be easily >> used to >> create a custom AnnotationTrack object for visualization... region >> start, >> and end positions need to be given, as well as how these regions >> are to be >> grouped." >> This leaves me with two questions: >> - I have no idea how to parse a GFF file, is there a GFF parser in >> Bioconductor? >> > > Yes, rtracklayer has a parser. Please see the vignette. > > >> >> - If I have such GFF file for a genome, how can I create such an >> AnnotationTrack with all CDS? >> >> Hope somebody can help me with one of this. >> >> Best >> Michael >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENTlink written 10.3 years ago by Michael Dondrup140
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 177 users visited in the last hour