Loading alternative genomes into ggbio
@danielantonypass-7717
United Kingdom

I have been trying out ggbio to generate karyogram figures. Everything works with the hg19 dataset as in the manual, but I don't understand how to load data for my own species of interest, or which format the data has to be in.

Advice would be appreciated!

What I've used for hg19:

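library(ggbio)
library(GenomicRanges)  # provides keepSeqlevels() for GRanges objects
data(hg19IdeogramCyto, package = "biovizBase")
# keep only the standard chromosomes
hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y")))
autoplot(hg19, layout = "karyogram", cytoband = TRUE)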

Thanks

@james-w-macdonald-5106
United States

You could always look at the data you are currently using and infer from that what is expected, no?

> hg19IdeogramCyto
GRanges object with 862 ranges and 2 metadata columns:
        seqnames               ranges strand   |     name gieStain
           <Rle>            <IRanges>  <Rle>   | <factor> <factor>
    [1]     chr1  [      0,  2300000]      *   |   p36.33     gneg
    [2]     chr1  [2300000,  5400000]      *   |   p36.32   gpos25
    [3]     chr1  [5400000,  7200000]      *   |   p36.31     gneg
    [4]     chr1  [7200000,  9200000]      *   |   p36.23   gpos25
    [5]     chr1  [9200000, 12700000]      *   |   p36.22     gneg
    ...      ...                  ...    ... ...      ...      ...
  [858]     chrY [15100000, 19800000]      *   |  q11.221   gpos50
  [859]     chrY [19800000, 22100000]      *   |  q11.222     gneg
  [860]     chrY [22100000, 26200000]      *   |  q11.223   gpos50
  [861]     chrY [26200000, 28800000]      *   |   q11.23     gneg
  [862]     chrY [28800000, 59373566]      *   |      q12     gvar
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
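
If the printed object is hard to read, one way to see what it holds is to flatten a few ranges into an ordinary data frame. This is only a sketch; the names `band_table` and `stain_levels` are made up for illustration, and it assumes biovizBase and GenomicRanges are installed.

```r
library(GenomicRanges)
data(hg19IdeogramCyto, package = "biovizBase")

# Flatten the first three ranges into a plain data frame:
# seqnames/start/end/width/strand describe the ranges,
# name and gieStain are the metadata columns.
band_table <- as.data.frame(head(hg19IdeogramCyto, 3))
band_table

# The distinct stain labels that appear in the data set
stain_levels <- unique(as.character(hg19IdeogramCyto$gieStain))
stain_levels
```

In other words, each row is just a chromosome, a start, an end, and two labels.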

So you need a GRanges with this extra metadata. Let's look at AnnotationHub.

> library(AnnotationHub)
> hub <- AnnotationHub()

> cyto <- query(hub, c("cytoband"))

> cyto
AnnotationHub with 7 records
# snapshotDate(): 2015-08-26
# $dataprovider: UCSC
# $species: Homo sapiens, Drosophila melanogaster, Mus musculus, Rattus norv...
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH5012"]]'

           title          
  AH5012 | Chromosome Band
  AH5129 | Chromosome Band
  AH5292 | Chromosome Band
  AH5416 | Chromosome Band
  AH6158 | Chromosome Band
  AH6379 | Chromosome Band
  AH6810 | Chromosome Band
> cyto$species
[1] "Homo sapiens"            "Homo sapiens"           
[3] "Homo sapiens"            "Homo sapiens"           
[5] "Mus musculus"            "Rattus norvegicus"      
[7] "Drosophila melanogaster"

> cyto[[1]]
require(“GenomicRanges”)
retrieving 1 resources
  |======================================================================| 100%
UCSC track 'cytoBand'
UCSCData object with 862 ranges and 1 metadata column:
        seqnames               ranges strand   |        name
           <Rle>            <IRanges>  <Rle>   | <character>
    [1]     chr1  [      1,  2300000]      *   |      p36.33
    [2]     chr1  [2300001,  5400000]      *   |      p36.32
    [3]     chr1  [5400001,  7200000]      *   |      p36.31
    [4]     chr1  [7200001,  9200000]      *   |      p36.23
    [5]     chr1  [9200001, 12700000]      *   |      p36.22
    ...      ...                  ...    ... ...         ...
  [858]    chr22 [37600001, 41000000]      *   |       q13.1
  [859]    chr22 [41000001, 44200000]      *   |       q13.2
  [860]    chr22 [44200001, 48400000]      *   |      q13.31
  [861]    chr22 [48400001, 49400000]      *   |      q13.32
  [862]    chr22 [49400001, 51304566]      *   |      q13.33
  -------
  seqinfo: 93 sequences from hg19 genome
There were 33 warnings (use warnings() to see them)

I don't know what you mean by 'my own species'. That's presumably human, but maybe I am being too literal ;-D. Anyway, if you care about human, mouse, rat, or fly, you are like 75% of the way there. All you need is the staining information, which a Google search can presumably turn up.


By 'my own species' I meant my species of research (Arabidopsis thaliana). I'm afraid I still don't see how I can import a dataset, or what format that has to come in. I have looked through the class structure, but it's not one I have seen before. I am looking at the way to construct it from https://www.bioconductor.org/help/workflows/annotation/AnnotatingRanges, but how can it be applied to an organism not in the database?


That is probably not what you want. Instead look at the primer vignette for GenomicFeatures. The basic idea is to have a compact representation (chr, start, end) of the position for something on a genome, as well as metadata (gene location, staining band, whatever) that corresponds to that location.

So to make a GRanges object for making a karyotype for Arabidopsis, you will need to get the (chr, start, end) for each staining region, and then you can create a GRanges object with those data and then use ggbio to make the karyogram. Since this is your species of research, I presume you already have the information you require, and just need to stuff it into a GRanges object.

As an example, we can make a small version of the karyotype data that comes with biovizBase:

> library(GenomicRanges)
> meta <- data.frame(name = c("p36.33","p36.32","p36.31"), gieStain = c("gneg","gpos25","gneg"))

> bands <- data.frame(chr = rep("chr1", 3), start = c(0,2300000,5400000), end = c(2300000,5400000,7200000))

> example <- GRanges(bands$chr, IRanges(bands$start, bands$end), name = meta$name, gieStain = meta$gieStain)
> example
GRanges object with 3 ranges and 2 metadata columns:
      seqnames             ranges strand |     name gieStain
         <Rle>          <IRanges>  <Rle> | <factor> <factor>
  [1]     chr1 [      0, 2300000]      * |   p36.33     gneg
  [2]     chr1 [2300000, 5400000]      * |   p36.32   gpos25
  [3]     chr1 [5400000, 7200000]      * |   p36.31     gneg
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Presumably you would have all these data, and could just read in and emulate what I have done here.
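
For a species with no ready-made ideogram, the "read in and emulate" step might look roughly like the sketch below. It assumes your band table has the columns chr, start, end, name and gieStain; the file name `my_bands.tsv`, the object name `at_bands`, and all region coordinates and labels are invented placeholders, not real Arabidopsis bands. The two chromosome lengths are the TAIR10 values as far as I know, but check them against the assembly you actually use.

```r
library(GenomicRanges)

# In practice you would read your own table, e.g.
#   bands <- read.delim("my_bands.tsv")   # hypothetical file name
# The inline table below stands in for it; its values are placeholders.
bands <- data.frame(
  chr      = c("Chr1", "Chr1", "Chr2"),
  start    = c(1, 5000001, 1),
  end      = c(5000000, 12000000, 8000000),
  name     = c("regionA", "regionB", "regionC"),
  gieStain = c("gneg", "gpos50", "gneg")
)

# chr/start/end are recognised as the range columns;
# keep.extra.columns = TRUE carries name and gieStain over as metadata.
at_bands <- makeGRangesFromDataFrame(bands, keep.extra.columns = TRUE)

# Chromosome lengths (assumed values; replace with those of your assembly).
seqlengths(at_bands) <- c(Chr1 = 30427671, Chr2 = 19698289)

at_bands
```

With ggbio loaded, `autoplot(at_bands, layout = "karyogram", cytoband = TRUE)` should then be the analogue of the hg19 call, provided the gieStain values are ones the default colour scheme knows (see `getOption("biovizBase")$cytobandColor`). If you have no staining data at all, dropping `cytoband = TRUE` and plotting your regions on the bare chromosomes is probably the simpler route, which is why the sequence lengths are set here.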
