Help with toGRanges

mhartman3 • 0

Hi,

I'm new to R and am trying to analyze two ChIP-seq data sets. I have been following several ChIPpeakAnno guides and have managed to find overlapping peaks, but when I try to annotate them I get the following error:

> library(EnsDb.Hsapiens.v75)
Loading required package: ensembldb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

> annoData <- toGRanges(EnsDb.Hsapiens.v75, feature="gene")
Error in toGRanges(EnsDb.Hsapiens.v75, feature = "gene") :
No valid data passed in. For example a data frame as BED format
             file with at least 3 fields in the order of: chromosome, start and end.
             Optional fields are name, score and strand etc.
             Please refer to http://genome.ucsc.edu/FAQ/FAQformat#format1 for details.

I'd appreciate any help in fixing this error!

ADD COMMENT • link updated 8.1 years ago by Dario Strbenac ★ 1.5k • written 8.1 years ago by mhartman3 • 0

For extra info:

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 10586)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats4 parallel grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] EnsDb.Hsapiens.v75_0.99.12 ensembldb_1.2.2            GenomicFeatures_1.22.13    AnnotationDbi_1.32.3
[5] Biobase_2.30.0             ChIPpeakAnno_3.4.6         BiocInstaller_1.20.1       RSQLite_1.0.0
[9] DBI_0.3.1                  VennDiagram_1.6.16         futile.logger_1.4.1        GenomicRanges_1.22.4
[13] GenomeInfoDb_1.6.3         Biostrings_2.38.4          XVector_0.10.0             IRanges_2.4.8
[17] S4Vectors_0.8.11           BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.4                  AnnotationHub_2.2.5          regioneR_1.2.3               bitops_1.0-6
[5] futile.options_1.0.0         tools_3.2.4                  zlibbioc_1.16.0              biomaRt_2.26.1
[9] digest_0.6.9                 memoise_1.0.0                BSgenome_1.38.0              graph_1.48.0
[13] shiny_0.13.1                 httr_1.1.0                   rtracklayer_1.30.3           multtest_2.26.0
[17] R6_2.1.2                     XML_3.98-1.4                 survival_2.38-3              RBGL_1.46.0
[21] BiocParallel_1.4.3           limma_3.26.9                 GO.db_3.2.2                  lambda.r_1.1.7
[25] matrixStats_0.50.1           htmltools_0.3.5              Rsamtools_1.22.0             splines_3.2.4
[29] MASS_7.3-45                  GenomicAlignments_1.6.3      SummarizedExperiment_1.0.2   xtable_1.8-2
[33] mime_0.4                     interactiveDisplayBase_1.8.0 httpuv_1.3.3                 RCurl_1.95-4.8
>

score 0 · Answer 1 · 2016-03-28

toGRanges is for converting simple data structures, such as data frames, into a GRanges object. To get gene annotation out of an EnsDb object, the correct approach is simply to call genes() on it:

> genes(EnsDb.Hsapiens.v75)
GRanges object with 64102 ranges and 5 metadata columns:
                  seqnames                 ranges strand   |         gene_id   gene_name    entrezid   gene_biotype seq_coord_system
                     <Rle>              <IRanges>  <Rle>   |     <character> <character> <character>    <character>      <character>
  ENSG00000000003        X [ 99883667,  99894988]      -   | ENSG00000000003      TSPAN6        7105 protein_coding       chromosome
  ENSG00000000005        X [ 99839799,  99854882]      +   | ENSG00000000005        TNMD       64102 protein_coding       chromosome
  ENSG00000000419       20 [ 49551404,  49575092]      -   | ENSG00000000419        DPM1        8813 protein_coding       chromosome
  ENSG00000000457        1 [169818772, 169863408]      -   | ENSG00000000457       SCYL3       57147 protein_coding       chromosome
  ENSG00000000460        1 [169631245, 169823221]      +   | ENSG00000000460    C1orf112       55732 protein_coding       chromosome
              ...      ...                    ...    ... ...             ...         ...         ...            ...              ...
           LRG_94       10   [72357104, 72362531]      -   |          LRG_94      LRG_94        5551       LRG_gene       chromosome
           LRG_96       15   [55495792, 55582001]      -   |          LRG_96      LRG_96        5873       LRG_gene       chromosome
           LRG_97       22   [37621310, 37640305]      -   |          LRG_97      LRG_97        5880       LRG_gene       chromosome
           LRG_98       11   [36589563, 36601312]      +   |          LRG_98      LRG_98        5896       LRG_gene       chromosome
           LRG_99       11   [36613493, 36619812]      -   |          LRG_99      LRG_99        5897       LRG_gene       chromosome
  -------
  seqinfo: 273 sequences from GRCh37 genome
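
To tie this back to the original question, here is a minimal sketch of how the result of genes() could be used for annotation, alongside the kind of input toGRanges does accept. The two peaks are made-up coordinates for illustration only, and the sketch assumes your peaks use Ensembl-style chromosome names ("X", "20") rather than UCSC-style ("chrX", "chr20"); if they don't, the names have to be harmonized first.

```r
library(ChIPpeakAnno)
library(EnsDb.Hsapiens.v75)

# Gene annotation as a GRanges object, taken straight from the EnsDb package
annoData <- genes(EnsDb.Hsapiens.v75)

# toGRanges works on simple structures such as a BED-like data frame
# (chromosome, start, end, optional name). These peaks are hypothetical.
bed <- data.frame(chrom      = c("X", "20"),
                  chromStart = c(99894900, 49575000),
                  chromEnd   = c(99895100, 49575200),
                  name       = c("peak1", "peak2"))
peaks <- toGRanges(bed, format = "BED")

# Annotate each peak with a nearby gene from annoData
annotated <- annotatePeakInBatch(peaks, AnnotationData = annoData)
annotated
```

In your case, you would pass your own overlapping peaks (as a GRanges object) in place of the hypothetical `peaks` above.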