an annotation error when using RIPSeeker
2
0
Entering edit mode
xfwang ▴ 20
@xfwang-7176
Last seen 8.5 years ago
United States

Dear RIPSeeker people,

I tried to make RIPSeeker work by testing the system data. But I got an error as below, which seems an error when using biomaRt.

Actually, I checked the host of "dec2011.archive.ensembl.org", which is not down when I run the code.

Here is the code that I used:

 extdata.dir <- system.file("extdata", package="RIPSeeker")

 bamFiles <- list.files(extdata.dir, "\\.bam$", recursive=TRUE, full.names=TRUE)

 bamFiles <- grep("PRC2", bamFiles, value=TRUE)

 # specify control name

 cNAME <- "SRR039214"

 # output file directory

 outDir <- file.path(getwd(), "RIPSeeker_vigenette_example_PRC2")

 # Parameters setting

 binSize <- NULL      # set to NULL to automatically determine bin size

 minBinSize <- 10000  # min bin size in automatic bin size selection

 maxBinSize <- 10100

 multicore <- TRUE

 strandType <- "-"

 biomart <- "ENSEMBL_MART_ENSEMBL"

 biomaRt_dataset <- "mmusculus_gene_ensembl"

 host <- "dec2011.archive.ensembl.org"       

 # host = "www.ensembl.org"

 goAnno <- "org.Mm.eg.db"                  

 seekOut.PRC2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = TRUE, genomeBuild = "mm9", strandType = strandType, uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = TRUE, binSize=binSize, minBinSize = minBinSize, maxBinSize = maxBinSize, biomart=biomart, host=host, biomaRt_dataset = biomaRt_dataset, goAnno = goAnno, multicore=multicore, outDir=outDir)

........ # the console is truncated from annotation part and the error is as below.

*IV. Annotate RIP regions via online ensembl database (mmusculus_gene_ensembl):

Entity 'nbsp' not defined
attributes construct error
Couldn't find end of Start Tag img line 22
Entity 'hellip' not defined
Entity 'hellip' not defined
Entity 'nbsp' not defined
Entity 'raquo' not defined
attributes construct error
Couldn't find end of Start Tag img line 40
Entity 'hellip' not defined
Entity 'hellip' not defined
Entity 'hellip' not defined
Entity 'hellip' not defined
Entity 'hellip' not defined
Opening and ending tag mismatch: img line 64 and li
Opening and ending tag mismatch: li line 64 and ul
Opening and ending tag mismatch: ul line 63 and div
Entity 'copy' not defined
attributes construct error
Couldn't find end of Start Tag img line 225
attributes construct error
Couldn't find end of Start Tag img line 228
attributes construct error
Couldn't find end of Start Tag img line 231
attributes construct error
Couldn't find end of Start Tag img line 260
Opening and ending tag mismatch: div line 18 and body
Opening and ending tag mismatch: body line 17 and html
Premature end of data in tag html line 2
Error: 1: Entity 'nbsp' not defined
2: attributes construct error
3: Couldn't find end of Start Tag img line 22
4: Entity 'hellip' not defined
5: Entity 'hellip' not defined
6: Entity 'nbsp' not defined
7: Entity 'raquo' not defined
8: attributes construct error
9: Couldn't find end of Start Tag img line 40
10: Entity 'hellip' not defined
11: Entity 'hellip' not defined
12: Entity 'hellip' not defined
13: Entity 'hellip' not defined
14: Entity 'hellip' not defined
15: Opening and ending tag mismatch: img line 64 and li
16: Opening and ending tag mismatch: li line 64 and ul
17: Opening and ending tag mismatch: ul line 63 and div
18: Entity 'copy' not defined
19: attributes construct error
20: Couldn't find end of Start Tag img line 225
21: attributes construct error
22: Couldn't find end of Start Tag img line 228
23: attributes construct error
24: Couldn't find end of Start Tag img line 231
25: attributes construct error
26: Couldn't find end of Start Tag img line 260
27: Opening and e

 

RIPSeeker biomart • 2.5k views
ADD COMMENT
0
Entering edit mode
Yue Li ▴ 370
@yue-li-5245
Last seen 8.9 years ago
USA

Hi,

Thanks for reporting the error. Sorry for the late response. Indeed, I'm getting the same error. There seems to be an issue on the Ensembl database query for the archived database.

My package relies on annotatePeakInBatch {ChIPpeakAnno}, which itself relies on useMart {biomaRt}. I got the same error occurs with this:

mart=useMart(biomart='ENSEMBL_MART_ENSEMBL', host="dec2011.archive.ensembl.org")

But querying the latest database seems to work fine (see below). Currently, a quick fix would be to call peaks without annotations and then annotate the peaks using findOverlaps or similar routines by intersecting the peaks (GRanegs) with the gene annotations (downloaded separately from UCSC/Ensembl).

Hope this helps.

Best,

Yue

> x <- annotateRIP(sigGRanges = ripGR, 
+     biomaRt_dataset = dataset, 
+     biomart='ensembl', goAnno="org.Mm.eg.db")


*** Annotating 71 genomic ranges with mmusculus_gene_ensembl.

Loading required package: org.Mm.eg.db
Loading required package: AnnotationDbi

 

*** GO analysis for 87 associated features with org.Mm.eg.db.

> x
$sigGRangesAnnotated
GRanges object with 87 ranges and 21 metadata columns:
       seqnames                 ranges strand   |    ensembl_gene_id external_gene_name
          <Rle>              <IRanges>  <Rle>   |        <character>        <character>
   [1]     chrX   [ 3893581,  3903615]      -   | ENSMUSG00000101381             Gm3701
   [2]     chrX   [ 7757056,  7777125]      -   | ENSMUSG00000000134               Tfe3
   [3]     chrX   [ 8579926,  8589960]      -   | ENSMUSG00000079701              Ssxb3
   [4]     chrX   [16256701, 16266735]      -   | ENSMUSG00000081312            Gm14520
   [5]     chrX   [20210491, 20220525]      -   | ENSMUSG00000037341             Slc9a7
   ...      ...                    ...    ... ...                ...                ...

 

 

ADD COMMENT
0
Entering edit mode

 

Hi Yue,

Thanks for your reply! It totally make sense. Actually, for my real data, I used the ensembl build version GRCh37. So, I don't have this issue for my real experience.

However, I got a couple of questions about RIPSeeker. First, how does it differentiate biological reps and technical reps for the all-in-one function -- ripSeek. I see we only give the paths to the bam files and the function will combine the bam file if they are replicates but I don't know how does it know if it is biological or technical reps. Or, it treats them as same?

Second, I got an error when I run the function of ripSeek. Here is the code that I used and the session info. I searched it a little bit. Here is a link that might be referred to my error got an error with wavClusteR package; NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)....

Third, why it shows  "do not have any alignment" for all chromosomes?

Thanks a lot!

Best,

Xiaofei

> binSize <-  NULL   # automatically determine bin size

> minBinSize <- 10000

> maxBinSize <- 10100

> multicore <- TRUE

> strandType <- "*"

> biomart <- "ensembl"

> biomaRt_dataset <- "hsapiens_gene_ensembl" # human dataset id name

> goAnno <- "org.Hs.eg.db"                    # GO annotation database

> seekOut.HuR <- ripSeek(bamPath = bamFiles, cNAME = ctrl, reverseComplement = FALSE, genomeBuild = "hg19", strandType = strandType, uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = TRUE,  binSize=binSize, minBinSize = minBinSize, maxBinSize = maxBinSize, biomart=biomart, goAnno = goAnno, biomaRt_dataset = biomaRt_dataset, multicore=multicore, outDir=outDir)

 

 

*I. Collect alignment files

 

 

RIP alignment files:

/Volumes/saturn/xiaofei/Xiaoqing/tophat_out/HuR_S1_tophat.bam

/Volumes/saturn/xiaofei/Xiaoqing/tophat_out/HuR_S2_tophat.bam

 

 

Control alignment files:

/Volumes/saturn/xiaofei/Xiaoqing/tophat_out/IgG_S1_tophat.bam

/Volumes/saturn/xiaofei/Xiaoqing/tophat_out/IgG_S2_tophat.bam

 

 

 

*II. Analyzing RIP library:

 

 

**A. Process and combine alignment files

 

Processing /Volumes/saturn/xiaofei/Xiaoqing/tophat_out/HuR_S1_tophat.bam ... All hits are returned with flags.

Processing /Volumes/saturn/xiaofei/Xiaoqing/tophat_out/HuR_S2_tophat.bam ... All hits are returned with flags.

2 BAM files are combined

*** Only reads from strand * will be considered.

*** Only unique hits are used to compute read count.

*** 1 do not have any alignment.

*** 10 do not have any alignment.

*** 11 do not have any alignment.

*** 12 do not have any alignment.

*** 13 do not have any alignment.

*** 14 do not have any alignment.

*** 15 do not have any alignment.

*** 16 do not have any alignment.

*** 17 do not have any alignment.

*** 18 do not have any alignment.

*** 19 do not have any alignment.

*** 2 do not have any alignment.

*** 20 do not have any alignment.

*** 21 do not have any alignment.

*** 22 do not have any alignment.

*** 3 do not have any alignment.

*** 4 do not have any alignment.

*** 5 do not have any alignment.

*** 6 do not have any alignment.

*** 7 do not have any alignment.

*** 8 do not have any alignment.

*** 9 do not have any alignment.

*** MT do not have any alignment.

*** X do not have any alignment.

*** Y do not have any alignment.

 

Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : 

  subscript contains NAs or out-of-bounds indices

> sessionInfo()

R version 3.1.3 (2015-03-09)

Platform: x86_64-apple-darwin10.8.0 (64-bit)

Running under: OS X 10.8.5 (Mountain Lion)

 

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 

attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

 

other attached packages:

 [1] biomaRt_2.22.0          BiocInstaller_1.16.5    RIPSeeker_1.6.0         rtracklayer_1.26.3      GenomicAlignments_1.2.2

 [6] Rsamtools_1.18.3        Biostrings_2.34.1       XVector_0.6.0           GenomicRanges_1.18.4    GenomeInfoDb_1.2.5     

[11] IRanges_2.0.1           S4Vectors_0.4.0         BiocGenerics_0.12.1    

 

loaded via a namespace (and not attached):

 [1] AnnotationDbi_1.28.2 base64enc_0.1-2      BatchJobs_1.6        BBmisc_1.9           Biobase_2.26.0       BiocParallel_1.0.3  

 [7] bitops_1.0-6         brew_1.0-6           checkmate_1.5.3      codetools_0.2-11     DBI_0.3.1            digest_0.6.8        

[13] fail_1.2             foreach_1.4.2        iterators_1.0.7      magrittr_1.5         RCurl_1.95-4.6       RSQLite_1.0.0       

[19] sendmailR_1.2-1      stringi_0.4-1        stringr_1.0.0        tools_3.1.3          XML_3.98-1.1         zlibbioc_1.12.0     

ADD REPLY
0
Entering edit mode
xfwang ▴ 20
@xfwang-7176
Last seen 8.5 years ago
United States

Dear Yue,

I guess my error might be cause by the version of genome build that I use. For tophat2 alignment, I used GRCh37 downloaded from genome. But, I found the default recent genome annotation in biomaRt is GRCh38, is't it? If want to use annotation for genome build of hg19, I think I have to change the version. How can I change the annotation version in function of "ripSeek"?

Thanks a lot!

> grch_default = useEnsembl(biomart="ensembl")
> listDatasets(grch_default)[31:35,]
                   dataset                                description      version
31 mlucifugus_gene_ensembl           Myotis lucifugus genes (myoLuc2)      myoLuc2
32   hsapiens_gene_ensembl             Homo sapiens genes (GRCh38.p5)    GRCh38.p5
33   pformosa_gene_ensembl      Poecilia formosa genes (PoeFor_5.1.2) PoeFor_5.1.2
34      mfuro_gene_ensembl Mustela putorius furo genes (MusPutFur1.0) MusPutFur1.0
35 tbelangeri_gene_ensembl           Tupaia belangeri genes (tupBel1)      tupBel1

----------------------------------------

> grch37 = useEnsembl(biomart="ensembl",GRCh=37)
> listDatasets(grch37)[31:35,]
                    dataset                                description      version
31    hsapiens_gene_ensembl            Homo sapiens genes (GRCh37.p13)   GRCh37.p13
32       mfuro_gene_ensembl Mustela putorius furo genes (MusPutFur1.0) MusPutFur1.0
33  tbelangeri_gene_ensembl           Tupaia belangeri genes (tupBel1)      tupBel1
34     ggallus_gene_ensembl              Gallus gallus genes (Galgal4)      Galgal4
35 xtropicalis_gene_ensembl          Xenopus tropicalis genes (JGI4.2)       JGI4.2

 

ADD COMMENT
0
Entering edit mode

Sorry, this is not an answer.

ADD REPLY

Login before adding your answer.

Traffic: 608 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6