Question: OrgDB download failed via AnnotationHub
4 weeks ago by
chirag.parsania0 wrote:

Hi,

I tried to download an OrgDb object provided by FungiDB through AnnotationHub, but the download failed; the commands and errors are below. Downloading GRanges objects works fine. Can anyone shed some light on why the OrgDb download fails?

library("AnnotationHub")
hub <- AnnotationHub()

> hub
AnnotationHub with 46259 records
# snapshotDate(): 2019-05-02 
# $dataprovider: BroadInstitute, Ensembl, UCSC, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, Haemcode, FungiDB, Inparanoid8, TriTrypDB, PlasmoDB, AmoebaDB
# $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Bos taurus, Rattus norvegicus, Pan troglodytes, Danio rerio, Gallus gallus, Mono...
# $rdataclass: GRanges, BigWigFile, TwoBitFile, OrgDb, Rle, ChainFile, EnsDb, Inparanoid8Db, TxDb, data.frame
# additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH5012"]]' 

            title                                                     
  AH5012  | Chromosome Band                                           
  AH5013  | STS Markers                                               
  AH5014  | FISH Clones                                               
  AH5015  | Recomb Rate                                               
  AH5016  | ENCODE Pilot                                              
  ...       ...                                                       
  AH73812 | org.Plasmodium_vivax.eg.sqlite                            
  AH73813 | org.Burkholderia_mallei_ATCC_23344.eg.sqlite              
  AH73814 | org.Bacillus_cereus_(strain_ATCC_14579_|_DSM_31).eg.sqlite
  AH73815 | org.Bacillus_cereus_ATCC_14579.eg.sqlite                  
  AH73816 | org.Schizosaccharomyces_cryophilus_OY26.eg.sqlite    

hub_subset <- query(hub , c("fungidb" ,"OrgDb"))

> hub_subset
AnnotationHub with 277 records
# snapshotDate(): 2019-05-02 
# $dataprovider: FungiDB
# $species: Naganishia albida, Albugo candida 2VRR, Albugo laibachii Nc14, Allomyces macrogynus ATCC 38327, Aphanomyces astaci, Aphanomyces invad...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH71411"]]' 

            title                                                         
  AH71411 | Transcript information for Albugo candida 2VRR                
  AH71412 | Transcript information for Albugo laibachii Nc14              
  AH71413 | Transcript information for Allomyces macrogynus ATCC 38327    
  AH71414 | Transcript information for Aspergillus aculeatus ATCC 16872   
  AH71415 | Transcript information for Aspergillus brasiliensis CBS 101740
  ...       ...                                                           
  AH71937 | Transcript information for Phytophthora sojae P6497           
  AH71938 | Transcript information for Pythium vexans DAOM BR484          
  AH71939 | Transcript information for Saccharomyces cerevisiae S288c     
  AH71940 | Transcript information for Scedosporium apiospermum IHEM 14462
  AH71941 | Transcript information for Yarrowia lipolytica CLIB89 W29   


> hub_subset[["AH71940"]]
downloading 1 resources
retrieving 1 resource
Downloading: 240 B     
Error: failed to load resource
  name: AH71940
  title: Transcript information for Scedosporium apiospermum IHEM 14462
  reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
  web resource path: ‘https://annotationhub.bioconductor.org/fetch/78686’
  local file path: ‘/Users/chirag/Library/Caches/AnnotationHub/25d6f57295d_78686’
  reason: Forbidden (HTTP 403). 
2: bfcadd() failed; resource removed
  rid: BFC16
  fpath: ‘https://annotationhub.bioconductor.org/fetch/78686’
  reason: download failed 
3: download failed
  hub path: ‘https://annotationhub.bioconductor.org/fetch/78686’
  cache resource: ‘AH71940 : 78686’
  reason: bfcadd() failed; see warnings() 


> hub_subset[["AH71412"]]
downloading 1 resources
retrieving 1 resource
Downloading: 240 B     
Error: failed to load resource
  name: AH71412
  title: Transcript information for Albugo laibachii Nc14
  reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
  web resource path: ‘https://annotationhub.bioconductor.org/fetch/78158’
  local file path: ‘/Users/chirag/Library/Caches/AnnotationHub/25d435cd1c6_78158’
  reason: Forbidden (HTTP 403). 
2: bfcadd() failed; resource removed
  rid: BFC17
  fpath: ‘https://annotationhub.bioconductor.org/fetch/78158’
  reason: download failed 
3: download failed
  hub path: ‘https://annotationhub.bioconductor.org/fetch/78158’
  cache resource: ‘AH71412 : 78158’
  reason: bfcadd() failed; see warnings()
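
To find out which records fail without stopping at the first error, each fetch can be wrapped in tryCatch(). The sketch below is only an illustration: try_fetch() is a hypothetical helper, not part of AnnotationHub, and `fetch` stands for any function that takes one ID and returns the resource.

```r
# Hypothetical helper (not an AnnotationHub function): try every ID and
# record whether the fetch succeeded, keeping the error message if not.
try_fetch <- function(ids, fetch) {
  msgs <- unname(vapply(ids, function(id) {
    tryCatch({
      fetch(id)          # result is discarded; only success/failure is kept
      NA_character_
    }, error = function(e) conditionMessage(e))
  }, character(1)))
  data.frame(id = ids, ok = is.na(msgs), message = msgs,
             stringsAsFactors = FALSE)
}
```

Assuming names(hub_subset) returns the AH IDs, a call such as try_fetch(names(hub_subset), function(id) hub_subset[[id]]) would attempt all 277 records, so expect it to download a lot of data.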
ADD COMMENTlink modified 14 days ago by shepherl ♦♦ 1.4k • written 4 weeks ago by chirag.parsania0
Answer: OrgDB download failed via AnnotationHub
4 weeks ago by
shepherl ♦♦ 1.4k
United States
shepherl ♦♦ 1.4k wrote:

There is an issue with the files. I have reached out to the maintainer of EuPathDb to hopefully get a resolution quickly.

ADD COMMENTlink written 4 weeks ago by shepherl ♦♦ 1.4k

Thanks for coming back. Waiting for your reply.

ADD REPLYlink written 4 weeks ago by chirag.parsania0

Hi @Shepherl,

I wonder if you have had any updates from the author.

Thanks.

ADD REPLYlink written 25 days ago by chirag.parsania0

Yes, and I am working with them on the solution. There was a naming mismatch with the files, and we are working on the re-upload.

ADD REPLYlink written 25 days ago by shepherl ♦♦ 1.4k

Thanks ! looking forward to it

ADD REPLYlink written 24 days ago by chirag.parsania0

While we are waiting for the re-upload: were you interested in any other AH IDs besides the two above? I might be able to implement a temporary workaround while the rest of the files are being processed.

ADD REPLYlink modified 20 days ago • written 20 days ago by shepherl ♦♦ 1.4k

The two above should now be downloadable. I made some manual changes while we wait for the datasets to be reloaded. If you need any more, please let me know.

> hub = AnnotationHub()
snapshotDate(): 2019-05-20
> hub_subset <- query(hub , c("fungidb" ,"OrgDb"))
> hub_subset[["AH71412"]]
downloading 1 resources
retrieving 1 resource
loading from cache 
    'AH71412 : 78158'


OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Albugo laibachii Nc14
| SPECIES: Albugo laibachii Nc14
| CENTRALID: GID
| Taxonomy ID: 890382
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information

ADD REPLYlink written 20 days ago by shepherl ♦♦ 1.4k

I am interested in all the fungal data provided by FungiDB (OrgDb and GRanges objects). I can wait until the upload finishes.

Thanks a lot. Cheers

ADD REPLYlink written 19 days ago by chirag.parsania0
Answer: OrgDB download failed via AnnotationHub
14 days ago by
shepherl ♦♦ 1.4k
United States
shepherl ♦♦ 1.4k wrote:

The maintainer has uploaded the new files. I believe everything should now be correct. If you have any further troubles please notify us here.

ADD COMMENTlink written 14 days ago by shepherl ♦♦ 1.4k

Thanks a lot. I will post here if I run into any difficulty.

~C.

ADD REPLYlink written 13 days ago by chirag.parsania0

Hi,

I encountered the same error I reported before, but with a different AH ID:

loading from cache 
    ‘AH70681 : 77427’
downloading 1 resources
retrieving 1 resource
Downloading: 240 B     
Error: failed to load resource
  name: AH70682
  title: Transcript information for Coccidioides immitis RMSCC 2394
  reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
  web resource path: ‘https://annotationhub.bioconductor.org/fetch/77428’
  local file path: ‘/Users/chirag/Library/Caches/AnnotationHub/11135888e414_77428’
  reason: Forbidden (HTTP 403). 
2: bfcadd() failed; resource removed
  rid: BFC87
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77428’
  reason: download failed 
3: download failed
  hub path: ‘https://annotationhub.bioconductor.org/fetch/77428’
  cache resource: ‘AH70682 : 77428’
  reason: bfcadd() failed; see warnings()
ADD REPLYlink written 11 days ago by chirag.parsania0

A different error with a different AH ID:

 hub[["AH71458"]]
downloading 0 resources
loading from cache 
    ‘AH71458 : 78204’
Error: failed to load resource
  name: AH71458
  title: Transcript information for Coprinopsis cinerea okayama7 130
  reason: database disk image is malformed
In addition: Warning messages:
1: Couldn't set cache size: database disk image is malformed
Use `cache_size` = NULL to turn off this warning. 
2: Couldn't set synchronous mode: database disk image is malformed
  Use `synchronous` = NULL to turn off this warning.

===============================================================

Edit :

The above problem was solved once I downloaded with the force = TRUE argument. The exact command is: hub[["AH71458", force = TRUE]]
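
The same idea can be written as a small wrapper that loads from the cache first and forces a fresh download only when loading fails. This is a sketch under assumptions: fetch_with_retry() is a hypothetical name, and `fetch(id, force)` is assumed to behave like hub[[id, force = force]].

```r
# Hypothetical wrapper: use the cached copy if it loads, otherwise
# re-download. `fetch` is assumed to mimic hub[[id, force = force]].
fetch_with_retry <- function(id, fetch) {
  tryCatch(
    fetch(id, force = FALSE),
    error = function(e) {
      message("cached copy of ", id, " failed to load; re-downloading")
      fetch(id, force = TRUE)
    }
  )
}
```

With a hub object this might be called as fetch_with_retry("AH71458", function(id, force) hub[[id, force = force]]). It only helps with a damaged cache entry; a 403 from the server will fail on the forced download as well.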

ADD REPLYlink modified 11 days ago • written 11 days ago by chirag.parsania0

For this one, there was probably a disruption during the initial download, which left a partial file in the cache. We will look into the other errors.

ADD REPLYlink written 11 days ago by shepherl ♦♦ 1.4k

All failed downloads of GRanges objects provided by FungiDB:

  fpath: ‘https://annotationhub.bioconductor.org/fetch/77428’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77429’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77448’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77432’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77433’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77434’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77435’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77436’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77437’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77439’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77519’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77480’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77478’
  fpath: ‘https://annotationhub.bioconductor.org/fetch/77520’
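
A list like this can be collected from the warning text instead of by hand. The sketch below uses only base R; extract_fetch_ids() is a hypothetical helper, and it assumes the warning lines have already been captured as a character vector.

```r
# Hypothetical helper: pull the numeric fetch ID out of every hub URL
# found in a character vector of warning lines, dropping duplicates.
extract_fetch_ids <- function(lines) {
  hits <- regmatches(lines, regexpr("(?<=/fetch/)[0-9]+", lines, perl = TRUE))
  unique(as.integer(hits))
}
```

Lines without a /fetch/ URL are ignored, and because each failed resource appears in several warnings, unique() keeps one entry per fetch ID.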
ADD REPLYlink written 11 days ago by chirag.parsania0

It seems these 14 files were never uploaded. I have again reached out to the maintainer to get them uploaded. Sorry for the inconvenience.

ADD REPLYlink modified 11 days ago • written 11 days ago by shepherl ♦♦ 1.4k

Greetings, I spent some time hunting down the errors for these resources and found that they fall into two classes.

First. fungidb.org does not have transcript data for: Coccidioides.immitis.RMSCC.2394, Coccidioides.immitis.RMSCC.3703, Cryptococcus.neoformans.var.neoformans.B.3501A, Coccidioides.posadasii.CPA.0001, Coccidioides.posadasii.CPA.0020, Coccidioides.posadasii.CPA.0066, Coccidioides.posadasii.RMSCC.1037, Coccidioides.posadasii.RMSCC.1038, Coccidioides.posadasii.RMSCC.2133, Coccidioides.posadasii.RMSCC.3700, Naganishia.albida.NRRL.Y.1402, and Phytophthora.plurivora.AV1007. These should have been removed from the metadata I uploaded to AnnotationHub, but due to an error my filter failed; this has been corrected and the metadata regenerated.

Second. For a small number of species in the various EuPathDB projects, including three from FungiDB (Cryptococcus.neoformans.var.neoformans.B.3501A, Phanerochaete.chrysosporium.RP.78, and Phytophthora.capsici.LT1534), there are some utterly unexpected things in the data downloaded from EuPathDB, including random EOF entries in the middle of the data. I added logic to check for these strange cases and now have the OrgDb/TxDb/GRanges files for them.

I am committing the relevant changes now. If you wish I can upload the 18 or so new files at your leisure.

ADD REPLYlink modified 11 days ago • written 11 days ago by abelew0

Could you please retry this resource? I was able to download it after changing the permissions on the file to public.

> ah[["AH70681"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache 
    'AH70681 : 77427'
require("GenomicRanges")
GRanges object with 98134 ranges and 9 metadata columns:
          seqnames        ranges strand |   source            type     score
             <Rle>     <IRanges>  <Rle> | <factor>        <factor> <numeric>
      [1] DS016992     5125-5627      + | EuPathDB            gene      <NA>
      [2] DS016992     5125-5627      + | EuPathDB            mRNA      <NA>
      [3] DS016992     5125-5390      + | EuPathDB            exon      <NA>
      [4] DS016992     5558-5627      + | EuPathDB            exon      <NA>
      [5] DS016992     5125-5390      + | EuPathDB             CDS      <NA>
      ...      ...           ...    ... .      ...             ...       ...
  [98130] DS017001 354109-354201      + | EuPathDB three_prime_UTR      <NA>
  [98131] DS017007   77545-78036      - | EuPathDB            gene      <NA>
  [98132] DS017007   77545-78036      - | EuPathDB            mRNA      <NA>
  [98133] DS017007   77545-78036      - | EuPathDB            exon      <NA>
  [98134] DS017007   77545-78036      - | EuPathDB             CDS      <NA>
              phase                       ID          description
          <integer>              <character>          <character>
      [1]      <NA>               CIHG_04050 hypothetical protein
      [2]      <NA>         CIHG_04050-t26_1 hypothetical protein
      [3]      <NA>       exon_CIHG_04050-E1                 <NA>
      [4]      <NA>       exon_CIHG_04050-E2                 <NA>
      [5]         0 CIHG_04050-t26_1-p1-CDS1                 <NA>
      ...       ...                      ...                  ...
  [98130]      <NA>   utr_CIHG_05753-t26_1_1                 <NA>
  [98131]      <NA>               CIHG_06410 hypothetical protein
  [98132]      <NA>         CIHG_06410-t26_1 hypothetical protein
  [98133]      <NA>       exon_CIHG_06410-E1                 <NA>
  [98134]         0 CIHG_06410-t26_1-p1-CDS1                 <NA>
                    Parent   protein_source_id            Note
           <CharacterList>         <character> <CharacterList>
      [1]             <NA>                <NA>            <NA>
      [2]       CIHG_04050                <NA>            <NA>
      [3] CIHG_04050-t26_1                <NA>            <NA>
      [4] CIHG_04050-t26_1                <NA>            <NA>
      [5] CIHG_04050-t26_1 CIHG_04050-t26_1-p1            <NA>
      ...              ...                 ...             ...
  [98130] CIHG_05753-t26_1                <NA>            <NA>
  [98131]             <NA>                <NA>            <NA>
  [98132]       CIHG_06410                <NA>            <NA>
  [98133] CIHG_06410-t26_1                <NA>            <NA>
  [98134] CIHG_06410-t26_1 CIHG_06410-t26_1-p1            <NA>
  -------

If it still fails, could you please also provide the output of sessionInfo()?

ADD REPLYlink modified 11 days ago • written 11 days ago by shepherl ♦♦ 1.4k

Thanks, shepherl. ah[["AH70681"]] is working perfectly fine. However, the 14 I mentioned above are still failing to download.

Below is a summary table of the failed OrgDb and GRanges downloads provided by FungiDB:

# A tibble: 14 x 7
   genome                         species                                         taxonomyid GRanges OrgDb   orgdb_cols gr_cols
   <chr>                          <chr>                                                <int> <chr>   <chr>   <list>     <list> 
 1 FungiDB-42_CimmitisRMSCC2394   Coccidioides immitis RMSCC 2394                     404692 AH70682 AH71445 <NULL>     <NULL> 
 2 FungiDB-42_CimmitisRMSCC3703   Coccidioides immitis RMSCC 3703                     454286 AH70683 AH71446 <NULL>     <NULL> 
 3 FungiDB-42_CneoformansB-3501A  Cryptococcus neoformans var. neoformans B-3501A     283643 AH70702 AH71465 <NULL>     <NULL> 
 4 FungiDB-42_CposadasiiCPA0001   Coccidioides posadasii CPA 0001                     469472 AH70686 AH71449 <NULL>     <NULL> 
 5 FungiDB-42_CposadasiiCPA0020   Coccidioides posadasii CPA 0020                     490068 AH70687 AH71450 <NULL>     <NULL> 
 6 FungiDB-42_CposadasiiCPA0066   Coccidioides posadasii CPA 0066                     490069 AH70688 AH71451 <NULL>     <NULL> 
 7 FungiDB-42_CposadasiiRMSCC1037 Coccidioides posadasii RMSCC 1037                   490065 AH70689 AH71452 <NULL>     <NULL> 
 8 FungiDB-42_CposadasiiRMSCC1038 Coccidioides posadasii RMSCC 1038                   490066 AH70690 AH71453 <NULL>     <NULL> 
 9 FungiDB-42_CposadasiiRMSCC2133 Coccidioides posadasii RMSCC 2133                   469470 AH70691 AH71454 <NULL>     <NULL> 
10 FungiDB-42_CposadasiiRMSCC3700 Coccidioides posadasii RMSCC 3700                   469471 AH70693 AH71456 <NULL>     <NULL> 
11 FungiDB-42_NalbidaNRRLY1402    Naganishia albida                                   100951 AH70773 AH71536 <NULL>     <NULL> 
12 FungiDB-42_PcapsiciLT1534      Phytophthora capsici LT1534                         763924 AH70734 AH71497 <NULL>     <NULL> 
13 FungiDB-42_PchrysosporiumRP-78 Phanerochaete chrysosporium RP-78                   273507 AH70732 AH71495 <NULL>     <NULL> 
14 FungiDB-42_PplurivoraAV1007    Phytophthora plurivora                              639000 AH70774 AH71537 <NULL>     <NULL>
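
For reference, the GRanges/OrgDb pairing in a table like this can be reproduced from hub metadata in base R. The sketch below is an assumption-laden illustration: pair_by_genome() is a hypothetical helper, and it expects a plain data.frame with columns `ah_id`, `genome` and `rdataclass` (for example built from the hub's mcols(), with the AH IDs copied into `ah_id`).

```r
# Hypothetical helper: put the GRanges and OrgDb AH IDs for each genome
# side by side. `meta` must have columns ah_id, genome and rdataclass.
pair_by_genome <- function(meta) {
  gr  <- meta[meta$rdataclass == "GRanges", c("genome", "ah_id")]
  org <- meta[meta$rdataclass == "OrgDb",   c("genome", "ah_id")]
  names(gr)[2]  <- "GRanges"
  names(org)[2] <- "OrgDb"
  # all = TRUE keeps genomes that have only one of the two classes
  merge(gr, org, by = "genome", all = TRUE)
}
```

Genomes that have only a GRanges or only an OrgDb record are kept, with NA in the missing column.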
ADD REPLYlink modified 10 days ago • written 10 days ago by chirag.parsania0

Apart from the download problems, one more thing I would like to add: no data is provided for the species Candida glabrata, although it is present in the online version of FungiDB.

ADD REPLYlink written 10 days ago by chirag.parsania0