Hi Robert,
Here is an update on the new SIFT/PolyPhen annotations.
The SIFT tool has been abandoned and a few of the original maintainers
have started the PROVEAN tool. (See http://provean.jcvi.org/index.php
for details.) The PROVEAN group has provided pre-computed SIFT and
PROVEAN scores for dbSNP 137 which I've packaged up as
SIFT.Hsapiens.dbSNP137. This should be available via biocLite() by the
end of the week and requires VariantAnnotation 1.9.13.
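
A quick way to check the version requirement before loading the new package is sketched below. It uses only base R; the helper name `has_min_version` is made up for illustration and is not part of VariantAnnotation or any Bioconductor package.

```r
# Hypothetical helper (not from any package): TRUE if `pkg` is installed
# and its version is at least `min_version`.
has_min_version <- function(pkg, min_version) {
  # system.file() returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Version number taken from the message above
if (!has_min_version("VariantAnnotation", "1.9.13")) {
  message("VariantAnnotation >= 1.9.13 is needed for SIFT.Hsapiens.dbSNP137")
}
```

The check does not load the package; it only reads its installed version.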
The PolyPhen group has a new download of pre-computed scores but they
have completely changed the format and volume. It will take some time to
package those up. Right now, adapting ensemblVEP to support multiple
versions and to remove the perl script burden is a higher priority than
the pre-computed PolyPhen scores. I have this on the TODO but it will be
a while before I can get to it.
Valerie
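
For readers on a release that predates the fix mentioned further down in this thread (VariantAnnotation 1.7.47), one possible workaround is to wrap the query so that a failed lookup yields a 0-row data.frame, which is the behaviour the original report expected. This is only a sketch in base R: `select_or_empty` and `failing_select` are hypothetical names, and the stand-in function merely imitates the reported error rather than calling the real SIFT or PolyPhen packages.

```r
# Hypothetical wrapper: call `query_fun(...)`; if it signals an error,
# return a 0-row data.frame with the requested column names instead.
select_or_empty <- function(query_fun, ..., columns = character(0)) {
  tryCatch(
    query_fun(...),
    error = function(e) {
      empty <- as.data.frame(
        matrix(character(0), nrow = 0, ncol = length(columns)),
        stringsAsFactors = FALSE
      )
      names(empty) <- columns
      empty
    }
  )
}

# Stand-in for a select() call that fails on an unknown key
# (assumption: it only mimics the error message quoted in the report).
failing_select <- function(db, keys) {
  stop("arguments imply differing number of rows: 1, 0")
}

res <- select_or_empty(failing_select, NULL, keys = "dummy",
                       columns = c("RSID", "PROTEINID"))
```

Note that this swallows every error, not just the one caused by an unknown key, so it can hide genuine failures; updating to a fixed version is the better option where possible.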
On 09/24/2013 12:17 PM, Valerie Obenchain wrote:
> I will update the SIFT and PolyPhen databases for the upcoming release.
>
> Valerie
>
>
> On 09/23/2013 02:21 PM, Robert Castelo wrote:
>> hi Valerie,
>>
>> On 9/23/13 9:41 PM, Valerie Obenchain wrote:
>>> Hi Robert,
>>>
>>> Thanks for reporting this. Now fixed in VariantAnnotation 1.7.47.
>>>
>> great! thanks for the quick fix.
>>
>>> Have you looked at the ensemblVEP package? It's a wrapper to Ensembl's
>>> Variant Effect Predictor tool. We encourage the use of ensemblVEP
>>> instead of the SIFT and PolyPhen databases because it accesses the
>>> most current information. As you know, the SIFT and PolyPhen dbs are
>>> becoming dated and we don't have plans to package newer versions.
>>>
>>> ensemblVEP requires that you download and install the script located
>>> here,
>>>
>>> http://uswest.ensembl.org/info/docs/tools/vep/script/vep_download.html
>>>
>>> The variant_effect_predictor.pl executable must be in your path. Let
>>> us know if you have trouble with the install/setup.
>> yes, i looked at it, and i think it is a great solution for analysis of
>> a few hundred variants as it needs to access the internet to download the
>> information. However, i'm working on a package that eventually needs to
>> annotate a few thousand variants and i find the dependency on an
>> external perl script that the end user must install, somewhat troubling.
>> let me know if you have suggestions about this.
>>
>> for software packages that need to efficiently access SIFT and PolyPhen
>> annotations from R, freezing the data regularly is, in my opinion, a
>> much better solution. i was actually going to ask you if you could
>> update these two packages. As much as you want to keep an up to date
>> version of the SNPloc.Hsapiens.* or TxDb.* packages, i'd do it for SIFT
>> and PolyPhen, unless there's some licensing issue that prevents this, as
>> it happens now with OMIM.
>>
>> cheers,
>> robert.
>>
>>> Valerie
>>>
>>> On 09/20/2013 05:25 PM, Robert Castelo wrote:
>>>> Dear list,
>>>>
>>>> interrogating the TxDb.Hsapiens.UCSC.hg19.knownGene package with a key
>>>> that matches nothing gives the following expected result:
>>>>
>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>> select(TxDb.Hsapiens.UCSC.hg19.knownGene, keys="dummy",
>>>> keytype="GENEID", cols="SYMBOL")
>>>> [1] GENEID
>>>> <0 rows> (or 0-length row.names)
>>>>
>>>> however, when i try the same with the annotation packages
>>>> PolyPhen.Hsapiens.dbSNP131 and SIFT.Hsapiens.dbSNP132, the select
>>>> instruction breaks into an error:
>>>>
>>>> library(SIFT.Hsapiens.dbSNP132)
>>>> library(PolyPhen.Hsapiens.dbSNP131)
>>>>
>>>> select(SIFT.Hsapiens.dbSNP132, keys=c("dummy"))
>>>> Error in data.frame(RSID = unlist(rsid), PROTEINID =
>>>> unlist(protein_id), :
>>>> arguments imply differing number of rows: 1, 0
>>>>
>>>> select(PolyPhen.Hsapiens.dbSNP131, keys="dummy")
>>>> Error in `*tmp*`$RSID : $ operator is invalid for atomic vectors
>>>>
>>>> i guess these two annotation packages should work analogously to
>>>> TxDb.Hsapiens.UCSC.hg19.knownGene, and give just a 0-row data.frame
>>>> object, right?
>>>>
>>>> these errors reproduce also with the current devel version of BioC,
>>>> please find below both sessionInfo() outputs.
>>>>
>>>> cheers,
>>>> robert.
>>>>
>>>> =====RELEASE====
>>>> R version 3.0.1 (2013-05-16)
>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 GenomicFeatures_1.12.3
>>>> [3] AnnotationDbi_1.22.6 Biobase_2.20.1
>>>> [5] PolyPhen.Hsapiens.dbSNP131_1.0.2 SIFT.Hsapiens.dbSNP132_1.0.2
>>>> [7] RSQLite_0.11.4 DBI_0.2-7
>>>> [9] VariantAnnotation_1.6.7 Rsamtools_1.12.4
>>>> [11] Biostrings_2.28.0 GenomicRanges_1.12.5
>>>> [13] IRanges_1.18.3 BiocGenerics_0.6.0
>>>> [15] vimcom_0.9-8 setwidth_1.0-3
>>>> [17] colorout_1.0-0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] biomaRt_2.16.0 bitops_1.0-6 BSgenome_1.28.0 RCurl_1.95-4.1 rtracklayer_1.20.4
>>>> [6] stats4_3.0.1 tools_3.0.1 XML_3.95-0.2 zlibbioc_1.6.0
>>>>
>>>>
>>>>
>>>> =====DEVEL=====
>>>> R version 3.0.1 (2013-05-16)
>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 GenomicFeatures_1.13.40
>>>> [3] AnnotationDbi_1.23.23 Biobase_2.21.7
>>>> [5] PolyPhen.Hsapiens.dbSNP131_1.0.2 SIFT.Hsapiens.dbSNP132_1.0.2
>>>> [7] RSQLite_0.11.4 DBI_0.2-7
>>>> [9] VariantAnnotation_1.7.46 Rsamtools_1.13.41
>>>> [11] Biostrings_2.29.19 GenomicRanges_1.13.44
>>>> [13] XVector_0.1.4 IRanges_1.19.37
>>>> [15] BiocGenerics_0.7.5 vimcom_0.9-8
>>>> [17] setwidth_1.0-3 colorout_1.0-0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] biomaRt_2.17.3 bitops_1.0-6 BSgenome_1.29.1 RCurl_1.95-4.1 rtracklayer_1.21.12
>>>> [6] stats4_3.0.1 tools_3.0.1 XML_3.95-0.2 zlibbioc_1.7.0
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>