How to retrieve 'conservation score' sequence?
Guido Hooiveld ★ 3.9k
@guido-hooiveld-2020
Last seen 10 hours ago
Wageningen University, Wageningen, the …
Hi,

I have a list of putative transcription factor binding sites. To carry only the most relevant ones forward into further analyses, I would like to filter them on 'conservation score' (the assumption being that conserved sequences are more likely to be functional than less conserved or non-conserved sequences). I have read up on this and found that both ENSEMBL (GERP score) and the UCSC Browser (multiz alignment) provide this information, although calculated with different algorithms. In both genome browsers I can view the score, but I don't know how to retrieve it for a list of sequences.

I was hoping that e.g. biomart could be used for this, but I could not find the appropriate filter. I am not yet familiar enough with UCSC to find a suitable way of doing this there. Any pointer on how best to tackle this would therefore be appreciated!

TIA,
Guido

Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
internet: http://nutrigene.4t.com
email: guido.hooiveld@wur.nl
Transcription biomaRt • 2.7k views
@sean-davis-490
Last seen 3 months ago
United States
On Tue, Dec 16, 2008 at 2:02 PM, Hooiveld, Guido <guido.hooiveld@wur.nl> wrote:
> In both genome browsers i can view the score. However, i don't know how to
> retrieve the score for a list of sequences...

See here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons28way/

If you are working with another species, you can start from that URL and find the phastCons data for it. The directory itself and the UCSC site both describe how these scores are calculated.

Hope that helps.

Sean
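Once a file from that directory is downloaded, the scores still have to be turned into per-base values. Below is a minimal base-R sketch, assuming the data are in UCSC fixedStep wiggle format (worth checking against the README in the directory). The function names `parse_header` and `parse_fixed_step` and the toy input are made up for illustration; they are not part of any package.

```r
# Split a "fixedStep chrom=... start=... step=..." header into named values.
parse_header <- function(header) {
  fields <- strsplit(header, "[ \t]+")[[1]][-1]  # drop the "fixedStep" keyword
  kv <- strsplit(fields, "=", fixed = TRUE)
  values <- vapply(kv, function(x) x[2], character(1))
  names(values) <- vapply(kv, function(x) x[1], character(1))
  values
}

# Turn fixedStep wiggle lines into a data frame with one row per scored base.
# Assumption: each block header declares chrom and start; step defaults to 1.
parse_fixed_step <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  block_id <- cumsum(grepl("^fixedStep", lines))
  keep <- block_id > 0  # ignore anything before the first block header
  blocks <- split(lines[keep], block_id[keep])
  parsed <- lapply(blocks, function(b) {
    h <- parse_header(b[1])
    step <- if ("step" %in% names(h)) as.integer(h[["step"]]) else 1L
    scores <- as.numeric(b[-1])
    data.frame(
      chrom = rep(h[["chrom"]], length(scores)),
      pos = as.integer(h[["start"]]) + step * (seq_along(scores) - 1L),
      score = scores,
      stringsAsFactors = FALSE
    )
  })
  result <- do.call(rbind, parsed)
  row.names(result) <- NULL
  result
}

# Toy input (invented values, not real phastCons scores).
wig <- c(
  "fixedStep chrom=chr1 start=101 step=1",
  "0.10", "0.90", "0.80",
  "fixedStep chrom=chr1 start=201 step=1",
  "0.50", "0.70"
)
scores <- parse_fixed_step(wig)
print(scores)
```

For a real download, the lines could be read with something like `readLines(gzfile(path))`, where `path` is wherever the file was saved. Whole chromosomes are large, so reading them in one go this way may be slow.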
@michael-lawrence-2759
Last seen 9.6 years ago
The rtracklayer package is capable of downloading the conservation scores from UCSC. This is particularly easy with the devel version of rtracklayer:

    > library(rtracklayer)
    Loading required package: RCurl
    > session <- browserSession()
    > track(session, "multiz28way", GenomicRanges(10000, 20000, "chr1", "hg18"))
    A UCSCData object with 1 cols on 9979 ranges in 1 sequences
    trackLine: track name=Conservation description="Vertebrate Multiz Alignment & PhastCons Conservation (28 Species)" type=wiggle_0

This retrieves the phastCons values from 10000 to 20000 on chr1 in the human genome (hg18). I'm working on a way to make it fast to retrieve these values for a large number of ranges (e.g. genes).
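With per-base scores in hand, the filtering step from the original question can be sketched in base R. This is an illustration only: the data frames, the 1-based inclusive coordinates, the choice of the mean as the summary, and the 0.5 cutoff are all assumptions, and `mean_site_score` is a hypothetical helper, not an rtracklayer function.

```r
# Mean per-base score for each site; sites with no scored bases get NA.
# Assumes 1-based, inclusive start/end coordinates.
mean_site_score <- function(sites, scores) {
  vapply(seq_len(nrow(sites)), function(i) {
    hit <- scores$chrom == sites$chrom[i] &
      scores$pos >= sites$start[i] &
      scores$pos <= sites$end[i]
    if (any(hit)) mean(scores$score[hit]) else NA_real_
  }, numeric(1))
}

# Toy per-base scores (invented values).
scores <- data.frame(
  chrom = "chr1",
  pos = 101:110,
  score = c(0.9, 0.8, 1.0, 0.9, 0.1, 0.2, 0.0, 0.1, 0.2, 0.1),
  stringsAsFactors = FALSE
)

# Toy binding sites (invented names and coordinates).
sites <- data.frame(
  name = c("siteA", "siteB", "siteC"),
  chrom = "chr1",
  start = c(101L, 105L, 500L),
  end = c(104L, 110L, 510L),
  stringsAsFactors = FALSE
)

sites$mean_score <- mean_site_score(sites, scores)

threshold <- 0.5  # arbitrary cutoff chosen for the example
conserved <- sites[!is.na(sites$mean_score) & sites$mean_score >= threshold, ]
print(conserved)
```

The scan compares every site against every scored base, so it is only suitable for small inputs. It is not the fast retrieval for many ranges mentioned above.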