other human genomes, other SNP sets?
Paul Shannon ★ 1.1k
@paul-shannon-578
Last seen 9.6 years ago
Has anyone created a BSgenome object from the Venter (HuRef), Watson, or other recently completed sequencing projects? Or SNPlocs data packages for these genomes?

If not, can you offer any advice or cautions as I attempt to do so myself?

Thanks,

- Paul Shannon
Institute for Systems Biology
Seattle
Sequencing BSgenome SNPlocs • 1.0k views
Paul Shannon ★ 1.1k
@paul-shannon-578
Last seen 9.6 years ago
'How to forge a BSgenome data package' answered my first question.

Is there a comparable write-up for building a SNPlocs data package? All I have found is Herve's comments to Praveen on Feb 12 2009:

> The information in SNPlocs.Hsapiens.dbSNP.20071016 was retrieved
> from dbSNP, from this location to be precise:
>
> ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat/

That flat file has SNP information from other assemblies -- Celera and HuRef -- and that is some of the information I want. (See the example below for one SNP from ORM1 on chromosome 9.)

If I want this level of detail, should I parse the original file myself? Are the parsing code and instructions for building a SNPlocs package available?

Thanks,

- Paul

rs1766074|human|9606|snp|genotype=NO|submitterlink=YES|updated 2004-10-04 13:51
ss2622917|SC_JCM|AL356796.4_74652|orient=+|ss_pick=YES
SNP|alleles='C/T'|het=?|se(het)=?
VAL|validated=NO|min_prob=?|max_prob=?|notwithdrawn
CTG|assembly=Celera|chr=9|chr-pos=87735017|NW_924573.1|ctg-start=1222848|ctg-end=1222848|loctype=2|orient=-
LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
LOC|ORM1|locus_id=5004|fxn-class=reference|allele=G|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
CTG|assembly=HuRef|chr=9|chr-pos=86692957|NW_001839236.2|ctg-start=2686784|ctg-end=2686784|loctype=2|orient=+
LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=141|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
LOC|ORM1|locus_id=5004|fxn-class=reference|allele=G|frame=3|residue=E|aa_position=141|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
CTG|assembly=reference|chr=9|chr-pos=116127163|NT_008470.18|ctg-start=24408547|ctg-end=24408547|loctype=2|orient=-
LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
LOC|ORM1|locus_id=5004|fxn-class=reference|allele=G|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
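A minimal sketch of the do-it-yourself parse asked about above, written in Python rather than with the R tooling shipped in the SNPlocs packages. It assumes only the pipe-delimited layout visible in the example record: an rs header line followed by CTG lines that carry assembly=, chr= and chr-pos= fields. The function name and the shape of the returned dictionary are hypothetical, and lines wrapped by an email client would need to be rejoined before parsing.

```python
def positions_by_assembly(record_lines):
    """Return (rs_id, {assembly: (chromosome, position)}) for one record."""
    rs_id = None
    positions = {}
    for line in record_lines:
        fields = [f.strip() for f in line.strip().split("|")]
        tag = fields[0]
        if tag.startswith("rs") and tag[2:].isdigit():
            # Header line of the record, e.g. "rs1766074|human|9606|..."
            rs_id = tag
        elif tag == "CTG":
            # Keep only key=value fields; bare contig accessions are skipped.
            kv = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
            # Skip placements without a numeric chromosome position.
            if kv.get("chr-pos", "").isdigit() and "assembly" in kv:
                positions[kv["assembly"]] = (kv.get("chr"), int(kv["chr-pos"]))
    return rs_id, positions


# The CTG lines of the record quoted in this post (LOC lines omitted).
record = [
    "rs1766074|human|9606|snp|genotype=NO|submitterlink=YES|updated 2004-10-04 13:51",
    "ss2622917|SC_JCM|AL356796.4_74652|orient=+|ss_pick=YES",
    "CTG|assembly=Celera|chr=9|chr-pos=87735017|NW_924573.1|ctg-start=1222848|ctg-end=1222848|loctype=2|orient=-",
    "CTG|assembly=HuRef|chr=9|chr-pos=86692957|NW_001839236.2|ctg-start=2686784|ctg-end=2686784|loctype=2|orient=+",
    "CTG|assembly=reference|chr=9|chr-pos=116127163|NT_008470.18|ctg-start=24408547|ctg-end=24408547|loctype=2|orient=-",
]

rs_id, positions = positions_by_assembly(record)
print(rs_id, positions["HuRef"])
```

Keying the result by assembly keeps the HuRef, Celera and reference coordinates side by side, which is the extra detail a single-assembly location table drops.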
I am glad you have posed the question about SNP metadata. I would expect Herve to provide more information later, but here are a few comments:

1) An updated locations package is available: http://www.bioconductor.org/packages/2.4/data/annotation/html/SNPlocs.Hsapiens.dbSNP.20080617.html

2) In the SNPlocs* package(s), an inst/tools subdirectory includes the parsing and modeling tools for constructing the rda files. I see a grep -v relevant to the Celera entries in there, and you would probably modify that.

3) SNP metadata volume is a big concern for me. I would like to know of common use cases so that we can examine performance characteristics of different solutions. Some exploration of netCDF, SQLite and .rda has been conducted, and for GGtools I decided to derive an environment of tables of locations from SNPlocs* to facilitate plotting on genomic coordinates and export to browser tracks. Another issue that has not been resolved, AFAIK, is the mapping between the affy SNP identifiers propagated by crlmm and the rs-numbers used by dbSNP.

Until we know of a significant class of location use cases, I do not see how to make decisions about additional tools or representations to be developed. For detailed query resolution SQLite seems like a good technology, but my use case involves full chromosome or genome location vectors, and .rda on a relatively powerful machine seems adequate.

4) Very detailed real-time queries on SNP metadata can be dealt with using biomaRt.
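To make the trade-off in point 3 a little more concrete, here is a minimal sketch using Python's standard sqlite3 module. It is not the representation used by SNPlocs or GGtools: the table name snplocs and its column names are hypothetical, and the single row is the reference-assembly position of rs1766074 from the record quoted earlier in the thread.

```python
import sqlite3

# In-memory database; a real SNP table would live in a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snplocs ("
    "rs_id TEXT PRIMARY KEY, alleles TEXT, chrom TEXT, pos INTEGER)"
)
# Index that supports range and whole-chromosome scans by position.
conn.execute("CREATE INDEX snplocs_by_pos ON snplocs (chrom, pos)")
conn.executemany(
    "INSERT INTO snplocs VALUES (?, ?, ?, ?)",
    [("rs1766074", "C/T", "9", 116127163)],
)


def lookup(conn, rs_id):
    """Detailed single-SNP query: the case where SQLite is a natural fit."""
    cur = conn.execute(
        "SELECT chrom, pos FROM snplocs WHERE rs_id = ?", (rs_id,)
    )
    return cur.fetchone()


def chromosome_positions(conn, chrom):
    """Whole-chromosome location vector: the bulk case described in point 3."""
    cur = conn.execute(
        "SELECT pos FROM snplocs WHERE chrom = ? ORDER BY pos", (chrom,)
    )
    return [pos for (pos,) in cur]


print(lookup(conn, "rs1766074"))
print(chromosome_positions(conn, "9"))
```

The first function is the kind of detailed query an indexed database answers cheaply; the second pulls an entire vector into memory, which is roughly what loading a per-chromosome .rda file gives without any query layer.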
Hi Vince,

Thanks for the pointer to the inst/tools code. Those scripts show how to create a new SNPlocs package, and can easily be modified to extract (for instance) the HuRef (Venter) SNPs.

My needs for SNP metadata are not very clear to me at this point. Perhaps, before long, we could talk about extending the 3-column table (RefSNP_id, alleles_as_ambig, loc) to include more information.

That said, Herve's package and your tips give me all I need for now.

Thanks!

- Paul
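For readers unfamiliar with the alleles_as_ambig column mentioned above, here is a minimal sketch, assuming the column holds the standard IUPAC nucleotide ambiguity code for the set of observed alleles. The helper name is hypothetical, and the sketch ignores strand orientation and rejects anything other than plain A/C/G/T alleles.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the sorted base set.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B",
    "ACGT": "N",
}


def alleles_as_ambig(alleles):
    """Convert a dbSNP-style allele string such as 'C/T' to one IUPAC letter."""
    bases = {a.strip().upper() for a in alleles.split("/")}
    if not bases <= set("ACGT"):
        raise ValueError(f"unsupported alleles: {alleles!r}")
    # Sorting makes the lookup independent of the order the alleles are listed.
    return IUPAC["".join(sorted(bases))]


print(alleles_as_ambig("C/T"))
```

Collapsing the alleles to one letter is what keeps the table at three compact columns; per-assembly positions or function classes would need extra columns or a second table.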