Question: Help for snpMatrix
MLSC MANIPAL wrote (7.8 years ago):
Hi all,

I would like to get genotype information from the HapMap project, so I am using read.HapMap.data from snpMatrix. However, when I try to extract the genotypes, they come back as a numeric code. Can somebody please tell me how to get the genotypes as they appear in the HapMap file?

> library(snpMatrix)
> chr3_CEU <- read.HapMap.data("file://genotypes_chr3_CEU_r27_nr.b36_fwd.txt")
Reading 174 samples
current line [0] : rs9755941 C/T chr3 3...
current line [20000] : rs34867 C/T chr3 104...
current line [40000] : rs7644151 A/T chr3 2...
current line [60000] : rs7632008 C/T chr3 3...
current line [80000] : rs403286 A/G chr3 55...
current line [100000] : rs1459920 C/T chr3 6...
current line [120000] : rs1384419 A/G chr3 8...
current line [140000] : rs9811293 C/T chr3 1...
current line [160000] : rs10470402 C/T chr3 ...
current line [180000] : rs17373504 C/T chr3 ...
current line [200000] : rs7611580 A/G chr3 1...
current line [220000] : rs695907 C/G chr3 16...
current line [240000] : rs4854927 C/T chr3 1...
last line [259411] : rs3950775 A/C chr3 1...
EOF reached after 259412 snps
...conversion complete...
> chr6_CEU$snp.data[,"rs2274459"]
Autosomal snp(s):
NA06984 NA06985 NA06986 NA06989 NA06991 NA06993 NA06994 NA06995 NA06997 NA07000
  "B/B"   "B/B"   "A/B"   "B/B"   "B/B"   "B/B"   "B/B"   "A/B"   "B/B"   "B/B"
NA07014 NA07019 NA07022 NA07029 NA07031 NA07034 NA07037 NA07045 NA07048 NA07051
  "B/B"      ""   "B/B"   "B/B"   "B/B"      ""   "A/B"   "B/B"      ""   "B/B"
NA07055 NA07056 NA07345 NA07346 NA07347 NA07348 NA07349 NA07357 NA07435 NA10830
  "B/B"   "B/B"   "B/B"   "B/B"   "B/B"   "B/B"   "B/B"   "B/B"   "A/A"   "A/B"
NA10831 NA10835 NA10836 NA10837 NA10838 NA10839 NA10840 NA10843 NA10845 NA10846
  "A/B"   "B/B"   "B/B"   "B/B"   "B/B"   "B/B"   "B/B"   "B/B"   "B/B"   "B/B"
NA10847 NA10850 NA10851 NA10852 NA10853 NA10854 NA10855 NA10856 NA10857 NA10859
  "B/B"   "B/B"      ""   "A/B"   "B/B"   "B/B"   "B/B"   "B/B"      ""   "A/B"
NA10860 NA10861 NA10863 NA10864 NA10865 NA11829 NA11830 NA11831 NA11832 NA11839
     ""   "A/B"   "B/B"   "B/B"   "B/B"   "B/B"   "A/B"   "B/B"   "B/B"   "B/B"

> rs2274459_CEU <- as.vector(chr6_CEU$snp.data[,"rs2274459"])
> rs2274459_CEU
  [1] 03 03 02 03 03 03 03 02 03 03 03 00 03 03 03 00 02 03 00 03 03 03 03 03 03
 [26] 03 03 03 01 02 02 03 03 03 03 03 03 03 03 03 03 03 00 02 03 03 03 03 00 02
 [51] 00 02 03 03 03 03 02 03 03 03 03 03 02 03 03 03 03 03 03 03 02 03 03 03 03
 [76] 03 03 02 03 00 03 03 03 03 02 03 03 03 02 03 03 02 03 03 00 03 02 03 03 03
[101] 03 03 03 03 02 03 03 03 03 02 03 02 03 03 03 03 03 02 02 02 03 03 02 03 03
[126] 03 03 00 02 03 03 03 03 03 03 03 02 02 03 01 03 03 03 03 03 03 02 03 03 03
[151] 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 02 03 02

Regards,
mlsc
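Comparing the two printouts above call by call, each raw code lines up with one genotype label: 00 with a missing call (""), 01 with A/A, 02 with A/B and 03 with B/B. A minimal base-R sketch of that lookup follows. The mapping is read off this output rather than taken from the snpMatrix source, and decode_snp_codes() and count_b_alleles() are hypothetical helper names, not package functions.

```r
# Hypothetical helpers (not part of snpMatrix/chopsticks), base R only.
# Assumed mapping, read off the printouts above rather than the package source:
#   00 = missing call (""), 01 = A/A, 02 = A/B, 03 = B/B
decode_snp_codes <- function(codes) {
  idx <- as.integer(codes) + 1L            # raw 00..03 -> positions 1..4
  c("", "A/A", "A/B", "B/B")[idx]          # codes above 03 come back as NA
}

# Count of B alleles per call (0, 1 or 2); missing calls (code 00) become NA
count_b_alleles <- function(codes) {
  n <- as.integer(codes) - 1L
  n[n < 0L] <- NA_integer_
  n
}

# Calls 11 to 20 of rs2274459_CEU (NA07014 ... NA07051) from the output above
codes <- as.raw(c(3, 0, 3, 3, 3, 0, 2, 3, 0, 3))
decode_snp_codes(codes)
# returns c("B/B", "", "B/B", "B/B", "B/B", "", "A/B", "B/B", "", "B/B")
count_b_alleles(codes)
# returns c(2L, NA, 2L, 2L, 2L, NA, 1L, 2L, NA, 2L)
```

Applied to those ten codes, the lookup reproduces the labels printed for NA07014 through NA07051.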
Tag: hapmap
Vincent J. Carey, Jr. (United States) wrote (7.8 years ago):
As I understand it, read.HapMap.data is now provided in the chopsticks package. We can import directly from NCBI and then look at various translations of the output:

> library(chopsticks)
> testurl = "http://hapmap.ncbi.nlm.nih.gov/genotypes/latest/forward/non-redundant/genotypes_chr21_CEU_r27_nr.b36_fwd.txt.gz"
> dem = read.HapMap.data(testurl)
trying URL 'http://hapmap.ncbi.nlm.nih.gov/genotypes/latest/forward/non-redundant/genotypes_chr21_CEU_r27_nr.b36_fwd.txt.gz'
Content type 'application/x-gzip' length 3309706 bytes (3.2 Mb)
opened URL
==================================================
downloaded 3.2 Mb

Reading 174 samples
current line [0] : rs28783163 C/T chr21...
current line [20000] : rs8131382 C/T chr21 ...
current line [40000] : rs9979691 A/G chr21 ...
current line [50982] : rs10483083 C/T chr21...
EOF reached after 50983 snps
...conversion complete...
> names(dem)
[1] "snp.data"    "snp.support"
> dem$snp.data
A snp.matrix with 174 rows and 50983 columns
Row names: NA06984 ... NA12892
Col names: rs28783163 ... rs10483083
> cdem = as(dem$snp.data, "character")
> cdem[1:5,1:5]
     [,1]  [,2]  [,3]  [,4]  [,5]
[1,] "A/A" "A/A" ""    ""    ""
[2,] "A/A" "A/B" "A/A" "B/B" "A/B"
[3,] "A/A" "A/A" ""    ""    ""
[4,] "A/A" "A/A" ""    ""    ""
[5,] "A/A" "A/B" "A/A" "B/B" "A/A"
> mdem = as(dem$snp.data, "matrix")
> mdem[1:5,1:5]
        rs28783163 rs28363862 rs885550 rs1468022 rs169757
NA06984         01         01       00        00       00
NA06985         01         02       01        03       02
NA06986         01         01       00        00       00
NA06989         01         01       00        00       00
NA06991         01         02       01        03       01
> ndem = as(dem$snp.data, "numeric")
> ndem[1:5,1:5]
        rs28783163 rs28363862 rs885550 rs1468022 rs169757
NA06984          0          0       NA        NA       NA
NA06985          0          1        0         2        1
NA06986          0          0       NA        NA       NA
NA06989          0          0       NA        NA       NA
NA06991          0          1        0         2        0

The character representation is essentially the answer to your query, except that you need to know that A refers to the alphabetically earlier nucleotide in the coding of a diallelic locus (so for a C/T SNP, A is C and B is T).
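Following that convention, the A/B strings can be turned back into nucleotides once the allele pair of each SNP is known (it is printed in the progress lines, e.g. rs28783163 C/T). The sketch below does this for one SNP in base R. ab_to_nucleotides() is a hypothetical helper, and it relies entirely on the alphabetical A/B convention just described; where to get the allele pairs in bulk (the raw HapMap file, or possibly a column of dem$snp.support) is left open here.

```r
# Hypothetical helper (not part of snpMatrix/chopsticks), base R only.
# Assumption from the answer: for a diallelic SNP, allele A is the alphabetically
# earlier nucleotide of its allele pair and allele B the later one.
# `alleles` is the pair for ONE SNP as printed in the HapMap file, e.g. "C/T".
ab_to_nucleotides <- function(genotypes, alleles, sep = "/") {
  pair <- sort(strsplit(alleles, "/", fixed = TRUE)[[1]])  # "T/C" -> c("C", "T")
  if (length(pair) != 2L) stop("expected a diallelic pair such as \"C/T\"")
  # chartr() maps each A to pair[1] and each B to pair[2]
  out <- chartr("AB", paste(pair, collapse = ""), genotypes)
  out <- gsub("/", sep, out, fixed = TRUE)   # sep = "" gives "CT"-style calls
  out[genotypes == ""] <- NA_character_      # "" marks a missing call
  out
}

ab_to_nucleotides(c("A/A", "A/B", "B/B", ""), "C/T")
# returns c("C/C", "C/T", "T/T", NA)
ab_to_nucleotides(c("A/B", "B/B"), "A/G", sep = "")
# returns c("AG", "GG")
```

To convert a whole matrix such as cdem, apply the helper one column at a time with that SNP's allele pair. Because the alphabetical convention is the only link between the A/B labels and the nucleotides here, it is worth spot-checking a few SNPs against the raw HapMap file first.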
The matrix representation shows that the internal representation is raw(): a single byte encodes each genotype call. Comparing cdem with mdem above, 00 is a missing call, 01 is A/A, 02 is A/B and 03 is B/B. The numeric representation gives the count of B alleles (0, 1 or 2, with NA for a missing call).

If you have further questions, please provide sessionInfo() output. Mine is:

> sessionInfo()
R Under development (unstable) (2012-02-04 r58266)
Platform: x86_64-apple-darwin10.8.0/x86_64 (64-bit)

locale:
[1] en_US.US-ASCII/en_US.US-ASCII/en_US.US-ASCII/C/en_US.US-ASCII/en_US.US-ASCII

attached base packages:
[1] splines   stats     graphics  grDevices datasets  utils     tools
[8] methods   base

other attached packages:
[1] chopsticks_1.19.4   survival_2.36-12    BiocInstaller_1.3.7
[4] weaver_1.21.0       codetools_0.2-8     digest_0.5.1

On Mon, Feb 20, 2012 at 3:27 AM, MLSC <mlscmahe@gmail.com> wrote:
> Hi all,
> I would like to get genotype information from HapMap project. Hence, I am
> using snpMatrix's read.Hapmap.data, but when i try to get genotype
> information, it converts to numerical code form. hence can somebody please
> let me know how to get genotype information as it is in the HapMap file?
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor