Sequences seem to be incorrectly cataloged in package "pd.genomewidesnp.6"
li lilingdu ▴ 450
@li-lilingdu-1884
Dear List,

I used the following R code to extract sequence information of a particular probeset for PM probes of the Affymetrix SNP6 array. However, for the 100 probesets I tested there were only 2 unique PM sequences for each probeset. It appears that the PM sequences were not correctly catalogued.

===============
library(pd.genomewidesnp.6)

# connection to the SQLite database shipped with the annotation package
db(pd.genomewidesnp.6)->kao

# take the first 100 probeset names
dbGetQuery(kao,"SELECT * from featureSet limit 100")$"man_fsetid"->probesets
result<-vector("list",length(probesets))
names(result)<-probesets

for(ind in 1:length(result)){

  # internal probeset id for this probeset name
  dbGetQuery(kao,paste("SELECT * from featureSet where man_fsetid='",names(result)[ind],"'",sep=""))$fsetid->fsetid

  # PM features belonging to the probeset
  dbGetQuery(kao, paste("select * from pmfeature where fsetid=",fsetid,sep=""))->pm.100

  # sequences of those features, selected by fid
  c(pm.100$fid)->totiao
  paste("fid=",paste(totiao,collapse=" or fid="),sep="")->totiao
  paste("SELECT * from sequence where ",totiao,sep="")->totiao
  dbGetQuery(kao,totiao)->seq

  result[[ind]]<-seq

}

# number of unique PM sequences per probeset
sapply(result, function(xxx) length(unique(xxx$seq)))
===================

sessionInfo()

R version 2.8.0 (2008-10-20)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] pd.genomewidesnp.6_0.4.2 oligoClasses_1.4.0       Biobase_2.2.1            RSQLite_0.7-1
[5] DBI_0.2-4
==============

LiGang
@vincent-j-carey-jr-4
What is "not correct"? Working directly from the Affymetrix annotation probe_tab text file, we have

    PROBESET_ID PROBE_X_POS PROBE_Y_POS PROBE_INTERROGATION_POSITION
1 SNP_A-1780270        2380        1757                            3
2 SNP_A-1780270        2381        1757                            3
3 SNP_A-1780270        2626         421                            3
4 SNP_A-1780270        2627         421                            3
5 SNP_A-1780270         540        1827                            3
6 SNP_A-1780270         541        1827                            3
7 SNP_A-1780270         694         338                            3
8 SNP_A-1780270         695         338                            3
             PROBE_SEQUENCE TARGET_STRANDEDNESS PROBE_TYPE ALLELE
1 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
2 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C
3 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
4 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C
5 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
6 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C
7 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
8 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C

There are 4 replicates of each sequence.

Checking the pd.genomewidesnp.6 package, following your code snippet, we have

> dbGetQuery(kao, "select man_fsetid, fsetid from featureSet where man_fsetid = 'SNP_A-1780270'")
     man_fsetid fsetid
1 SNP_A-1780270 326067
> dbGetQuery(kao, "select * from pmfeature where fsetid = '326067'")
      fid strand allele fsetid pos    x    y
1  906535      0      1 326067   6  694  338
2  906536      0      0 326067   5  695  338
3 1130907      0      1 326067   8 2626  421
4 1130908      0      0 326067   7 2627  421
5 4711141      0      1 326067   4 2380 1757
6 4711142      0      0 326067   3 2381 1757
7 4896901      0      1 326067   2  540 1827
8 4896902      0      0 326067   1  541 1827

Now we know the fids of the probes we looked at in the original data:

> dbGetQuery(kao, "select * from sequence where fid = '4711141'")
      fid offset tstrand tallele                       seq
1 4711141      3       f       G TTGTTAAGCAAGTGACTTATTTTAT
> dbGetQuery(kao, "select * from sequence where fid = '4711142'")
      fid offset tstrand tallele                       seq
1 4711142      3       f       C TTGTTAAGCAAGTGAGTTATTTTAT
> dbGetQuery(kao, "select * from sequence where fid = '1130907'")
      fid offset tstrand tallele                       seq
1 1130907      3       f       G TTGTTAAGCAAGTGACTTATTTTAT
> dbGetQuery(kao, "select * from sequence where fid = '1130908'")
      fid offset tstrand tallele                       seq
1 1130908      3       f       C TTGTTAAGCAAGTGAGTTATTTTAT

What is incorrect?
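The cross-check above can be sketched in a few lines of base R. This is only an illustration, not part of the original reply: it re-types the rows printed in this thread for SNP_A-1780270 into two data frames (the object names `pmfeature`, `probe_tab`, `joined`, `replicates` and `n_unique` are chosen here for the example), matches features to sequences by array position, and tallies the result. It suggests that two unique PM sequences per probeset, one per allele with four replicate probes each, is what the annotation file itself contains.

```r
# Rows copied from the pmfeature query output shown in this thread.
pmfeature <- data.frame(
  fid    = c(906535, 906536, 1130907, 1130908,
             4711141, 4711142, 4896901, 4896902),
  allele = c(1, 0, 1, 0, 1, 0, 1, 0),
  fsetid = 326067,
  x      = c(694, 695, 2626, 2627, 2380, 2381, 540, 541),
  y      = c(338, 338, 421, 421, 1757, 1757, 1827, 1827)
)

# Rows copied from the Affymetrix probe_tab listing shown in this thread.
probe_tab <- data.frame(
  x       = c(2380, 2381, 2626, 2627, 540, 541, 694, 695),
  y       = c(1757, 1757, 421, 421, 1827, 1827, 338, 338),
  tallele = rep(c("G", "C"), times = 4),
  seq     = rep(c("TTGTTAAGCAAGTGACTTATTTTAT",
                  "TTGTTAAGCAAGTGAGTTATTTTAT"), times = 4),
  stringsAsFactors = FALSE
)

# Match each PM feature to its probe sequence by (x, y) array position.
joined <- merge(pmfeature, probe_tab, by = c("x", "y"))

# Number of replicate probes per distinct sequence, and distinct sequences.
replicates <- table(joined$seq)
n_unique   <- length(unique(joined$seq))

print(replicates)
print(n_unique)

# Cross-tabulate the numeric allele code against the target allele.
print(table(joined$allele, joined$tallele))
```

Against the package database itself, the same tally could presumably be obtained in a single query, for example dbGetQuery(kao, "SELECT s.seq, COUNT(*) AS n FROM pmfeature p JOIN sequence s ON p.fid = s.fid WHERE p.fsetid = 326067 GROUP BY s.seq"). That query is untested here and assumes the table layout visible in the outputs above.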
On Wed, Jan 7, 2009 at 2:22 PM, LiGang <luzifer.li@gmail.com> wrote:
> Dear List,
>
> I used the following R code to extract sequence information of a particular
> probeset for PM probes of Affymetrix SNP6 array. However, for 100 probesets
> I tested there were only 2 unique PM sequences for each probeset. It
> appears that the PM sequences were not correctly catalogued.
>
> [...]