RE : RE : maping SNPs
1
0
Entering edit mode
SimonNoël ▴ 450
@simonnoel-3455
Last seen 9.6 years ago
Thank you. Sorry for the delay. As I say, on my computer, thing run really slowy... I have no error but I have some warning and no result... library("IRanges") library("GenomicRanges") library("GenomicFeatures") #?? changer si une version plus r??cente de la librairie est t??l??charg??e. library(SNPlocs.Hsapiens.dbSNP.20101109) > getSNPcount() ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9 c h10 1849438 1936836 1613418 1613633 1453710 1446827 1335745 1243129 995075 1158707 ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19 c h20 1147722 1105364 815729 740129 657719 757926 641905 645646 520666 586708 ch21 ch22 chX chY chMT 338254 331060 529608 67438 624 > ch22snps <- getSNPlocs("ch22") > ch22snps[1:5, ] RefSNP_id alleles_as_ambig loc 1 56342815 K 16050353 2 7288968 S 16050994 3 6518357 M 16051107 4 7292503 R 16051209 5 6518368 Y 16051241 > > > makeGRangesFromRefSNPids <- function(myids) + { + ans_seqnames <- character(length(myids)) + ans_seqnames[] <- "unknown" + ans_locs <- integer(length(myids)) + for (seqname in names(getSNPcount())) { + locs <- getSNPlocs(seqname) + ids <- paste("rs", locs$RefSNP_id, sep="") + myrows <- match(myids, ids) + ans_seqnames[!is.na(myrows)] <- seqname + ans_locs[!is.na(myrows)] <- locs$loc[myrows] + } + GRanges(seqnames=ans_seqnames, + IRanges(start=ans_locs, width=1), + RefSNP_id=myids) + } > > > myids <- c("rs7547453", "rs2840542", "rs1999527", "rs4648545", "rs10915459", "rs16838750", "rs12128230", "rs999999999") > mysnps <- makeGRangesFromRefSNPids(myids) Warning message: In ans_locs[!is.na(myrows)] <- locs$loc[myrows] : number of items to replace is not a multiple of replacement length > mysnps # a GRanges object with 1 SNP per row GRangeswith 8 ranges and 1 elementMetadata value seqnames ranges strand | RefSNP_id <rle> <iranges> <rle> | <character> [1] ch1 [2195117, 2195117] * | rs7547453 [2] ch1 [2291680, 2291680] * | rs2840542 [3] ch1 [3256108, 3256108] * | rs1999527 [4] ch1 [3577321, 3577321] * | rs4648545 [5] ch1 [4230463, 4230463] * | rs10915459 [6] ch1 [4404344, 4404344] * | rs16838750 [7] ch1 [4501911, 4501911] * | rs12128230 [8] unknown [ 0, 0] * | rs999999999 seqlengths ch1 unknown NA NA > txdb <- makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene") Downloadthe refGene table ... OK Downloadthe refLink table ... OK Extractthe 'transcripts' data frame ... OK Extractthe 'splicings' data frame ... OK Downloadand preprocess the 'chrominfo' data frame ... OK Preparethe 'metadata' data frame ... OK Makethe TranscriptDb object ... OK Il y a eu 50 avis ou plus (utilisez warnings() pour voir les 50 premiers) > txdb TranscriptDbobject: | Db type: TranscriptDb | Data source: UCSC | Genome: hg19 | UCSC Table: refGene | Type of Gene ID: Entrez Gene ID | Full dataset: yes | transcript_nrow: 38098 | exon_nrow: 230201 | cds_nrow: 204683 | Db created by: GenomicFeatures package from Bioconductor | Creation time: 2011-01-06 11:55:44 -0500 (Thu, 06 Jan 2011) | GenomicFeatures version at creation time: 1.2.3 | RSQLite version at creation time: 0.9-4 | DBSCHEMAVERSION: 1.0 > > #voir pour exonsBy() ?? la place de tx > txExon = exonsBy(txdb, by= "tx") > txExon GRangesListof length 38098 $1 GRanges with 25 ranges and 3 elementMetadata values seqnames ranges strand | exon_id exon_name exon_rank <rle> <iranges> <rle> | <integer> <character> <integer> [1] chr1 [66999825, 67000051] + | 1 NA 1 [2] chr1 [67091530, 67091593] + | 2 NA 2 [3] chr1 [67098753, 67098777] + | 3 NA 3 [4] chr1 [67101627, 67101698] + | 4 NA 4 [5] chr1 [67105460, 67105516] + | 5 NA 5 [6] chr1 [67108493, 67108547] + | 6 NA 6 [7] chr1 [67109227, 67109402] + | 7 NA 7 [8] chr1 [67126196, 67126207] + | 8 NA 8 [9] chr1 [67133213, 67133224] + | 9 NA 9 ... ... ... ... ... ... ... ... [17] chr1 [67155873, 67155999] + | 17 NA 17 [18] chr1 [67161117, 67161176] + | 18 NA 18 [19] chr1 [67184977, 67185088] + | 19 NA 19 [20] chr1 [67194947, 67195102] + | 20 NA 20 [21] chr1 [67199431, 67199563] + | 21 NA 21 [22] chr1 [67205018, 67205220] + | 22 NA 22 [23] chr1 [67206341, 67206405] + | 23 NA 23 [24] chr1 [67206955, 67207119] + | 24 NA 24 [25] chr1 [67208756, 67210768] + | 25 NA 25 ... <38097 more elements> seqlengths chr1 chr2 ... chr18_gl000207_random 249250621 243199373 ... 4262 > > tx <- transcripts(txdb, columns=c("tx_id", "tx_name", "gene_id")) > tx # a GRanges object with 1 transcript per row GRangeswith 38098 ranges and 3 elementMetadata values seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> [1] chr1 [ 69091, 70008] + | 1021 NM_001005484 [2] chr1 [323892, 328581] + | 1023 NR_028327 [3] chr1 [323892, 328581] + | 1024 NR_028322 [4] chr1 [323892, 328581] + | 1025 NR_028325 [5] chr1 [367659, 368597] + | 1022 NM_001005277 [6] chr1 [367659, 368597] + | 1026 NM_001005221 [7] chr1 [367659, 368597] + | 1027 NM_001005224 [8] chr1 [763064, 789740] + | 174 NR_015368 [9] chr1 [861121, 879961] + | 1035 NM_152486 ... ... ... ... ... ... ... [38090] chrY [27177050, 27198251] - | 18991 NM_004678 [38091] chrY [27177050, 27198251] - | 18992 NM_001002760 [38092] chrY [27177050, 27198251] - | 18993 NM_001002761 [38093] chrY [27209230, 27246039] - | 18994 NR_001525 [38094] chrY [27209230, 27246039] - | 18995 NR_002177 [38095] chrY [27209230, 27246039] - | 18996 NR_002178 [38096] chrY [27329790, 27330920] - | 18997 NR_001526 [38097] chrY [27329790, 27330920] - | 18998 NR_002179 [38098] chrY [27329790, 27330920] - | 18999 NR_002180 gene_id <compressedcharacterlist> [1] 79501 [2] 100133331 [3] 100132287 [4] 100132062 [5] 81399 [6] 729759 [7] 26683 [8] 643837 [9] 148398 ... ... [38090] 9083 [38091] 442867 [38092] 442868 [38093] 114761 [38094] 474150 [38095] 474149 [38096] 252949 [38097] 474152 [38098] 474151 seqlengths chr1 chr2 ... chr18_gl000207_random 249250621 243199373 ... 4262 > > seqnames(mysnps) <- sub("ch", "chr", seqnames(mysnps)) > map <- as.matrix(findOverlaps(mysnps, tx)) Message d'avis : In .local(query, subject, maxgap, minoverlap, type, select, ...) : No seqnames from the 'query' and 'subject' were identical > mapExon <- as.matrix(findOverlaps(mysnps, txExon)) Message d'avis : In .local(query, subject, maxgap, minoverlap, type, select, ...) : No seqnames from the 'query' and 'subject' were identical > > mapped_genes <- values(tx)$gene_id[map[, 2]] > mapped_snps <- rep.int(values(mysnps)$RefSNP_id[map[, 1]], elementLengths(mapped_genes)) > snp2gene <- unique(data.frame(snp_id=mapped_snps, gene_id=unlist(mapped_genes))) > rownames(snp2gene) <- NULL > snp2gene[1:4, ] snp_id gene_id NA <na> <na> NA.1 <na> <na> NA.2 <na> <na> NA.3 <na> <na> On our super computer, I can't update manualy the package. They are suposed to update automatically once per month according to our informactician. The result of sessionInfo() is : > sessionInfo() R version 2.12.0 (2010-10-15) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8 [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C attachedbase packages: [1] stats graphics grDevices utils datasets methods base otherattached packages: [1] GenomicFeatures_1.2.3 [2] SNPlocs.Hsapiens.dbSNP.20101109_0.99.2 [3] GenomicRanges_1.2.2 [4] IRanges_1.8.7 loadedvia a namespace (and not attached): [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 BSgenome_1.18. 2 [5] DBI_0.2-5 RCurl_1.5-0 RSQLite_0.9-4 rtracklayer_1.10.6 [9] tools_2.12.0 XML_3.2-0 On my computer , something stragge happen. The package get unstalled when I run source([1]"http://bioconductor.org/biocLite.R") update.packages(repos=biocinstallRepos(), ask=FALSE, checkBuilt=TRUE) And when I try to reinstall SNPlocs.Hsapiens.dbSNP.20101109, I now get an error message... > source("[2]http://www.bioconductor.org/biocLite.R") BioC_mirror= [3]http://www.bioconductor.org Change using chooseBioCmirror(). > biocLite("SNPlocs.Hsapiens.dbSNP.20101109") Using R version 2.12.0, biocinstall version 2.7.4. InstallingBioconductor version 2.7 packages: [1] "SNPlocs.Hsapiens.dbSNP.20101109" Pleasewait... Message d'avis : In getDependencies(pkgs, dependencies, available, lib) : package ???SNPlocs.Hsapiens.dbSNP.20101109??? is not available But for sessionInfo, it's : > sessionInfo() R version 2.12.0 (2010-10-15) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=French_Canada.1252 LC_CTYPE=French_Canada.1252 [3] LC_MONETARY=French_Canada.1252 LC_NUMERIC=C [5] LC_TIME=French_Canada.1252 attachedbase packages: [1] stats graphics grDevices utils datasets methods base otherattached packages: [1] GenomicFeatures_1.2.3 GenomicRanges_1.2.2 IRanges_1.8.8 loadedvia a namespace (and not attached): [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 BSgenome_1.18.2 [5] DBI_0.2-5 RCurl_1.5-0.1 RSQLite_0.9-4 rtracklayer_1.10.6 [9] XML_3.2-0.2 Now I have a super computer that can't run the program because of a subscript out of bounds and a personal computer than have a missing library... So now nothing work:( Other problem : why when I install again IRanges I got a version older than your for exemple? Simon No??l CdeC References 1. http://bioconductor.org/biocLite.R 2. http://www.bioconductor.org/biocLite.R 3. http://www.bioconductor.org/
SNP SNPlocs TranscriptDb IRanges GenomicFeatures SNP SNPlocs TranscriptDb IRanges • 1.5k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.2 years ago
United States
Hi Simon, Please see responses below. On 01/06/2011 09:47 AM, Simon Noël wrote: > Thank you. Sorry for the delay. As I say, on my > computer, thing run really slowy... I have no error but I have some > warning and no result... > > > > library("IRanges") > library("GenomicRanges") > library("GenomicFeatures") > #À changer si une version plus récente de la librairie est téléchargée. > library(SNPlocs.Hsapiens.dbSNP.20101109) > > > getSNPcount() > ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9 ch10 > > 1849438 1936836 1613418 1613633 1453710 1446827 1335745 1243129 > 995075 1158707 > ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19 ch20 > > 1147722 1105364 815729 740129 657719 757926 641905 645646 > 520666 586708 > ch21 ch22 chX chY chMT > 338254 331060 529608 67438 624 > > ch22snps <- getSNPlocs("ch22") > > ch22snps[1:5, ] > RefSNP_id alleles_as_ambig loc > 1 56342815 K 16050353 > 2 7288968 S 16050994 > 3 6518357 M 16051107 > 4 7292503 R 16051209 > 5 6518368 Y 16051241 > > > > > > makeGRangesFromRefSNPids <- function(myids) > + { > + ans_seqnames <- character(length(myids)) > + ans_seqnames[] <- "unknown" > + ans_locs <- integer(length(myids)) > + for (seqname in names(getSNPcount())) { > + locs <- getSNPlocs(seqname) > + ids <- paste("rs", locs$RefSNP_id, sep="") > + myrows <- match(myids, ids) > + ans_seqnames[!is.na(myrows)] <- seqname > + ans_locs[!is.na(myrows)] <- locs$loc[myrows] > + } > + GRanges(seqnames=ans_seqnames, > + IRanges(start=ans_locs, width=1), > + RefSNP_id=myids) > + } > > > > > > myids <- c("rs7547453", "rs2840542", "rs1999527", "rs4648545", > "rs10915459", "rs16838750", "rs12128230", "rs999999999") > > mysnps <- makeGRangesFromRefSNPids(myids) > Warning message: > In ans_locs[!is.na(myrows)] <- locs$loc[myrows] : > number of items to replace is not a multiple of replacement length > > mysnps # a GRanges object with 1 SNP per row > GRangeswith 8 ranges and 1 elementMetadata value > seqnames ranges strand | RefSNP_id > <rle> <iranges> <rle> | <character> > [1] ch1 [2195117, 2195117] * | rs7547453 > [2] ch1 [2291680, 2291680] * | rs2840542 > [3] ch1 [3256108, 3256108] * | rs1999527 > [4] ch1 [3577321, 3577321] * | rs4648545 > [5] ch1 [4230463, 4230463] * | rs10915459 > [6] ch1 [4404344, 4404344] * | rs16838750 > [7] ch1 [4501911, 4501911] * | rs12128230 > [8] unknown [ 0, 0] * | rs999999999 > seqlengths > ch1 unknown > NA NA > > txdb <- makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene") > Downloadthe refGene table ... OK > Downloadthe refLink table ... OK > Extractthe 'transcripts' data frame ... OK > Extractthe 'splicings' data frame ... OK > Downloadand preprocess the 'chrominfo' data frame ... OK > Preparethe 'metadata' data frame ... OK > Makethe TranscriptDb object ... OK > Il y a eu 50 avis ou plus (utilisez warnings() pour voir les 50 premiers) > > txdb > TranscriptDbobject: > | Db type: TranscriptDb > | Data source: UCSC > | Genome: hg19 > | UCSC Table: refGene > | Type of Gene ID: Entrez Gene ID > | Full dataset: yes > | transcript_nrow: 38098 > | exon_nrow: 230201 > | cds_nrow: 204683 > | Db created by: GenomicFeatures package from Bioconductor > | Creation time: 2011-01-06 11:55:44 -0500 (Thu, 06 Jan 2011) > | GenomicFeatures version at creation time: 1.2.3 > | RSQLite version at creation time: 0.9-4 > | DBSCHEMAVERSION: 1.0 > > > > #voir pour exonsBy() à la place de tx > > txExon = exonsBy(txdb, by= "tx") > > txExon > GRangesListof length 38098 > $1 > GRanges with 25 ranges and 3 elementMetadata values > seqnames ranges strand > | exon_id exon_name exon_rank > <rle> <iranges> <rle> | <integer> <character> > <integer> > [1] chr1 [66999825, 67000051] + | 1 > NA 1 > [2] chr1 [67091530, 67091593] + | 2 > NA 2 > [3] chr1 [67098753, 67098777] + | 3 > NA 3 > [4] chr1 [67101627, 67101698] + | 4 > NA 4 > [5] chr1 [67105460, 67105516] + | 5 > NA 5 > [6] chr1 [67108493, 67108547] + | 6 > NA 6 > [7] chr1 [67109227, 67109402] + | 7 > NA 7 > [8] chr1 [67126196, 67126207] + | 8 > NA 8 > [9] chr1 [67133213, 67133224] + | 9 > NA 9 > ... ... ... ... ... ... > ... ... > [17] chr1 [67155873, 67155999] + | 17 > NA 17 > [18] chr1 [67161117, 67161176] + | 18 > NA 18 > [19] chr1 [67184977, 67185088] + | 19 > NA 19 > [20] chr1 [67194947, 67195102] + | 20 > NA 20 > [21] chr1 [67199431, 67199563] + | 21 > NA 21 > [22] chr1 [67205018, 67205220] + | 22 > NA 22 > [23] chr1 [67206341, 67206405] + | 23 > NA 23 > [24] chr1 [67206955, 67207119] + | 24 > NA 24 > [25] chr1 [67208756, 67210768] + | 25 > NA 25 > ... > <38097 more elements> > > seqlengths > chr1 chr2 ... chr18_gl000207_random > 249250621 243199373 ... 4262 > > > > tx <- transcripts(txdb, columns=c("tx_id", "tx_name", "gene_id")) > > tx # a GRanges object with 1 transcript per row > GRangeswith 38098 ranges and 3 elementMetadata values > seqnames ranges strand | tx_id tx_name > <rle> <iranges> <rle> | <integer> <character> > [1] chr1 [ 69091, 70008] + | 1021 NM_001005484 > [2] chr1 [323892, 328581] + | 1023 NR_028327 > [3] chr1 [323892, 328581] + | 1024 NR_028322 > [4] chr1 [323892, 328581] + | 1025 NR_028325 > [5] chr1 [367659, 368597] + | 1022 NM_001005277 > [6] chr1 [367659, 368597] + | 1026 NM_001005221 > [7] chr1 [367659, 368597] + | 1027 NM_001005224 > [8] chr1 [763064, 789740] + | 174 NR_015368 > [9] chr1 [861121, 879961] + | 1035 NM_152486 > ... ... ... ... ... ... ... > [38090] chrY [27177050, 27198251] - | 18991 NM_004678 > [38091] chrY [27177050, 27198251] - | 18992 NM_001002760 > [38092] chrY [27177050, 27198251] - | 18993 NM_001002761 > [38093] chrY [27209230, 27246039] - | 18994 NR_001525 > [38094] chrY [27209230, 27246039] - | 18995 NR_002177 > [38095] chrY [27209230, 27246039] - | 18996 NR_002178 > [38096] chrY [27329790, 27330920] - | 18997 NR_001526 > [38097] chrY [27329790, 27330920] - | 18998 NR_002179 > [38098] chrY [27329790, 27330920] - | 18999 NR_002180 > gene_id > <compressedcharacterlist> > [1] 79501 > [2] 100133331 > [3] 100132287 > [4] 100132062 > [5] 81399 > [6] 729759 > [7] 26683 > [8] 643837 > [9] 148398 > ... ... > [38090] 9083 > [38091] 442867 > [38092] 442868 > [38093] 114761 > [38094] 474150 > [38095] 474149 > [38096] 252949 > [38097] 474152 > [38098] 474151 > seqlengths > chr1 chr2 ... chr18_gl000207_random > 249250621 243199373 ... 4262 > > > > seqnames(mysnps) <- sub("ch", "chr", seqnames(mysnps)) > > map <- as.matrix(findOverlaps(mysnps, tx)) > Message d'avis : > In .local(query, subject, maxgap, minoverlap, type, select, ...) : > No seqnames from the 'query' and 'subject' were identical It looks like the seqnames in mysnps were not renamed to the 'chr' prefix. Prior to running findOverlaps, check the seqnames in both objects to make sure the ranges you are interested in mapping have identical seqnames, > unique(seqnames(mysnps)) [1] chr1 unknown Levels: chr1 unknown > unique(seqnames(tx)) [1] chr1 chr10 chr11 [4] chr12 chr13 chr14 .... and many more. Both objects have 'chr1' and when we run findOverlaps, > map <- as.matrix(findOverlaps(mysnps, tx)) Warning message: In .local(query, subject, maxgap, minoverlap, type, select, ...) : Only some seqnames from 'query' and 'subject' were not identical This warning makes sense because mysnps only has 'chr1' and 'unknown' to map to tx but there are many more seqnames in tx. Additionally, tx does not have an 'unknown' seqname. > map query subject [1,] 1 49 [2,] 2 1944 [3,] 3 62 [4,] 3 63 [5,] 4 67 > > mapExon <- as.matrix(findOverlaps(mysnps, txExon)) > Message d'avis : > In .local(query, subject, maxgap, minoverlap, type, select, ...) : > No seqnames from the 'query' and 'subject' were identical > > > > mapped_genes <- values(tx)$gene_id[map[, 2]] > > mapped_snps <- rep.int(values(mysnps)$RefSNP_id[map[, 1]], > elementLengths(mapped_genes)) > > snp2gene <- unique(data.frame(snp_id=mapped_snps, > gene_id=unlist(mapped_genes))) > > rownames(snp2gene) <- NULL > > snp2gene[1:4, ] > snp_id gene_id > NA <na> <na> > NA.1 <na> <na> > NA.2 <na> <na> > NA.3 <na> <na> > > On our super computer, I can't update manualy the package. They > are suposed to update automatically once per month according to our > informactician. The result of sessionInfo() is : > > > sessionInfo() > R version 2.12.0 (2010-10-15) > Platform: x86_64-pc-linux-gnu (64-bit) > locale: > [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8 > [5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8 > [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C > attachedbase packages: > [1] stats graphics grDevices utils datasets methods base > otherattached packages: > [1] GenomicFeatures_1.2.3 > [2] SNPlocs.Hsapiens.dbSNP.20101109_0.99.2 > [3] GenomicRanges_1.2.2 > [4] IRanges_1.8.7 > loadedvia a namespace (and not attached): > [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 BSgenome_1.18.2 > [5] DBI_0.2-5 RCurl_1.5-0 > RSQLite_0.9-4 rtracklayer_1.10.6 > [9] tools_2.12.0 XML_3.2-0 > > On my computer , something stragge happen. The > package get unstalled when I run > | source("http://bioconductor.org/biocLite.R") > update.packages(repos=biocinstallRepos(), ask=FALSE, checkBuilt=TRUE) > | > > And when I try to reinstall SNPlocs.Hsapiens.dbSNP.20101109, I now get > an error message... > > > source("http://www.bioconductor.org/biocLite.R") > BioC_mirror= http://www.bioconductor.org > Change using chooseBioCmirror(). > > biocLite("SNPlocs.Hsapiens.dbSNP.20101109") > Using R version 2.12.0, biocinstall version 2.7.4. > InstallingBioconductor version 2.7 packages: > [1] "SNPlocs.Hsapiens.dbSNP.20101109" > Pleasewait... > Message d'avis : > In getDependencies(pkgs, dependencies, available, lib) : > package ‘SNPlocs.Hsapiens.dbSNP.20101109’ is not available > > > > > > > > But for sessionInfo, it's : > > > > > sessionInfo() > R version 2.12.0 (2010-10-15) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=French_Canada.1252 LC_CTYPE=French_Canada.1252 > [3] LC_MONETARY=French_Canada.1252 LC_NUMERIC=C > [5] LC_TIME=French_Canada.1252 > > attachedbase packages: > [1] stats graphics grDevices utils datasets methods base > > otherattached packages: > [1] GenomicFeatures_1.2.3 GenomicRanges_1.2.2 IRanges_1.8.8 > > loadedvia a namespace (and not attached): > [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 BSgenome_1.18.2 > > [5] DBI_0.2-5 RCurl_1.5-0.1 RSQLite_0.9-4 > rtracklayer_1.10.6 > [9] XML_3.2-0.2 > > > > Now I have a super computer that can't run the program because of > a subscript out of bounds and a personal computer than have a missing > library... So now nothing work:( Other problem : why when > I install again IRanges I got a version older than your for exemple? The code should run on the linux machine (super computer) once you address the seqnames problem above. You are having a problem installing SNPlocs.Hsapiens.dbSNP.20101109 on your Windows pc because a Windows binary of this package is not currently available. We should have this available and I will let you know as soon as it is ready. Sorry about that. I had a newer version of IRanges in my last email because I was using R-2.13, the devel branch. When using R-2.12 I have, > sessionInfo() R version 2.12.0 Patched (2010-10-18 r53360) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.2.3 [2] SNPlocs.Hsapiens.dbSNP.20101109_0.99.2 [3] GenomicRanges_1.2.2 [4] IRanges_1.8.8 loaded via a namespace (and not attached): [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 BSgenome_1.18.2 [5] DBI_0.2-5 RCurl_1.5-0 RSQLite_0.9-4 rtracklayer_1.10.6 [9] tools_2.12.0 XML_3.2-0 Valerie > > Simon Noël > CdeC [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Simon, The windows binary of SNPlocs.Hsapiens.dbSNP.20101109 is now available. Valerie On 01/07/11 10:48, Valerie Obenchain wrote: > Hi Simon, > > Please see responses below. > > > > On 01/06/2011 09:47 AM, Simon No?l wrote: > >> Thank you. Sorry for the delay. As I say, on my >> computer, thing run really slowy... I have no error but I have some >> warning and no result... >> >> >> >> library("IRanges") >> library("GenomicRanges") >> library("GenomicFeatures") >> #? changer si une version plus r?cente de la librairie est t?l?charg?e. >> library(SNPlocs.Hsapiens.dbSNP.20101109) >> >> >>> getSNPcount() >>> >> ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9 ch10 >> >> 1849438 1936836 1613418 1613633 1453710 1446827 1335745 1243129 >> 995075 1158707 >> ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19 ch20 >> >> 1147722 1105364 815729 740129 657719 757926 641905 645646 >> 520666 586708 >> ch21 ch22 chX chY chMT >> 338254 331060 529608 67438 624 >> >>> ch22snps<- getSNPlocs("ch22") >>> ch22snps[1:5, ] >>> >> RefSNP_id alleles_as_ambig loc >> 1 56342815 K 16050353 >> 2 7288968 S 16050994 >> 3 6518357 M 16051107 >> 4 7292503 R 16051209 >> 5 6518368 Y 16051241 >> >>> >>> makeGRangesFromRefSNPids<- function(myids) >>> >> + { >> + ans_seqnames<- character(length(myids)) >> + ans_seqnames[]<- "unknown" >> + ans_locs<- integer(length(myids)) >> + for (seqname in names(getSNPcount())) { >> + locs<- getSNPlocs(seqname) >> + ids<- paste("rs", locs$RefSNP_id, sep="") >> + myrows<- match(myids, ids) >> + ans_seqnames[!is.na(myrows)]<- seqname >> + ans_locs[!is.na(myrows)]<- locs$loc[myrows] >> + } >> + GRanges(seqnames=ans_seqnames, >> + IRanges(start=ans_locs, width=1), >> + RefSNP_id=myids) >> + } >> >>> >>> myids<- c("rs7547453", "rs2840542", "rs1999527", "rs4648545", >>> >> "rs10915459", "rs16838750", "rs12128230", "rs999999999") >> >>> mysnps<- makeGRangesFromRefSNPids(myids) >>> >> Warning message: >> In ans_locs[!is.na(myrows)]<- locs$loc[myrows] : >> number of items to replace is not a multiple of replacement length >> >>> mysnps # a GRanges object with 1 SNP per row >>> >> GRangeswith 8 ranges and 1 elementMetadata value >> seqnames ranges strand | RefSNP_id >> <rle> <iranges> <rle> |<character> >> [1] ch1 [2195117, 2195117] * | rs7547453 >> [2] ch1 [2291680, 2291680] * | rs2840542 >> [3] ch1 [3256108, 3256108] * | rs1999527 >> [4] ch1 [3577321, 3577321] * | rs4648545 >> [5] ch1 [4230463, 4230463] * | rs10915459 >> [6] ch1 [4404344, 4404344] * | rs16838750 >> [7] ch1 [4501911, 4501911] * | rs12128230 >> [8] unknown [ 0, 0] * | rs999999999 >> seqlengths >> ch1 unknown >> NA NA >> >>> txdb<- makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene") >>> >> Downloadthe refGene table ... OK >> Downloadthe refLink table ... OK >> Extractthe 'transcripts' data frame ... OK >> Extractthe 'splicings' data frame ... OK >> Downloadand preprocess the 'chrominfo' data frame ... OK >> Preparethe 'metadata' data frame ... OK >> Makethe TranscriptDb object ... OK >> Il y a eu 50 avis ou plus (utilisez warnings() pour voir les 50 premiers) >> >>> txdb >>> >> TranscriptDbobject: >> | Db type: TranscriptDb >> | Data source: UCSC >> | Genome: hg19 >> | UCSC Table: refGene >> | Type of Gene ID: Entrez Gene ID >> | Full dataset: yes >> | transcript_nrow: 38098 >> | exon_nrow: 230201 >> | cds_nrow: 204683 >> | Db created by: GenomicFeatures package from Bioconductor >> | Creation time: 2011-01-06 11:55:44 -0500 (Thu, 06 Jan 2011) >> | GenomicFeatures version at creation time: 1.2.3 >> | RSQLite version at creation time: 0.9-4 >> | DBSCHEMAVERSION: 1.0 >> >>> #voir pour exonsBy() ? la place de tx >>> txExon = exonsBy(txdb, by= "tx") >>> txExon >>> >> GRangesListof length 38098 >> $1 >> GRanges with 25 ranges and 3 elementMetadata values >> seqnames ranges strand >> | exon_id exon_name exon_rank >> <rle> <iranges> <rle> |<integer> <character> >> <integer> >> [1] chr1 [66999825, 67000051] + | 1 >> NA 1 >> [2] chr1 [67091530, 67091593] + | 2 >> NA 2 >> [3] chr1 [67098753, 67098777] + | 3 >> NA 3 >> [4] chr1 [67101627, 67101698] + | 4 >> NA 4 >> [5] chr1 [67105460, 67105516] + | 5 >> NA 5 >> [6] chr1 [67108493, 67108547] + | 6 >> NA 6 >> [7] chr1 [67109227, 67109402] + | 7 >> NA 7 >> [8] chr1 [67126196, 67126207] + | 8 >> NA 8 >> [9] chr1 [67133213, 67133224] + | 9 >> NA 9 >> ... ... ... ... ... ... >> ... ... >> [17] chr1 [67155873, 67155999] + | 17 >> NA 17 >> [18] chr1 [67161117, 67161176] + | 18 >> NA 18 >> [19] chr1 [67184977, 67185088] + | 19 >> NA 19 >> [20] chr1 [67194947, 67195102] + | 20 >> NA 20 >> [21] chr1 [67199431, 67199563] + | 21 >> NA 21 >> [22] chr1 [67205018, 67205220] + | 22 >> NA 22 >> [23] chr1 [67206341, 67206405] + | 23 >> NA 23 >> [24] chr1 [67206955, 67207119] + | 24 >> NA 24 >> [25] chr1 [67208756, 67210768] + | 25 >> NA 25 >> ... >> <38097 more elements> >> >> seqlengths >> chr1 chr2 ... chr18_gl000207_random >> 249250621 243199373 ... 4262 >> >>> tx<- transcripts(txdb, columns=c("tx_id", "tx_name", "gene_id")) >>> tx # a GRanges object with 1 transcript per row >>> >> GRangeswith 38098 ranges and 3 elementMetadata values >> seqnames ranges strand | tx_id tx_name >> <rle> <iranges> <rle> |<integer> <character> >> [1] chr1 [ 69091, 70008] + | 1021 NM_001005484 >> [2] chr1 [323892, 328581] + | 1023 NR_028327 >> [3] chr1 [323892, 328581] + | 1024 NR_028322 >> [4] chr1 [323892, 328581] + | 1025 NR_028325 >> [5] chr1 [367659, 368597] + | 1022 NM_001005277 >> [6] chr1 [367659, 368597] + | 1026 NM_001005221 >> [7] chr1 [367659, 368597] + | 1027 NM_001005224 >> [8] chr1 [763064, 789740] + | 174 NR_015368 >> [9] chr1 [861121, 879961] + | 1035 NM_152486 >> ... ... ... ... ... ... ... >> [38090] chrY [27177050, 27198251] - | 18991 NM_004678 >> [38091] chrY [27177050, 27198251] - | 18992 NM_001002760 >> [38092] chrY [27177050, 27198251] - | 18993 NM_001002761 >> [38093] chrY [27209230, 27246039] - | 18994 NR_001525 >> [38094] chrY [27209230, 27246039] - | 18995 NR_002177 >> [38095] chrY [27209230, 27246039] - | 18996 NR_002178 >> [38096] chrY [27329790, 27330920] - | 18997 NR_001526 >> [38097] chrY [27329790, 27330920] - | 18998 NR_002179 >> [38098] chrY [27329790, 27330920] - | 18999 NR_002180 >> gene_id >> <compressedcharacterlist> >> [1] 79501 >> [2] 100133331 >> [3] 100132287 >> [4] 100132062 >> [5] 81399 >> [6] 729759 >> [7] 26683 >> [8] 643837 >> [9] 148398 >> ... ... >> [38090] 9083 >> [38091] 442867 >> [38092] 442868 >> [38093] 114761 >> [38094] 474150 >> [38095] 474149 >> [38096] 252949 >> [38097] 474152 >> [38098] 474151 >> seqlengths >> chr1 chr2 ... chr18_gl000207_random >> 249250621 243199373 ... 4262 >> >>> seqnames(mysnps)<- sub("ch", "chr", seqnames(mysnps)) >>> map<- as.matrix(findOverlaps(mysnps, tx)) >>> >> Message d'avis : >> In .local(query, subject, maxgap, minoverlap, type, select, ...) : >> No seqnames from the 'query' and 'subject' were identical >> > It looks like the seqnames in mysnps were not renamed to the 'chr' > prefix. Prior to running findOverlaps, check the seqnames in both > objects to make sure the ranges you are interested in mapping have > identical seqnames, > > > unique(seqnames(mysnps)) > [1] chr1 unknown > Levels: chr1 unknown > > > > unique(seqnames(tx)) > [1] chr1 chr10 chr11 > [4] chr12 chr13 chr14 > .... and many more. > > Both objects have 'chr1' and when we run findOverlaps, > > > map<- as.matrix(findOverlaps(mysnps, tx)) > Warning message: > In .local(query, subject, maxgap, minoverlap, type, select, ...) : > Only some seqnames from 'query' and 'subject' were not identical > > This warning makes sense because mysnps only has 'chr1' and 'unknown' to > map to tx but there are many more seqnames in tx. Additionally, tx does > not have an 'unknown' seqname. > > > map > query subject > [1,] 1 49 > [2,] 2 1944 > [3,] 3 62 > [4,] 3 63 > [5,] 4 67 > > > >>> mapExon<- as.matrix(findOverlaps(mysnps, txExon)) >>> >> Message d'avis : >> In .local(query, subject, maxgap, minoverlap, type, select, ...) : >> No seqnames from the 'query' and 'subject' were identical >> >>> mapped_genes<- values(tx)$gene_id[map[, 2]] >>> mapped_snps<- rep.int(values(mysnps)$RefSNP_id[map[, 1]], >>> >> elementLengths(mapped_genes)) >> >>> snp2gene<- unique(data.frame(snp_id=mapped_snps, >>> >> gene_id=unlist(mapped_genes))) >> >>> rownames(snp2gene)<- NULL >>> snp2gene[1:4, ] >>> >> snp_id gene_id >> NA<na> <na> >> NA.1<na> <na> >> NA.2<na> <na> >> NA.3<na> <na> >> >> On our super computer, I can't update manualy the package. They >> are suposed to update automatically once per month according to our >> informactician. The result of sessionInfo() is : >> >> >>> sessionInfo() >>> >> R version 2.12.0 (2010-10-15) >> Platform: x86_64-pc-linux-gnu (64-bit) >> locale: >> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8 >> [5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8 >> [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C >> attachedbase packages: >> [1] stats graphics grDevices utils datasets methods base >> otherattached packages: >> [1] GenomicFeatures_1.2.3 >> [2] SNPlocs.Hsapiens.dbSNP.20101109_0.99.2 >> [3] GenomicRanges_1.2.2 >> [4] IRanges_1.8.7 >> loadedvia a namespace (and not attached): >> [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 BSgenome_1.18.2 >> [5] DBI_0.2-5 RCurl_1.5-0 >> RSQLite_0.9-4 rtracklayer_1.10.6 >> [9] tools_2.12.0 XML_3.2-0 >> >> On my computer , something stragge happen. The >> package get unstalled when I run >> | source("http://bioconductor.org/biocLite.R") >> update.packages(repos=biocinstallRepos(), ask=FALSE, checkBuilt=TRUE) >> | >> >> And when I try to reinstall SNPlocs.Hsapiens.dbSNP.20101109, I now get >> an error message... >> >> >>> source("http://www.bioconductor.org/biocLite.R") >>> >> BioC_mirror= http://www.bioconductor.org >> Change using chooseBioCmirror(). >> >>> biocLite("SNPlocs.Hsapiens.dbSNP.20101109") >>> >> Using R version 2.12.0, biocinstall version 2.7.4. >> InstallingBioconductor version 2.7 packages: >> [1] "SNPlocs.Hsapiens.dbSNP.20101109" >> Pleasewait... >> Message d'avis : >> In getDependencies(pkgs, dependencies, available, lib) : >> package ?SNPlocs.Hsapiens.dbSNP.20101109? is not available >> >> >> >> >> >> >> >> But for sessionInfo, it's : >> >> >> >> >>> sessionInfo() >>> >> R version 2.12.0 (2010-10-15) >> Platform: i386-pc-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=French_Canada.1252 LC_CTYPE=French_Canada.1252 >> [3] LC_MONETARY=French_Canada.1252 LC_NUMERIC=C >> [5] LC_TIME=French_Canada.1252 >> >> attachedbase packages: >> [1] stats graphics grDevices utils datasets methods base >> >> otherattached packages: >> [1] GenomicFeatures_1.2.3 GenomicRanges_1.2.2 IRanges_1.8.8 >> >> loadedvia a namespace (and not attached): >> [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 BSgenome_1.18.2 >> >> [5] DBI_0.2-5 RCurl_1.5-0.1 RSQLite_0.9-4 >> rtracklayer_1.10.6 >> [9] XML_3.2-0.2 >> >> >> >> Now I have a super computer that can't run the program because of >> a subscript out of bounds and a personal computer than have a missing >> library... So now nothing work:( Other problem : why when >> I install again IRanges I got a version older than your for exemple? >> > The code should run on the linux machine (super computer) once you > address the seqnames problem above. You are having a problem installing > SNPlocs.Hsapiens.dbSNP.20101109 on your Windows pc because a Windows > binary of this package is not currently available. We should have this > available and I will let you know as soon as it is ready. Sorry about that. > > I had a newer version of IRanges in my last email because I was using > R-2.13, the devel branch. When using R-2.12 I have, > > >> sessionInfo() >> > R version 2.12.0 Patched (2010-10-18 r53360) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] GenomicFeatures_1.2.3 > [2] SNPlocs.Hsapiens.dbSNP.20101109_0.99.2 > [3] GenomicRanges_1.2.2 > [4] IRanges_1.8.8 > > loaded via a namespace (and not attached): > [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 > BSgenome_1.18.2 > [5] DBI_0.2-5 RCurl_1.5-0 RSQLite_0.9-4 > rtracklayer_1.10.6 > [9] tools_2.12.0 XML_3.2-0 > > Valerie > >> >> Simon No?l >> CdeC >> > > [[alternative HTML version deleted]] > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 880 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6