Question: QuasR: problem accessing BSgenome.Rnorvegicus.UCSC.rn5
Guido Hooiveld (Wageningen University, Wageningen, the Netherlands) wrote 3.8 years ago:
Hello,

I am using R-devel and would like to run QuasR to align an RNA-seq experiment. Unfortunately, I can't get past the indexing step because QuasR somehow cannot access the BSgenome object. I think this is because the genome is accessible as "Rnorvegicus" rather than under the (expected) name "BSgenome.Rnorvegicus.UCSC.rn5".

Should this be changed in QuasR, or in the BSgenome package?

Thanks,
Guido

> library(QuasR)
> library(BSgenome)
> library(Rsamtools)
> library(rtracklayer)
> library(GenomicFeatures)
> library(BSgenome.Rnorvegicus.UCSC.rn5)
> sampleFile <- "samples_GH2.txt"
> genomeFile <- "BSgenome.Rnorvegicus.UCSC.rn5"
>
> proj <- qAlign(sampleFile=sampleFile, genome=genomeFile)
alignment files missing - need to:
    create alignment index for the genome
    create 18 genomic alignment(s)
will start in ..9s..8s..7s..6s..5s..4s..3s..2s..1s
Error in get(genome) : object 'BSgenome.Rnorvegicus.UCSC.rn5' not found
>
> # The info is there, so this does work, but QuasR cannot make use of it
> Rnorvegicus
Rat genome
|
| organism: Rattus norvegicus (Rat)
| provider: UCSC
| provider version: rn5
| release date: Mar. 2012
| release name: RGSC 5.0
|
| single sequences (see '?seqnames'):
|   chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
|   chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chrX chrM
|
| multiple sequences (see '?mseqnames'):
|   random chrUn upstream1000 upstream2000 upstream5000
|
| (use the '$' or '[[' operator to access a given sequence)
> seqlengths(Rnorvegicus)
     chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
290094216 285068071 183740530 248343840 177180328 156897508 143501887 132457389
     chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
121549591 112200500  93518069  54450796 118718031 115151701 114627140  90051983
    chr17     chr18     chr19     chr20      chrX      chrM
 92503511  87229863  72914587  57791882 154597545     16313
>
> genomeFile <- "Rnorvegicus"
> proj <- qAlign(sampleFile=sampleFile, genome=genomeFile)
The specified genome is not a fasta file or an installed BSgenome.
Connecting to Bioconductor and searching for a matching genome (internet connection required)...OK
Bioconductor version 2.14 (BiocInstaller 1.13.3), ?biocLite for help
Error: Rnorvegicus is not available in Bioconductor. Type available.genomes() for a complete list
>
> sessionInfo()
R Under development (unstable) (2013-11-19 r64265)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] BiocInstaller_1.13.3                 BSgenome.Rnorvegicus.UCSC.rn5_1.3.17
 [3] GenomicFeatures_1.15.7               AnnotationDbi_1.25.9
 [5] Biobase_2.23.5                       rtracklayer_1.23.14
 [7] Rsamtools_1.15.29                    BSgenome_1.31.12
 [9] Biostrings_2.31.14                   XVector_0.3.7
[11] QuasR_1.3.12                         Rbowtie_1.3.0
[13] GenomicRanges_1.15.31                IRanges_1.21.32
[15] BiocGenerics_0.9.3

loaded via a namespace (and not attached):
 [1] BatchJobs_1.2             BBmisc_1.5
 [3] BiocParallel_0.5.8        biomaRt_2.19.3
 [5] bitops_1.0-6              brew_1.0-6
 [7] codetools_0.2-8           DBI_0.2-7
 [9] digest_0.6.4              fail_1.2
[11] foreach_1.4.1             GenomicAlignments_0.99.26
[13] grid_3.1.0                hwriter_1.3
[15] iterators_1.0.6           lattice_0.20-24
[17] latticeExtra_0.6-26       plyr_1.8.1
[19] RColorBrewer_1.0-5        Rcpp_0.11.0
[21] RCurl_1.95-4.1            RSQLite_0.11.4
[23] sendmailR_1.1-2           ShortRead_1.21.14
[25] stats4_3.1.0              stringr_0.6.2
[27] tools_3.1.0               XML_3.98-1.1
[29] zlibbioc_1.9.0
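The message `Error in get(genome)` suggests that the genome string is looked up as the name of an R object. The base-R sketch below mimics that situation with a stand-in environment. The environment `old_pkg`, the placeholder value and the helper `lookup_genome` are hypothetical illustrations, not QuasR or BSgenome code, and the assumption that the old data package exposes the genome only under its short name is inferred from the transcript above.

```r
# Stand-in for an old-style BSgenome data package that exposes its genome
# under the short name only (assumption made for illustration).
old_pkg <- new.env()
assign("Rnorvegicus", "rat genome placeholder", envir = old_pkg)

# Hypothetical helper: look an object up by name, returning NULL if absent
# instead of raising the "object ... not found" error that get() throws.
lookup_genome <- function(name, env) {
  if (exists(name, envir = env, inherits = FALSE)) {
    get(name, envir = env, inherits = FALSE)
  } else {
    NULL
  }
}

# The package name does not resolve to an object ...
by_pkg_name <- lookup_genome("BSgenome.Rnorvegicus.UCSC.rn5", old_pkg)
# ... whereas the short name does.
by_short_name <- lookup_genome("Rnorvegicus", old_pkg)

print(is.null(by_pkg_name))
print(is.null(by_short_name))
```

Checking with `exists()` before calling `get()` is one way to tell "package installed but object named differently" apart from "package missing".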
Hervé Pagès (United States) wrote 3.8 years ago:
Hi Guido,

When using BioC devel, things can move fast, so it's important that you update your packages often (with biocLite()) in order to keep everything in sync. In your case it looks like the version of the BSgenome.Rnorvegicus.UCSC.rn5 package you have (1.3.17) is lagging behind the version currently in BioC devel (1.3.99).

Note that starting with BioC 2.14 (which will be released in April, but corresponds to BioC devel at the moment), many BSgenome packages exist in 2 flavors: raw genome or masked genome. For example, for rn5, there is now

    BSgenome.Rnorvegicus.UCSC.rn5          raw genome
    BSgenome.Rnorvegicus.UCSC.rn5.masked   masked genome

BSgenome.Rnorvegicus.UCSC.rn5.masked is equivalent to the old BSgenome.Rnorvegicus.UCSC.rn5 in BioC <= 2.13, which was already masked. However, in BioC <= 2.13, there was no non-masked version of rn5. See the announcement here for more details:

    https://stat.ethz.ch/pipermail/bioc-devel/2014-January/005150.html

I don't know if QuasR cares about the masks though. Maybe they're just ignored, in which case I guess you could just stick to BSgenome.Rnorvegicus.UCSC.rn5.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
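A quick way to see whether an installed package lags behind a required version is to compare version strings with base R. This is a minimal sketch: the helper `is_outdated` is hypothetical, and the two version numbers are the ones quoted in this thread.

```r
# Hypothetical helper: TRUE if the installed version is older than required.
# package_version() compares versions component by component, so that
# "1.3.17" correctly sorts before "1.3.99".
is_outdated <- function(installed, required) {
  package_version(installed) < package_version(required)
}

# Versions from the thread: installed data package vs. current BioC devel.
print(is_outdated("1.3.17", "1.3.99"))

# For a package that is actually installed, utils::packageVersion() returns
# its version. The check is guarded so the sketch also runs where the data
# package is absent.
pkg <- "BSgenome.Rnorvegicus.UCSC.rn5"
if (nzchar(system.file(package = pkg))) {
  print(is_outdated(as.character(utils::packageVersion(pkg)), "1.3.99"))
}
```

Comparing parsed versions rather than raw strings avoids the pitfalls of plain string comparison.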
Hi Hervé,

Good point. I checked, and version 1.3.17 was installed because that is (still) the latest binary version of the package available for Windows. I have meanwhile re-installed the BSgenome.Rnorvegicus.UCSC.rn5 package from source (v1.3.99), and QuasR now works on my Win7 machine as it should. Based on your comments I am currently using the masked genome package, because that is the equivalent of the old package.

Thanks again,
Guido
written 3.8 years ago by Guido Hooiveld
Hi Guido and Hervé,

You were both spot on. In the development version 1.3.9 of QuasR, we adapted to the new (BioC 2.14) type of BSgenome packages, so QuasR >= 1.3.9 only works with these.

One clarification regarding the treatment of masks in QuasR:
- QuasR <= 1.2.x ignored masks in BSgenome packages during alignment.
- QuasR >= 1.3.9 handles BSgenome objects with or without masks, so that the statement

      qAlign(..., genome="BSgenome.Rnorvegicus.UCSC.rn5")

  is equivalent to the old behaviour (no masking), but the statement

      qAlign(..., genome="BSgenome.Rnorvegicus.UCSC.rn5.masked")

  now aligns against a masked genome.

I hope this helps.

Cheers,
Michael

--
Michael Stadler, PhD
Head of Computational Biology
Friedrich Miescher Institute
Basel (Switzerland)
Phone : +41 61 697 6492
Fax : +41 61 697 3976
Mail : michael.stadler at fmi.ch
written 3.8 years ago by Michael Stadler
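The raw and masked variants differ only by the `.masked` suffix of the package name, so the choice can be made explicit in a script. A small sketch follows. The helper `genome_name` is hypothetical, and the `qAlign()` call is left commented out because it needs QuasR >= 1.3.9, the genome package and the sample file from the original post.

```r
# Hypothetical helper: build the BSgenome package name for the raw or the
# masked genome, following the ".masked" naming convention described above.
genome_name <- function(base, masked = FALSE) {
  if (masked) paste0(base, ".masked") else base
}

raw    <- genome_name("BSgenome.Rnorvegicus.UCSC.rn5")
masked <- genome_name("BSgenome.Rnorvegicus.UCSC.rn5", masked = TRUE)
print(raw)
print(masked)

# With the packages installed, the alignment against the masked genome
# would then be started like this (not run here):
# proj <- qAlign(sampleFile = "samples_GH2.txt", genome = masked)
```

Keeping the masking decision in one variable makes it easier to record which genome flavor a given alignment used.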
Dear Michael,

Sorry to bother you with this, but I face a problem using QuasR that I can't solve: I would like to summarize my reads into a count table, but I got stuck. An error is thrown saying that some sequence levels in the query cannot be found in the alignment files. I generated my project essentially as described in the vignette, using the unmasked BSgenome package in R-devel, and would then like to annotate it using BioC's rat TxDb. Could this be due to a mismatch between the content of the BSgenome and TxDb packages (in R-devel, the content of the former is more recent than that of the latter)?

Any suggestion would be appreciated!

Thanks,
Guido

> sampleFile <- "samples_GH2.txt"
> genomeFile <- "BSgenome.Rnorvegicus.UCSC.rn5"
> proj2 <- qAlign(sampleFile=sampleFile, genome=genomeFile)
> geneLevels <- qCount(proj2, TxDb.Rnorvegicus.UCSC.rn5.refGene, reportLevel="gene")
Error in qCount(proj2, TxDb.Rnorvegicus.UCSC.rn5.refGene, reportLevel = "gene") :
  sequence levels in 'query' not found in alignment files:
chr1_AABR06109291_random, chr1_AABR06109292_random, chr1_AABR06109293_random,
chr1_AABR06109294_random, chr1_AABR06109295_random, chr1_AABR06109296_random,
chr1_AABR06109297_random, chr1_AABR06109298_random, chr1_AABR06109299_random,
chr1_AABR06109300_random, chr1_AABR06109301_random, chr1_AABR06109302_random,
chr1_AABR06109303_random, chr1_AABR06109307_random, chr1_AABR06109308_random,
chr1_AABR06109309_random, chr1_AABR06109310_random, chr1_AABR06109311_random,
chr1_AABR06109312_random, chr1_AABR06109313_random, chr1_AABR06109314_random,
chr1_AABR06109315_random, chr1_AABR06109316_random, chr1_AABR06109317_random,
chr1_AABR06109322_random, chr1_AABR06109323_random, chr1_AABR06109324_random,
chr1_AABR06109325_random, chr1_AABR06109331_random, chr1_AABR06109332_random,
chr1_AABR06109333_random, chr1_AABR06109334_random, chr1_AABR06109335_random,
chr1_AABR06109336_random, chr1_AABR06109337_random, chr1_AABR06109340_rando
>
> sessionInfo()
R Under development (unstable) (2013-11-19 r64265)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] TxDb.Rnorvegicus.UCSC.rn5.refGene_2.10.1 BiocInstaller_1.13.3
 [3] QuasR_1.3.13                             Rbowtie_1.3.1
 [5] rtracklayer_1.23.15                      GenomicFeatures_1.15.9
 [7] AnnotationDbi_1.25.14                    Biobase_2.23.6
 [9] GenomicRanges_1.15.38                    GenomeInfoDb_0.99.19
[11] IRanges_1.21.34                          BiocGenerics_0.9.3

loaded via a namespace (and not attached):
 [1] BatchJobs_1.2             BBmisc_1.5                BiocParallel_0.5.8
 [4] biomaRt_2.19.3            Biostrings_2.31.14        bitops_1.0-6
 [7] brew_1.0-6                BSgenome_1.31.12          codetools_0.2-8
[10] DBI_0.2-7                 digest_0.6.4              fail_1.2
[13] foreach_1.4.1             GenomicAlignments_0.99.32 grid_3.1.0
[16] hwriter_1.3               iterators_1.0.6           lattice_0.20-24
[19] latticeExtra_0.6-26       plyr_1.8.1                RColorBrewer_1.0-5
[22] Rcpp_0.11.0               RCurl_1.95-4.1            Rsamtools_1.15.33
[25] RSQLite_0.11.4            sendmailR_1.1-2           ShortRead_1.21.16
[28] stats4_3.1.0              stringr_0.6.2             tools_3.1.0
[31] XML_3.98-1.1              XVector_0.3.7             zlibbioc_1.9.0
written 3.8 years ago by Guido Hooiveld
Hi Guido,

Michael is on holiday this week. In the meantime, I will try to reply to the best of my knowledge; I am sure Michael will follow up with a better answer once he is back.

We hardly work with rat, so I just downloaded the corresponding BSgenome and TxDb packages (from devel). And indeed there is a mismatch between the seqlevels of the BSgenome and TxDb packages:

    > seqlevels(Rnorvegicus)
     [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9"
    [10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
    [19] "chr19" "chr20" "chrX"  "chrM"
    > seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene)[1:30]
     [1] "chr1"                     "chr2"
     [3] "chr3"                     "chr4"
     [5] "chr5"                     "chr6"
     [7] "chr7"                     "chr8"
     [9] "chr9"                     "chr10"
    [11] "chr11"                    "chr12"
    [13] "chr13"                    "chr14"
    [15] "chr15"                    "chr16"
    [17] "chr17"                    "chr18"
    [19] "chr19"                    "chr20"
    [21] "chrX"                     "chrM"
    [23] "chr1_AABR06109291_random" "chr1_AABR06109292_random"
    [25] "chr1_AABR06109293_random" "chr1_AABR06109294_random"
    [27] "chr1_AABR06109295_random" "chr1_AABR06109296_random"
    [29] "chr1_AABR06109297_random" "chr1_AABR06109298_random"

As a quick fix, I recommend restricting the seqlevels of the TxDb, e.g.:

    seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene, force=TRUE) <- seqlevels(Rnorvegicus)

Hope this helps,
Regards, Hans-Rudolf

    > sessionInfo()
    R Under development (unstable) (2014-01-17 r64817)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
     [1] TxDb.Rnorvegicus.UCSC.rn5.refGene_2.10.1
     [2] GenomicFeatures_1.15.9
     [3] AnnotationDbi_1.25.14
     [4] Biobase_2.23.6
     [5] BSgenome.Rnorvegicus.UCSC.rn5_1.3.99
     [6] BSgenome_1.31.12
     [7] Biostrings_2.31.14
     [8] XVector_0.3.7
     [9] GenomicRanges_1.15.38
    [10] GenomeInfoDb_0.99.19
    [11] IRanges_1.21.34
    [12] BiocGenerics_0.9.3
    [13] BiocInstaller_1.13.3

    loaded via a namespace (and not attached):
     [1] BatchJobs_1.2       BBmisc_1.5
     [3] BiocParallel_0.5.17 biomaRt_2.19.3
     [5] bitops_1.0-6        brew_1.0-6
     [7] codetools_0.2-8     DBI_0.2-7
     [9] digest_0.6.4        fail_1.2
    [11] foreach_1.4.1       GenomicAlignments_0.99.32
    [13] iterators_1.0.6     plyr_1.8.1
    [15] Rcpp_0.11.0         RCurl_1.95-4.1
    [17] Rsamtools_1.15.33   RSQLite_0.11.4
    [19] rtracklayer_1.23.16 sendmailR_1.1-2
    [21] stats4_3.1.0        stringr_0.6.2
    [23] tools_3.1.0         XML_3.98-1.1
    [25] zlibbioc_1.9.0
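For anyone who wants to check for this kind of seqlevels mismatch before calling qCount, here is a minimal base-R sketch. It is only an illustration under stated assumptions: the character vectors `genome_levels` and `txdb_levels` are hypothetical stand-ins for `seqlevels(Rnorvegicus)` and `seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene)`, and only two of the random contigs reported in this thread are included.

```r
# Hypothetical stand-in for seqlevels(Rnorvegicus): the 22 top-level
# sequences listed in the thread.
genome_levels <- c(paste0("chr", 1:20), "chrX", "chrM")

# Hypothetical stand-in for seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene):
# the same 22 sequences plus two of the random contigs named in the thread.
txdb_levels <- c(genome_levels,
                 "chr1_AABR06109291_random",
                 "chr1_AABR06109292_random")

# Levels known to the annotation but absent from the genome that was aligned
# against; these are the ones qCount complained about.
missing_in_genome <- setdiff(txdb_levels, genome_levels)
print(missing_in_genome)

# Levels that are safe to keep when restricting the annotation.
shared_levels <- intersect(txdb_levels, genome_levels)
print(length(shared_levels))
```

With the real objects, `shared_levels` would be the vector to assign back to the TxDb, which is what the `seqlevels(..., force=TRUE) <- ...` call in this thread does with `seqlevels(Rnorvegicus)`.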
written 3.8 years ago by Hotz, Hans-Rudolf
Hi Guido, Hans-Rudolf,

FWIW the "random" sequences are actually part of the BSgenome package for rn5, but they are in the "multiple sequence" section:

    > library(BSgenome.Rnorvegicus.UCSC.rn5)
    > BSgenome.Rnorvegicus.UCSC.rn5
    Rat genome
    |
    | organism: Rattus norvegicus (Rat)
    | provider: UCSC
    | provider version: rn5
    | release date: Mar. 2012
    | release name: RGSC 5.0
    |
    | single sequences (see '?seqnames'):
    |   chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
    |   chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chrX chrM
    |
    | multiple sequences (see '?mseqnames'):
    |   random chrUn upstream1000 upstream2000 upstream5000
    |
    | (use the '$' or '[[' operator to access a given sequence)

    > mseqnames(BSgenome.Rnorvegicus.UCSC.rn5)
    [1] "random" "chrUn" "upstream1000" "upstream2000" "upstream5000"

    > BSgenome.Rnorvegicus.UCSC.rn5$random
      A DNAStringSet instance of length 1278
             width seq                                         names
       [1]    1013 AGTCTTGAACTCTTCTTCGTT...GGAATGCTACAACCTAGAAAT chr10_AABR0611010...
       [2]    1765 CAGGAGTAAAGTCTTCTGAAC...ACTAAATCCCCAACCCCGGTG chr10_JH620367_ra...
       [3]     780 AGCACACAATCTGGGAGAATA...GTTCAGAAGACTTTACCCCGG chr10_AABR0611010...
       [4]    4563 AAGACTGGAGAGATGGCTCAG...TGCTCGCCAGCTCGAGCTGGA chr10_AABR0611010...
       [5]    2250 TTGTTAGAGGTGGAGTTATGT...ATCAAAAGTTTAAGATTACCA chr10_AABR0611010...
       ...     ... ...
    [1274]   15658 GGATAGTAAGTATAGAAGAGA...TAGGAACACAACTTTGAAGAA chrX_JH620457_random
    [1275]     548 ATCAGATAGGTTTAATGCAGA...GAAATCAGGGACTAGACAAGG chrX_AABR06110855...
    [1276]    1012 AAAGATATTCTGTAATTTGGT...AGCCATCAGGTTGTCTCTGGA chrX_AABR06110856...
    [1277]    2100 TTCTCATGACAAATTTGCTTT...AGAAGGACAGGCAACCCTTTC chrX_JH620458_random
    [1278]    5429 TTCTCTTGGGAAATTTTAGCT...TAGAGTGTTATTCCCTTCCCG chrX_JH620459_random

    > table(substr(names(BSgenome.Rnorvegicus.UCSC.rn5$random), 1, 5))

    chr1_ chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2_ chr20
       81    47    54    35    50    41    82    43    45    34    22   123    34
    chr3_ chr4_ chr5_ chr6_ chr7_ chr8_ chr9_ chrX_
       65    78    78    40    99    50    36   141

With most other BSgenome packages, the random sequences are in the "single sequence" section because there are generally very few of them (< 10 or 15). For rn5 I put them in the "multiple sequence" section to avoid cluttering the "single sequence" section.

The "multiple sequence" section was originally introduced to the BSgenome packages for hosting the upstream sequences. Then I started to use it to store other groups of sequences that are not the main top-level sequences of the assembly. However this division between "single sequence" and "multiple sequence" sections is kind of arbitrary and can be confusing. Also now that the upstream sequences are deprecated, there is no strong use case anymore for the "multiple sequence" section. So my plan is to get rid of it during the next devel cycle (in BioC 2.15).

Cheers,
H.
written 3.8 years ago by Hervé Pagès
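The sessionInfo() output quoted in this thread lists BSgenome.Rnorvegicus.UCSC.rn5_1.3.17 and QuasR_1.3.12, while the replies name 1.3.99 as the BSgenome data package version in BioC devel and QuasR >= 1.3.9 as the release adapted to the new BSgenome packages. A minimal sketch of that version check in base R follows. It uses only the version numbers quoted in the thread, not values queried from an installation, and the variable names are illustrative.

```r
# Version numbers as quoted in the thread (not read from a live installation).
installed_bsgenome <- package_version("1.3.17")  # from the sessionInfo() output
devel_bsgenome     <- package_version("1.3.99")  # version reported for BioC devel
installed_quasr    <- package_version("1.3.12")  # from the sessionInfo() output
required_quasr     <- package_version("1.3.9")   # first QuasR adapted to new BSgenome

# package_version objects are compared component by component,
# so 1.3.12 counts as newer than 1.3.9 (12 > 9 in the third component).
bsgenome_outdated <- installed_bsgenome < devel_bsgenome
quasr_new_enough  <- installed_quasr >= required_quasr

print(bsgenome_outdated)
print(quasr_new_enough)
```

On a real installation, packageVersion("QuasR") returns the same kind of object, so the installed version can be compared in the same way instead of being typed in by hand.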
Thanks very much for this info, gentlemen!
Forcing to restrict the seq levels of the TxDb seems indeed to do the trick (at least as far as I can judge now).

Regards,
Guido

-----Original Message-----
From: Hervé Pagès [mailto:hpages@fhcrc.org]
Sent: Wednesday, March 12, 2014 19:03
To: Hans-Rudolf Hotz; Hooiveld, Guido; 'Michael Stadler'; bioconductor at r-project.org
Subject: Re: [BioC] QuasR: problem accessing BSgenome.Rnorvegicus.UCSC.rn5

Hi Guido, Hans-Rudolf,

FWIW the "random" sequences are actually part of the BSgenome package for rn5, but they are in the "multiple sequence" section:

> library(BSgenome.Rnorvegicus.UCSC.rn5)
> BSgenome.Rnorvegicus.UCSC.rn5
Rat genome
|
| organism: Rattus norvegicus (Rat)
| provider: UCSC
| provider version: rn5
| release date: Mar. 2012
| release name: RGSC 5.0
|
| single sequences (see '?seqnames'):
|   chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
|   chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chrX chrM
|
| multiple sequences (see '?mseqnames'):
|   random chrUn upstream1000 upstream2000 upstream5000
|
| (use the '$' or '[[' operator to access a given sequence)

> mseqnames(BSgenome.Rnorvegicus.UCSC.rn5)
[1] "random" "chrUn" "upstream1000" "upstream2000" "upstream5000"

> BSgenome.Rnorvegicus.UCSC.rn5$random
A DNAStringSet instance of length 1278
       width seq                                           names
   [1]  1013 AGTCTTGAACTCTTCTTCGTT...GGAATGCTACAACCTAGAAAT chr10_AABR0611010...
   [2]  1765 CAGGAGTAAAGTCTTCTGAAC...ACTAAATCCCCAACCCCGGTG chr10_JH620367_ra...
   [3]   780 AGCACACAATCTGGGAGAATA...GTTCAGAAGACTTTACCCCGG chr10_AABR0611010...
   [4]  4563 AAGACTGGAGAGATGGCTCAG...TGCTCGCCAGCTCGAGCTGGA chr10_AABR0611010...
   [5]  2250 TTGTTAGAGGTGGAGTTATGT...ATCAAAAGTTTAAGATTACCA chr10_AABR0611010...
   ...   ... ...
[1274] 15658 GGATAGTAAGTATAGAAGAGA...TAGGAACACAACTTTGAAGAA chrX_JH620457_random
[1275]   548 ATCAGATAGGTTTAATGCAGA...GAAATCAGGGACTAGACAAGG chrX_AABR06110855...
[1276]  1012 AAAGATATTCTGTAATTTGGT...AGCCATCAGGTTGTCTCTGGA chrX_AABR06110856...
[1277]  2100 TTCTCATGACAAATTTGCTTT...AGAAGGACAGGCAACCCTTTC chrX_JH620458_random
[1278]  5429 TTCTCTTGGGAAATTTTAGCT...TAGAGTGTTATTCCCTTCCCG chrX_JH620459_random

> table(substr(names(BSgenome.Rnorvegicus.UCSC.rn5$random), 1, 5))

chr1_ chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2_ chr20
   81    47    54    35    50    41    82    43    45    34    22   123    34
chr3_ chr4_ chr5_ chr6_ chr7_ chr8_ chr9_ chrX_
   65    78    78    40    99    50    36   141

With most other BSgenome packages, the random sequences are in the "single sequence" section because there are generally very few of them (< 10 or 15). For rn5 I put them in the "multiple sequence" section to avoid cluttering the "single sequence" section.

The "multiple sequence" section was originally introduced to the BSgenome packages for hosting the upstream sequences. Then I started to use it to store other groups of sequences that are not the main top-level sequences of the assembly. However this division between "single sequence" and "multiple sequence" sections is kind of arbitrary and can be confusing. Also now that the upstream sequences are deprecated, there is no strong use case anymore for the "multiple sequence" section. So my plan is to get rid of it during the next devel cycle (in BioC 2.15).

Cheers,
H.

On 03/12/2014 05:39 AM, Hans-Rudolf Hotz wrote:
> Hi Guido
>
> Michael is on holiday this week. In the meantime, I will try to reply
> to my best knowledge - I am sure, Michael will reply with a better
> answer, once he is back.
>
> We hardly work with rat, so I just downloaded the corresponding
> BSgenome and TxDb packages (from devel). And indeed there is a mismatch
> wrt the seqlevels of the BSgenome and TxDb files:
>
> > seqlevels(Rnorvegicus)
> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9"
> [10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
> [19] "chr19" "chr20" "chrX" "chrM"
> > seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene)[1:30]
> [1] "chr1" "chr2"
> [3] "chr3" "chr4"
> [5] "chr5" "chr6"
> [7] "chr7" "chr8"
> [9] "chr9" "chr10"
> [11] "chr11" "chr12"
> [13] "chr13" "chr14"
> [15] "chr15" "chr16"
> [17] "chr17" "chr18"
> [19] "chr19" "chr20"
> [21] "chrX" "chrM"
> [23] "chr1_AABR06109291_random" "chr1_AABR06109292_random"
> [25] "chr1_AABR06109293_random" "chr1_AABR06109294_random"
> [27] "chr1_AABR06109295_random" "chr1_AABR06109296_random"
> [29] "chr1_AABR06109297_random" "chr1_AABR06109298_random"
>
> As a quick fix, I recommend to restrict the seq levels of the TxDb, eg:
>
> > seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene, force=TRUE) <- seqlevels(Rnorvegicus)
>
> Hope this helps,
> Regards, Hans-Rudolf
>
> > sessionInfo()
> R Under development (unstable) (2014-01-17 r64817)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] TxDb.Rnorvegicus.UCSC.rn5.refGene_2.10.1
> [2] GenomicFeatures_1.15.9
> [3] AnnotationDbi_1.25.14
> [4] Biobase_2.23.6
> [5] BSgenome.Rnorvegicus.UCSC.rn5_1.3.99
> [6] BSgenome_1.31.12
> [7] Biostrings_2.31.14
> [8] XVector_0.3.7
> [9] GenomicRanges_1.15.38
> [10] GenomeInfoDb_0.99.19
> [11] IRanges_1.21.34
> [12] BiocGenerics_0.9.3
> [13] BiocInstaller_1.13.3
>
> loaded via a namespace (and not attached):
> [1] BatchJobs_1.2 BBmisc_1.5
> [3] BiocParallel_0.5.17 biomaRt_2.19.3
> [5] bitops_1.0-6 brew_1.0-6
> [7] codetools_0.2-8 DBI_0.2-7
> [9] digest_0.6.4 fail_1.2
> [11] foreach_1.4.1 GenomicAlignments_0.99.32
> [13] iterators_1.0.6 plyr_1.8.1
> [15] Rcpp_0.11.0 RCurl_1.95-4.1
> [17] Rsamtools_1.15.33 RSQLite_0.11.4
> [19] rtracklayer_1.23.16 sendmailR_1.1-2
> [21] stats4_3.1.0 stringr_0.6.2
> [23] tools_3.1.0 XML_3.98-1.1
> [25] zlibbioc_1.9.0
>
> On 03/12/2014 12:20 PM, Hooiveld, Guido wrote:
>> Dear Michael,
>> Sorry to bother you with this, but I face a problem using QuasR which
>> I can't solve:
>> I would like to summarize my reads into a count table, but I got
>> stuck... An error is thrown that some queries cannot be found.
>> I generated my project essentially as described in the vignette using
>> the unmasked BS.genome file in R-dev, and then would like to annotate
>> it using BioC's rat TxDb. Could this be due to a mismatch between the
>> content of the BSgenome and TxDb files? (the information content of
>> the former is dated later than the latter in R-dev)?
>> Any suggestion would be appreciated!
>>
>> Thanks,
>> Guido
>>
>> sampleFile <- "samples_GH2.txt"
>> genomeFile <- "BSgenome.Rnorvegicus.UCSC.rn5"
>> proj2 <- qAlign(sampleFile=sampleFile, genome=genomeFile)
>>
>> > geneLevels <- qCount(proj2, TxDb.Rnorvegicus.UCSC.rn5.refGene,reportLevel="gene")
>> Error in qCount(proj2, TxDb.Rnorvegicus.UCSC.rn5.refGene, reportLevel = "gene") :
>> sequence levels in 'query' not found in alignment files:
>> chr1_AABR06109291_random, chr1_AABR06109292_random,
>> chr1_AABR06109293_random, chr1_AABR06109294_random,
>> chr1_AABR06109295_random, chr1_AABR06109296_random,
>> chr1_AABR06109297_random, chr1_AABR06109298_random,
>> chr1_AABR06109299_random, chr1_AABR06109300_random,
>> chr1_AABR06109301_random, chr1_AABR06109302_random,
>> chr1_AABR06109303_random, chr1_AABR06109307_random,
>> chr1_AABR06109308_random, chr1_AABR06109309_random,
>> chr1_AABR06109310_random, chr1_AABR06109311_random,
>> chr1_AABR06109312_random, chr1_AABR06109313_random,
>> chr1_AABR06109314_random, chr1_AABR06109315_random,
>> chr1_AABR06109316_random, chr1_AABR06109317_random,
>> chr1_AABR06109322_random, chr1_AABR06109323_random,
>> chr1_AABR06109324_random, chr1_AABR06109325_random,
>> chr1_AABR06109331_random, chr1_AABR06109332_random,
>> chr1_AABR06109333_random, chr1_AABR06109334_random,
>> chr1_AABR06109335_random, chr1_AABR06109336_random,
>> chr1_AABR06109337_random, chr1_AABR06109340_rando
>>
>> > sessionInfo()
>> R Under development (unstable) (2013-11-19 r64265)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] TxDb.Rnorvegicus.UCSC.rn5.refGene_2.10.1 BiocInstaller_1.13.3
>> [3] QuasR_1.3.13 Rbowtie_1.3.1
>> [5] rtracklayer_1.23.15 GenomicFeatures_1.15.9
>> [7] AnnotationDbi_1.25.14 Biobase_2.23.6
>> [9] GenomicRanges_1.15.38 GenomeInfoDb_0.99.19
>> [11] IRanges_1.21.34 BiocGenerics_0.9.3
>>
>> loaded via a namespace (and not attached):
>> [1] BatchJobs_1.2 BBmisc_1.5 BiocParallel_0.5.8
>> [4] biomaRt_2.19.3 Biostrings_2.31.14 bitops_1.0-6
>> [7] brew_1.0-6 BSgenome_1.31.12 codetools_0.2-8
>> [10] DBI_0.2-7 digest_0.6.4 fail_1.2
>> [13] foreach_1.4.1 GenomicAlignments_0.99.32 grid_3.1.0
>> [16] hwriter_1.3 iterators_1.0.6 lattice_0.20-24
>> [19] latticeExtra_0.6-26 plyr_1.8.1 RColorBrewer_1.0-5
>> [22] Rcpp_0.11.0 RCurl_1.95-4.1 Rsamtools_1.15.33
>> [25] RSQLite_0.11.4 sendmailR_1.1-2 ShortRead_1.21.16
>> [28] stats4_3.1.0 stringr_0.6.2 tools_3.1.0
>> [31] XML_3.98-1.1 XVector_0.3.7 zlibbioc_1.9.0
>>
>> -----Original Message-----
>> From: bioconductor-bounces at r-project.org
>> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Michael Stadler
>> Sent: Tuesday, March 04, 2014 13:20
>> To: bioconductor at r-project.org
>> Subject: Re: [BioC] QuasR: problem accessing BSgenome.Rnorvegicus.UCSC.rn5
>>
>> Hi Guido and Herve,
>>
>> You were both spot on. In the development version 1.3.9 of QuasR, we
>> adapted to the new (BioC 2.14) type of BSgenome packages, so QuasR >=
>> 1.3.9 only works with these.
>>
>> One clarification regarding the treatment of masks in QuasR:
>>
>> - QuasR <= 1.2.x has ignored masks in BSgenome packages
>> during alignment
>>
>> - QuasR >= 1.3.9 now handles BSgenome objects with or without masks,
>> so that the following statement:
>>
>> qAlign(..., genome="BSgenome.Rnorvegicus.UCSC.rn5")
>>
>> is equivalent to the old behaviour (no masking), but the statement:
>>
>> qAlign(..., genome="BSgenome.Rnorvegicus.UCSC.rn5.masked")
>>
>> now aligns against a masked genome.
>>
>> I hope this helps.
>>
>> Cheers,
>> Michael
>>
>> On 03.03.2014 14:26, Hooiveld, Guido wrote:
>>> Hi Herve,
>>> Good point.
>>> I checked and version 1.3.17 was installed because that (still) is
>>> the latest (binary) version of the package available for Windows. I
>>> meanwhile re-installed the BSgenome package from source, and now
>>> QuasR is working on my Win7 machine as it should be (thus with
>>> v1.3.99). Based on your comments I am currently using the masked
>>> file, because that is the equivalent of the old file.
>>>
>>> Thanks again,
>>> Guido
>>>
>>> -----Original Message-----
>>> From: Hervé Pagès [mailto:hpages at fhcrc.org]
>>> Sent: Sunday, March 02, 2014 02:57
>>> To: Hooiveld, Guido; bioconductor at r-project.org
>>> Subject: Re: [BioC] QuasR: problem accessing BSgenome.Rnorvegicus.UCSC.rn5
>>>
>>> Hi Guido,
>>>
>>> When using BioC devel, things can move fast so it's important that
>>> you update your packages often (with biocLite()) in order to keep
>>> everything in sync. In your case it looks like the version of the
>>> BSgenome package you have (1.3.17) is lagging behind the version
>>> currently in BioC devel (1.3.99).
>>>
>>> Note that starting with BioC 2.14 (which will be released in April,
>>> but corresponds to BioC devel at the moment), many BSgenome packages
>>> exist in 2 flavors: raw genome or masked genome. For example, for
>>> rn5, there is now
>>>
>>> BSgenome.Rnorvegicus.UCSC.rn5 raw genome
>>> BSgenome.Rnorvegicus.UCSC.rn5.masked masked genome
>>>
>>> BSgenome.Rnorvegicus.UCSC.rn5.masked is equivalent to the old
>>> BSgenome.Rnorvegicus.UCSC.rn5 in BioC <= 2.13 which was already
>>> masked. However, in BioC <= 2.13, there was no non-masked version of
>>> rn5. See announcement here for more details:
>>>
>>> https://stat.ethz.ch/pipermail/bioc-devel/2014-January/005150.html
>>>
>>> I don't know if QuasR cares about the masks though. Maybe they're
>>> just ignored, in which case I guess you could just stick to
>>> BSgenome.Rnorvegicus.UCSC.rn5.
>>>
>>> Cheers,
>>> H.
written 3.8 years ago by Guido Hooiveld
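The fix confirmed in this reply restricts the TxDb to the sequence levels that the BSgenome object exposes. The set logic behind it can be sketched in base R with plain character vectors. The vectors below are copied from the seqlevels() output quoted in the thread (only the first eight random contigs are included) and the variable names are illustrative. The sketch does not call QuasR, GenomicFeatures or BSgenome.

```r
# Top-level sequences reported by seqlevels(Rnorvegicus) in the thread.
genome_levels <- c(paste0("chr", 1:20), "chrX", "chrM")

# First 30 entries reported by seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene):
# the same 22 chromosomes followed by chr1_AABR06109291_random .. 98_random.
txdb_levels <- c(genome_levels,
                 paste0("chr1_AABR0610929", 1:8, "_random"))

# Levels the annotation asks for but the alignments cannot provide;
# these are the names listed in the qCount() error message.
unmatched <- setdiff(txdb_levels, genome_levels)

# Levels that are safe to keep: the annotation restricted to the genome.
shared <- intersect(txdb_levels, genome_levels)

print(unmatched)
print(length(shared))
```

With the Bioconductor objects themselves, the corresponding step is the assignment Hans-Rudolf suggested in the thread: seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene, force=TRUE) <- seqlevels(Rnorvegicus).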