MEDIPS.createSet error
1
0
Entering edit mode
Vining, Kelly ▴ 220
@vining-kelly-5029
Last seen 8.2 years ago
Hello, I'm coming back to a MEDIPS project after a long pause, and am having trouble reading in my .bam files. I'm using a custom reference genome that Lukas Chavez kindly helped me to make into a BSgenome object. I'm not sure how to address the error message below. I hope it doesn't mean there is something wrong with my reference genome... > budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) Reading bam alignment FallBud_brep1_aln_sorted.bam Total number of imported short reads: 21038273 Extending reads... Creating GRange Object... Extract unique regions... Number of unique short reads: 17855868 Calculating genomic coordinates...Error in vector(length = supersize_chr[length(chromosomes)], mode = "character") : vector size cannot be NA/NaN Thanks for help, Kelly V.
0
Entering edit mode
Lukas Chavez ▴ 570
@lukas-chavez-5781
Last seen 4.8 years ago
USA/La Jolla/UCSD
Dear Kelly, this error typically occurs when the chromosome names in your bam file do not entirely match the chromosome names in your BSgenome object. I suppose that in your bam file there are reads matching to a chromosome with a name that cannot be found in the BSgenome object. Lukas On Fri, Mar 28, 2014 at 10:42 AM, Vining, Kelly < Kelly.Vining@oregonstate.edu> wrote: > Hello, > I'm coming back to a MEDIPS project after a long pause, and am having > trouble reading in my .bam files. I'm using a custom reference genome that > Lukas Chavez kindly helped me to make into a BSgenome object. I'm not sure > how to address the error message below. I hope it doesn't mean there is > something wrong with my reference genome... > > > budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", > BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) > Reading bam alignment FallBud_brep1_aln_sorted.bam > Total number of imported short reads: 21038273 > Extending reads... > Creating GRange Object... > Extract unique regions... > Number of unique short reads: 17855868 > Calculating genomic coordinates...Error in vector(length = > supersize_chr[length(chromosomes)], mode = "character") : > vector size cannot be NA/NaN > > Thanks for help, > Kelly V. > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
0
Entering edit mode
Hi Lukas, Well, I'm basically back to the original error I had when I started this work back in December. At that point, I'd decided that my bam files must have had chromosome or scaffold names that didn't exist in the reference genome. So I went back to the original data set and redid all of the alignments to make sure the version of the reference genome I was using was the same. It seems that I something is still not matching up. How can I compare the set of chromosome names between the bam file and the reference genome? > budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) Reading bam alignment FallBud_brep1_aln_sorted.bam Total number of imported short reads: 21038273 Extending reads... Creating GRange Object... Extract unique regions... Number of unique short reads: 17855868 Error in as.environment(pos) : no item called "paste("package:", BSgenome, sep = "")" on the search list In addition: Warning message: In ls(paste("package:", BSgenome, sep = "")) : âpaste("package:", BSgenome, sep = "")â converted to character string Thanks, ________________________________ From: Lukas Chavez [lukas.chavez.mailings@googlemail.com] Sent: Friday, March 28, 2014 10:58 AM To: Vining, Kelly Cc: bioconductor@r-project.org Subject: Re: [BioC] MEDIPS.createSet error Dear Kelly, this error typically occurs when the chromosome names in your bam file do not entirely match the chromosome names in your BSgenome object. I suppose that in your bam file there are reads matching to a chromosome with a name that cannot be found in the BSgenome object. Lukas On Fri, Mar 28, 2014 at 10:42 AM, Vining, Kelly <kelly.vining@oregonstate.edu<mailto:kelly.vining@oregonstate.edu>> wrote: Hello, I'm coming back to a MEDIPS project after a long pause, and am having trouble reading in my .bam files. I'm using a custom reference genome that Lukas Chavez kindly helped me to make into a BSgenome object. I'm not sure how to address the error message below. I hope it doesn't mean there is something wrong with my reference genome... > budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) Reading bam alignment FallBud_brep1_aln_sorted.bam Total number of imported short reads: 21038273 Extending reads... Creating GRange Object... Extract unique regions... Number of unique short reads: 17855868 Calculating genomic coordinates...Error in vector(length = supersize_chr[length(chromosomes)], mode = "character") : vector size cannot be NA/NaN Thanks for help, Kelly V. _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
0
Entering edit mode
Hi, On Mon, Mar 31, 2014 at 12:01 PM, Vining, Kelly <kelly.vining at="" oregonstate.edu=""> wrote: > Hi Lukas, > Well, I'm basically back to the original error I had when I started this work back in December. At that point, I'd decided that my bam files must have had chromosome or scaffold names that didn't exist in the reference genome. So I went back to the original data set and redid all of the alignments to make sure the version of the reference genome I was using was the same. It seems that I something is still not matching up. > > How can I compare the set of chromosome names between the bam file and the reference genome? > >> budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) > Reading bam alignment FallBud_brep1_aln_sorted.bam > Total number of imported short reads: 21038273 > Extending reads... > Creating GRange Object... > Extract unique regions... > Number of unique short reads: 17855868 > Error in as.environment(pos) : > no item called "paste("package:", BSgenome, sep = "")" on the search list > In addition: Warning message: > In ls(paste("package:", BSgenome, sep = "")) : > ?paste("package:", BSgenome, sep = "")? converted to character string I'm not so sure that's what this error is telling you. First check if "BSgenome" is a real thing in your workspace -- you are passing some variable named "BSgenome" as the BSgenome parameter to the createSet function. Skimming the vignette, it looks your BSgenoome variable should be set to the name of a BSgenome package to use/load. So, in the same R session where this call is failing, what is the output of print(BSgenome). Note how the vignette has this line: BSgenome="BSgenome.Hsapiens.UCSC.hg19" Which uses the hg19 genome as the reference. Anyway, if BSgenome is set correctly for you, try to load it first to make sure it is installed, eg: R> library(BSgenome, character.only=TRUE) Finally, the BSgenome that you load and the BAM file you are parsing need concordant names for their chromosomes. You can check the chromosome info in each by invoking the seqinfo method. If I was using the "BSgenome.Hsapiens.UCSC.hg19" package, I'd do: R> library(Rsamtools) R> library(BSgenome.Hsapiens.UCSC.hg19) R> si.bsg <- seqinfo(BSgenome.Hsapiens.UCSC.hg19) R> si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) Now look to see that chromosomes (seqnames) in si.bsg and si.bam are concordant. HTH, -steve -- Steve Lianoglou Computational Biologist Genentech
0
Entering edit mode
Hi, Thanks for the advice thus far! To confirm what is in my BSgenome variable, I did this: > class(BSgenome) [1] "BSgenome" attr(,"package") [1] "BSgenome" > names(BSgenome) [1] "Chr01" "Chr02" "Chr03" "Chr04" "Chr05" "Chr06" "Chr07" "Chr08" "Chr09" [10] "Chr10" "Chr11" "Chr12" "Chr13" "Chr14" "Chr15" "Chr16" "Chr17" "Chr18" [19] "Chr19" "scaf" And then: > print(BSgenome) Black cottonwood genome | | organism: Populus trichocarpa (Black cottonwood) | provider: Phytozome (JGI) | provider version: 3.0 | release date: January 2010 | release name: Populus trichocarpa v3.0 | | single sequences (see '?seqnames'): | Chr01 Chr02 Chr03 Chr04 Chr05 Chr06 Chr07 Chr08 Chr09 Chr10 Chr11 | Chr12 Chr13 Chr14 Chr15 Chr16 Chr17 Chr18 Chr19 | | multiple sequences (see '?mseqnames'): | scaf | | (use the '' or '[[' operator to access a given sequence) So that looks ok. Interestingly, when I followed the vignette and did the equivalent of BSgenome="BSgenome.Hsapiens.UCSC.hg19" That didn't work for me. It only worked without quotes. If I included quotes, it just assigned a character vector to that variable. Then, following your advice: > si.bsg <- seqinfo(BSgenome.Ptrichocarpa.Phytozome.v3) > si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) Error in seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) : error in evaluating the argument 'x' in selecting a method for function 'seqinfo': Error: could not find function "BamFile" Hmmm.... Continuing down the rabbit hole... --Kelly ________________________________________ From: mailinglist.honeypot@gmail.com [mailinglist.honeypot@gmail.com] on behalf of Steve Lianoglou [lianoglou.steve@gene.com] Sent: Monday, March 31, 2014 12:20 PM To: Vining, Kelly Cc: Lukas Chavez; bioconductor at r-project.org Subject: Re: [BioC] MEDIPS.createSet error Hi, On Mon, Mar 31, 2014 at 12:01 PM, Vining, Kelly <kelly.vining at="" oregonstate.edu=""> wrote: > Hi Lukas, > Well, I'm basically back to the original error I had when I started this work back in December. At that point, I'd decided that my bam files must have had chromosome or scaffold names that didn't exist in the reference genome. So I went back to the original data set and redid all of the alignments to make sure the version of the reference genome I was using was the same. It seems that I something is still not matching up. > > How can I compare the set of chromosome names between the bam file and the reference genome? > >> budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) > Reading bam alignment FallBud_brep1_aln_sorted.bam > Total number of imported short reads: 21038273 > Extending reads... > Creating GRange Object... > Extract unique regions... > Number of unique short reads: 17855868 > Error in as.environment(pos) : > no item called "paste("package:", BSgenome, sep = "")" on the search list > In addition: Warning message: > In ls(paste("package:", BSgenome, sep = "")) : > ?paste("package:", BSgenome, sep = "")? converted to character string I'm not so sure that's what this error is telling you. First check if "BSgenome" is a real thing in your workspace -- you are passing some variable named "BSgenome" as the BSgenome parameter to the createSet function. Skimming the vignette, it looks your BSgenoome variable should be set to the name of a BSgenome package to use/load. So, in the same R session where this call is failing, what is the output of print(BSgenome). Note how the vignette has this line: BSgenome="BSgenome.Hsapiens.UCSC.hg19" Which uses the hg19 genome as the reference. Anyway, if BSgenome is set correctly for you, try to load it first to make sure it is installed, eg: R> library(BSgenome, character.only=TRUE) Finally, the BSgenome that you load and the BAM file you are parsing need concordant names for their chromosomes. You can check the chromosome info in each by invoking the seqinfo method. If I was using the "BSgenome.Hsapiens.UCSC.hg19" package, I'd do: R> library(Rsamtools) R> library(BSgenome.Hsapiens.UCSC.hg19) R> si.bsg <- seqinfo(BSgenome.Hsapiens.UCSC.hg19) R> si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) Now look to see that chromosomes (seqnames) in si.bsg and si.bam are concordant. HTH, -steve -- Steve Lianoglou Computational Biologist Genentech ADD REPLY 0 Entering edit mode Hi, Caveat being that I've never used MEDIPS, and I'm just going along with the vignette. So, comments inline: On 31 Mar 2014, at 13:09, Vining, Kelly wrote: > Hi, > Thanks for the advice thus far! To confirm what is in my BSgenome > variable, I did this: > >> class(BSgenome) > [1] "BSgenome" > attr(,"package") > [1] "BSgenome" >> names(BSgenome) > [1] "Chr01" "Chr02" "Chr03" "Chr04" "Chr05" "Chr06" "Chr07" "Chr08" > "Chr09" > [10] "Chr10" "Chr11" "Chr12" "Chr13" "Chr14" "Chr15" "Chr16" "Chr17" > "Chr18" > [19] "Chr19" "scaf" > > And then: >> print(BSgenome) > Black cottonwood genome > | > | organism: Populus trichocarpa (Black cottonwood) > | provider: Phytozome (JGI) > | provider version: 3.0 > | release date: January 2010 > | release name: Populus trichocarpa v3.0 > | > | single sequences (see '?seqnames'): > | Chr01 Chr02 Chr03 Chr04 Chr05 Chr06 Chr07 Chr08 Chr09 > Chr10 Chr11 > | Chr12 Chr13 Chr14 Chr15 Chr16 Chr17 Chr18 Chr19 > | > | multiple sequences (see '?mseqnames'): > | scaf > | > | (use the '' or '[[' operator to access a given sequence) > > > So that looks ok. Interestingly, when I followed the vignette and did > the equivalent of > BSgenome="BSgenome.Hsapiens.UCSC.hg19" The vignette suggests that BSgenome should be the package name of the BSgenome package to open, so yours should be something like "BSgenome.Ptrichocarpa.XXX.YY" or something -- I guess you built this packge by yourself, or something, so you'd know its name ... > That didn't work for me. It only worked without quotes. If I included > quotes, it just assigned a character vector to that variable. Sorry, what exactly didn't work for you? Can you show me the code that failed? > Then, following your advice: > >> si.bsg <- seqinfo(BSgenome.Ptrichocarpa.Phytozome.v3) >> si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) > Error in seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) : > error in evaluating the argument 'x' in selecting a method for > function 'seqinfo': Error: could not find function "BamFile" The BamFile function is defined in the Rsamtools package, you need to load that first (the first line of code I suggested you run was to load the Rsamtools package). Load it first, then redo the seqinfo(BamFile(...)) stuff. -steve