MEDIPS.createSet error

0

Entering edit mode

Vining, Kelly ▴ 220

@vining-kelly-5029

Last seen 9.6 years ago

Hello, I'm coming back to a MEDIPS project after a long pause, and am having trouble reading in my .bam files. I'm using a custom reference genome that Lukas Chavez kindly helped me to make into a BSgenome object. I'm not sure how to address the error message below. I hope it doesn't mean there is something wrong with my reference genome... > budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) Reading bam alignment FallBud_brep1_aln_sorted.bam Total number of imported short reads: 21038273 Extending reads... Creating GRange Object... Extract unique regions... Number of unique short reads: 17855868 Calculating genomic coordinates...Error in vector(length = supersize_chr[length(chromosomes)], mode = "character") : vector size cannot be NA/NaN Thanks for help, Kelly V.

Alignment BSgenome BSgenome MEDIPS Alignment BSgenome BSgenome MEDIPS • 2.1k views

ADD COMMENT • link updated 10.1 years ago by Lukas Chavez ▴ 570 • written 10.1 years ago by Vining, Kelly ▴ 220

0

Entering edit mode

Lukas Chavez ▴ 570

@lukas-chavez-5781

Last seen 6.2 years ago

USA/La Jolla/UCSD

Dear Kelly, this error typically occurs when the chromosome names in your bam file do not entirely match the chromosome names in your BSgenome object. I suppose that in your bam file there are reads matching to a chromosome with a name that cannot be found in the BSgenome object. Lukas On Fri, Mar 28, 2014 at 10:42 AM, Vining, Kelly < Kelly.Vining@oregonstate.edu> wrote: > Hello, > I'm coming back to a MEDIPS project after a long pause, and am having > trouble reading in my .bam files. I'm using a custom reference genome that > Lukas Chavez kindly helped me to make into a BSgenome object. I'm not sure > how to address the error message below. I hope it doesn't mean there is > something wrong with my reference genome... > > > budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", > BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) > Reading bam alignment FallBud_brep1_aln_sorted.bam > Total number of imported short reads: 21038273 > Extending reads... > Creating GRange Object... > Extract unique regions... > Number of unique short reads: 17855868 > Calculating genomic coordinates...Error in vector(length = > supersize_chr[length(chromosomes)], mode = "character") : > vector size cannot be NA/NaN > > Thanks for help, > Kelly V. > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 10.1 years ago Lukas Chavez ▴ 570

0

Entering edit mode

Hi Lukas, Well, I'm basically back to the original error I had when I started this work back in December. At that point, I'd decided that my bam files must have had chromosome or scaffold names that didn't exist in the reference genome. So I went back to the original data set and redid all of the alignments to make sure the version of the reference genome I was using was the same. It seems that I something is still not matching up. How can I compare the set of chromosome names between the bam file and the reference genome? > budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) Reading bam alignment FallBud_brep1_aln_sorted.bam Total number of imported short reads: 21038273 Extending reads... Creating GRange Object... Extract unique regions... Number of unique short reads: 17855868 Error in as.environment(pos) : no item called "paste("package:", BSgenome, sep = "")" on the search list In addition: Warning message: In ls(paste("package:", BSgenome, sep = "")) : âpaste("package:", BSgenome, sep = "")â converted to character string Thanks, ________________________________ From: Lukas Chavez [lukas.chavez.mailings@googlemail.com] Sent: Friday, March 28, 2014 10:58 AM To: Vining, Kelly Cc: bioconductor@r-project.org Subject: Re: [BioC] MEDIPS.createSet error Dear Kelly, this error typically occurs when the chromosome names in your bam file do not entirely match the chromosome names in your BSgenome object. I suppose that in your bam file there are reads matching to a chromosome with a name that cannot be found in the BSgenome object. Lukas On Fri, Mar 28, 2014 at 10:42 AM, Vining, Kelly <kelly.vining@oregonstate.edu<mailto:kelly.vining@oregonstate.edu>> wrote: Hello, I'm coming back to a MEDIPS project after a long pause, and am having trouble reading in my .bam files. I'm using a custom reference genome that Lukas Chavez kindly helped me to make into a BSgenome object. I'm not sure how to address the error message below. I hope it doesn't mean there is something wrong with my reference genome... > budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) Reading bam alignment FallBud_brep1_aln_sorted.bam Total number of imported short reads: 21038273 Extending reads... Creating GRange Object... Extract unique regions... Number of unique short reads: 17855868 Calculating genomic coordinates...Error in vector(length = supersize_chr[length(chromosomes)], mode = "character") : vector size cannot be NA/NaN Thanks for help, Kelly V. _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD REPLY • link 10.1 years ago Vining, Kelly ▴ 220

0

Entering edit mode

Hi, On Mon, Mar 31, 2014 at 12:01 PM, Vining, Kelly <kelly.vining at="" oregonstate.edu=""> wrote: > Hi Lukas, > Well, I'm basically back to the original error I had when I started this work back in December. At that point, I'd decided that my bam files must have had chromosome or scaffold names that didn't exist in the reference genome. So I went back to the original data set and redid all of the alignments to make sure the version of the reference genome I was using was the same. It seems that I something is still not matching up. > > How can I compare the set of chromosome names between the bam file and the reference genome? > >> budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) > Reading bam alignment FallBud_brep1_aln_sorted.bam > Total number of imported short reads: 21038273 > Extending reads... > Creating GRange Object... > Extract unique regions... > Number of unique short reads: 17855868 > Error in as.environment(pos) : > no item called "paste("package:", BSgenome, sep = "")" on the search list > In addition: Warning message: > In ls(paste("package:", BSgenome, sep = "")) : > ?paste("package:", BSgenome, sep = "")? converted to character string I'm not so sure that's what this error is telling you. First check if "BSgenome" is a real thing in your workspace -- you are passing some variable named "BSgenome" as the BSgenome parameter to the createSet function. Skimming the vignette, it looks your BSgenoome variable should be set to the name of a BSgenome package to use/load. So, in the same R session where this call is failing, what is the output of `print(BSgenome)`. Note how the vignette has this line: BSgenome="BSgenome.Hsapiens.UCSC.hg19" Which uses the hg19 genome as the reference. Anyway, if BSgenome is set correctly for you, try to load it first to make sure it is installed, eg: R> library(BSgenome, character.only=TRUE) Finally, the BSgenome that you load and the BAM file you are parsing need concordant names for their chromosomes. You can check the chromosome info in each by invoking the `seqinfo` method. If I was using the "BSgenome.Hsapiens.UCSC.hg19" package, I'd do: R> library(Rsamtools) R> library(BSgenome.Hsapiens.UCSC.hg19) R> si.bsg <- seqinfo(BSgenome.Hsapiens.UCSC.hg19) R> si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) Now look to see that chromosomes (seqnames) in si.bsg and si.bam are concordant. HTH, -steve -- Steve Lianoglou Computational Biologist Genentech

ADD REPLY • link 10.1 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Hi, Thanks for the advice thus far! To confirm what is in my BSgenome variable, I did this: > class(BSgenome) [1] "BSgenome" attr(,"package") [1] "BSgenome" > names(BSgenome) [1] "Chr01" "Chr02" "Chr03" "Chr04" "Chr05" "Chr06" "Chr07" "Chr08" "Chr09" [10] "Chr10" "Chr11" "Chr12" "Chr13" "Chr14" "Chr15" "Chr16" "Chr17" "Chr18" [19] "Chr19" "scaf" And then: > print(BSgenome) Black cottonwood genome | | organism: Populus trichocarpa (Black cottonwood) | provider: Phytozome (JGI) | provider version: 3.0 | release date: January 2010 | release name: Populus trichocarpa v3.0 | | single sequences (see '?seqnames'): | Chr01 Chr02 Chr03 Chr04 Chr05 Chr06 Chr07 Chr08 Chr09 Chr10 Chr11 | Chr12 Chr13 Chr14 Chr15 Chr16 Chr17 Chr18 Chr19 | | multiple sequences (see '?mseqnames'): | scaf | | (use the '$' or '[[' operator to access a given sequence) So that looks ok. Interestingly, when I followed the vignette and did the equivalent of BSgenome="BSgenome.Hsapiens.UCSC.hg19" That didn't work for me. It only worked without quotes. If I included quotes, it just assigned a character vector to that variable. Then, following your advice: > si.bsg <- seqinfo(BSgenome.Ptrichocarpa.Phytozome.v3) > si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) Error in seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) : error in evaluating the argument 'x' in selecting a method for function 'seqinfo': Error: could not find function "BamFile" Hmmm.... Continuing down the rabbit hole... --Kelly ________________________________________ From: mailinglist.honeypot@gmail.com [mailinglist.honeypot@gmail.com] on behalf of Steve Lianoglou [lianoglou.steve@gene.com] Sent: Monday, March 31, 2014 12:20 PM To: Vining, Kelly Cc: Lukas Chavez; bioconductor at r-project.org Subject: Re: [BioC] MEDIPS.createSet error Hi, On Mon, Mar 31, 2014 at 12:01 PM, Vining, Kelly <kelly.vining at="" oregonstate.edu=""> wrote: > Hi Lukas, > Well, I'm basically back to the original error I had when I started this work back in December. At that point, I'd decided that my bam files must have had chromosome or scaffold names that didn't exist in the reference genome. So I went back to the original data set and redid all of the alignments to make sure the version of the reference genome I was using was the same. It seems that I something is still not matching up. > > How can I compare the set of chromosome names between the bam file and the reference genome? > >> budFall1_MEDIP = MEDIPS.createSet(file="FallBud_brep1_aln_sorted.bam", BSgenome=BSgenome, extend=300, uniq=TRUE, shift=0, window_size=1000) > Reading bam alignment FallBud_brep1_aln_sorted.bam > Total number of imported short reads: 21038273 > Extending reads... > Creating GRange Object... > Extract unique regions... > Number of unique short reads: 17855868 > Error in as.environment(pos) : > no item called "paste("package:", BSgenome, sep = "")" on the search list > In addition: Warning message: > In ls(paste("package:", BSgenome, sep = "")) : > ?paste("package:", BSgenome, sep = "")? converted to character string I'm not so sure that's what this error is telling you. First check if "BSgenome" is a real thing in your workspace -- you are passing some variable named "BSgenome" as the BSgenome parameter to the createSet function. Skimming the vignette, it looks your BSgenoome variable should be set to the name of a BSgenome package to use/load. So, in the same R session where this call is failing, what is the output of `print(BSgenome)`. Note how the vignette has this line: BSgenome="BSgenome.Hsapiens.UCSC.hg19" Which uses the hg19 genome as the reference. Anyway, if BSgenome is set correctly for you, try to load it first to make sure it is installed, eg: R> library(BSgenome, character.only=TRUE) Finally, the BSgenome that you load and the BAM file you are parsing need concordant names for their chromosomes. You can check the chromosome info in each by invoking the `seqinfo` method. If I was using the "BSgenome.Hsapiens.UCSC.hg19" package, I'd do: R> library(Rsamtools) R> library(BSgenome.Hsapiens.UCSC.hg19) R> si.bsg <- seqinfo(BSgenome.Hsapiens.UCSC.hg19) R> si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) Now look to see that chromosomes (seqnames) in si.bsg and si.bam are concordant. HTH, -steve -- Steve Lianoglou Computational Biologist Genentech

ADD REPLY • link 10.1 years ago Vining, Kelly ▴ 220

0

Entering edit mode

Hi, Caveat being that I've never used MEDIPS, and I'm just going along with the vignette. So, comments inline: On 31 Mar 2014, at 13:09, Vining, Kelly wrote: > Hi, > Thanks for the advice thus far! To confirm what is in my BSgenome > variable, I did this: > >> class(BSgenome) > [1] "BSgenome" > attr(,"package") > [1] "BSgenome" >> names(BSgenome) > [1] "Chr01" "Chr02" "Chr03" "Chr04" "Chr05" "Chr06" "Chr07" "Chr08" > "Chr09" > [10] "Chr10" "Chr11" "Chr12" "Chr13" "Chr14" "Chr15" "Chr16" "Chr17" > "Chr18" > [19] "Chr19" "scaf" > > And then: >> print(BSgenome) > Black cottonwood genome > | > | organism: Populus trichocarpa (Black cottonwood) > | provider: Phytozome (JGI) > | provider version: 3.0 > | release date: January 2010 > | release name: Populus trichocarpa v3.0 > | > | single sequences (see '?seqnames'): > | Chr01 Chr02 Chr03 Chr04 Chr05 Chr06 Chr07 Chr08 Chr09 > Chr10 Chr11 > | Chr12 Chr13 Chr14 Chr15 Chr16 Chr17 Chr18 Chr19 > | > | multiple sequences (see '?mseqnames'): > | scaf > | > | (use the '$' or '[[' operator to access a given sequence) > > > So that looks ok. Interestingly, when I followed the vignette and did > the equivalent of > BSgenome="BSgenome.Hsapiens.UCSC.hg19" The vignette suggests that BSgenome should be the package name of the BSgenome package to open, so yours should be something like "BSgenome.Ptrichocarpa.XXX.YY" or something -- I guess you built this packge by yourself, or something, so you'd know its name ... > That didn't work for me. It only worked without quotes. If I included > quotes, it just assigned a character vector to that variable. Sorry, what exactly didn't work for you? Can you show me the code that failed? > Then, following your advice: > >> si.bsg <- seqinfo(BSgenome.Ptrichocarpa.Phytozome.v3) >> si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) > Error in seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) : > error in evaluating the argument 'x' in selecting a method for > function 'seqinfo': Error: could not find function "BamFile" The BamFile function is defined in the Rsamtools package, you need to load that first (the first line of code I suggested you run was to load the Rsamtools package). Load it first, then redo the seqinfo(BamFile(...)) stuff. -steve

ADD REPLY • link 10.1 years ago Steve Lianoglou ★ 13k

Login before adding your answer.