Search
Question: MEDIPS.createSet error
0
4.1 years ago by
Vining, Kelly220
Vining, Kelly220 wrote:
Thanks, Steve, for your advice - appreciate you working with a package that isn't familiar to you. Thanks also for catching that I didn't have Rsamtools installed. I think this output that you suggested I generate gives an idea about what might be going on: > si.bsg Seqinfo of length 19 seqnames seqlengths isCircular genome Chr01 50495391 FALSE 3.0 Chr02 25263035 FALSE 3.0 Chr03 21816808 FALSE 3.0 Chr04 24267051 FALSE 3.0 Chr05 25890704 FALSE 3.0 ... ... ... ... Chr15 15278577 FALSE 3.0 Chr16 14494361 FALSE 3.0 Chr17 16080358 FALSE 3.0 Chr18 16958300 FALSE 3.0 Chr19 15942145 FALSE 3.0 > si.bam Seqinfo of length 1446 seqnames seqlengths isCircular genome Chr01 50495391 <na> <na> Chr02 25263035 <na> <na> Chr03 21816808 <na> <na> Chr04 24267051 <na> <na> Chr05 25890704 <na> <na> ... ... ... ... scaffold_3584 3654 <na> <na> scaffold_3595 2008 <na> <na> scaffold_3648 2796 <na> <na> scaffold_3664 3053 <na> <na> scaffold_3681 1022 <na> <na> When I created the reference genome structure, the scaffolds were all entered in a single, multiple seq object, following instructions. But in my bam files, the scaffold hits are as shown above. So that could be one problem, but I don't know how to solve that other than filtering out all of the alignments to the scaffolds from the bam files. I really would like to retain the scaffold alignments and not lose all of that precious data. The other potential issue is the "NA" entries in the "isCircular" and "genome" columns. I'm not sure whether those would be a problem. Continued thanks, --Kelly ________________________________________ From: Steve Lianoglou [lianoglou.steve@gene.com] Sent: Monday, March 31, 2014 1:15 PM To: Vining, Kelly Cc: Lukas Chavez; bioconductor at r-project.org Subject: Re: [BioC] MEDIPS.createSet error Hi, Caveat being that I've never used MEDIPS, and I'm just going along with the vignette. So, comments inline: On 31 Mar 2014, at 13:09, Vining, Kelly wrote: > Hi, > Thanks for the advice thus far! To confirm what is in my BSgenome > variable, I did this: > >> class(BSgenome) > [1] "BSgenome" > attr(,"package") > [1] "BSgenome" >> names(BSgenome) > [1] "Chr01" "Chr02" "Chr03" "Chr04" "Chr05" "Chr06" "Chr07" "Chr08" > "Chr09" > [10] "Chr10" "Chr11" "Chr12" "Chr13" "Chr14" "Chr15" "Chr16" "Chr17" > "Chr18" > [19] "Chr19" "scaf" > > And then: >> print(BSgenome) > Black cottonwood genome > | > | organism: Populus trichocarpa (Black cottonwood) > | provider: Phytozome (JGI) > | provider version: 3.0 > | release date: January 2010 > | release name: Populus trichocarpa v3.0 > | > | single sequences (see '?seqnames'): > | Chr01 Chr02 Chr03 Chr04 Chr05 Chr06 Chr07 Chr08 Chr09 > Chr10 Chr11 > | Chr12 Chr13 Chr14 Chr15 Chr16 Chr17 Chr18 Chr19 > | > | multiple sequences (see '?mseqnames'): > | scaf > | > | (use the '' or '[[' operator to access a given sequence) > > > So that looks ok. Interestingly, when I followed the vignette and did > the equivalent of > BSgenome="BSgenome.Hsapiens.UCSC.hg19" The vignette suggests that BSgenome should be the package name of the BSgenome package to open, so yours should be something like "BSgenome.Ptrichocarpa.XXX.YY" or something -- I guess you built this packge by yourself, or something, so you'd know its name ... > That didn't work for me. It only worked without quotes. If I included > quotes, it just assigned a character vector to that variable. Sorry, what exactly didn't work for you? Can you show me the code that failed? > Then, following your advice: > >> si.bsg <- seqinfo(BSgenome.Ptrichocarpa.Phytozome.v3) >> si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) > Error in seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) : > error in evaluating the argument 'x' in selecting a method for > function 'seqinfo': Error: could not find function "BamFile" The BamFile function is defined in the Rsamtools package, you need to load that first (the first line of code I suggested you run was to load the Rsamtools package). Load it first, then redo the seqinfo(BamFile(...)) stuff. -steve ADD COMMENTlink modified 4.1 years ago by Lukas Chavez470 • written 4.1 years ago by Vining, Kelly220 0 4.1 years ago by Lukas Chavez470 USA/La Jolla/UCSD Lukas Chavez470 wrote: Thank you Steve for explaining in detail how to look at the content of the BSgenome object and the bam file. When MEDIPS imports the bam file, it will try to extract information for the 'scaffold_' chromosomes from the BSgenome object. The initially reported error Calculating genomic coordinates...Error in vector(length = supersize_chr[length( chromosomes)], mode = "character") : vector size cannot be NA/NaN occurs because MEDIPS is trying to find the length of a chromosome given in the bam file, but does not find the chromosome in the BSgenome object. Subsequently, MEDIPS tries to calculate genomic coordinates for the chromosome wide windows, but the length of the chromosome is NA. I understand that it will be necessary to catch this error and add a warning message in a future version of MEDIPS. If you want to keep your hits on the 'scaffold_' chromosomes in the bam file, and to include all scaffolds in your further analysis, you will have to recreate your BSgenome object treating each scaffold as an individual chromosome just as the other assembled chromosomes. Lukas On Mon, Mar 31, 2014 at 3:05 PM, Vining, Kelly <kelly.vining@oregonstate.edu> wrote: > Thanks, Steve, for your advice - appreciate you working with a package > that isn't familiar to you. > Thanks also for catching that I didn't have Rsamtools installed. > > I think this output that you suggested I generate gives an idea about what > might be going on: > > > > si.bsg > Seqinfo of length 19 > seqnames seqlengths isCircular genome > Chr01 50495391 FALSE 3.0 > Chr02 25263035 FALSE 3.0 > Chr03 21816808 FALSE 3.0 > Chr04 24267051 FALSE 3.0 > Chr05 25890704 FALSE 3.0 > ... ... ... ... > Chr15 15278577 FALSE 3.0 > Chr16 14494361 FALSE 3.0 > Chr17 16080358 FALSE 3.0 > Chr18 16958300 FALSE 3.0 > Chr19 15942145 FALSE 3.0 > > si.bam > Seqinfo of length 1446 > seqnames seqlengths isCircular genome > Chr01 50495391 <na> <na> > Chr02 25263035 <na> <na> > Chr03 21816808 <na> <na> > Chr04 24267051 <na> <na> > Chr05 25890704 <na> <na> > ... ... ... ... > scaffold_3584 3654 <na> <na> > scaffold_3595 2008 <na> <na> > scaffold_3648 2796 <na> <na> > scaffold_3664 3053 <na> <na> > scaffold_3681 1022 <na> <na> > > > When I created the reference genome structure, the scaffolds were all > entered in a single, multiple seq object, following instructions. But in my > bam files, the scaffold hits are as shown above. So that could be one > problem, but I don't know how to solve that other than filtering out all of > the alignments to the scaffolds from the bam files. I really would like to > retain the scaffold alignments and not lose all of that precious data. > > The other potential issue is the "NA" entries in the "isCircular" and > "genome" columns. I'm not sure whether those would be a problem. > > Continued thanks, > --Kelly > > ________________________________________ > From: Steve Lianoglou [lianoglou.steve@gene.com] > Sent: Monday, March 31, 2014 1:15 PM > To: Vining, Kelly > Cc: Lukas Chavez; bioconductor@r-project.org > Subject: Re: [BioC] MEDIPS.createSet error > > Hi, > > Caveat being that I've never used MEDIPS, and I'm just going along with > the vignette. > > So, comments inline: > > > On 31 Mar 2014, at 13:09, Vining, Kelly wrote: > > > Hi, > > Thanks for the advice thus far! To confirm what is in my BSgenome > > variable, I did this: > > > >> class(BSgenome) > > [1] "BSgenome" > > attr(,"package") > > [1] "BSgenome" > >> names(BSgenome) > > [1] "Chr01" "Chr02" "Chr03" "Chr04" "Chr05" "Chr06" "Chr07" "Chr08" > > "Chr09" > > [10] "Chr10" "Chr11" "Chr12" "Chr13" "Chr14" "Chr15" "Chr16" "Chr17" > > "Chr18" > > [19] "Chr19" "scaf" > > > > And then: > >> print(BSgenome) > > Black cottonwood genome > > | > > | organism: Populus trichocarpa (Black cottonwood) > > | provider: Phytozome (JGI) > > | provider version: 3.0 > > | release date: January 2010 > > | release name: Populus trichocarpa v3.0 > > | > > | single sequences (see '?seqnames'): > > | Chr01 Chr02 Chr03 Chr04 Chr05 Chr06 Chr07 Chr08 Chr09 > > Chr10 Chr11 > > | Chr12 Chr13 Chr14 Chr15 Chr16 Chr17 Chr18 Chr19 > > | > > | multiple sequences (see '?mseqnames'): > > | scaf > > | > > | (use the '' or '[[' operator to access a given sequence) > > > > > > So that looks ok. Interestingly, when I followed the vignette and did > > the equivalent of > > BSgenome="BSgenome.Hsapiens.UCSC.hg19" > > The vignette suggests that BSgenome should be the package name of the > BSgenome package to open, so yours should be something like > "BSgenome.Ptrichocarpa.XXX.YY" or something -- I guess you built this > packge by yourself, or something, so you'd know its name ... > > > That didn't work for me. It only worked without quotes. If I included > > quotes, it just assigned a character vector to that variable. > > Sorry, what exactly didn't work for you? Can you show me the code that > failed? > > > Then, following your advice: > > > >> si.bsg <- seqinfo(BSgenome.Ptrichocarpa.Phytozome.v3) > >> si.bam <- seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) > > Error in seqinfo(BamFile("FallBud_brep1_aln_sorted.bam")) : > > error in evaluating the argument 'x' in selecting a method for > > function 'seqinfo': Error: could not find function "BamFile" > > The BamFile function is defined in the Rsamtools package, you need to > load that first (the first line of code I suggested you run was to load > the Rsamtools package). Load it first, then redo the > seqinfo(BamFile(...)) stuff. > > -steve > [[alternative HTML version deleted]]