Pasilla data for "Counting with summarizeOverlaps/GenomicRanges"
@darwin-sorento-dichmann-5702
Greetings,

I wish to follow the tutorial for summarizeOverlaps from GenomicRanges, but the pasilla BAM files ("treated1.bam", "untreated1.bam", "untreated2.bam") are not within the package, and the provided download link is dead (http://www.embl.de/~reyes/Graveley/bam).

Does anybody know where I can get those data, or have a copy? I also tried following the GEO accessions from the original publication, but all I found was GFFs and BEDs, no BAMs.

Any help is greatly appreciated.

Best,
Darwin
Darwin Sorento Dichmann, M.S., PhD
University of California, Berkeley
Paul Shannon
Hi Darwin,

Please give us a little bit more detail on the tutorial you wish to follow. Is it one of the vignettes that accompany GenomicRanges?

Also, your sessionInfo() may provide indispensable clues allowing us to help out.

Thanks,

- Paul

On Jan 10, 2013, at 8:33 PM, Darwin Sorento Dichmann wrote:
> I wish to follow the tutorial for summarizeOverlaps from GenomicRanges [...]
The original poster seems to be referring to a line in the pasilla vignette:

The SAM alignment files from which \Rpackage{pasilla} was generated are available at \url{http://www.embl.de/~reyes/Graveley/bam}.

And indeed it seems this link is dead. But it doesn't really matter for the vignette, as the pasillaBamSubset package has the BAM files needed.

On Fri, Jan 11, 2013 at 12:13 AM, Paul Shannon wrote:
> Please give us a little bit more detail on the tutorial you wish to follow. [...]
On Thu, Jan 10, 2013 at 9:26 PM, Vincent Carey wrote:
> [...] it doesn't really matter for the vignette as the pasillaBamSubset package has the bam files needed.

You might also look at the pasilla package.

Dan
Hi Dan,

Thanks for your suggestion. I did look at the pasilla package before I emailed the list, but it doesn't seem to contain BAM files:

----
lsa-579-005:~ darwin$ ls -l /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasilla/extdata
total 46184
-rw-r--r-- 1 darwin admin 13589253 Oct 2 13:04 Dmel.BDGP5.25.62.DEXSeq.chr.gff
-rw-r--r-- 1 darwin admin      552 Oct 2 13:04 geneIDsinsubset.txt
-rw-r--r-- 1 darwin admin   498373 Oct 2 13:04 pasilla_gene_counts.tsv
-rw-r--r-- 1 darwin admin  1370030 Oct 2 13:04 treated1fb.txt
-rw-r--r-- 1 darwin admin  1359480 Oct 2 13:04 treated2fb.txt
-rw-r--r-- 1 darwin admin  1361730 Oct 2 13:04 treated3fb.txt
-rw-r--r-- 1 darwin admin  1363665 Oct 2 13:04 untreated1fb.txt
-rw-r--r-- 1 darwin admin  1371149 Oct 2 13:04 untreated2fb.txt
-rw-r--r-- 1 darwin admin  1358790 Oct 2 13:04 untreated3fb.txt
-rw-r--r-- 1 darwin admin  1359414 Oct 2 13:04 untreated4fb.txt
-----
lsa-579-005:~ darwin$ head /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasilla/extdata/treated1fb.txt
FBgn0000003:001 0
FBgn0000008:001 0
FBgn0000008:002 0
FBgn0000008:003 0
FBgn0000008:004 1
FBgn0000008:005 4
FBgn0000008:006 1
FBgn0000008:007 18
FBgn0000008:008 4
FBgn0000008:009 16

Or are those '*fb.txt' files some sort of "debinaried" BAM files?

Again, thanks for your suggestion.

Best wishes,
Darwin

On Jan 10, 2013, at 9:38 PM, Dan Tenenbaum wrote:
> You might also look at the pasilla package.
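Judging only from the ten lines that `head` printed above, the '*fb.txt' files look like plain-text count tables (an identifier of the form gene:bin followed by an integer) rather than alignments. The sketch below parses exactly those ten lines in base R. The two-column interpretation and the column names `feature`, `count`, `gene` and `bin` are assumptions made for illustration, not something stated in the thread.

```r
# The ten lines shown by 'head' above, used as inline sample data.
txt <- c("FBgn0000003:001 0", "FBgn0000008:001 0", "FBgn0000008:002 0",
         "FBgn0000008:003 0", "FBgn0000008:004 1", "FBgn0000008:005 4",
         "FBgn0000008:006 1", "FBgn0000008:007 18", "FBgn0000008:008 4",
         "FBgn0000008:009 16")

# Read as a whitespace-delimited table; column names are hypothetical.
counts <- read.table(text = txt, col.names = c("feature", "count"),
                     stringsAsFactors = FALSE)

# Split the "gene:bin" identifiers into their two parts.
parts <- do.call(rbind, strsplit(counts$feature, ":", fixed = TRUE))
counts$gene <- parts[, 1]
counts$bin  <- parts[, 2]

# Total count per gene across its bins.
per_gene <- tapply(counts$count, counts$gene, sum)
per_gene
```

A file with this layout holds counts that were already computed from alignments, so it cannot be turned back into a BAM file.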
@valerie-obenchain-4275
Hi Darwin,

As Vince mentioned, the BAM files are no longer available at the location specified in the summarizeOverlaps vignette. This location was taken from the DEXSeq vignette, which has since been updated to point to the GEO location,

http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE18508

Available file types include GFF, SAM and BEDGRAPH. The SAM can be easily converted to BAM with samtools

samtools view -bS -o outputFile.bam inputFile.sam

As an fyi, we have a Bioconductor data package 'pasillaBamSubset' which includes a portion of chromosome 4 from the untreated1 (single-end) and untreated3 (paired-end) files. You may find these smaller files useful for testing.

Thanks for the reminder of the dead link. I will update the vignette.

Valerie

On 01/10/2013 08:33 PM, Darwin Sorento Dichmann wrote:
> I wish to follow the tutorial for summarizeOverlaps from GenomicRanges [...]
On 01/11/2013 09:09 AM, Valerie Obenchain wrote:
> Available file types include GFF, SAM and BEDGRAPH. The SAM can be easily converted to BAM with samtools
> [...]

and Rsamtools::asBam

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
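For readers who want to try the Rsamtools route, here is a minimal sketch of `asBam`. It assumes Rsamtools is installed and uses the small `ex1.sam` file that ships with that package as a stand-in for a downloaded SAM file; the temporary destination is likewise just for illustration.

```r
library(Rsamtools)

# Example SAM file shipped with Rsamtools (stand-in for a real download).
sam <- system.file("extdata", "ex1.sam", package = "Rsamtools",
                   mustWork = TRUE)

# 'destination' is given WITHOUT the ".bam" extension; asBam() adds it.
dest <- tempfile()

# Convert SAM to BAM. By default the result is also sorted and indexed.
bam <- asBam(sam, destination = dest)

bam  # path of the BAM file that was written
```

The returned path can be passed to `BamFile()` or `BamFileList()` like any other BAM file.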
On Fri, Jan 11, 2013 at 9:13 AM, Martin Morgan wrote:
> and Rsamtools::asBam

which is an awesomely simple and effective function, since it handles a bunch of annoying SAM/BAM-related impedimenta by default!

Anyone who stumbles on a SAM file at some point will be happy to make its acquaintance.

Thank you Herve and Martin for anticipating everyone else's common case :-)

--
A model is a lie that helps you see the truth.
Howard Skipper (http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf)
Dear everyone,

Just for the record, I have updated the link to the BAM files in the pasilla vignette. The new URL should not die anymore.

Best,
Alejandro
On 01/11/2013 10:16 AM, Tim Triche, Jr. wrote:
> On Fri, Jan 11, 2013 at 9:13 AM, Martin Morgan wrote:
>> and Rsamtools::asBam
>
> which is an awesomely simple and effective function, since it handles a bunch of annoying SAM/BAM-related impedimenta by default!
>
> anyone who stumbles on a SAM file at some point will be happy to make its acquaintance
> thank you Herve and Martin

Well, Martin in that case. Like 95% of the Rsamtools package. So thanks Martin! :-)

H.

> for anticipating everyone else's common case :-)

--
Hervé Pagès
Program in Computational Biology
Fred Hutchinson Cancer Research Center
Hi Valerie,

Thanks for your response. I did try looking at the GEO location, but since the names of the 82 directories do not provide any information as to what files they contain (at least as far as I could tell), I did not proceed further with that :-P

I did try pasillaBamSubset yesterday before I emailed the list, but I had problems reading the files using BamFileList. I tried:

---
> fls <- c("untreated1_chr4.bam", "untreated3_chr4.bam")
> path <- "/Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/"
> bamlst <- BamFileList(fls, index=character())
> genehits <- summarizeOverlaps(chr4genes, bamlst, mode="Union")
Error in .io_check_exists(path(con)) : file(s) do not exist:
  'untreated1_chr4.bam'
---

I assume that it's because the path to the files (as stored in 'path') is not passed on to bamlst through BamFileList(?). The above was modified from the vignette, which reads:

----
> fls <- c("treated1.bam", "untreated1.bam", "untreated2.bam")
> path <- "pathToBAMFiles"
> bamlst <- BamFileList(fls, index=character())
> genehits <- summarizeOverlaps(chr4genes, bamlst, mode="Union")
----

I also tried supplying the full path to 'fls' like this:

fls <- c("/Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated1_chr4.bam",
         "/Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated3_chr4.bam")
bamlst <- BamFileList(fls, index=character())
genehits <- summarizeOverlaps(chr4genes, bamlst, mode="Union")

which works in the sense that the BAM files are read, but later yields a pretty messy 'design' matrix:

> design
  condition replicate        type countfiles
1 untreated         1 single-read /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated1_chr4.bam
2 untreated         3 single-read /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated3_chr4.bam

And lots of warnings when applying 'summarizeOverlaps':

---
> genehits <- summarizeOverlaps(chr4genes, bamlst, mode="Union")
Warning messages:
1: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr2L, chr2R, chr3L, chr3R, chr4, chrM, chrX, chrYHet
  - in 'y': chrchr3R, chrchrX, chrchr4, chrchr3L, chrchr2LHet, chrchrU, chrchrXHet, chrchr2RHet, chrchrdmel_mitochondrion_genome, chrchrYHet, chrchr2R, chrchr3LHet, chrchr3RHet, chrchr2L
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
2: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr2L, chr2R, chr3L, chr3R, chr4, chrM, chrX, chrYHet
  - in 'y': chrchr3R, chrchrX, chrchr4, chrchr3L, chrchr2LHet, chrchrU, chrchrXHet, chrchr2RHet, chrchrdmel_mitochondrion_genome, chrchrYHet, chrchr2R, chrchr3LHet, chrchr3RHet, chrchr2L
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
---

From what I understand, the end result should be a 'CountDataSet' object for use in further downstream analysis, and it looks like I do end up with that:

---
> geneCDS
CountDataSet (storageMode: environment)
assayData: 82 features, 2 samples
  element names: counts
protocolData: none
phenoData
  sampleNames:
    /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated1_chr4.bam
    /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated3_chr4.bam
  varLabels: sizeFactor condition ... countfiles (5 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 20921232
Annotation:
---

I'll look into how to query CountDataSets to see if it really worked.

I would appreciate it if someone could help me understand how to supply the path to BAM files to BamFileList in a smarter way than I did above. I look forward to applying these great tools.

And thank you everybody for your suggestions!

Best wishes,
Darwin

On Jan 11, 2013, at 9:09 AM, Valerie Obenchain wrote:
> As an fyi, we have a Bioconductor data package 'pasillaBamSubset' which includes a portion of chromosome 4 from the untreated1 (single-end) and untreated3 (paired-end) files. [...]
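The "file(s) do not exist" error reported above is consistent with `path` being defined but never combined with `fls`. A minimal base-R sketch of joining the two, and of deriving short sample labels from long paths, follows. The temporary directory and the empty placeholder files are hypothetical stand-ins so that the sketch runs without pasillaBamSubset installed.

```r
# Stand-in directory with empty placeholder files (hypothetical).
path <- file.path(tempdir(), "extdata")
dir.create(path, showWarnings = FALSE)
fls <- c("untreated1_chr4.bam", "untreated3_chr4.bam")
invisible(file.create(file.path(path, fls)))

# Join directory and file names; nothing uses 'path' unless we do this.
full <- file.path(path, fls)
all_found <- all(file.exists(full))

# Short labels derived from the file names, usable as sample names.
short <- sub("\\.bam$", "", basename(full))
names(full) <- short
full
```

With Rsamtools loaded, `full` rather than `fls` would be the argument to `BamFileList()`, and assigning `short` as the names of the resulting list should keep downstream sample names compact.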
Hi Darwin,

There are a couple of issues going on here. Single-end and paired-end reads need to be handled separately. The BamFileList should hold either single-end or paired-end files, and the 'singleEnd' argument needs to be specified. See the ?BamFileList man page for examples.

The output of summarizeOverlaps is a SummarizedExperiment object, not a CountDataSet. To read more about this see the ?summarizeOverlaps man page. In the summarizeOverlaps vignette, a CountDataSet is created from the counts in the assays() slot of the SummarizedExperiment object.

More below.

On 01/11/13 13:08, Darwin Sorento Dichmann wrote:
> I also tried supplying the full path to 'fls' like this:
> [...]

A better way to do this is with system.file. See ?BamFileList for an example.

fl <- system.file("extdata", "untreated1_chr4.bam", package="pasillaBamSubset", mustWork=TRUE)

> which works in the sense that the bams are read, but later yields a pretty messy 'design' matrix:
> [...]

This is correct. It is only 'messy' because the paths to the files are long.

> And lots of warnings when applying 'summarizeOverlaps':
> [...]

These warnings indicate mismatches between your annotation (chr4genes) and your BAM files (bamlst).

- in 'x': chr2L, chr2R, chr3L, chr3R, chr4, chrM, chrX, chrYHet
- in 'y': chrchr3R, chrchrX, chrchr4, chrchr3L, chrchr2LHet, chrchrU, chrchrXHet, ...

Your chromosome names do not match. Look at the example on the ?BamFile man page in the 'summarizeOverlaps with BamFileList' section and make sure you can run that. Some important things to note:

- The output of summarizeOverlaps is a SummarizedExperiment
- A CountDataSet can be created from the counts in the assays() slot of the SummarizedExperiment object
- untreated1 and untreated3 are treated separately because they are single- and paired-end
- yieldSize can be used to iterate through large files

Valerie
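The points in the list above can be sketched as follows. This is only an illustration under several assumptions: it targets current Bioconductor releases, where `summarizeOverlaps` lives in GenomicAlignments rather than GenomicRanges as it did when this thread was written; it assumes GenomicAlignments and pasillaBamSubset are installed; and the two chr4 windows used as features are made up for the example and are not real gene models.

```r
library(Rsamtools)
library(GenomicAlignments)
library(pasillaBamSubset)

# Locate the packaged BAM files without hard-coding the library path.
fl_single <- system.file("extdata", "untreated1_chr4.bam",
                         package = "pasillaBamSubset", mustWork = TRUE)
fl_paired <- system.file("extdata", "untreated3_chr4.bam",
                         package = "pasillaBamSubset", mustWork = TRUE)

# Hypothetical features: two windows on chr4 (not real gene models).
features <- GRanges("chr4", IRanges(start = c(1, 500001),
                                    end = c(500000, 1000000)))
names(features) <- c("window1", "window2")

# Single-end file; yieldSize makes the file be read in chunks.
bfl_single <- BamFileList(fl_single, yieldSize = 100000)
names(bfl_single) <- "untreated1"
se_single <- summarizeOverlaps(features, bfl_single,
                               mode = "Union", singleEnd = TRUE)

# Paired-end file, counted in a separate call with singleEnd = FALSE.
bfl_paired <- BamFileList(fl_paired)
names(bfl_paired) <- "untreated3"
se_paired <- summarizeOverlaps(features, bfl_paired,
                               mode = "Union", singleEnd = FALSE)

# The counts are stored in the assay of each SummarizedExperiment.
counts <- cbind(assay(se_single), assay(se_paired))
counts
```

Counting the two files in separate calls and combining the count matrices afterwards keeps the single-end and paired-end settings from interfering with each other.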
On 01/12/2013 09:50 AM, Valerie Obenchain wrote:
> Hi Darwin,
>
> There are a couple of issues going on here. Single-end and paired-end
> reads need to be handled separately. The BamFileList should hold either
> single or paired-end and the 'singleEnd' argument needs to be specified.
> See the ?BamFileList man page for examples.
>
> The output of summarizeOverlaps is a SummarizedExperiment object not a
> CountDataSet. To read more about this see the ?summarizeOverlaps man
> page. In the summarizeOverlaps vignette, a CountDataSet is created from
> the counts in the assays() slot of the SummarizedExperiment object.
>
> More below.
>
> On 01/11/13 13:08, Darwin Sorento Dichmann wrote:
>> Hi Valerie,
>>
>> Thanks for your response. I did try looking at the GEO location, but
>> since the names of the 82 directories do not provide any information
>> as to what files they contain (at least as far as I could tell), I did
>> not proceed further with that:-P
>>
>> I did try the pasillaBamSubset yesterday before I emailed the list,
>> but I had problems reading the files using BamFileList. I tried:
>>
>> ---
>> > fls <- c("untreated1_chr4.bam", "untreated3_chr4.bam")
>> > path <- "/Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/"
>> > bamlst <- BamFileList(fls, index=character())
>> > genehits <- summarizeOverlaps(chr4genes, bamlst, mode="Union")
>> Error in .io_check_exists(path(con)) : file(s) do not exist:
>> 'untreated1_chr4.bam'
>> ---
>> I assume that it's because the path to the files (as stored in 'path')
>> is not passed on to bamlst through BamFileList(?). The above was
>> modified from the vignette, which reads:
>>
>> ----
>> > fls <- c("treated1.bam", "untreated1.bam", "untreated2.bam")
>> > path <- "pathToBAMFiles"
>> > bamlst <- BamFileList(fls, index=character())
>> > genehits <- summarizeOverlaps(chr4genes, bamlst, mode="Union")
>> ----
>> I also tried supplying the full path to 'fls' like this:
>>
>> fls <- c("/Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated1_chr4.bam",
>>          "/Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated3_chr4.bam")
>
> A better way to do this is with system.file. See ?BamFileList for an
> example.
>
> fl <- system.file("extdata", "untreated1_chr4.bam",
>                   package="pasillaBamSubset", mustWork=TRUE)

Or even better: untreated1_chr4() and untreated3_chr4() to get those
paths. The main reason for putting those little wrappers in
pasillaBamSubset was to make it easier to document the files:
?untreated1_chr4 (otherwise, and AFAIK, there is no built-in mechanism
in R for documenting "external data").

H.

>> bamlst <- BamFileList(fls, index=character())
>> genehits <- summarizeOverlaps(chr4genes, bamlst, mode="Union")
>>
>> which works in the sense that the bams are read, but later yields a
>> pretty messy 'design' matrix:
>> > design
>>   condition replicate        type countfiles
>> 1 untreated         1 single-read /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated1_chr4.bam
>> 2 untreated         3 single-read /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated3_chr4.bam
>
> This is correct. It is only 'messy' because the paths to the files are
> long.
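To make the path handling discussed here concrete, this is a minimal base-R sketch. It only manipulates strings, so it runs without the BAM files being present. The directory is the one quoted in the thread and will differ on other machines; `bam_dir`, `full_paths` and `sample_names` are names made up for illustration. The `type` of untreated3 is set to paired-end following Valerie's description of the files, which differs from the table quoted above.

```r
# Directory and file names as quoted in the thread. The directory is
# machine-specific and is used here purely as a string.
bam_dir <- "/Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata"
fls <- c("untreated1_chr4.bam", "untreated3_chr4.bam")

# file.path() joins directory and file names. This is the step that was
# missing when 'path' was defined but never combined with 'fls'.
full_paths <- file.path(bam_dir, fls)

# Short, readable sample labels: drop the directory and the .bam suffix.
sample_names <- sub("\\.bam$", "", basename(full_paths))

# A design table keyed by the short labels instead of the full paths.
design <- data.frame(
  condition  = c("untreated", "untreated"),
  replicate  = c(1, 3),
  type       = c("single-read", "paired-end"),
  countfiles = basename(full_paths),
  row.names  = sample_names,
  stringsAsFactors = FALSE
)
print(design)
```

With pasillaBamSubset installed, `untreated1_chr4()` and `untreated3_chr4()` (or the `system.file()` call shown above) return the full paths directly, so the `file.path()` step is only needed for files outside a package.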
>> And lots of warnings when applying 'summarizeOverlaps':
>>
>> ---
>> > genehits <- summarizeOverlaps(chr4genes, bamlst, mode="Union")
>> Warning messages:
>> 1: In .Seqinfo.mergexy(x, y) :
>> Each of the 2 combined objects has sequence levels not in the other:
>> - in 'x': chr2L, chr2R, chr3L, chr3R, chr4, chrM, chrX, chrYHet
>> - in 'y': chrchr3R, chrchrX, chrchr4, chrchr3L, chrchr2LHet,
>> chrchrU, chrchrXHet, chrchr2RHet, chrchrdmel_mitochondrion_genome,
>> chrchrYHet, chrchr2R, chrchr3LHet, chrchr3RHet, chrchr2L
>> Make sure to always combine/compare objects based on the same reference
>> genome (use suppressWarnings() to suppress this warning).
>> 2: In .Seqinfo.mergexy(x, y) :
>> Each of the 2 combined objects has sequence levels not in the other:
>> - in 'x': chr2L, chr2R, chr3L, chr3R, chr4, chrM, chrX, chrYHet
>> - in 'y': chrchr3R, chrchrX, chrchr4, chrchr3L, chrchr2LHet,
>> chrchrU, chrchrXHet, chrchr2RHet, chrchrdmel_mitochondrion_genome,
>> chrchrYHet, chrchr2R, chrchr3LHet, chrchr3RHet, chrchr2L
>> Make sure to always combine/compare objects based on the same reference
>> genome (use suppressWarnings() to suppress this warning).
>> ---
>
> These warnings indicate mismatches between your annotation (chr4genes)
> and your Bam file (bamlst).
>
> - in 'x': chr2L, chr2R, chr3L, chr3R, chr4, chrM, chrX, chrYHet
>
> - in 'y': chrchr3R, chrchrX, chrchr4, chrchr3L, chrchr2LHet, chrchrU,
> chrchrXHet, ...
>
> Your chromosome names do not match. Look at the example on the ?BamFile
> man page in the 'summarizeOverlaps with BamFileList' section and make
> sure you can run that.
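The mismatch reported in the warning can be checked with plain character vectors before touching any Bioconductor object. The sketch below copies the two sets of sequence names from the warning and shows that they share nothing until the doubled "chr" prefix is collapsed. Treating the doubled prefix as the cause is an assumption based on the names alone; the thread does not say where it came from. `x_levels`, `y_levels`, `y_fixed` and `common` are illustrative names.

```r
# Sequence names copied from the warning message quoted above.
x_levels <- c("chr2L", "chr2R", "chr3L", "chr3R", "chr4", "chrM",
              "chrX", "chrYHet")
y_levels <- c("chrchr3R", "chrchrX", "chrchr4", "chrchr3L", "chrchr2LHet",
              "chrchrU", "chrchrXHet", "chrchr2RHet",
              "chrchrdmel_mitochondrion_genome", "chrchrYHet", "chrchr2R",
              "chrchr3LHet", "chrchr3RHet", "chrchr2L")

# As reported, the two sets have no name in common, so no read can be
# assigned to any feature.
print(intersect(x_levels, y_levels))

# Assumption: the doubled prefix is the only difference. Collapse it and
# compare again.
y_fixed <- sub("^chrchr", "chr", y_levels)
common  <- intersect(x_levels, y_fixed)
print(common)

# Names in 'x' that still have no partner after the fix.
print(setdiff(x_levels, y_fixed))
```

On the real objects the same comparison would use `seqlevels()` on each of them; which object carries the doubled prefix has to be checked first, since the warning alone does not say.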
> Some important things to note:
>
> - The output of summarizeOverlaps is a SummarizedExperiment
> - A CountDataSet can be created from the counts in the assays() slot of
>   the SummarizedExperiment object
> - untreated1 and untreated3 are treated separately because they are
>   single- and paired-end
> - yieldSize can be used to iterate through large files
>
> Valerie
>
>> From what I understand the end results should be a 'CountDataSet'
>> object for use in further downstream analysis, and it looks like I do
>> end up with that:
>> ---
>> > geneCDS
>> CountDataSet (storageMode: environment)
>> assayData: 82 features, 2 samples
>> element names: counts
>> protocolData: none
>> phenoData
>> sampleNames:
>> /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated1_chr4.bam
>> /Library/Frameworks/R.framework/Versions/2.15/Resources/library/pasillaBamSubset/extdata/untreated3_chr4.bam
>> varLabels: sizeFactor condition ... countfiles (5 total)
>> varMetadata: labelDescription
>> featureData: none
>> experimentData: use 'experimentData(object)'
>> pubMedIds: 20921232
>> Annotation:
>> ---
>> I'll look into how to query CountDataSets to see if it really worked.
>>
>> I would appreciate if someone could help me understand how to supply
>> the path to bam files to BamFileList in a smarter way than I did
>> above. I look forward to applying these great tools.
>>
>> And thank you everybody for your suggestions!
>>
>> Best wishes,
>> Darwin
>>
>> On Jan 11, 2013, at 9:09 AM, Valerie Obenchain wrote:
>>
>>> Hi Darwin,
>>>
>>> As Vince mentioned, the bam files are no longer available at the
>>> location specified in the summarizeOverlaps vignette. This location
>>> was taken from the DEXSeq vignette which has since been updated to
>>> point to the GEO location,
>>>
>>> http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE18508
>>>
>>> Available file types include GFF, SAM and BEDGRAPH. The SAM can be
>>> easily converted to BAM with samtools
>>>
>>> samtools view -bS -o outputFile.bam inputFile.sam
>>>
>>> As an fyi, we have a Bioconductor data package 'pasillaBamSubset'
>>> which includes a portion of chromosome 4 from the untreated1
>>> (single-end) and untreated3 (paired-end) files. You may find these
>>> smaller files useful for testing.
>>>
>>> Thanks for the reminder of the dead link. I will update the vignette.
>>>
>>> Valerie
>>>
>>> On 01/10/2013 08:33 PM, Darwin Sorento Dichmann wrote:
>>>> Greetings,
>>>>
>>>> I wish to follow the tutorial for summarizeOverlaps from
>>>> GenomicRanges, but the pasilla.bam files ("treated1.bam",
>>>> "untreated1.bam", "untreated2.bam") are not with in the package and
>>>> the provided link for download is dead
>>>> (http://www.embl.de/~reyes/Graveley/bam).
>>>>
>>>> Anybody know where I can get those data or have a copy? I also tried
>>>> following the GEO accessions from the original publication, but all
>>>> I found was GFFs and BEDs, no bams.
>>>>
>>>> Any help is greatly appreciated.
>>>>
>>>> Best,
>>>> Darwin
>>>> ________________________________
>>>> Darwin Sorento Dichmann, M.S., PhD
>>>> University of California, Berkeley
>>>> Harland Lab
>>>> Molecular and Cell Biology
>>>> 571 Life Sciences Addition
>>>> Berkeley, CA 94720
>>>> Phone# (510) 643-7830
>>>> Fax# (510) 643-6791
>>>> E-mail: dichmann at berkeley.edu
>>>>
>>>> Please send Fedex packages to:
>>>> 163 Life Sciences Addition, attn: Harland lab room 571
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
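Pulling the advice in this thread together, the following is a hedged sketch of counting the two pasillaBamSubset files separately, one as single-end and one as paired-end, with `yieldSize` set so the files are read in chunks. Several things here are assumptions rather than anything stated in the thread: the features are exons grouped by gene from the TxDb.Dmelanogaster.UCSC.dm3.ensGene package (the thread's `chr4genes` object is never shown), the chunk size of 50000 is arbitrary, and `summarizeOverlaps` is loaded from GenomicAlignments, where it lives in later Bioconductor releases (at the time of this thread it was in GenomicRanges). The code is guarded so it does nothing when the packages are missing.

```r
# Packages this sketch relies on; nothing is attached unless all are installed.
pkgs <- c("pasillaBamSubset", "Rsamtools", "GenomicFeatures",
          "GenomicAlignments", "SummarizedExperiment",
          "TxDb.Dmelanogaster.UCSC.dm3.ensGene")
have_all <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_all) {
  for (p in pkgs) {
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  }

  # Assumed feature set: exons grouped by gene (stands in for 'chr4genes').
  features <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, by = "gene")

  # Single-end sample: path from the package wrapper, read in chunks.
  bam_single <- BamFileList(untreated1_chr4(), yieldSize = 50000)
  se_single  <- summarizeOverlaps(features, bam_single, mode = "Union",
                                  singleEnd = TRUE)

  # Paired-end sample: kept in its own BamFileList, counted with
  # singleEnd = FALSE so that mates are treated as one fragment.
  bam_paired <- BamFileList(untreated3_chr4(), asMates = TRUE,
                            yieldSize = 50000)
  se_paired  <- summarizeOverlaps(features, bam_paired, mode = "Union",
                                  singleEnd = FALSE)

  # Both results are SummarizedExperiment objects; the count matrices are
  # in the assays and share the same rows (one per gene).
  counts <- cbind(assay(se_single), assay(se_paired))
  print(head(counts))
} else {
  message("Skipping: not all of these packages are installed: ",
          paste(pkgs, collapse = ", "))
}
```

The combined `counts` matrix is the piece that a downstream count-based container such as a CountDataSet would be built from, as described in the notes above.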