Hello all,
I am attempting to use summarizeOverlaps to assign counts to exons
using paired end Tophat alignments. As an example, I have created a
bamfilelist containing one bamfile and a feature list of exons by
gene. This is followed by using summarizeOverlaps with the single end
parameter set to false.
bfl <- BamFileList("EP04.bam", index="EP04.bam", yieldSize=1000000)
exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
all.exons <- unlist(exons.by.gene)
sumexp <- summarizeOverlaps(
features=all.exons, reads=bfl,
mode=IntersectionNotEmpty,
singleEnd=FALSE,
ignore.strand=TRUE, parallel=TRUE)
Error: cannot allocate vector of size 277.9 Mb
In addition: Warning message:
In readBamGappedAlignmentPairs(bf, param = param) : 'yieldSize' set to
'NA'
Even with the parallel option, this process uses all available memory
which presumably leads to the vector allocation error. While I have
set the yieldSize in the BamFileList, this parameter does not appear
to be used. Running summarizeOverlaps with "singleEnd=TRUE" does not
issue a yieldSize warning. Any ideas on how to get set the yieldSize
with "singleEnd=FALSE"?
Thanks,
Tom
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] DEXSeq_1.4.0
[2] Rsamtools_1.10.2
[3] org.Hs.eg.db_2.8.0
[4] RSQLite_0.11.2
[5] DBI_0.2-5
[6] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0
[7] GenomicFeatures_1.10.1
[8] BSgenome.Hsapiens.UCSC.hg19_1.3.19
[9] BSgenome_1.26.1
[10] Biostrings_2.26.3
[11] GenomicRanges_1.10.6
[12] IRanges_1.16.5
[13] annotate_1.36.0
[14] AnnotationDbi_1.20.3
[15] Biobase_2.18.0
[16] BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3
[4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17
[7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2
[10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0
Thomas Whisenant, Ph.D.
Salomon Lab, MEM-241
Department of Molecular and Experimental Medicine
The Scripps Research Institute
10550 North Torrey Pines Rd.
La Jolla, CA 92037
thomasw@scripps.edu<mailto:thomasw@scripps.edu>
[[alternative HTML version deleted]]
Hi Tom,
Currently readBamGappedAlignmentPairs() reads all data from a Bam file
into memory to sort and determine proper pairs. This is a problem for
large files and is the reason for the error you see below.
We've been working on alternative approach that involves first sorting
the Bam file by qname. Once sorted, the file can be read in chunks
specified by 'yieldSize' in the BamFile object. This functionality is
available in GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work
in progress and I have not yet implemented a check to ensure the file
is
sorted by qname. If you try this out, please be sure to sort your Bam
file by qname first.
Examples of this approach are on the summarizeOverlaps man page for
the
BamFile method,
?'summarizeOverlaps,GRanges,BamFileList-method'
Here is a piece of the example:
library(pasillaBamSubset)
library("TxDb.Dmelanogaster.UCSC.dm3.ensGene")
exbygene <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene")
## paired-end sorted by qname:
## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname
## can be read in chunks with 'yieldSize'.
fl <- untreated3_chr4()
sortfl <- sortBam(fl, tempfile(), byQname=TRUE)
bf2 <- BamFileList(sortfl, index=character(0),
yieldSize=50000, obeyQname=TRUE)
se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE)
counts2 <- assays(se2)$counts
## paired-end not sorted:
## If the file is not sorted by qname, all records are read
## into memory for sorting and to determine proper pairs.
## Any 'yieldSize' set on the BamFile will be ignored.
fl <- untreated3_chr4()
bf3 <- BamFileList(fl)
se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE)
counts3 <- assays(se3)$counts
Another recent feature addition is the GALignmentsList class. The goal
here is to provide a container that holds any type of read, single-
end,
paired-end, singletons, multiple fragments etc. Again, the methods are
based on the user providing a Bam file sorted by qname. Once read in,
the records are split on read id (QNAME in SAM/BAM). The associated
read
functions are in GenomicRanges and Rsamtools.
?readGAlignmentsList (GenomicRanges)
?readBamGAlignmentsList (Rsamtools)
Let me know if you run into problems.
Valerie
On 03/13/2013 11:07 AM, Thomas Whisenant wrote:
> Hello all,
> I am attempting to use summarizeOverlaps to assign counts to exons
using paired end Tophat alignments. As an example, I have created a
bamfilelist containing one bamfile and a feature list of exons by
gene. This is followed by using summarizeOverlaps with the single end
parameter set to false.
>
> bfl <- BamFileList("EP04.bam", index="EP04.bam", yieldSize=1000000)
> exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
> all.exons <- unlist(exons.by.gene)
> sumexp <- summarizeOverlaps(
> features=all.exons, reads=bfl,
> mode=IntersectionNotEmpty,
> singleEnd=FALSE,
> ignore.strand=TRUE, parallel=TRUE)
> Error: cannot allocate vector of size 277.9 Mb
> In addition: Warning message:
> In readBamGappedAlignmentPairs(bf, param = param) : 'yieldSize' set
to 'NA'
>
> Even with the parallel option, this process uses all available
memory which presumably leads to the vector allocation error. While I
have set the yieldSize in the BamFileList, this parameter does not
appear to be used. Running summarizeOverlaps with "singleEnd=TRUE"
does not issue a yieldSize warning. Any ideas on how to get set the
yieldSize with "singleEnd=FALSE"?
> Thanks,
> Tom
>
>
> R version 2.15.2 (2012-10-26)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=English_United States.1252
>
> [2] LC_CTYPE=English_United States.1252
>
> [3] LC_MONETARY=English_United States.1252
>
> [4] LC_NUMERIC=C
>
> [5] LC_TIME=English_United States.1252
>
>
>
> attached base packages:
>
> [1] parallel stats graphics grDevices utils datasets
methods
>
> [8] base
>
>
>
> other attached packages:
>
> [1] DEXSeq_1.4.0
>
> [2] Rsamtools_1.10.2
>
> [3] org.Hs.eg.db_2.8.0
>
> [4] RSQLite_0.11.2
>
> [5] DBI_0.2-5
>
> [6] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0
>
> [7] GenomicFeatures_1.10.1
>
> [8] BSgenome.Hsapiens.UCSC.hg19_1.3.19
>
> [9] BSgenome_1.26.1
>
> [10] Biostrings_2.26.3
>
> [11] GenomicRanges_1.10.6
>
> [12] IRanges_1.16.5
>
> [13] annotate_1.36.0
>
> [14] AnnotationDbi_1.20.3
>
> [15] Biobase_2.18.0
>
> [16] BiocGenerics_0.4.0
>
>
>
> loaded via a namespace (and not attached):
>
> [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3
>
> [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17
>
> [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2
>
> [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0
>
>
>
> Thomas Whisenant, Ph.D.
> Salomon Lab, MEM-241
> Department of Molecular and Experimental Medicine
> The Scripps Research Institute
> 10550 North Torrey Pines Rd.
> La Jolla, CA 92037
> thomasw at scripps.edu<mailto:thomasw at="" scripps.edu="">
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
This sounds interesting. Just want to make sure: does this ensure that
all
alignments for a given QNAME fall within one chunk? What happens if
the
yieldsize is too small to achieve that?
And of course the drawback here is that a QNAME-sorted BAM is not
indexable. So in practice one would need two BAMs, one sorted by POS,
the
other by QNAME.
What about this: preprocess a POS-sorted (and thus indexable) BAM and
add a
flag indicating whether an alignment is the last alignment for a given
QNAME. The scanner then yields all complete QNAMEs after it has
scanned
'yieldSize' such flags. I realize this is more work on your part, but
maybe
it saves the hassle of multiple BAMs? Just an idea.
Michael
On Thu, Mar 14, 2013 at 2:49 PM, Valerie Obenchain
<vobencha@fhcrc.org>wrote:
> Hi Tom,
>
> Currently readBamGappedAlignmentPairs() reads all data from a Bam
file
> into memory to sort and determine proper pairs. This is a problem
for large
> files and is the reason for the error you see below.
>
> We've been working on alternative approach that involves first
sorting the
> Bam file by qname. Once sorted, the file can be read in chunks
specified by
> 'yieldSize' in the BamFile object. This functionality is available
in
> GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work in
progress and I
> have not yet implemented a check to ensure the file is sorted by
qname. If
> you try this out, please be sure to sort your Bam file by qname
first.
>
> Examples of this approach are on the summarizeOverlaps man page for
the
> BamFile method,
>
> ?'summarizeOverlaps,GRanges,**BamFileList-method'
>
> Here is a piece of the example:
>
> library(pasillaBamSubset)
> library("TxDb.Dmelanogaster.**UCSC.dm3.ensGene")
> exbygene <- exonsBy(TxDb.Dmelanogaster.**UCSC.dm3.ensGene, "gene")
>
> ## paired-end sorted by qname:
> ## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname
> ## can be read in chunks with 'yieldSize'.
> fl <- untreated3_chr4()
> sortfl <- sortBam(fl, tempfile(), byQname=TRUE)
> bf2 <- BamFileList(sortfl, index=character(0),
> yieldSize=50000, obeyQname=TRUE)
> se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE)
> counts2 <- assays(se2)$counts
>
> ## paired-end not sorted:
> ## If the file is not sorted by qname, all records are read
> ## into memory for sorting and to determine proper pairs.
> ## Any 'yieldSize' set on the BamFile will be ignored.
> fl <- untreated3_chr4()
> bf3 <- BamFileList(fl)
> se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE)
> counts3 <- assays(se3)$counts
>
>
> Another recent feature addition is the GALignmentsList class. The
goal
> here is to provide a container that holds any type of read, single-
end,
> paired-end, singletons, multiple fragments etc. Again, the methods
are
> based on the user providing a Bam file sorted by qname. Once read
in, the
> records are split on read id (QNAME in SAM/BAM). The associated read
> functions are in GenomicRanges and Rsamtools.
>
> ?readGAlignmentsList (GenomicRanges)
> ?readBamGAlignmentsList (Rsamtools)
>
> Let me know if you run into problems.
>
> Valerie
>
>
>
>
>
>
> On 03/13/2013 11:07 AM, Thomas Whisenant wrote:
>
>> Hello all,
>> I am attempting to use summarizeOverlaps to assign counts to exons
using
>> paired end Tophat alignments. As an example, I have created a
bamfilelist
>> containing one bamfile and a feature list of exons by gene. This is
>> followed by using summarizeOverlaps with the single end parameter
set to
>> false.
>>
>> bfl <- BamFileList("EP04.bam", index="EP04.bam", yieldSize=1000000)
>> exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.**hg19.knownGene,
"gene")
>> all.exons <- unlist(exons.by.gene)
>> sumexp <- summarizeOverlaps(
>> features=all.exons, reads=bfl,
>> mode=IntersectionNotEmpty,
>> singleEnd=FALSE,
>> ignore.strand=TRUE, parallel=TRUE)
>> Error: cannot allocate vector of size 277.9 Mb
>> In addition: Warning message:
>> In readBamGappedAlignmentPairs(**bf, param = param) : 'yieldSize'
set to
>> 'NA'
>>
>> Even with the parallel option, this process uses all available
memory
>> which presumably leads to the vector allocation error. While I have
set the
>> yieldSize in the BamFileList, this parameter does not appear to be
used.
>> Running summarizeOverlaps with "singleEnd=TRUE" does not issue a
yieldSize
>> warning. Any ideas on how to get set the yieldSize with
"singleEnd=FALSE"?
>> Thanks,
>> Tom
>>
>>
>> R version 2.15.2 (2012-10-26)
>>
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>>
>>
>> locale:
>>
>> [1] LC_COLLATE=English_United States.1252
>>
>> [2] LC_CTYPE=English_United States.1252
>>
>> [3] LC_MONETARY=English_United States.1252
>>
>> [4] LC_NUMERIC=C
>>
>> [5] LC_TIME=English_United States.1252
>>
>>
>>
>> attached base packages:
>>
>> [1] parallel stats graphics grDevices utils datasets
methods
>>
>> [8] base
>>
>>
>>
>> other attached packages:
>>
>> [1] DEXSeq_1.4.0
>>
>> [2] Rsamtools_1.10.2
>>
>> [3] org.Hs.eg.db_2.8.0
>>
>> [4] RSQLite_0.11.2
>>
>> [5] DBI_0.2-5
>>
>> [6] TxDb.Hsapiens.UCSC.hg19.**knownGene_2.8.0
>>
>> [7] GenomicFeatures_1.10.1
>>
>> [8] BSgenome.Hsapiens.UCSC.hg19_1.**3.19
>>
>> [9] BSgenome_1.26.1
>>
>> [10] Biostrings_2.26.3
>>
>> [11] GenomicRanges_1.10.6
>>
>> [12] IRanges_1.16.5
>>
>> [13] annotate_1.36.0
>>
>> [14] AnnotationDbi_1.20.3
>>
>> [15] Biobase_2.18.0
>>
>> [16] BiocGenerics_0.4.0
>>
>>
>>
>> loaded via a namespace (and not attached):
>>
>> [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3
>>
>> [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17
>>
>> [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2
>>
>> [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0
>>
>>
>>
>> Thomas Whisenant, Ph.D.
>> Salomon Lab, MEM-241
>> Department of Molecular and Experimental Medicine
>> The Scripps Research Institute
>> 10550 North Torrey Pines Rd.
>> La Jolla, CA 92037
>> thomasw@scripps.edu<mailto:tho**masw@scripps.edu <thomasw@scripps.edu="">>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor="">
>> Search the archives: http://news.gmane.org/gmane.**
>> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor="">
>>
>>
> ______________________________**_________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor="">
> Search the archives: http://news.gmane.org/gmane.**
> science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor="">
>
[[alternative HTML version deleted]]
A while back I was struggling with how I could process large paired
bam
files in managable chunks. I wrote a Python script called to take a
BAM
file and splits in into non-overlapping "loci", or sets of read and
read
pairs that don't overlap reads in other loci. The result is a bunch of
small bam files, one per locus.
https://raw.github.com/DarwinAwardWinner/splitloci/master/splitloci.py
The important point is that read pairs only need to be matched with
other reads in the same locus, which allows reading paired alignments
from a position-sorted BAM file without having to read the whole file
at
once. The disadvantage of this approach is that the minimum required
memory is dependent on the size of the largest locus in the file, and
cannot be controlled easily. However the memory usage is almost
certain
to be significantly less than reading the entire file at once.
If you want to, you could adapt the logic for reading alignment pairs.
Of course, the concept of "loci" breaks down in the presence of fusion
reads, and I'm not certain how it works with spliced reads from
Tophat,
since I believe the splice is encoded into the CIGAR string, which
might
throw off the calculation of the end position.
-Ryan
On 03/14/2013 03:30 PM, Michael Lawrence wrote:
> This sounds interesting. Just want to make sure: does this ensure
that all
> alignments for a given QNAME fall within one chunk? What happens if
the
> yieldsize is too small to achieve that?
>
> And of course the drawback here is that a QNAME-sorted BAM is not
> indexable. So in practice one would need two BAMs, one sorted by
POS, the
> other by QNAME.
>
> What about this: preprocess a POS-sorted (and thus indexable) BAM
and add a
> flag indicating whether an alignment is the last alignment for a
given
> QNAME. The scanner then yields all complete QNAMEs after it has
scanned
> 'yieldSize' such flags. I realize this is more work on your part,
but maybe
> it saves the hassle of multiple BAMs? Just an idea.
>
> Michael
>
>
>
> On Thu, Mar 14, 2013 at 2:49 PM, Valerie Obenchain <vobencha at="" fhcrc.org="">wrote:
>
>> Hi Tom,
>>
>> Currently readBamGappedAlignmentPairs() reads all data from a Bam
file
>> into memory to sort and determine proper pairs. This is a problem
for large
>> files and is the reason for the error you see below.
>>
>> We've been working on alternative approach that involves first
sorting the
>> Bam file by qname. Once sorted, the file can be read in chunks
specified by
>> 'yieldSize' in the BamFile object. This functionality is available
in
>> GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work in
progress and I
>> have not yet implemented a check to ensure the file is sorted by
qname. If
>> you try this out, please be sure to sort your Bam file by qname
first.
>>
>> Examples of this approach are on the summarizeOverlaps man page for
the
>> BamFile method,
>>
>> ?'summarizeOverlaps,GRanges,**BamFileList-method'
>>
>> Here is a piece of the example:
>>
>> library(pasillaBamSubset)
>> library("TxDb.Dmelanogaster.**UCSC.dm3.ensGene")
>> exbygene <- exonsBy(TxDb.Dmelanogaster.**UCSC.dm3.ensGene, "gene")
>>
>> ## paired-end sorted by qname:
>> ## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname
>> ## can be read in chunks with 'yieldSize'.
>> fl <- untreated3_chr4()
>> sortfl <- sortBam(fl, tempfile(), byQname=TRUE)
>> bf2 <- BamFileList(sortfl, index=character(0),
>> yieldSize=50000, obeyQname=TRUE)
>> se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE)
>> counts2 <- assays(se2)$counts
>>
>> ## paired-end not sorted:
>> ## If the file is not sorted by qname, all records are read
>> ## into memory for sorting and to determine proper pairs.
>> ## Any 'yieldSize' set on the BamFile will be ignored.
>> fl <- untreated3_chr4()
>> bf3 <- BamFileList(fl)
>> se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE)
>> counts3 <- assays(se3)$counts
>>
>>
>> Another recent feature addition is the GALignmentsList class. The
goal
>> here is to provide a container that holds any type of read, single-
end,
>> paired-end, singletons, multiple fragments etc. Again, the methods
are
>> based on the user providing a Bam file sorted by qname. Once read
in, the
>> records are split on read id (QNAME in SAM/BAM). The associated
read
>> functions are in GenomicRanges and Rsamtools.
>>
>> ?readGAlignmentsList (GenomicRanges)
>> ?readBamGAlignmentsList (Rsamtools)
>>
>> Let me know if you run into problems.
>>
>> Valerie
>>
>>
>>
>>
>>
>>
>> On 03/13/2013 11:07 AM, Thomas Whisenant wrote:
>>
>>> Hello all,
>>> I am attempting to use summarizeOverlaps to assign counts to exons
using
>>> paired end Tophat alignments. As an example, I have created a
bamfilelist
>>> containing one bamfile and a feature list of exons by gene. This
is
>>> followed by using summarizeOverlaps with the single end parameter
set to
>>> false.
>>>
>>> bfl <- BamFileList("EP04.bam", index="EP04.bam",
yieldSize=1000000)
>>> exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.**hg19.knownGene,
"gene")
>>> all.exons <- unlist(exons.by.gene)
>>> sumexp <- summarizeOverlaps(
>>> features=all.exons, reads=bfl,
>>> mode=IntersectionNotEmpty,
>>> singleEnd=FALSE,
>>> ignore.strand=TRUE, parallel=TRUE)
>>> Error: cannot allocate vector of size 277.9 Mb
>>> In addition: Warning message:
>>> In readBamGappedAlignmentPairs(**bf, param = param) : 'yieldSize'
set to
>>> 'NA'
>>>
>>> Even with the parallel option, this process uses all available
memory
>>> which presumably leads to the vector allocation error. While I
have set the
>>> yieldSize in the BamFileList, this parameter does not appear to be
used.
>>> Running summarizeOverlaps with "singleEnd=TRUE" does not issue a
yieldSize
>>> warning. Any ideas on how to get set the yieldSize with
"singleEnd=FALSE"?
>>> Thanks,
>>> Tom
>>>
>>>
>>> R version 2.15.2 (2012-10-26)
>>>
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>
>>>
>>>
>>> locale:
>>>
>>> [1] LC_COLLATE=English_United States.1252
>>>
>>> [2] LC_CTYPE=English_United States.1252
>>>
>>> [3] LC_MONETARY=English_United States.1252
>>>
>>> [4] LC_NUMERIC=C
>>>
>>> [5] LC_TIME=English_United States.1252
>>>
>>>
>>>
>>> attached base packages:
>>>
>>> [1] parallel stats graphics grDevices utils datasets
methods
>>>
>>> [8] base
>>>
>>>
>>>
>>> other attached packages:
>>>
>>> [1] DEXSeq_1.4.0
>>>
>>> [2] Rsamtools_1.10.2
>>>
>>> [3] org.Hs.eg.db_2.8.0
>>>
>>> [4] RSQLite_0.11.2
>>>
>>> [5] DBI_0.2-5
>>>
>>> [6] TxDb.Hsapiens.UCSC.hg19.**knownGene_2.8.0
>>>
>>> [7] GenomicFeatures_1.10.1
>>>
>>> [8] BSgenome.Hsapiens.UCSC.hg19_1.**3.19
>>>
>>> [9] BSgenome_1.26.1
>>>
>>> [10] Biostrings_2.26.3
>>>
>>> [11] GenomicRanges_1.10.6
>>>
>>> [12] IRanges_1.16.5
>>>
>>> [13] annotate_1.36.0
>>>
>>> [14] AnnotationDbi_1.20.3
>>>
>>> [15] Biobase_2.18.0
>>>
>>> [16] BiocGenerics_0.4.0
>>>
>>>
>>>
>>> loaded via a namespace (and not attached):
>>>
>>> [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3
>>>
>>> [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17
>>>
>>> [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2
>>>
>>> [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0
>>>
>>>
>>>
>>> Thomas Whisenant, Ph.D.
>>> Salomon Lab, MEM-241
>>> Department of Molecular and Experimental Medicine
>>> The Scripps Research Institute
>>> 10550 North Torrey Pines Rd.
>>> La Jolla, CA 92037
>>> thomasw at scripps.edu<mailto:tho**masw at="" scripps.edu="" <thomasw="" at="" scripps.edu="">>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________**_________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor="">
>>> Search the archives: http://news.gmane.org/gmane.**
>>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor="">
>>>
>>>
>> ______________________________**_________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor="">
>> Search the archives: http://news.gmane.org/gmane.**
>> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor="">
>>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
Hi Michael and Ryan,
Yes, all QNAME records fall within one chunk. When yieldSize is too
small, scanBam() will return more than the yieldSize up to the
break/difference in QNAME.
library(pasillaBamSubset)
fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE)
bf <- BamFile(fl, index=character(0), yieldSize=3, obeyQname=TRUE)
scn <- scanBam(bf, obeyQname=TRUE)
> scn[[1]][c("qname", "seq")]
$qname
[1] "SRR031714.65" "SRR031714.65" "SRR031714.185" "SRR031714.185"
$seq
A DNAStringSet instance of length 4
width seq
[1] 37 CTGAACCGTCGAAGTATTAGCTTGCGGAACAGATCCA
[2] 37 GTTTGGGCTCGAGGGATTGGGCGATTGAGTGGCTATT
[3] 37 GGTGGATACTCGGTTTTCGCGGGAGTTGGTGAACGCA
[4] 37 AGGTGCTCGTGCCCGTGTTGCTCTAACTGGACTGACC
Combining this with position indexing is the next step. Interesting
idea
to create a 'last QNAME' flag. I'll look into that and also the
'non-overlapping loci' approach. Thanks Ryan for sending the code
link.
Valerie
On 03/14/2013 03:57 PM, Ryan C. Thompson wrote:
> A while back I was struggling with how I could process large paired
bam
> files in managable chunks. I wrote a Python script called to take a
BAM
> file and splits in into non-overlapping "loci", or sets of read and
read
> pairs that don't overlap reads in other loci. The result is a bunch
of
> small bam files, one per locus.
>
>
https://raw.github.com/DarwinAwardWinner/splitloci/master/splitloci.py
>
> The important point is that read pairs only need to be matched with
> other reads in the same locus, which allows reading paired
alignments
> from a position-sorted BAM file without having to read the whole
file at
> once. The disadvantage of this approach is that the minimum required
> memory is dependent on the size of the largest locus in the file,
and
> cannot be controlled easily. However the memory usage is almost
certain
> to be significantly less than reading the entire file at once.
>
> If you want to, you could adapt the logic for reading alignment
pairs.
>
> Of course, the concept of "loci" breaks down in the presence of
fusion
> reads, and I'm not certain how it works with spliced reads from
Tophat,
> since I believe the splice is encoded into the CIGAR string, which
might
> throw off the calculation of the end position.
>
> -Ryan
>
> On 03/14/2013 03:30 PM, Michael Lawrence wrote:
>> This sounds interesting. Just want to make sure: does this ensure
that
>> all
>> alignments for a given QNAME fall within one chunk? What happens if
the
>> yieldsize is too small to achieve that?
>>
>> And of course the drawback here is that a QNAME-sorted BAM is not
>> indexable. So in practice one would need two BAMs, one sorted by
POS, the
>> other by QNAME.
>>
>> What about this: preprocess a POS-sorted (and thus indexable) BAM
and
>> add a
>> flag indicating whether an alignment is the last alignment for a
given
>> QNAME. The scanner then yields all complete QNAMEs after it has
scanned
>> 'yieldSize' such flags. I realize this is more work on your part,
but
>> maybe
>> it saves the hassle of multiple BAMs? Just an idea.
>>
>> Michael
>>
>>
>>
>> On Thu, Mar 14, 2013 at 2:49 PM, Valerie Obenchain
>> <vobencha at="" fhcrc.org="">wrote:
>>
>>> Hi Tom,
>>>
>>> Currently readBamGappedAlignmentPairs() reads all data from a Bam
file
>>> into memory to sort and determine proper pairs. This is a problem
for
>>> large
>>> files and is the reason for the error you see below.
>>>
>>> We've been working on alternative approach that involves first
>>> sorting the
>>> Bam file by qname. Once sorted, the file can be read in chunks
>>> specified by
>>> 'yieldSize' in the BamFile object. This functionality is available
in
>>> GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work in
progress
>>> and I
>>> have not yet implemented a check to ensure the file is sorted by
>>> qname. If
>>> you try this out, please be sure to sort your Bam file by qname
first.
>>>
>>> Examples of this approach are on the summarizeOverlaps man page
for the
>>> BamFile method,
>>>
>>> ?'summarizeOverlaps,GRanges,**BamFileList-method'
>>>
>>> Here is a piece of the example:
>>>
>>> library(pasillaBamSubset)
>>> library("TxDb.Dmelanogaster.**UCSC.dm3.ensGene")
>>> exbygene <- exonsBy(TxDb.Dmelanogaster.**UCSC.dm3.ensGene, "gene")
>>>
>>> ## paired-end sorted by qname:
>>> ## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname
>>> ## can be read in chunks with 'yieldSize'.
>>> fl <- untreated3_chr4()
>>> sortfl <- sortBam(fl, tempfile(), byQname=TRUE)
>>> bf2 <- BamFileList(sortfl, index=character(0),
>>> yieldSize=50000, obeyQname=TRUE)
>>> se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE)
>>> counts2 <- assays(se2)$counts
>>>
>>> ## paired-end not sorted:
>>> ## If the file is not sorted by qname, all records are read
>>> ## into memory for sorting and to determine proper pairs.
>>> ## Any 'yieldSize' set on the BamFile will be ignored.
>>> fl <- untreated3_chr4()
>>> bf3 <- BamFileList(fl)
>>> se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE)
>>> counts3 <- assays(se3)$counts
>>>
>>>
>>> Another recent feature addition is the GALignmentsList class. The
goal
>>> here is to provide a container that holds any type of read,
single-end,
>>> paired-end, singletons, multiple fragments etc. Again, the methods
are
>>> based on the user providing a Bam file sorted by qname. Once read
in,
>>> the
>>> records are split on read id (QNAME in SAM/BAM). The associated
read
>>> functions are in GenomicRanges and Rsamtools.
>>>
>>> ?readGAlignmentsList (GenomicRanges)
>>> ?readBamGAlignmentsList (Rsamtools)
>>>
>>> Let me know if you run into problems.
>>>
>>> Valerie
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 03/13/2013 11:07 AM, Thomas Whisenant wrote:
>>>
>>>> Hello all,
>>>> I am attempting to use summarizeOverlaps to assign counts to
exons
>>>> using
>>>> paired end Tophat alignments. As an example, I have created a
>>>> bamfilelist
>>>> containing one bamfile and a feature list of exons by gene. This
is
>>>> followed by using summarizeOverlaps with the single end parameter
>>>> set to
>>>> false.
>>>>
>>>> bfl <- BamFileList("EP04.bam", index="EP04.bam",
yieldSize=1000000)
>>>> exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.**hg19.knownGene,
"gene")
>>>> all.exons <- unlist(exons.by.gene)
>>>> sumexp <- summarizeOverlaps(
>>>> features=all.exons, reads=bfl,
>>>> mode=IntersectionNotEmpty,
>>>> singleEnd=FALSE,
>>>> ignore.strand=TRUE, parallel=TRUE)
>>>> Error: cannot allocate vector of size 277.9 Mb
>>>> In addition: Warning message:
>>>> In readBamGappedAlignmentPairs(**bf, param = param) : 'yieldSize'
>>>> set to
>>>> 'NA'
>>>>
>>>> Even with the parallel option, this process uses all available
memory
>>>> which presumably leads to the vector allocation error. While I
have
>>>> set the
>>>> yieldSize in the BamFileList, this parameter does not appear to
be
>>>> used.
>>>> Running summarizeOverlaps with "singleEnd=TRUE" does not issue a
>>>> yieldSize
>>>> warning. Any ideas on how to get set the yieldSize with
>>>> "singleEnd=FALSE"?
>>>> Thanks,
>>>> Tom
>>>>
>>>>
>>>> R version 2.15.2 (2012-10-26)
>>>>
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>>
>>>>
>>>> locale:
>>>>
>>>> [1] LC_COLLATE=English_United States.1252
>>>>
>>>> [2] LC_CTYPE=English_United States.1252
>>>>
>>>> [3] LC_MONETARY=English_United States.1252
>>>>
>>>> [4] LC_NUMERIC=C
>>>>
>>>> [5] LC_TIME=English_United States.1252
>>>>
>>>>
>>>>
>>>> attached base packages:
>>>>
>>>> [1] parallel stats graphics grDevices utils datasets
methods
>>>>
>>>> [8] base
>>>>
>>>>
>>>>
>>>> other attached packages:
>>>>
>>>> [1] DEXSeq_1.4.0
>>>>
>>>> [2] Rsamtools_1.10.2
>>>>
>>>> [3] org.Hs.eg.db_2.8.0
>>>>
>>>> [4] RSQLite_0.11.2
>>>>
>>>> [5] DBI_0.2-5
>>>>
>>>> [6] TxDb.Hsapiens.UCSC.hg19.**knownGene_2.8.0
>>>>
>>>> [7] GenomicFeatures_1.10.1
>>>>
>>>> [8] BSgenome.Hsapiens.UCSC.hg19_1.**3.19
>>>>
>>>> [9] BSgenome_1.26.1
>>>>
>>>> [10] Biostrings_2.26.3
>>>>
>>>> [11] GenomicRanges_1.10.6
>>>>
>>>> [12] IRanges_1.16.5
>>>>
>>>> [13] annotate_1.36.0
>>>>
>>>> [14] AnnotationDbi_1.20.3
>>>>
>>>> [15] Biobase_2.18.0
>>>>
>>>> [16] BiocGenerics_0.4.0
>>>>
>>>>
>>>>
>>>> loaded via a namespace (and not attached):
>>>>
>>>> [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3
>>>>
>>>> [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17
>>>>
>>>> [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2
>>>>
>>>> [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0
>>>>
>>>>
>>>>
>>>> Thomas Whisenant, Ph.D.
>>>> Salomon Lab, MEM-241
>>>> Department of Molecular and Experimental Medicine
>>>> The Scripps Research Institute
>>>> 10550 North Torrey Pines Rd.
>>>> La Jolla, CA 92037
>>>> thomasw at scripps.edu<mailto:tho**masw at="" scripps.edu="" <thomasw="" at="" scripps.edu="">>
>>>>
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________**_________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor="">
>>>>
>>>> Search the archives: http://news.gmane.org/gmane.**
>>>> science.biology.informatics.**conductor<http: news.gmane.org="" gma="" ne.science.biology.informatics.conductor="">
>>>>
>>>>
>>>>
>>> ______________________________**_________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor="">
>>>
>>> Search the archives: http://news.gmane.org/gmane.**
>>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor="">
>>>
>>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
No problem. Feel free to contact me if you have any questions about
the
code.
On Fri 15 Mar 2013 09:43:40 AM PDT, Valerie Obenchain wrote:
> Hi Michael and Ryan,
>
> Yes, all QNAME records fall within one chunk. When yieldSize is too
> small, scanBam() will return more than the yieldSize up to the
> break/difference in QNAME.
>
> library(pasillaBamSubset)
> fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE)
> bf <- BamFile(fl, index=character(0), yieldSize=3, obeyQname=TRUE)
> scn <- scanBam(bf, obeyQname=TRUE)
> > scn[[1]][c("qname", "seq")]
> $qname
> [1] "SRR031714.65" "SRR031714.65" "SRR031714.185" "SRR031714.185"
>
> $seq
> A DNAStringSet instance of length 4
> width seq
> [1] 37 CTGAACCGTCGAAGTATTAGCTTGCGGAACAGATCCA
> [2] 37 GTTTGGGCTCGAGGGATTGGGCGATTGAGTGGCTATT
> [3] 37 GGTGGATACTCGGTTTTCGCGGGAGTTGGTGAACGCA
> [4] 37 AGGTGCTCGTGCCCGTGTTGCTCTAACTGGACTGACC
>
>
> Combining this with position indexing is the next step. Interesting
> idea to create a 'last QNAME' flag. I'll look into that and also the
> 'non-overlapping loci' approach. Thanks Ryan for sending the code
link.
>
>
> Valerie
>
> On 03/14/2013 03:57 PM, Ryan C. Thompson wrote:
>> A while back I was struggling with how I could process large paired
bam
>> files in managable chunks. I wrote a Python script called to take a
BAM
>> file and splits in into non-overlapping "loci", or sets of read and
read
>> pairs that don't overlap reads in other loci. The result is a bunch
of
>> small bam files, one per locus.
>>
>>
https://raw.github.com/DarwinAwardWinner/splitloci/master/splitloci.py
>>
>> The important point is that read pairs only need to be matched with
>> other reads in the same locus, which allows reading paired
alignments
>> from a position-sorted BAM file without having to read the whole
file at
>> once. The disadvantage of this approach is that the minimum
required
>> memory is dependent on the size of the largest locus in the file,
and
>> cannot be controlled easily. However the memory usage is almost
certain
>> to be significantly less than reading the entire file at once.
>>
>> If you want to, you could adapt the logic for reading alignment
pairs.
>>
>> Of course, the concept of "loci" breaks down in the presence of
fusion
>> reads, and I'm not certain how it works with spliced reads from
Tophat,
>> since I believe the splice is encoded into the CIGAR string, which
might
>> throw off the calculation of the end position.
>>
>> -Ryan
>>
>> On 03/14/2013 03:30 PM, Michael Lawrence wrote:
>>> This sounds interesting. Just want to make sure: does this ensure
that
>>> all
>>> alignments for a given QNAME fall within one chunk? What happens
if the
>>> yieldsize is too small to achieve that?
>>>
>>> And of course the drawback here is that a QNAME-sorted BAM is not
>>> indexable. So in practice one would need two BAMs, one sorted by
>>> POS, the
>>> other by QNAME.
>>>
>>> What about this: preprocess a POS-sorted (and thus indexable) BAM
and
>>> add a
>>> flag indicating whether an alignment is the last alignment for a
given
>>> QNAME. The scanner then yields all complete QNAMEs after it has
scanned
>>> 'yieldSize' such flags. I realize this is more work on your part,
but
>>> maybe
>>> it saves the hassle of multiple BAMs? Just an idea.
>>>
>>> Michael
>>>
>>>
>>>
>>> On Thu, Mar 14, 2013 at 2:49 PM, Valerie Obenchain
>>> <vobencha at="" fhcrc.org="">wrote:
>>>
>>>> Hi Tom,
>>>>
>>>> Currently readBamGappedAlignmentPairs() reads all data from a Bam
file
>>>> into memory to sort and determine proper pairs. This is a problem
for
>>>> large
>>>> files and is the reason for the error you see below.
>>>>
>>>> We've been working on alternative approach that involves first
>>>> sorting the
>>>> Bam file by qname. Once sorted, the file can be read in chunks
>>>> specified by
>>>> 'yieldSize' in the BamFile object. This functionality is
available in
>>>> GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work in
progress
>>>> and I
>>>> have not yet implemented a check to ensure the file is sorted by
>>>> qname. If
>>>> you try this out, please be sure to sort your Bam file by qname
first.
>>>>
>>>> Examples of this approach are on the summarizeOverlaps man page
for
>>>> the
>>>> BamFile method,
>>>>
>>>> ?'summarizeOverlaps,GRanges,**BamFileList-method'
>>>>
>>>> Here is a piece of the example:
>>>>
>>>> library(pasillaBamSubset)
>>>> library("TxDb.Dmelanogaster.**UCSC.dm3.ensGene")
>>>> exbygene <- exonsBy(TxDb.Dmelanogaster.**UCSC.dm3.ensGene,
"gene")
>>>>
>>>> ## paired-end sorted by qname:
>>>> ## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname
>>>> ## can be read in chunks with 'yieldSize'.
>>>> fl <- untreated3_chr4()
>>>> sortfl <- sortBam(fl, tempfile(), byQname=TRUE)
>>>> bf2 <- BamFileList(sortfl, index=character(0),
>>>> yieldSize=50000, obeyQname=TRUE)
>>>> se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE)
>>>> counts2 <- assays(se2)$counts
>>>>
>>>> ## paired-end not sorted:
>>>> ## If the file is not sorted by qname, all records are read
>>>> ## into memory for sorting and to determine proper pairs.
>>>> ## Any 'yieldSize' set on the BamFile will be ignored.
>>>> fl <- untreated3_chr4()
>>>> bf3 <- BamFileList(fl)
>>>> se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE)
>>>> counts3 <- assays(se3)$counts
>>>>
>>>>
>>>> Another recent feature addition is the GALignmentsList class. The
goal
>>>> here is to provide a container that holds any type of read,
>>>> single-end,
>>>> paired-end, singletons, multiple fragments etc. Again, the
methods are
>>>> based on the user providing a Bam file sorted by qname. Once read
in,
>>>> the
>>>> records are split on read id (QNAME in SAM/BAM). The associated
read
>>>> functions are in GenomicRanges and Rsamtools.
>>>>
>>>> ?readGAlignmentsList (GenomicRanges)
>>>> ?readBamGAlignmentsList (Rsamtools)
>>>>
>>>> Let me know if you run into problems.
>>>>
>>>> Valerie
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 03/13/2013 11:07 AM, Thomas Whisenant wrote:
>>>>
>>>>> Hello all,
>>>>> I am attempting to use summarizeOverlaps to assign counts to
exons
>>>>> using
>>>>> paired end Tophat alignments. As an example, I have created a
>>>>> bamfilelist
>>>>> containing one bamfile and a feature list of exons by gene. This
is
>>>>> followed by using summarizeOverlaps with the single end
parameter
>>>>> set to
>>>>> false.
>>>>>
>>>>> bfl <- BamFileList("EP04.bam", index="EP04.bam",
yieldSize=1000000)
>>>>> exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.**hg19.knownGene,
"gene")
>>>>> all.exons <- unlist(exons.by.gene)
>>>>> sumexp <- summarizeOverlaps(
>>>>> features=all.exons, reads=bfl,
>>>>> mode=IntersectionNotEmpty,
>>>>> singleEnd=FALSE,
>>>>> ignore.strand=TRUE, parallel=TRUE)
>>>>> Error: cannot allocate vector of size 277.9 Mb
>>>>> In addition: Warning message:
>>>>> In readBamGappedAlignmentPairs(**bf, param = param) :
'yieldSize'
>>>>> set to
>>>>> 'NA'
>>>>>
>>>>> Even with the parallel option, this process uses all available
memory
>>>>> which presumably leads to the vector allocation error. While I
have
>>>>> set the
>>>>> yieldSize in the BamFileList, this parameter does not appear to
be
>>>>> used.
>>>>> Running summarizeOverlaps with "singleEnd=TRUE" does not issue
a
>>>>> yieldSize
>>>>> warning. Any ideas on how to get set the yieldSize with
>>>>> "singleEnd=FALSE"?
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>>
>>>>> R version 2.15.2 (2012-10-26)
>>>>>
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>
>>>>>
>>>>>
>>>>> locale:
>>>>>
>>>>> [1] LC_COLLATE=English_United States.1252
>>>>>
>>>>> [2] LC_CTYPE=English_United States.1252
>>>>>
>>>>> [3] LC_MONETARY=English_United States.1252
>>>>>
>>>>> [4] LC_NUMERIC=C
>>>>>
>>>>> [5] LC_TIME=English_United States.1252
>>>>>
>>>>>
>>>>>
>>>>> attached base packages:
>>>>>
>>>>> [1] parallel stats graphics grDevices utils datasets
>>>>> methods
>>>>>
>>>>> [8] base
>>>>>
>>>>>
>>>>>
>>>>> other attached packages:
>>>>>
>>>>> [1] DEXSeq_1.4.0
>>>>>
>>>>> [2] Rsamtools_1.10.2
>>>>>
>>>>> [3] org.Hs.eg.db_2.8.0
>>>>>
>>>>> [4] RSQLite_0.11.2
>>>>>
>>>>> [5] DBI_0.2-5
>>>>>
>>>>> [6] TxDb.Hsapiens.UCSC.hg19.**knownGene_2.8.0
>>>>>
>>>>> [7] GenomicFeatures_1.10.1
>>>>>
>>>>> [8] BSgenome.Hsapiens.UCSC.hg19_1.**3.19
>>>>>
>>>>> [9] BSgenome_1.26.1
>>>>>
>>>>> [10] Biostrings_2.26.3
>>>>>
>>>>> [11] GenomicRanges_1.10.6
>>>>>
>>>>> [12] IRanges_1.16.5
>>>>>
>>>>> [13] annotate_1.36.0
>>>>>
>>>>> [14] AnnotationDbi_1.20.3
>>>>>
>>>>> [15] Biobase_2.18.0
>>>>>
>>>>> [16] BiocGenerics_0.4.0
>>>>>
>>>>>
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>
>>>>> [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3
>>>>>
>>>>> [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17
>>>>>
>>>>> [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2
>>>>>
>>>>> [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0
>>>>>
>>>>>
>>>>>
>>>>> Thomas Whisenant, Ph.D.
>>>>> Salomon Lab, MEM-241
>>>>> Department of Molecular and Experimental Medicine
>>>>> The Scripps Research Institute
>>>>> 10550 North Torrey Pines Rd.
>>>>> La Jolla, CA 92037
>>>>> thomasw at scripps.edu<mailto:tho**masw at="" scripps.edu="">>>>> <thomasw at="" scripps.edu="">>
>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________**_________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor="">
>>>>>
>>>>>
>>>>> Search the archives: http://news.gmane.org/gmane.**
>>>>> science.biology.informatics.**conductor<http: news.gmane.org="" gm="" ane.science.biology.informatics.conductor="">
>>>>>
>>>>>
>>>>>
>>>>>
>>>> ______________________________**_________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor="">
>>>>
>>>>
>>>> Search the archives: http://news.gmane.org/gmane.**
>>>> science.biology.informatics.**conductor<http: news.gmane.org="" gma="" ne.science.biology.informatics.conductor="">
>>>>
>>>>
>>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>