Hello everybody,
I've been told about a very nice R package called Repitools. It has
many
interesting features but the use of some of them remains a little
unclear
to me yet.
There are some subjects about this tool I'll have to ask about, in
this
list, but first of all I need to convert some BAM files into GRanges
objects.
To do so I read about BAM2GenomicRanges function. Reading the package
documentation, I tried to convert 3 BAM files into GRanges. I used the
following instructions:
*(where "/path/to/ANALYSIS/folder/" is the directory where my BAM
files are)
bgr <- BAM2GRanges('/path/to/ANALYSIS/folder/',
what = c("rname", "strand", "pos", "qwidth"),
flag = scanBamFlag(isUnmappedQuery = FALSE, isDuplicate = FALSE),
verbose = TRUE)
And I get this error message:
Reading BAM file AW-07.
[bam_header_read] bgzf_check_EOF: Invalid argument
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Error in open.BamFile(BamFile(file, index), "rb") :
SAM/BAM header missing or empty
file: '/path/to/ANALYSIS/folder/'
The BAM files I'm using here where produced by a tophat alignment and
used
in R previously for other analysis with this R code:
library(Rsamtools)
library(GenomicFeatures)
bamlist=list()
src_files=list.files(pattern="*.bam")
for(filename in src_files)
{
tmp=readBamGappedAlignments(filename)
bamlist[[length(bamlist)+1]]=GRanges(seqnames=rname(tmp),
ranges=IRanges(start=start(tmp),end=end(tmp)),
strand=rep("*",length(tmp)))
}
names(bamlist)=src_files
My question is, Why are now considered as invalid BAM files when they
previously worked as correct BAM files? Am I mistaken about the
meaning of
the error message?Did I miss something in the code?
Thanks in advance for your kind help.
JL
On 01/09/2013 06:08 AM, jluis.lavin at unavarra.es wrote:
> Hello everybody,
>
> I've been told about a very nice R package called Repitools. It has
many
> interesting features but the use of some of them remains a little
unclear
> to me yet.
> There are some subjects about this tool I'll have to ask about, in
this
> list, but first of all I need to convert some BAM files into GRanges
> objects.
> To do so I read about BAM2GenomicRanges function. Reading the
package
> documentation, I tried to convert 3 BAM files into GRanges. I used
the
> following instructions:
>
> *(where "/path/to/ANALYSIS/folder/" is the directory where my BAM
files are)
>
> bgr <- BAM2GRanges('/path/to/ANALYSIS/folder/',
Hi -- I'm not a regular user of BAM2GRanges, but I think the first
argument is
supposed to be the path to the bam file, not to the directory in which
the bam
file is located.
After
> example(BAM2GRanges)
compare
> BAM2GRanges(tiny.BAM)
with
> BAM2GRanges(dirname(tiny.BAM))
Martin
> what = c("rname", "strand", "pos", "qwidth"),
> flag = scanBamFlag(isUnmappedQuery = FALSE, isDuplicate = FALSE),
> verbose = TRUE)
>
> And I get this error message:
>
> Reading BAM file AW-07.
> [bam_header_read] bgzf_check_EOF: Invalid argument
> [bam_header_read] invalid BAM binary header (this is not a BAM
file).
> Error in open.BamFile(BamFile(file, index), "rb") :
> SAM/BAM header missing or empty
> file: '/path/to/ANALYSIS/folder/'
>
> The BAM files I'm using here where produced by a tophat alignment
and used
> in R previously for other analysis with this R code:
>
> library(Rsamtools)
> library(GenomicFeatures)
>
> bamlist=list()
> src_files=list.files(pattern="*.bam")
>
> for(filename in src_files)
> {
> tmp=readBamGappedAlignments(filename)
> bamlist[[length(bamlist)+1]]=GRanges(seqnames=rname(tmp),
> ranges=IRanges(start=start(tmp),end=end(tmp)),
> strand=rep("*",length(tmp)))
> }
>
>
> names(bamlist)=src_files
>
> My question is, Why are now considered as invalid BAM files when
they
> previously worked as correct BAM files? Am I mistaken about the
meaning of
> the error message?Did I miss something in the code?
>
> Thanks in advance for your kind help.
>
> JL
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
Hi JL,
On 01/09/2013 06:08 AM, jluis.lavin at unavarra.es wrote:
> Hello everybody,
>
> I've been told about a very nice R package called Repitools. It has
many
> interesting features but the use of some of them remains a little
unclear
> to me yet.
> There are some subjects about this tool I'll have to ask about, in
this
> list, but first of all I need to convert some BAM files into GRanges
> objects.
> To do so I read about BAM2GenomicRanges function. Reading the
package
> documentation, I tried to convert 3 BAM files into GRanges. I used
the
> following instructions:
>
> *(where "/path/to/ANALYSIS/folder/" is the directory where my BAM
files are)
>
> bgr <- BAM2GRanges('/path/to/ANALYSIS/folder/',
> what = c("rname", "strand", "pos", "qwidth"),
oops, using the qwidth for getting the widths of the genomic ranges is
in general incorrect, unless one assumes that all the cigars are
trivial
(Ms only). But that's really a strong assumption and it will almost
never be true, because AFAIK most aligners introduce indels by default
during the alignment process. Tophat in particular, is one of them.
Note that loading a BAM file into a GRanges object can easily be
done with readGappedAlignments() (from the GenomicRanges package),
followed by coercion of the returned GappedAlignments object to
a GRanges object:
gal <- readGappedAlignments("/path/to/your/file.bam")
as(gal, "GRanges")
It will infer the correct width of the genomic ranges based on the
cigar.
> flag = scanBamFlag(isUnmappedQuery = FALSE, isDuplicate = FALSE),
> verbose = TRUE)
>
> And I get this error message:
>
> Reading BAM file AW-07.
> [bam_header_read] bgzf_check_EOF: Invalid argument
> [bam_header_read] invalid BAM binary header (this is not a BAM
file).
> Error in open.BamFile(BamFile(file, index), "rb") :
> SAM/BAM header missing or empty
> file: '/path/to/ANALYSIS/folder/'
>
> The BAM files I'm using here where produced by a tophat alignment
and used
> in R previously for other analysis with this R code:
>
> library(Rsamtools)
> library(GenomicFeatures)
>
> bamlist=list()
> src_files=list.files(pattern="*.bam")
>
> for(filename in src_files)
> {
> tmp=readBamGappedAlignments(filename)
> bamlist[[length(bamlist)+1]]=GRanges(seqnames=rname(tmp),
> ranges=IRanges(start=start(tmp),end=end(tmp)),
> strand=rep("*",length(tmp)))
> }
>
>
> names(bamlist)=src_files
That code looks correct, but I would rather do:
bam_files <- list.files(pattern="*.bam")
gr_list <- lapply(bam_files,
function(bam_file)
as(readGappedAlignments(bam_file), "GRanges"))
names(gr_list) <- bam_files
Cheers,
H.
>
> My question is, Why are now considered as invalid BAM files when
they
> previously worked as correct BAM files? Am I mistaken about the
meaning of
> the error message?Did I miss something in the code?
>
> Thanks in advance for your kind help.
>
> JL
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
First of all I want to thank Herve Pages and Martin Morgan for their
kind
answers. And I want to apologize for this delay answering back, but I
was
offline for a few days...
I used Herve code to read my BAM files into GRanges objects, but I
don't
seem to be able to use that Granges object into the cpgDensityCalc
function.
This is how I try to do it:
-(My files are in the working directory already)
library(Mus.musculus)
library(Repitools)
library(GenomicRanges)
library(Rsamtools)
bam_files <- list.files(pattern="*.bam")
gr_list <- lapply(bam_files,
function(bam_file)
as(readGappedAlignments(bam_file), "GRanges"))
names(gr_list) <- bam_files
require(BSgenome.Mmusculus.UCSC.mm9)
cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, window =600)
-And I get the following error:
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "cpgDensityCalc", for
signature "character", "BSgenome"
What does this error mean? Can't "gr_list" be used as input for
cpgDensityCalc? Do my R installation lack any module required to
perform
this analysis?
*I'm awfully sorry for my lack of intermediate-advance R scripting
knowledge, I'm trying to fix that...
Thanks in advance, once more
JL
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Mus.musculus_1.0.0
[2] TxDb.Mmusculus.UCSC.mm10.ensGene_2.8.0
[3] org.Mm.eg.db_2.8.0
[4] GO.db_2.8.0
[5] RSQLite_0.11.2
[6] DBI_0.2-5
[7] OrganismDbi_1.0.2
[8] GenomicFeatures_1.10.1
[9] AnnotationDbi_1.20.3
[10] Biobase_2.18.0
[11] Rsamtools_1.10.2
[12] BSgenome.Hsapiens.UCSC.hg18_1.3.19
[13] BiocInstaller_1.8.3
[14] Repitools_1.4.0
[15] BSgenome.Mmusculus.UCSC.mm9_1.3.19
[16] BSgenome_1.26.1
[17] Biostrings_2.26.2
[18] GenomicRanges_1.10.5
[19] IRanges_1.16.4
[20] BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] biomaRt_2.14.0 bitops_1.0-5 edgeR_3.0.8
graph_1.36.1
[5] limma_3.14.3 parallel_2.15.1 RBGL_1.34.0
RCurl_1.95-3
[9] rtracklayer_1.18.2 stats4_2.15.1 tools_2.15.1
XML_3.95-0.1
[13] zlibbioc_1.4.0
--
Dr. Jos? Luis Lav?n Trueba
Dpto. de Producci?n Agraria
Grupo de Gen?tica y Microbiolog?a
Universidad P?blica de Navarra
31006 Pamplona
Navarra
SPAIN
Hi,
On Mon, Jan 14, 2013 at 5:20 AM, <jluis.lavin at="" unavarra.es=""> wrote:
> First of all I want to thank Herve Pages and Martin Morgan for their
kind
> answers. And I want to apologize for this delay answering back, but
I was
> offline for a few days...
>
> I used Herve code to read my BAM files into GRanges objects, but I
don't
> seem to be able to use that Granges object into the cpgDensityCalc
> function.
> This is how I try to do it:
>
> -(My files are in the working directory already)
>
> library(Mus.musculus)
> library(Repitools)
> library(GenomicRanges)
> library(Rsamtools)
>
> bam_files <- list.files(pattern="*.bam")
> gr_list <- lapply(bam_files,
> function(bam_file)
> as(readGappedAlignments(bam_file),
"GRanges"))
> names(gr_list) <- bam_files
>
> require(BSgenome.Mmusculus.UCSC.mm9)
>
> cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, window =600)
>
> -And I get the following error:
>
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "cpgDensityCalc",
for
> signature "character", "BSgenome"
>
> What does this error mean?
It means that your `gr_list` isn't what you think it is -- it's a
character, perhaps "try-error"?
Look at `head(gr_list)` ... what do you see?
> *I'm awfully sorry for my lack of intermediate-advance R scripting
> knowledge, I'm trying to fix that...
Reading through an Introduction to R might be helpful, if you haven't
done so already.
Bioconductor uses the S4 class system in R, and it would be helpful to
understand some parts of that too. If you look. If you look at
previous bioc courses online, you will often find that Martin does a
quick intro to these things at the beginning of many sessions:
http://bioconductor.org/help/course-materials/
Look for courses that were run at The Hutch (in Seattle)
Perhaps the intro slides here will bear fruit:
http://bioconductor.org/help/course-materials/2010/SeattleIntro/
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Hi JL,
On 01/14/2013 08:27 AM, Steve Lianoglou wrote:
> Hi,
>
> On Mon, Jan 14, 2013 at 5:20 AM, <jluis.lavin at="" unavarra.es="">
wrote:
>> First of all I want to thank Herve Pages and Martin Morgan for
their kind
>> answers. And I want to apologize for this delay answering back, but
I was
>> offline for a few days...
>>
>> I used Herve code to read my BAM files into GRanges objects, but I
don't
>> seem to be able to use that Granges object into the cpgDensityCalc
>> function.
>> This is how I try to do it:
>>
>> -(My files are in the working directory already)
>>
>> library(Mus.musculus)
>> library(Repitools)
>> library(GenomicRanges)
>> library(Rsamtools)
>>
>> bam_files <- list.files(pattern="*.bam")
>> gr_list <- lapply(bam_files,
>> function(bam_file)
>> as(readGappedAlignments(bam_file),
"GRanges"))
>> names(gr_list) <- bam_files
>>
>> require(BSgenome.Mmusculus.UCSC.mm9)
>>
>> cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, window =600)
>>
>> -And I get the following error:
>>
>> Error in function (classes, fdef, mtable) :
>> unable to find an inherited method for function "cpgDensityCalc",
for
>> signature "character", "BSgenome"
>>
>> What does this error mean?
>
> It means that your `gr_list` isn't what you think it is -- it's a
> character, perhaps "try-error"?
When running your code, I get:
> cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, window
=600)
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function ?cpgDensityCalc?
for signature ?"list", "BSgenome"?
Not exactly the same error you get: see "list" in the error message
I get, versus "character" in the error message you get? Not sure why
your 'gr_list' ends up being a character vector instead of a list.
Anyway, having it being a list (an ordinary list) doesn't seem to work
either. The reason the call to cpgDensityCalc() fails is because this
is a generic function and there are no "cpgDensityCalc" methods that
will accept the first argument to be a list:
> cpgDensityCalc
standardGeneric for "cpgDensityCalc" defined from package
"Repitools"
function (x, organism, ...)
standardGeneric("cpgDensityCalc")
<environment: 0x717c2a0="">
Methods may be defined for arguments: x, organism
Use showMethods("cpgDensityCalc") for currently available ones.
> showMethods("cpgDensityCalc")
Function: cpgDensityCalc (package Repitools)
x="data.frame", organism="BSgenome"
x="GRanges", organism="BSgenome"
x="GRangesList", organism="BSgenome"
Note that the type of input expected by the function is explained in
its man page (?cpgDensityCalc):
x: A ?data.frame?, with columns ?chr? and ?position?, or
columns
?chr?, ?start?, ?end?, and ?strand?. Also may be a
?GRangesList? object, or ?GRanges?.
So once you've sorted out the reasons why you get a character instead
of
a list of GRanges (I've no clue how this could happen), you'll need to
turn that ordinary list into a GRangesList object with:
gr_list <- GRangesList(gr_list)
before calling cpgDensityCalc().
Hope this helps,
H.
>
> Look at `head(gr_list)` ... what do you see?
>
>> *I'm awfully sorry for my lack of intermediate-advance R scripting
>> knowledge, I'm trying to fix that...
>
> Reading through an Introduction to R might be helpful, if you
haven't
> done so already.
>
> Bioconductor uses the S4 class system in R, and it would be helpful
to
> understand some parts of that too. If you look. If you look at
> previous bioc courses online, you will often find that Martin does a
> quick intro to these things at the beginning of many sessions:
>
> http://bioconductor.org/help/course-materials/
>
> Look for courses that were run at The Hutch (in Seattle)
>
> Perhaps the intro slides here will bear fruit:
>
> http://bioconductor.org/help/course-materials/2010/SeattleIntro/
>
> -steve
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
First of all I want to thank Herve and Steve for their attention to my
questions this time.
@Steve,
Thanks for the links to Bioconductor manuals, I'll try to read and
understand that info as fast & accurately as I'm capable of, trying
not to
ask very elementary questions in this list ;)
@Herve,
Thank you once again for your answer. Once my files were converted
into a
GRangesList object, as you said, everything seemed to work fine.
I also closed my previous R session and started a new one, which may
have
helped if the other session was corrupted somehow, returning the
previous
nosense error.
Here's the final code that works fine in my computer, if its of any
help
for future users ;)
library(Mus.musculus)
library(Repitools)
library(GenomicRanges)
library(Rsamtools)
bam_files <- list.files(pattern="*.bam")
gr_list <- lapply(bam_files,
function(bam_file)
as(readGappedAlignments(bam_file), "GRanges"))
names(gr_list) <- bam_files
gr_list <- GRangesList(gr_list)
require(BSgenome.Mmusculus.UCSC.mm9)
cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, window =600)
cpgplot <- cpgDensityPlot(gr_list, seq.len=300, organism=Mmusculus,
lwd=4,
verbose=TRUE)
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm9_1.3.19
[2] BSgenome_1.26.1
[3] Rsamtools_1.10.2
[4] Biostrings_2.26.2
[5] Repitools_1.4.0
[6] Mus.musculus_1.0.0
[7] TxDb.Mmusculus.UCSC.mm10.ensGene_2.8.0
[8] org.Mm.eg.db_2.8.0
[9] GO.db_2.8.0
[10] RSQLite_0.11.2
[11] DBI_0.2-5
[12] OrganismDbi_1.0.2
[13] GenomicFeatures_1.10.1
[14] GenomicRanges_1.10.5
[15] IRanges_1.16.4
[16] AnnotationDbi_1.20.3
[17] Biobase_2.18.0
[18] BiocGenerics_0.4.0
[19] BiocInstaller_1.8.3
loaded via a namespace (and not attached):
[1] biomaRt_2.14.0 bitops_1.0-5 edgeR_3.0.8
graph_1.36.1
[5] limma_3.14.3 parallel_2.15.1 RBGL_1.34.0
RCurl_1.95-3
[9] rtracklayer_1.18.2 stats4_2.15.1 tools_2.15.1
XML_3.95-0.1
[13] zlibbioc_1.4.0
With best wishes
JL
El Lun, 14 de Enero de 2013, 20:47, Hervé Pagès escribi?:
> Hi JL,
>
> On 01/14/2013 08:27 AM, Steve Lianoglou wrote:
>> Hi,
>>
>> On Mon, Jan 14, 2013 at 5:20 AM, <jluis.lavin at="" unavarra.es="">
wrote:
>>> First of all I want to thank Herve Pages and Martin Morgan for
their
>>> kind
>>> answers. And I want to apologize for this delay answering back,
but I
>>> was
>>> offline for a few days...
>>>
>>> I used Herve code to read my BAM files into GRanges objects, but I
>>> don't
>>> seem to be able to use that Granges object into the cpgDensityCalc
>>> function.
>>> This is how I try to do it:
>>>
>>> -(My files are in the working directory already)
>>>
>>> library(Mus.musculus)
>>> library(Repitools)
>>> library(GenomicRanges)
>>> library(Rsamtools)
>>>
>>> bam_files <- list.files(pattern="*.bam")
>>> gr_list <- lapply(bam_files,
>>> function(bam_file)
>>> as(readGappedAlignments(bam_file),
"GRanges"))
>>> names(gr_list) <- bam_files
>>>
>>> require(BSgenome.Mmusculus.UCSC.mm9)
>>>
>>> cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, window =600)
>>>
>>> -And I get the following error:
>>>
>>> Error in function (classes, fdef, mtable) :
>>> unable to find an inherited method for function "cpgDensityCalc",
for
>>> signature "character", "BSgenome"
>>>
>>> What does this error mean?
>>
>> It means that your `gr_list` isn't what you think it is -- it's a
>> character, perhaps "try-error"?
>
> When running your code, I get:
>
> > cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, window
=600)
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function
?cpgDensityCalc?
> for signature ?"list", "BSgenome"?
>
> Not exactly the same error you get: see "list" in the error message
> I get, versus "character" in the error message you get? Not sure why
> your 'gr_list' ends up being a character vector instead of a list.
>
> Anyway, having it being a list (an ordinary list) doesn't seem to
work
> either. The reason the call to cpgDensityCalc() fails is because
this
> is a generic function and there are no "cpgDensityCalc" methods that
> will accept the first argument to be a list:
>
> > cpgDensityCalc
> standardGeneric for "cpgDensityCalc" defined from package
"Repitools"
>
> function (x, organism, ...)
> standardGeneric("cpgDensityCalc")
> <environment: 0x717c2a0="">
> Methods may be defined for arguments: x, organism
> Use showMethods("cpgDensityCalc") for currently available ones.
>
> > showMethods("cpgDensityCalc")
> Function: cpgDensityCalc (package Repitools)
> x="data.frame", organism="BSgenome"
> x="GRanges", organism="BSgenome"
> x="GRangesList", organism="BSgenome"
>
> Note that the type of input expected by the function is explained in
> its man page (?cpgDensityCalc):
>
> x: A ?data.frame?, with columns ?chr? and ?position?, or
columns
> ?chr?, ?start?, ?end?, and ?strand?. Also may be a
> ?GRangesList? object, or ?GRanges?.
>
> So once you've sorted out the reasons why you get a character
instead of
> a list of GRanges (I've no clue how this could happen), you'll need
to
> turn that ordinary list into a GRangesList object with:
>
> gr_list <- GRangesList(gr_list)
>
> before calling cpgDensityCalc().
>
> Hope this helps,
> H.
>
>>
>> Look at `head(gr_list)` ... what do you see?
>>
>>> *I'm awfully sorry for my lack of intermediate-advance R scripting
>>> knowledge, I'm trying to fix that...
>>
>> Reading through an Introduction to R might be helpful, if you
haven't
>> done so already.
>>
>> Bioconductor uses the S4 class system in R, and it would be helpful
to
>> understand some parts of that too. If you look. If you look at
>> previous bioc courses online, you will often find that Martin does
a
>> quick intro to these things at the beginning of many sessions:
>>
>> http://bioconductor.org/help/course-materials/
>>
>> Look for courses that were run at The Hutch (in Seattle)
>>
>> Perhaps the intro slides here will bear fruit:
>>
>> http://bioconductor.org/help/course-materials/2010/SeattleIntro/
>>
>> -steve
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
--
Dr. Jos? Luis Lav?n Trueba
Dpto. de Producci?n Agraria
Grupo de Gen?tica y Microbiolog?a
Universidad P?blica de Navarra
31006 Pamplona
Navarra
SPAIN
Dear all,
I'm afraid I have another question derived from the topic of this
post.
I've used Repitools to calculate Cpg density from my samples, it
worked
correctly, but now when I try to write the density calculations to a
file,
I'm unable...
Is it possible to do it?
Here's how I tried to do it:
cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, seq.len = 300)
write.table(cpdens, file = "Cpg_dens.txt", append = FALSE, quote =
TRUE,
sep = "\t", eol = "\n", na = "NA", dec = ".",
row.names = TRUE, col.names = TRUE)
And this is the error message I get:
Error in data.frame(c(0L, 1L, 2L, 3L, 3L, 3L, 1L, 2L, 14L, 14L, 3L,
1L, :
arguments imply differing number of rows: 530842, 3401342
And I also tried:
> write.csv(cpdens, file = "Cpg_dens.csv", quote = TRUE, eol = "\n",
na =
"NA", + row.names = TRUE)
And got again this error:
Error in data.frame(c(0L, 1L, 2L, 3L, 3L, 3L, 1L, 2L, 14L, 14L, 3L,
1L, :
arguments imply differing number of rows: 530842, 3401342
Can somebody explain to me how to write the Cpg density calculations
data
to a file (tab separated file or csv if possible).
Thanks in advance
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm9_1.3.19
[2] BSgenome_1.26.1
[3] Rsamtools_1.10.2
[4] Biostrings_2.26.2
[5] Repitools_1.4.0
[6] Mus.musculus_1.0.0
[7] TxDb.Mmusculus.UCSC.mm10.ensGene_2.8.0
[8] org.Mm.eg.db_2.8.0
[9] GO.db_2.8.0
[10] RSQLite_0.11.2
[11] DBI_0.2-5
[12] OrganismDbi_1.0.2
[13] GenomicFeatures_1.10.1
[14] GenomicRanges_1.10.5
[15] IRanges_1.16.4
[16] AnnotationDbi_1.20.3
[17] Biobase_2.18.0
[18] BiocGenerics_0.4.0
[19] BiocInstaller_1.8.3
loaded via a namespace (and not attached):
[1] biomaRt_2.14.0 bitops_1.0-5 edgeR_3.0.8
graph_1.36.1
[5] limma_3.14.3 parallel_2.15.1 RBGL_1.34.0
RCurl_1.95-3
[9] rtracklayer_1.18.2 stats4_2.15.1 tools_2.15.1
XML_3.95-0.1
[13] zlibbioc_1.4.0
--
Dr. Jos? Luis Lav?n Trueba
Dpto. de Producci?n Agraria
Grupo de Gen?tica y Microbiolog?a
Universidad P?blica de Navarra
31006 Pamplona
Navarra
SPAIN
Hi,
On Tue, Jan 22, 2013 at 10:14 AM, <jluis.lavin at="" unavarra.es=""> wrote:
> Dear all,
>
> I'm afraid I have another question derived from the topic of this
post.
> I've used Repitools to calculate Cpg density from my samples, it
worked
> correctly, but now when I try to write the density calculations to a
file,
> I'm unable...
> Is it possible to do it?
>
> Here's how I tried to do it:
>
> cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, seq.len = 300)
>
> write.table(cpdens, file = "Cpg_dens.txt", append = FALSE, quote =
TRUE,
> sep = "\t", eol = "\n", na = "NA", dec = ".",
> row.names = TRUE, col.names = TRUE)
I've never used Repitools, but:
(1) What is the output of `is(cpdens)` ? If it's not a data.frame,
this won't work
(2) If it is a data.frame, can we see the output of `head(cpdens)` ?
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Hello Steve,
(1) What is the output of `is(cpdens)` ? If it's not a data.frame,
this
won't work
is(cpdens)
[1] "list" "vector" "AssayData"
"vectorORfactor"
head(cpdens)
.
.
.
[99841] 1 3 3 3 4 4 2 2 2 1 0 0 0 0 0 1 1 1 1 1 1
1
0 0
[99865] 2 2 0 0 0 0 0 0 0 3 1 1 0 0 0 0 0 0 0 2 1
1
1 0
[99889] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
0
0 0
[99913] 0 3 3 3 3 3 3 3 0 0 0 3 1 13 0 3 2 2 1 0 0
0
1 1
[99937] 0 0 0 2 2 2 2 3 3 1 1 1 1 1 0 0 0 0 0 0 0
2
0 2
[99961] 0 1 0 0 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1
1 1
[99985] 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
[ reached getOption("max.print") -- omitted 3301343 entries ]
It doesn't seem to be data.frame as you pointed out...
Is it possible write this to an output file of some kind?
Thanks
JL
El Mar, 22 de Enero de 2013, 16:36, Steve Lianoglou escribi?:
> Hi,
>
> On Tue, Jan 22, 2013 at 10:14 AM, <jluis.lavin at="" unavarra.es="">
wrote:
>> Dear all,
>>
>> I'm afraid I have another question derived from the topic of this
post.
>> I've used Repitools to calculate Cpg density from my samples, it
worked
>> correctly, but now when I try to write the density calculations to
a
>> file,
>> I'm unable...
>> Is it possible to do it?
>>
>> Here's how I tried to do it:
>>
>> cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, seq.len =
300)
>>
>> write.table(cpdens, file = "Cpg_dens.txt", append = FALSE, quote =
TRUE,
>> sep = "\t", eol = "\n", na = "NA", dec = ".",
>> row.names = TRUE, col.names = TRUE)
>
> I've never used Repitools, but:
>
> (1) What is the output of `is(cpdens)` ? If it's not a data.frame,
> this won't work
> (2) If it is a data.frame, can we see the output of `head(cpdens)` ?
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
--
Dr. Jos? Luis Lav?n Trueba
Dpto. de Producci?n Agraria
Grupo de Gen?tica y Microbiolog?a
Universidad P?blica de Navarra
31006 Pamplona
Navarra
SPAIN
Hi,
On 01/22/2013 08:29 AM, Steve Lianoglou wrote:
> Hi,
>
> On Tue, Jan 22, 2013 at 10:54 AM, <jluis.lavin at="" unavarra.es="">
wrote:
>> Hello Steve,
>>
>> (1) What is the output of `is(cpdens)` ? If it's not a data.frame,
this
>> won't work
>>
>> is(cpdens)
>> [1] "list" "vector" "AssayData"
"vectorORfactor"
>>
>> head(cpdens)
>> .
>> .
>> .
>>
>> [99841] 1 3 3 3 4 4 2 2 2 1 0 0 0 0 0 1 1 1 1 1
1 1
>> 0 0
>> [99865] 2 2 0 0 0 0 0 0 0 3 1 1 0 0 0 0 0 0 0 2
1 1
>> 1 0
>> [99889] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
1 0
>> 0 0
>> [99913] 0 3 3 3 3 3 3 3 0 0 0 3 1 13 0 3 2 2 1 0
0 0
>> 1 1
>> [99937] 0 0 0 2 2 2 2 3 3 1 1 1 1 1 0 0 0 0 0 0
0 2
>> 0 2
>> [99961] 0 1 0 0 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1
>> 1 1
>> [99985] 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
>> [ reached getOption("max.print") -- omitted 3301343 entries ]
>>
>> It doesn't seem to be data.frame as you pointed out...
>> Is it possible write this to an output file of some kind?
>
> Yes, of course.
>
> You just have to ask yourself how you want to serialize it.
>
> ?writeLines can be your friend, here, if you want to save it in some
> ASCII format.
Or you can use some higher-level facilities like export() from the
rtracklayer package to export to a BED file. That means that you
first need to stick the numeric vectors of CpG densities back into
the GRanges object they correspond to ('cpdens' should have the same
shape as the original GRangesList you passed to cpgDensityCalc()
i.e. it should be list of numeric vectors where the i-th list
element has the length of the i-th GRangesList element). Then
you export each of the GRanges object separately. Could be done
with something like:
library(rtracklayer)
for (i in seq_along(gr_list)) {
gr <- gr_list[[i]]
## Need to set the "score" metadata col (see ??export.bed).
mcols(gr)$score <- cpdens[[i]]
filepath <- paste("CpG", i, ".bed", sep="")
export.bed(gr, filepath)
}
Granted that 'gr_list' was obtained by loading BAM files, you should
end up with 1 BED file per original BAM file.
HTH,
H.
>
> -steve
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Thank you very much Herv?,
Your code really got the job done.
And thanks also to Steve for his nice comments and Tim for his kind
suggestions.
With best wishes
JL
El Mie, 23 de Enero de 2013, 7:40, Hervé Pagès escribi?:
> Hi,
>
> On 01/22/2013 08:29 AM, Steve Lianoglou wrote:
>> Hi,
>>
>> On Tue, Jan 22, 2013 at 10:54 AM, <jluis.lavin at="" unavarra.es="">
wrote:
>>> Hello Steve,
>>>
>>> (1) What is the output of `is(cpdens)` ? If it's not a data.frame,
this
>>> won't work
>>>
>>> is(cpdens)
>>> [1] "list" "vector" "AssayData"
"vectorORfactor"
>>>
>>> head(cpdens)
>>> .
>>> .
>>> .
>>>
>>> [99841] 1 3 3 3 4 4 2 2 2 1 0 0 0 0 0 1 1 1 1
1 1
>>> 1
>>> 0 0
>>> [99865] 2 2 0 0 0 0 0 0 0 3 1 1 0 0 0 0 0 0 0
2 1
>>> 1
>>> 1 0
>>> [99889] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
1 1
>>> 0
>>> 0 0
>>> [99913] 0 3 3 3 3 3 3 3 0 0 0 3 1 13 0 3 2 2 1
0 0
>>> 0
>>> 1 1
>>> [99937] 0 0 0 2 2 2 2 3 3 1 1 1 1 1 0 0 0 0 0
0 0
>>> 2
>>> 0 2
>>> [99961] 0 1 0 0 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0
0 1
>>> 1
>>> 1 1
>>> [99985] 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
>>> [ reached getOption("max.print") -- omitted 3301343 entries ]
>>>
>>> It doesn't seem to be data.frame as you pointed out...
>>> Is it possible write this to an output file of some kind?
>>
>> Yes, of course.
>>
>> You just have to ask yourself how you want to serialize it.
>>
>> ?writeLines can be your friend, here, if you want to save it in
some
>> ASCII format.
>
> Or you can use some higher-level facilities like export() from the
> rtracklayer package to export to a BED file. That means that you
> first need to stick the numeric vectors of CpG densities back into
> the GRanges object they correspond to ('cpdens' should have the same
> shape as the original GRangesList you passed to cpgDensityCalc()
> i.e. it should be list of numeric vectors where the i-th list
> element has the length of the i-th GRangesList element). Then
> you export each of the GRanges object separately. Could be done
> with something like:
>
> library(rtracklayer)
> for (i in seq_along(gr_list)) {
> gr <- gr_list[[i]]
> ## Need to set the "score" metadata col (see ??export.bed).
> mcols(gr)$score <- cpdens[[i]]
> filepath <- paste("CpG", i, ".bed", sep="")
> export.bed(gr, filepath)
> }
>
> Granted that 'gr_list' was obtained by loading BAM files, you should
> end up with 1 BED file per original BAM file.
>
> HTH,
> H.
>
>>
>> -steve
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
--
Dr. Jos? Luis Lav?n Trueba
Dpto. de Producci?n Agraria
Grupo de Gen?tica y Microbiolog?a
Universidad P?blica de Navarra
31006 Pamplona
Navarra
SPAIN
> the first argument is supposed to be the path to the bam file, not
to the directory in which the bam file is located.
That's right. Also, note the help page.
path : A character vector of length 1. The path of the BAM file.
Notice there is another method BAM2GRangesList described on the help
page for processing multiple BAM file paths into a GRangesList.
Hervé Pagès makes a good point. Although this package isn't for the
analysis of RNA-seq data, and the output of Tophat is not relevant,
this code was written when everyone was using the original Bowtie to
do ChIP-seq alignments, which doesn't perform gapped alignments. This
will be updated.
Dario,
On 01/09/2013 03:00 PM, Dario Strbenac wrote:
>> the first argument is supposed to be the path to the bam file, not
to the directory in which the bam file is located.
>
> That's right. Also, note the help page.
>
> path : A character vector of length 1. The path of the BAM file.
>
> Notice there is another method BAM2GRangesList described on the help
page for processing multiple BAM file paths into a GRangesList.
>
> Hervé Pagès makes a good point. Although this package isn't for the
analysis of RNA-seq data, and the output of Tophat is not relevant,
this code was written when everyone was using the original Bowtie to
do ChIP-seq alignments, which doesn't perform gapped alignments.
Note that it's not just the gaps (i.e. N letters) that have an impact
on the widths of the genomic ranges, but also the indels (I and D
letters). And I think that Bowtie, like probably most aligners (even
the early ones), introduces indels in the alignments.
However, the impact of the indels is typically less than the impact of
the gaps: in the range of 1-10 nucleotides for the former, hundreds of
nucleotides for the latter. But still, depending on the kind of
downstream analysis you're doing, being off by just 1 nucleotide can
have big consequences (e.g. when translating).
Not really worth taking that risk when it's easy to get this exactly
right ;-)
Cheers,
H.
> This will be updated.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
>That code looks correct, but I would rather do:
>
> bam_files <- list.files(pattern="*.bam")
> gr_list <- lapply(bam_files,
> function(bam_file)
> as(readGappedAlignments(bam_file), "GRanges"))
> names(gr_list) <- bam_files
Is it possible to add an option to the coercion to keep the metadata
columns ? For example, if the user specifies they also want to read
the quality scores, the coercion to GRanges currently discards that.
Hi Dario,
On 01/10/2013 05:00 PM, Dario Strbenac wrote:
>> That code looks correct, but I would rather do:
>>
>> bam_files <- list.files(pattern="*.bam")
>> gr_list <- lapply(bam_files,
>> function(bam_file)
>> as(readGappedAlignments(bam_file),
"GRanges"))
>> names(gr_list) <- bam_files
>
> Is it possible to add an option to the coercion to keep the metadata
columns ? For example, if the user specifies they also want to read
the quality scores, the coercion to GRanges currently discards that.
Because of its syntax, coercion doesn't allow additional options.
So I could modify the coercion method to propagate the metadata cols.
Given that the names are propagated, it probably make sense to also
propagate the metadata cols.
Note that the coercion method from GappedAlignments to GRanges is
just calling the granges() getter with no args. I think I should add
the 'use.names' and 'use.mcols' args to this one to let the user
control whether or not propagate the names and/or metadata cols.
Cheers,
H.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Hi,
On Mon, Jan 14, 2013 at 12:10 PM, Hervé Pagès <hpages at="" fhcrc.org="">
wrote:
[snip]
> Because of its syntax, coercion doesn't allow additional options.
I feel like I often find myself wishing that `as` couldn't have been
defined as something like:
function (object, Class, ..., .strict = TRUE, .ext =
possibleExtends(thisClass, Class))
or something similar, as opposed to the current:
function (object, Class, strict = TRUE, ext =
possibleExtends(thisClass, Class))
I guess this is another one for R-aleph ...
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
>cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, window =600)
This example doesn't make biological sense to me. This creates a 600
base window around the start of each sequencing read. The window
argument is relevant if you have a data.frame of genes, and you'd like
to know the CpG density in a window around the TSS, for example.
For your purpose, I would recommend
cpdens <- cpgDensityCalc(gr_list, organism=Mmusculus, seq.len =
aveLength)
The average DNA fragment length that was sequenced is assigned to
aveLength. Ask the biologist who prepared the sequencing library for
this information. It is related to the position on the gel that they
cut the DNA out.
See http://nar.oxfordjournals.org/content/40/10/e72.full for more
background.
--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
First of all I want to thank Herve Pages and Martin Morgan for
their kind
answers. And I want to apologize for this delay answering back, but
I was
offline for a few days...
I used Herve code to read my BAM files into GRanges objects, but I
don't
seem to be able to use that Granges object into the cpgDensityCalc
(1) What is the output of is(cpdens) ? If it's not a data.frame,
this
won't work
is(cpdens)
1 "list" "vector" "AssayData"
I think it is so good happy wheels
The average DNA fragment length that was sequenced is assigned to
aveLength. Ask the biologist who prepared the sequencing library for
this information. It is related to the position on the gel that they
cut the DNA out.