Hi Jorge,
On Jul 20, 2012, at 6:54 PM, Jorge Beira wrote:
>
>
> Hi Nico,
>
> I started R again to look into it as you suggested, and when I
loaded the easyRNASeq library, I got this (don't know if this is the
info you wanted):
>
>> library(easyRNASeq)
> Loading required package: parallel
> Loading required package: genomeIntervals
> Loading required package: intervals
> Loading required package: BiocGenerics
>
> Attaching package: ?BiocGenerics?
>
> The following object(s) are masked from ?package:stats?:
>
> xtabs
>
> The following object(s) are masked from ?package:base?:
>
> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
get, intersect, lapply, Map, mapply, mget, order, paste,
> pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
rownames, sapply, setdiff, table, tapply, union,
> unique
>
> Loading required package: Biobase
> Welcome to Bioconductor
>
> Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")',
> and for packages 'citation("pkgname")'.
>
> Loading required package: biomaRt
> Error: package ?edgeR? 2.6.9 was found, but >= 2.7.23 is required by
?easyRNASeq?
> In addition: Warning messages:
> 1: replacing previous import ?coerce? when loading ?intervals?
> 2: replacing previous import ?initialize? when loading ?intervals?
>
>
>
That's more like it.
>
> --------
> However, then I re-installed the package and got:
>
>> library(easyRNASeq)
> Error: package ?edgeR? 2.6.9 was found, but >= 2.7.23 is required by
?easyRNASeq?
>> source("
http://bioconductor.org/biocLite.R") biocLite("easyRNASeq")
> Error: unexpected symbol in
"source("
http://bioconductor.org/biocLite.R") biocLite"
>> source("
http://bioconductor.org/biocLite.R")
> BiocInstaller version 1.4.7, ?biocLite for help
>> biocLite("easyRNASeq")
> BioC_mirror:
http://bioconductor.org
> Using R version 2.15, BiocInstaller version 1.4.7.
> Warning: unable to access index for repository http://brainarray.mbn
i.med.umich.edu/bioc/bin/macosx/leopard/contrib/2.15
> Installing package(s) 'easyRNASeq'
> Warning: unable to access index for repository http://brainarray.mbn
i.med.umich.edu/bioc/bin/macosx/leopard/contrib/2.15
> trying URL '
http://www.bioconductor.org/packages/2.10/bioc/bin/macos
x/leopard/contrib/2.15/easyRNASeq_1.2.3.tgz'
> Content type 'application/x-gzip' length 586109 bytes (572 Kb)
> opened URL
> ==================================================
> downloaded 572 Kb
>
>
> The downloaded binary packages are in
> /var/folders/Vu/VuRnTG0pHVChk0UeyLVkr++++TM/-Tmp-//RtmplXgRF0/
downloaded_packages
> Warning: unable to access index for repository http://brainarray.mbn
i.med.umich.edu/bioc/bin/macosx/leopard/contrib/2.15
>
>
>
> -------------------------
>
Exactly what should have happened the first time. Very weird you got
1.3.8 installed. Anyway, now you're set :-)
>
> I got it to work now, using:
>
> setwd("/Users/jbeira/Desktop/bams")
> library("easyRNASeq")
> library(BSgenome.Dmelanogaster.UCSC.dm3)
> count.table <- easyRNASeq(
> filesDirectory=getwd(),
> filenames=c("J05_orig_genome.sorted.bam"),
> format="bam",
> count="exons",
> organism="Dmelanogaster",
> chr.sizes=as.list(seqlengths(Dmelanogaster)),
> readLength=75L,
> annotationMethod="gff",
> annotationFile="dmel-all-r5.45.gff")
>
>
> However the processing is too much for my machine... and I've tried
it in the lab one (24GB RAM) and still couldn't do it. Says it
couldn't allocate a 1.2 Gb sized vector. The thing is, I am starting
with a 20.5 GB .sorted.bam file -- I found it a bit too big, but don't
know why really, but now I'm wondering if there's something wrong with
my alignment / sorting as normally BAMs should be smaller (before
sorting was 6.9 GB). Any ideas now?
Wow, that's huge! It's no surprise you get memory issue, i.e. while
reading the BAM file, easyRNASeq will need as much memory as your bam
file => 20GB+. It's on my TODO list to use the new streaming
functionalities that Martin provided for reading BAM files, but
haven't got the time to put my hands on that yet.
If you're unsorted bam file is 6.9GB, there's no reason that it get 3x
larger when sorted. Are you sure, it is a bam file? I.e. what does
this command reports (in the terminal)
file J05_orig_genome.sorted.bam
I suspect that either you asked for an uncompressed bam file as ouptut
or that you got a sam file as the output of your sort command for some
reason. Check the samtools webpage to make sure you have been using
the proper parameters.
Cheers,
Nico
>
> At least I got it to the running phase...! thanks for your help so
far!! Hope the next bit isn't too hard to get around.
>
> Thanks
>
> Jorge
>
>
>
> On Fri Jul 20 16:15:22 2012, Nicolas Delhomme wrote:
>>
>> On Jul 20, 2012, at 3:57 PM, Jorge Beira wrote:
>>
>>>
>>> Hi Nico,
>>> It looks like I'm making some progress slowly... so it's good.
>>>
>>> On Fri Jul 20 14:43:49 2012, Nicolas Delhomme wrote:
>>>
>>>>
>>>> I'm really surprised you could install version 1.3.8 actually. It
requires edgeR >= 2.7.23 (yours is 2.6.9), and IRanges >= 1.15.18
(yours is 1.14.4). How did you do the installation?
>>>>
>>>
>>> I installed it with the command "
source("
http://bioconductor.org/biocLite.R") biocLite("easyRNASeq")"
and then it asked me if I wanted to update the dependencies to which I
said yes... as long as it works I'm happy ;)
>>
>> This is extremely strange. Since your version of BiocInstaller is
1.4.7 it should never have installed easyRNASeq 1.3.8. Can you open a
fresh R session, load easyRNASeq and report the sessionInfo?
>>
>>>
>>>
>>>> Anyway, if you want to use the development version of the
easyRNASeq package, you should use the development version of every
package (knowing that they are then more unstable and might be broken
for a couple of days from time to time). If you want to use the
development version, you should do the following:
>>>>
>>>> library(BiocInstaller)
>>>> useDevel(TRUE)
>>>> chooseBioCmirror() ## pick the closest one
>>>> update.packages(ask=FALSE)
>>>>
>>>> If not, you're better off reverting to easyRNASeq version 1.2.3.
>>>>
>>>>>
>>>>> -------
>>>>>
>>>>> getting back to the issues:
>>>>>
>>>>> indeed when I query file.exists(system.file("dmel-
all-r5.45.gff")) is turns FALSE -- but now this is very strange, then,
since I know the file is in the bams directory together with the bam
files so why is it not reading it?.... In any case looks like the
major problem is here before I can proceed. Any idea on this one?...
>>>>>
>>>>
>>>> system.file is not appropriate here, look at the help about it
for more details (it basically look up files in your R installation
directories), i.e. in R type:
>>>>
>>>> ?system.file
>>>>
>>>> Just using annotationFile="dmel-all-r5.45.gff" should do it.
>>>>
>>>>> I aligned my reads to the "dmel-all-chromosome-r5.45.fasta"
version, however now I'm using the gff file "dmel-all-r5.45.gff" --
does this suggest any issue to you?
>>>>
>>>> No, that should be fine, but we can double check. What does the
following gives you (in a terminal, not in R)
>>>>
>>>> grep ">" dmel-all-chromosome-r5.45.fasta
>>>>
>>>
>>>
>>> This command gives me what I paste below, so from the looks of it
there's all the info...:
>>
>> Good, that looks as expected.
>>
>>>
>>> (What if I align it to the transcriptome, could I still use
easyRNASeq or would there be problems to decide what sequences
correspond to each feature...? It's just something I've been wondering
about)
>>
>> I would not do that for several reasons.
>>
>> 1) Yes you could use easyRNASeq, but you would have to define your
own gff/gtf file, because the one you use uses genomic coordinates and
if you align to the transcriptome, you would get transcriptomic
coordinates.
>>
>> 2) Aligning against the transcriptome is fine if that is the only
reference you have. Drosophila has a well known genome so it's much
better to align against that. Indeed, it might be that a read aligns
perfectly to the genome (i.e. in an enhancer (see the eRNAs paper from
Kim et al, Nature, 2010)) and only incorrectly to the transcriptome.
By aligning to the transcriptome only, such reads would be incorrectly
assigned and would bias your results. If you're worried about reads
spanning EEJ and/or multiple mapping reads, you should rather use an
aligner that can handle that (tophat, RSEM,..) rather than using the
transcriptome as a reference for your alignment.
>>
>> Cheers,
>>
>> Nico
>>
>>>
>>> thanks!
>>> J
>>>
>>> grep ">" dmel-all-chromosome-r5.45.fasta
>>>
>>>> YHet type=chromosome_arm; loc=YHet:1..347038; ID=YHet;
dbxref=REFSEQ:NW_001848860,GB:CM000461;
MD5=478fbc07ea1b1c03b3d8d04abacf51a5; length=347038; release=r5.45;
species=Dmel;
>>>> dmel_mitochondrion_genome type=chromosome;
loc=dmel_mitochondrion_genome:1..19517; ID=dmel_mitochondrion_genome;
dbxref=GB:NC_001709; MD5=61af8db53361cd5744f41f773d21c3d4;
length=19517; release=r5.45; species=Dmel;
>>>> 2L type=chromosome_arm; loc=2L:1..23011544; ID=2L;
dbxref=REFSEQ:NT_033779,GB:AE014134;
MD5=bfdfb99d39fa5174dae1e2ecd8a231cd; length=23011544; release=r5.45;
species=Dmel;
>>>> X type=chromosome_arm; loc=X:1..22422827; ID=X;
dbxref=REFSEQ:NC_004354,GB:AE014298;
MD5=a83251de102926ba52e3e71c72f1d9b0; length=22422827; release=r5.45;
species=Dmel;
>>>> 3L type=chromosome_arm; loc=3L:1..24543557; ID=3L;
dbxref=REFSEQ:NT_037436,GB:AE014296;
MD5=ec7148cae3daabbd2a226eaa6e85d7c2; length=24543557; release=r5.45;
species=Dmel;
>>>> 4 type=chromosome_arm; loc=4:1..1351857; ID=4;
dbxref=REFSEQ:NC_004353,GB:AE014135;
MD5=515edfcc517bc4e39ae5c696cfd44021; length=1351857; release=r5.45;
species=Dmel;
>>>> 2R type=chromosome_arm; loc=2R:1..21146708; ID=2R;
dbxref=REFSEQ:NT_033778,GB:AE013599;
MD5=1589a9447d4dc94c048aa48ea5b8099d; length=21146708; release=r5.45;
species=Dmel;
>>>> 3R type=chromosome_arm; loc=3R:1..27905053; ID=3R;
dbxref=REFSEQ:NT_033777,GB:AE014297;
MD5=6b279651b3b268f11e0dd1d87ded0ccc; length=27905053; release=r5.45;
species=Dmel;
>>>> Uextra type=chromosome_arm; loc=Uextra:1..29004656; ID=Uextra;
MD5=3d47647f1cde286a947d03cc17f0bad3; length=29004656; release=r5.45;
species=Dmel;
>>>> 2RHet type=chromosome_arm; loc=2RHet:1..3288761; ID=2RHet;
dbxref=REFSEQ:NW_001848856,GB:CM000457;
MD5=88c0ac39ebe4d9ef5a8f58cd746c9810; length=3288761; release=r5.45;
species=Dmel;
>>>> 2LHet type=chromosome_arm; loc=2LHet:1..368872; ID=2LHet;
dbxref=REFSEQ:NW_001848855,GB:CM000456;
MD5=5217e6a4295a824c31e43d5a9da9038b; length=368872; release=r5.45;
species=Dmel;
>>>> 3LHet type=chromosome_arm; loc=3LHet:1..2555491; ID=3LHet;
dbxref=REFSEQ:NW_001848857,GB:CM000458;
MD5=4902d32327726bf385aa55a75399bdfc; length=2555491; release=r5.45;
species=Dmel;
>>>> 3RHet type=chromosome_arm; loc=3RHet:1..2517507; ID=3RHet;
dbxref=REFSEQ:NW_001848858,GB:CM000459;
MD5=752e488dbea78e592aca03745eb732ea; length=2517507; release=r5.45;
species=Dmel;
>>>> U type=chromosome_arm; loc=U:1..10049037; ID=U;
MD5=4b72bf19979c8466d5b66acca66f1804; length=10049037; release=r5.45;
species=Dmel;
>>>> XHet type=chromosome_arm; loc=XHet:1..204112; ID=XHet;
dbxref=REFSEQ:NW_001848859,GB:CM000460;
MD5=dcd15c551e34903f6171f53f553b55ee; length=204112; release=r5.45;
species=Dmel;
>>>
>>>
>>>
>>
>>
>
>