Question

using easyRNASeq examples

0

Entering edit mode

delhomme@embl.de ★ 1.2k

@delhommeemblde-3232

Last seen 9.6 years ago

Hi Francesco, I've fixed a bug in easyRNASeq and made a couple of modifications to deal with novelties introduced in the IRanges and in the DESeq packages. I've just committed that whole bunch, so it will take ~2 days before a binary package is available. You can always install from SVN in the meanwhile. I've listed below the different examples you sent me in our email exchange last week and I've added some comments which I hope will prove useful to you. I've created a new "thread" to make it easier to read, and I'm confident it will prove useful to others. This leads me to thanking you for your patience, for providing the data and for giving me the opportunity to post such an example :-) ## load the library library(easyRNASeq) ## load the chromosome sizes ## Note that you can just provide a named list; using the BSgenome is not necessary (especially as the Hsapiens ones are huge >800Mb) library(BSgenome.Hsapiens.UCSC.hg19) chr.sizes=as.list(seqlengths(Hsapiens)) ## set the wd directory to wherever you have your bam files ## that's just for this example, you could give that directory ## directly to the filesDirectory argument of the function setwd("Desktop/Francesco") ## get the bam filenames bamfiles=dir(getwd(),pattern="*\\.bam$") ## run easyRNASeq to get an RNAseq object. ## we use biomaRt to fetch the annotation directly from ensembl ## rnaSeq <- easyRNASeq(filesDirectory=getwd(), organism="Hsapiens", chr.sizes=chr.sizes, readLength=100L, annotationMethod="biomaRt", format="bam", count="exons", filenames=bamfiles[1], outputFormat="RNAseq" ) ## Fetching the annotation that way is time consuming ## Using the RNAseq object you can extract the genic information ## and save them as an rda for a faster processing time ## later on gAnnot <- genomicAnnotation(rnaSeq) ## There are 244 "chromosomes" in that annotation, let's keep only what we need gAnnot <- gAnnot[space(gAnnot) %in% paste("chr",c(1:22,"X","Y","M"),sep=""),] save(gAnnot,file="gAnnot.rda") ## you could use it this way countTable <- easyRNASeq(filesDirectory=getwd(), organism="Hsapiens", chr.sizes=chr.sizes, readLength=100L, annotationMethod="rda", annotationFile="gAnnot.rda", format="bam", count="exons", filenames=bamfiles[1] ) ## using different annotation (makeTranscriptDB) library(GenomicFeatures) hg19.tx <- makeTranscriptDbFromUCSC( genome="hg19", tablename="refGene") ## easyRNASeq can deal with GRangesList object, so no need to modify it much, i.e. no need to convert it to a RangedData gAnnot <- exons(hg19.tx) ## change the metadata column name to suit easyRNASeq colnames(elementMetadata(gAnnot)) <- "exon" ## finally turn it into a GrangesList gAnnot <- split(gAnnot,seqnames(gAnnot)) ## you could save it as an rda object, you can as well use it directly ## by using the annotationMethod "env" and the annotationObject arguments. ## In addition we will be selecting for a single chromosome: chr1 ## as the corresponding name in your bam file is 1, that's what we'll use countTable <- easyRNASeq(filesDirectory=getwd(), organism="Hsapiens", chr.sizes=chr.sizes, readLength=100L, annotationMethod="env", annotationObject=gAnnot, format="bam", count="exons", filenames=bamfiles[1], chr.sel="1" ) ## applying DESeq ## this will not yield very sensitive results as we have no replicates (biological) ## These are important for DESeq to accurately model the technical and biological variance ## With no replicates for every condition, the dispersion will be based on a "pooled" estimate ## making the differential expression call lose sensitivity. In addition, DESeq is in such case ## using a conservative approach (which is good) so you'd get even less significant results. ## Defining the conditions conditions <- c("A","B") names(conditions) <- bamfiles ## running DESeq ## NOTE that the two last arguments are for the DESeq estimateDispersions method. They are just transferred through. ## There are necessary as we have only 1 replicate per condition; see the DESeq vignette for more details. countDataSet <- easyRNASeq(filesDirectory=getwd(), organism="Hsapiens", chr.sizes=chr.sizes, readLength=100L, annotationMethod="env", annotationObject=gAnnot, format="bam", count="exons", filenames=bamfiles, chr.sel="1", outputFormat="DESeq", normalize=TRUE, conditions=conditions, fitType="local", method="blind" ) ## finally my session info > sessionInfo() R Under development (unstable) (2012-02-07 r58290) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicFeatures_1.7.16 AnnotationDbi_1.17.15 easyRNASeq_1.1.6 [4] ShortRead_1.13.12 latticeExtra_0.6-19 RColorBrewer_1.0-5 [7] Rsamtools_1.7.29 DESeq_1.7.6 locfit_1.5-6 [10] lattice_0.20-0 akima_0.5-7 Biobase_2.15.3 [13] BSgenome_1.23.2 GenomicRanges_1.7.24 Biostrings_2.23.6 [16] IRanges_1.13.24 genomeIntervals_1.11.0 intervals_0.13.3 [19] edgeR_2.5.9 limma_3.11.11 biomaRt_2.11.1 [22] BiocGenerics_0.1.4 loaded via a namespace (and not attached): [1] annotate_1.33.2 bitops_1.0-4.1 DBI_0.2-5 genefilter_1.37.0 [5] geneplotter_1.33.1 grid_2.15.0 hwriter_1.3 RCurl_1.91-1 [9] RSQLite_0.11.1 rtracklayer_1.15.7 splines_2.15.0 survival_2.36-12 [13] tools_2.15.0 XML_3.9-4 xtable_1.6-0 zlibbioc_1.1.1 I hope this helps, do not hesitate to contact me if you have further questions and to suggest improvements. Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany

RNASeq Annotation BSgenome convert biomaRt BSgenome IRanges DESeq easyRNASeq RNASeq • 1.1k views

ADD COMMENT • link 12.2 years ago delhomme@embl.de ★ 1.2k