Removing overlapping genes from annotation for RNAseq read count
Guest User (@guest-user-4897)
Hello,

I am trying to prepare a read count table for DESeq using the easyRNASeq package in R. I followed the vignette and used an Ensembl GTF file (mm9.ensgene.gtf) as my annotation. After constructing the read count table, I get warnings about overlapping genes and reads being counted more than once, but I am not sure how to modify my annotation to avoid this. The manual only says that the computed gene models can be extracted from the created RNAseq object and that overlapping loci should be removed, without specifying how to do it. I can extract the gene models, but I am not sure how to process them correctly before re-running the function. Could anyone give me some advice on how to fix this annotation in R?

This is the code I used to generate my read count table:

    # 'conditions' is defined earlier in my script (not shown here)
    read.count <- easyRNASeq(format = 'bam', readLength = 50L,
                             organism = "Mmusculus", chr.sizes = "auto",
                             annotationMethod = "gtf",
                             annotationFile = "mm9.ensgene.gtf",
                             count = "genes", summarization = "geneModels",
                             filesDirectory = getwd(),
                             filenames = c("NI_A_accepted_hits.bam", "NI_B_accepted_hits.bam",
                                           "DEX_A_accepted_hits.bam", "DEX_B_accepted_hits.bam",
                                           "GW_A_accepted_hits.bam", "GW_B_accepted_hits.bam",
                                           "DEX_GW_A_accepted_hits.bam", "DEX_GW_B_accepted_hits.bam"),
                             conditions = conditions,
                             outputFormat = "RNASeq")

To get the gene models I used:

    geneModels <- geneModel(read.count)

but I am stuck at this point and cannot find a way to remove the overlapping features. I tried the disjoin function, but it gives an error:

    Error in function (classes, fdef, mtable) :
      unable to find an inherited method for function "disjoin", for signature "RangedData"

Thanks a lot for your suggestions!
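A minimal sketch of the filtering step asked about above, in base R only, assuming the gene models have already been turned into a plain data.frame (for example via `as.data.frame(geneModels)`, then renamed to match). The column names `chromosome`, `start`, `end`, `strand` and `gene`, and the helper names `flag_overlaps_sorted` and `remove_overlapping_genes`, are hypothetical and are not part of easyRNASeq. Within Bioconductor the same check would more naturally be done on a GRanges object: the error above says `disjoin` has no method for `RangedData` in this IRanges version, so coercing first (e.g. `as(geneModels, "GRanges")`) is likely needed. Note also that `disjoin` splits overlapping regions into non-overlapping fragments rather than dropping whole genes, which is a different operation from the one sketched here.

```r
# Hypothetical helper (not part of easyRNASeq): flags genes that overlap
# another gene. Intervals are treated as 1-based and closed, as in GTF
# files, so two genes sharing even a single base count as overlapping.
flag_overlaps_sorted <- function(start, end) {
  # 'start' and 'end' must already be ordered by 'start'.
  n <- length(start)
  if (n < 2) return(logical(n))
  # Largest end among all earlier genes, and the start of the next gene.
  prev_max_end <- c(-Inf, cummax(end)[-n])
  next_start   <- c(start[-1], Inf)
  # Overlaps an earlier gene, or overlaps a later gene (checking the next
  # gene suffices, since it has the smallest start among later genes).
  prev_max_end >= start | next_start <= end
}

# Hypothetical helper: drops every gene that overlaps at least one other
# gene on the same chromosome (and the same strand, if ignore.strand = FALSE).
remove_overlapping_genes <- function(models, ignore.strand = TRUE) {
  group <- if (ignore.strand) {
    models$chromosome
  } else {
    paste(models$chromosome, models$strand)
  }
  overlaps <- logical(nrow(models))
  for (idx in split(seq_len(nrow(models)), group)) {
    idx <- idx[order(models$start[idx])]
    overlaps[idx] <- flag_overlaps_sorted(models$start[idx], models$end[idx])
  }
  models[!overlaps, , drop = FALSE]
}

# Toy gene models (invented for illustration only).
models <- data.frame(
  gene       = c("g1", "g2", "g3", "g4", "g5"),
  chromosome = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  start      = c(100L, 150L, 500L, 100L, 300L),
  end        = c(200L, 250L, 600L, 200L, 400L),
  strand     = c("+", "-", "+", "+", "+"),
  stringsAsFactors = FALSE
)

kept <- remove_overlapping_genes(models)
print(kept$gene)  # g1 and g2 overlap on chr1, so only g3, g4 and g5 remain
```

Strand is ignored by default on the assumption that the library is unstranded: a read in a region shared by two genes on opposite strands would then be counted for both, which is what the warning is about. Sorting by start within each chromosome keeps the check at O(n log n) rather than comparing every pair of genes, which matters for a genome-wide annotation.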
-- output of sessionInfo():

    R version 2.15.1 (2012-06-22)
    Platform: x86_64-redhat-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
     [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
    [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] rtracklayer_1.18.0                 BSgenome.Mmusculus.UCSC.mm9_1.3.19
     [3] easyRNASeq_1.4.2                   ShortRead_1.16.1
     [5] latticeExtra_0.6-24                RColorBrewer_1.0-5
     [7] Rsamtools_1.10.1                   BSgenome_1.26.1
     [9] GenomicRanges_1.10.2               Biostrings_2.26.2
    [11] IRanges_1.16.2                     edgeR_3.0.0
    [13] limma_3.14.1                       biomaRt_2.14.0
    [15] genomeIntervals_1.14.0             intervals_0.13.3
    [17] DESeq_1.10.1                       lattice_0.20-10
    [19] locfit_1.5-8                       Biobase_2.18.0
    [21] BiocGenerics_0.4.0

    loaded via a namespace (and not attached):
     [1] annotate_1.36.0      AnnotationDbi_1.20.2 bitops_1.0-4.1       DBI_0.2-5
     [5] genefilter_1.40.0    geneplotter_1.36.0   grid_2.15.1          hwriter_1.3
     [9] RCurl_1.95-1.1       RSQLite_0.11.2       splines_2.15.1       stats4_2.15.1
    [13] survival_2.36-14     tools_2.15.1         XML_3.95-0.1         xtable_1.7-0
    [17] zlibbioc_1.4.0

-- Sent via the guest posting facility at bioconductor.org.
RNASeq Annotation PROcess DESeq