easyRNASeq: Number of total counts
Guest User ★ 13k
@guest-user-4897
Hi,

I am using easyRNASeq to estimate the counts in an RNA-seq alignment to hg19/GRCh37 made with bowtie2. I would like to get the counts "per gene". The code runs successfully and I get an output table, but the table contains ~57,000 records:

cat count.tsv | wc -l
57774

I am wondering why the number of records is so much greater than the total number of genes (~30,000). I am getting some warning messages that may be related to this, especially #1 and #4:

Warning messages:
1: Consider using 'synthetic transcripts' as described in the section 7.1 of the vignette instead of the count=genes,summarization=geneModels deprecated paradigm.
2: In easyRNASeq(filesDirectory = getwd(), filenames = c("BRPC13-1118_L1.D710_501.sorted.bam", :
  You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it.
3: In easyRNASeq(filesDirectory = getwd(), filenames = c("BRPC13-1118_L1.D710_501.sorted.bam", :
  You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it.
4: In easyRNASeq(filesDirectory = getwd(), filenames = c("BRPC13-1118_L1.D710_501.sorted.bam", :
  There are 18950 synthetic exons as determined from your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
5: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter, :
  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
6: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter, :
  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
7: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter, :
  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
8: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter, :
  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.

Code I am running for estimating the counts:

> count.table <- easyRNASeq(filesDirectory=getwd(),
+ filenames=c("A.sorted.bam","B.sorted.bam","C.sorted.bam","D.sorted.bam"),
+ organism="Hsapiens",
+ annotationMethod="gtf",
+ annotationFile="/general/NGS/index/human/Homo_sapiens.GRCh37.74.gtf",
+ count="genes",
+ summarization="geneModels")

 -- output of sessionInfo():

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

> packageDescription("easyRNASeq")
Package: easyRNASeq
Version: 1.8.7
Date: 2014-03-25
Type: Package
Title: Count summarization and normalization for RNA-Seq data.
Author: Nicolas Delhomme, Ismael Padioleau, Bastian Schiffthaler
Maintainer: Nicolas Delhomme <delhomme at embl.de>
Description: Calculates the coverage of high-throughput short-reads against a genome of reference and summarizes it per feature of interest (e.g. exon, gene, transcript). The data can be normalized as 'RPKM' or by the 'DESeq' or 'edgeR' package.
Depends: genomeIntervals (>= 1.18.0), Biobase (>= 2.22.0), biomaRt (>= 2.18.0), edgeR (>= 3.4.0), Biostrings (>= 2.30.0), DESeq (>= 1.14.0), GenomicRanges (>= 1.14.3), IRanges (>= 1.20.5), Rsamtools (>= 1.14.1), ShortRead (>= 1.20.0)
Imports: graphics, methods, parallel, utils, BiocGenerics (>= 0.8.0), LSD (>= 2.5)
Suggests: BSgenome (>= 1.30.0), BSgenome.Dmelanogaster.UCSC.dm3 (>= 1.3.19), GenomicFeatures (>= 1.14.0), RnaSeqTutorial (>= 0.0.13), BiocStyle (>= 1.0.0)
License: Artistic-2.0
LazyLoad: yes
biocViews: GeneExpression, RNAseq, Genetics, Preprocessing
Packaged: 2014-03-26 04:53:07 UTC; biocbuild
Built: R 3.0.3; ; 2014-03-31 20:30:18 UTC; unix

 -- Sent via the guest posting facility at bioconductor.org.
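
One way to check whether the annotation itself accounts for the size of the table is to count the distinct gene_id values in the GTF and break them down by biotype. Ensembl annotations list non-coding genes and pseudogenes alongside protein-coding genes, so a total well above ~30,000 would not be surprising in itself. What follows is only a sketch in base R, not something taken from easyRNASeq: the helper names extract_attr and count_gtf_genes are hypothetical, the toy GTF lines are invented for illustration, and the code assumes the ninth GTF column carries gene_id and gene_biotype attributes in the usual `key "value";` form.

```r
# Count distinct genes in an Ensembl-style GTF, broken down by biotype.
# Base R only; both helpers are hypothetical and not part of easyRNASeq.

# Return the value of one attribute key (e.g. gene_id) for every GTF line,
# or NA where the key is absent.
extract_attr <- function(attrs, key) {
  pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
  out <- sub(pattern, "\\1", attrs, perl = TRUE)
  out[!grepl(pattern, attrs, perl = TRUE)] <- NA_character_
  out
}

# Read a GTF and tabulate the unique gene_id / gene_biotype pairs.
count_gtf_genes <- function(gtf_path) {
  gtf <- read.delim(gtf_path, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE)
  genes <- unique(data.frame(
    gene_id = extract_attr(gtf[[9]], "gene_id"),
    biotype = extract_attr(gtf[[9]], "gene_biotype"),
    stringsAsFactors = FALSE
  ))
  list(n_genes    = length(unique(genes$gene_id)),
       by_biotype = table(genes$biotype, useNA = "ifany"))
}

# Toy annotation (invented): three genes, one of them with two exons.
toy <- c(
  '1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_biotype "protein_coding";',
  '1\tprotein_coding\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_biotype "protein_coding";',
  '1\tlincRNA\texon\t500\t600\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; gene_biotype "lincRNA";',
  '2\tpseudogene\texon\t100\t150\t.\t+\t.\tgene_id "G3"; transcript_id "T3"; gene_biotype "pseudogene";'
)
toy_path <- tempfile(fileext = ".gtf")
writeLines(toy, toy_path)

res <- count_gtf_genes(toy_path)
print(res$n_genes)
print(res$by_biotype)
```

Pointing count_gtf_genes at Homo_sapiens.GRCh37.74.gtf instead of the toy file would give the number to compare against the 57774 lines of count.tsv (allowing for a header line, if there is one). If the two are close, the table has one row per annotated gene of any biotype, and the overlap warning (#4) concerns double counting of reads rather than the number of rows.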
GeneExpression RNASeq Genetics Coverage Normalization BSgenome Biobase Biostrings biomaRt