Question: easyRNASeq: Number of total counts
Guest User wrote (3.6 years ago):
Hi,

I am using easyRNASeq to estimate read counts from an RNA-seq alignment to hg19/GRCh37 made with bowtie2, and I would like to get the counts per gene. The code runs successfully and I get an output table, but it has about 57,000 records:

```
cat count.tsv | wc -l
57774
```

I am wondering why the number of rows is so much greater than the total number of genes (~30,000). I also get some warning messages that may be related, especially #1 and #4:

```
Warning messages:
1: Consider using 'synthetic transcripts' as described in the section 7.1 of the vignette instead of the count=genes,summarization=geneModels deprecated paradigm.
2: In easyRNASeq(filesDirectory = getwd(), filenames = c("BRPC13-1118_L1.D710_501.sorted.bam",  :
  You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it.
3: In easyRNASeq(filesDirectory = getwd(), filenames = c("BRPC13-1118_L1.D710_501.sorted.bam",  :
  You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it.
4: In easyRNASeq(filesDirectory = getwd(), filenames = c("BRPC13-1118_L1.D710_501.sorted.bam",  :
  There are 18950 synthetic exons as determined from your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
5: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter,  :
  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
6: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter,  :
  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
7: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter,  :
  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
8: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter,  :
  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
```

This is the code I am running to estimate the counts:

```r
library(easyRNASeq)

# Per-gene counts for four sorted BAM files, using the Ensembl GRCh37.74 GTF
# and the count = "genes", summarization = "geneModels" settings
count.table <- easyRNASeq(filesDirectory = getwd(),
                          filenames = c("A.sorted.bam", "B.sorted.bam",
                                        "C.sorted.bam", "D.sorted.bam"),
                          organism = "Hsapiens",
                          annotationMethod = "gtf",
                          annotationFile = "/general/NGS/index/human/Homo_sapiens.GRCh37.74.gtf",
                          count = "genes",
                          summarization = "geneModels")
```

Output of sessionInfo() and packageDescription("easyRNASeq"):

```
> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

> packageDescription("easyRNASeq")
Package: easyRNASeq
Version: 1.8.7
Date: 2014-03-25
Type: Package
Title: Count summarization and normalization for RNA-Seq data.
Author: Nicolas Delhomme, Ismael Padioleau, Bastian Schiffthaler
Maintainer: Nicolas Delhomme <delhomme at embl.de>
Description: Calculates the coverage of high-throughput short-reads against a genome of reference and summarizes it per feature of interest (e.g. exon, gene, transcript). The data can be normalized as 'RPKM' or by the 'DESeq' or 'edgeR' package.
Depends: genomeIntervals (>= 1.18.0), Biobase (>= 2.22.0), biomaRt (>= 2.18.0), edgeR (>= 3.4.0), Biostrings (>= 2.30.0), DESeq (>= 1.14.0), GenomicRanges (>= 1.14.3), IRanges (>= 1.20.5), Rsamtools (>= 1.14.1), ShortRead (>= 1.20.0)
Imports: graphics, methods, parallel, utils, BiocGenerics (>= 0.8.0), LSD (>= 2.5)
Suggests: BSgenome (>= 1.30.0), BSgenome.Dmelanogaster.UCSC.dm3 (>= 1.3.19), GenomicFeatures (>= 1.14.0), RnaSeqTutorial (>= 0.0.13), BiocStyle (>= 1.0.0)
License: Artistic-2.0
LazyLoad: yes
biocViews: GeneExpression, RNAseq, Genetics, Preprocessing
Packaged: 2014-03-26 04:53:07 UTC; biocbuild
Built: R 3.0.3; ; 2014-03-31 20:30:18 UTC; unix
```

Sent via the guest posting facility at bioconductor.org.
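One way to check whether the ~57,000 rows simply reflect the number of genes defined in the annotation, rather than a counting problem, is to count the distinct `gene_id` values in the GTF given as `annotationFile`. The base-R sketch below does that; the helper name `count_gtf_genes` is hypothetical (not part of easyRNASeq), and it assumes standard GTF attribute syntax in column 9. Keep in mind that `wc -l` counts every line of count.tsv, so if the table was written with a header row it holds one gene fewer than the reported 57774.

```r
# Hypothetical helper (not part of easyRNASeq): count the distinct gene_id
# values in a GTF file. Assumes the standard GTF layout: tab-separated
# columns with the attributes in column 9, e.g. gene_id "ENSG00000223972";
count_gtf_genes <- function(gtf_path) {
  lines <- readLines(gtf_path)
  lines <- lines[!grepl("^#", lines)]                 # drop header/comment lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  attrs <- vapply(fields,
                  function(f) if (length(f) >= 9) f[9] else NA_character_,
                  character(1))
  # keep only attribute strings that carry a gene_id (NA gives FALSE here)
  attrs <- attrs[grepl('\\bgene_id "', attrs, perl = TRUE)]
  # extract the quoted gene_id value from each attribute string
  ids <- sub('.*\\bgene_id "([^"]+)".*', "\\1", attrs, perl = TRUE)
  length(unique(ids))
}

# Usage, with the annotation file from the post:
# count_gtf_genes("/general/NGS/index/human/Homo_sapiens.GRCh37.74.gtf")
```

If this number is close to the row count of the output table, the table most likely has one row per gene in the annotation, including genes that are not protein-coding.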
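Warning #4 concerns "synthetic exons". As a minimal sketch of the idea, assuming (not confirmed from the easyRNASeq source) that each gene's exons are merged into non-overlapping synthetic exons, the base-R code below merges exons per gene and counts synthetic exons that overlap a synthetic exon of a different gene; per the warning, reads falling in such regions can be counted more than once. All function names are hypothetical and strand is ignored. On a full GTF, `reduce()` and `findOverlaps()` from IRanges/GenomicRanges would be the idiomatic and much faster route.

```r
# Hypothetical sketch (not easyRNASeq's implementation). Coordinates are
# 1-based and closed, as in GTF; strand is ignored.

# Merge overlapping intervals into non-overlapping blocks.
merge_intervals <- function(start, end) {
  o <- order(start)
  start <- start[o]; end <- end[o]
  s <- start[1]; e <- end[1]
  out_s <- c(); out_e <- c()
  for (i in seq_along(start)[-1]) {
    if (start[i] <= e) {          # shares at least one base: extend the block
      e <- max(e, end[i])
    } else {                      # gap: close the block, start a new one
      out_s <- c(out_s, s); out_e <- c(out_e, e)
      s <- start[i]; e <- end[i]
    }
  }
  data.frame(start = c(out_s, s), end = c(out_e, e))
}

# One set of synthetic exons per gene (and chromosome).
synthetic_exons <- function(exons) {
  parts <- split(exons, list(exons$gene, exons$chr), drop = TRUE)
  do.call(rbind, lapply(parts, function(p) {
    data.frame(gene = as.character(p$gene[1]), chr = as.character(p$chr[1]),
               merge_intervals(p$start, p$end), stringsAsFactors = FALSE)
  }))
}

# Number of synthetic exons overlapping a synthetic exon of another gene.
count_overlapping <- function(syn) {
  hit <- vapply(seq_len(nrow(syn)), function(i) {
    other <- syn$chr == syn$chr[i] & syn$gene != syn$gene[i]
    any(syn$start[other] <= syn$end[i] & syn$end[other] >= syn$start[i])
  }, logical(1))
  sum(hit)
}

# Toy annotation: G1's two exons merge into 100-300, which overlaps G2.
exons <- data.frame(gene  = c("G1", "G1", "G2", "G3"),
                    chr   = c("1", "1", "1", "2"),
                    start = c(100, 150, 180, 100),
                    end   = c(200, 300, 250, 200),
                    stringsAsFactors = FALSE)
syn <- synthetic_exons(exons)
count_overlapping(syn)  # 2 (the G1 and G2 synthetic exons)
```

The pairwise loop in `count_overlapping` is quadratic, which is fine for illustration but not for a genome-wide annotation.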
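Warnings #2, #3 and #5 to #8 are about chromosome naming. The annotation in the post is an Ensembl GRCh37 GTF, which names chromosomes "1", "X", "MT", whereas UCSC (hg19) uses "chr1", "chrX", "chrM". The sketch below shows the basic renaming involved; `to_ucsc` is a hypothetical helper, and easyRNASeq's own correction may apply different or additional rules. Unplaced and patch contigs (the GL... sequences in Ensembl) have different UCSC names and are not handled here.

```r
# Hypothetical helper: convert Ensembl-style chromosome names ("1", "X", "MT")
# to UCSC-style names ("chr1", "chrX", "chrM"). Names that already start
# with "chr" are left unchanged. Unplaced/patch contigs are NOT mapped correctly.
to_ucsc <- function(chr) {
  chr <- as.character(chr)
  out <- ifelse(grepl("^chr", chr), chr, paste0("chr", chr))
  out[out == "chrMT"] <- "chrM"   # UCSC calls the mitochondrial genome chrM
  out
}

to_ucsc(c("1", "X", "MT", "chr2"))
# [1] "chr1" "chrX" "chrM" "chr2"
```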