Guest User (@guest-user-4897)
Hi,
I am using easyRNASeq to estimate counts from an RNA-seq alignment to
hg19/GRCh37 made with bowtie2. I would like to get the counts "per
gene". The code runs successfully and I get an output table, but it
has ~57,000 records:
cat count.tsv | wc -l
57774
I am wondering why the number of records is so much greater than the
total number of genes (~30,000).
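One quick check is to count the distinct gene_id values in the GTF passed as annotationFile, since an Ensembl GTF annotates non-coding genes and pseudogenes as well as protein-coding genes. Below is a minimal shell sketch. It assumes Ensembl-style attributes of the form `gene_id "ENSG...";`, and the function name count_gtf_genes is hypothetical. Note also that `wc -l` on count.tsv counts every line, including a header row if the file has one.

```shell
#!/bin/sh
# Hypothetical helper: print the number of distinct gene_id values in a GTF.
# Assumes Ensembl-style attributes such as: gene_id "ENSG00000223972";
count_gtf_genes() {
  # 1. drop comment/header lines starting with '#'
  # 2. extract only the gene_id attribute from each feature line
  # 3. keep one entry per distinct gene, count them, strip wc padding
  grep -v '^#' "$1" |
    grep -o 'gene_id "[^"]*"' |
    sort -u |
    wc -l |
    tr -d ' '
}

# Usage with the annotation file from the easyRNASeq call:
# count_gtf_genes /general/NGS/index/human/Homo_sapiens.GRCh37.74.gtf
```

If this number matches the number of data rows in count.tsv, the table most likely has one row per annotated gene in that GTF rather than duplicated entries.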
I am getting some warning messages that may be related to this,
especially #1 and #4:
Warning messages:
1: Consider using 'synthetic transcripts' as described in the section
7.1 of the vignette instead of the
count=genes,summarization=geneModels deprecated paradigm.
2: In easyRNASeq(filesDirectory = getwd(), filenames =
c("BRPC13-1118_L1.D710_501.sorted.bam", :
You enforce UCSC chromosome conventions, however the provided
chromosome size list is not compliant. Correcting it.
3: In easyRNASeq(filesDirectory = getwd(), filenames =
c("BRPC13-1118_L1.D710_501.sorted.bam", :
You enforce UCSC chromosome conventions, however the provided
annotation is not compliant. Correcting it.
4: In easyRNASeq(filesDirectory = getwd(), filenames =
c("BRPC13-1118_L1.D710_501.sorted.bam", :
There are 18950 synthetic exons as determined from your annotation
that overlap! This implies that some reads will be counted more than
once! Is that really what you want?
5: In fetchCoverage(rnaSeq, format = format, filename = filename,
filter = filter, :
You enforce UCSC chromosome conventions, however the provided
alignments are not compliant. Correcting it.
6: In fetchCoverage(rnaSeq, format = format, filename = filename,
filter = filter, :
You enforce UCSC chromosome conventions, however the provided
alignments are not compliant. Correcting it.
7: In fetchCoverage(rnaSeq, format = format, filename = filename,
filter = filter, :
You enforce UCSC chromosome conventions, however the provided
alignments are not compliant. Correcting it.
8: In fetchCoverage(rnaSeq, format = format, filename = filename,
filter = filter, :
You enforce UCSC chromosome conventions, however the provided
alignments are not compliant. Correcting it.
Code I am running to estimate the counts:
> count.table <- easyRNASeq(filesDirectory=getwd(),
+   filenames=c("A.sorted.bam","B.sorted.bam","C.sorted.bam","D.sorted.bam"),
+   organism="Hsapiens",
+   annotationMethod="gtf",
+   annotationFile="/general/NGS/index/human/Homo_sapiens.GRCh37.74.gtf",
+   count="genes",
+   summarization="geneModels")
-- output of sessionInfo():
> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
>
> packageDescription("easyRNASeq")
Package: easyRNASeq
Version: 1.8.7
Date: 2014-03-25
Type: Package
Title: Count summarization and normalization for RNA-Seq data.
Author: Nicolas Delhomme, Ismael Padioleau, Bastian Schiffthaler
Maintainer: Nicolas Delhomme <delhomme at embl.de>
Description: Calculates the coverage of high-throughput short-reads
against a genome of reference and summarizes it per feature of
interest (e.g. exon, gene, transcript). The data can be
normalized as 'RPKM' or by the 'DESeq' or 'edgeR' package.
Depends: genomeIntervals (>= 1.18.0), Biobase (>= 2.22.0), biomaRt (>=
2.18.0), edgeR (>= 3.4.0), Biostrings (>= 2.30.0), DESeq (>=
1.14.0), GenomicRanges (>= 1.14.3), IRanges (>= 1.20.5),
Rsamtools (>= 1.14.1), ShortRead (>= 1.20.0)
Imports: graphics, methods, parallel, utils, BiocGenerics (>= 0.8.0),
LSD (>= 2.5)
Suggests: BSgenome (>= 1.30.0), BSgenome.Dmelanogaster.UCSC.dm3 (>=
1.3.19), GenomicFeatures (>= 1.14.0), RnaSeqTutorial (>=
0.0.13), BiocStyle (>= 1.0.0)
License: Artistic-2.0
LazyLoad: yes
biocViews: GeneExpression, RNAseq, Genetics, Preprocessing
Packaged: 2014-03-26 04:53:07 UTC; biocbuild
Built: R 3.0.3; ; 2014-03-31 20:30:18 UTC; unix
--
Sent via the guest posting facility at bioconductor.org.