Rsubread - featureCounts for extracting read counts from De novo assembled transcripts
0
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 2 days ago
Australia/Melbourne/Olivia Newton-John …
Dear Alan, Thanks for sending through the data which helped us to identify the problem. The problem was that featureCounts did not allow the chromosome names in the provided annotation file to be longer than 48 characters. We have increased this limit to 100 characters. This solved the problem and featureCounts worked fine for your data now from our testing. We have committed the changes to bioc devel repository and the latest version should available in a couple of days (Rsubread v1.11.11). Note that we have also made some changes to parameters used by featureCounts. The 'annot' parameter is now changed to 'annot.ext' and the 'genome' parameter changed to 'annot.inbuilt'. This makes it more clearer on how to use our in-built annotations and how to provide external annotations. Let me know if the problem you encountered persists. Best wishes, Wei On Aug 2, 2013, at 2:06 PM, Alan Smith wrote: > Hi Wei, > > Thanks for looking into it. Here are few lines from my SAM file: > > @HD VN:1.0 SO:unsorted > @SQ SN:Locus_1_Transcript_1/43_Confidence_0.175_Length_1072 LN:1072 > @SQ SN:Locus_1_Transcript_6/43_Confidence_0.105_Length_219 LN:219 > @SQ SN:Locus_1_Transcript_8/43_Confidence_0.088_Length_258 LN:258 > @SQ SN:Locus_1_Transcript_10/43_Confidence_0.088_Length_421 LN:421 > @SQ SN:Locus_1_Transcript_14/43_Confidence_0.105_Length_217 LN:217 > @SQ SN:Locus_1_Transcript_15/43_Confidence_0.105_Length_392 LN:392 > @SQ SN:Locus_1_Transcript_20/43_Confidence_0.123_Length_547 LN:547 > @SQ SN:Locus_1_Transcript_21/43_Confidence_0.105_Length_408 LN:408 > @SQ SN:Locus_1_Transcript_23/43_Confidence_0.070_Length_246 LN:246 > > This goes for 129310 lines (number of assembled transcripts) then at the end of the file: > > HWI-EAS146_0380:5:100:14414:21375#0/1 16 Locus_21817_Transcript_4/5_Confidence_0.200_Length_1641 933 0 26M * 0 0 TCTCGGGGCTCCACTGGAGATGGTCC ^Qa]cf]\U\Z_Zbbeccfaf]`][S AS:i:-12 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0C0A24 YT:Z:UU > HWI-EAS146_0380:5:100:14769:21375#0/1 16 Locus_3777_Transcript_1/1_Confidence_0.000_Length_2776 283 42 26M * 0 0 ATCACATCTCTTTACAGAACCCACTT c]fbdbafdaeafdfdffeffa_aa^ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > HWI-EAS146_0380:5:100:14923:21376#0/1 4 * 0 0 * * 0 0 ACGACGAAGAAGATGACGACGAGGAC ^^\^Y^W[]aa_cW^ddb``a`WcfR YT:Z:UU > HWI-EAS146_0380:5:100:15383:21374#0/1 4 * 0 0 * * 0 0 CAACCTTCACCGTGGGTTCTAATGTA YY\\_fff[fd^^^]ZZ]^Wc]c]_a YT:Z:UU > HWI-EAS146_0380:5:100:15873:21375#0/1 4 * 0 0 * * 0 0 CCTGCTTAAAGAGATCGGAAGAGCGG `_Z^acRcccffYf[ca]d]ff[cdf YT:Z:UU > HWI-EAS146_0380:5:100:16039:21373#0/1 16 Locus_4044_Transcript_6/48_Confidence_0.066_Length_596 412 42 26M * 0 0 TGCAAGCATTCCTCCTGTGGAGGGGT hghedhgdhcgceffdhhhhgeeeee AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > HWI-EAS146_0380:5:100:16121:21377#0/1 16 Locus_2303_Transcript_7/11_Confidence_0.500_Length_1785 1287 42 26M * 0 0 GCTCCGTTTGGATCCGAGAATAGCAC fchhhghhfhgghhfgghhhh\a\\\ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > HWI-EAS146_0380:5:100:16914:21379#0/1 4 * 0 0 * * 0 0 CCATTCATCATCTTGATACTGCAGAT fffffhhghfghhhhhhhhhhhehea YT:Z:UU > HWI-EAS146_0380:5:100:16964:21378#0/1 0 Locus_17940_Transcript_2/7_Confidence_0.538_Length_1249 584 42 26M * 0 0 CTGTAGCCTTTCAGTGAGCTGAGAAA `b^``effffgfgfdgegggedgggg AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > HWI-EAS146_0380:5:100:17356:21373#0/1 4 * 0 0 * * 0 0 GCCATCAAACCTTACCAATGCTGAGA dacccf[afffffffffdfff]ffca YT:Z:UU > HWI-EAS146_0380:5:100:17415:21379#0/1 4 * 0 0 * * 0 0 CTAGATGCCTCTTCAGGACTTGAGAA a`a`aggggffffffgaggdfcdfdg YT:Z:UU > > Thanks again, > Alan > > > > > On Thu, Aug 1, 2013 at 10:55 PM, Wei Shi <shi@wehi.edu.au> wrote: > Dear Alan, > > Could you please provide us a few reads in your SAM file so that we can reproduce the problem? Sorry for your crashed R session. > > Best wishes, > Wei > > On Aug 2, 2013, at 12:47 PM, Alan Smith wrote: > > > Hello, > > > > I'm trying to use featureCounts to extract read counts per transcript from > > the SAM file (generated using Bowtie2) mapped to de novo assembled > > transcripts (for DE analysis). I made annotation file as per the SAF > > requirements: > > > > Eg: > > > > *GeneID Chr Start End Strand* > > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 > > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 1 1485 + > > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 > > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 1 650 + > > > > Here I made both the GeneID and the Chr columns same with *Start* being 1 > > and *End* as the length of the transcript and the Strand as + for all > > transcripts. > > > > When I used the following code, it does nothing for a while and then R > > crashes. > > > > library(Rsubread) > > counts <- > > featureCounts(files="AssembledTranscriptome-LMcontrol1.sam",annot= "AnnotRsubreadSAF.txt",isGTFAnnotationFile=FALSE) > > > > Rsubread works great with GTF as annot parameter on genome mapped SAM files > > but not with the SAF, as I tried here. > > > > Please let me know where I'm wrong here or if anyone tried other packages > > that serves the purpose. > > > > sessionInfo() > > R version 3.0.1 (2013-05-16) > > Platform: x86_64-redhat-linux-gnu (64-bit) > > > > locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > > LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C > > LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > > > attached base packages: > > [1] splines parallel grid stats graphics grDevices utils > > datasets methods base > > > > other attached packages: > > [1] Rsamtools_1.12.3 edgeR_3.2.4 > > Rsubread_1.10.5 > > [4] ChIPpeakAnno_2.8.0 GenomicFeatures_1.12.3 > > limma_3.16.7 > > [7] org.Hs.eg.db_2.9.0 GO.db_2.9.0 > > RSQLite_0.11.4 > > [10] DBI_0.2-7 AnnotationDbi_1.22.6 > > BSgenome.Ecoli.NCBI.20080805_1.3.17 > > [13] BSgenome_1.28.0 GenomicRanges_1.12.4 > > Biostrings_2.28.0 > > [16] IRanges_1.18.2 multtest_2.16.0 > > Biobase_2.20.1 > > [19] biomaRt_2.16.0 BiocGenerics_0.6.0 > > VennDiagram_1.6.4 > > [22] sendmailR_1.1-2 base64enc_0.1-1 > > BiocInstaller_1.10.3 > > > > loaded via a namespace (and not attached): > > [1] bitops_1.0-5 MASS_7.3-26 RCurl_1.95-4.1 > > rtracklayer_1.20.4 stats4_3.0.1 > > [6] survival_2.37-4 tools_3.0.1 XML_3.98-1.1 zlibbioc_1.6.0 > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:17}}
Annotation GO BSgenome cdf BSgenome Rsubread Annotation GO BSgenome cdf BSgenome • 1.2k views
ADD COMMENT

Login before adding your answer.

Traffic: 910 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6