Rsubread - featureCounts for extracting read counts from De novo assembled transcripts

Rsubread - featureCounts for extracting read counts from De novo assembled transcripts - EdgeR DE analysis

0

Entering edit mode

Alan Smith ▴ 150

@alan-smith-5987

Last seen 8.7 years ago

United States

Dear Wei, I tried with the devel version but still the problem persists (R crashed). Does this got to do with the large number of transcripts (129310) in the file? Could you also please comment on whether if the count data obtained from Rsubread processed on de novo assembled transcripts be used for edgeR differential expression analysis. Or using RSEM to obtain read count information is the way to go? Thank you, Alan On Fri, Aug 2, 2013 at 8:35 PM, Wei Shi <shi@wehi.edu.au> wrote: > Dear Alan, > > Thanks for sending through the data which helped us to identify the > problem. The problem was that featureCounts did not allow the chromosome > names in the provided annotation file to be longer than 48 characters. We > have increased this limit to 100 characters. This solved the problem and > featureCounts worked fine for your data now from our testing. > > We have committed the changes to bioc devel repository and the latest > version should available in a couple of days (Rsubread v1.11.11). > > Note that we have also made some changes to parameters used by > featureCounts. The 'annot' parameter is now changed to 'annot.ext' and the > 'genome' parameter changed to 'annot.inbuilt'. This makes it more clearer > on how to use our in-built annotations and how to provide external > annotations. > > Let me know if the problem you encountered persists. > > > Best wishes, > > Wei > > On Aug 2, 2013, at 2:06 PM, Alan Smith wrote: > > Hi Wei, > > Thanks for looking into it. Here are few lines from my SAM file: > > @HD VN:1.0 SO:unsorted > @SQ SN:Locus_1_Transcript_1/43_Confidence_0.175_Length_1072 LN:1072 > @SQ SN:Locus_1_Transcript_6/43_Confidence_0.105_Length_219 LN:219 > @SQ SN:Locus_1_Transcript_8/43_Confidence_0.088_Length_258 LN:258 > @SQ SN:Locus_1_Transcript_10/43_Confidence_0.088_Length_421 LN:421 > @SQ SN:Locus_1_Transcript_14/43_Confidence_0.105_Length_217 LN:217 > @SQ SN:Locus_1_Transcript_15/43_Confidence_0.105_Length_392 LN:392 > @SQ SN:Locus_1_Transcript_20/43_Confidence_0.123_Length_547 LN:547 > @SQ SN:Locus_1_Transcript_21/43_Confidence_0.105_Length_408 LN:408 > @SQ SN:Locus_1_Transcript_23/43_Confidence_0.070_Length_246 LN:246 > > *This goes for 129310 lines (number of assembled transcripts) then at the end of the file:* > > > HWI-EAS146_0380:5:100:14414:21375#0/1 16 Locus_21817_Transcript_4/5_Confidence_0.200_Length_1641 933 0 26M * 0 0 TCTCGGGGCTCCACTGGAGATGGTCC ^Qa]cf]\U\Z_Zbbeccfaf]`][S AS:i:-12 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0C0A24 YT:Z:UU > HWI-EAS146_0380:5:100:14769:21375#0/1 16 Locus_3777_Transcript_1/1_Confidence_0.000_Length_2776 283 42 26M * 0 0 ATCACATCTCTTTACAGAACCCACTT c]fbdbafdaeafdfdffeffa_aa^ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > HWI-EAS146_0380:5:100:14923:21376#0/1 4 * 0 0 * * 0 0 ACGACGAAGAAGATGACGACGAGGAC ^^\^Y^W[]aa_cW^ddb``a`WcfR YT:Z:UU > HWI-EAS146_0380:5:100:15383:21374#0/1 4 * 0 0 * * 0 0 CAACCTTCACCGTGGGTTCTAATGTA YY\\_fff[fd^^^]ZZ]^Wc]c]_a YT:Z:UU > HWI-EAS146_0380:5:100:15873:21375#0/1 4 * 0 0 * * 0 0 CCTGCTTAAAGAGATCGGAAGAGCGG `_Z^acRcccffYf[ca]d]ff[cdf YT:Z:UU > HWI-EAS146_0380:5:100:16039:21373#0/1 16 Locus_4044_Transcript_6/48_Confidence_0.066_Length_596 412 42 26M * 0 0 TGCAAGCATTCCTCCTGTGGAGGGGT hghedhgdhcgceffdhhhhgeeeee AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > HWI-EAS146_0380:5:100:16121:21377#0/1 16 Locus_2303_Transcript_7/11_Confidence_0.500_Length_1785 1287 42 26M * 0 0 GCTCCGTTTGGATCCGAGAATAGCAC fchhhghhfhgghhfgghhhh\a\\\ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > HWI-EAS146_0380:5:100:16914:21379#0/1 4 * 0 0 * * 0 0 CCATTCATCATCTTGATACTGCAGAT fffffhhghfghhhhhhhhhhhehea YT:Z:UU > HWI-EAS146_0380:5:100:16964:21378#0/1 0 Locus_17940_Transcript_2/7_Confidence_0.538_Length_1249 584 42 26M * 0 0 CTGTAGCCTTTCAGTGAGCTGAGAAA `b^``effffgfgfdgegggedgggg AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > HWI-EAS146_0380:5:100:17356:21373#0/1 4 * 0 0 * * 0 0 GCCATCAAACCTTACCAATGCTGAGA dacccf[afffffffffdfff]ffca YT:Z:UU > HWI-EAS146_0380:5:100:17415:21379#0/1 4 * 0 0 * * 0 0 CTAGATGCCTCTTCAGGACTTGAGAA a`a`aggggffffffgaggdfcdfdg YT:Z:UU > > > Thanks again, > > Alan > > > > > > On Thu, Aug 1, 2013 at 10:55 PM, Wei Shi <shi@wehi.edu.au> wrote: > >> Dear Alan, >> >> Could you please provide us a few reads in your SAM file so that we can >> reproduce the problem? Sorry for your crashed R session. >> >> Best wishes, >> Wei >> >> On Aug 2, 2013, at 12:47 PM, Alan Smith wrote: >> >> > Hello, >> > >> > I'm trying to use featureCounts to extract read counts per transcript >> from >> > the SAM file (generated using Bowtie2) mapped to de novo assembled >> > transcripts (for DE analysis). I made annotation file as per the SAF >> > requirements: >> > >> > Eg: >> > >> > *GeneID Chr Start End Strand* >> > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 >> > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 1 1485 + >> > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 >> > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 1 650 + >> > >> > Here I made both the GeneID and the Chr columns same with *Start* being >> 1 >> > and *End* as the length of the transcript and the Strand as + for all >> > transcripts. >> > >> > When I used the following code, it does nothing for a while and then R >> > crashes. >> > >> > library(Rsubread) >> > counts <- >> > >> featureCounts(files="AssembledTranscriptome-LMcontrol1.sam",annot=" AnnotRsubreadSAF.txt",isGTFAnnotationFile=FALSE) >> > >> > Rsubread works great with GTF as annot parameter on genome mapped SAM >> files >> > but not with the SAF, as I tried here. >> > >> > Please let me know where I'm wrong here or if anyone tried other >> packages >> > that serves the purpose. >> > >> > sessionInfo() >> > R version 3.0.1 (2013-05-16) >> > Platform: x86_64-redhat-linux-gnu (64-bit) >> > >> > locale: >> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> > LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C >> > LC_NAME=C >> > [9] LC_ADDRESS=C LC_TELEPHONE=C >> > LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> > >> > attached base packages: >> > [1] splines parallel grid stats graphics grDevices utils >> > datasets methods base >> > >> > other attached packages: >> > [1] Rsamtools_1.12.3 edgeR_3.2.4 >> > Rsubread_1.10.5 >> > [4] ChIPpeakAnno_2.8.0 GenomicFeatures_1.12.3 >> > limma_3.16.7 >> > [7] org.Hs.eg.db_2.9.0 GO.db_2.9.0 >> > RSQLite_0.11.4 >> > [10] DBI_0.2-7 AnnotationDbi_1.22.6 >> > BSgenome.Ecoli.NCBI.20080805_1.3.17 >> > [13] BSgenome_1.28.0 GenomicRanges_1.12.4 >> > Biostrings_2.28.0 >> > [16] IRanges_1.18.2 multtest_2.16.0 >> > Biobase_2.20.1 >> > [19] biomaRt_2.16.0 BiocGenerics_0.6.0 >> > VennDiagram_1.6.4 >> > [22] sendmailR_1.1-2 base64enc_0.1-1 >> > BiocInstaller_1.10.3 >> > >> > loaded via a namespace (and not attached): >> > [1] bitops_1.0-5 MASS_7.3-26 RCurl_1.95-4.1 >> > rtracklayer_1.20.4 stats4_3.0.1 >> > [6] survival_2.37-4 tools_3.0.1 XML_3.98-1.1 >> zlibbioc_1.6.0 >> > >> > [[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor@r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________ >> > > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}

Annotation GO BSgenome cdf BSgenome Rsubread Annotation GO BSgenome cdf BSgenome • 2.3k views

ADD COMMENT • link 12.5 years ago Alan Smith ▴ 150

0

Entering edit mode

Alan Smith ▴ 150

@alan-smith-5987

Last seen 8.7 years ago

United States

Dear Yang, The output of the script is : 59 59 Therefore, it is not an issue with the length of the feature/chromosome name (which are same in my case). I greatly appreciate your help, Alan On Sat, Aug 3, 2013 at 6:36 PM, Yang Liao <liao@wehi.edu.au> wrote: > Dear Alan, > > Thank you for using Rsubread. The number of transcripts in the annotation > file would not be an issue -- our featureCounts function supports > unlimited number of annotations and chromosomes/transcripts (actually, at > most 2 billions). > > I have just tried to run the in-development version of Rsubread to assign > reads in your last mail, using the two-line annotation in the other mail. > The program finished with no errors. The version number I tested was > "Rsubread_1.11.10". > > If this is the version you are using, could you please run this command? > > $ cat my_annotations.txt | awk 'BEGIN{max_feature=0; max_chro=0} > length($1)>max_feature{max_feature=length($1) } > length($2)>max_chro{max_chro=length($2)} END{print max_feature, max_chro}' > > > where "my_annotations.txt" is the annotation file in SAF format. This > command will give the length of the longest feature name and the longest > chromosome name. If either of them is longer than 99, it is still too long > to our program. > > Thank you. > > Cheers, > > Yang > > > Dear Wei, > > > > I tried with the devel version but still the problem persists (R > crashed). > > Does this got to do with the large number of transcripts (129310) in the > > file? > > > > Could you also please comment on whether if the count data obtained from > > Rsubread processed on de novo assembled transcripts be used for edgeR > > differential expression analysis. Or using RSEM to obtain read count > > information is the way to go? > > > > Thank you, > > Alan > > > > > > On Fri, Aug 2, 2013 at 8:35 PM, Wei Shi <shi@wehi.edu.au> wrote: > > > >> Dear Alan, > >> > >> Thanks for sending through the data which helped us to identify the > >> problem. The problem was that featureCounts did not allow the chromosome > >> names in the provided annotation file to be longer than 48 characters. > >> We > >> have increased this limit to 100 characters. This solved the problem and > >> featureCounts worked fine for your data now from our testing. > >> > >> We have committed the changes to bioc devel repository and the latest > >> version should available in a couple of days (Rsubread v1.11.11). > >> > >> Note that we have also made some changes to parameters used by > >> featureCounts. The 'annot' parameter is now changed to 'annot.ext' and > >> the > >> 'genome' parameter changed to 'annot.inbuilt'. This makes it more > >> clearer > >> on how to use our in-built annotations and how to provide external > >> annotations. > >> > >> Let me know if the problem you encountered persists. > >> > >> > >> Best wishes, > >> > >> Wei > >> > >> On Aug 2, 2013, at 2:06 PM, Alan Smith wrote: > >> > >> Hi Wei, > >> > >> Thanks for looking into it. Here are few lines from my SAM file: > >> > >> @HD VN:1.0 SO:unsorted > >> @SQ SN:Locus_1_Transcript_1/43_Confidence_0.175_Length_1072 LN:1072 > >> @SQ SN:Locus_1_Transcript_6/43_Confidence_0.105_Length_219 LN:219 > >> @SQ SN:Locus_1_Transcript_8/43_Confidence_0.088_Length_258 LN:258 > >> @SQ SN:Locus_1_Transcript_10/43_Confidence_0.088_Length_421 LN:421 > >> @SQ SN:Locus_1_Transcript_14/43_Confidence_0.105_Length_217 LN:217 > >> @SQ SN:Locus_1_Transcript_15/43_Confidence_0.105_Length_392 LN:392 > >> @SQ SN:Locus_1_Transcript_20/43_Confidence_0.123_Length_547 LN:547 > >> @SQ SN:Locus_1_Transcript_21/43_Confidence_0.105_Length_408 LN:408 > >> @SQ SN:Locus_1_Transcript_23/43_Confidence_0.070_Length_246 LN:246 > >> > >> *This goes for 129310 lines (number of assembled transcripts) then at > >> the end of the file:* > >> > >> > >> HWI-EAS146_0380:5:100:14414:21375#0/1 16 > Locus_21817_Transcript_4/5_Confidence_0.200_Length_1641 933 0 > 26M * 0 0 TCTCGGGGCTCCACTGGAGATGGTCC > ^Qa]cf]\U\Z_Zbbeccfaf]`][S AS:i:-12 XN:i:0 XM:i:2 XO:i:0 > XG:i:0 NM:i:2 MD:Z:0C0A24 YT:Z:UU > >> HWI-EAS146_0380:5:100:14769:21375#0/1 16 > Locus_3777_Transcript_1/1_Confidence_0.000_Length_2776 283 42 > 26M * 0 0 ATCACATCTCTTTACAGAACCCACTT > c]fbdbafdaeafdfdffeffa_aa^ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 > NM:i:0 MD:Z:26 YT:Z:UU > >> HWI-EAS146_0380:5:100:14923:21376#0/1 4 * 0 0 > * * 0 0 ACGACGAAGAAGATGACGACGAGGAC > ^^\^Y^W[]aa_cW^ddb``a`WcfR YT:Z:UU > >> HWI-EAS146_0380:5:100:15383:21374#0/1 4 * 0 0 > * * 0 0 CAACCTTCACCGTGGGTTCTAATGTA > YY\\_fff[fd^^^]ZZ]^Wc]c]_a YT:Z:UU > >> HWI-EAS146_0380:5:100:15873:21375#0/1 4 * 0 0 > * * 0 0 CCTGCTTAAAGAGATCGGAAGAGCGG > `_Z^acRcccffYf[ca]d]ff[cdf YT:Z:UU > >> HWI-EAS146_0380:5:100:16039:21373#0/1 16 > Locus_4044_Transcript_6/48_Confidence_0.066_Length_596 412 42 > 26M * 0 0 TGCAAGCATTCCTCCTGTGGAGGGGT > hghedhgdhcgceffdhhhhgeeeee AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 > NM:i:0 MD:Z:26 YT:Z:UU > >> HWI-EAS146_0380:5:100:16121:21377#0/1 16 > Locus_2303_Transcript_7/11_Confidence_0.500_Length_1785 1287 42 > 26M * 0 0 GCTCCGTTTGGATCCGAGAATAGCAC > fchhhghhfhgghhfgghhhh\a\\\ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 > NM:i:0 MD:Z:26 YT:Z:UU > >> HWI-EAS146_0380:5:100:16914:21379#0/1 4 * 0 0 > * * 0 0 CCATTCATCATCTTGATACTGCAGAT > fffffhhghfghhhhhhhhhhhehea YT:Z:UU > >> HWI-EAS146_0380:5:100:16964:21378#0/1 0 > Locus_17940_Transcript_2/7_Confidence_0.538_Length_1249 584 42 26M > * 0 0 CTGTAGCCTTTCAGTGAGCTGAGAAA > `b^``effffgfgfdgegggedgggg AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 > NM:i:0 MD:Z:26 YT:Z:UU > >> HWI-EAS146_0380:5:100:17356:21373#0/1 4 * 0 0 > * * 0 0 GCCATCAAACCTTACCAATGCTGAGA > dacccf[afffffffffdfff]ffca YT:Z:UU > >> HWI-EAS146_0380:5:100:17415:21379#0/1 4 * 0 0 > * * 0 0 CTAGATGCCTCTTCAGGACTTGAGAA > a`a`aggggffffffgaggdfcdfdg YT:Z:UU > >> > >> > >> Thanks again, > >> > >> Alan > >> > >> > >> > >> > >> > >> On Thu, Aug 1, 2013 at 10:55 PM, Wei Shi <shi@wehi.edu.au> wrote: > >> > >>> Dear Alan, > >>> > >>> Could you please provide us a few reads in your SAM file so that we can > >>> reproduce the problem? Sorry for your crashed R session. > >>> > >>> Best wishes, > >>> Wei > >>> > >>> On Aug 2, 2013, at 12:47 PM, Alan Smith wrote: > >>> > >>> > Hello, > >>> > > >>> > I'm trying to use featureCounts to extract read counts per transcript > >>> from > >>> > the SAM file (generated using Bowtie2) mapped to de novo assembled > >>> > transcripts (for DE analysis). I made annotation file as per the SAF > >>> > requirements: > >>> > > >>> > Eg: > >>> > > >>> > *GeneID Chr Start End Strand* > >>> > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 > >>> > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 1 1485 + > >>> > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 > >>> > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 1 650 + > >>> > > >>> > Here I made both the GeneID and the Chr columns same with *Start* > >>> being > >>> 1 > >>> > and *End* as the length of the transcript and the Strand as + for all > >>> > transcripts. > >>> > > >>> > When I used the following code, it does nothing for a while and then > >>> R > >>> > crashes. > >>> > > >>> > library(Rsubread) > >>> > counts <- > >>> > > >>> > featureCounts(files="AssembledTranscriptome-LMcontrol1.sam",annot="A nnotRsubreadSAF.txt",isGTFAnnotationFile=FALSE) > >>> > > >>> > Rsubread works great with GTF as annot parameter on genome mapped SAM > >>> files > >>> > but not with the SAF, as I tried here. > >>> > > >>> > Please let me know where I'm wrong here or if anyone tried other > >>> packages > >>> > that serves the purpose. > >>> > > >>> > sessionInfo() > >>> > R version 3.0.1 (2013-05-16) > >>> > Platform: x86_64-redhat-linux-gnu (64-bit) > >>> > > >>> > locale: > >>> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > >>> > LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > >>> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C > >>> > LC_NAME=C > >>> > [9] LC_ADDRESS=C LC_TELEPHONE=C > >>> > LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > >>> > > >>> > attached base packages: > >>> > [1] splines parallel grid stats graphics grDevices utils > >>> > datasets methods base > >>> > > >>> > other attached packages: > >>> > [1] Rsamtools_1.12.3 edgeR_3.2.4 > >>> > Rsubread_1.10.5 > >>> > [4] ChIPpeakAnno_2.8.0 GenomicFeatures_1.12.3 > >>> > limma_3.16.7 > >>> > [7] org.Hs.eg.db_2.9.0 GO.db_2.9.0 > >>> > RSQLite_0.11.4 > >>> > [10] DBI_0.2-7 AnnotationDbi_1.22.6 > >>> > BSgenome.Ecoli.NCBI.20080805_1.3.17 > >>> > [13] BSgenome_1.28.0 GenomicRanges_1.12.4 > >>> > Biostrings_2.28.0 > >>> > [16] IRanges_1.18.2 multtest_2.16.0 > >>> > Biobase_2.20.1 > >>> > [19] biomaRt_2.16.0 BiocGenerics_0.6.0 > >>> > VennDiagram_1.6.4 > >>> > [22] sendmailR_1.1-2 base64enc_0.1-1 > >>> > BiocInstaller_1.10.3 > >>> > > >>> > loaded via a namespace (and not attached): > >>> > [1] bitops_1.0-5 MASS_7.3-26 RCurl_1.95-4.1 > >>> > rtracklayer_1.20.4 stats4_3.0.1 > >>> > [6] survival_2.37-4 tools_3.0.1 XML_3.98-1.1 > >>> zlibbioc_1.6.0 > >>> > > >>> > [[alternative HTML version deleted]] > >>> > > >>> > _______________________________________________ > >>> > Bioconductor mailing list > >>> > Bioconductor@r-project.org > >>> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> > Search the archives: > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>> > >>> ______________________________________________________________________ > >>> The information in this email is confidential and intended solely for > >>> the > >>> addressee. > >>> You must not disclose, forward, print or use it without the permission > >>> of > >>> the sender. > >>> ______________________________________________________________________ > >>> > >> > >> > >> > >> ______________________________________________________________________ > >> The information in this email is confidential and intended solely for > >> the > >> addressee. > >> You must not disclose, forward, print or use it without the permission > >> of > >> the sender. > >> ______________________________________________________________________ > >> > > > > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}

ADD COMMENT • link 12.5 years ago Alan Smith ▴ 150

0

Entering edit mode

Yang Liao ▴ 450

@yang-liao-6075

Last seen 3 months ago

Australia

Dear Alan, Thank you for using Rsubread. The number of transcripts in the annotation file would not be an issue -- our featureCounts function supports unlimited number of annotations and chromosomes/transcripts (actually, at most 2 billions). I have just tried to run the in-development version of Rsubread to assign reads in your last mail, using the two-line annotation in the other mail. The program finished with no errors. The version number I tested was "Rsubread_1.11.10". If this is the version you are using, could you please run this command? $ cat my_annotations.txt | awk 'BEGIN{max_feature=0; max_chro=0} length($1)>max_feature{max_feature=length($1) } length($2)>max_chro{max_chro=length($2)} END{print max_feature, max_chro}' where "my_annotations.txt" is the annotation file in SAF format. This command will give the length of the longest feature name and the longest chromosome name. If either of them is longer than 99, it is still too long to our program. Thank you. Cheers, Yang > Dear Wei, > > I tried with the devel version but still the problem persists (R crashed). > Does this got to do with the large number of transcripts (129310) in the > file? > > Could you also please comment on whether if the count data obtained from > Rsubread processed on de novo assembled transcripts be used for edgeR > differential expression analysis. Or using RSEM to obtain read count > information is the way to go? > > Thank you, > Alan > > > On Fri, Aug 2, 2013 at 8:35 PM, Wei Shi <shi at="" wehi.edu.au=""> wrote: > >> Dear Alan, >> >> Thanks for sending through the data which helped us to identify the >> problem. The problem was that featureCounts did not allow the chromosome >> names in the provided annotation file to be longer than 48 characters. >> We >> have increased this limit to 100 characters. This solved the problem and >> featureCounts worked fine for your data now from our testing. >> >> We have committed the changes to bioc devel repository and the latest >> version should available in a couple of days (Rsubread v1.11.11). >> >> Note that we have also made some changes to parameters used by >> featureCounts. The 'annot' parameter is now changed to 'annot.ext' and >> the >> 'genome' parameter changed to 'annot.inbuilt'. This makes it more >> clearer >> on how to use our in-built annotations and how to provide external >> annotations. >> >> Let me know if the problem you encountered persists. >> >> >> Best wishes, >> >> Wei >> >> On Aug 2, 2013, at 2:06 PM, Alan Smith wrote: >> >> Hi Wei, >> >> Thanks for looking into it. Here are few lines from my SAM file: >> >> @HD VN:1.0 SO:unsorted >> @SQ SN:Locus_1_Transcript_1/43_Confidence_0.175_Length_1072 LN:1072 >> @SQ SN:Locus_1_Transcript_6/43_Confidence_0.105_Length_219 LN:219 >> @SQ SN:Locus_1_Transcript_8/43_Confidence_0.088_Length_258 LN:258 >> @SQ SN:Locus_1_Transcript_10/43_Confidence_0.088_Length_421 LN:421 >> @SQ SN:Locus_1_Transcript_14/43_Confidence_0.105_Length_217 LN:217 >> @SQ SN:Locus_1_Transcript_15/43_Confidence_0.105_Length_392 LN:392 >> @SQ SN:Locus_1_Transcript_20/43_Confidence_0.123_Length_547 LN:547 >> @SQ SN:Locus_1_Transcript_21/43_Confidence_0.105_Length_408 LN:408 >> @SQ SN:Locus_1_Transcript_23/43_Confidence_0.070_Length_246 LN:246 >> >> *This goes for 129310 lines (number of assembled transcripts) then at >> the end of the file:* >> >> >> HWI-EAS146_0380:5:100:14414:21375#0/1 16 Locus_21817_Transcript_4/5_Confidence_0.200_Length_1641 933 0 26M * 0 0 TCTCGGGGCTCCACTGGAGATGGTCC ^Qa]cf]\U\Z_Zbbeccfaf]`][S AS:i:-12 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0C0A24 YT:Z:UU >> HWI-EAS146_0380:5:100:14769:21375#0/1 16 Locus_3777_Transcript_1/1_Confidence_0.000_Length_2776 283 42 26M * 0 0 ATCACATCTCTTTACAGAACCCACTT c]fbdbafdaeafdfdffeffa_aa^ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU >> HWI-EAS146_0380:5:100:14923:21376#0/1 4 * 0 0 * * 0 0 ACGACGAAGAAGATGACGACGAGGAC ^^\^Y^W[]aa_cW^ddb``a`WcfR YT:Z:UU >> HWI-EAS146_0380:5:100:15383:21374#0/1 4 * 0 0 * * 0 0 CAACCTTCACCGTGGGTTCTAATGTA YY\\_fff[fd^^^]ZZ]^Wc]c]_a YT:Z:UU >> HWI-EAS146_0380:5:100:15873:21375#0/1 4 * 0 0 * * 0 0 CCTGCTTAAAGAGATCGGAAGAGCGG `_Z^acRcccffYf[ca]d]ff[cdf YT:Z:UU >> HWI-EAS146_0380:5:100:16039:21373#0/1 16 Locus_4044_Transcript_6/48_Confidence_0.066_Length_596 412 42 26M * 0 0 TGCAAGCATTCCTCCTGTGGAGGGGT hghedhgdhcgceffdhhhhgeeeee AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU >> HWI-EAS146_0380:5:100:16121:21377#0/1 16 Locus_2303_Transcript_7/11_Confidence_0.500_Length_1785 1287 42 26M * 0 0 GCTCCGTTTGGATCCGAGAATAGCAC fchhhghhfhgghhfgghhhh\a\\\ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU >> HWI-EAS146_0380:5:100:16914:21379#0/1 4 * 0 0 * * 0 0 CCATTCATCATCTTGATACTGCAGAT fffffhhghfghhhhhhhhhhhehea YT:Z:UU >> HWI-EAS146_0380:5:100:16964:21378#0/1 0 Locus_17940_Transcript_2/7_Confidence_0.538_Length_1249 584 42 26M * 0 0 CTGTAGCCTTTCAGTGAGCTGAGAAA `b^``effffgfgfdgegggedgggg AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU >> HWI-EAS146_0380:5:100:17356:21373#0/1 4 * 0 0 * * 0 0 GCCATCAAACCTTACCAATGCTGAGA dacccf[afffffffffdfff]ffca YT:Z:UU >> HWI-EAS146_0380:5:100:17415:21379#0/1 4 * 0 0 * * 0 0 CTAGATGCCTCTTCAGGACTTGAGAA a`a`aggggffffffgaggdfcdfdg YT:Z:UU >> >> >> Thanks again, >> >> Alan >> >> >> >> >> >> On Thu, Aug 1, 2013 at 10:55 PM, Wei Shi <shi at="" wehi.edu.au=""> wrote: >> >>> Dear Alan, >>> >>> Could you please provide us a few reads in your SAM file so that we can >>> reproduce the problem? Sorry for your crashed R session. >>> >>> Best wishes, >>> Wei >>> >>> On Aug 2, 2013, at 12:47 PM, Alan Smith wrote: >>> >>> > Hello, >>> > >>> > I'm trying to use featureCounts to extract read counts per transcript >>> from >>> > the SAM file (generated using Bowtie2) mapped to de novo assembled >>> > transcripts (for DE analysis). I made annotation file as per the SAF >>> > requirements: >>> > >>> > Eg: >>> > >>> > *GeneID Chr Start End Strand* >>> > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 >>> > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 1 1485 + >>> > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 >>> > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 1 650 + >>> > >>> > Here I made both the GeneID and the Chr columns same with *Start* >>> being >>> 1 >>> > and *End* as the length of the transcript and the Strand as + for all >>> > transcripts. >>> > >>> > When I used the following code, it does nothing for a while and then >>> R >>> > crashes. >>> > >>> > library(Rsubread) >>> > counts <- >>> > >>> featureCounts(files="AssembledTranscriptome-LMcontrol1.sam",annot= "AnnotRsubreadSAF.txt",isGTFAnnotationFile=FALSE) >>> > >>> > Rsubread works great with GTF as annot parameter on genome mapped SAM >>> files >>> > but not with the SAF, as I tried here. >>> > >>> > Please let me know where I'm wrong here or if anyone tried other >>> packages >>> > that serves the purpose. >>> > >>> > sessionInfo() >>> > R version 3.0.1 (2013-05-16) >>> > Platform: x86_64-redhat-linux-gnu (64-bit) >>> > >>> > locale: >>> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> > LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C >>> > LC_NAME=C >>> > [9] LC_ADDRESS=C LC_TELEPHONE=C >>> > LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> > >>> > attached base packages: >>> > [1] splines parallel grid stats graphics grDevices utils >>> > datasets methods base >>> > >>> > other attached packages: >>> > [1] Rsamtools_1.12.3 edgeR_3.2.4 >>> > Rsubread_1.10.5 >>> > [4] ChIPpeakAnno_2.8.0 GenomicFeatures_1.12.3 >>> > limma_3.16.7 >>> > [7] org.Hs.eg.db_2.9.0 GO.db_2.9.0 >>> > RSQLite_0.11.4 >>> > [10] DBI_0.2-7 AnnotationDbi_1.22.6 >>> > BSgenome.Ecoli.NCBI.20080805_1.3.17 >>> > [13] BSgenome_1.28.0 GenomicRanges_1.12.4 >>> > Biostrings_2.28.0 >>> > [16] IRanges_1.18.2 multtest_2.16.0 >>> > Biobase_2.20.1 >>> > [19] biomaRt_2.16.0 BiocGenerics_0.6.0 >>> > VennDiagram_1.6.4 >>> > [22] sendmailR_1.1-2 base64enc_0.1-1 >>> > BiocInstaller_1.10.3 >>> > >>> > loaded via a namespace (and not attached): >>> > [1] bitops_1.0-5 MASS_7.3-26 RCurl_1.95-4.1 >>> > rtracklayer_1.20.4 stats4_3.0.1 >>> > [6] survival_2.37-4 tools_3.0.1 XML_3.98-1.1 >>> zlibbioc_1.6.0 >>> > >>> > [[alternative HTML version deleted]] >>> > >>> > _______________________________________________ >>> > Bioconductor mailing list >>> > Bioconductor at r-project.org >>> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>> > Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> ______________________________________________________________________ >>> The information in this email is confidential and intended solely for >>> the >>> addressee. >>> You must not disclose, forward, print or use it without the permission >>> of >>> the sender. >>> ______________________________________________________________________ >>> >> >> >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for >> the >> addressee. >> You must not disclose, forward, print or use it without the permission >> of >> the sender. >> ______________________________________________________________________ >> > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 12.5 years ago Yang Liao ▴ 450

0

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 16 days ago

Australia/Melbourne

Dear Alan, The Rsubread in the devel branch has not been updated to v1.11.1 yet. I committed the changes yesterday and you should get v1.11.11 by tomorrow. Please make sure you use this latest version when you try it next time. We have found that the gene-level analysis is much more accurate than the transcript level expression analysis. Part of the reason for this is that it is very hard to accurately quantify the expression levels of overlapping transcripts. featureCounts simply uses the overlap between reads and genes to assign reads. It does not assign those reads which are found to overlap with more than one gene to try to achieve a highly accurate quantification, which is different from the model-bases methods that try to assign these ambiguous reads. You may have a look at the subread-featureCounts-limma/edgeR pipeline which is much simpler than what you do there: http://bioinf.wehi.edu.au/RNAseqCaseStudy/ This pipeline has been also demonstrated in a recent Bioc workshop (2013): http://bioconductor.org/help/course-materials/2013/BioC2013/shi.pdf Hope this helps. Best wishes, Wei On Aug 4, 2013, at 3:53 AM, Alan Smith wrote: > Dear Wei, > > I tried with the devel version but still the problem persists (R crashed). Does this got to do with the large number of transcripts (129310) in the file? > > Could you also please comment on whether if the count data obtained from Rsubread processed on de novo assembled transcripts be used for edgeR differential expression analysis. Or using RSEM to obtain read count information is the way to go? > > Thank you, > Alan > > > On Fri, Aug 2, 2013 at 8:35 PM, Wei Shi <shi@wehi.edu.au> wrote: > Dear Alan, > > Thanks for sending through the data which helped us to identify the problem. The problem was that featureCounts did not allow the chromosome names in the provided annotation file to be longer than 48 characters. We have increased this limit to 100 characters. This solved the problem and featureCounts worked fine for your data now from our testing. > > We have committed the changes to bioc devel repository and the latest version should available in a couple of days (Rsubread v1.11.11). > > Note that we have also made some changes to parameters used by featureCounts. The 'annot' parameter is now changed to 'annot.ext' and the 'genome' parameter changed to 'annot.inbuilt'. This makes it more clearer on how to use our in-built annotations and how to provide external annotations. > > Let me know if the problem you encountered persists. > > > Best wishes, > > Wei > > On Aug 2, 2013, at 2:06 PM, Alan Smith wrote: > >> Hi Wei, >> >> Thanks for looking into it. Here are few lines from my SAM file: >> >> @HD VN:1.0 SO:unsorted >> @SQ SN:Locus_1_Transcript_1/43_Confidence_0.175_Length_1072 LN:1072 >> @SQ SN:Locus_1_Transcript_6/43_Confidence_0.105_Length_219 LN:219 >> @SQ SN:Locus_1_Transcript_8/43_Confidence_0.088_Length_258 LN:258 >> @SQ SN:Locus_1_Transcript_10/43_Confidence_0.088_Length_421 LN:421 >> @SQ SN:Locus_1_Transcript_14/43_Confidence_0.105_Length_217 LN:217 >> @SQ SN:Locus_1_Transcript_15/43_Confidence_0.105_Length_392 LN:392 >> @SQ SN:Locus_1_Transcript_20/43_Confidence_0.123_Length_547 LN:547 >> @SQ SN:Locus_1_Transcript_21/43_Confidence_0.105_Length_408 LN:408 >> @SQ SN:Locus_1_Transcript_23/43_Confidence_0.070_Length_246 LN:246 >> >> This goes for 129310 lines (number of assembled transcripts) then at the end of the file: >> >> HWI-EAS146_0380:5:100:14414:21375#0/1 16 Locus_21817_Transcript_4/5_Confidence_0.200_Length_1641 933 0 26M * 0 0 TCTCGGGGCTCCACTGGAGATGGTCC ^Qa]cf]\U\Z_Zbbeccfaf]`][S AS:i:-12 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0C0A24 YT:Z:UU >> HWI-EAS146_0380:5:100:14769:21375#0/1 16 Locus_3777_Transcript_1/1_Confidence_0.000_Length_2776 283 42 26M * 0 0 ATCACATCTCTTTACAGAACCCACTT c]fbdbafdaeafdfdffeffa_aa^ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU >> HWI-EAS146_0380:5:100:14923:21376#0/1 4 * 0 0 * * 0 0 ACGACGAAGAAGATGACGACGAGGAC ^^\^Y^W[]aa_cW^ddb``a`WcfR YT:Z:UU >> HWI-EAS146_0380:5:100:15383:21374#0/1 4 * 0 0 * * 0 0 CAACCTTCACCGTGGGTTCTAATGTA YY\\_fff[fd^^^]ZZ]^Wc]c]_a YT:Z:UU >> HWI-EAS146_0380:5:100:15873:21375#0/1 4 * 0 0 * * 0 0 CCTGCTTAAAGAGATCGGAAGAGCGG `_Z^acRcccffYf[ca]d]ff[cdf YT:Z:UU >> HWI-EAS146_0380:5:100:16039:21373#0/1 16 Locus_4044_Transcript_6/48_Confidence_0.066_Length_596 412 42 26M * 0 0 TGCAAGCATTCCTCCTGTGGAGGGGT hghedhgdhcgceffdhhhhgeeeee AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU >> HWI-EAS146_0380:5:100:16121:21377#0/1 16 Locus_2303_Transcript_7/11_Confidence_0.500_Length_1785 1287 42 26M * 0 0 GCTCCGTTTGGATCCGAGAATAGCAC fchhhghhfhgghhfgghhhh\a\\\ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU >> HWI-EAS146_0380:5:100:16914:21379#0/1 4 * 0 0 * * 0 0 CCATTCATCATCTTGATACTGCAGAT fffffhhghfghhhhhhhhhhhehea YT:Z:UU >> HWI-EAS146_0380:5:100:16964:21378#0/1 0 Locus_17940_Transcript_2/7_Confidence_0.538_Length_1249 584 42 26M * 0 0 CTGTAGCCTTTCAGTGAGCTGAGAAA `b^``effffgfgfdgegggedgggg AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU >> HWI-EAS146_0380:5:100:17356:21373#0/1 4 * 0 0 * * 0 0 GCCATCAAACCTTACCAATGCTGAGA dacccf[afffffffffdfff]ffca YT:Z:UU >> HWI-EAS146_0380:5:100:17415:21379#0/1 4 * 0 0 * * 0 0 CTAGATGCCTCTTCAGGACTTGAGAA a`a`aggggffffffgaggdfcdfdg YT:Z:UU >> >> Thanks again, >> Alan >> >> >> >> >> On Thu, Aug 1, 2013 at 10:55 PM, Wei Shi <shi@wehi.edu.au> wrote: >> Dear Alan, >> >> Could you please provide us a few reads in your SAM file so that we can reproduce the problem? Sorry for your crashed R session. >> >> Best wishes, >> Wei >> >> On Aug 2, 2013, at 12:47 PM, Alan Smith wrote: >> >> > Hello, >> > >> > I'm trying to use featureCounts to extract read counts per transcript from >> > the SAM file (generated using Bowtie2) mapped to de novo assembled >> > transcripts (for DE analysis). I made annotation file as per the SAF >> > requirements: >> > >> > Eg: >> > >> > *GeneID Chr Start End Strand* >> > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 >> > Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 1 1485 + >> > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 >> > Locus_58_Transcript_85/85_Confidence_0.017_Length_650 1 650 + >> > >> > Here I made both the GeneID and the Chr columns same with *Start* being 1 >> > and *End* as the length of the transcript and the Strand as + for all >> > transcripts. >> > >> > When I used the following code, it does nothing for a while and then R >> > crashes. >> > >> > library(Rsubread) >> > counts <- >> > featureCounts(files="AssembledTranscriptome-LMcontrol1.sam",annot ="AnnotRsubreadSAF.txt",isGTFAnnotationFile=FALSE) >> > >> > Rsubread works great with GTF as annot parameter on genome mapped SAM files >> > but not with the SAF, as I tried here. >> > >> > Please let me know where I'm wrong here or if anyone tried other packages >> > that serves the purpose. >> > >> > sessionInfo() >> > R version 3.0.1 (2013-05-16) >> > Platform: x86_64-redhat-linux-gnu (64-bit) >> > >> > locale: >> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> > LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C >> > LC_NAME=C >> > [9] LC_ADDRESS=C LC_TELEPHONE=C >> > LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> > >> > attached base packages: >> > [1] splines parallel grid stats graphics grDevices utils >> > datasets methods base >> > >> > other attached packages: >> > [1] Rsamtools_1.12.3 edgeR_3.2.4 >> > Rsubread_1.10.5 >> > [4] ChIPpeakAnno_2.8.0 GenomicFeatures_1.12.3 >> > limma_3.16.7 >> > [7] org.Hs.eg.db_2.9.0 GO.db_2.9.0 >> > RSQLite_0.11.4 >> > [10] DBI_0.2-7 AnnotationDbi_1.22.6 >> > BSgenome.Ecoli.NCBI.20080805_1.3.17 >> > [13] BSgenome_1.28.0 GenomicRanges_1.12.4 >> > Biostrings_2.28.0 >> > [16] IRanges_1.18.2 multtest_2.16.0 >> > Biobase_2.20.1 >> > [19] biomaRt_2.16.0 BiocGenerics_0.6.0 >> > VennDiagram_1.6.4 >> > [22] sendmailR_1.1-2 base64enc_0.1-1 >> > BiocInstaller_1.10.3 >> > >> > loaded via a namespace (and not attached): >> > [1] bitops_1.0-5 MASS_7.3-26 RCurl_1.95-4.1 >> > rtracklayer_1.20.4 stats4_3.0.1 >> > [6] survival_2.37-4 tools_3.0.1 XML_3.98-1.1 zlibbioc_1.6.0 >> > >> > [[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor@r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the addressee. >> You must not disclose, forward, print or use it without the permission of the sender. >> ______________________________________________________________________ >> > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:17}}

ADD COMMENT • link 12.5 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Alan Smith ▴ 150

@alan-smith-5987

Last seen 8.7 years ago

United States

Dear Wei and Yang, I installed Rsubread version 1.11.11. After using the following code, it ended up in a different error this time. Thank you, Alan library(Rsubread) featureCounts(files="AssembledTranscriptomeLMcontrol1.sam",annot.ext=" AnnotRsubreadSAF.txt",isGTFAnnotationFile=FALSE) Error: lazy-load database 'P' is corruptIn addition: Warning message:internal error -3 in R_decompress1 > sessionInfo()R version 3.0.1 (2013-05-16) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] BiocInstaller_1.11.4 Rsubread_1.11.11 loaded via a namespace (and not attached): [1] tools_3.0.1 On Sat, Aug 3, 2013 at 8:18 PM, Alan Smith <alan.sm310@gmail.com> wrote: > Dear Wei, > > Thanks a bunch for looking into it. I tried it again just a minute back > after downloading Rsubread from bioc devel and its still the same. > > I'm certain that the changes you made were in it based on the use of > "annot.ext", just "annot" won't work now. > > I will try again tomorrow. > > I greatly appreciate your time and help. > Alan > > > On Sat, Aug 3, 2013 at 7:56 PM, Wei Shi <shi@wehi.edu.au> wrote: > >> Dear Alan, >> >> Just to clarify this. We have also fixed the problem you reported for >> Rsubread v1.11.10 in bioc devel. But depending on the time you downloaded >> v1.11.10 from bioc devel, it may work or may not work. But v1.11.11 should >> always work and it should be available tomorrow. Once again, please do >> check the version you use. >> >> Sorry for the confusion. >> >> Best wishes, >> >> Wei >> >> >> On Aug 4, 2013, at 9:36 AM, Yang Liao wrote: >> >> > Dear Alan, >> > >> > Thank you for using Rsubread. The number of transcripts in the >> annotation >> > file would not be an issue -- our featureCounts function supports >> > unlimited number of annotations and chromosomes/transcripts (actually, >> at >> > most 2 billions). >> > >> > I have just tried to run the in-development version of Rsubread to >> assign >> > reads in your last mail, using the two-line annotation in the other >> mail. >> > The program finished with no errors. The version number I tested was >> > "Rsubread_1.11.10". >> > >> > If this is the version you are using, could you please run this command? >> > >> > $ cat my_annotations.txt | awk 'BEGIN{max_feature=0; max_chro=0} >> > length($1)>max_feature{max_feature=length($1) } >> > length($2)>max_chro{max_chro=length($2)} END{print max_feature, >> max_chro}' >> > >> > >> > where "my_annotations.txt" is the annotation file in SAF format. This >> > command will give the length of the longest feature name and the longest >> > chromosome name. If either of them is longer than 99, it is still too >> long >> > to our program. >> > >> > Thank you. >> > >> > Cheers, >> > >> > Yang >> > >> >> Dear Wei, >> >> >> >> I tried with the devel version but still the problem persists (R >> crashed). >> >> Does this got to do with the large number of transcripts (129310) in >> the >> >> file? >> >> >> >> Could you also please comment on whether if the count data obtained >> from >> >> Rsubread processed on de novo assembled transcripts be used for edgeR >> >> differential expression analysis. Or using RSEM to obtain read count >> >> information is the way to go? >> >> >> >> Thank you, >> >> Alan >> >> >> >> >> >> On Fri, Aug 2, 2013 at 8:35 PM, Wei Shi <shi@wehi.edu.au> wrote: >> >> >> >>> Dear Alan, >> >>> >> >>> Thanks for sending through the data which helped us to identify the >> >>> problem. The problem was that featureCounts did not allow the >> chromosome >> >>> names in the provided annotation file to be longer than 48 characters. >> >>> We >> >>> have increased this limit to 100 characters. This solved the problem >> and >> >>> featureCounts worked fine for your data now from our testing. >> >>> >> >>> We have committed the changes to bioc devel repository and the latest >> >>> version should available in a couple of days (Rsubread v1.11.11). >> >>> >> >>> Note that we have also made some changes to parameters used by >> >>> featureCounts. The 'annot' parameter is now changed to 'annot.ext' and >> >>> the >> >>> 'genome' parameter changed to 'annot.inbuilt'. This makes it more >> >>> clearer >> >>> on how to use our in-built annotations and how to provide external >> >>> annotations. >> >>> >> >>> Let me know if the problem you encountered persists. >> >>> >> >>> >> >>> Best wishes, >> >>> >> >>> Wei >> >>> >> >>> On Aug 2, 2013, at 2:06 PM, Alan Smith wrote: >> >>> >> >>> Hi Wei, >> >>> >> >>> Thanks for looking into it. Here are few lines from my SAM file: >> >>> >> >>> @HD VN:1.0 SO:unsorted >> >>> @SQ SN:Locus_1_Transcript_1/43_Confidence_0.175_Length_1072 LN:1072 >> >>> @SQ SN:Locus_1_Transcript_6/43_Confidence_0.105_Length_219 LN:219 >> >>> @SQ SN:Locus_1_Transcript_8/43_Confidence_0.088_Length_258 LN:258 >> >>> @SQ SN:Locus_1_Transcript_10/43_Confidence_0.088_Length_421 LN:421 >> >>> @SQ SN:Locus_1_Transcript_14/43_Confidence_0.105_Length_217 LN:217 >> >>> @SQ SN:Locus_1_Transcript_15/43_Confidence_0.105_Length_392 LN:392 >> >>> @SQ SN:Locus_1_Transcript_20/43_Confidence_0.123_Length_547 LN:547 >> >>> @SQ SN:Locus_1_Transcript_21/43_Confidence_0.105_Length_408 LN:408 >> >>> @SQ SN:Locus_1_Transcript_23/43_Confidence_0.070_Length_246 LN:246 >> >>> >> >>> *This goes for 129310 lines (number of assembled transcripts) then at >> >>> the end of the file:* >> >>> >> >>> >> >>> HWI-EAS146_0380:5:100:14414:21375#0/1 16 >> Locus_21817_Transcript_4/5_Confidence_0.200_Length_1641 933 0 >> 26M * 0 0 TCTCGGGGCTCCACTGGAGATGGTCC >> ^Qa]cf]\U\Z_Zbbeccfaf]`][S AS:i:-12 XN:i:0 XM:i:2 XO:i:0 >> XG:i:0 NM:i:2 MD:Z:0C0A24 YT:Z:UU >> >>> HWI-EAS146_0380:5:100:14769:21375#0/1 16 >> Locus_3777_Transcript_1/1_Confidence_0.000_Length_2776 283 42 >> 26M * 0 0 ATCACATCTCTTTACAGAACCCACTT >> c]fbdbafdaeafdfdffeffa_aa^ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 >> NM:i:0 MD:Z:26 YT:Z:UU >> >>> HWI-EAS146_0380:5:100:14923:21376#0/1 4 * 0 0 >> * * 0 0 ACGACGAAGAAGATGACGACGAGGAC >> ^^\^Y^W[]aa_cW^ddb``a`WcfR YT:Z:UU >> >>> HWI-EAS146_0380:5:100:15383:21374#0/1 4 * 0 0 >> * * 0 0 CAACCTTCACCGTGGGTTCTAATGTA >> YY\\_fff[fd^^^]ZZ]^Wc]c]_a YT:Z:UU >> >>> HWI-EAS146_0380:5:100:15873:21375#0/1 4 * 0 0 >> * * 0 0 CCTGCTTAAAGAGATCGGAAGAGCGG >> `_Z^acRcccffYf[ca]d]ff[cdf YT:Z:UU >> >>> HWI-EAS146_0380:5:100:16039:21373#0/1 16 >> Locus_4044_Transcript_6/48_Confidence_0.066_Length_596 412 42 >> 26M * 0 0 TGCAAGCATTCCTCCTGTGGAGGGGT >> hghedhgdhcgceffdhhhhgeeeee AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 >> NM:i:0 MD:Z:26 YT:Z:UU >> >>> HWI-EAS146_0380:5:100:16121:21377#0/1 16 >> Locus_2303_Transcript_7/11_Confidence_0.500_Length_1785 1287 42 >> 26M * 0 0 GCTCCGTTTGGATCCGAGAATAGCAC >> fchhhghhfhgghhfgghhhh\a\\\ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 >> NM:i:0 MD:Z:26 YT:Z:UU >> >>> HWI-EAS146_0380:5:100:16914:21379#0/1 4 * 0 0 >> * * 0 0 CCATTCATCATCTTGATACTGCAGAT >> fffffhhghfghhhhhhhhhhhehea YT:Z:UU >> >>> HWI-EAS146_0380:5:100:16964:21378#0/1 0 >> Locus_17940_Transcript_2/7_Confidence_0.538_Length_1249 584 42 26M >> * 0 0 CTGTAGCCTTTCAGTGAGCTGAGAAA >> `b^``effffgfgfdgegggedgggg AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 >> NM:i:0 MD:Z:26 YT:Z:UU >> >>> HWI-EAS146_0380:5:100:17356:21373#0/1 4 * 0 0 >> * * 0 0 GCCATCAAACCTTACCAATGCTGAGA >> dacccf[afffffffffdfff]ffca YT:Z:UU >> >>> HWI-EAS146_0380:5:100:17415:21379#0/1 4 * 0 0 >> * * 0 0 CTAGATGCCTCTTCAGGACTTGAGAA >> a`a`aggggffffffgaggdfcdfdg YT:Z:UU >> >>> >> >>> >> >>> Thanks again, >> >>> >> >>> Alan >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> On Thu, Aug 1, 2013 at 10:55 PM, Wei Shi <shi@wehi.edu.au> wrote: >> >>> >> >>>> Dear Alan, >> >>>> >> >>>> Could you please provide us a few reads in your SAM file so that we >> can >> >>>> reproduce the problem? Sorry for your crashed R session. >> >>>> >> >>>> Best wishes, >> >>>> Wei >> >>>> >> >>>> On Aug 2, 2013, at 12:47 PM, Alan Smith wrote: >> >>>> >> >>>>> Hello, >> >>>>> >> >>>>> I'm trying to use featureCounts to extract read counts per >> transcript >> >>>> from >> >>>>> the SAM file (generated using Bowtie2) mapped to de novo assembled >> >>>>> transcripts (for DE analysis). I made annotation file as per the SAF >> >>>>> requirements: >> >>>>> >> >>>>> Eg: >> >>>>> >> >>>>> *GeneID Chr Start End Strand* >> >>>>> Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 >> >>>>> Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 1 1485 + >> >>>>> Locus_58_Transcript_85/85_Confidence_0.017_Length_650 >> >>>>> Locus_58_Transcript_85/85_Confidence_0.017_Length_650 1 650 + >> >>>>> >> >>>>> Here I made both the GeneID and the Chr columns same with *Start* >> >>>> being >> >>>> 1 >> >>>>> and *End* as the length of the transcript and the Strand as + for >> all >> >>>>> transcripts. >> >>>>> >> >>>>> When I used the following code, it does nothing for a while and then >> >>>> R >> >>>>> crashes. >> >>>>> >> >>>>> library(Rsubread) >> >>>>> counts <- >> >>>>> >> >>>> >> featureCounts(files="AssembledTranscriptome-LMcontrol1.sam",annot=" AnnotRsubreadSAF.txt",isGTFAnnotationFile=FALSE) >> >>>>> >> >>>>> Rsubread works great with GTF as annot parameter on genome mapped >> SAM >> >>>> files >> >>>>> but not with the SAF, as I tried here. >> >>>>> >> >>>>> Please let me know where I'm wrong here or if anyone tried other >> >>>> packages >> >>>>> that serves the purpose. >> >>>>> >> >>>>> sessionInfo() >> >>>>> R version 3.0.1 (2013-05-16) >> >>>>> Platform: x86_64-redhat-linux-gnu (64-bit) >> >>>>> >> >>>>> locale: >> >>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> >>>>> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> >>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C >> >>>>> LC_NAME=C >> >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >> >>>>> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >>>>> >> >>>>> attached base packages: >> >>>>> [1] splines parallel grid stats graphics grDevices >> utils >> >>>>> datasets methods base >> >>>>> >> >>>>> other attached packages: >> >>>>> [1] Rsamtools_1.12.3 edgeR_3.2.4 >> >>>>> Rsubread_1.10.5 >> >>>>> [4] ChIPpeakAnno_2.8.0 GenomicFeatures_1.12.3 >> >>>>> limma_3.16.7 >> >>>>> [7] org.Hs.eg.db_2.9.0 GO.db_2.9.0 >> >>>>> RSQLite_0.11.4 >> >>>>> [10] DBI_0.2-7 AnnotationDbi_1.22.6 >> >>>>> BSgenome.Ecoli.NCBI.20080805_1.3.17 >> >>>>> [13] BSgenome_1.28.0 GenomicRanges_1.12.4 >> >>>>> Biostrings_2.28.0 >> >>>>> [16] IRanges_1.18.2 multtest_2.16.0 >> >>>>> Biobase_2.20.1 >> >>>>> [19] biomaRt_2.16.0 BiocGenerics_0.6.0 >> >>>>> VennDiagram_1.6.4 >> >>>>> [22] sendmailR_1.1-2 base64enc_0.1-1 >> >>>>> BiocInstaller_1.10.3 >> >>>>> >> >>>>> loaded via a namespace (and not attached): >> >>>>> [1] bitops_1.0-5 MASS_7.3-26 RCurl_1.95-4.1 >> >>>>> rtracklayer_1.20.4 stats4_3.0.1 >> >>>>> [6] survival_2.37-4 tools_3.0.1 XML_3.98-1.1 >> >>>> zlibbioc_1.6.0 >> >>>>> >> >>>>> [[alternative HTML version deleted]] >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Bioconductor mailing list >> >>>>> Bioconductor@r-project.org >> >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>>>> Search the archives: >> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>>> >> >>>> >> ______________________________________________________________________ >> >>>> The information in this email is confidential and intended solely for >> >>>> the >> >>>> addressee. >> >>>> You must not disclose, forward, print or use it without the >> permission >> >>>> of >> >>>> the sender. >> >>>> >> ______________________________________________________________________ >> >>>> >> >>> >> >>> >> >>> >> >>> ______________________________________________________________________ >> >>> The information in this email is confidential and intended solely for >> >>> the >> >>> addressee. >> >>> You must not disclose, forward, print or use it without the permission >> >>> of >> >>> the sender. >> >>> ______________________________________________________________________ >> >>> >> >> >> > >> > >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________ >> > > [[alternative HTML version deleted]]

ADD COMMENT • link 12.5 years ago Alan Smith ▴ 150

0

Entering edit mode

Dear Alan, Is it possible that you can provide us a reproducible example so that we can find out what caused the problem you encountered? We could not figure out what is the problem from the error message. If your example data is big, you may send it off list and you may put it somewhere so that we can download. Thanks, Wei On Aug 5, 2013, at 4:16 AM, Alan Smith wrote: > Dear Wei and Yang, > > I installed Rsubread version 1.11.11. > > After using the following code, it ended up in a different error this time. > > > Thank you, > Alan > > library(Rsubread) > featureCounts(files="AssembledTranscriptomeLMcontrol1.sam",annot.ext ="AnnotRsubreadSAF.txt",isGTFAnnotationFile=FALSE) > > Error: lazy-load database 'P' is corrupt > In addition: Warning message: > internal error -3 in R_decompress1 > > > sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BiocInstaller_1.11.4 Rsubread_1.11.11 > > loaded via a namespace (and not attached): > [1] tools_3.0.1 > > > On Sat, Aug 3, 2013 at 8:18 PM, Alan Smith <alan.sm310@gmail.com> wrote: > Dear Wei, > > Thanks a bunch for looking into it. I tried it again just a minute back after downloading Rsubread from bioc devel and its still the same. > > I'm certain that the changes you made were in it based on the use of "annot.ext", just "annot" won't work now. > > I will try again tomorrow. > > I greatly appreciate your time and help. > Alan > > > On Sat, Aug 3, 2013 at 7:56 PM, Wei Shi <shi@wehi.edu.au> wrote: > Dear Alan, > > Just to clarify this. We have also fixed the problem you reported for Rsubread v1.11.10 in bioc devel. But depending on the time you downloaded v1.11.10 from bioc devel, it may work or may not work. But v1.11.11 should always work and it should be available tomorrow. Once again, please do check the version you use. > > Sorry for the confusion. > > Best wishes, > > Wei > > > On Aug 4, 2013, at 9:36 AM, Yang Liao wrote: > > > Dear Alan, > > > > Thank you for using Rsubread. The number of transcripts in the annotation > > file would not be an issue -- our featureCounts function supports > > unlimited number of annotations and chromosomes/transcripts (actually, at > > most 2 billions). > > > > I have just tried to run the in-development version of Rsubread to assign > > reads in your last mail, using the two-line annotation in the other mail. > > The program finished with no errors. The version number I tested was > > "Rsubread_1.11.10". > > > > If this is the version you are using, could you please run this command? > > > > $ cat my_annotations.txt | awk 'BEGIN{max_feature=0; max_chro=0} > > length($1)>max_feature{max_feature=length($1) } > > length($2)>max_chro{max_chro=length($2)} END{print max_feature, max_chro}' > > > > > > where "my_annotations.txt" is the annotation file in SAF format. This > > command will give the length of the longest feature name and the longest > > chromosome name. If either of them is longer than 99, it is still too long > > to our program. > > > > Thank you. > > > > Cheers, > > > > Yang > > > >> Dear Wei, > >> > >> I tried with the devel version but still the problem persists (R crashed). > >> Does this got to do with the large number of transcripts (129310) in the > >> file? > >> > >> Could you also please comment on whether if the count data obtained from > >> Rsubread processed on de novo assembled transcripts be used for edgeR > >> differential expression analysis. Or using RSEM to obtain read count > >> information is the way to go? > >> > >> Thank you, > >> Alan > >> > >> > >> On Fri, Aug 2, 2013 at 8:35 PM, Wei Shi <shi@wehi.edu.au> wrote: > >> > >>> Dear Alan, > >>> > >>> Thanks for sending through the data which helped us to identify the > >>> problem. The problem was that featureCounts did not allow the chromosome > >>> names in the provided annotation file to be longer than 48 characters. > >>> We > >>> have increased this limit to 100 characters. This solved the problem and > >>> featureCounts worked fine for your data now from our testing. > >>> > >>> We have committed the changes to bioc devel repository and the latest > >>> version should available in a couple of days (Rsubread v1.11.11). > >>> > >>> Note that we have also made some changes to parameters used by > >>> featureCounts. The 'annot' parameter is now changed to 'annot.ext' and > >>> the > >>> 'genome' parameter changed to 'annot.inbuilt'. This makes it more > >>> clearer > >>> on how to use our in-built annotations and how to provide external > >>> annotations. > >>> > >>> Let me know if the problem you encountered persists. > >>> > >>> > >>> Best wishes, > >>> > >>> Wei > >>> > >>> On Aug 2, 2013, at 2:06 PM, Alan Smith wrote: > >>> > >>> Hi Wei, > >>> > >>> Thanks for looking into it. Here are few lines from my SAM file: > >>> > >>> @HD VN:1.0 SO:unsorted > >>> @SQ SN:Locus_1_Transcript_1/43_Confidence_0.175_Length_1072 LN:1072 > >>> @SQ SN:Locus_1_Transcript_6/43_Confidence_0.105_Length_219 LN:219 > >>> @SQ SN:Locus_1_Transcript_8/43_Confidence_0.088_Length_258 LN:258 > >>> @SQ SN:Locus_1_Transcript_10/43_Confidence_0.088_Length_421 LN:421 > >>> @SQ SN:Locus_1_Transcript_14/43_Confidence_0.105_Length_217 LN:217 > >>> @SQ SN:Locus_1_Transcript_15/43_Confidence_0.105_Length_392 LN:392 > >>> @SQ SN:Locus_1_Transcript_20/43_Confidence_0.123_Length_547 LN:547 > >>> @SQ SN:Locus_1_Transcript_21/43_Confidence_0.105_Length_408 LN:408 > >>> @SQ SN:Locus_1_Transcript_23/43_Confidence_0.070_Length_246 LN:246 > >>> > >>> *This goes for 129310 lines (number of assembled transcripts) then at > >>> the end of the file:* > >>> > >>> > >>> HWI-EAS146_0380:5:100:14414:21375#0/1 16 Locus_21817_Transcript_4/5_Confidence_0.200_Length_1641 933 0 26M * 0 0 TCTCGGGGCTCCACTGGAGATGGTCC ^Qa]cf]\U\Z_Zbbeccfaf]`][S AS:i:-12 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0C0A24 YT:Z:UU > >>> HWI-EAS146_0380:5:100:14769:21375#0/1 16 Locus_3777_Transcript_1/1_Confidence_0.000_Length_2776 283 42 26M * 0 0 ATCACATCTCTTTACAGAACCCACTT c]fbdbafdaeafdfdffeffa_aa^ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > >>> HWI-EAS146_0380:5:100:14923:21376#0/1 4 * 0 0 * * 0 0 ACGACGAAGAAGATGACGACGAGGAC ^^\^Y^W[]aa_cW^ddb``a`WcfR YT:Z:UU > >>> HWI-EAS146_0380:5:100:15383:21374#0/1 4 * 0 0 * * 0 0 CAACCTTCACCGTGGGTTCTAATGTA YY\\_fff[fd^^^]ZZ]^Wc]c]_a YT:Z:UU > >>> HWI-EAS146_0380:5:100:15873:21375#0/1 4 * 0 0 * * 0 0 CCTGCTTAAAGAGATCGGAAGAGCGG `_Z^acRcccffYf[ca]d]ff[cdf YT:Z:UU > >>> HWI-EAS146_0380:5:100:16039:21373#0/1 16 Locus_4044_Transcript_6/48_Confidence_0.066_Length_596 412 42 26M * 0 0 TGCAAGCATTCCTCCTGTGGAGGGGT hghedhgdhcgceffdhhhhgeeeee AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > >>> HWI-EAS146_0380:5:100:16121:21377#0/1 16 Locus_2303_Transcript_7/11_Confidence_0.500_Length_1785 1287 42 26M * 0 0 GCTCCGTTTGGATCCGAGAATAGCAC fchhhghhfhgghhfgghhhh\a\\\ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > >>> HWI-EAS146_0380:5:100:16914:21379#0/1 4 * 0 0 * * 0 0 CCATTCATCATCTTGATACTGCAGAT fffffhhghfghhhhhhhhhhhehea YT:Z:UU > >>> HWI-EAS146_0380:5:100:16964:21378#0/1 0 Locus_17940_Transcript_2/7_Confidence_0.538_Length_1249 584 42 26M * 0 0 CTGTAGCCTTTCAGTGAGCTGAGAAA `b^``effffgfgfdgegggedgggg AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:26 YT:Z:UU > >>> HWI-EAS146_0380:5:100:17356:21373#0/1 4 * 0 0 * * 0 0 GCCATCAAACCTTACCAATGCTGAGA dacccf[afffffffffdfff]ffca YT:Z:UU > >>> HWI-EAS146_0380:5:100:17415:21379#0/1 4 * 0 0 * * 0 0 CTAGATGCCTCTTCAGGACTTGAGAA a`a`aggggffffffgaggdfcdfdg YT:Z:UU > >>> > >>> > >>> Thanks again, > >>> > >>> Alan > >>> > >>> > >>> > >>> > >>> > >>> On Thu, Aug 1, 2013 at 10:55 PM, Wei Shi <shi@wehi.edu.au> wrote: > >>> > >>>> Dear Alan, > >>>> > >>>> Could you please provide us a few reads in your SAM file so that we can > >>>> reproduce the problem? Sorry for your crashed R session. > >>>> > >>>> Best wishes, > >>>> Wei > >>>> > >>>> On Aug 2, 2013, at 12:47 PM, Alan Smith wrote: > >>>> > >>>>> Hello, > >>>>> > >>>>> I'm trying to use featureCounts to extract read counts per transcript > >>>> from > >>>>> the SAM file (generated using Bowtie2) mapped to de novo assembled > >>>>> transcripts (for DE analysis). I made annotation file as per the SAF > >>>>> requirements: > >>>>> > >>>>> Eg: > >>>>> > >>>>> *GeneID Chr Start End Strand* > >>>>> Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 > >>>>> Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 1 1485 + > >>>>> Locus_58_Transcript_85/85_Confidence_0.017_Length_650 > >>>>> Locus_58_Transcript_85/85_Confidence_0.017_Length_650 1 650 + > >>>>> > >>>>> Here I made both the GeneID and the Chr columns same with *Start* > >>>> being > >>>> 1 > >>>>> and *End* as the length of the transcript and the Strand as + for all > >>>>> transcripts. > >>>>> > >>>>> When I used the following code, it does nothing for a while and then > >>>> R > >>>>> crashes. > >>>>> > >>>>> library(Rsubread) > >>>>> counts <- > >>>>> > >>>> featureCounts(files="AssembledTranscriptome-LMcontrol1.sam",ann ot="AnnotRsubreadSAF.txt",isGTFAnnotationFile=FALSE) > >>>>> > >>>>> Rsubread works great with GTF as annot parameter on genome mapped SAM > >>>> files > >>>>> but not with the SAF, as I tried here. > >>>>> > >>>>> Please let me know where I'm wrong here or if anyone tried other > >>>> packages > >>>>> that serves the purpose. > >>>>> > >>>>> sessionInfo() > >>>>> R version 3.0.1 (2013-05-16) > >>>>> Platform: x86_64-redhat-linux-gnu (64-bit) > >>>>> > >>>>> locale: > >>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > >>>>> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > >>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C > >>>>> LC_NAME=C > >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C > >>>>> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > >>>>> > >>>>> attached base packages: > >>>>> [1] splines parallel grid stats graphics grDevices utils > >>>>> datasets methods base > >>>>> > >>>>> other attached packages: > >>>>> [1] Rsamtools_1.12.3 edgeR_3.2.4 > >>>>> Rsubread_1.10.5 > >>>>> [4] ChIPpeakAnno_2.8.0 GenomicFeatures_1.12.3 > >>>>> limma_3.16.7 > >>>>> [7] org.Hs.eg.db_2.9.0 GO.db_2.9.0 > >>>>> RSQLite_0.11.4 > >>>>> [10] DBI_0.2-7 AnnotationDbi_1.22.6 > >>>>> BSgenome.Ecoli.NCBI.20080805_1.3.17 > >>>>> [13] BSgenome_1.28.0 GenomicRanges_1.12.4 > >>>>> Biostrings_2.28.0 > >>>>> [16] IRanges_1.18.2 multtest_2.16.0 > >>>>> Biobase_2.20.1 > >>>>> [19] biomaRt_2.16.0 BiocGenerics_0.6.0 > >>>>> VennDiagram_1.6.4 > >>>>> [22] sendmailR_1.1-2 base64enc_0.1-1 > >>>>> BiocInstaller_1.10.3 > >>>>> > >>>>> loaded via a namespace (and not attached): > >>>>> [1] bitops_1.0-5 MASS_7.3-26 RCurl_1.95-4.1 > >>>>> rtracklayer_1.20.4 stats4_3.0.1 > >>>>> [6] survival_2.37-4 tools_3.0.1 XML_3.98-1.1 > >>>> zlibbioc_1.6.0 > >>>>> > >>>>> [[alternative HTML version deleted]] > >>>>> > >>>>> _______________________________________________ > >>>>> Bioconductor mailing list > >>>>> Bioconductor@r-project.org > >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>>>> Search the archives: > >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>>> > >>>> ______________________________________________________________________ > >>>> The information in this email is confidential and intended solely for > >>>> the > >>>> addressee. > >>>> You must not disclose, forward, print or use it without the permission > >>>> of > >>>> the sender. > >>>> ______________________________________________________________________ > >>>> > >>> > >>> > >>> > >>> ______________________________________________________________________ > >>> The information in this email is confidential and intended solely for > >>> the > >>> addressee. > >>> You must not disclose, forward, print or use it without the permission > >>> of > >>> the sender. > >>> ______________________________________________________________________ > >>> > >> > > > > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:18}}

ADD REPLY • link 12.5 years ago Wei Shi ★ 3.6k

Login before adding your answer.