how to get the exon count by using htseq-count
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Dear teacher, How to get the exon count by using htseq-count? My script is: python -m HTSeq.scripts.count -m intersection-strict --stranded=no -t exon -i exon_id accepted_hits_name.sorted.sam Homo_sapiens.GRCh37.70_new.gtf>outfile_htseqcount_exon For example gene SLC25A13 has 18 exons, how can I get the counts for the 18 exons? How can I get the result like this (coming from easyRNASeq): "\"ENSG00000004864\"_1" 2 "\"ENSG00000004864\"_2" 4 "\"ENSG00000004864\"_3" 16 "\"ENSG00000004864\"_4" 3 "\"ENSG00000004864\"_5" 7 "\"ENSG00000004864\"_6" 8 "\"ENSG00000004864\"_7" 5 "\"ENSG00000004864\"_8" 4 "\"ENSG00000004864\"_9" 4 "\"ENSG00000004864\"_10" 1 "\"ENSG00000004864\"_11" 6 "\"ENSG00000004864\"_12" 4 "\"ENSG00000004864\"_13" 4 "\"ENSG00000004864\"_14" 6 "\"ENSG00000004864\"_15" 8 "\"ENSG00000004864\"_16" 5 "\"ENSG00000004864\"_17" 3 "\"ENSG00000004864\"_18" 25 Here is a part from my genes.gtf (human ensembl) 7 protein_coding exon 95951254 95951405 . - . gene_id "ENSG00000004864"; transcript_id "ENST00000265631"; exon_number "1"; gene_name "SLC25A13"; gene_biotype "protein_coding"; transcript_name "SLC25A13-001"; exon_id "ENSE00001830180"; 7 protein_coding CDS 95951254 95951268 . - 0 gene_id "ENSG00000004864"; transcript_id "ENST00000265631"; exon_number "1"; gene_name "SLC25A13"; gene_biotype "protein_coding"; transcript_name "SLC25A13-001"; protein_id "ENSP00000265631"; 7 protein_coding start_codon 95951266 95951268 . - 0 gene_id "ENSG00000004864"; transcript_id "ENST00000265631"; exon_number "1"; gene_name "SLC25A13"; gene_biotype "protein_coding"; transcript_name "SLC25A13-001"; 7 protein_coding exon 95926210 95926263 . - . gene_id "ENSG00000004864"; transcript_id "ENST00000265631"; exon_number "2"; gene_name "SLC25A13"; gene_biotype "protein_coding"; transcript_name "SLC25A13-001"; exon_id "ENSE00003192771"; 7 protein_coding CDS 95926210 95926263 . - 0 gene_id "ENSG00000004864"; transcript_id "ENST00000265631"; exon_number "2"; gene_name "SLC25A13"; gene_biotype "protein_coding"; transcript_name "SLC25A13-001"; protein_id "ENSP00000265631"; 7 protein_coding exon 95906508 95906650 . - . gene_id "ENSG00000004864"; transcript_id "ENST00000265631"; exon_number "3"; gene_name "SLC25A13"; gene_biotype "protein_coding"; transcript_name "SLC25A13-001"; exon_id "ENSE00003069204"; 7 protein_coding CDS 95906508 95906650 . - 0 gene_id "ENSG00000004864"; transcript_id "ENST00000265631"; exon_number "3"; gene_name "SLC25A13"; gene_biotype "protein_coding"; transcript_name "SLC25A13-001"; protein_id "ENSP00000265631"; 7 protein_coding exon -- output of sessionInfo(): > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C [3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915 [5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] easyRNASeq_1.4.2 ShortRead_1.16.4 latticeExtra_0.6-24 [4] RColorBrewer_1.0-5 Rsamtools_1.10.2 DESeq_1.10.1 [7] lattice_0.20-13 locfit_1.5-8 BSgenome_1.26.1 [10] GenomicRanges_1.10.7 Biostrings_2.26.3 IRanges_1.16.6 [13] edgeR_3.0.8 limma_3.14.4 biomaRt_2.14.0 [16] Biobase_2.18.0 genomeIntervals_1.14.0 BiocGenerics_0.4.0 [19] intervals_0.13.3 loaded via a namespace (and not attached): [1] annotate_1.36.0 AnnotationDbi_1.20.5 bitops_1.0-5 [4] DBI_0.2-5 genefilter_1.40.0 geneplotter_1.36.0 [7] grid_2.15.2 hwriter_1.3 RCurl_1.95-3 [10] RSQLite_0.11.2 splines_2.15.2 stats4_2.15.2 [13] survival_2.37-4 XML_3.95-0.1 xtable_1.7-1 [16] zlibbioc_1.4.0 -- Sent via the guest posting facility at bioconductor.org.
• 2.1k views
ADD COMMENT
0
Entering edit mode
Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 3.7 years ago
Zentrum für Molekularbiologie, Universi…
Hi On 19/03/13 10:45, Hu Fuyan [guest] wrote: > How to get the exon count by using htseq-count? > My script is: > python -m HTSeq.scripts.count -m intersection-strict --stranded=no -t exon -i exon_id accepted_hits_name.sorted.sam Homo_sapiens.GRCh37.70_new.gtf>outfile_htseqcount_exon You should not use htseq-count for this purpose. Rather, use the Python scripts dexseq-prepare.py and dexseq-count.py. For more information, please see the last section of the DEXSeq vignette and the relevant sections in the vignette of the pasilla package. If this does not help, please ask again. Simon
ADD COMMENT

Login before adding your answer.

Traffic: 1088 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6