HT-Seq count and GTF
@jose-m-garcia-manteiga-6046
Last seen 7.7 years ago
Italy
Dear Simon,

I am using HT-Seq count to obtain counts on sorted BAM files based on gene_id, as you recommended, but I have run into a problem.

I started with the GTF file of Ensembl genes from UCSC (build 72), which seemed to work fine and produced one file of counts per BAM file. However, looking at one of our genes of interest, which "had" to be expressed, I realised that its count was 0.

Judging from the BAM files in IGV, it should not be, because a good number of reads map to exons of the gene. So I inspected the GTF and, searching BioC, found the issue of gene_id being equal to transcript_id in the Ensembl GTFs from UCSC, and I remembered that you advised us to use the one from the Ensembl FTP. If I understood the error correctly, we were effectively counting only genes with a single transcript; for all the others, htseq-count (-m union -t exon) saw overlapping exons and lost those reads, or at least assigned only the reads in the "unique" exons. Indeed, the histogram looked very left-shifted, with most features having a mean (over 9 samples, 3 groups) below 10.

Then I downloaded the new GTF from Ensembl and hit a second problem, also discussed in the past here and elsewhere: chromosome names not beginning with 'chr'. I applied this line, which I found on the internet:

awk '{print "chr"$0}' Homo_sapiens.GRCh37.72.gtf | sed 's/chrMT/chrM/g' > hg19.ensembl-with-chr.gtf

I checked the GTF again and ran htseq-count with this command, as usual:

/usr/local/cluster/bin/samtools view $bamFile | /usr/local/cluster/python2.7/bin/python2.7 /usr/local/cluster/python2.7/bin/htseq-count -m union -t exon -s no -q - $GTF > $out

I got the counts again. When I looked at the histogram with DESeq2, using

hist(log2(assay(ddsFull)),breaks=50)

the histogram had shifted to the right, meaning that I was now including many more counts in genes by avoiding the gene_id = transcript_id ambiguity issue.
Then I checked again my favourite gene, but got 0 again!! and here is my puzzle.

The bam file with IGV looks like this:

[screenshot: screen shot 2013-07-17 at 11.26.58 am.png]

I let you see the lines of my GTF coming from the ENSEMBL_GENE_ID of my gene, as downloaded from ftp of ensembl build 72 (genome used hg19, aligner SOAPSplice), after adding "chr":

chr19 protein_coding exon 18263626 18264104 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "1"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003067609";
chr19 protein_coding exon 18266267 18267011 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003474719";
chr19 protein_coding CDS 18266690 18267011 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding start_codon 18266690 18266692 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001";
chr19 protein_coding exon 18271281 18271373 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "3"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003603229";
chr19 protein_coding CDS 18271281 18271373 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "3"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18271729 18271779 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "4"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003645468";
chr19 protein_coding CDS 18271729 18271779 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "4"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18271864 18271995 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "5"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003662245";
chr19 protein_coding CDS 18271864 18271995 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "5"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18272089 18272305 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "6"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003498230";
chr19 protein_coding CDS 18272089 18272305 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "6"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18272776 18272861 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "7"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003464305";
chr19 protein_coding CDS 18272776 18272861 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "7"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18273012 18273120 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "8"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003516250";
chr19 protein_coding CDS 18273012 18273120 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "8"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18273218 18273316 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "9"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003540261";
chr19 protein_coding CDS 18273218 18273316 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "9"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18273777 18273957 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "10"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003546729";
chr19 protein_coding CDS 18273777 18273957 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "10"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18274073 18274198 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "11"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003571819";
chr19 protein_coding CDS 18274073 18274198 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "11"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18276970 18277112 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "12"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003506088";
chr19 protein_coding CDS 18276970 18277112 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "12"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18277940 18278116 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "13"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003589841";
chr19 protein_coding CDS 18277940 18278116 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "13"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18279285 18279356 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "14"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003502685";
chr19 protein_coding CDS 18279285 18279356 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "14"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18279536 18279706 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "15"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00003498129";
chr19 protein_coding CDS 18279536 18279706 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "15"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding exon 18279897 18281350 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "16"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; exon_id "ENSE00000893034";
chr19 protein_coding CDS 18279897 18280101 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "16"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001"; protein_id "ENSP00000222254";
chr19 protein_coding stop_codon 18280102 18280104 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000222254"; exon_number "16"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-001";
chr19 protein_coding exon 18264092 18264418 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000473973"; exon_number "1"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-003"; exon_id "ENSE00001921962";
chr19 protein_coding exon 18266267 18266808 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000473973"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-003"; exon_id "ENSE00001834321";
chr19 protein_coding CDS 18266690 18266808 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000473973"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-003"; protein_id "ENSP00000470075";
chr19 protein_coding start_codon 18266690 18266692 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000473973"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-003";
chr19 nonsense_mediated_decay exon 18266432 18267011 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "1"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00001721882";
chr19 nonsense_mediated_decay CDS 18266690 18267011 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "1"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay start_codon 18266690 18266692 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "1"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002";
chr19 nonsense_mediated_decay exon 18271281 18271373 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003603229";
chr19 nonsense_mediated_decay CDS 18271281 18271373 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay exon 18271729 18271779 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "3"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003645468";
chr19 nonsense_mediated_decay CDS 18271729 18271779 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "3"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay exon 18271864 18271995 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "4"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003662245";
chr19 nonsense_mediated_decay CDS 18271864 18271995 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "4"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay exon 18272089 18272305 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "5"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003498230";
chr19 nonsense_mediated_decay CDS 18272089 18272305 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "5"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay exon 18272776 18272861 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "6"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003464305";
chr19 nonsense_mediated_decay CDS 18272776 18272861 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "6"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay exon 18273012 18273120 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "7"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003516250";
chr19 nonsense_mediated_decay CDS 18273012 18273120 . + 2 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "7"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay exon 18273218 18273316 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "8"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003540261";
chr19 nonsense_mediated_decay CDS 18273218 18273316 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "8"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay exon 18273777 18273957 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "9"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003546729";
chr19 nonsense_mediated_decay CDS 18273777 18273957 . + 1 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "9"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay exon 18274073 18274198 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "10"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003571819";
chr19 nonsense_mediated_decay CDS 18274073 18274198 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "10"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay exon 18276921 18277112 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "11"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00001646815";
chr19 nonsense_mediated_decay CDS 18276921 18276947 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "11"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; protein_id "ENSP00000395636";
chr19 nonsense_mediated_decay stop_codon 18276948 18276950 . + 0 gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "11"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002";
chr19 nonsense_mediated_decay exon 18277940 18278116 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "12"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003595850";
chr19 nonsense_mediated_decay exon 18279285 18279356 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "13"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003569756";
chr19 nonsense_mediated_decay exon 18279536 18279706 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "14"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00003638071";
chr19 nonsense_mediated_decay exon 18279897 18280883 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000426902"; exon_number "15"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-002"; exon_id "ENSE00001779641";
chr19 retained_intron exon 18271677 18271779 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000474310"; exon_number "1"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-004"; exon_id "ENSE00001888508";
chr19 retained_intron exon 18271864 18271995 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000474310"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-004"; exon_id "ENSE00003611107";
chr19 retained_intron exon 18272089 18272278 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000474310"; exon_number "3"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-004"; exon_id "ENSE00001872488";
chr19 retained_intron exon 18273004 18273120 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000600533"; exon_number "1"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-008"; exon_id "ENSE00003164217";
chr19 retained_intron exon 18273218 18273527 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000600533"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-008"; exon_id "ENSE00003205602";
chr19 retained_intron exon 18276966 18277112 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000464016"; exon_number "1"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-006"; exon_id "ENSE00001931248";
chr19 retained_intron exon 18277940 18278116 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000464016"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-006"; exon_id "ENSE00003595850";
chr19 retained_intron exon 18279285 18279706 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000464016"; exon_number "3"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-006"; exon_id "ENSE00001846742";
chr19 retained_intron exon 18279897 18279962 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000464016"; exon_number "4"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-006"; exon_id "ENSE00001924470";
chr19 retained_intron exon 18279228 18279356 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000459743"; exon_number "1"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-005"; exon_id "ENSE00001827706";
chr19 retained_intron exon 18279536 18279560 . + . gene_id "ENSG00000105647"; transcript_id "ENST00000459743"; exon_number "2"; gene_name "PIK3R2"; gene_biotype "protein_coding"; transcript_name "PIK3R2-005"; exon_id "ENSE00001913624";

What could be the problem? Thanks in advance

Regards
Jose

------------------------------------------------------------
Jose M. Garcia Manteiga PhD

Data analyst in Functional Genomics
Center for Translational Genomics and Bioinformatics
DIBIT2-A3 Room 21
San Raffaele Scientific Institute
Via Olgettina 58
20132 Milano
Italy

Office: +39 02 26439114
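As a rough illustration of the two GTF checks described in the question (whether gene_id merely repeats transcript_id, and whether chromosome names carry the "chr" prefix), a small Python sketch could look like the following. The function names check_gtf and parse_attributes are hypothetical, and the sketch assumes a standard tab-separated GTF with the attributes in the ninth column; it is only a sanity check, not part of htseq-count.

```python
import re


def parse_attributes(attr_field):
    """Parse the GTF attribute column (key "value"; pairs) into a dict."""
    return dict(re.findall(r'(\S+) "([^"]*)"', attr_field))


def check_gtf(lines):
    """Summarise chromosome naming and gene/transcript IDs of exon records.

    Assumes tab-separated GTF lines; comment lines and non-exon features
    are skipped.
    """
    transcripts_per_gene = {}
    chroms = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = parse_attributes(fields[8])
        chroms.add(fields[0])
        gene = attrs.get("gene_id")
        transcripts_per_gene.setdefault(gene, set()).add(attrs.get("transcript_id"))
    return {
        "chromosomes": sorted(chroms),
        # True only if at least one chromosome was seen and all start with "chr"
        "all_chr_prefixed": bool(chroms) and all(c.startswith("chr") for c in chroms),
        # True if every gene_id has exactly one transcript_id identical to it
        "gene_id_equals_transcript_id": bool(transcripts_per_gene) and all(
            ts == {g} for g, ts in transcripts_per_gene.items()
        ),
        "multi_transcript_genes": sum(len(ts) > 1 for ts in transcripts_per_gene.values()),
    }
```

Calling check_gtf(open(path)) on each annotation file before counting would flag a GTF in which every gene_id equals its transcript_id, or one whose chromosome names would not match "chr"-prefixed alignments.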
@steve-lianoglou-2771
Last seen 21 months ago
United States
Hi Jose,

On Wed, Jul 17, 2013 at 3:26 AM, Jose M Garcia Manteiga <garciamanteiga.josemanuel at hsr.it> wrote:
>> Dear Simon,
>> I am using HT-Seq count to obtain counts on sorted bam files based on gene-id as you recommended, but I had a problem.
>>
>> I started by using the GTF file from UCSC ensemble genes (build 72), which seemed to work fine and produced a file of counts per bam file. However I realised looking at one of our genes of interest, which "had" to be expressed, that the counts were 0.

[snip]

This does not answer your question, but perhaps you might like to try an alternative "all-bioc" approach to counting reads over genes. This is outlined in the vignette to the parathyroidSE data package here:

http://bioconductor.org/packages/release/data/experiment/vignettes/parathyroidSE/inst/doc/parathyroidSE.pdf

Look at section 4 (counting reads in genes), which uses the GenomicRanges::summarizeOverlaps method.

You should also read through how the different summarizeOverlaps parameters affect the total number of reads that are tallied per "feature," which is outlined here:

http://bioconductor.org/packages/release/bioc/vignettes/GenomicRanges/inst/doc/summarizeOverlaps.pdf

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
Simon Anders ★ 3.8k
@simon-anders-3855
Last seen 4.3 years ago
Zentrum für Molekularbiologie, Universi…
Hi Jose

not sure what is going on. The best might be to take one of these reads (find the read ID in IGV, then grep for it in the SAM file) and first check whether it might be multiply aligned ("NH" field). Then, make a mini SAM file with only this read and feed it to htseq-count, with the "--samout" option, to see what the script does with it.

Simon
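The first two steps of this suggestion (pulling out one read by its ID and checking its "NH" field) could be sketched in Python roughly as follows. The function names extract_read and nh_tag are hypothetical, and the sketch assumes plain-text, tab-separated SAM lines such as those produced by samtools view; it is an illustration, not part of HTSeq.

```python
def nh_tag(sam_record):
    """Return the NH optional tag of a SAM record as an int, or None if absent.

    Optional tags start at the 12th tab-separated column of a SAM line.
    """
    for field in sam_record.rstrip("\n").split("\t")[11:]:
        if field.startswith("NH:i:"):
            return int(field[len("NH:i:"):])
    return None


def extract_read(sam_lines, read_id):
    """Collect header lines and all alignment records of one read name.

    Returns (header_lines, records); writing both to a file gives the
    mini SAM file described above.
    """
    header, records = [], []
    for line in sam_lines:
        if line.startswith("@"):
            # SAM header line (e.g. @HD, @SQ); kept so the mini file stays valid
            header.append(line)
        elif line.split("\t", 1)[0] == read_id:
            records.append(line)
    return header, records
```

An NH value above 1 on the extracted records would indicate a multiply aligned read. The header and records can then be written to a small file (say mini.sam, a made-up name) and passed to htseq-count with the same -m union -t exon -s no options plus --samout, so that the annotated output shows how that single read was assigned.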