Akula, Nirmala NIH/NIMH [C]
Thank you, Simon. I tried the Ensembl GTF file with HTSeq and got counts
for ~50,000 genes to test with DESeq. After filtering out genes with low
counts, ~23,000 genes remained. The problem now is that the QQ plot of
the p-values lies far above the expected line. Please see the attachment.
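For readers following the thread: a QQ plot of p-values compares the sorted observed p-values with the quantiles expected under the null (uniform) distribution, usually on a -log10 scale. Points far above the diagonal across most of the range, not just in the extreme tail, often mean the p-values are systematically too small rather than reflecting a handful of true positives. The sketch below computes the plotted points; the function name `qq_points` and the plotting position (i - 0.5)/n are illustrative choices (i/(n+1) is another common one), not part of DESeq.

```python
import math

def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs for a QQ plot (sketch)."""
    p_sorted = sorted(pvalues)
    n = len(p_sorted)
    points = []
    for i, p in enumerate(p_sorted, start=1):
        # Expected i-th smallest p-value under a uniform null distribution.
        expected = (i - 0.5) / n
        # Clamp p = 0 so that log10 stays finite.
        observed = max(p, 1e-300)
        points.append((-math.log10(expected), -math.log10(observed)))
    return points
```

Plotting `expected` on the x axis against `observed` on the y axis gives the usual QQ plot; well-calibrated null p-values fall close to the line y = x.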
Analysis pipeline: TopHat -> HTSeq -> DESeq
Any suggestions would be very helpful.
Thank you very much.
Regards,
Nirmala
-----Original Message-----
From: Simon Anders [mailto:anders@embl.de]
Sent: Thursday, May 31, 2012 2:31 AM
To: bioconductor at r-project.org
Subject: Re: [BioC] easyRNAseq question
Dear Nirmala
On 2012-05-27 02:25, Akula, Nirmala (NIH/NIMH) [C] wrote:
> I used HTSeq (similar to your geneModel method), which takes the counts
> of disjoint exons for the genes. The problem with this method is that
> too many reads are assigned to the ambiguous category, and sometimes the
> total number of reads that fall on disjoint exons is too small to get a
> valid DESeq result. Using RefSeq genes, the total number of genes counted
> by HTSeq on my data is ~14,000, whereas using the bestExon method we get
> ~22,000. Do you observe similar counts with your data?
It does not quite make sense that counting only the best exons gives
you more counts than counting all exons.
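The ambiguity described in the quoted message can be illustrated with a simplified version of the "union" rule that htseq-count uses in its default overlap mode: a read is counted for a gene only if the union of genes overlapped by its aligned bases contains exactly one gene; no gene means "no feature", and more than one means "ambiguous". This is a minimal sketch, assuming the per-read gene sets have already been computed (htseq-count derives them from the GTF exon features); the function name is hypothetical and the labels merely mimic htseq-count's special counters.

```python
from collections import Counter

def union_mode_assign(read_gene_sets):
    """Count reads per gene following the 'union' rule (simplified sketch).

    read_gene_sets: iterable of sets; each set holds the gene IDs whose
    exons overlap any aligned base of one read (assumed precomputed).
    """
    counts = Counter()
    for genes in read_gene_sets:
        if len(genes) == 0:
            counts["__no_feature"] += 1       # read hits no annotated exon
        elif len(genes) == 1:
            counts[next(iter(genes))] += 1    # unambiguous: one gene only
        else:
            counts["__ambiguous"] += 1        # read overlaps several genes
    return counts
```

With this rule, annotations in which overlapping features carry different gene IDs push reads into the ambiguous bin, which is one reason the choice of GTF file matters.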
Could the issue with UCSC GTF files described here be the source of
your problems?
https://stat.ethz.ch/pipermail/bioconductor/2012-April/044717.html
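Assuming the linked issue is the commonly reported one, in which GTF files exported from the UCSC Table Browser repeat the transcript ID in the `gene_id` attribute (so every transcript is treated as a separate "gene" and reads on shared exons become ambiguous), a quick check is to measure how many exon records have identical `gene_id` and `transcript_id` values. The function below is a hypothetical helper written for illustration, not part of HTSeq.

```python
import re

# Matches GTF attribute pairs such as: gene_id "NM_000001";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def gene_id_equals_transcript_id_fraction(gtf_lines):
    """Fraction of exon records whose gene_id equals their transcript_id."""
    checked = same = 0
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features are used for counting
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            checked += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return same / checked if checked else 0.0
```

A value close to 1.0 suggests the file has the gene_id problem; an Ensembl GTF, where gene and transcript IDs differ, should give a value near 0.0.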
Simon
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor