Some problems with easyRNASeq: about the GTF files
Guest User ★ 13k
@guest-user-4897
Last seen 10.2 years ago
I want to use easyRNASeq to get exon counts, but I found something strange.

I have two human annotation files from different sources: one (Homo_sapiens.GRCh37.70.gtf.gz) is from the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-70/gtf/homo_sapiens); the other (genes.gtf, Ensembl) is from Illumina iGenomes (http://tophat.cbcb.umd.edu/igenomes.html).

The two annotation files are almost identical, with only small differences such as the order of the exons and of the attributes. I ran easyRNASeq with each GTF file to compare the results, and I got different counts for the SLC25A13 exons.

-- output of sessionInfo():

First, I got my BAM file from TopHat. When I used Homo_sapiens.GRCh37.70.gtf as the annotation file in easyRNASeq, I got this result:

    "\"ENSG00000004864\"_1" 2
    "\"ENSG00000004864\"_2" 4
    "\"ENSG00000004864\"_3" 16
    "\"ENSG00000004864\"_4" 3
    "\"ENSG00000004864\"_5" 7
    "\"ENSG00000004864\"_6" 8
    "\"ENSG00000004864\"_7" 5
    "\"ENSG00000004864\"_8" 4
    "\"ENSG00000004864\"_9" 4
    "\"ENSG00000004864\"_10" 1
    "\"ENSG00000004864\"_11" 6
    "\"ENSG00000004864\"_12" 4
    "\"ENSG00000004864\"_13" 4
    "\"ENSG00000004864\"_14" 6
    "\"ENSG00000004864\"_15" 8
    "\"ENSG00000004864\"_16" 5
    "\"ENSG00000004864\"_17" 3
    "\"ENSG00000004864\"_18" 25

But when I used the GTF file from Illumina iGenomes, I got a wrong result (as can be seen by viewing the BAM file in IGV):

    "\"ENSG00000004864\"_18" 25
    "\"ENSG00000004864\"_17" 13
    "\"ENSG00000004864\"_2" 11
    "\"ENSG00000004864\"_16" 3
    "\"ENSG00000004864\"_1" 8
    "\"ENSG00000004864\"_15" 5
    "\"ENSG00000004864\"_14" 8
    "\"ENSG00000004864\"_6" 6
    "\"ENSG00000004864\"_13" 6
    "\"ENSG00000004864\"_5" 0
    "\"ENSG00000004864\"_3" 4
    "\"ENSG00000004864\"_4" 4
    "\"ENSG00000004864\"_12" 4
    "\"ENSG00000004864\"_11" 4
    "\"ENSG00000004864\"_10" 6
    "\"ENSG00000004864\"_9" 1
    "\"ENSG00000004864\"_8" 4
    "\"ENSG00000004864\"_7" 4

I am confused by the difference between the two results.
Here is my main call to easyRNASeq:

    count_gene_gtf_ensembl.table <- easyRNASeq(filesDirectory=getwd(),
        filenames="accepted_hits.sorted.bam",
        organism="Hsapiens",
        chr.sizes="auto",
        annotationMethod="gtf",
        annotationFile="/x400ifs-accel/ntteam/hufuyan/humanindex/Ensembl/ussd-ftp.illumina.com/Homo_sapiens/Ensembl/GRCh37/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2012-03-09-04-49-46/Genes/genes.gtf",
        format="bam",
        gapped=TRUE,
        count="exon")

When I changed the order of the SLC25A13 exons in genes.gtf (Illumina) to match the order in Homo_sapiens.GRCh37.70.gtf and ran easyRNASeq again, I got the right exon counts.

Another problem is that I got this warning: "You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it." I also got this warning when I used the GTF files from UCSC. How can I fix it?

-- Sent via the guest posting facility at bioconductor.org.
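The reordering workaround described in the question can be sketched in base R. This is only an illustration, assuming that exon order is what differs between the two files (the thread does not confirm that as the root cause). The function name `sort_gtf_exons` and the toy records are hypothetical. Applying the same ordering to both annotation files at least makes the per-exon labels comparable.

```r
# Sketch: keep only the exon records of a GTF and order them by gene_id,
# then by start and end coordinate. Base R only; not part of easyRNASeq.
sort_gtf_exons <- function(gtf_lines) {
  # Drop comment/header lines, then split the tab-separated GTF columns.
  body <- gtf_lines[!grepl("^#", gtf_lines)]
  fields <- strsplit(body, "\t", fixed = TRUE)
  feature <- vapply(fields, `[`, character(1), 3)
  start <- as.integer(vapply(fields, `[`, character(1), 4))
  end <- as.integer(vapply(fields, `[`, character(1), 5))
  attrs <- vapply(fields, `[`, character(1), 9)
  # Pull the gene_id value out of the attribute column.
  gene <- sub('.*gene_id "([^"]+)".*', "\\1", attrs)
  keep <- feature == "exon"
  ord <- order(gene[keep], start[keep], end[keep])
  body[keep][ord]
}

# Hypothetical toy records (not taken from either annotation file).
toy <- c(
  "7\tprotein_coding\texon\t300\t400\t.\t-\t.\tgene_id \"G1\"; transcript_id \"T1\";",
  "7\tprotein_coding\tCDS\t310\t390\t.\t-\t0\tgene_id \"G1\"; transcript_id \"T1\";",
  "7\tprotein_coding\texon\t100\t200\t.\t-\t.\tgene_id \"G1\"; transcript_id \"T1\";"
)
sorted <- sort_gtf_exons(toy)
```

For a real file, the input would come from `readLines()` on the uncompressed GTF and the result could be written back with `writeLines()`. Exons shared by several transcripts appear more than once in a GTF and are not collapsed here.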
Annotation easyRNASeq • 1.6k views
@delhommeemblde-3232
Last seen 10.2 years ago
Hej Fuyan!

On 19 Mar 2013, at 05:27, Hu Fuyan [guest] wrote:

> The two annotation files are almost the same only with a small differentiation, such as the order of exons and attribute.
> When I run easyRNASeq, I used the two gtf files to check the result.
> I have got different results for SLC25A13 exons

This sounds strange, as I don't remember easyRNASeq expecting any particular ordering. Thanks for the example files and the report, I'll check that.

> -- output of sessionInfo():

Can you paste the output? It's not in the file you sent off-list either.

> When I changed the order of exons of gene SLC25A13 in genes.gtf (illumina) according to Homo_sapiens.GRCh37.70.gtf., I run easyRNASeq again. Then I got the right exon counts.
>
> Another problem is that I got the warning:" You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it." When I used the gtf files from UCSC, I also got this warning.
> How can I fix it?

You would need to change the chromosome names, i.e. prepend the "chr" prefix so as to follow the UCSC convention (e.g. 7 to chr7) and convert the mitochondrion name to chrM, in both your alignment and your annotation file (BAM and GTF). Anyway, this is just a warning to draw your attention to the essential point that both of these files need to share a common chromosome naming. I'm handling this differently in the next release of easyRNASeq and no longer enforce the UCSC conventions. So in your current case, you can ignore that warning.

Cheers,
Nico
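The renaming suggested in the answer (7 to chr7, mitochondrion to chrM) could look roughly like this for the GTF side, using base R only. The function name `to_ucsc_seqnames` and the toy lines are hypothetical, and the sketch assumes the mitochondrion is named "MT" in the input, as in Ensembl files. The BAM header would need the matching change with a separate tool, which is not shown here.

```r
# Sketch: rewrite Ensembl-style sequence names in GTF lines to UCSC style
# ("7" -> "chr7", "MT" -> "chrM"). Base R only; not part of easyRNASeq.
# Caveat: unplaced scaffolds simply get a "chr" prefix, which will not
# necessarily match the names UCSC uses for them.
to_ucsc_seqnames <- function(gtf_lines) {
  is_record <- !grepl("^#", gtf_lines)
  # The sequence name is everything before the first tab.
  seqname <- sub("\t.*$", "", gtf_lines[is_record])
  rest <- sub("^[^\t]*", "", gtf_lines[is_record])
  # Assumption: the mitochondrion is called "MT" in the input.
  seqname[seqname == "MT"] <- "M"
  # Only prefix names that do not already start with "chr".
  needs_prefix <- !startsWith(seqname, "chr")
  seqname[needs_prefix] <- paste0("chr", seqname[needs_prefix])
  gtf_lines[is_record] <- paste0(seqname, rest)
  gtf_lines
}

# Hypothetical toy input: one header line and three records.
toy_gtf <- c(
  "#!genome-build GRCh37",
  "7\tprotein_coding\texon\t100\t200\t.\t-\t.\tgene_id \"G1\";",
  "MT\tprotein_coding\texon\t10\t20\t.\t+\t.\tgene_id \"G2\";",
  "chr1\tprotein_coding\texon\t5\t50\t.\t+\t.\tgene_id \"G3\";"
)
renamed <- to_ucsc_seqnames(toy_gtf)
```

Header lines and names that already carry the "chr" prefix are left untouched, so running the function twice gives the same result as running it once.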