Question: some problems of easyRNASeq: about the gtf files
6.7 years ago, Guest User wrote:
I want to use easyRNASeq to get exon counts, but I found something strange.

I have two human annotation files from different sources: one (Homo_sapiens.GRCh37.70.gtf.gz) is from the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-70/gtf/homo_sapiens); the other (genes.gtf, Ensembl) is from Illumina iGenomes (http://tophat.cbcb.umd.edu/igenomes.html). The two annotation files are almost the same, with only small differences such as the order of the exons and of the attributes.

I ran easyRNASeq with each of the two GTF files to compare the results, and I got different results for the SLC25A13 exons.

-- output of sessionInfo():

First, I got my BAM file from TopHat. When I used Homo_sapiens.GRCh37.70.gtf as my annotation file in easyRNASeq, I got this result:

"\"ENSG00000004864\"_1" 2
"\"ENSG00000004864\"_2" 4
"\"ENSG00000004864\"_3" 16
"\"ENSG00000004864\"_4" 3
"\"ENSG00000004864\"_5" 7
"\"ENSG00000004864\"_6" 8
"\"ENSG00000004864\"_7" 5
"\"ENSG00000004864\"_8" 4
"\"ENSG00000004864\"_9" 4
"\"ENSG00000004864\"_10" 1
"\"ENSG00000004864\"_11" 6
"\"ENSG00000004864\"_12" 4
"\"ENSG00000004864\"_13" 4
"\"ENSG00000004864\"_14" 6
"\"ENSG00000004864\"_15" 8
"\"ENSG00000004864\"_16" 5
"\"ENSG00000004864\"_17" 3
"\"ENSG00000004864\"_18" 25

But when I used the GTF file from Illumina iGenomes, I got a wrong result (as we can see by viewing the BAM file in IGV):

"\"ENSG00000004864\"_18" 25
"\"ENSG00000004864\"_17" 13
"\"ENSG00000004864\"_2" 11
"\"ENSG00000004864\"_16" 3
"\"ENSG00000004864\"_1" 8
"\"ENSG00000004864\"_15" 5
"\"ENSG00000004864\"_14" 8
"\"ENSG00000004864\"_6" 6
"\"ENSG00000004864\"_13" 6
"\"ENSG00000004864\"_5" 0
"\"ENSG00000004864\"_3" 4
"\"ENSG00000004864\"_4" 4
"\"ENSG00000004864\"_12" 4
"\"ENSG00000004864\"_11" 4
"\"ENSG00000004864\"_10" 6
"\"ENSG00000004864\"_9" 1
"\"ENSG00000004864\"_8" 4
"\"ENSG00000004864\"_7" 4

I am so confused about the different results.
Here is my main program using easyRNASeq:

count_gene_gtf_ensembl.table <- easyRNASeq(filesDirectory=getwd(),
    filenames="accepted_hits.sorted.bam",
    organism="Hsapiens",
    chr.sizes="auto",
    annotationMethod="gtf",
    annotationFile="/x400ifs-accel/ntteam/hufuyan/humanindex/Ensembl/ussd-ftp.illumina.com/Homo_sapiens/Ensembl/GRCh37/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2012-03-09-04-49-46/Genes/genes.gtf",
    format="bam",
    gapped=TRUE,
    count="exon")

When I changed the order of the exons of gene SLC25A13 in genes.gtf (Illumina) to match Homo_sapiens.GRCh37.70.gtf and ran easyRNASeq again, I got the right exon counts.

Another problem is that I got this warning: "You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it." I also got this warning when I used the GTF files from UCSC. How can I fix it?

-- Sent via the guest posting facility at bioconductor.org.
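One way to check the claim that the two annotation files differ only in exon order, and not in the exons themselves, is to compare the exon coordinates of a single gene as an unordered set. The following is a minimal base-R sketch, not part of easyRNASeq. The helper names read_exons, gene_id and exon_keys are hypothetical, and the sketch assumes standard 9-column, tab-separated GTF files whose attribute column contains a gene_id "..." entry.

```r
# Read a GTF file (9 tab-separated columns, no header) and keep the exon rows.
# Assumption: lines starting with "#" are comments and can be skipped.
read_exons <- function(path) {
  gtf <- read.delim(path, header = FALSE, comment.char = "#", quote = "",
                    col.names = c("seqname", "source", "feature", "start",
                                  "end", "score", "strand", "frame",
                                  "attribute"),
                    colClasses = c("character", "character", "character",
                                   "integer", "integer", "character",
                                   "character", "character", "character"))
  gtf[gtf$feature == "exon", ]
}

# Extract the gene_id value from the GTF attribute column.
gene_id <- function(attribute) {
  sub('.*gene_id "([^"]+)".*', "\\1", attribute)
}

# Build one "chr:start-end:strand" key per distinct exon of a gene.
# The keys can be compared as sets, so the row order in the file is irrelevant.
exon_keys <- function(exons, gene) {
  sel <- exons[gene_id(exons$attribute) == gene, ]
  unique(paste0(sel$seqname, ":", sel$start, "-", sel$end, ":", sel$strand))
}

# Hypothetical usage; the file names follow the question, the locations are
# assumptions:
# ens <- read_exons("Homo_sapiens.GRCh37.70.gtf")
# igm <- read_exons("genes.gtf")
# setequal(exon_keys(ens, "ENSG00000004864"),
#          exon_keys(igm, "ENSG00000004864"))
```

If setequal() returns TRUE while the counts still differ, the difference probably comes from how the exons are ordered or numbered rather than from the annotation content itself.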
annotation easyrnaseq • 764 views
Answer: some problems of easyRNASeq: about the gtf files
6.7 years ago, delhomme@embl.de wrote:
Hej Fuyan!

On 19 Mar 2013, at 05:27, Hu Fuyan [guest] wrote:

> I want to use easyRNASeq to get exon counts. But I found a strange thing:
> [...]
> The two annotation files are almost the same only with a small differentiation, such as the order of exons and attribute.
> When I run easyRNASeq, I used the two gtf files to check the result.
> I have got different results for SLC25A13 exons

This sounds strange, as I don't remember expecting any ordering. Thanks for the example files and the report, I'll check that.

> -- output of sessionInfo():

Can you paste the output? It's not in the file you sent off list either.

> [...]
> Another problem is that I got the warning:" You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it." When I used the gtf files from UCSC, I also got this warning.
> How can I fix it?

You would need to change the chromosome names, i.e. prepend the "chr" prefix to follow the UCSC convention (e.g. 7 to chr7) and convert the mitochondrion name to chrM, in both your alignment and your annotation file (BAM and GTF). Anyway, this is just a warning to draw your attention to the essential point that both these files need to use the same chromosome naming. I'm handling this differently in the next release of easyRNASeq and do not enforce the UCSC conventions anymore. So in your current case, you can ignore that warning.

Cheers,
Nico

> --
> Sent via the guest posting facility at bioconductor.org.
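The renaming described above (prepend "chr", map the mitochondrion to chrM) could be sketched for the GTF side in base R as follows. This is only an illustration under stated assumptions: to_ucsc and rename_gtf are hypothetical helper names, not easyRNASeq functions, and the sketch assumes the mitochondrion is named MT, as in Ensembl files. It does not translate unplaced scaffold names, and it does not touch the BAM file, whose header would need the same renaming with a separate tool.

```r
# Convert Ensembl-style sequence names to UCSC style:
# "7" -> "chr7", "X" -> "chrX", "MT" -> "chrM".
# Names that already start with "chr" are left unchanged.
to_ucsc <- function(seqnames) {
  seqnames <- as.character(seqnames)
  needs_prefix <- !grepl("^chr", seqnames)
  seqnames[needs_prefix] <- paste0("chr", seqnames[needs_prefix])
  seqnames[seqnames == "chrMT"] <- "chrM"
  seqnames
}

# Rewrite the first column of every record line of a GTF file.
# Comment lines (starting with "#") and empty lines are copied as they are.
rename_gtf <- function(infile, outfile) {
  lines <- readLines(infile)
  is_record <- !grepl("^#", lines) & nzchar(lines)
  first <- sub("\t.*$", "", lines[is_record])    # text before the first tab
  rest <- sub("^[^\t]*", "", lines[is_record])   # first tab and everything after
  lines[is_record] <- paste0(to_ucsc(first), rest)
  writeLines(lines, outfile)
}

# Hypothetical usage; the output file name is an assumption:
# rename_gtf("genes.gtf", "genes.ucsc.gtf")
```

Whichever convention is used, the point made in the answer still holds: the names in the annotation and in the alignment file have to match each other.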