Start and End positions in the GTF/GFF3 files
@delasa-aghamirzaie-5973
United States
Hi Bioconductors,

I have a question about the start and end positions in GTF/GFF3 files, and I don't know whether this is the right place to ask. I would appreciate it if anyone could answer.

Does anybody know which version of the reference genome the start and end positions in GTF/GFF3 files are based on? I see several FASTA files: hard_masked.fa, soft-masked.fa, cds.fa, and cds_primaryTranscriptOnly.fa. I have a GTF file and I want to find the corresponding sequences in the FASTA references, but I don't know which file to use. I used the hard-masked one, which is the file we mapped the reads to, but the positions in the GTF file do not give me the correct sequence for each gene.

Sincerely yours,
Delasa Aghamirzaie
Genetics, Bioinformatics, and Computational Biology (GBCB) PhD Student
Virginia Tech
Blacksburg, Virginia
@herve-pages-1542
Seattle, WA, United States
Hi Delasa,

If you are lucky, the exact version of the reference genome might be specified in the header of the file (i.e. in the first lines of comments; in a GFF3 file those lines should start with ##). But I don't think the specs for GTF/GFF3 require this information to be present. That means there is, in general, no way to know just by looking at the content of the file. This is information one generally gathers from the provider of the file. For example, on the NCBI or UCSC FTP servers, things are organized in a way that makes it clear which GTF/GFF3 files go with which reference genomes. If this is not clear, then it's a problem with how the provider is distributing those files, and the best way to clarify is to contact them.

H.

On 07/03/2013 04:56 PM, Delasa Aghamirzaie wrote:
> Does anybody know the numbers regarding to start position and end positions
> in GTF/GFF3 files are based on which version of reference genome?
> [...]

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
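
As a rough illustration of the two checks discussed in this thread, here is a minimal Python sketch (Python is used only for illustration; it is not from the original posts). It collects the leading `##` header lines of a GFF3 file, where a `##genome-build` directive may or may not be present, and then extracts a feature from a genome FASTA using the 1-based, inclusive start/end convention of GTF/GFF3. The function names (`header_pragmas`, `read_fasta`, `feature_sequence`) and the toy data are made up for this example. It also assumes the FASTA is a whole-genome file: hard-masked and soft-masked genome files keep the same coordinates as the unmasked assembly, whereas files such as cds.fa hold per-transcript sequences, so genomic coordinates from a GTF would not index into them.

```python
import io

def header_pragmas(gff_lines):
    """Collect the leading '##' directive lines of a GFF3 file."""
    pragmas = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            pragmas.append(line)
        elif line and not line.startswith("#"):
            break  # first feature line reached: the header is over
    return pragmas

def read_fasta(handle):
    """Read FASTA into {name: sequence}; name is the first word of each header."""
    chunks, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}

# Complement table; lowercase entries cover soft-masked sequence.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def feature_sequence(genome, seqid, start, end, strand):
    """Extract a feature; GTF/GFF3 start and end are 1-based and inclusive."""
    seq = genome[seqid][start - 1:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
    return seq

# Toy inputs, invented for this example (not real annotation data).
gff = io.StringIO(
    "##gff-version 3\n"
    "##genome-build ExampleSource ExampleBuild1\n"
    "chr1\tdemo\tgene\t3\t8\t.\t+\t.\tID=geneA\n"
)
fasta = io.StringIO(">chr1 toy chromosome\nAACCGG\nTTAACC\n")

pragmas = header_pragmas(gff)
genome = read_fasta(fasta)
plus = feature_sequence(genome, "chr1", 3, 8, "+")
minus = feature_sequence(genome, "chr1", 3, 8, "-")
print(pragmas)
print(plus, minus)
```

If the extracted sequences look wrong even against the right assembly, two things worth checking are an off-by-one from treating the coordinates as 0-based, and a mismatch in sequence names between the annotation and the FASTA headers (for example "chr1" versus "1").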