How to extract non-overlapping genes from a bed file?
@birdguysaikat-11323

Hi,

I am entirely new to bioinformatics, so I sincerely apologize for asking a very naive question. I am interested in counting the number of genes per chromosome in the human genome. For this purpose I downloaded the latest release of the 'gff' file from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/latest_assembly_versions/GCF_000001405.34_GRCh38.p8/ and sorted it using IGVtools from the Integrative Genomics Viewer (IGV). IGV also allows exporting the features as a 'bed' file, although the gene start positions in the 'bed' file generated this way differ by a single base from those in the 'gff' file. While scanning through the 'bed' file I observed a lot of repetitions and overlaps. My questions are:
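The single-base difference is expected rather than an export error: GFF3 coordinates are 1-based and fully closed, while BED coordinates are 0-based and half-open, so a BED start equals the GFF start minus 1 and the end is unchanged. A minimal Python sketch of that conversion plus a per-chromosome gene count follows. It assumes a tab-separated GFF3 file in which genes have the type `gene` in column 3 (other types are skipped unless you pass a different `feature`). The helper names, the toy records and the `chr1`/`chr2` sequence names are hypothetical; the NCBI RefSeq GFF names sequences by accession (e.g. NC_000001.11), which would still need mapping to chromosome names.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# BED-style record: (chrom, start0, end, name, strand); start is 0-based, end exclusive.
BedRecord = Tuple[str, int, int, str, str]


def gff_genes_to_bed(lines: Iterable[str], feature: str = "gene") -> Iterator[BedRecord]:
    """Yield BED-style intervals for GFF3 rows whose type (column 3) equals `feature`.

    GFF3 is 1-based and fully closed; BED is 0-based and half-open,
    so the BED start is the GFF start minus 1 and the end stays the same.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        # GFF3 attributes are semicolon-separated key=value pairs; use ID as the name.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        yield cols[0], int(cols[3]) - 1, int(cols[4]), attrs.get("ID", "."), cols[6]


def genes_per_chromosome(records: Iterable[BedRecord]) -> Counter:
    """Count genes per sequence, counting exact duplicate records only once."""
    return Counter(rec[0] for rec in set(records))


if __name__ == "__main__":
    toy_gff = [
        "##gff-version 3",
        "chr1\tRefSeq\tgene\t11874\t14409\t.\t+\t.\tID=gene-A;Name=A",
        "chr1\tRefSeq\tmRNA\t11874\t14409\t.\t+\t.\tID=rna-A1;Parent=gene-A",
        "chr1\tRefSeq\tgene\t14362\t29370\t.\t-\t.\tID=gene-B;Name=B",
        "chr2\tRefSeq\tgene\t100\t200\t.\t+\t.\tID=gene-C;Name=C",
    ]
    beds = list(gff_genes_to_bed(toy_gff))
    print(beds[0])  # ('chr1', 11873, 14409, 'gene-A', '+')
    print(genes_per_chromosome(beds + beds))  # Counter({'chr1': 2, 'chr2': 1})
```

Deduplicating exact repeats is only one simple way to handle the repetitions; genes that overlap each other are still all counted.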

1) Is there any available tool for extracting non-overlapping genes from the bed file?

2) Is there any way to automate the selection of only one of the several overlapping genes?

3) Is the 'bed' file converted using the 'Export Features' tool in IGV reliable enough for further processing? What are the preferred alternatives? 

I am sure such a trivial topic has already been discussed scores of times in your forum. I would appreciate it if you could direct me to some of those discussions. I sincerely thank you and apologize once again.


Your questions are not easy to interpret.

1) Is there any available tool for extracting non-overlapping genes from the bed file?

What do you mean by 'non-overlapping genes'? There are any number of genes that overlap; they may be on different strands, or even on the same strand. Do you really want to remove genes for this arbitrary reason? Or are you confusing transcripts and genes? Do you instead want a single gene that represents all possible transcripts?
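To make the two readings concrete, here is a minimal Python sketch. It assumes genes (or transcripts) are held as BED-style tuples `(chrom, start, end, name, strand)` with 0-based, half-open coordinates and `start < end`, and, for the second reading, that each transcript's `name` field already holds its parent gene. The function names and toy data are hypothetical. Within Bioconductor, the GenomicRanges package provides `findOverlaps()` and `countOverlaps()` for this kind of query.

```python
from typing import List, Tuple

# BED-style interval: (chrom, start0, end, name, strand), 0-based half-open, start < end.
Interval = Tuple[str, int, int, str, str]


def isolated_genes(genes: List[Interval], same_strand_only: bool = False) -> List[Interval]:
    """Reading 1: keep genes that share no base with any other gene.

    With same_strand_only=True, genes on opposite strands are not treated as overlapping.
    """
    def key(g: Interval):
        return (g[0], g[4] if same_strand_only else "", g[1], g[2])

    ordered = sorted(genes, key=key)
    keep: List[Interval] = []
    max_end = {}  # (chrom, strand or "") -> largest end seen so far in that group
    for i, g in enumerate(ordered):
        group = key(g)[:2]
        # Some earlier gene overlaps g if the largest earlier end lies past g's start.
        hits_prev = max_end.get(group, -1) > g[1]
        # Later genes start at or after g; the next one has the smallest start.
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        hits_next = nxt is not None and key(nxt)[:2] == group and nxt[1] < g[2]
        if not (hits_prev or hits_next):
            keep.append(g)
        max_end[group] = max(max_end.get(group, -1), g[2])
    return keep


def gene_spans(transcripts: List[Interval]) -> List[Interval]:
    """Reading 2: collapse transcripts (name = parent gene) into one span per gene."""
    spans = {}
    for chrom, start, end, gene, strand in transcripts:
        k = (chrom, gene, strand)
        s, e = spans.get(k, (start, end))
        spans[k] = (min(s, start), max(e, end))
    return sorted((c, s, e, g, st) for (c, g, st), (s, e) in spans.items())


if __name__ == "__main__":
    genes = [("chr1", 0, 100, "A", "+"), ("chr1", 50, 150, "B", "-"), ("chr1", 200, 300, "C", "+")]
    print(isolated_genes(genes))  # [('chr1', 200, 300, 'C', '+')]
    print(len(isolated_genes(genes, same_strand_only=True)))  # 3
```

The two functions answer different questions: the first discards genes, the second discards only the transcript-level redundancy.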

2) Is there any way to automate the selection of only one of the several overlapping genes?

Probably, but it depends on what you are after.
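If the goal really is "one gene per overlapping locus", one possible rule is to cluster genes whose intervals overlap (directly or through a chain of overlaps) and keep the longest gene in each cluster. The Python sketch below does that; keeping the longest gene is an arbitrary choice assumed here purely for illustration, as are the function names, and strand is ignored. On the command line, `bedtools cluster` performs a comparable grouping on a sorted BED file.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int, str, str]  # (chrom, start0, end, name, strand), half-open


def overlap_clusters(genes: List[Interval]) -> List[List[Interval]]:
    """Group genes on the same chromosome whose intervals overlap, directly or via a chain."""
    clusters: List[List[Interval]] = []
    current: List[Interval] = []
    cur_chrom, cur_end = None, -1
    for g in sorted(genes, key=lambda g: (g[0], g[1], g[2])):
        if current and g[0] == cur_chrom and g[1] < cur_end:
            current.append(g)  # overlaps the running cluster
            cur_end = max(cur_end, g[2])
        else:
            if current:
                clusters.append(current)  # close the previous cluster
            current, cur_chrom, cur_end = [g], g[0], g[2]
    if current:
        clusters.append(current)
    return clusters


def longest_per_cluster(genes: List[Interval]) -> List[Interval]:
    """Keep one gene per cluster: the longest (ties go to the leftmost gene)."""
    return [max(c, key=lambda g: g[2] - g[1]) for c in overlap_clusters(genes)]


if __name__ == "__main__":
    genes = [("chr1", 0, 100, "A", "+"), ("chr1", 50, 300, "B", "-"),
             ("chr1", 250, 280, "C", "+"), ("chr1", 400, 500, "D", "+")]
    print([g[3] for g in longest_per_cluster(genes)])  # ['B', 'D']
```

Other rules (e.g. preferring a particular gene biotype) would only change the `key` passed to `max`.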

3) Is the 'bed' file converted using the 'Export Features' tool in IGV reliable enough for further processing? What are the preferred alternatives? 

This is a Bioconductor support site, so questions about IGV aren't really on topic. But it's unlikely that you actually need to download a GFF file from NCBI to do what you want, as Bioconductor has plenty of resources for the genomic locations of human genes.

 

 
