Question

ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.

KMS ▴ 20

@de2afa3c

Last seen 3.3 years ago

Croatia

Dear All, I am using a GTF file together with a .bam file as input to featureCounts, but I am getting the error shown in the title. I have read many posts on this forum about this problem but could not find a working solution. If anyone knows how to solve this error, please reply. Thanks

FeatureCounts • 2.8k views

ADD COMMENT • link 3.5 years ago KMS ▴ 20

I am sure you will never get an answer to your question because you fail to provide any details! Please read the posting guide.

I (and James) have answered a previous question of yours, and therefore I assume you are again talking about the issue you raised in your previous thread: NCBI Locus ID to gene ID conversion

As said over there, if there is no gene identifier attribute in the 9th column of "the" provided GTF, or there is no id-to-gene mapping available for your organism of interest, there is no Bioconductor package available that will allow you to do this!

The reason for this is that Bioconductor (annotation) packages do not infer gene annotations etc. themselves, but rather parse and make accessible the information that is provided by 3rd parties (such as NCBI, Ensembl or TAIR, or Affymetrix, Agilent, ...).
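Whether a given annotation file actually carries a gene identifier in its 9th column can be checked before running featureCounts. The following is a minimal Python sketch, not part of Rsubread or Bioconductor: it counts which attribute keys occur for each feature type (column 3). The function names and the two demo records are made up for illustration, and the GTF/GFF3 detection is a simple heuristic. By default featureCounts looks for a `gene_id` attribute on `exon` rows (`GTF.attrType` and `GTF.featureType` in Rsubread), so a file whose rows lack that key would be expected to trigger the error above.

```python
import re
from collections import Counter, defaultdict

# GTF-style pair, e.g.  gene_id "abc";
_GTF_PAIR = re.compile(r'([^\s;"]+)\s+"([^"]*)"')
# GFF3-style column starts with  key=value
_GFF3_START = re.compile(r'[^\s;="]+=')


def parse_attributes(column9):
    """Return the key/value pairs of a GTF or GFF3 ninth column as a dict."""
    text = column9.strip()
    attrs = {}
    if _GFF3_START.match(text):
        # GFF3: key=value pairs separated by semicolons
        for field in text.split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs.setdefault(key.strip(), value.strip())
    else:
        # GTF: key "value"; pairs
        for key, value in _GTF_PAIR.findall(text):
            attrs.setdefault(key, value)
    return attrs


def summarize_attribute_keys(lines):
    """Count, per feature type, how many rows carry each attribute key."""
    summary = defaultdict(Counter)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete annotation row
        feature_type = fields[2]
        summary[feature_type]["<rows>"] += 1
        for key in parse_attributes(fields[8]):
            summary[feature_type][key] += 1
    return summary


# Made-up demo records; replace with open("annotation.gtf") for a real file.
demo = [
    'chr1\tsrc\tgene\t1\t90\t.\t+\t.\tgene_id "g1"; locus_tag "TAG_0001";\n',
    'chr1\tsrc\tCDS\t1\t90\t.\t+\t0\tlocus_tag "TAG_0001"; product "x";\n',
]
for feature_type, counts in summarize_attribute_keys(demo).items():
    print(feature_type, dict(counts))
```

If the summary shows, for example, that only `CDS` rows exist and that they carry `locus_tag` but no `gene_id`, the feature type and attribute type passed to featureCounts have to be set accordingly, or a different annotation file is needed.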

ADD REPLY • link 3.5 years ago Guido Hooiveld ★ 4.1k

Dear Guido, I am sorry if I did not manage to make my question clear to you, although in the previous question I gave a full link to the assembly, the species, and what exactly I wanted. I have also seen multiple existing questions on similar topics, with many replies and comments from others, that ended without a solution. That is why I am asking another, clearer question. Please have a look at my problem below; if you have any suggestions I will be very thankful.

I am working on RNA-seq data from Streptomyces coelicolor strains. I used bowtie2 for alignment and want to get read counts with Subread featureCounts, which I can then use in DESeq2 for differential expression. I took the FASTA of the reference genome available in the assembly section (11 assemblies, as mentioned) to build the genome index that I aligned against with bowtie2. But when I use that assembly's GTF file to get read counts with featureCounts, it throws the error that this question is about.
I also used another reference genome assembly of S. coelicolor, whose GFF contains locus tags (e.g. FQ762_RS31685). With that file I got results, and running DESeq2 on these read counts produced DEGs, FPKM, etc., but the ID of every gene is a locus tag. Now I want to convert these locus tags to gene IDs for gene set enrichment analysis, but I could not find how to do that correctly for each file, since each contains an arbitrary set of locus tags.
Another question I would like to add here: StringTie produced an out.gtf file and a gene abundance file. Because I used the -B argument in the command, it also produced a few files for Ballgown; one of them is e_data, which contains read counts as rcount. Please tell me whether I can use these rcount values directly in DESeq2, or whether I have to use them only in the ballgown package (which I currently do not know how to use).
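On the locus-tag question above: one way to see what a conversion could be based on is to collect, for every `locus_tag` in the GFF, the other identifiers that the same rows provide. The following is a minimal Python sketch under stated assumptions: the attribute names `gene`, `Name`, `old_locus_tag` and `Dbxref` are guesses at what an NCBI-style GFF3 may contain, the function names are made up, and the demo records are invented. If the file does not provide any of these attributes, no mapping can be derived from it.

```python
from urllib.parse import unquote


def gff3_attributes(column9):
    """Parse a GFF3 ninth column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            # GFF3 percent-encodes reserved characters in values
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def locus_tag_table(lines, wanted=("gene", "Name", "old_locus_tag", "Dbxref")):
    """Map each locus_tag to the other identifiers found on its rows.

    `wanted` lists attribute names assumed to be useful; adjust it to
    whatever keys the annotation file really contains.
    """
    table = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete annotation row
        attrs = gff3_attributes(fields[8])
        tag = attrs.get("locus_tag")
        if tag is None:
            continue
        entry = table.setdefault(tag, {})
        for key in wanted:
            if key in attrs:
                entry.setdefault(key, attrs[key])  # keep first value seen
    return table


# Invented demo records; replace with open("annotation.gff") for a real file.
demo = [
    "chr1\tsrc\tgene\t1\t90\t.\t+\t.\tID=gene-TAG_0001;Name=abcA;locus_tag=TAG_0001\n",
    "chr1\tsrc\tCDS\t1\t90\t.\t+\t0\tID=cds-1;Parent=gene-TAG_0001;Dbxref=GeneID:0000;locus_tag=TAG_0001\n",
    "chr1\tsrc\tgene\t100\t200\t.\t-\t.\tID=gene-TAG_0002;locus_tag=TAG_0002\n",
]
for tag, ids in locus_tag_table(demo).items():
    print(tag, ids)
```

Locus tags that come back with an empty entry, like the second demo gene, have no alternative identifier in the file itself, so a mapping for them would have to come from the annotation provider.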

My preference is to use the featureCounts function of the Rsubread package, but the GTF is causing a problem here.

I hope this is very clear and I am not making it "mysterious" at this moment.

Thanks for your critical comment and suggestion

ADD REPLY • link 3.5 years ago KMS ▴ 20