Dear Bioconductor list:
I have aligned some paired end reads to the human genome assembly
CRCh38.p7 with R subread. I am now trying to count the reads with feature
counts. It is taking over an hour on my mac, and as I recall it should
run faster. This is teh first time that I have used featurecounts with a
GFF file as distinct from the built in annotation, and I am wondering if
that is the problem.
Here is my R script:
My bam file is 25 Gb in size.
Here are the first few lines of my GFF file (obtained from the NCBI web site,
after the annotation:
J01415.2 Genbank region 1 16569 . + . ID=id549;Dbxref=taxon:9606;Is_circular=true;Name=MT;country=United Kingdom: Great Britain;gbkey=Src;genome=mitochondrion;isolation-source=caucasian;mol_type=genomic DNA;note=this is the rCRS;tissue-type=placenta
J01415.2 Genbank D_loop 16024 17145 . - . ID=id550;gbkey=D-loop
J01415.2 Genbank gene 577 647 . + . ID=gene0;Dbxref=GeneID:4558;Name=TRNF;gbkey=Gene;gene=TRNF;gene_biotype=tRNA;gene_synonym=MTTF
Here is are excerpts from my output file:
package ‘Rsubread’ was built under R version 3.3.3
|| Load annotation file GCA_000001405.22_GRCh38.p7_genomic.gff ... ||
Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
The attributes included in your GTF annotation are 'ID=gene0;Dbxref=GeneID:4558;Name=TRNF;gbkey=Gene;gene=TRNF;gene_biotype=tRNA;gene_synonym=MTTF'
|| Features : 37 ||
|| Meta-features : 1 ||
|| Chromosomes/contigs : 1 ||
|| Process BAM file b1.res.bam... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| WARNING: reads from the same pair were found not adjacent to each ||
|| other in the input (due to read sorting by location or ||
|| reporting of multi-mapping read pairs). ||
|| Read re-ordering is performed. ||
1. Is it just talking long or is it unable to read the gff file?
2. I originally put in a GTF.attrType="gene_id" option but I got pretty much teh same warning message, and also ran for a long time. Since I figured that I did not have to sum over exons Icould omit it.
3. I used the GFF filecorrsponding to the genome patch I used. If the GFF file is
not in the wirte format is is ok to use teh bulit in hg38 assembly?
the ncbi site says that the patches do not chnage the numbering,
Thanks and best wishes,
Richard A. Friedman, PhD
Columbia University Medical Center