I am trying to generate Gencode annotation R objects using the PrepareAnnotationGENCODE() function from proBAMr (proBAMr_1.18.0).
I downloaded the gencode.v32.annotation.gtf, gencode.v32.pc_transcripts.fa, and gencode.v32.pc_translations.fa files and used the following code:
annotation_path_Gencode=paste0(path, "GencodeAnnotationPath/")
gtfFile=paste0(annotation_path_Gencode, "gencode.v32.annotation.gtf")
CDSfasta=paste0(annotation_path_Gencode, "gencode.v32.pc_transcripts.fa")
pepfasta=paste0(annotation_path_Gencode, "gencode.v32.pc_translations.fa")
PrepareAnnotationGENCODE(gtfFile, CDSfasta, pepfasta, annotation_path=annotation_path_Gencode, dbsnp = NULL, splice_matrix = FALSE, COSMIC = FALSE)
First issue: I get the following warning messages:
Warning messages:
1: In .Internal(strsplit(x, as.character(split), fixed, perl, useBytes)) :
closing unused connection 4 (C:/Users/LF260934/Documents/29healthyTissues/MSMSprotP013163/GencodeAnnotationPath/gencode.v32.annotation.gtf)
2: In .Internal(strsplit(x, as.character(split), fixed, perl, useBytes)) :
closing unused connection 3 (C:/Users/LF260934/Documents/29healthyTissues/MSMSprotP013163/gencode.v32.annotation.gtf)
3: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
Second issue: I loaded the procodingseq and proteinseq .RData objects generated in the previous step and noticed that procodingseq reports ENST identifiers in both the txname and proname fields, while proteinseq reports ENSP identifiers in both the txname and proname fields.
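The kind of check described above can be sketched as follows. This is only an illustration on toy data frames standing in for the loaded objects: the column names pro_name and tx_name and the identifiers are assumptions made for the example, so they may need adjusting to whatever the real objects contain.

```r
# Toy stand-ins for the loaded objects. The column names (pro_name, tx_name)
# and the identifiers are hypothetical, chosen only for illustration.
proteinseq <- data.frame(
  pro_name = c("ENSP00000000001.1", "ENSP00000000002.1"),
  tx_name  = c("ENSP00000000001.1", "ENSP00000000002.1"),
  stringsAsFactors = FALSE
)
procodingseq <- data.frame(
  pro_name = c("ENST00000000001.1", "ENST00000000002.1"),
  tx_name  = c("ENST00000000001.1", "ENST00000000002.1"),
  stringsAsFactors = FALSE
)

# Summarise which four-letter identifier prefixes (ENSP, ENST, ...) occur
# in each ID column of a data frame.
id_prefixes <- function(df, cols = c("pro_name", "tx_name")) {
  sapply(df[cols], function(x) {
    paste(sort(unique(substr(x, 1, 4))), collapse = ",")
  })
}

prot_prefixes <- id_prefixes(proteinseq)
cds_prefixes  <- id_prefixes(procodingseq)
print(prot_prefixes)
print(cds_prefixes)
```

With an annotation where the two fields differ, one would expect the pro_name column to show ENSP and the tx_name column to show ENST in both objects.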
Third issue: I tried to convert a PSM table to a SAM file using:
SAM <- PSMtab2SAM(passed, XScolumn='expect', exon, proteinseq, procodingseq)
and I got a SAM file in which all peptides are unmapped.
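One way to narrow this down, sketched here on toy data, is to test whether the protein identifiers in the PSM table match those in the annotation at all. The column names protein and pro_name, the identifiers, and the idea that the PSM IDs might carry extra "|"-separated fields are all assumptions for illustration, not something confirmed for this dataset.

```r
# Toy PSM table and annotation. Column names and IDs are hypothetical.
passed <- data.frame(
  peptide = c("PEPTIDEK", "SAMPLER", "ANOTHERK"),
  protein = c("ENSP00000000001.1|ENST00000000001.1|ENSG00000000001.1",
              "ENSP00000000002.1",
              "sp|P00000|TEST_HUMAN"),
  stringsAsFactors = FALSE
)
proteinseq <- data.frame(
  pro_name = c("ENSP00000000001.1", "ENSP00000000002.1"),
  stringsAsFactors = FALSE
)

# Exact matches between PSM protein IDs and annotation protein IDs.
exact_match <- passed$protein %in% proteinseq$pro_name

# Matches after keeping only the first "|"-separated field of each PSM ID.
first_field <- sub("\\|.*$", "", passed$protein)
field_match <- first_field %in% proteinseq$pro_name

# Cross-tabulate the two comparisons.
print(table(exact = exact_match, first_field = field_match))
```

If few or no identifiers match exactly, the ID format in the search results and in the annotation objects probably differs, which would be one possible explanation for unmapped peptides.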
I also looked at the RData objects provided with the proBAMr package for the GENCODE annotation:
load(system.file("extdata/GENCODE", "exon_anno.RData", package="proBAMr"))
load(system.file("extdata/GENCODE", "proseq.RData", package="proBAMr"))
load(system.file("extdata/GENCODE", "procodingseq.RData", package="proBAMr"))
However, the corresponding proteinseq and procodingseq objects report ENST identifiers in both the txname and proname fields, and I cannot generate the SAM file at all with them:
SAM <- PSMtab2SAM(passed, XScolumn='expect', exon, proteinseq, procodingseq)
paste("XG:Z:", pep_g, sep=""): object 'pep_g' not found
Can you please help me with that?