Issues using proBAMr PrepareAnnotationGENCODE: generated objects have transcript IDs in the protein name field and vice versa
@laurafancello-22293

I am trying to generate GENCODE annotation R objects using the PrepareAnnotationGENCODE() function from proBAMr (proBAMr_1.18.0). I downloaded the files gencode.v32.annotation.gtf, gencode.v32.pc_transcripts.fa and gencode.v32.pc_translations.fa and used the following code:

annotation_path_Gencode=paste0(path, "GencodeAnnotationPath/")
gtfFile=paste0(annotation_path_Gencode, "gencode.v32.annotation.gtf")
CDSfasta=paste0(annotation_path_Gencode, "gencode.v32.pc_transcripts.fa")
pepfasta=paste0(annotation_path_Gencode, "gencode.v32.pc_translations.fa")
PrepareAnnotationGENCODE(gtfFile, CDSfasta, pepfasta, annotation_path=annotation_path_Gencode, dbsnp = NULL, splice_matrix = FALSE, COSMIC = FALSE)
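Since the identifiers in the generated objects presumably come from the FASTA headers, it may help to look at how the two GENCODE files label their records. GENCODE headers are pipe-delimited: in pc_transcripts.fa the first field is the transcript ID (ENST), while in pc_translations.fa the first field is the protein ID (ENSP) and the transcript ID comes second. The sketch below only illustrates that layout; `parse_gencode_header` is a hypothetical helper, the headers are made-up placeholders, and I do not know whether proBAMr actually reads the first field in both cases.

```r
# Split a pipe-delimited GENCODE FASTA header into its fields.
# Hypothetical helper for illustration; not part of proBAMr.
parse_gencode_header <- function(header) {
  fields <- strsplit(sub("^>", "", header), "|", fixed = TRUE)[[1]]
  list(
    first_field   = fields[1],
    # First field starting with ENST / ENSP, or NA if the header has none
    transcript_id = grep("^ENST", fields, value = TRUE)[1],
    protein_id    = grep("^ENSP", fields, value = TRUE)[1]
  )
}

# Placeholder headers following the GENCODE layout (IDs are invented)
tx_header  <- ">ENST00000000001.1|ENSG00000000001.1|-|-|GENE1-201|GENE1|900|UTR5:1-100|CDS:101-700|UTR3:701-900|"
pep_header <- ">ENSP00000000001.1|ENST00000000001.1|ENSG00000000001.1|-|-|GENE1-201|GENE1|199"

tx_info  <- parse_gencode_header(tx_header)
pep_info <- parse_gencode_header(pep_header)

# The first field is a transcript ID in one file and a protein ID in the other
print(tx_info$first_field)
print(pep_info$first_field)
```

If a parser took only the first field of each header, the transcript FASTA would yield ENST everywhere and the protein FASTA would yield ENSP everywhere.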

First issue: I get the following warning messages:

Warning messages:
1: In .Internal(strsplit(x, as.character(split), fixed, perl, useBytes)) :
  closing unused connection 4 (C:/Users/LF260934/Documents/29healthyTissues/MSMSprotP013163/GencodeAnnotationPath/gencode.v32.annotation.gtf)
2: In .Internal(strsplit(x, as.character(split), fixed, perl, useBytes)) :
  closing unused connection 3 (C:/Users/LF260934/Documents/29healthyTissues/MSMSprotP013163/gencode.v32.annotation.gtf)
3: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.

Second issue: I loaded the procodingseq and proteinseq .RData objects generated in the previous step and noticed that procodingseq reports ENST identifiers in both the txname and proname fields, while proteinseq reports ENSP identifiers in both the txname and proname fields.
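A quick way to reproduce this observation is to tabulate the identifier prefixes in each column. The sketch below uses toy data frames in place of the real objects; the column names `tx_name` and `pro_name`, the helper `id_prefix_table`, and the placeholder IDs are assumptions for illustration and may need adjusting to the actual objects.

```r
# Toy stand-ins mimicking what I see in the generated objects.
# Column names and IDs are assumed placeholders, not real annotation.
procodingseq_toy <- data.frame(
  pro_name = c("ENST00000000001.1", "ENST00000000002.1"),
  tx_name  = c("ENST00000000001.1", "ENST00000000002.1"),
  stringsAsFactors = FALSE
)
proteinseq_toy <- data.frame(
  pro_name = c("ENSP00000000001.1", "ENSP00000000002.1"),
  tx_name  = c("ENSP00000000001.1", "ENSP00000000002.1"),
  stringsAsFactors = FALSE
)

# Report the distinct four-letter ID prefixes found in each column.
# Hypothetical helper; not part of proBAMr.
id_prefix_table <- function(df, cols = c("tx_name", "pro_name")) {
  sapply(cols, function(col) {
    prefixes <- unique(substr(as.character(df[[col]]), 1, 4))
    paste(sort(prefixes), collapse = ",")
  })
}

print(id_prefix_table(procodingseq_toy))
print(id_prefix_table(proteinseq_toy))
```

In a consistent annotation I would expect `tx_name` to show only ENST and `pro_name` to show only ENSP in both objects.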

Third issue: I tried to convert a PSM table to a SAM file using:

SAM <- PSMtab2SAM(passed, XScolumn='expect', exon, proteinseq, procodingseq)

and I got a SAM file where all peptides are unmapped.
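One possible cause I have not ruled out is that the protein identifiers in the PSM table do not match those in the annotation, for example because of the version suffix in GENCODE IDs. The sketch below shows the kind of check I have in mind, on toy vectors; `strip_version` and `id_match_rate` are hypothetical helpers, the IDs are invented, and which columns of `passed` and `proteinseq` hold the protein IDs is an assumption to verify on the real objects.

```r
# Remove a trailing ".<version>" from Ensembl-style IDs.
# Hypothetical helper; not part of proBAMr.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Fraction of PSM protein IDs that are present among the annotation IDs.
id_match_rate <- function(psm_ids, anno_ids, ignore_version = FALSE) {
  if (ignore_version) {
    psm_ids  <- strip_version(psm_ids)
    anno_ids <- strip_version(anno_ids)
  }
  mean(psm_ids %in% anno_ids)
}

# Invented placeholder IDs: PSM table without versions, annotation with versions
psm_ids  <- c("ENSP00000000001", "ENSP00000000002", "ENSP00000000003")
anno_ids <- c("ENSP00000000001.1", "ENSP00000000002.3")

exact_rate   <- id_match_rate(psm_ids, anno_ids)
relaxed_rate <- id_match_rate(psm_ids, anno_ids, ignore_version = TRUE)

print(exact_rate)
print(relaxed_rate)
```

A match rate near zero on the real data would point to an identifier mismatch rather than a problem with the peptide sequences themselves.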

I also looked at the RData objects provided with the proBAMr package for GENCODE annotation:

load(system.file("extdata/GENCODE", "exon_anno.RData", package="proBAMr"))
load(system.file("extdata/GENCODE", "proseq.RData", package="proBAMr"))
load(system.file("extdata/GENCODE", "procodingseq.RData", package="proBAMr"))

but the bundled proteinseq and procodingseq objects report ENST identifiers in both the txname and proname fields, and I cannot generate the SAM file with them at all:

SAM <- PSMtab2SAM(passed, XScolumn='expect', exon, proteinseq, procodingseq)

paste("XG:Z:", pep_g, sep=""): object 'pep_g' not found

Can you please help me with that?
