Question: Issues using proBAMr PrepareAnnotationGENCODE: generated objects have transcript IDs in the protein name field and vice versa
14 days ago by laura.fancello:

I am trying to generate GENCODE annotation R objects using the PrepareAnnotationGENCODE() function from proBAMr (proBAMr_1.18.0). I downloaded the gencode.v32.annotation.gtf, gencode.v32.pc_transcripts.fa, and gencode.v32.pc_translations.fa files and ran the following code:

annotation_path_Gencode=paste0(path, "GencodeAnnotationPath/")
gtfFile=paste0(annotation_path_Gencode, "gencode.v32.annotation.gtf")
CDSfasta=paste0(annotation_path_Gencode, "gencode.v32.pc_transcripts.fa")
pepfasta=paste0(annotation_path_Gencode, "gencode.v32.pc_translations.fa")
PrepareAnnotationGENCODE(gtfFile, CDSfasta, pepfasta, annotation_path=annotation_path_Gencode, dbsnp = NULL, splice_matrix = FALSE, COSMIC = FALSE)

First issue: I get the following warning messages:

Warning messages:
1: In .Internal(strsplit(x, as.character(split), fixed, perl, useBytes)) :
  closing unused connection 4 (C:/Users/LF260934/Documents/29healthyTissues/MSMSprotP013163/GencodeAnnotationPath/gencode.v32.annotation.gtf)
2: In .Internal(strsplit(x, as.character(split), fixed, perl, useBytes)) :
  closing unused connection 3 (C:/Users/LF260934/Documents/29healthyTissues/MSMSprotP013163/gencode.v32.annotation.gtf)
3: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.

Second issue: I loaded the procodingseq and proteinseq objects from the .RData files generated in the previous step and noticed that procodingseq contains ENST identifiers in both the txname and proname fields, while proteinseq contains ENSP identifiers in both the txname and proname fields.

Third issue: I tried to convert a PSM table to a SAM file using:

SAM <- PSMtab2SAM(passed, XScolumn='expect', exon, proteinseq, procodingseq)

and I got a SAM file in which all peptides are unmapped.

I also looked at the RData objects provided with the proBAMr package for the GENCODE annotation:

load(system.file("extdata/GENCODE", "exon_anno.RData", package="proBAMr"))
load(system.file("extdata/GENCODE", "proseq.RData", package="proBAMr"))
load(system.file("extdata/GENCODE", "procodingseq.RData", package="proBAMr"))

However, the corresponding proteinseq and procodingseq objects contain ENST identifiers in both the txname and proname fields, and with them I cannot generate the SAM file at all:

SAM <- PSMtab2SAM(passed, XScolumn='expect', exon, proteinseq, procodingseq)

paste("XG:Z:", pep_g, sep=""): object 'pep_g' not found

Could you please help me with these issues?
