Entering edit mode
laura.fancello
•
0
@laurafancello-22293
Last seen 5.0 years ago
Hi, I am using proBAMr package and I found that the SAM file generated by the PSMtab2SAM() function contains some peptides which could not be mapped on the genome ( the field reporting bitwise FLAG has value 4, which means "unmapped").
I manually verified and the peptide is contained in the corresponding protein sequence in the proteinseq
object containing annotation from Ensembl.
Do you have any idea of reasons why this could happen?
Any hint/idea is warmly welcome!
Here is my code:
library(proBAMr)
options(stringsAsFactors=FALSE)
load(paste0(annotation_path_Ensembl, "exon_anno.RData"))
load(paste0(annotation_path_Ensembl, "proteinseq.RData"))
load(paste0(annotation_path_Ensembl, "procodingseq.RData"))
# Read Mascot pepXML file
pepXMLTab <- pepXMLTab::pepXML2tab(pepxml=paste0(path, "GalaxyConverted_mascotXML_to_pepXML_F134317.pepXML"))
# Filter PSMs
passed <- PSMfilter(pepXMLTab, pepFDR=0.01, scorecolumn='expect', hitrank=1, minpeplen=6, decoyprefix='###REV###')
#Add "spectrumNativeID" field, Indeed, the PSMtab2SAM() function processed the spectrumNativeID field to generate proBAM QNAME
field. The "spectrumNativeID" field is not present in Mascot result file: probably function was not completely adapted to Mascot?
passed$spectrumNativeID <- paste0("start_scan=", passed$start_scan, " end_scan=", passed$end_scan, " scan=", passed$start_scan)
#Remove hits to contaminants (field with protein ID contains "2::", indicating hit to second database, the contaminants database) and decoys (field with protein ID contains "###REV###")
passed_NoContam <- passed[-(grep("2::", passed$protein)), ]
passed_NoContamNoDecoy <- passed_NoContam[-(grep("###REV###", passed_NoContam$protein)), ]
#Convert to SAM format
SAM <- PSMtab2SAM(passed_NoContamNoDecoy, XScolumn='expect', exon, proteinseq, procodingseq)