Hi, I am using proBAMr package and I found that the SAM file generated by the PSMtab2SAM() function contains some peptides which could not be mapped on the genome ( the field reporting bitwise FLAG has value 4, which means "unmapped").
I manually verified and the peptide is contained in the corresponding protein sequence in the
proteinseq object containing annotation from Ensembl.
Do you have any idea of reasons why this could happen?
Any hint/idea is warmly welcome!
Here is my code:
library(proBAMr) options(stringsAsFactors=FALSE) load(paste0(annotation_path_Ensembl, "exon_anno.RData")) load(paste0(annotation_path_Ensembl, "proteinseq.RData")) load(paste0(annotation_path_Ensembl, "procodingseq.RData")) # Read Mascot pepXML file pepXMLTab <- pepXMLTab::pepXML2tab(pepxml=paste0(path, "GalaxyConverted_mascotXML_to_pepXML_F134317.pepXML")) # Filter PSMs passed <- PSMfilter(pepXMLTab, pepFDR=0.01, scorecolumn='expect', hitrank=1, minpeplen=6, decoyprefix='###REV###') #Add "spectrumNativeID" field, Indeed, the PSMtab2SAM() function processed the spectrumNativeID field to generate proBAM QNAME field. The "spectrumNativeID" field is not present in Mascot result file: probably function was not completely adapted to Mascot? passed$spectrumNativeID <- paste0("start_scan=", passed$start_scan, " end_scan=", passed$end_scan, " scan=", passed$start_scan) #Remove hits to contaminants (field with protein ID contains "2::", indicating hit to second database, the contaminants database) and decoys (field with protein ID contains "###REV###") passed_NoContam <- passed[-(grep("2::", passed$protein)), ] passed_NoContamNoDecoy <- passed_NoContam[-(grep("###REV###", passed_NoContam$protein)), ] #Convert to SAM format SAM <- PSMtab2SAM(passed_NoContamNoDecoy, XScolumn='expect', exon, proteinseq, procodingseq)