I would like to build my Proteins database using mzid files filtered by MSnID. Can somebody please help me? Is it possible to add identifications from a data.frame?
Thanks
Viswanathan Raghuram
My suggestion would be to do the following (not tested though): using MSnID, calculate the filters to obtain your desired FDR, record the values and then apply them directly on the Proteins object. Something like
library("Pbase")
data(p) ## example data
delta <- pcols(p)[, "experimentalMassToCharge"] -
    pcols(p)[, "calculatedMassToCharge"]
lengths(pranges(p))
## assuming MSnID suggested a delta of 0.35
sel <- abs(delta) < 0.35
pranges(p) <- pranges(p)[sel]
lengths(pranges(p))
Note that you will need to install the latest version (0.11.3) from GitHub:
biocLite("ComputationalProteomicsUnit/Pbase")
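To make the mass-error filter above concrete, here is a small base R check that does not need Pbase at all. The two m/z pairs are the A2A6A1 values that appear in the pcols(p) listing further down this thread; the 0.35 cutoff is only the placeholder used in the suggestion, not a value MSnID actually reported.

```r
## Base R only: experimental and calculated m/z for two A2A6A1 peptides
## (values copied from the pcols(p) output shown later in this thread)
experimental <- c(801.3566, 972.6678)
calculated   <- c(801.0378, 972.1609)

## Mass error per peptide-spectrum match
delta <- experimental - calculated

## Placeholder cutoff; replace with the value obtained from MSnID
cutoff <- 0.35
sel <- abs(delta) < cutoff

print(round(delta, 4))  # 0.3188 0.5069
print(sel)              # TRUE FALSE
```

The result agrees with the first two entries of sel for A2A6A1 in the later transcript (TRUE, then FALSE).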
Unable to install the latest version
----------------------------------------------------------
> library("devtools")
> install_github("ComputationalProteomicsUnit/Pbase")
Downloading GitHub repo ComputationalProteomicsUnit/Pbase@master
from URL https://api.github.com/repos/ComputationalProteomicsUnit/Pbase/zipball/master
Installing Pbase
Skipping 1 package ahead of CRAN: ggplot2
"C:/PROGRA~1/R/R-devel/bin/x64/R" --no-site-file --no-environ --no-save --no-restore CMD \
INSTALL \
"C:/Users/Viswanathan/AppData/Local/Temp/RtmpSSoTcF/devtools39ea43b975fa1/ComputationalProteomicsUnit-Pbase-9e90914" \
--library="C:/Users/Viswanathan/Documents/R/win-library/3.3" --install-tests
* installing *source* package 'Pbase' ...
** R
Error in parse(outFile) :
C:/Users/Viswanathan/AppData/Local/Temp/RtmpSSoTcF/devtools39ea43b975fa1/ComputationalProteomicsUnit-Pbase-9e90914/R/Proteins:201:9: unexpected symbol
200: if (!identical(names(object@pranges, names(values)))
201: stop
^
ERROR: unable to collate and parse R files for package 'Pbase'
* removing 'C:/Users/Viswanathan/Documents/R/win-library/3.3/Pbase'
* restoring previous 'C:/Users/Viswanathan/Documents/R/win-library/3.3/Pbase'
Error: Command failed (1)
>
> session_info()
Session info ------------------------------------------------------------------------------------
setting value
version R Under development (unstable) (2016-02-25 r70222)
system x86_64, mingw32
ui RStudio (0.99.484)
language (EN)
collate English_United States.1252
tz America/New_York
date 2016-03-01
Packages ----------------------------------------------------------------------------------------
package * version date source
Biobase 2.31.3 2015-12-22 Bioconductor
BiocGenerics 0.17.3 2016-01-29 Bioconductor
BiocInstaller 1.21.3 2016-01-21 Bioconductor
codetools 0.2-14 2015-07-15 CRAN (R 3.3.0)
curl 0.9.6 2016-02-17 CRAN (R 3.3.0)
devtools * 1.10.0 2016-01-23 CRAN (R 3.3.0)
digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
git2r 0.13.1 2015-12-10 CRAN (R 3.3.0)
httr 1.1.0 2016-01-28 CRAN (R 3.3.0)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
mzR 2.5.3 2016-02-25 Bioconductor
ProtGenerics 1.3.3 2015-10-20 Bioconductor
R6 2.1.2 2016-01-26 CRAN (R 3.3.0)
Rcpp 0.12.3 2016-01-10 CRAN (R 3.3.0)
withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
Unable to subset pranges(p)
> data(p)
> length(pranges(p))
[1] 9
> lengths(pranges(p))
A4UGR9 A6H8Y1 O43707 O75369 P00558 P02545 P04075 P04075-2 P60709
36 23 6 13 5 12 21 20 1
> delta <- pcols(p)[, "experimentalMassToCharge"] -
+ pcols(p)[, "calculatedMassToCharge"]
> sel <- abs(delta) < 0.35
> sel
LogicalList of length 9
[["A4UGR9"]] TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE ... TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
[["A6H8Y1"]] TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE ... TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
[["O43707"]] TRUE FALSE TRUE TRUE TRUE TRUE
[["O75369"]] TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[["P00558"]] FALSE FALSE TRUE FALSE FALSE
[["P02545"]] FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
[["P04075"]] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ... TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[["P04075-2"]] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ... TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[["P60709"]] TRUE
> pranges(p) <- pranges(p)[sel]
Error in names(object@pranges, names(values)) :
2 arguments passed to 'names' which requires 1
> lengths(pranges(p))
A4UGR9 A6H8Y1 O43707 O75369 P00558 P02545 P04075 P04075-2 P60709
36 23 6 13 5 12 21 20 1
> session_info()
Session info ---------------------------------------------------------------------------------------------------
setting value
version R Under development (unstable) (2016-02-25 r70222)
system x86_64, mingw32
ui RStudio (0.99.484)
language (EN)
collate English_United States.1252
tz America/New_York
date 2016-03-05
Packages -------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.3-3.3 2014-11-24 CRAN (R 3.3.0)
affy 1.49.0 2015-10-14 Bioconductor
affyio 1.41.0 2015-10-14 Bioconductor
AnnotationDbi 1.33.7 2016-01-29 Bioconductor
AnnotationHub 2.3.14 2016-03-04 Bioconductor
Biobase 2.31.3 2015-12-22 Bioconductor
BiocGenerics * 0.17.3 2016-01-29 Bioconductor
BiocInstaller 1.21.3 2016-01-21 Bioconductor
BiocParallel 1.5.19 2016-03-03 Bioconductor
biomaRt 2.27.2 2015-11-24 Bioconductor
Biostrings 2.39.12 2016-02-21 Bioconductor
biovizBase 1.19.4 2016-02-22 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.3.0)
BSgenome 1.39.4 2016-02-21 Bioconductor
chron 2.3-47 2015-06-24 CRAN (R 3.3.0)
cleaver 1.9.0 2015-10-14 Bioconductor
cluster 2.0.3 2015-07-21 CRAN (R 3.3.0)
codetools 0.2-14 2015-07-15 CRAN (R 3.3.0)
colorspace 1.2-6 2015-03-11 CRAN (R 3.3.0)
curl 0.9.6 2016-02-17 CRAN (R 3.3.0)
data.table 1.9.6 2015-09-19 CRAN (R 3.3.0)
DBI 0.3.1 2014-09-24 CRAN (R 3.3.0)
devtools * 1.10.0 2016-01-23 CRAN (R 3.3.0)
dichromat 2.0-0 2013-01-24 CRAN (R 3.3.0)
digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
doParallel 1.0.10 2015-10-14 CRAN (R 3.3.0)
ensembldb 1.3.17 2016-02-22 Bioconductor
foreach 1.4.3 2015-10-13 CRAN (R 3.3.0)
foreign 0.8-66 2015-08-19 CRAN (R 3.3.0)
Formula 1.2-1 2015-04-07 CRAN (R 3.3.0)
GenomeInfoDb * 1.7.6 2016-01-29 Bioconductor
GenomicAlignments 1.7.20 2016-02-25 Bioconductor
GenomicFeatures 1.23.25 2016-02-25 Bioconductor
GenomicRanges * 1.23.23 2016-02-25 Bioconductor
ggplot2 2.1.0 2016-03-01 CRAN (R 3.3.0)
git2r 0.13.1 2015-12-10 CRAN (R 3.3.0)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.3.0)
Gviz * 1.15.4 2016-02-18 Bioconductor
Hmisc 3.17-2 2016-02-21 CRAN (R 3.3.0)
htmltools 0.3 2015-12-29 CRAN (R 3.3.0)
httpuv 1.3.3 2015-08-04 CRAN (R 3.3.0)
httr 1.1.0 2016-01-28 CRAN (R 3.3.0)
impute 1.45.0 2015-10-14 Bioconductor
interactiveDisplayBase 1.9.0 2015-10-14 Bioconductor
IRanges * 2.5.39 2016-02-28 Bioconductor
iterators 1.0.8 2015-10-13 CRAN (R 3.3.0)
lattice 0.20-33 2015-07-14 CRAN (R 3.3.0)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.3.0)
limma 3.27.13 2016-03-02 Bioconductor
magrittr 1.5 2014-11-22 CRAN (R 3.3.0)
MALDIquant 1.14 2015-11-18 CRAN (R 3.3.0)
matrixStats 0.50.1 2015-12-15 CRAN (R 3.3.0)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
mime 0.4 2015-09-03 CRAN (R 3.3.0)
MSnbase 1.19.11 2016-02-04 Bioconductor
munsell 0.4.3 2016-02-13 CRAN (R 3.3.0)
mzID 1.9.0 2015-10-14 Bioconductor
mzR 2.5.3 2016-02-25 Bioconductor
nnet 7.3-12 2016-02-02 CRAN (R 3.3.0)
Pbase * 0.11.3 2016-03-05 Github (ComputationalProteomicsUnit/Pbase@b4c4955)
pcaMethods 1.63.0 2016-01-17 Bioconductor
plyr 1.8.3 2015-06-12 CRAN (R 3.3.0)
preprocessCore 1.33.0 2015-10-14 Bioconductor
ProtGenerics 1.3.3 2015-10-20 Bioconductor
Pviz 1.5.0 2015-10-14 Bioconductor
R6 2.1.2 2016-01-26 CRAN (R 3.3.0)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.3.0)
Rcpp * 0.12.3 2016-01-10 CRAN (R 3.3.0)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
reshape2 1.4.1 2014-12-06 CRAN (R 3.3.0)
rpart 4.1-10 2015-06-29 CRAN (R 3.3.0)
Rsamtools 1.23.3 2016-01-25 Bioconductor
RSQLite 1.0.0 2014-10-25 CRAN (R 3.3.0)
rtracklayer 1.31.7 2016-02-16 Bioconductor
S4Vectors * 0.9.40 2016-02-25 Bioconductor
scales 0.4.0 2016-02-26 CRAN (R 3.3.0)
shiny 0.13.1 2016-02-17 CRAN (R 3.3.0)
stringi 1.0-1 2015-10-22 CRAN (R 3.3.0)
stringr 1.0.0 2015-04-30 CRAN (R 3.3.0)
SummarizedExperiment 1.1.21 2016-02-25 Bioconductor
survival 2.38-3 2015-07-02 CRAN (R 3.3.0)
VariantAnnotation 1.17.18 2016-01-29 Bioconductor
vsn 3.39.2 2016-01-06 Bioconductor
withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
XML 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
xtable 1.8-2 2016-02-05 CRAN (R 3.3.0)
XVector 0.11.7 2016-02-13 Bioconductor
zlibbioc 1.17.0 2015-10-14 Bioconductor
Here we go again. There will be more changes in the coming days, so for now it is better to install a specific commit.
install_github("ComputationalProteomicsUnit/Pbase", ref = "6eb74a307f8cc6705df3b758bb74ec691689f1f0")
Once all tests have been added, I will commit to Bioc.
EDIT: please use the official version 0.11.3 from Bioconductor instead.
It works for the example data set, but not for my dataset. I get the following error:
Error in replacePranges(object, value) :
Names of replacement pranges differ from current ones.
> library("Pbase")
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,
parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, cbind, colnames, do.call, duplicated,
eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit
Loading required package: Rcpp
Loading required package: Gviz
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: grid
This is Pbase version 0.11.3
> setwd("D:/Proteins_DB")
> p <- readRDS("p.test.Ptn")
> p
S4 class type : Proteins
Class version : 0.1
Created : Tue Feb 9 14:26:29 2016
Number of Proteins: 10
Sequences:
[1] Q9Z2K1 [2] Q3SYP5 ... [9] D3Z765 [10] Q61781
Sequence features:
[1] DB [2] AccessionNumber ... [11] Filename [12] npeps
Peptide features:
[1] DB [2] AccessionNumber ... [27] acquisitionNum [28] filenames
> aa(p)
A AAStringSet instance of length 10
width seq names
[1] 469 MATCSRQFTSSSSMKGSCGIGGGSSRMSS...SSSRQPRSILKEQGSTSFSQSQSQSSRD Q9Z2K1
[2] 469 MATCSRQFTSSSSMKGSCGIGGGSSRMSS...SSSRQPRSILKEQGSTSFSQSQSQSSRD Q3SYP5
[3] 2703 MYNGIGLPTPRGSGTNGYVQRNLSLVRGR...RGDSHSPGHKRKETPSPRSNRHRSSRSP Q8BTI8
[4] 432 MISAAQLLDELMGRDRNLAPDEKRSNVRW...PESKESDTKNEVNGTSEDIKSEGDTQSN Q5SUF2
[5] 707 MSCQISCRSRRGGGGGGGGGFRGFSSGSA...SGGAGSSSEKGGSGSGEGCGSGVTFSFR Q3TTY5
[6] 1505 MADRFSRFNEDRDFQGNHFDQYEEGHLEI...AATALHLHPLLHPIFSGQDLQHPPSHGT A2A6A1
[7] 371 MSAQAQMRALLDQLMGTARDGDETRQRVK...RGPTDWRLENSNGKTASRRSEEKEAGEI Q9CYI4
[8] 325 MSAQAQMRALLDQLMGTARDGDETRQRVK...RYRRHRSRSRSHSRGHRRASRDRSTKYK A0A0R4J047
[9] 172 MDHLESFIAECDRRTELAKKRLAETQEEI...TVAEKQEKRNQDRLRRREEREREERLGR D3Z765
[10] 484 MATCSRQFTSSSSMKGSCGIGGGSSRMSS...NRQIRTKVMDVHDGKVVSTHEQVLRTKN Q61781
> acols(p)
DataFrame with 10 rows and 12 columns
DB AccessionNumber EntryName IsoformName
<Rle> <character> <character> <Rle>
1 sp Q9Z2K1 K1C16_MOUSE NA
2 tr Q3SYP5 Q3SYP5_MOUSE NA
3 sp Q8BTI8 SRRM2_MOUSE NA
4 sp Q5SUF2 LC7L3_MOUSE NA
5 sp Q3TTY5 K22E_MOUSE NA
6 sp A2A6A1 GPTC8_MOUSE NA
7 sp Q9CYI4 LUC7L_MOUSE NA
8 tr A0A0R4J047 A0A0R4J047_MOUSE NA
9 tr D3Z765 D3Z765_MOUSE NA
10 sp Q61781 K1C14_MOUSE NA
ProteinName OrganismName GeneName
<character> <Rle> <Rle>
1 Keratin, type I cytoskeletal 16 Mus musculus Krt16
2 Keratin 16 Mus musculus Krt16
3 Serine/arginine repetitive matrix protein 2 Mus musculus Srrm2
4 Luc7-like protein 3 Mus musculus Luc7l3
5 Keratin, type II cytoskeletal 2 epidermal Mus musculus Krt2
6 G patch domain-containing protein 8 Mus musculus Gpatch8
7 Putative RNA-binding protein Luc7-like 1 Mus musculus Luc7l
8 Luc7 homolog (S. cerevisiae)-like, isoform CRA_c Mus musculus Luc7l
9 Putative RNA-binding protein Luc7-like 1 (Fragment) Mus musculus Luc7l
10 Keratin, type I cytoskeletal 14 Mus musculus Krt14
ProteinExistence SequenceVersion Comment Filename
<Rle> <Rle> <Rle> <Rle>
1 Evidence at protein level 3 NA ~/Pbase/uniprot_proteome_mouse.fasta
2 Evidence at protein level 1 NA ~/Pbase/uniprot_proteome_mouse.fasta
3 Evidence at protein level 3 NA ~/Pbase/uniprot_proteome_mouse.fasta
4 Evidence at protein level 1 NA ~/Pbase/uniprot_proteome_mouse.fasta
5 Evidence at protein level 1 NA ~/Pbase/uniprot_proteome_mouse.fasta
6 Evidence at protein level 1 NA ~/Pbase/uniprot_proteome_mouse.fasta
7 Evidence at protein level 2 NA ~/Pbase/uniprot_proteome_mouse.fasta
8 Predicted 1 NA ~/Pbase/uniprot_proteome_mouse.fasta
9 Evidence at protein level 1 NA ~/Pbase/uniprot_proteome_mouse.fasta
10 Evidence at protein level 2 NA ~/Pbase/uniprot_proteome_mouse.fasta
npeps
<integer>
1 10
2 10
3 2
4 7
5 7
6 7
7 2
8 2
9 2
10 10
> pcols(p)
SplitDataFrameList of length 10
$A0A0R4J047
DataFrame with 2 rows and 28 columns
DB AccessionNumber EntryName IsoformName
<Rle> <character> <character> <Rle>
1 tr A0A0R4J047 A0A0R4J047_MOUSE NA
2 tr A0A0R4J047 A0A0R4J047_MOUSE NA
ProteinName
<character>
1 tr|A0A0R4J047|A0A0R4J047_MOUSE Luc7 homolog (S. cerevisiae)-like, isoform CRA_c
2 tr|A0A0R4J047|A0A0R4J047_MOUSE Luc7 homolog (S. cerevisiae)-like, isoform CRA_c
OrganismName GeneName ProteinExistence SequenceVersion Comment
<Rle> <Rle> <Rle> <Rle> <Rle>
1 Mus musculus Luc7l Predicted 1 NA
2 Mus musculus Luc7l Predicted 1 NA
spectrumID chargeState rank passThreshold
<factor> <integer> <integer> <logical>
1 controllerType=0 controllerNumber=1 scan=4303 2 1 TRUE
2 controllerType=0 controllerNumber=1 scan=4213 2 1 TRUE
experimentalMassToCharge calculatedMassToCharge sequence modNum isDecoy
<numeric> <numeric> <factor> <integer> <logical>
1 791.8651 791.3703 AEQLGAEGNVDESQK 2 FALSE
2 791.8632 791.3703 AEQLGAEGNVDESQK 2 FALSE
post pre start end DatabaseAccess DBseqLength
<factor> <factor> <integer> <integer> <factor> <integer>
1 I K 140 154 tr|A0A0R4J047|A0A0R4J047_MOUSE 325
2 I K 140 154 tr|A0A0R4J047|A0A0R4J047_MOUSE 325
DatabaseSeq acquisitionNum filenames
<factor> <numeric> <Rle>
1 4303 120.mzid
2 4213 120.mzid
$A2A6A1
DataFrame with 7 rows and 28 columns
DB AccessionNumber EntryName IsoformName
<Rle> <character> <character> <Rle>
1 sp A2A6A1 GPTC8_MOUSE NA
2 sp A2A6A1 GPTC8_MOUSE NA
3 sp A2A6A1 GPTC8_MOUSE NA
4 sp A2A6A1 GPTC8_MOUSE NA
5 sp A2A6A1 GPTC8_MOUSE NA
6 sp A2A6A1 GPTC8_MOUSE NA
7 sp A2A6A1 GPTC8_MOUSE NA
ProteinName OrganismName GeneName
<character> <Rle> <Rle>
1 sp|A2A6A1|GPTC8_MOUSE G patch domain-containing protein 8 Mus musculus Gpatch8
2 sp|A2A6A1|GPTC8_MOUSE G patch domain-containing protein 8 Mus musculus Gpatch8
3 sp|A2A6A1|GPTC8_MOUSE G patch domain-containing protein 8 Mus musculus Gpatch8
4 sp|A2A6A1|GPTC8_MOUSE G patch domain-containing protein 8 Mus musculus Gpatch8
5 sp|A2A6A1|GPTC8_MOUSE G patch domain-containing protein 8 Mus musculus Gpatch8
6 sp|A2A6A1|GPTC8_MOUSE G patch domain-containing protein 8 Mus musculus Gpatch8
7 sp|A2A6A1|GPTC8_MOUSE G patch domain-containing protein 8 Mus musculus Gpatch8
ProteinExistence SequenceVersion Comment
<Rle> <Rle> <Rle>
1 Evidence at protein level 1 NA
2 Evidence at protein level 1 NA
3 Evidence at protein level 1 NA
4 Evidence at protein level 1 NA
5 Evidence at protein level 1 NA
6 Evidence at protein level 1 NA
7 Evidence at protein level 1 NA
spectrumID chargeState rank passThreshold
<factor> <integer> <integer> <logical>
1 controllerType=0 controllerNumber=1 scan=3694 3 2 TRUE
2 controllerType=0 controllerNumber=1 scan=364 2 1 TRUE
3 controllerType=0 controllerNumber=1 scan=364 2 2 TRUE
4 controllerType=0 controllerNumber=1 scan=364 2 3 TRUE
5 controllerType=0 controllerNumber=1 scan=364 2 4 TRUE
6 controllerType=0 controllerNumber=1 scan=364 2 5 TRUE
7 controllerType=0 controllerNumber=1 scan=364 2 6 TRUE
experimentalMassToCharge calculatedMassToCharge sequence modNum isDecoy
<numeric> <numeric> <factor> <integer> <logical>
1 801.3566 801.0378 RHSKRSHDSDDSDYTSSKHR 0 FALSE
2 972.6678 972.1609 KKKHKKSSKHKRKHK 5 FALSE
3 972.6678 972.1609 KKKHKKSSKHKRKHK 5 FALSE
4 972.6678 972.1609 KKKHKKSSKHKRKHK 5 FALSE
5 972.6678 972.1609 KKKHKKSSKHKRKHK 5 FALSE
6 972.6678 972.1609 KKKHKKSSKHKRKHK 5 FALSE
7 972.6678 972.1609 KKKHKKSSKHKRKHK 5 FALSE
post pre start end DatabaseAccess DBseqLength DatabaseSeq
<factor> <factor> <integer> <integer> <factor> <integer> <factor>
1 S R 907 926 sp|A2A6A1|GPTC8_MOUSE 1505
2 A K 673 687 sp|A2A6A1|GPTC8_MOUSE 1505
3 A K 673 687 sp|A2A6A1|GPTC8_MOUSE 1505
4 A K 673 687 sp|A2A6A1|GPTC8_MOUSE 1505
5 A K 673 687 sp|A2A6A1|GPTC8_MOUSE 1505
6 A K 673 687 sp|A2A6A1|GPTC8_MOUSE 1505
7 A K 673 687 sp|A2A6A1|GPTC8_MOUSE 1505
acquisitionNum filenames
<numeric> <Rle>
1 3694 120.mzid
2 364 120.mzid
3 364 120.mzid
4 364 120.mzid
5 364 120.mzid
6 364 120.mzid
7 364 120.mzid
$D3Z765
DataFrame with 2 rows and 28 columns
DB AccessionNumber EntryName IsoformName
<Rle> <character> <character> <Rle>
1 tr D3Z765 D3Z765_MOUSE NA
2 tr D3Z765 D3Z765_MOUSE NA
ProteinName OrganismName
<character> <Rle>
1 tr|D3Z765|D3Z765_MOUSE Putative RNA-binding protein Luc7-like 1 (Fragment) Mus musculus
2 tr|D3Z765|D3Z765_MOUSE Putative RNA-binding protein Luc7-like 1 (Fragment) Mus musculus
GeneName ProteinExistence SequenceVersion Comment
<Rle> <Rle> <Rle> <Rle>
1 Luc7l Evidence at protein level 1 NA
2 Luc7l Evidence at protein level 1 NA
spectrumID chargeState rank passThreshold
<factor> <integer> <integer> <logical>
1 controllerType=0 controllerNumber=1 scan=4303 2 1 TRUE
2 controllerType=0 controllerNumber=1 scan=4213 2 1 TRUE
experimentalMassToCharge calculatedMassToCharge sequence modNum isDecoy
<numeric> <numeric> <factor> <integer> <logical>
1 791.8651 791.3703 AEQLGAEGNVDESQK 2 FALSE
2 791.8632 791.3703 AEQLGAEGNVDESQK 2 FALSE
post pre start end DatabaseAccess DBseqLength DatabaseSeq
<factor> <factor> <integer> <integer> <factor> <integer> <factor>
1 I K 54 68 tr|D3Z765|D3Z765_MOUSE 172
2 I K 54 68 tr|D3Z765|D3Z765_MOUSE 172
acquisitionNum filenames
<numeric> <Rle>
1 4303 120.mzid
2 4213 120.mzid
...
<7 more elements>
> pfeatures(p)
AAStringSetList of length 10
[["Q9Z2K1"]] Q9Z2K1=ASLENSLEETK Q9Z2K1=ASLENSLEETK ... Q9Z2K1=LAADDFR Q9Z2K1=LAADDFR
[["Q3SYP5"]] Q3SYP5=ASLENSLEETK Q3SYP5=ASLENSLEETK ... Q3SYP5=LAADDFR Q3SYP5=LAADDFR
[["Q8BTI8"]] Q8BTI8=PGPQALPK Q8BTI8=SGSSPEMKDKPR
[["Q5SUF2"]] Q5SUF2=KSYKHRSKSRDREQDRK Q5SUF2=KSYKHRSKSRDREQDRK ... Q5SUF2=MISAAQLLD
[["Q3TTY5"]] Q3TTY5=DYQELMNTK Q3TTY5=DYQELMNTK ... Q3TTY5=FASFIDK Q3TTY5=FASFIDK
[["A2A6A1"]] A2A6A1=RHSKRSHDSDDSDYTSSKHR A2A6A1=KKKHKKSSKHKRKHK ... A2A6A1=KKKHKKSSKHKRKHK
[["Q9CYI4"]] Q9CYI4=AEQLGAEGNVDESQK Q9CYI4=AEQLGAEGNVDESQK
[["A0A0R4J047"]] A0A0R4J047=AEQLGAEGNVDESQK A0A0R4J047=AEQLGAEGNVDESQK
[["D3Z765"]] D3Z765=AEQLGAEGNVDESQK D3Z765=AEQLGAEGNVDESQK
[["Q61781"]] Q61781=EVATNSELVQSGK Q61781=VTMQNLNDR ... Q61781=LAADDFR Q61781=LAADDFR
> lengths(pranges(p))
Q9Z2K1 Q3SYP5 Q8BTI8 Q5SUF2 Q3TTY5 A2A6A1 Q9CYI4 A0A0R4J047
10 10 2 7 7 7 2 2
D3Z765 Q61781
2 10
> delta <- pcols(p)[, "experimentalMassToCharge"] -
+ pcols(p)[, "calculatedMassToCharge"]
> ## assuming MSnID suggested a delta of 0.35
> sel <- abs(delta) < 0.35
> sel
LogicalList of length 10
[["A0A0R4J047"]] FALSE FALSE
[["A2A6A1"]] TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[["D3Z765"]] FALSE FALSE
[["Q3SYP5"]] TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
[["Q3TTY5"]] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[["Q5SUF2"]] TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[["Q61781"]] TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
[["Q8BTI8"]] FALSE TRUE
[["Q9CYI4"]] FALSE FALSE
[["Q9Z2K1"]] TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
> pranges(p) <- pranges(p)[sel]
Error in replacePranges(object, value) :
Names of replacement pranges differ from current ones.
> targets <- pcols(p)[,"isDecoy"]==FALSE
> targets
LogicalList of length 10
[["A0A0R4J047"]] TRUE TRUE
[["A2A6A1"]] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[["D3Z765"]] TRUE TRUE
[["Q3SYP5"]] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[["Q3TTY5"]] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[["Q5SUF2"]] TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[["Q61781"]] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[["Q8BTI8"]] TRUE TRUE
[["Q9CYI4"]] TRUE TRUE
[["Q9Z2K1"]] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> lengths(pranges(p))
Q9Z2K1 Q3SYP5 Q8BTI8 Q5SUF2 Q3TTY5 A2A6A1 Q9CYI4 A0A0R4J047
10 10 2 7 7 7 2 2
D3Z765 Q61781
2 10
> pranges(p) <- pranges(p)[targets]
Error in replacePranges(object, value) :
Names of replacement pranges differ from current ones
> sessionInfo()
R Under development (unstable) (2016-02-25 r70222)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets
[9] methods base
other attached packages:
[1] Pbase_0.11.3 Gviz_1.15.4 GenomicRanges_1.23.24 GenomeInfoDb_1.7.6
[5] IRanges_2.5.39 S4Vectors_0.9.41 Rcpp_0.12.3 BiocGenerics_0.17.3
loaded via a namespace (and not attached):
[1] Biobase_2.31.3 httr_1.1.0 vsn_3.39.2
[4] AnnotationHub_2.3.14 splines_3.3.0 foreach_1.4.3
[7] Formula_1.2-1 shiny_0.13.1 interactiveDisplayBase_1.9.0
[10] affy_1.49.0 latticeExtra_0.6-28 BSgenome_1.39.4
[13] Rsamtools_1.23.3 impute_1.45.0 RSQLite_1.0.0
[16] lattice_0.20-33 biovizBase_1.19.4 limma_3.27.13
[19] chron_2.3-47 digest_0.6.9 RColorBrewer_1.1-2
[22] XVector_0.11.7 colorspace_1.2-6 preprocessCore_1.33.0
[25] htmltools_0.3 httpuv_1.3.3 plyr_1.8.3
[28] MALDIquant_1.14 XML_3.98-1.4 biomaRt_2.27.2
[31] zlibbioc_1.17.0 xtable_1.8-2 scales_0.4.0
[34] affyio_1.41.0 cleaver_1.9.0 BiocParallel_1.5.19
[37] ggplot2_2.1.0 SummarizedExperiment_1.1.21 GenomicFeatures_1.23.25
[40] nnet_7.3-12 survival_2.38-3 magrittr_1.5
[43] mime_0.4 doParallel_1.0.10 foreign_0.8-66
[46] mzR_2.5.3 Pviz_1.5.0 BiocInstaller_1.21.3
[49] tools_3.3.0 data.table_1.9.6 matrixStats_0.50.1
[52] stringr_1.0.0 MSnbase_1.19.11 munsell_0.4.3
[55] cluster_2.0.3 AnnotationDbi_1.33.7 ensembldb_1.3.17
[58] Biostrings_2.39.12 pcaMethods_1.63.0 mzID_1.9.0
[61] RCurl_1.95-4.8 dichromat_2.0-0 iterators_1.0.8
[64] VariantAnnotation_1.17.18 bitops_1.0-6 gtable_0.2.0
[67] codetools_0.2-14 DBI_0.3.1 reshape2_1.4.1
[70] R6_2.1.2 GenomicAlignments_1.7.20 gridExtra_2.2.1
[73] rtracklayer_1.31.7 Hmisc_3.17-2 ProtGenerics_1.3.3
[76] stringi_1.0-1 rpart_4.1-10 acepack_1.3-3.3
>
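One thing visible in the transcript above: lengths(pranges(p)) lists the proteins as Q9Z2K1, Q3SYP5, ..., whereas sel and targets (both derived from pcols(p)) list them alphabetically, starting with A0A0R4J047. That ordering mismatch may be what the "Names of replacement pranges differ" check objects to, but I have not verified this against the Pbase source. The sketch below uses plain base R lists as stand-ins for pranges(p) and sel (not the real Pbase classes, and with made-up values) to show how a selection can be reordered by name before subsetting.

```r
## Base R stand-ins only: 'ranges' mimics pranges(p), 'sel' mimics the
## LogicalList. All values are invented for illustration.
ranges <- list(Q9Z2K1 = c(10, 20, 30), A2A6A1 = c(5, 15))
sel    <- list(A2A6A1 = c(TRUE, FALSE), Q9Z2K1 = c(TRUE, TRUE, FALSE))

## The names agree as sets but not in order
stopifnot(setequal(names(ranges), names(sel)))
print(identical(names(ranges), names(sel)))  # FALSE

## Reorder the selection so it follows the order of the ranges
sel <- sel[names(ranges)]

## Subset element-wise; Map() keeps the names of its first list argument
filtered <- Map(function(r, s) r[s], ranges, sel)
print(names(filtered))  # "Q9Z2K1" "A2A6A1"
```

With the real object, the analogous step would presumably be sel <- sel[names(pranges(p))] before the replacement, but that is untested here.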
packageDescription("Pbase")$Maintainer
in R).
Could you clarify what you are trying to do? Once you have a subset of proteins that you deem relevant, what do you want to do with it, and where do you want to add the identifications?
I have multiple mzID files, which I can combine and filter with MSnID. I would like to create a 'Proteins' object and map the identified peptides back to the genome. Currently, I can create a 'Proteins' object and add the identified peptides from one or more 'mzID' files, via
However, these are unfiltered, and multiple mzid files create separate maps rather than a combined one.
Thanks
VR
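As a rough illustration of the "combine, then filter" step described above, here is a base R sketch that stacks per-file identification tables into one data.frame and applies a mass-error and decoy filter. Everything in it is an assumption for illustration: the tables id1 and id2, their values, and the file name 121.mzid are invented, and only the column names follow the pcols(p) output shown earlier. Whether Pbase can take such a data.frame directly is not something this thread settles.

```r
## Hypothetical per-file identification tables (invented values);
## column names follow the pcols(p) output shown earlier in the thread
id1 <- data.frame(sequence = c("LAADDFR", "FASFIDK"),
                  experimentalMassToCharge = c(404.20, 414.30),
                  calculatedMassToCharge   = c(404.21, 413.72),
                  isDecoy = c(FALSE, FALSE),
                  stringsAsFactors = FALSE)
id2 <- data.frame(sequence = c("VTMQNLNDR", "PGPQALPK"),
                  experimentalMassToCharge = c(545.77, 397.24),
                  calculatedMassToCharge   = c(545.77, 397.24),
                  isDecoy = c(FALSE, TRUE),
                  stringsAsFactors = FALSE)
ids <- list("120.mzid" = id1, "121.mzid" = id2)

## Record the source file in each table, then stack them into one
ids <- Map(function(d, f) { d$filename <- f; d }, ids, names(ids))
combined <- do.call(rbind, ids)
rownames(combined) <- NULL

## Keep target hits within the placeholder mass-error cutoff
delta <- combined$experimentalMassToCharge - combined$calculatedMassToCharge
keep <- abs(delta) < 0.35 & !combined$isDecoy
filtered <- combined[keep, ]
print(filtered[, c("sequence", "filename")])
```

Filtering the combined table once, rather than each file separately, keeps a single set of thresholds across all files, which seems to be what is wanted here.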