Adding annotation to MSnSet
lolli.langan
@lollilangan-15141

I've analysed shotgun proteomics data using MSGFplus, MSnbase and MSnID. I now have a final combined MSnSet. When running MSGFplus, I used a UniProt database, which brings in accession numbers etc. Separately, I've used UniProt to download Entrez IDs, gene symbols and gene names, and I would like to connect the two. Is there a logical way to do this, or do I need to change the data? Any help would be appreciated.

bioconductor proteomics msnset
@laurent-gatto-5645
Belgium

It would be useful to have some additional details on what you have done. I assume you have raw data that you read into R using readMSData. I also suspect you have identification data resulting from running MSGF+ (in this case through the Bioconductor package MSGFplus). Have you run addIdentificationData?

Additional matching from UniProt accessions to Entrez IDs, etc. (which you downloaded from the UniProt webpage) might need to be done manually, for example using dplyr::left_join and fData()<-.
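A minimal sketch of such a manual match, on made-up tables rather than a real MSnSet. It assumes (unverified here) that fData(si)$DatabaseAccess holds full UniProt FASTA identifiers such as sp|P02768|ALBU_HUMAN, while the UniProt download stores the plain accession in a column called Entry; the Entry and GeneName columns are illustrative. Check your own tables with head(fData(si)) and names(uniprot) first.

```r
library("dplyr")

## Made-up stand-in for fData(si); row names play the role of featureNames
ids <- c("sp|P02768|ALBU_HUMAN", "sp|P68871|HBB_HUMAN")
fdata <- data.frame(DatabaseAccess = ids, row.names = ids,
                    stringsAsFactors = FALSE)

## Made-up UniProt table; "Entry" is an assumed accession column name
uniprot <- data.frame(Entry = c("P02768", "P68871"),
                      GeneName = c("ALB", "HBB"),
                      stringsAsFactors = FALSE)

## Extract the accession between the first two "|" (sp|P02768|... -> P02768);
## identifiers that do not match the pattern are returned unchanged
fdata$Entry <- sub("^[a-z]{2}\\|([^|]+)\\|.*$", "\\1", fdata$DatabaseAccess)

## Join on the shared column, then restore the original row names
fd <- left_join(fdata, uniprot, by = "Entry")
rownames(fd) <- rownames(fdata)
fd
```

With a real MSnSet, the final step would be fData(si) <- fd.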

If you provide more details, and the code you ran, I might be able to help a bit more.


Sorry for the vagueness of my question. This is extremely new to me. I can confirm that I did download some annotations from the UniProt website, but they are not as informative as I would like.

I'm unsure how to add the code I ran, so I have just added it as text here. I've been trawling the net trying to figure this out, but I'm completely stuck. This is the code I have used. Do I need to provide you with a minimal dataset?

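```r
library("MSnbase")

## Raw data (mzML) and MS-GF+ identification results (mzIdentML)
q1 <- "Tissue-1.mzML"
i1 <- "Tissue-1.mzid"
msexp <- readMSData(q1, mode = "onDisk")
msexp <- addIdentificationData(msexp, i1)
## Label-free quantitation by spectral index (SI)
si <- quantify(msexp, method = "SI")

## Keep the 3 most intense features per protein accession
si <- topN(si, groupBy = fData(si)$DatabaseAccess, n = 3)
## Number of features retained per protein (at most 3)
npeps <- nQuants(si, groupBy = fData(si)$DatabaseAccess)
## Sum the retained features into one row per protein
si <- combineFeatures(si,
                      groupBy = fData(si)$DatabaseAccess,
                      redundancy.handler = "unique",
                      fun = "sum", cv = FALSE)
## Scale proteins with fewer than 3 features up to the 3-feature scale
exprs(si) <- exprs(si) * (3 / npeps)
```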
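The last steps of this code are a "top 3" summarisation: keep at most three features per protein, sum them, then multiply by 3/npeps so that sums from proteins with fewer than three features are put on the same three-feature scale. The base R sketch below reproduces that arithmetic on a made-up single-sample matrix; it only mimics topN(), nQuants() and combineFeatures(fun = "sum") and is not how MSnbase implements them.

```r
## Made-up values: 6 features from two hypothetical proteins, one sample
counts <- matrix(c(10, 8, 6, 2, 4, 3), ncol = 1,
                 dimnames = list(paste0("feat", 1:6), "Tissue1"))
protein <- c("A", "A", "A", "A", "B", "B")
n <- 3

## Keep the n most intense features per protein (like topN())
keep <- unlist(lapply(split(seq_along(protein), protein), function(i)
    i[order(counts[i, 1], decreasing = TRUE)][seq_len(min(n, length(i)))]))
top <- counts[keep, , drop = FALSE]
topProt <- protein[keep]

## Number of retained features per protein (like nQuants())
npeps <- tapply(topProt, topProt, length)

## Sum per protein (like combineFeatures(..., fun = "sum"))
summed <- rowsum(top, topProt)

## Rescale: A = (10 + 8 + 6) * 3/3 = 24; B = (4 + 3) * 3/2 = 10.5
scaled <- summed * (n / as.vector(npeps[rownames(summed)]))
scaled
```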

Any help would be very much appreciated. 


Thank you for the code snippet. This looks reasonable to me. If I follow, you would like to add additional metadata from UniProt. If so, you'll need to have a column in that data that matches the database accession numbers you used to combine the features. You can use dplyr::left_join to join the feature metadata fData and your additional data, which I assume is in a data.frame called uniprot below:

```r
library("dplyr")
fd <- left_join(fData(si), uniprot)
## update the feature data
fData(si) <- fd
```
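One thing worth checking before fData(si) <- fd: left_join() returns one row per match, so if the UniProt table lists an accession more than once (for instance with several Entrez IDs), fd has more rows than si has features and no longer lines up with exprs(si). A minimal sketch on made-up tables, collapsing duplicates into one semicolon-separated string before joining; the column names and IDs are hypothetical.

```r
library("dplyr")

## Made-up feature data, one row per protein
fdata <- data.frame(DatabaseAccess = c("P02768", "P68871"),
                    row.names = c("P02768", "P68871"),
                    stringsAsFactors = FALSE)
## Made-up UniProt table: P68871 appears twice (hypothetical IDs)
uniprot <- data.frame(DatabaseAccess = c("P02768", "P68871", "P68871"),
                      EntrezID = c("id1", "id2", "id3"),
                      stringsAsFactors = FALSE)

## Naive join: 3 rows for 2 features
nrow(left_join(fdata, uniprot, by = "DatabaseAccess"))

## Collapse duplicated accessions first, so the join cannot add rows
uniprot1 <- uniprot %>%
    group_by(DatabaseAccess) %>%
    summarise(EntrezID = paste(EntrezID, collapse = ";"), .groups = "drop")

fd <- left_join(fdata, uniprot1, by = "DatabaseAccess")
stopifnot(nrow(fd) == nrow(fdata))
rownames(fd) <- rownames(fdata)
fd
```

Collapsing keeps every ID; keeping only the first match per accession would be a simpler alternative if one ID is enough.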

You'll have to adapt the left_join(fData(si), uniprot) call to specify the column used to match the two tables. For example, if uniprot also has a DatabaseAccess column, you would use

```r
fd <- left_join(fData(si), uniprot, by = "DatabaseAccess")
```

or even simply

```r
fd <- left_join(fData(si), uniprot)
```

if that's the only column they share. See ?left_join for details.
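If the two tables hold the same accessions under different column names, by also accepts a named vector. A minimal sketch on made-up data, assuming (hypothetically) that the UniProt table calls its accession column Entry.

```r
library("dplyr")

## Made-up feature data keyed by DatabaseAccess
fdata <- data.frame(DatabaseAccess = c("P02768", "P68871"),
                    stringsAsFactors = FALSE)
## Made-up UniProt table; "Entry" is an assumed column name
uniprot <- data.frame(Entry = c("P68871", "P02768"),
                      GeneName = c("HBB", "ALB"),
                      stringsAsFactors = FALSE)

## Match DatabaseAccess (left table) against Entry (right table)
fd <- left_join(fdata, uniprot, by = c("DatabaseAccess" = "Entry"))
fd
```

The result follows the row order of the first table, and the key column keeps its name from that table (DatabaseAccess).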

Hope this helps.


Thank you so much! This worked beautifully and gave me all the information required for some downstream analysis. I've never come across anything that alludes to this. It's so simple and yet brilliant.
