Trouble using addMarkers from pRoloc on my own dataset
Entering edit mode
moldach ▴ 20
Last seen 2.3 years ago
Canada/Montreal/Douglas Mental Health I…

I'm having trouble adapting the pRoloc vignette to my own data set; specifically section 2.4 "pRoloc's organelle markers".

The vignette tell you to see ?addMarkers in R in order to add these to a new MSnSet class object.

First you need to specify the parameters for the available marker set obtained using pRolocmarkers function.

# Set parameters for Homo sapiens with Gene symbol identifier

>hsap <- pRolocmarkers("hsap")

# Next load my custom data a `MSnSet` S4 object using the readMSnSet2 function.

>df <- f <- ""
>e <- readMSnSet2(df, ecol = 2:20)
>sampleNames(e)<- 1:19
>e$group <- c(rep("Treatment", 17),rep("Control",2))
>e$rep <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","1","2")

# Try to add the markers to the MSnSet object

Error in addMarkers(e, hsap) :
  No markers found. Are you sure that the feature names match?
  Feature names: 1, 2, 3...
  Markers names: P08865, P0CW22, P15880...

This doesn't seem to be the issue as these markers are in my data:

> grep("P08865", hap1$UNIPROT)
[1] 6370
proloc proteomics msnset • 846 views
Entering edit mode
Last seen 1 day ago

You need to tell addMarkers how to match the proteins in the MSnSet and in the marker vector.

> ## named marker vector
> hsap <- pRolocmarkers("hsap")
> head(hsap)
        P08865         P0CW22         P15880         P22090         P23396
"40S Ribosome" "40S Ribosome" "40S Ribosome" "40S Ribosome" "40S Ribosome"
"40S Ribosome"
> head(names(hsap)) ## names, used for matching proteins
[1] "P08865" "P0CW22" "P15880" "P22090" "P23396" "P25398"

> ## your data
> f <- ""
> e <- readMSnSet2(f, ecol = 2:20, sep = "\t")
> head(featureNames(e)) ## typically used for matching markers
[1] "1" "2" "3" "4" "5" "6"

One would typically set the feature names to uniprot identifiers. But that won't be possible in your case because they aren't unique (and feature names must be unique).

> ## What you want are UNIPROT identifiers
> fvarLabels(e)
[1] "Gene_symbol" "UNIPROT"    
> ## BUT they aren't unique
> sum(duplicated(fData(e)$UNIPROT))
[1] 12

You can also specify a feature variable (by setting fcol) to be used to do the matching:

> e <- addMarkers(e, hsap, fcol = "UNIPROT")
Markers in data: 552 out of 12353
         40S Ribosome          60S Ribosome    actin cytoskeleton
                   30                    44                    35
              Cytosol Endoplasmic Reticulum       Golgi Apparatus
                   47                    63                    23
             Lysosome          Mitochondria               Nucleus
                    8                   109                   102
           PEROXISOME            plasma mem            Proteasome
                   17                    44                    30
Entering edit mode

I used clusterProfiler to give Uniprot identifiers to my HGNC gene symbols.

# Working back from the data I provided just to show the original genes I had - there were only 6538 unique genes

f <- read.csv("", sep="\t")
>genelist <- f$Gene_symbol
>genelist <- unique(genelist) # Get only the unique gene symbols
[1] 6538

# This is the code I used to get Uniprot identifiers with clusterProfiler:

>id <- as.character(genelist)
>id <- bitr(id, fromType = "SYMBOL", toType = "UNIPROT", OrgDb="")
> nrow(id)
[1] 12345

This assigned multiple Uniprot identifies to each gene symbol.

> head(id)
1  VPS18 A0A024R9R3
2  VPS18     Q9P253
3   CNN2     B4DDF4
4   CNN2     Q99439
5   CNN2     B4DUT8
6 RPS27A     B2RDW1

Looking at VPS18 for example, it's assigned to multiple Uniprot identifiers because this gene makes different isoforms/protein products: A0A024R9R3 and Q9P253.

So if every gene is repeated it makes sense that I can't use these as feature names but I'm wondering why you have set the feature names from 1 to 12345 rather than using Uniprot identifiers because these are all unique?

Entering edit mode

I'm also trying to make sense of the plot2D I get with my data.

>f <- ""
>e <- readMSnSet2(f, ecol = 2:20, sep = "\t")
>hsap <- pRolocmarkers("hsap")
>e <- addMarkers(e, hsap, fcol = "UNIPROT")
>plot2D(e, fcol = "markers")
>addLegend(e, fcol = "markers", cex = .7,
          where = "bottomright", ncol = 2)

Is this PCA telling me that the pRolocmarkers are not good organelle markers for my data, or that the sub-cellular niche isn't resolved in my data, or that my data is noisy, or maybe something else?

Entering edit mode

Re PCA plot, the funny share of the points is because you have integers. You should start by removing proteins that have only 0 rows (and possibly those that have few and low values), then try to rescale between 0 and 1 (use normalise(e, method = "sum")).

But even with this pre-processing, I think you won't get a plot like those from typical experimental data. It's difficult to say much more without knowing more about what these values represent.

Entering edit mode

You can use whatever you want as feature names. The default is to use indices, but you can also set then with readMSnSet2(..., fnames = "UNIPROT") - see ?readMSnSet2 for details about fnames.

You can also set the feature names later with

featureNames(e) <- fData(e)$UNIPROT

Login before adding your answer.

Traffic: 323 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6