Question: Trouble using addMarkers from pRoloc on my own dataset
0
gravatar for moldach
18 months ago by
moldach10
Canada/Montreal/Douglas Mental Health Institute
moldach10 wrote:

I'm having trouble adapting the pRoloc vignette to my own data set; specifically section 2.4 "pRoloc's organelle markers".

The vignette tell you to see ?addMarkers in R in order to add these to a new MSnSet class object.

First you need to specify the parameters for the available marker set obtained using pRolocmarkers function.

# Set parameters for Homo sapiens with Gene symbol identifier

>hsap <- pRolocmarkers("hsap")

# Next load my custom data a `MSnSet` S4 object using the readMSnSet2 function.

>df <- f <- "https://gist.githubusercontent.com/moldach/446852fcfa1adbb3be2ac754dc616421/raw/42f31e88f38afb7243d61dc46e2321e3ebfdae18/pRoloc-data"
>e <- readMSnSet2(df, ecol = 2:20)
>sampleNames(e)<- 1:19
>e$group <- c(rep("Treatment", 17),rep("Control",2))
>e$rep <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","1","2")

# Try to add the markers to the MSnSet object

>try(addMarkers(e,hsap))
Error in addMarkers(e, hsap) :
  No markers found. Are you sure that the feature names match?
  Feature names: 1, 2, 3...
  Markers names: P08865, P0CW22, P15880...

This doesn't seem to be the issue as these markers are in my data:

> grep("P08865", hap1$UNIPROT)
[1] 6370
proteomics msnset proloc • 398 views
ADD COMMENTlink modified 18 months ago • written 18 months ago by moldach10
Answer: Trouble using addMarkers from pRoloc on my own dataset
0
gravatar for Laurent Gatto
18 months ago by
Laurent Gatto1.2k
Belgium
Laurent Gatto1.2k wrote:

You need to tell addMarkers how to match the proteins in the MSnSet and in the marker vector.

> ## named marker vector
> hsap <- pRolocmarkers("hsap")
> head(hsap)
        P08865         P0CW22         P15880         P22090         P23396
"40S Ribosome" "40S Ribosome" "40S Ribosome" "40S Ribosome" "40S Ribosome"
        P25398
"40S Ribosome"
> head(names(hsap)) ## names, used for matching proteins
[1] "P08865" "P0CW22" "P15880" "P22090" "P23396" "P25398"

> ## your data
> f <- "https://gist.githubusercontent.com/moldach/446852fcfa1adbb3be2ac754dc616421/raw/42f31e88f38afb7243d61dc46e2321e3ebfdae18/pRoloc-data"
> e <- readMSnSet2(f, ecol = 2:20, sep = "\t")
> head(featureNames(e)) ## typically used for matching markers
[1] "1" "2" "3" "4" "5" "6"

One would typically set the feature names to uniprot identifiers. But that won't be possible in your case because they aren't unique (and feature names must be unique).

> ## What you want are UNIPROT identifiers
> fvarLabels(e)
[1] "Gene_symbol" "UNIPROT"    
> ## BUT they aren't unique
> sum(duplicated(fData(e)$UNIPROT))
[1] 12

You can also specify a feature variable (by setting fcol) to be used to do the matching:

> e <- addMarkers(e, hsap, fcol = "UNIPROT")
Markers in data: 552 out of 12353
organelleMarkers
         40S Ribosome          60S Ribosome    actin cytoskeleton
                   30                    44                    35
              Cytosol Endoplasmic Reticulum       Golgi Apparatus
                   47                    63                    23
             Lysosome          Mitochondria               Nucleus
                    8                   109                   102
           PEROXISOME            plasma mem            Proteasome
                   17                    44                    30
              unknown
                11801
ADD COMMENTlink written 18 months ago by Laurent Gatto1.2k

I used clusterProfiler to give Uniprot identifiers to my HGNC gene symbols.

# Working back from the data I provided just to show the original genes I had - there were only 6538 unique genes

f <- read.csv("https://gist.githubusercontent.com/moldach/446852fcfa1adbb3be2ac754dc616421/raw/42f31e88f38afb7243d61dc46e2321e3ebfdae18/pRoloc-data", sep="\t")
>genelist <- f$Gene_symbol
>genelist <- unique(genelist) # Get only the unique gene symbols
>nrow(genelist)
[1] 6538

# This is the code I used to get Uniprot identifiers with clusterProfiler:

>library(clusterProfiler)
>id <- as.character(genelist)
>id <- bitr(id, fromType = "SYMBOL", toType = "UNIPROT", OrgDb="org.Hs.eg.db")
> nrow(id)
[1] 12345

This assigned multiple Uniprot identifies to each gene symbol.

> head(id)
  SYMBOL    UNIPROT
1  VPS18 A0A024R9R3
2  VPS18     Q9P253
3   CNN2     B4DDF4
4   CNN2     Q99439
5   CNN2     B4DUT8
6 RPS27A     B2RDW1

Looking at VPS18 for example, it's assigned to multiple Uniprot identifiers because this gene makes different isoforms/protein products: A0A024R9R3 and Q9P253.

So if every gene is repeated it makes sense that I can't use these as feature names but I'm wondering why you have set the feature names from 1 to 12345 rather than using Uniprot identifiers because these are all unique?

ADD REPLYlink written 18 months ago by moldach10

I'm also trying to make sense of the plot2D I get with my data.

>f <- "https://gist.githubusercontent.com/moldach/446852fcfa1adbb3be2ac754dc616421/raw/42f31e88f38afb7243d61dc46e2321e3ebfdae18/pRoloc-data"
>e <- readMSnSet2(f, ecol = 2:20, sep = "\t")
>hsap <- pRolocmarkers("hsap")
>e <- addMarkers(e, hsap, fcol = "UNIPROT")
>plot2D(e, fcol = "markers")
>addLegend(e, fcol = "markers", cex = .7,
          where = "bottomright", ncol = 2)

 
http://tinypic.com/r/2ce3vqa/9

Is this PCA telling me that the pRolocmarkers are not good organelle markers for my data, or that the sub-cellular niche isn't resolved in my data, or that my data is noisy, or maybe something else?

ADD REPLYlink written 18 months ago by moldach10

Re PCA plot, the funny share of the points is because you have integers. You should start by removing proteins that have only 0 rows (and possibly those that have few and low values), then try to rescale between 0 and 1 (use normalise(e, method = "sum")).

But even with this pre-processing, I think you won't get a plot like those from typical experimental data. It's difficult to say much more without knowing more about what these values represent.

ADD REPLYlink written 18 months ago by Laurent Gatto1.2k

You can use whatever you want as feature names. The default is to use indices, but you can also set then with readMSnSet2(..., fnames = "UNIPROT") - see ?readMSnSet2 for details about fnames.

You can also set the feature names later with

featureNames(e) <- fData(e)$UNIPROT
ADD REPLYlink written 18 months ago by Laurent Gatto1.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 220 users visited in the last hour