Trying to do a PCA using edgeR with expression data
2
0
Entering edit mode
waalkes • 0
@waalkes-24135
Last seen 4.3 years ago

Hi,

I am trying to do a pca plot for some gene expression data in R using edgeR. I have done a MDS plot but was hoping to do a pca. I have transformed my data into TPM to do this analysis. I have three wild type samples, three with one phenotype and and one from a second phenotype. The a bit of the data is below. Matrix is 7 samples x ~27000 genes. Here is how I did the MDS plot:

group2 <- r("WT","WT","WT","M1","M1","M1","other")
y_gene <- DGEList(X200909_ genes_TPM [c(2:8)], group=group2, lib.size = colSums(X200909_ genes_TPM [c(2:8)]))
y_gene <- calcNormFactors(y_gene)
plotMDS(y_gene)

Any help appreciated.

Adam

WT2-P1  WT3-P1  WT4-P1  M4-P1   M5-P1   M6-P1   F5-P1
gene    
A1BG    51.27004445 65.36898684 59.90938705 31.66942191 29.18648916 46.55213548 58.65045307
A1CF    0   0   0   0   0   0   0
A2ML1   0   0   0   0   1.389832817 0   0
A3GALT2 0   0   0   0   0   0   0
A4GALT  0   0   0   0   0   0   0
A4GNT   0   0   0   0   0   0   0
AAAS    0   2.108676995 0   0   1.389832817 3.879344624 1.629179252
AACS    1.709001482 0   0   0   0   0   0
AADACL2 0   0   0   0   0   0   0
AADACL3 0   0   0   0   0   0   0
AADACL4 0   0   0   0   0   0   0
AADAT   0   0   1.872168345 2.111294794 0   1.939672312 0
AAED1   0   0   0   0   0   0   0
AAGAB   3.418002963 2.108676995 7.488673382 2.111294794 2.779665634 9.698361559 4.887537756
AAK1    1.709001482 0   3.744336691 2.111294794 0   1.939672312 1.629179252
AAMDC   1.709001482 0   5.616505036 0   0   1.939672312 0
edger expression data pca • 8.6k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

You've already made an MDS plot with

plotMDS(x)

To make a PCA plot, simply use

plotMDS(x, gene.selection="common")

Don't use TPMs

Transforming to TPM is absolutely unnecessary and will just make everything work poorly. As the edgeR User's Guide explains, nothing in edgeR is designed not to work on TPMs and that includes DGEList, calcNormFactors and plotMDS.

ADD COMMENT
0
Entering edit mode
Kevin Blighe ★ 4.0k
@kevin
Last seen 9 days ago
Republic of Ireland

Hi,

I am not sure that your workflow is suitable. The last time that I checked, DGEList() expects raw counts, not TPM expression levels. TPM is suitable for neither differential expression analysis nor any other analysis where cross-sample differences are being measured. There is 'debate' about the usage of this (and other, like FPKM, RPKM) expression metrics generally in bioinformatics.

A very simple workflow from the User's Guide is:

y <- DGEList(counts = x)
y <- calcNormFactors(y)
plotMDS(y)

So, you should start with raw counts, if you have them.

For PCA, I'd recommend my own package, PCAtools ( http://www.bioconductor.org/packages/devel/bioc/html/PCAtools.html ), and, for this, you 'could' use the TPM data that you already seem to have (possibly first transforming it via log(TPM + 1)), but it is not ideal. I'd rather you started from raw counts, go through the standard EdgeR workflow, use plotMDS(), and then transform your EdgeR counts via cpm() for usage in PCAtools.

Kevin

ADD COMMENT
0
Entering edit mode

Thanks Kevin,

I did as you suggested and went back to the counts and created the MDS plot:

> X200910_genes <- read_excel("C:/200910_genes.xlsx")
> x <- DGEList(counts = X200910_genes[c(2:8)],group=group2)
> x <- calcNormFactors(x)
> plotMDS(x)
> x$samples
   group lib.size norm.factors
43    WT 34384726    1.0096729
44    WT 30261598    0.9323146
45    WT 34315753    1.0440895
46    M1 29864666    0.9346562
47    M1 40398546    1.0077561
48    M1 32171705    1.0319720
49 other 34544186    1.0467520

I had previously tried to use your software but couldn't figure out how to set up the metadata object.

based on your comment above I can transform my counts data with cpm:

x_cpm <- cpm(x)

But how do I set up the metadata object?

Thanks again,

Adam

ADD REPLY
0
Entering edit mode

I see - thanks! Regarding the metadata, for all intents and purposes, the metadata object for PCAtools can be x$samples; however, the rownames of this object have to be equal to the colnames of the expression data.

Hope that this helps.

ADD REPLY

Login before adding your answer.

Traffic: 567 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6