Trying to do a PCA using edgeR with expression data
2
0
Entering edit mode
waalkes • 0
@waalkes-24135
Last seen 23 months ago

Hi,

I am trying to do a pca plot for some gene expression data in R using edgeR. I have done a MDS plot but was hoping to do a pca. I have transformed my data into TPM to do this analysis. I have three wild type samples, three with one phenotype and and one from a second phenotype. The a bit of the data is below. Matrix is 7 samples x ~27000 genes. Here is how I did the MDS plot:

group2 <- r("WT","WT","WT","M1","M1","M1","other")
y_gene <- DGEList(X200909_ genes_TPM [c(2:8)], group=group2, lib.size = colSums(X200909_ genes_TPM [c(2:8)]))
y_gene <- calcNormFactors(y_gene)
plotMDS(y_gene)


Any help appreciated.

WT2-P1  WT3-P1  WT4-P1  M4-P1   M5-P1   M6-P1   F5-P1
gene
A1BG    51.27004445 65.36898684 59.90938705 31.66942191 29.18648916 46.55213548 58.65045307
A1CF    0   0   0   0   0   0   0
A2ML1   0   0   0   0   1.389832817 0   0
A3GALT2 0   0   0   0   0   0   0
A4GALT  0   0   0   0   0   0   0
A4GNT   0   0   0   0   0   0   0
AAAS    0   2.108676995 0   0   1.389832817 3.879344624 1.629179252
AACS    1.709001482 0   0   0   0   0   0
AADACL2 0   0   0   0   0   0   0
AADACL3 0   0   0   0   0   0   0
AADACL4 0   0   0   0   0   0   0
AADAT   0   0   1.872168345 2.111294794 0   1.939672312 0
AAED1   0   0   0   0   0   0   0
AAGAB   3.418002963 2.108676995 7.488673382 2.111294794 2.779665634 9.698361559 4.887537756
AAK1    1.709001482 0   3.744336691 2.111294794 0   1.939672312 1.629179252
AAMDC   1.709001482 0   5.616505036 0   0   1.939672312 0

edger expression data pca • 2.6k views
1
Entering edit mode
@gordon-smyth
Last seen 4 minutes ago
WEHI, Melbourne, Australia

plotMDS(x)


To make a PCA plot, simply use

plotMDS(x, gene.selection="common")


Don't use TPMs

Transforming to TPM is absolutely unnecessary and will just make everything work poorly. As the edgeR User's Guide explains, nothing in edgeR is designed not to work on TPMs and that includes DGEList, calcNormFactors and plotMDS.

0
Entering edit mode
@kevin
Last seen 3 hours ago
Republic of Ireland

Hi,

I am not sure that your workflow is suitable. The last time that I checked, DGEList() expects raw counts, not TPM expression levels. TPM is suitable for neither differential expression analysis nor any other analysis where cross-sample differences are being measured. There is 'debate' about the usage of this (and other, like FPKM, RPKM) expression metrics generally in bioinformatics.

A very simple workflow from the User's Guide is:

y <- DGEList(counts = x)
y <- calcNormFactors(y)
plotMDS(y)


For PCA, I'd recommend my own package, PCAtools ( http://www.bioconductor.org/packages/devel/bioc/html/PCAtools.html ), and, for this, you 'could' use the TPM data that you already seem to have (possibly first transforming it via log(TPM + 1)), but it is not ideal. I'd rather you started from raw counts, go through the standard EdgeR workflow, use plotMDS(), and then transform your EdgeR counts via cpm() for usage in PCAtools.

Kevin

0
Entering edit mode

Thanks Kevin,

I did as you suggested and went back to the counts and created the MDS plot:

> X200910_genes <- read_excel("C:/200910_genes.xlsx")
> x <- DGEList(counts = X200910_genes[c(2:8)],group=group2)
> x <- calcNormFactors(x)
> plotMDS(x)
> x$samples group lib.size norm.factors 43 WT 34384726 1.0096729 44 WT 30261598 0.9323146 45 WT 34315753 1.0440895 46 M1 29864666 0.9346562 47 M1 40398546 1.0077561 48 M1 32171705 1.0319720 49 other 34544186 1.0467520  I had previously tried to use your software but couldn't figure out how to set up the metadata object. based on your comment above I can transform my counts data with cpm: x_cpm <- cpm(x)  But how do I set up the metadata object? Thanks again, Adam ADD REPLY 0 Entering edit mode I see - thanks! Regarding the metadata, for all intents and purposes, the metadata object for PCAtools can be x$samples; however, the rownames of this object have to be equal to the colnames of the expression data.

Hope that this helps.