Differential expression analysis - Clariom S array -workflow
1
0
Entering edit mode
ala • 0
@1c2265e5
Last seen 12 weeks ago
Malaysia

Hi, I am trying to determine the differentially expressed genes between two groups, the raw data in form of CEL files generated by the Clariom S array, The two groups of comparison, Case=6 arrays, Control= 7 arrays. (total 13 arrays).

My question is

• How I can filter the genes with low variance and QC probes.
• considering my experiment design, how to design a matrix?

Code should be placed in three backticks as shown below

library(oligo)
library(affycoretools)
library(limma)
library(clariomshumantranscriptcluster.db)
library(pd.clariom.s.human)
list.celfiles()
probeset.eset=rma(rawdata, background=TRUE, normalize=TRUE, subset=NULL)
probeset.eset <- annotateEset(probeset.eset,annotation(probeset.eset))

#  I Stopped here cause I don't know how to formulate the design matrix


# please also include the results of running the following in an R session

sessionInfo( )



design-matrix limma • 352 views
1
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

Let's assume you have a data.frame that lists the different sample types, like this:

> samples <- data.frame(Status = factor(rep(c("Case","Control"), c(6,7))))
> samples
Status
1     Case
2     Case
3     Case
4     Case
5     Case
6     Case
7  Control
8  Control
9  Control
10 Control
11 Control
12 Control
13 Control


Now it's a simple matter of specifying the design matrix and making the comparisons.

> library(limma)
> design <- model.matrix(~ 0 + Status, samples)
> colnames(design) <- gsub("Status", "", colnames(design))
> cont <- makeContrasts(Case - Control, levels = design)
> design
Case Control
1     1       0
2     1       0
3     1       0
4     1       0
5     1       0
6     1       0
7     0       1
8     0       1
9     0       1
10    0       1
11    0       1
12    0       1
13    0       1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$Status [1] "contr.treatment" > cont Contrasts Levels Case - Control Case 1 Control -1 > fit <- lmFit(probeset.eset, design) > fit2 <- contrasts.fit(fit, cont) > fit2 <- eBayes(fit2) > topTable(fit2,1)  Also see section 9 in the limma User's Guide, which starts on p41. 0 Entering edit mode Thank you so much for your quick response. regarding filtering step, how to filter the genes with low variance in order to increase the number of differentially expressed genes. ADD REPLY 0 Entering edit mode It's a bad idea to filter genes based on variance. The eBayes step uses an estimate of the variance across all genes measured as a prior, and if you filter out the low variance genes you will probably bias towards the null (e.g., it will be worse than what you are getting already). You can filter on other things though, like requiring the expression level of at least 6 samples to be larger than some value or getting rid of probesets that don't have any known transcript (and all the controls). I would in general wait to filter the fit2 object though, rather than prior to that. If you used annotateEset on your probeset.eset object, the topTable results should have things like the gene name and symbol. You could do something like fit2 <- fit2[!is.na(fit2$genes$SYMBOL),]  To get an MArrayLM object that just has the 'real' genes (for some definition of 'real'). As an example, here's some random data I got from GEO: > dat <- read.celfiles(list.celfiles(listGzipped = TRUE)) Platform design info loaded. Reading in : GSM5994355_01_M38_L_WT_Clariom_S_Mouse_.CEL.gz Reading in : GSM5994356_02_M40_L_WT_Clariom_S_Mouse_.CEL.gz Reading in : GSM5994357_03_M31_L_WT_Clariom_S_Mouse_.CEL.gz Reading in : GSM5994358_04_M39_L_KO_Clariom_S_Mouse_.CEL.gz Reading in : GSM5994359_05_M41_L_KO_Clariom_S_Mouse_.CEL.gz Reading in : GSM5994360_06_M32_L_KO_Clariom_S_Mouse_.CEL.gz > eset <- rma(dat) Background correcting Normalizing Calculating Expression > library(clariomsmousetranscriptcluster.db) > eset <- annotateEset(eset, clariomsmousetranscriptcluster.db) 'select()' returned 1:many mapping between keys and columns 'select()' returned 1:many mapping between keys and columns 'select()' returned 1:many mapping between keys and columns > design <- model.matrix(~0+ gl(2,3)) > colnames(design) <- c("WT","KO") > cont <- makeContrasts(KO - WT, levels = design) > fit2 <- eBayes(contrasts.fit(lmFit(eset, design), cont)) > topTable(fit2) PROBEID ENTREZID SYMBOL MG-1-neg-10340613_st MG-1-neg-10340613_st <NA> <NA> TC1900000529.mm.2 TC1900000529.mm.2 16551 Kif11 MG-1-neg-10342962_st MG-1-neg-10342962_st <NA> <NA> MG-1-neg-10339690_st MG-1-neg-10339690_st <NA> <NA> MG-1-neg-10340578_st MG-1-neg-10340578_st <NA> <NA> TC0800000629.mm.2 TC0800000629.mm.2 97165 Hmgb2 MG-1-neg-10344413_st MG-1-neg-10344413_st <NA> <NA> MG-1-neg-10338594_st MG-1-neg-10338594_st <NA> <NA> MG-1-neg-10340922_st MG-1-neg-10340922_st <NA> <NA> MG-1-neg-10343337_st MG-1-neg-10343337_st <NA> <NA> GENENAME logFC AveExpr t MG-1-neg-10340613_st <NA> 1.6654789 3.835293 11.627987 TC1900000529.mm.2 kinesin family member 11 -1.2753855 3.116560 -9.227560 MG-1-neg-10342962_st <NA> -1.4670381 2.298939 -8.504792 MG-1-neg-10339690_st <NA> 1.1554747 3.613898 8.187279 MG-1-neg-10340578_st <NA> 1.0112744 2.497108 8.112280 TC0800000629.mm.2 high mobility group box 2 -1.3608453 4.484379 -7.432833 MG-1-neg-10344413_st <NA> -1.4715614 2.282866 -6.489185 MG-1-neg-10338594_st <NA> -0.9325457 4.518174 -6.487135 MG-1-neg-10340922_st <NA> 0.7789265 4.421446 6.462993 MG-1-neg-10343337_st <NA> -0.9719129 2.868469 -6.194258 P.Value adj.P.Val B MG-1-neg-10340613_st 1.000045e-05 0.2913031 -3.455081 TC1900000529.mm.2 4.412922e-05 0.5772712 -3.497761 MG-1-neg-10342962_st 7.378051e-05 0.5772712 -3.517216 MG-1-neg-10339690_st 9.357378e-05 0.5772712 -3.527199 MG-1-neg-10340578_st 9.908874e-05 0.5772712 -3.529704 TC0800000629.mm.2 1.700041e-04 0.8253414 -3.555369 MG-1-neg-10344413_st 3.858223e-04 0.9989105 -3.602326 MG-1-neg-10338594_st 3.865476e-04 0.9989105 -3.602446 MG-1-neg-10340922_st 3.952060e-04 0.9989105 -3.603866 MG-1-neg-10343337_st 5.078991e-04 0.9989105 -3.620542  Filter to just genes with symbols > fit2 <- fit2[!is.na(fit2$genes\$SYMBOL),]
> topTable(fit2)
PROBEID  ENTREZID  SYMBOL
TC1900000529.mm.2 TC1900000529.mm.2     16551   Kif11
TC0800000629.mm.2 TC0800000629.mm.2     97165   Hmgb2
TC1300001926.mm.2 TC1300001926.mm.2    105193  Nhlrc1
TC0700003439.mm.2 TC0700003439.mm.2    117589    Asb7
TC1700001407.mm.2 TC1700001407.mm.2    381059 Gm1604b
TC0700001721.mm.2 TC0700001721.mm.2    233806 Tmem159
TC0400004202.mm.2 TC0400004202.mm.2 100041433  Zfp981
TC1700002087.mm.2 TC1700002087.mm.2    258324 Olfr129
TC1900001479.mm.2 TC1900001479.mm.2     13099 Cyp2c40
TC0900001845.mm.2 TC0900001845.mm.2    244723   Olfm2
GENENAME
TC1900000529.mm.2                               kinesin family member 11
TC0800000629.mm.2                              high mobility group box 2
TC1300001926.mm.2                                NHL repeat containing 1
TC0700003439.mm.2               ankyrin repeat and SOCS box-containing 7
TC1700001407.mm.2                                   predicted gene 1604b
TC0700001721.mm.2                              transmembrane protein 159
TC0400004202.mm.2                                zinc finger protein 981
TC1700002087.mm.2                                 olfactory receptor 129
TC1900001479.mm.2 cytochrome P450, family 2, subfamily c, polypeptide 40
TC0900001845.mm.2                                         olfactomedin 2
TC1900000529.mm.2 -1.2753855 3.116560 -9.227560 4.412922e-05 0.9265812
TC0800000629.mm.2 -1.3608453 4.484379 -7.432833 1.700041e-04 0.9995449
TC1300001926.mm.2  0.6125437 4.701354  5.304300 1.238365e-03 0.9995449
TC0700003439.mm.2  0.7175511 4.188782  5.157766 1.447949e-03 0.9995449
TC1700001407.mm.2 -0.5834391 3.688742 -5.095065 1.549502e-03 0.9995449
TC0700001721.mm.2 -0.6385654 4.450923 -5.069707 1.592808e-03 0.9995449
TC0400004202.mm.2  0.6481561 3.408812  5.064616 1.601665e-03 0.9995449
TC1700002087.mm.2  0.7947954 3.422708  5.003970 1.711499e-03 0.9995449
TC1900001479.mm.2 -1.1836657 7.789961 -4.953239 1.809845e-03 0.9995449
TC0900001845.mm.2 -0.8662049 3.467330 -4.951292 1.813744e-03 0.9995449
B
TC1900000529.mm.2 -3.497761
TC0800000629.mm.2 -3.555369
TC1300001926.mm.2 -3.689537
TC0700003439.mm.2 -3.703369
TC1700001407.mm.2 -3.709539
TC0700001721.mm.2 -3.712078
TC0400004202.mm.2 -3.712591
TC1700002087.mm.2 -3.718783
TC1900001479.mm.2 -3.724080
TC0900001845.mm.2 -3.724286


Which still didn't give me any significant genes...

0
Entering edit mode

You could also hypothetically filter on the negative controls, but they don't seem that useful:

> summary(rowMeans(exprs(eset)))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1.048   3.499   4.930   5.249   6.769  13.451
> summary(rowMeans(exprs(eset)[grep("^MG-1-neg", row.names(eset)),]))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1.048   2.629   3.643   3.780   4.638  13.034
`

This is because the GC content for the controls varies from 0 - 100%, and once you get past about 40% GC content, the probes will bind to anything. You could filter on GC content of the controls, but that's more work than I have the time to do right now.

0
Entering edit mode

Thank you for your explanation, really appreciated it.

what about filtering genes based on intensity, is it going to improve the result, as you can see no gene shows differential expression.

Before using the R software I used the TAC, and the result showed that 963 genes were differentially expressed after correction (using FDR p value) the list diminished to one gene only (downregulated with -13 FC). However, now no gene is differentially expressed and that gene appears to be upregulated with FC 2. which result should I consider?

0
Entering edit mode

please guide me on how to do what you have suggested here, (You can filter on other things though, like requiring the expression level of at least 6 samples to be larger than some value or getting rid of probesets that don't have any known transcript (and all the controls). I would in general wait to filter the fit2 object though, rather than prior to that). if it will improve the result. thank you so much.