Hi, I am trying to determine the differentially expressed genes between two groups. The raw data are CEL files generated on the Clariom S array. The two groups being compared are Case (6 arrays) and Control (7 arrays), 13 arrays in total.
My questions are:
- How can I filter out genes with low variance and the QC probes?
- Given my experimental design, how should I set up the design matrix?
```
library(oligo)
library(affycoretools)
library(limma)
library(clariomshumantranscriptcluster.db)
library(pd.clariom.s.human)

list.celfiles()
rawdata <- read.celfiles(list.celfiles())
probeset.eset <- rma(rawdata, background = TRUE, normalize = TRUE, subset = NULL)
probeset.eset <- annotateEset(probeset.eset, annotation(probeset.eset))
# I stopped here because I don't know how to formulate the design matrix
```
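For a two-group comparison like this one (6 Case, 7 Control), one common way to set up the design matrix is sketched below. The group labels and their order are assumptions: they have to be rearranged to match the order in which `list.celfiles()` returns the 13 CEL files.

```r
# Hypothetical group labels; reorder them to match the CEL file order
group <- factor(c(rep("Case", 6), rep("Control", 7)),
                levels = c("Control", "Case"))  # Control is the reference level

# Column 1 (intercept) is the mean Control expression;
# column 2 (groupCase) is the Case minus Control difference on the log2 scale
design <- model.matrix(~ group)
colnames(design)
table(group)
```

With this parameterisation the second coefficient, `groupCase`, is the Case vs Control log fold change, so that is the one to pass as `coef` to limma's `topTable` after `lmFit` and `eBayes`.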
Thank you so much for your quick response.
Regarding the filtering step: how can I filter out the genes with low variance in order to increase the number of differentially expressed genes?
It's a bad idea to filter genes based on variance. The `eBayes` step uses an estimate of the variance across all genes measured as a prior, and if you filter out the low-variance genes you will probably bias towards the null (i.e., it will be worse than what you are getting already). You can filter on other things though, like requiring the expression level of at least 6 samples to be larger than some value, or getting rid of probesets that don't have any known transcript (and all the controls). I would in general wait and filter the `fit2` object though, rather than filtering prior to that.

If you used `annotateEset` on your `probeset.eset` object, the `topTable` results should have things like the gene name and symbol. You could subset `fit2` on those columns to get an `MArrayLM` object that just has the 'real' genes (for some definition of 'real'). As an example, I took some random data from GEO and filtered to just genes with symbols, which still didn't give me any significant genes...
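A minimal sketch of that kind of filter is below. It is not the code from the GEO example: the expression matrix and the annotation are simulated stand-ins, the column names `PROBEID` and `SYMBOL` are assumed to match what `annotateEset` adds, and the design uses a simple `~ group` parameterisation rather than an explicit contrast.

```r
library(limma)

set.seed(1)
n_genes <- 200
group <- factor(c(rep("Case", 6), rep("Control", 7)),
                levels = c("Control", "Case"))
design <- model.matrix(~ group)

# Simulated log2 expression values standing in for exprs(probeset.eset)
expr <- matrix(rnorm(n_genes * 13, mean = 7), nrow = n_genes,
               dimnames = list(paste0("ps", seq_len(n_genes)), NULL))

# Simulated annotation; every fourth probeset has no gene symbol
genes <- data.frame(PROBEID = rownames(expr),
                    SYMBOL = paste0("GENE", seq_len(n_genes)),
                    stringsAsFactors = FALSE)
genes$SYMBOL[seq(4, n_genes, by = 4)] <- NA

fit <- lmFit(expr, design)
fit$genes <- genes
fit2 <- eBayes(fit)  # the variance prior is estimated from all probesets

# Filter after eBayes: keep only probesets that have a gene symbol
keep <- !is.na(fit2$genes$SYMBOL)
fit2_filt <- fit2[keep, ]

tt <- topTable(fit2_filt, coef = "groupCase", number = Inf)
head(tt)
```

Because `topTable` is run on the filtered object, the adjusted p-values are computed over the retained probesets only, while the moderated statistics still come from the prior estimated on the full set.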
You could also hypothetically filter on the negative controls, but they don't seem that useful.
This is because the GC content of the controls varies from 0 to 100%, and once you get past about 40% GC content, the probes will bind to anything. You could filter on GC content of the controls, but that's more work than I have the time to do right now.
Thank you for your explanation, I really appreciate it.
What about filtering genes based on intensity? Would that improve the result? As you can see, no gene shows differential expression.
Before using R I used TAC, and the result showed that 963 genes were differentially expressed; after correction (using the FDR p-value) the list diminished to only one gene (downregulated, with a fold change of -13). However, now no gene is differentially expressed, and that gene appears to be upregulated with a fold change of 2. Which result should I consider?
Please guide me on how to do what you suggested here, if it will improve the result: "You can filter on other things though, like requiring the expression level of at least 6 samples to be larger than some value or getting rid of probesets that don't have any known transcript (and all the controls). I would in general wait to filter the fit2 object though, rather than prior to that." Thank you so much.
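The intensity filter quoted above ("at least 6 samples larger than some value") could be sketched roughly as follows. The matrix here is simulated and stands in for `exprs(probeset.eset)`, and the cutoff of 4 is purely hypothetical; a sensible value has to be chosen from the distribution of your own data.

```r
set.seed(1)
# Simulated log2 expression matrix (500 probesets x 13 arrays)
expr <- matrix(rnorm(500 * 13, mean = 5, sd = 2), nrow = 500)

cutoff <- 4       # hypothetical log2 threshold; choose it from your own data
min_samples <- 6  # size of the smaller group (6 Case arrays)

# Keep probesets that exceed the cutoff in at least min_samples arrays
keep <- rowSums(expr > cutoff) >= min_samples
table(keep)
```

As long as the rows of the expression matrix are in the same order as the rows of the fitted model, the same logical vector can be used to subset the fit after `eBayes`, e.g. `fit2[keep, ]`, in line with the advice to filter `fit2` rather than the input data. Whether this yields any significant genes depends on the data.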