Question

T-test for microarray data

0

Entering edit mode

eslamias • 0

@eslamias-7413

Last seen 9.2 years ago

United States

Hello

I have a micrroarray data with 2667 variale for 16 sample and I want to check th ediffrental expression genes between control and sample . I want to use R but each method I saw in internet about t test in R doesn't work . I want to use limma package ,

some of my error are this

 t.test(dataset.1 = NULL,
+ alternative = c("two.sided", "less", "greater"),
+ mu = 0, paired = FALSE, var.equal = FALSE,
+ conf.level = 0.95)
Error in t.test.default(dataset.1 = NULL, alternative = c("two.sided",  : 
  argument "x" is missing, with no default

exprSet = read.delim("GSE53179_series_matrix.txt")
 colnames(exprSet)
#how I can read more than 1 gene???
dataset.1 = exprSet[1, c("GSM1286068", "GSM1286072")]
dataset.2 = exprSet[1, c("GSM1286073", "GSM1286083")]
#Another way of putting data 
dataset.1 = exprSet[1, c(1,5)]
dataset.2 = exprSet[1, c(6,16)]
#t-test for one gene in each sample
t.test.gene.1 = t.test(dataset.1, dataset.2, "two.sided")
#read the samples , 
dataset.1
dataset.2
# Print just the p-value from the t-test
t.test.gene.1$p.value

this one give me the anser jist for one gene but I need for all 16 sample the include 5 control and 11 patients and all 2667 gene of them

please help me to find a code which work

limma microarray t.test • 6.2k views

ADD COMMENT • link updated 9.2 years ago by Martin Morgan 25k • written 9.2 years ago by eslamias • 0

2

Entering edit mode

I recommend you read the Limma User's Guide, which explains in detail how to use limma to analyze microarray data. Anything else I write here would just be duplicating the explanation in the guide.

ADD REPLY • link 9.2 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

I read everything and I m not sure I can use limma

my data neither is raw file nor from a gds experiments I think I should use another method in r for t test o you have any idea ?

ADD REPLY • link 9.2 years ago eslamias • 0

0

Entering edit mode

If your data is from GEO, you can use the GEOQuery package to get it into an ExpressionSet. But you can also use limma directly on your data matrix.

ADD REPLY • link 9.2 years ago Ryan C. Thompson ★ 7.9k

score 0 · Answer 1 · 2015-03-08

Hopefully you are familiar with basic R data types like vectors, factors, matrix, list, and data.frame, as well as basic data input functions like read.table(), scan(), and related functions. If not a starting point is the documentation that comes with R, especially "An Introduction to R" and "R Data Import / Export".

If you had a matrix of values, where rows were independent of one another and values in each row conformed to the usual t-test assumptions, you might use the genefilter package rowttests() function.

> m <- matrix(rnorm(2667 * 16), nrow=2667, dimnames=list(NULL, LETTERS[1:16]))
> f <- factor(c(rep("Control", 11), rep("Treatment", 5)))
> ttests <- rowttests(m, f)
> head(ttests)
    statistic          dm     p.value
1 -3.73436323 -1.56396205 0.002221023
2  0.33806524  0.22477913 0.740330027
3  0.57756344  0.38064961 0.572731198
4  1.56461409  0.90778264 0.139990940
5 -0.07785181 -0.04329018 0.939047670
6 -0.80640959 -0.36288556 0.433492328

HOWEVER, your rows are not independent and individual rows do not follow the assumptions of standard t-tests; the above approach will be under-powered and result in inflated false discovery rate. Fortunately, you can 'stand on the shoulders of giants' and benefit from sophisticated understanding of microarrays. I used the GEOquery package to retrieve the data, and the limma package for analysis.

library(GEOquery)
library(limma)

I retrieved the data using getGEO(). This returns a list of experiments, although in the present case there is only one experiment.

> gse_list = getGEO("GSE53179")
> class(gse_list)
[1] "list"
> names(gse_list)
[1] "GSE53179_series_matrix.txt.gz"
> gse = gse_list[["GSE53179_series_matrix.txt.gz"]]
> class(gse)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"

I took a look at the 'pData' (description of sample phenotypes) of the expression set

> View(pData(gse))

and guessed that you were interested in the characteristics_ch1.1 column. I specified the statistical model I'd like to apply (care needs to be taken here!)

> design = model.matrix(~characteristics_ch1.1, pData(gse))

and then let limma fit the statistical model to each gene

> fit = lmFit(gse, design)  ## THIS STEP IS NOT CORRECT!

Finally, I assessed statistical significance and generated a 'top table' of differentially expressed miRNAs.

> topTable(eBayes(fit))

I do not have to use an ExpressionSet in lmFit(); I could have used a matrix() of expression values and created the design matrix from any factor or factors describing the allocation of samples (matrix columns) to treatment groups. Getting the matrix of expression values from files downloaded from GEO, or from some other source, takes some level of R knowledge.

HOWEVER, returning to the lmFit() step, the help page for lmFit says that the expression values should be log-ratios or log-values of expression, and the general assumption is that these have been pre-processed (possibly background corrected and normalized) in a standard microarray work flow. But when I look at the distribution of expression values

> range(exprs(gse))
[1]    -18.59629 431352.00000
> hist(exprs(gse))
> table(exprs(gse) < 0)

FALSE  TRUE 
22099 19757

it's clear that I do not have log ratios or log values of expression (the range is too large for log-ratios, and in addition log values would be strictly positive). So you will have to explore the provenance of the data, identify how it was processed before adding to GEO, and from there determine what remedial steps are required. If the data have been pre-processed in a reasonable manner there may be a simple transformation of the data that makes it appropriate for analysis in limma. On the other hand it may be that the provenance of the data is not clear, or the steps taken inappropriate or sub-optimal. You might then start from the supplementary files included with the GEO record, do your own pre-processing, and then use limma as above. A starting point for a new analysis might be this article and associated package, but I am not an expert in the analysis of miRNAs.