Closed:limma: How to define correct design matrix for fitting linear model and keep highly correlated genes?
0
0
Entering edit mode
@jurat-shahidin-9488
Last seen 4.0 years ago
Chicago, IL, USA

Hi:

I am going to ask this question in Bioconductor support site because I tried several ways to do genes filtering by a measuing correlation coefficient, but still no luck with it. So I have no choice to raise my concern here and hearing expert's opinion and community help. I found some nice thread in Biostar but still some issue exists. Any criticism about my question would be appreciated.

let me get straight to my point. I have Affymetrix gene level expression matrix (genes on the rows and sample ID on the columns), and I have annotation data of microarray experiment observation where sample ID in the rows and description identification on the columns.

> head(expr_mat[1:4, 1:3])
        Tarca_001_P1A01 Tarca_003_P1A03 Tarca_004_P1A04
1_at           6.062215        6.125023        5.875502
10_at          3.796484        3.805305        3.450245
100_at         5.849338        6.191562        6.550525
1000_at        3.567779        3.452524        3.316134

> dim(expr_mat)
[1] 120000   832

    > head(phenoDat)
         SampleID   GA Batch     Set Train Platform
1 Tarca_001_P1A01 11.0     1 PRB_HTA     1    HTA20
2 Tarca_013_P1B01 15.3     1 PRB_HTA     1    HTA20
3 Tarca_025_P1C01 21.7     1 PRB_HTA     1    HTA20
4 Tarca_037_P1D01 26.7     1 PRB_HTA     1    HTA20
5 Tarca_049_P1E01 31.3     1 PRB_HTA     1    HTA20
6 Tarca_061_P1F01 32.1     1 PRB_HTA     1    HTA20

> dim(phenoDat)
[1] 832   6

I intend to see how the genes in each sample from expr_mat are correlated with GA value of corresponding samples in phenoDat. I learned a few threads from Biostar and using phenoDat@GA as a covariate, I come up following simple solution to fit the linear model in order to see the correlation. Here is what I tried by using limma user guide:

library(limma)
fit <- limma::lmFit(exprs_mat, design = model.matrix( ~ 0 + t(phenoDat$GA))
fit <- eBayes(fit)
topTable(fit, coef=2)

but I got dimension error because of my way of constructing the design matrix was wrong. I don't know how to build a proper design matrix and find highly correlated genes by using lmFit from limma. Any thought?

goal:

I just want to keep the genes from exprs_mat which has high correlated with phenoDat$GA by sample-wise. Essentially, I am expecting to get a correlation matrix and to subset original expression matrix with threshold, something like this:

cors <- stats::cor(t(exprs_mat))
idx <- which( (abs(cors) > 0.8) & (upper.tri(cors)), arr.ind=TRUE)
idx <- unique(c(idx[, 1],idx[, 2])
correlated.genes <- exprs_mat[idx, ]

How can I do that? any possible approach to get this done? Thanks

r limma • 90 views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 764 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6