Question: Association analysis between gene expression and Phenotype
1
gravatar for hrishi27n
4 months ago by
hrishi27n20
hrishi27n20 wrote:

Hello All,

I have RNA data generated using the smart-seq2 protocol and I am interested in running association analysis (some thing like logistic regression but with limma) between the gene expression values and cohort status. I have quantified the transcripts as TPM using RSEM. I am considering normalizing the data with TMM, estimating mean-variance relationship using voomWithQualityWeights and then using lmFit and eBayes to estimate the association.

I am not interested in seeing difference between cohort but to predict cohort status using gene expression values.

Following is my dummy code:

 design <- model.matrix(~0+Cohort+sex+age, data=PhenoType)   #Cohort:1/0
 v <- voomWithQualityWeights(myNormalized_data, design=design, normalization="none", plot=TRUE)
 fit1 <- lmFit(v,design)
 fit2 <- eBayes(fit1)

All help and comments are highly appreciated.

Thanks.

limma rna singlecell • 211 views
ADD COMMENTlink modified 4 months ago by Aaron Lun24k • written 4 months ago by hrishi27n20

I’m removing the DESeq2 tag here.

ADD REPLYlink written 4 months ago by Michael Love25k

You're going to have to be more specific here. Put down some code and scientific context.

ADD REPLYlink written 4 months ago by Aaron Lun24k

Aaron, thank you for responding to my post. I have updated the post to include more info and code. Appreciate all your help and comments.

ADD REPLYlink written 4 months ago by hrishi27n20
Answer: Association analysis between gene expression and Phenotype
4
gravatar for Aaron Lun
4 months ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

I'll assume this is single-cell RNA-seq data.

I have quantified the transcripts as TPM using RSEM.

You shouldn't use TPMs as input into voom, as stated here. Get the raw counts instead.

I am considering normalizing the data with TMM,

TMM may not play well with single-cell data if you have too many zeros. After all, M-values are not defined if either of the inputs is a zero. See here for details.

estimating mean-variance relationship using voomWithQualityWeights

The mean-variance relationship of the log-expression values is very wonky for single-cell data, see Figure 8 for an example. As a result, very high and very low coverage libraries have undue influence on the result because their variance is so much lower than the cells with moderate coverage.

and then using lmFit and eBayes to estimate the association.

I don't know what you mean by "estimate the association". Do you mean to detect differentially expressed genes between cohort?

I am not interested in seeing difference between cohort but to predict cohort status using gene expression values.

Well, then, you probably should be using a classifier rather than performing a differential expression analysis!

However, if you do want to identify DE genes, then I would suggest creating "pseudo-bulk" samples, as discussed here. This allows you to plug it into existing DE analysis pipelines without having to worry about single-cell variability. The full statistical reasoning is a bit more subtle, but you can read the linked article for that.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Aaron Lun24k

Aaron, Thank you for your input. This is very helpful.

ADD REPLYlink written 4 months ago by hrishi27n20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 166 users visited in the last hour