Question

Cell-type specific RNAseq analysis

Merienne Nicolas ▴ 120

@merienne-nicolas-6729

Switzerland

Dear all,

We are working with RNAseq data to characterize specific cell populations. We have extracted 4 distinct cell populations (A, B, C and D) and performed Illumina RNAseq on these samples. Reads were mapped with TopHat and counts were determined with HTSeq (htseq-count). The sequencing platform has advised us to use edgeR-voom for data normalization and transformation and the limma package for identification of differentially expressed genes. We compared the cell populations pairwise with these contrasts:

A vs B

A vs C

A vs D

B vs C

B vs D

C vs D

We obtained lists of up- and down-regulated transcripts for each contrast. However, we are interested in identifying genes that are specifically expressed in one cell type and not in the others. We thought of 2 methods for this:

-first: take the 3 contrasts involving a given cell population (i.e. A vs B, A vs C and A vs D for cell population A) and extract the genes that are differentially expressed in all 3 contrasts. With this, we obtained a small number of "cell-type specific transcripts" (typically between 100 and 200).

-second: design new contrasts comparing each cell type with all the others (i.e. A vs (B+C+D)) and apply limma. With this method, the vast majority of the genes have significant adjusted p-values (but all have negative logFC, indicating they are not specific to cell population A...)

It seems evident to us that the second method is not suitable, but the reasons are not really clear (we think that pooling all the populations creates an imbalance in the analysis, as if we were comparing A with the mean of B+C+D). However, is our first method right, or is there another way to statistically identify cell-type specific mRNA?

Please do not hesitate to tell me if my explanations are not clear.

Thank you in advance.

Best regards,

Nicolas

rnaseq • 1.4k views

Updated 9.6 years ago by James W. MacDonald 65k • written 9.6 years ago by Merienne Nicolas ▴ 120

score 1 · Answer 1 · 2014-09-18

Hi Nicolas,

You can do it either way (well, the second way with a modification), but you are doing two different things.

In the first case you are finding genes that are consistently differentially expressed between A and each of the other three cell types. Think of a three-circle Venn diagram, where your genes are the ones in the center. There will also be some genes that are unique to each individual contrast, as well as genes that fall in the pairwise intersections.

For the second way, what you really want is the contrast A vs (B+C+D)/3, where you are comparing the A cell type with the mean expression of the other three types. If you don't take the average of B+C+D, the null hypothesis you are testing is that the expression in A is equal to the sum of the expression in B, C, and D (or equivalently, that expression in A is 3X the average expression of B, C, and D). Since the model is fit to log-scale expression values, which are positive for most expressed genes, A minus that sum comes out strongly negative almost everywhere. That would likely explain why nearly all of your genes were significant with a negative logFC.

So for glmLRT, you would do something like contrast = c(1, -1/3, -1/3, -1/3), assuming that your design has one column per group in the order A B C D. Using exact thirds rather than -0.33 keeps the coefficients summing to zero.

Does that make sense?

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician, University of Washington, Environmental and Occupational Health Sciences
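The two approaches discussed in this thread can be sketched with limma-voom roughly as follows. This is only an illustration under stated assumptions, not the analysis actually run by either poster: the counts are simulated, and the replicate number, gene names, contrast names (`AvsMean`, `AvsSum`, ...) and the logFC > 1 / FDR < 0.05 cutoffs are all hypothetical choices made for the example.

```r
library(limma)

set.seed(1)

# Assumed layout: 4 cell types, 3 replicates each, one design column per group
group  <- factor(rep(c("A", "B", "C", "D"), each = 3))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Simulated counts (illustrative only): 200 genes, the first 10 raised in A
n_genes <- 200
counts <- matrix(rnbinom(n_genes * 12, mu = 100, size = 10), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0(group, 1:3)))
counts[1:10, group == "A"] <- rnbinom(10 * 3, mu = 800, size = 10)

# Pairwise contrasts, A vs the average of the others, and the unscaled version
cm <- makeContrasts(
  AvsB    = A - B,
  AvsC    = A - C,
  AvsD    = A - D,
  AvsMean = A - (B + C + D) / 3,
  AvsSum  = A - (B + C + D),
  levels  = design
)

# Toy log2 group means for a single gene that is high in A only
mu  <- c(A = 8, B = 5, C = 5, D = 5)
toy <- drop(mu %*% cm)
toy["AvsMean"]  # 8 - 5  =  3: flagged as higher in A
toy["AvsSum"]   # 8 - 15 = -7: negative even though A is the highest group

# Fit the model once, then estimate all contrasts from it
v    <- voom(counts, design)
fit  <- lmFit(v, design)
fit2 <- eBayes(contrasts.fit(fit, cm))

# Method 1: genes up in A in every pairwise comparison (center of the Venn)
pairwise <- c("AvsB", "AvsC", "AvsD")
p_adj <- apply(fit2$p.value[, pairwise], 2, p.adjust, method = "BH")
up    <- fit2$coefficients[, pairwise] > 1 & p_adj < 0.05
specific_A <- rownames(up)[rowSums(up) == 3]

# Method 2: A versus the mean of B, C and D
top_mean <- topTable(fit2, coef = "AvsMean", number = 10)
```

The same numeric column, `cm[, "AvsMean"]`, could be passed as the `contrast` argument of edgeR's `glmLRT` if the edgeR GLM pipeline is used instead of voom, provided the design columns are in the same order. Note that the averaged contrast can still be significant when A differs strongly from only one or two of the other groups, so the intersection of pairwise contrasts is the stricter definition of "specific to A".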