Question: Batch effect and edgeR analysis
8 months ago by
Sara0
Sara0 wrote:

Hi all,

I have two RNA-seq datasets from mouse: 1) 40 bp paired-end reads from neonatal cardiac myocytes dissociated from whole mouse hearts at 0, 4 and 7 days after birth, with two replicates each, and 2) 100 bp paired-end reads from the ventricular myocardium of mice treated with factor X at 1 and 7 days after birth, with corresponding controls, also with two replicates each. I would like to find the differentially expressed genes between group 1 and group 2. Here is the MDS plot, which shows groups 1 and 2 as two distinct clusters at the right and left ends of the graph:

Given this apparent batch effect, could you please tell me how I can account for it in the edgeR analysis?

edger rna-seq batch effect • 225 views
modified 8 months ago by Gordon Smyth39k • written 8 months ago by Sara0

What do the labels "vp" and "vd" mean in your plot? Does that mean treatment and control, or something else?

You seem to have a treatment effect rather than a batch effect.

The yellow labels are completely unreadable, at least by me.

Hi Gordon,

Sorry for the inconvenience. The vp group is the neonatal cardiac myocytes (dissociated from whole mouse hearts) at 0 (vp0), 4 (vp4) and 7 (vp7) days after birth, with two replicates each; this RNA-seq data is 40 bp paired-end. The vd group is the ventricular myocardium of mice treated with factor X at 1 day after birth (vdr1) with its control (vds1), and at 7 days after birth (vdr7, yellow) with its control (vds7), again with two replicates each; this RNA-seq data is 100 bp paired-end. I would like to find DE genes between the vp and vd groups at various time points. Given your previous comment, there isn’t any batch effect here, so I can do the analysis without considering it, yes? For example, to compare mice treated with factor X at 1 day after birth (vd_r1) with healthy mice at 4 days after birth (vp4), after defining the experimental conditions (groups and treatments) and normalizing, I can use something like the following in edgeR:

# 'design' is assumed to be the design matrix that was used to produce 'fit'
mc <- makeContrasts(vd1vsvp4 = (vd_r1 - vd_s1) - (vp4 - vp0), levels = design)
lrt <- glmLRT(fit, contrast = mc[, "vd1vsvp4"])


Could you please kindly let me know if it’s correct?

Many thanks

Answer: Batch effect and edgeR analysis
8 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

If I understand your experiment correctly, you want to compare neonatal cardiac myocytes (vp) and ventricular myocardium (vd), but all the vp samples were sequenced to 40bp whereas all the vd samples were sequenced in a different batch at 100bp.

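The design of this experiment has a fundamental flaw in that the batches are entirely confounded with vp vs vd. Every vp sample is in one batch and every vd sample is in the other, so the data contain no information that separates the two effects. That means it is impossible to remove the batch effect in edgeR.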
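To make the confounding concrete, here is a minimal sketch in base R. The sample sheet is hypothetical: it only mirrors the sample counts described in this thread (6 vp samples at 40 bp, 8 vd samples at 100 bp), and the column and level names (`tissue`, `batch`, `bp40`, `bp100`) are made up for illustration. It shows that a model containing both a tissue term and a batch term has one more column than it has independent directions, which is why the two effects cannot be estimated separately.

```r
# Hypothetical sample sheet mirroring the experiment described in the thread:
# every vp sample is in the 40 bp batch, every vd sample in the 100 bp batch.
samples <- data.frame(
  tissue = factor(rep(c("vp", "vd"), times = c(6, 8))),
  batch  = factor(rep(c("bp40", "bp100"), times = c(6, 8)))
)

# Tissue and batch line up perfectly: the off-diagonal cells are empty.
print(table(samples$tissue, samples$batch))

# A design with both terms has more columns than independent directions,
# because the tissue column and the batch column are identical.
design <- model.matrix(~ tissue + batch, data = samples)
design_rank <- qr(design)$rank
cat("columns:", ncol(design), " rank:", design_rank, "\n")
```

When the rank is smaller than the number of columns, at least one coefficient is not estimable, whichever fitting software is used.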

You can still compare different times (vp7 vs vp0) or test for a treatment effect (vdr7 vs vds7), and you can interact such comparisons between vp and vd (for example vp7vsvp0 - vd7vsvd0), but you cannot make any direct comparisons of vp with vd, because you cannot separate the biological difference from the batch effect.
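The reason within-batch comparisons survive can be sketched with a little arithmetic in base R. This assumes, purely for illustration, that the batch effect acts as one additive shift (on the log scale, per gene) shared by all vd groups; the shift value and the `contrast` and `bias` helpers are hypothetical. Under that assumption, the bias a contrast picks up equals the shift times the sum of its weights on the vd groups, so any contrast whose vd weights sum to zero is unaffected.

```r
# Group names follow the thread; everything numeric here is made up.
groups <- c("vp0", "vp4", "vp7", "vds1", "vdr1", "vds7", "vdr7")
is_vd  <- startsWith(groups, "vd")

# Assumed additive batch shift carried by every vd group (unknown in practice).
batch_shift <- 1.3

# Hypothetical helper: build a full contrast vector from named weights.
contrast <- function(weights) {
  v <- setNames(numeric(length(groups)), groups)
  v[names(weights)] <- weights
  v
}

within_vp     <- contrast(c(vp7 = 1, vp0 = -1))    # time effect inside vp
within_vd     <- contrast(c(vdr7 = 1, vds7 = -1))  # treatment effect inside vd
diff_of_diffs <- contrast(c(vdr1 = 1, vds1 = -1, vp4 = -1, vp0 = 1))
direct        <- contrast(c(vdr1 = 1, vp4 = -1))   # vp vs vd directly

# Bias from the batch shift = shift * (sum of weights on the vd groups).
bias <- function(v) batch_shift * sum(v[is_vd])

print(sapply(
  list(within_vp = within_vp, within_vd = within_vd,
       diff_of_diffs = diff_of_diffs, direct = direct),
  bias
))
```

Only the direct vp-versus-vd contrast carries the shift, and since the shift is unknown it cannot be subtracted out afterwards. If the batch effect is not a simple shared shift, this cancellation argument is only an approximation.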

Hi Gordon,

Many thanks for your explanation. Could you please share your opinion on finding co-expressed genes for the "vp" and "vd" groups separately, and then looking at which of the co-expressed genes differ or are shared between the two groups? In your professional view, is that a meaningful way to compare "vp vs vd"?

Thank you