Question

Batch effect and edgeR analysis

Sara ▴ 10

@sara-9865

Last seen 16 months ago

Germany

Hi all,

I have two different RNA-seq datasets from mouse: 1) paired-end (40 bp) data from neonatal cardiac myocytes dissociated from whole mouse hearts at 0, 4 and 7 days after birth, with two replicates, and 2) paired-end (100 bp) data from the ventricular myocardium of mice treated with factor X at 1 and 7 days after birth, with corresponding controls, with two replicates. I would like to find the differentially expressed genes between group 1 and group 2. Here is the MDS plot showing two distinct groups 1 and 2 at the right and left end of the graph:

Considering such a batch effect, could you please kindly tell me how I can include this batch effect in edgeR analysis?

Thank you in advance

batch effect edgeR RNA-seq • 1.0k views

ADD COMMENT • link updated 5.1 years ago by Gordon Smyth 50k • written 5.2 years ago by Sara ▴ 10

What do the labels "vp" and "vd" mean in your plot? Does that mean treatment and control, or something else?

You seem to have a treatment effect rather than a batch effect.

The yellow labels are completely unreadable, at least by me.

ADD REPLY • link 5.2 years ago Gordon Smyth 50k

Hi Gordon,

Sorry for the inconvenience. The vp group is the neonatal cardiac myocytes (dissociated from whole mouse hearts) at 0 (vp0), 4 (vp4) and 7 days (vp7) after birth, with two replicates; this RNA-seq data is 40 bp paired-end. The vd group is the ventricular myocardium of mice treated with factor X at 1 day after birth (vdr1) and its control (vds1), as well as at 7 days after birth (vdr7, yellow color) and its control (vds7), with two replicates; this RNA-seq data is 100 bp paired-end. I would like to find DE genes between the vp and vd groups at various time points. Given your previous comment, there isn't any batch effect here, so I can do the analysis without considering it, yes? For example, to compare the mouse treated with factor X at 1 day after birth (vd_r1) with the healthy mouse at 4 days after birth (vp4), after defining the experimental conditions (groups and treatments) and normalizing, I can use something like the following in edgeR:

# 'design' is assumed to be the design matrix that was used to produce 'fit'
mc <- makeContrasts(vd1vsvp4 = (vd_r1 - vd_s1) - (vp4 - vp0), levels = design)
lrt <- glmLRT(fit, contrast = mc[, "vd1vsvp4"])

Could you please kindly let me know if this is correct?

Many thanks

ADD REPLY • link 5.1 years ago Sara ▴ 10

score 0 · Answer 1 · 2019-03-01

Gordon Smyth 50k

@gordon-smyth

Last seen 57 minutes ago

WEHI, Melbourne, Australia

If I understand your experiment correctly, you want to compare neonatal cardiac myocytes (vp) and ventricular myocardium (vd), but all the vp samples were sequenced to 40bp whereas all the vd samples were sequenced in a different batch at 100bp.

The design of this experiment has a fundamental flaw in that the batches are entirely confounded with vp vs vd. That means it is impossible to remove the batch effect in edgeR.
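To illustrate the confounding, here is a minimal sketch in base R. It assumes the sample layout described in this thread with two replicates per group (6 vp and 8 vd samples); the factor names `tissue` and `batch` and the batch labels are hypothetical, not taken from the original analysis. When tissue and batch coincide for every sample, a design matrix containing both terms is not of full rank, so the two effects cannot be estimated separately:

```r
# Assumed layout: 3 vp groups x 2 replicates = 6 samples (40 bp run),
#                 4 vd groups x 2 replicates = 8 samples (100 bp run).
# 'tissue', 'batch' and the level names are illustrative only.
tissue <- factor(rep(c("vp", "vd"), times = c(6, 8)), levels = c("vp", "vd"))
batch  <- factor(rep(c("run40bp", "run100bp"), times = c(6, 8)),
                 levels = c("run40bp", "run100bp"))

# Every vp sample sits in one batch and every vd sample in the other.
print(table(tissue, batch))

# An additive model with both terms has 3 columns,
# but the batch column duplicates the tissue column.
design <- model.matrix(~ tissue + batch)
design_rank <- qr(design)$rank
cat("columns:", ncol(design), " rank:", design_rank, "\n")
```

The rank is lower than the number of columns because the tissue and batch indicators are identical, which is what "entirely confounded" means in terms of the linear model.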

You can still compare different times (vp7 vs vp0) or test for a treatment effect (vdr7 vs vds7), and you can interact such comparisons between vp and vd (for example vp7vsvp0 - vd7vsvd0), but you cannot make any direct comparisons of vp with vd, because you cannot separate the biological difference from the batch effect.
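One way to see which comparisons are safe is to check whether the contrast weights sum to zero within each batch, since an additive batch shift then cancels out. The sketch below uses base R only; the group names come from this thread, while the batch labels "A" and "B", the helper functions `contrast_vec` and `batch_load`, and the particular contrasts chosen are illustrative assumptions:

```r
# One coefficient per group, as in a design built with ~ 0 + group.
groups <- c("vp0", "vp4", "vp7", "vds1", "vdr1", "vds7", "vdr7")
# Hypothetical batch labels: A = 40 bp run (vp), B = 100 bp run (vd).
batch  <- c(vp0 = "A", vp4 = "A", vp7 = "A",
            vds1 = "B", vdr1 = "B", vds7 = "B", vdr7 = "B")

# Hypothetical helper: build a full contrast vector from named weights.
contrast_vec <- function(weights) {
  v <- setNames(numeric(length(groups)), groups)
  v[names(weights)] <- weights
  v
}

# Interaction of a vp time effect with a vd treatment effect:
# (vp7 - vp0) - (vdr7 - vds7)
interaction_con <- contrast_vec(c(vp7 = 1, vp0 = -1, vdr7 = -1, vds7 = 1))
# Direct comparison of vp with vd: vp7 - vds7
direct_con <- contrast_vec(c(vp7 = 1, vds7 = -1))

# Sum of contrast weights within each batch.
# All zeros means an additive batch effect cancels in the contrast.
batch_load <- function(v) tapply(v, batch[names(v)], sum)
print(batch_load(interaction_con))
print(batch_load(direct_con))
```

The interaction contrast has zero total weight in each batch, whereas the direct vp vs vd contrast does not, so the latter carries the full batch effect. In an actual edgeR analysis the same contrasts would be written with `makeContrasts` and passed to `glmLRT`, as in the code earlier in this thread.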

ADD COMMENT • link 5.1 years ago Gordon Smyth 50k

Hi Gordon,

Many thanks for your explanation. Could you please share your opinion on finding co-expressed genes for the "vp" and "vd" groups separately, and then looking at which of the co-expressed genes differ or are shared between the two groups? In your professional view, is that a meaningful way to compare "vp vs vd"?

Thank you

ADD REPLY • link 5.1 years ago Sara ▴ 10