Question: Is it necessary to check for batch effects in this case, and if so, how?
16 months ago, amoltej wrote:

Hello everyone,

I have a data set generated from 88 samples in a single run using all 8 flow cells. All libraries were mixed together and sequenced on all the flow cells at the same time.

So basically I received 8 fastq files for each sample. I merged all 8 files for each sample to create a single fastq file per sample.
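
For readers who want to reproduce the merging step, here is a minimal sketch in Python. It assumes each sample's per-lane files are plain or gzip-compressed FASTQ; the function name `merge_lane_fastqs` and the file names in the usage note are hypothetical. Byte-wise concatenation is valid for gzip files because a gzip stream may contain several members.

```python
import shutil
from pathlib import Path


def merge_lane_fastqs(lane_files, merged_path):
    """Concatenate per-lane FASTQ files for one sample into a single file.

    Works for plain FASTQ and for gzip-compressed FASTQ, but all inputs
    must be of the same kind (do not mix plain and compressed files).
    """
    merged_path = Path(merged_path)
    with merged_path.open("wb") as out:
        # Sort so the lane order is deterministic; for paired-end data the
        # R1 and R2 files must be merged in the same lane order.
        for lane_file in sorted(str(f) for f in lane_files):
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)
    return merged_path
```

A call might look like `merge_lane_fastqs(["sampleA_L001.fastq.gz", "sampleA_L002.fastq.gz"], "sampleA_merged.fastq.gz")`, repeated once per sample.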

My question is: is it necessary to look at batch effects in this situation? If yes, how should I specify the batch when using the SVA package? Any other suggestions are also welcome.

Thank you


16 months ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

The flow cell does not constitute a batch effect. There is no effect to remove, and no way to remove it if there were one.


Thanks for the reply Gordon Smyth.


Reply written 16 months ago by amoltej
16 months ago, Jakub (United Kingdom) wrote:

If I understand correctly, you have run the same 88 samples in each of the flow cell lanes? In that case, that's an optimal experimental design, and it goes a long way towards preventing the sort of batch effects you might be worried about. I would argue that an optimal block design is the main answer to your question.

I personally do not merge at the fastq stage but do all my quality control of fastq files independently. That way you can still obtain data on the individual lanes and you could theoretically discard lanes that failed in some way. I would also do the mapping (alignment) independently (if you still do that), as you then can get your independent mapping statistics for each lane, which can again be helpful for QC. I then tend to merge my .bam files.
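
As a rough illustration of what per-lane QC can look like before merging, the sketch below computes a read count and mean base quality for one lane's FASTQ file. It is not a replacement for a dedicated QC tool; it assumes standard 4-line FASTQ records with Phred+33 quality encoding, and the names `open_fastq` and `lane_summary` are hypothetical.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def lane_summary(path):
    """Return read count, base count and mean base quality for one lane.

    Assumes 4-line FASTQ records and Phred+33 encoded qualities.
    """
    n_reads = n_bases = qual_sum = 0
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence line
                n_reads += 1
                n_bases += len(line.strip())
            elif i % 4 == 3:  # quality line
                qual_sum += sum(ord(c) - 33 for c in line.strip())
    mean_quality = qual_sum / n_bases if n_bases else 0.0
    return {"reads": n_reads, "bases": n_bases, "mean_quality": mean_quality}
```

Running `lane_summary` on each lane file of a sample and comparing the results side by side makes a lane with unusually few reads or low quality easy to spot before anything is merged.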

I agree with Gordon, though: I have never used lanes at later stages of the analysis, and I expect that doing so would be futile and incorrect.
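
To make the design argument concrete, here is a small sketch that checks whether a layout is fully crossed, that is, whether every sample was sequenced on every lane. When that holds, lane does not differ between samples once the reads are merged, so there is nothing to enter as a sample-level batch variable. The function name `is_fully_crossed` and the sample labels are hypothetical.

```python
from itertools import product


def is_fully_crossed(pairs):
    """Return True if every sample appears on every lane.

    `pairs` is an iterable of (sample, lane) tuples describing which
    libraries were sequenced on which lanes.
    """
    pairs = set(pairs)
    samples = {sample for sample, _ in pairs}
    lanes = {lane for _, lane in pairs}
    return pairs == set(product(samples, lanes))


# Layout described in this thread: 88 samples, each on all 8 lanes.
layout = [(f"sample{s}", lane) for s in range(1, 89) for lane in range(1, 9)]
crossed = is_fully_crossed(layout)
```

If the check fails, for example because some samples were run only on a subset of lanes, the design is no longer balanced and the reasoning above does not automatically apply.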


Thanks for the reply Jakub.

After merging the respective fastq files and running quality control with FastQC and Trimmomatic, only 1-2% of reads are discarded from each file. Can I take this as an indication that all the lanes produced good quality reads?


Reply written 16 months ago by amoltej

Personally I think, unless something is seriously wrong with your experiment, that trimming the reads is both unnecessary and harmful. It is better to allow a good quality aligner to make decisions about this. See my comments to one of the referees in this workflow:

The workflow also gives some brief guidelines regarding quality checking. The real proof is aligning the reads successfully to a good quality reference genome.

If you want to discuss QC further, it would be advisable to post a new question rather than to continue this thread (which is about batch correction).

Reply written 16 months ago by Gordon Smyth

Glad to help. I assume you used the default Trimmomatic settings, which are reasonable. The result looks fine.

You will have to decide how much QC you do on 'lanes', and whether you want to know per-lane GC content, etc. Trimming is only one part of QC. There are many ways to explore biases from sequencing (they do exist), but it all depends on whether this is relevant to your question and block design.
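
If per-lane GC content is of interest, a minimal sketch could look like the following. It assumes 4-line FASTQ records, ignores ambiguous bases such as N, and uses the hypothetical function names `open_fastq` and `gc_fraction`.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def gc_fraction(path):
    """Fraction of G and C among the A, C, G and T bases in one lane file."""
    gc = acgt = 0
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence line of a 4-line FASTQ record
                seq = line.strip().upper()
                gc += seq.count("G") + seq.count("C")
                acgt += sum(seq.count(base) for base in "ACGT")
    return gc / acgt if acgt else float("nan")
```

Comparing `gc_fraction` across the lane files of the same sample gives a quick view of whether any lane deviates noticeably from the others.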

Reply written 16 months ago by Jakub