Question: Is it necessary to check for a batch effect in this case, and if so, how?
amoltej (Australia) wrote, 13 months ago:

Hello everyone,

I have a data set generated from 88 samples in a single run using all 8 flow cells. All libraries were pooled and sequenced together on every flow cell at the same time.

So I received 8 FASTQ files for each sample. I merged the 8 files for each sample to create a single FASTQ file per sample.
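For reference, a minimal Python sketch of that merge step, assuming gzipped per-lane files with a hypothetical `<sample>_L00<n>_R1_001.fastq.gz` naming pattern. Because gzip permits several compressed members in one file, appending the raw bytes of each lane file gives a valid merged .fastq.gz, which is the same thing a plain `cat` of the files does.

```python
import shutil
from pathlib import Path


def merge_fastq_gz(lane_files, merged_path):
    """Concatenate per-lane gzipped FASTQ files into one gzipped FASTQ.

    gzip allows multiple compressed members in one file, so byte-level
    concatenation (what `cat` does) produces a valid .fastq.gz.
    Files are appended in the order given.
    """
    with open(merged_path, "wb") as out:
        for path in lane_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    # Hypothetical layout: fastq/<sample>_L001_R1_001.fastq.gz ... _L008_R1_001.fastq.gz
    sample = "sample01"
    lanes = sorted(Path("fastq").glob(f"{sample}_L00?_R1_001.fastq.gz"))
    if lanes:
        merge_fastq_gz(lanes, f"{sample}_R1.merged.fastq.gz")
```

For paired-end data, R1 and R2 would each be merged separately, in the same lane order, so that mates stay in sync.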

My question is: is it necessary to look for a batch effect in this situation? If yes, how should I specify the batch when using the sva package? Any other suggestions are also welcome.

Thank you

Amol

Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 13 months ago:

The flow cell does not constitute a batch effect. There is no effect to remove, and no way to remove it if there was one.
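One way to read the second half of this answer: once each sample's FASTQ is the merge of all 8 lanes, any per-sample "flow cell" covariate takes the same value for every sample, so it is indistinguishable from the intercept and cannot be estimated or removed. A small numpy illustration of that collinearity, using the 88-sample, 8-lane layout from the question (an illustration only, not how sva works internally):

```python
import numpy as np

n_samples, n_lanes = 88, 8

# Each merged sample contains reads from every lane, so the set of lanes
# ("batch") it was sequenced on is the same for all 88 samples.
lanes_per_sample = [frozenset(range(1, n_lanes + 1)) for _ in range(n_samples)]
batches = list(set(lanes_per_sample))
batch_indicator = np.array(
    [1.0 if s == batches[0] else 0.0 for s in lanes_per_sample]
)

# Design matrix: intercept plus a batch indicator column.
design = np.column_stack([np.ones(n_samples), batch_indicator])

print("distinct batches:", len(batches))        # 1
print("columns:", design.shape[1])              # 2
print("rank:", np.linalg.matrix_rank(design))   # 1: batch is aliased with the intercept
```

With a genuine batch (for example, two groups of samples sequenced in separate runs), the indicator would vary across samples and the design would have full column rank.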


Thanks for the reply, Gordon Smyth.

Amol

(Reply by amoltej, 13 months ago.)
Jakub (United Kingdom) wrote, 13 months ago:

If I understand correctly, you ran the same 88 samples in each of the flow cell lanes? In that case, that is an optimal experimental design, and it goes a long way toward preventing the sort of batch effects you might be worried about. I would argue that an optimal block design is the main answer to your question (http://www.genetics.org/content/185/2/405.full).
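If per-lane read counts for each sample are available (for example from the demultiplexing report; the dict input format here is a hypothetical choice), a short Python sketch can confirm that every sample really is present in every lane and show how evenly each sample's reads are spread across lanes:

```python
def samples_missing_lanes(reads, lanes):
    """Return samples with zero reads in at least one lane.

    reads: dict mapping (sample, lane) -> read count (hypothetical input format)
    lanes: iterable of the lanes used in the run
    """
    lanes = set(lanes)
    samples = {sample for sample, _ in reads}
    missing = []
    for sample in sorted(samples):
        present = {lane for lane in lanes if reads.get((sample, lane), 0) > 0}
        if present != lanes:
            missing.append(sample)
    return missing


def lane_fractions(reads, sample, lanes):
    """Fraction of one sample's reads contributed by each lane."""
    counts = {lane: reads.get((sample, lane), 0) for lane in lanes}
    total = sum(counts.values())
    if total == 0:
        return {lane: 0.0 for lane in lanes}
    return {lane: c / total for lane, c in counts.items()}


if __name__ == "__main__":
    # Toy numbers for two samples over two lanes (not real data).
    reads = {("s1", 1): 100, ("s1", 2): 100, ("s2", 1): 90, ("s2", 2): 0}
    print(samples_missing_lanes(reads, [1, 2]))  # ['s2']
    print(lane_fractions(reads, "s1", [1, 2]))   # {1: 0.5, 2: 0.5}
```

In a balanced design like the one described, every sample appears in every lane, so the first list should be empty and each sample's per-lane fractions should be roughly equal.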

Personally, I do not merge at the FASTQ stage; instead, I run quality control on each lane's FASTQ file independently. That way you still have data on the individual lanes, and you could in principle discard lanes that failed in some way. I would also do the mapping (alignment) per lane (if you still do that), since you then get separate mapping statistics for each lane, which again helps with QC. I then merge the BAM files.
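A minimal Python sketch of that per-lane workflow, which only builds the command lines and does not run them. FastQC, `samtools flagstat` and `samtools merge` are real tools; the alignment command is a hypothetical placeholder (`<aligner>`) to be replaced by whatever aligner you use, and the file names are assumptions.

```python
import shlex


def per_lane_commands(sample, lane_fastqs, qc_dir="qc"):
    """Build per-lane QC, alignment and flagstat commands, then one BAM merge.

    lane_fastqs: dict mapping lane number -> FASTQ path for this sample.
    The "<aligner>" entry is a placeholder, not a real program.
    """
    commands = []
    lane_bams = []
    for lane, fastq in sorted(lane_fastqs.items()):
        bam = f"{sample}_L{lane:03d}.bam"
        # Per-lane FastQC report (the output directory must already exist).
        commands.append(["fastqc", "--outdir", qc_dir, fastq])
        # Placeholder alignment step writing one BAM per lane.
        commands.append(["<aligner>", fastq, bam])
        # Per-lane mapping statistics, printed to stdout.
        commands.append(["samtools", "flagstat", bam])
        lane_bams.append(bam)
    # Merge the lane BAMs; the first positional argument is the output file.
    commands.append(["samtools", "merge", f"{sample}.bam", *lane_bams])
    return commands


if __name__ == "__main__":
    fastqs = {1: "sample01_L001.fastq.gz", 2: "sample01_L002.fastq.gz"}
    for cmd in per_lane_commands("sample01", fastqs):
        print(shlex.join(cmd))
```

Keeping lane identity inside the merged BAM usually means giving each lane its own read group at alignment time; how that is set depends on the aligner.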

I agree with Gordon, though: I have never used lanes at later stages of the analysis, and I expect that doing so would be futile and incorrect.


Thanks for the reply, Jakub.

After merging the respective FASTQ files and running quality control with FastQC and Trimmomatic, I am getting only 1-2% of reads discarded from each file. Can I take this as an indication that all the lanes produced good-quality reads?

Amol 

(Reply by amoltej, 13 months ago.)
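Because the files were merged before trimming, the 1-2% figure is a per-sample number and could hide a single poor lane. If Trimmomatic were run per lane, drop rates could be compared lane by lane. A minimal Python sketch, assuming the single-end summary line Trimmomatic prints has the form shown in the comment (check it against your own logs; paired-end runs print a different line, which this pattern deliberately does not match):

```python
import re

# Assumed single-end summary format, e.g.
# "Input Reads: 1000 Surviving: 985 (98.50%) Dropped: 15 (1.50%)"
SE_SUMMARY = re.compile(r"Input Reads: (\d+) Surviving: (\d+) .*?Dropped: (\d+)")


def dropped_percent(log_text):
    """Percentage of input reads dropped, parsed from a Trimmomatic SE log."""
    match = SE_SUMMARY.search(log_text)
    if match is None:
        raise ValueError("no single-end summary line found")
    total, _surviving, dropped = (int(g) for g in match.groups())
    return 100.0 * dropped / total


if __name__ == "__main__":
    # Toy per-lane logs (not real output from this experiment).
    logs = {
        1: "Input Reads: 1000 Surviving: 985 (98.50%) Dropped: 15 (1.50%)",
        2: "Input Reads: 1000 Surviving: 900 (90.00%) Dropped: 100 (10.00%)",
    }
    for lane, text in sorted(logs.items()):
        print(f"lane {lane}: {dropped_percent(text):.1f}% dropped")
```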

Personally, I think that unless something is seriously wrong with your experiment, trimming the reads is both unnecessary and harmful. It is better to let a good-quality aligner make these decisions. See my comments to one of the referees on this workflow:

https://f1000research.com/articles/5-1438/

The workflow also gives some brief guidelines on quality checking. The real proof is aligning the reads successfully to a good-quality reference genome.

If you want to discuss QC further, it would be advisable to post a new question rather than to continue this thread (which is about batch correction).

(Reply by Gordon Smyth, 13 months ago.)

Glad to help. I assume you used the default Trimmomatic settings, which are reasonable. The result looks fine.

You will have to decide how much QC to do per lane, and whether you want to know the per-lane GC content, etc. Trimming is only one part of QC. There are many ways to explore sequencing biases (they do exist), but it all depends on whether this is relevant to your question and block design.

(Reply by Jakub, 13 months ago.)
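On the per-lane GC content point: FastQC already reports %GC and a per-sequence GC distribution for each file it is run on. As a rough alternative, a minimal Python sketch that computes the overall GC fraction of the bases in one FASTQ file, so lanes can be compared side by side. It assumes standard 4-line records and counts N bases in the denominator.

```python
import gzip
import sys


def gc_fraction(fastq_path):
    """GC fraction over all bases in a plain or gzipped 4-line FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    gc = total = 0
    with opener(fastq_path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # the sequence line of each 4-line record
                seq = line.strip().upper()
                gc += seq.count("G") + seq.count("C")
                total += len(seq)
    return gc / total if total else float("nan")


if __name__ == "__main__":
    # Usage: python gc.py lane1.fastq.gz lane2.fastq.gz ...
    for path in sys.argv[1:]:
        print(f"{path}\t{gc_fraction(path):.3f}")
```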