Question

Using SVA with RRBS data

0

mrodrigues.fernanda ▴ 10

@mrodriguesfernanda-9433

Last seen 7.0 years ago

University of Illinois, Urbana-Champaign

Hello

I have seen a lot of posts with this question, but have not found an answer. I am analyzing my RRBS data with methylKit, and I can see in my PCA plots that there are some batch effects.

Is there a way to use SVA to estimate surrogate variables from methylation data? If so, how should I do that properly?

I have used it before with RNA-seq data, but I am not sure how to use it with methylation data.

Thank you!

sva RRBS • 1.8k views

ADD COMMENT • link updated 7.1 years ago by Jeff Leek ▴ 650 • written 7.1 years ago by mrodrigues.fernanda ▴ 10

0

Hi,

I want to use ComBat to remove the batch effects. Did you use the raw methylation levels or the logit-transformed values for this? Any further information would be really useful.

Thanks in advance!

ADD REPLY • link 5.3 years ago Pinki • 0

score 0 · Answer 1 · 2017-03-13

0

Jeff Leek ▴ 650

@jeff-leek-5015

Last seen 3.2 years ago

United States

Hello,

I am not very familiar with RRBS data, but if you observe the batch effects on the PCs, you should be able to apply sva directly. If the data are constrained to be positive, you might consider svaseq. If you already know the batch effect variable, you could use ComBat or removeBatchEffect.

Hope that helps,
Jeff

ADD COMMENT • link 7.1 years ago Jeff Leek ▴ 650

0

Jeff,
Thank you for your response.

I have read of people using sva, but I am not sure how to put the data in the right format for it.
The reason why I want to use sva in this data is that I am almost certain of the existence of unknown batch effects.
I have used the same samples for RNA-seq data, where sva estimated 6 surrogate variables and removing them made a huge difference.

Looking at my PCA plots from methylKit, I see the same "messy" clustering of samples. I do not know what exactly the batch effects are, but I believe sva could estimate them.

My main problem is putting the data into the right format for sva.
I am using the cytosine report files generated by the Bismark methylation extractor, which give me the following information: chr, start, strand, number of cytosines (methylated bases), number of thymines (unmethylated bases), context, and trinucleotide context.

This is what it looks like:

1   182   +   0   0   CG   CGA
1   183   -   0   0   CG   CGG
1   191   +   0   0   CG   CGC
1   192   -   0   0   CG   CGT
1   339   +   0   0   CG   CGA
1   340   -   0   0   CG   CGC
1   984   +   0   0   CG   CGG
1   985   -   0   0   CG   CGT
1   1095   +   0   0   CG   CGT
1   1096   -   0   0   CG   CGT
1   1176   +   0   0   CG   CGA
1   1177   -   0   0   CG   CGG
1   1243   +   0   0   CG   CGC
1   1244   -   0   0   CG   CGG
1   1560   +   0   0   CG   CGA
1   1561   -   0   0   CG   CGA
1   1660   +   0   0   CG   CGG

I am not sure if sva would work well with this data. I have read that some people use a logit transformation for methylation percentages, but I do not know how to do that properly. Would you have any insight on how to input this data properly into sva? Thank you!
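
For reference, a minimal sketch of reading this report format into per-position counts, written in Python with the standard library only. It assumes the seven whitespace-separated columns described above (chr, start, strand, methylated count, unmethylated count, context, trinucleotide context). The function name `parse_cytosine_report` is hypothetical and is not part of Bismark or methylKit, and the non-zero counts in the example are made up for illustration.

```python
from typing import Dict, Iterable, Tuple

Position = Tuple[str, int, str]  # (chromosome, start, strand)
Counts = Tuple[int, int]         # (methylated, unmethylated)


def parse_cytosine_report(lines: Iterable[str]) -> Dict[Position, Counts]:
    """Hypothetical helper: map each position to its (methylated, unmethylated) counts."""
    counts: Dict[Position, Counts] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        chrom, start, strand, meth, unmeth, _context, _trinucleotide = fields
        counts[(chrom, int(start), strand)] = (int(meth), int(unmeth))
    return counts


# Two illustrative lines; the counts on the second line are invented.
example_lines = [
    "1\t182\t+\t0\t0\tCG\tCGA",
    "1\t183\t-\t3\t1\tCG\tCGG",
]
report = parse_cytosine_report(example_lines)
```

Keying on (chromosome, start, strand) makes it straightforward to line up the same position across samples later.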

ADD REPLY • link 7.1 years ago mrodrigues.fernanda ▴ 10

1

Sorry about the delay in my response. You need to get all of your samples into a position x sample matrix. You would need to do something like the following (someone with more expertise in RRBS may have something to say about the first steps, and you may want to talk to a statistician at your institution before doing this):

1. Compute M/(M + U) for each position, where M and U are the methylated and unmethylated counts.
2. Merge the files so that each chromosome/start/strand combination is a single row and each sample is a different column.
3. There will be many NaNs, because many positions give 0/(0 + 0), so filter out any rows that contain a NaN.
4. Perform a logit transform on the data values.
5. Feed this transformed data to sva as "dat".

Hope that helps!
Jeff
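
The steps above can be sketched roughly as follows, in Python with the standard library only. This is an illustration of the data preparation, not a tested RRBS pipeline. It assumes each sample has already been read into a dictionary mapping (chromosome, start, strand) to (methylated, unmethylated) counts, and that at least one sample is supplied. The function names are hypothetical. One detail not mentioned in the steps: a fraction of exactly 0 or 1 has an infinite logit, so this sketch clamps fractions to [eps, 1 - eps]; both the clamping and the value of `eps` are assumptions, and a minimum-coverage filter may be a better choice for real data.

```python
import math


def methylation_fraction(meth, unmeth):
    """Step 1: M / (M + U); NaN when the position has no coverage."""
    total = meth + unmeth
    return meth / total if total > 0 else float("nan")


def logit(p, eps=1e-3):
    """Step 4: logit transform, clamped so 0 and 1 stay finite (assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def build_logit_matrix(samples):
    """Steps 2 and 3: one row per position, one column per sample, no NaN rows.

    samples: dict of sample name -> {(chrom, start, strand): (meth, unmeth)}
    Returns (column names, {position: [logit value per sample]}).
    """
    names = sorted(samples)
    # Only positions reported in every sample can form a complete row.
    shared = set.intersection(*(set(samples[name]) for name in names))
    rows = {}
    for pos in sorted(shared):
        fractions = [methylation_fraction(*samples[name][pos]) for name in names]
        if any(math.isnan(f) for f in fractions):
            continue  # drop rows with 0 / (0 + 0) in any sample
        rows[pos] = [logit(f) for f in fractions]
    return names, rows


# Made-up counts for two samples at two positions.
samples = {
    "s1": {("1", 182, "+"): (0, 0), ("1", 183, "-"): (3, 1)},
    "s2": {("1", 182, "+"): (2, 2), ("1", 183, "-"): (1, 1)},
}
names, rows = build_logit_matrix(samples)
```

Here the first position is dropped because sample s1 has no coverage there. The remaining rows would then be written out (for example as a tab-separated file) and loaded in R as the numeric matrix passed to sva as "dat" (step 5).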

ADD REPLY • link 7.1 years ago Jeff Leek ▴ 650