Question: Using SVA with RRBS data
mrodrigues.fernanda (University of Illinois, Urbana-Champaign) wrote, 14 months ago:

Hello

I have seen a lot of posts asking this question, but have not found an answer. I am analyzing my RRBS data with methylKit, and I can see in my PCA plots that there are some batch effects.

Is there a way to use SVA to estimate surrogate variables from methylation data? If so, how should I do that properly?

I have used it before with RNA-seq data, but I am not sure how to use it with methylation data.

Thank you!

Jeff Leek (United States) wrote, 14 months ago:
Hello,

I am not very familiar with RRBS data, but if you can see the batch effects in the PCs, you should be able to apply sva directly. If the data are constrained to be positive, you might consider svaseq. If you already know the batch variable, you could use ComBat or removeBatchEffect.

Hope that helps,
Jeff

Jeff,
Thank you for your response.

I have read about people using sva, but I am not sure how to put the data in the right format for it.
The reason I want to use sva on this data is that I am almost certain there are unknown batch effects.
I have used the same samples for RNA-seq data, where sva estimated 6 surrogate variables and removing them made a huge difference.

Looking at my PCA plots from methylKit, I see the same "messy" clustering of samples. I do not know exactly what the batch effects are, but I believe sva could estimate them.

My main problem is putting the data into the right format for sva.
I am using the cytosine report files generated by the Bismark methylation extractor, which give me the following information: chr, start, strand, number of cytosines (methylated bases), number of thymines (unmethylated bases), context, and trinucleotide context.

This is what it looks like:

1    182    +    0    0    CG    CGA
1    183    -    0    0    CG    CGG
1    191    +    0    0    CG    CGC
1    192    -    0    0    CG    CGT
1    339    +    0    0    CG    CGA
1    340    -    0    0    CG    CGC
1    984    +    0    0    CG    CGG
1    985    -    0    0    CG    CGT
1    1095    +    0    0    CG    CGT
1    1096    -    0    0    CG    CGT
1    1176    +    0    0    CG    CGA
1    1177    -    0    0    CG    CGG
1    1243    +    0    0    CG    CGC
1    1244    -    0    0    CG    CGG
1    1560    +    0    0    CG    CGA
1    1561    -    0    0    CG    CGA
1    1660    +    0    0    CG    CGG


 

I am not sure if sva would work well with this data. I have read that some people use the logit transformation for methylation percentages, but I am clueless about how to do that properly. Would you have any insight on how to input this data properly into sva? Thank you!
Sorry about the delay in my response. What you need to do is get all of your samples into a position x sample matrix. You would need to do something like the following (someone with more expertise in RRBS may have something to say about the first steps, and you may want to talk to a statistician at your institution before doing this):

1. Compute M/(M + U) for each position, where M is the number of methylated bases and U is the number of unmethylated bases.
2. Merge the files so that each position (chr/start/strand) is a single row and each sample is a separate column.
3. There will be many NaNs, because many positions will give 0/(0 + 0), so filter out any row that contains a NaN.
4. Perform a logit transform on the data values.
5. Feed the transformed data to sva as "dat".

Hope that helps!
Jeff
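Steps 1 to 4 above can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not code from the thread: the function names (`parse_report`, `logit`, `build_matrix`) and the toy read counts are hypothetical, positions are kept only if they appear in every sample, and proportions are clamped by a small `eps` before the logit because a proportion of exactly 0 or 1 would otherwise give an infinite value (the steps above do not say how to handle that case).

```python
import math

def parse_report(lines):
    """Map (chr, start, strand) -> (methylated, unmethylated) for one sample."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, start, strand = fields[0], int(fields[1]), fields[2]
        counts[(chrom, start, strand)] = (int(fields[3]), int(fields[4]))
    return counts

def logit(p, eps=1e-3):
    """Logit of a proportion, clamped to [eps, 1 - eps] (eps is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def build_matrix(samples, eps=1e-3):
    """Build a position x sample matrix of logit-transformed M/(M + U).

    samples: dict of sample name -> output of parse_report.
    Returns (positions, sample_names, rows), one row per kept position.
    """
    names = sorted(samples)
    # Assumption: keep only positions reported in every sample.
    shared = set.intersection(*(set(samples[n]) for n in names))
    positions, rows = [], []
    for pos in sorted(shared):
        pairs = [samples[n][pos] for n in names]
        # 0/(0 + 0) would be NaN, so drop the whole row (step 3).
        if any(m + u == 0 for m, u in pairs):
            continue
        positions.append(pos)
        rows.append([logit(m / (m + u), eps) for m, u in pairs])
    return positions, names, rows

# Toy input in the cytosine report layout; the counts are made up.
sample_a = ["1\t182\t+\t3\t1\tCG\tCGA",
            "1\t183\t-\t0\t0\tCG\tCGG",
            "1\t191\t+\t5\t5\tCG\tCGC"]
sample_b = ["1\t182\t+\t1\t3\tCG\tCGA",
            "1\t183\t-\t2\t2\tCG\tCGG",
            "1\t191\t+\t0\t4\tCG\tCGC"]

positions, names, rows = build_matrix(
    {"a": parse_report(sample_a), "b": parse_report(sample_b)}
)
for pos, row in zip(positions, rows):
    print(pos, [round(v, 3) for v in row])
```

Position 183 is dropped because sample `a` has no reads there. The resulting matrix (rows are positions, columns are samples) would then need to be written out and loaded into R to be passed to sva as `dat` (step 5), which is not shown here.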