Question: Error while using champ.svd() function in subset of data

Hi,

I am trying to analyze Illumina EPIC methylation array data for a project. Because the experimental design was not set up correctly, I am splitting the samples into three groups and running the differential comparisons within each group. When I subset the data after the myLoad object has been generated, I get an error asking me to check the dimensions of the subset, even though the dimensions of the matrices appear to be correct. When I instead edit my SampleSheet file to list only the samples I want to keep and then run champ.SVD(), I get the following error:
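One way to subset without editing the sample sheet is to index the beta matrix and the phenotype data frame with the same vector of sample names, so that columns and rows stay in the same order. The sketch below uses simulated stand-ins for myLoad$beta and myLoad$pd; the "OTH" group, the values and the names beta_sub/pd_sub are assumptions for illustration. The subsets could then be passed to champ.SVD(beta = beta_sub, pd = pd_sub), assuming your ChAMP version exposes the beta and pd arguments.

```r
# Hypothetical stand-ins for myLoad$beta (probes x samples) and myLoad$pd;
# the sample names and the extra "OTH" group are invented for illustration.
set.seed(1)
sample_names <- c(paste0("TCC", 1:5), paste0("TCP", 1:11), paste0("OTH", 1:4))
beta <- matrix(runif(5 * length(sample_names)), nrow = 5,
               dimnames = list(paste0("cg", 1:5), sample_names))
pd <- data.frame(Sample_Name = sample_names,
                 Sample_Group = sub("[0-9]+$", "", sample_names),
                 stringsAsFactors = FALSE)

# Choose the samples for one comparison (here: the TCC and TCP groups).
keep <- pd$Sample_Name[pd$Sample_Group %in% c("TCC", "TCP")]

# Subset pd rows and beta columns with the same vector so the order matches.
pd_sub <- pd[match(keep, pd$Sample_Name), , drop = FALSE]
rownames(pd_sub) <- NULL
beta_sub <- beta[, keep, drop = FALSE]

# champ.SVD() only compares dimensions, so also check that the names line up.
stopifnot(ncol(beta_sub) == nrow(pd_sub),
          identical(colnames(beta_sub), pd_sub$Sample_Name))
```

Consistent subsetting does not by itself avoid the subscript error below, which turns out to come from how the Slide column is typed.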

    [===========================]
    [<<<<< ChAMP.SVD START >>>>>]
    -----------------------------
    champ.SVD Results will be saved in ./CHAMP_SVDimages/ .

    [SVD analysis will be proceed with 741930 probes and 16 samples.]

    [ champ.SVD() will only check the dimensions between data and pd, instead if checking if Sample_Names are correctly matched (because some user may have no Sample_Names in their pd file),thus please make sure your pd file is in accord with your data sets (beta) and (rgSet).]

    << Following Factors in your pd(sample_sheet.csv) will be analysised: >>
    <Sample_ID>(character):TCC1, TCC2, TCC3, TCC4, TCC5, TCP1, TCP2, TCP3, TCP4, TCP5, TCP6, TCP7, TCP8, TCP9, TCP10, TCP11
    <Sample_Well>(character):A1, B1, C1, D1, E1, F1, G1, H1, A2, B2, C2, D2, E2, F2, G2, H2
    <Sample_Group>(character):TCC, TCP
    <Slide>(numeric):201496710011, 201496710034
    <Array>(character):R01C01, R02C01, R03C01, R04C01, R05C01, R06C01, R07C01, R08C01
    <X>(factor):, .
    [champ.SVD have automatically select ALL factors contain at least two different values from your pd(sample_sheet.csv), if you don't want to analysis some of them, please remove them manually from your pd variable then retry champ.SVD().]

    << Following Factors in your pd(sample_sheet.csv) will not be analysis: >>
    <Sample_Name>
    <Sample_Plate>
    <Pool_ID>
    [Factors are ignored because they only indicate Name or Project, or they contain ONLY ONE value across all Samples.]

    << PhenoTypes.lv generated successfully. >>
    Error in summary(lm(svd.o$v[, c] ~ PhenoTypes.lv[[f]]))$coeff[2, 4] :
      subscript out of bounds

I am not sure what is going wrong, so any suggestions or help would be appreciated. Thanks!

modified 20 months ago by rcavalca130 • written 20 months ago by karthikrpad10
Answer: Error while using champ.svd() function in subset of data
rcavalca130 wrote:

Hello,

I'm a colleague of the poster, but I figured I'd post my findings here in case anyone else runs into this issue. The problem turned out to be that one of the columns in our phenotype data (Slide) was being treated as numeric, so in this block of the champ.SVD() code:

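    for (c in 1:topPCA) {
        for (f in 1:ncol(PhenoTypes.lv)) {
            if (class(PhenoTypes.lv[, f]) != "numeric") {
                # Categorical factor: Kruskal-Wallis test of the PC against the groups
                svdPV.m[c, f] <- kruskal.test(svd.o$v[, c] ~ as.factor(PhenoTypes.lv[[f]]))$p.value
            } else {
                # Numeric factor: p-value of the slope in a linear regression
                svdPV.m[c, f] <- summary(lm(svd.o$v[, c] ~ PhenoTypes.lv[[f]]))$coeff[2, 4]
            }
        }
    }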


We were falling into the else branch and getting the error. We resolved the issue by calling as.factor() on the Slide column.
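To see why the numeric branch fails, the sketch below reproduces it in base R. It simulates one principal-component vector for 16 samples (random values, an assumption standing in for svd.o$v[, c]) and uses the two Slide barcodes from the output above. The barcodes are about 2 x 10^11 but differ by only 23, so relative to their magnitude the column is almost constant; the QR rank check inside lm() treats it as aliased with the intercept and drops its coefficient. summary() then returns a coefficient table with a single row, and indexing [2, 4] fails with "subscript out of bounds". Coercing the column with as.factor() routes it through kruskal.test() instead.

```r
# Simulated stand-in for one PC (svd.o$v[, c]) across 16 samples.
set.seed(42)
pc1 <- rnorm(16)
# Slide barcodes taken from the ChAMP output: 8 samples on each slide.
slide <- rep(c(201496710011, 201496710034), each = 8)

# Numeric branch: the barcode barely varies relative to its size,
# so lm() marks its coefficient as aliased (NA).
fit <- lm(pc1 ~ slide)
print(coef(fit))              # slide coefficient is NA
coefs <- summary(fit)$coefficients
print(nrow(coefs))            # 1: only the intercept row survives

# The same indexing champ.SVD() uses now fails.
res <- tryCatch(coefs[2, 4], error = function(e) conditionMessage(e))
print(res)                    # "subscript out of bounds"

# Fix from the answer: treat Slide as a categorical factor.
slide_f <- as.factor(slide)
p_kw <- kruskal.test(pc1 ~ slide_f)$p.value
print(p_kw)
```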

Thanks, Raymond Cavalcante

This helped a lot! Thanks!

Hi,

I am also facing the same problem described above. Your colleague posted a solution, but I didn't follow the reasoning. Could you explain how you resolved this?

Thanks!

Hi,

The general idea is that any column in your phenotype data that might be mistakenly treated as numeric but is actually categorical, like Slide, should be coerced to a factor at some point before starting the analysis.
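A minimal sketch of that idea, assuming the phenotype table is an ordinary data frame such as myLoad$pd. The helper name as_factor_cols() and the toy pd below are hypothetical; for a single column, myLoad$pd$Slide <- as.factor(myLoad$pd$Slide) does the same thing.

```r
# Hypothetical helper: coerce the named columns of a phenotype data frame
# to factors, silently skipping names that are not present.
as_factor_cols <- function(pd, cols) {
  for (col in intersect(cols, colnames(pd))) {
    pd[[col]] <- as.factor(pd[[col]])
  }
  pd
}

# Toy phenotype table; Slide is stored as a (double) number.
pd <- data.frame(Sample_Name = paste0("S", 1:4),
                 Slide = c(201496710011, 201496710011, 201496710034, 201496710034),
                 Array = c("R01C01", "R02C01", "R01C01", "R02C01"),
                 stringsAsFactors = FALSE)

pd <- as_factor_cols(pd, c("Slide", "Array"))
sapply(pd, class)
```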

This happens for Slide because it just looks like a big integer. If you go back to the original post and look at the output from ChAMP, you'll see that Slide is accidentally classified as numeric.
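The same type guessing happens in plain R: a 12-digit barcode such as 201496710011 is larger than R's maximum integer, so read.csv() stores it as a double, whose class is "numeric". The sketch below shows this with an inline two-row sample sheet (invented for illustration) and shows that a named colClasses entry keeps Slide as text. ChAMP reads the sample sheet through its own loader, so this only illustrates the general behavior; coercing the column after loading is the route described above.

```r
# Inline two-row sample sheet (invented for illustration).
csv <- "Sample_Name,Slide,Array
TCC1,201496710011,R01C01
TCP1,201496710034,R02C01"

# Default type guessing: the barcode exceeds .Machine$integer.max,
# so it is read as a double.
pd_default <- read.csv(text = csv)
print(class(pd_default$Slide))   # "numeric"

# A named colClasses entry keeps Slide as text; other columns are guessed.
pd_text <- read.csv(text = csv, colClasses = c(Slide = "character"))
print(class(pd_text$Slide))      # "character"
```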

Hope that clarifies what I mean.