
Question: R: estimating error rate of replicated samples
4 weeks ago by
meriam.nef0 wrote:

Hi, I am studying the genetic diversity of a population and I am using R to filter my genotypic data (single nucleotide polymorphisms, SNPs), build dendrograms, and estimate the error rate between replicated samples (duplicates/triplicates).

My code:

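# Convert the IBS similarity matrix to a distance matrix (1 - IBS)
mat <- 1 - ibsmat
# Drop samples 99 and 100 (rows and columns)
mat1 <- mat[-(99:100), -(99:100)]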

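# Error rate for triplicates
# Assumes each triplicate occupies three consecutive rows/columns of mat1
x <- seq(1, 274, by = 3)              # first index of each block of three
err.r <- rep(NA_real_, length(x))
# The last start index in x is skipped, so its entry in err.r stays NA
for (i in seq_len(length(x) - 1)) {
  k <- x[i]
  k1 <- k + 2
  ibx <- mat1[k:k1, k:k1]             # 3 x 3 block of pairwise distances
  print(ibx)
  # Mean of the three pairwise distances within the triplicate
  err.r[i] <- mean(ibx[lower.tri(ibx)])
}
# Separate name so the duplicate result does not overwrite this one
errorrate.trip <- mean(err.r, na.rm = TRUE)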

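# Error rate for duplicates
# Assumes each duplicate occupies two consecutive rows/columns of mat1
x <- seq(1, 274, by = 2)              # first index of each block of two
err.r <- rep(NA_real_, length(x))
# The last start index in x is skipped, so its entry in err.r stays NA
for (i in seq_len(length(x) - 1)) {
  k <- x[i]
  k1 <- k + 1
  ibx <- mat1[k:k1, k:k1]             # 2 x 2 block of pairwise distances
  print(ibx)
  # The single pairwise distance within the duplicate
  err.r[i] <- mean(ibx[lower.tri(ibx)])
}
# Separate name so the triplicate result is not overwritten
errorrate.dup <- mean(err.r, na.rm = TRUE)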
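
Both loops above walk over the same matrix in fixed blocks of three or two, which only works if every sample in `mat1` is a triplicate (or every sample is a duplicate) and the replicates are stored next to each other. A possible alternative, sketched here only as an illustration, is to drive the calculation from a grouping vector so that duplicates, triplicates and unreplicated samples can sit in the same matrix. The function name `replicate_error`, the `group` vector and the toy distances are all made up for this example and are not part of my real data.

```r
# Hypothetical helper: mean pairwise distance within each replicate group.
# `dist_mat` is a square distance matrix (e.g. 1 - IBS); `group` holds one
# label per row, and replicates of the same sample share a label.
replicate_error <- function(dist_mat, group) {
  stopifnot(nrow(dist_mat) == ncol(dist_mat),
            length(group) == nrow(dist_mat))
  idx_by_group <- split(seq_along(group), group)
  # Keep only groups with at least two members (duplicates, triplicates, ...)
  idx_by_group <- idx_by_group[lengths(idx_by_group) >= 2]
  per_group <- vapply(idx_by_group, function(idx) {
    block <- dist_mat[idx, idx, drop = FALSE]
    mean(block[lower.tri(block)])
  }, numeric(1))
  list(per_group = per_group, overall = mean(per_group))
}

# Toy example with made-up distances: one triplicate (A), one duplicate (B)
# and one unreplicated sample (C).
group <- c("A", "A", "A", "B", "B", "C")
toy <- matrix(0.5, nrow = 6, ncol = 6)
toy[1:3, 1:3] <- 0.02
toy[4:5, 4:5] <- 0.04
diag(toy) <- 0

res <- replicate_error(toy, group)
print(res$per_group)
print(res$overall)
```

With this layout the unreplicated sample C is simply ignored, and the order of the rows no longer matters as long as `group` matches the rows of the matrix.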


My questions are:

1) Should my .csv file contain only the sorted triplicates and duplicates, or all the data (duplicates, triplicates and unreplicated samples)?

2) Should I filter that .csv file before estimating the error rate? By filtering I mean, for example, filtering on the percentage of missing data (%NA) per genotype.

3) If there's an error in my code, please feel free to comment.
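
To make question 2 concrete, this is roughly the kind of %NA filter I have in mind. It is only a minimal sketch: the matrix `geno`, the sample names and the threshold `max_missing` are invented for illustration, and the 0.5 cut-off is not a recommendation.

```r
# Made-up genotype matrix: rows are samples, columns are SNPs, NA = missing call
geno <- matrix(c( 0,  1, NA, 2,
                 NA, NA, NA, 1,
                  2,  2,  0, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("s1", "s2", "s3"), paste0("snp", 1:4)))

max_missing <- 0.5  # hypothetical threshold on the proportion of missing calls

# Proportion of missing calls per sample (row)
na_rate <- rowMeans(is.na(geno))

# Keep samples whose missing rate does not exceed the threshold
geno_filtered <- geno[na_rate <= max_missing, , drop = FALSE]
print(na_rate)
print(rownames(geno_filtered))
```

One thing I am unsure about: if a filter like this removes one member of a triplicate or duplicate, the fixed blocks `k:k1` in my loops would presumably no longer line up with the replicates.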

Thanks, Meriam