Question

R: estimating error rate of replicated samples

meriam.nef • 0

@meriamnef-19505

Hi, I am studying the genetic diversity of a population, and I'm using R to filter my genotypic data (single nucleotide polymorphisms, SNPs), construct a dendrogram, and estimate the error rate between replicated samples (duplicates/triplicates).
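
For the dendrogram step, one common base-R route is to convert IBS similarity into a distance (1 - IBS, the same conversion used in the code further down) and pass it to `hclust()`. The sketch below uses a made-up 4-sample matrix (`ibs_toy` is hypothetical data), and average linkage (UPGMA) is an assumed choice, not necessarily the method used in this analysis.

```r
# Toy 4 x 4 IBS similarity matrix (hypothetical values; 1 on the diagonal).
ids <- paste0("S", 1:4)
ibs_toy <- matrix(c(1.00, 0.98, 0.70, 0.72,
                    0.98, 1.00, 0.71, 0.69,
                    0.70, 0.71, 1.00, 0.97,
                    0.72, 0.69, 0.97, 1.00),
                  nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

d  <- as.dist(1 - ibs_toy)            # IBS distance; as.dist keeps the lower triangle
hc <- hclust(d, method = "average")   # average linkage (UPGMA)
groups <- cutree(hc, k = 2)           # cut the tree into 2 clusters
groups                                # S1 S2 S3 S4 -> 1 1 2 2
```

`plot(hc)` draws the dendrogram itself. Replicates of the same sample would normally be expected to join at a very small height, which also gives a quick visual check before computing the error rate.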

My code:

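# ibsmat: pairwise IBS similarity matrix (samples x samples)
mat <- 1 - ibsmat                     # IBS distance
mat1 <- mat[-(99:100), -(99:100)]     # drop samples 99 and 100

#ERROR RATE FOR triplicates
# Assumes rows 1-274 of mat1 hold consecutive triplicates (rows k, k+1, k+2)
x <- seq(1, 274, by = 3)
err.trip <- rep(NA_real_, length(x))
for (i in seq_along(x)) {
  k <- x[i]
  k1 <- k + 2
  if (k1 > nrow(mat1)) break          # stop at an incomplete last group
  ibx <- mat1[k:k1, k:k1]
  # print(ibx)                        # uncomment to inspect each block
  err.trip[i] <- mean(ibx[lower.tri(ibx)])   # mean of the 3 pairwise distances
}
errorrate.trip <- mean(err.trip, na.rm = TRUE)

#ERROR RATE FOR duplicates
# Assumes consecutive duplicate pairs (rows k, k+1). Rows 1-274 are the same rows
# used for the triplicates above, so set this range to the rows that hold duplicates.
x <- seq(1, 274, by = 2)
err.dup <- rep(NA_real_, length(x))
for (i in seq_along(x)) {
  k <- x[i]
  k1 <- k + 1
  if (k1 > nrow(mat1)) break          # stop at an incomplete last pair
  ibx <- mat1[k:k1, k:k1]
  # print(ibx)
  err.dup[i] <- mean(ibx[lower.tri(ibx)])    # the single pairwise distance
}
errorrate.dup <- mean(err.dup, na.rm = TRUE)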
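
Both loops above depend on replicates sitting in fixed, consecutive row positions. A sketch of a label-based alternative, assuming you have one replicate label per row of the distance matrix (in the same order): `replicate_error()` is a hypothetical helper, and the `_r<n>` ID suffix used to build the labels is an assumed naming convention. Because groups of size 1 are dropped inside the function, the matrix can contain duplicates, triplicates and non-replicated samples together.

```r
# Hypothetical helper: mean within-group distance for each replicate group.
# dist_mat: square distance matrix (e.g. 1 - IBS); groups: one label per row.
replicate_error <- function(dist_mat, groups) {
  stopifnot(nrow(dist_mat) == ncol(dist_mat), nrow(dist_mat) == length(groups))
  idx_by_group <- split(seq_along(groups), groups)            # row indices per label
  idx_by_group <- idx_by_group[lengths(idx_by_group) >= 2]    # drop unreplicated samples
  vapply(idx_by_group, function(idx) {
    block <- dist_mat[idx, idx, drop = FALSE]
    mean(block[lower.tri(block)], na.rm = TRUE)               # all pairwise distances
  }, numeric(1))
}

# Toy data (hypothetical): P01 is a triplicate, P02 a duplicate, P03 unreplicated.
ids <- c("P01_r1", "P01_r2", "P01_r3", "P02_r1", "P02_r2", "P03")
groups <- sub("_r[0-9]+$", "", ids)   # assumed naming convention: <sample>_r<n>

dist_mat <- matrix(0.5, 6, 6, dimnames = list(ids, ids))
dist_mat[1:3, 1:3] <- 0.02            # distances among the P01 replicates
dist_mat[4:5, 4:5] <- 0.04            # distance between the P02 replicates
diag(dist_mat) <- 0

err <- replicate_error(dist_mat, groups)
err          # P01 = 0.02, P02 = 0.04
mean(err)    # 0.03
```

With real data this would be called as `replicate_error(mat1, groups)`. Note that taking the mean of per-group means weights a duplicate (1 pair) the same as a triplicate (3 pairs); pooling all within-group pairs before averaging is an alternative if that weighting is not wanted.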

My questions are:

1) Should my .csv file contain only the sorted triplicates and duplicates, or all of the data (duplicates, triplicates, and non-replicated samples)?

2) Should I filter that .csv file before estimating the error rate? By filtering I mean, for example, filtering by the percentage of missing data (NA) per genotype.

3) If there is an error in my code, please feel free to point it out.
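
On question 2, one option (an assumption, not a fixed rule) is to apply the same missing-data filter used for the main analysis before computing the error rate, and to score replicate mismatches only over SNPs called in both replicates so that a missing call is not counted as an error. The sketch uses a made-up 0/1/2 genotype matrix, a hypothetical threshold `max_missing`, and a hypothetical helper `discordance()`. It counts genotype-level mismatches, which is a different measure from the allele-sharing 1 - IBS distance in the code above: a 2 vs 1 call is a full mismatch here but only a partial one under IBS.

```r
# Hypothetical genotype matrix: rows = samples, columns = SNPs,
# coded 0/1/2 (copies of one allele), NA = missing call.
geno <- rbind(
  A_r1 = c(0, 1,  2, 1,  NA),
  A_r2 = c(0, 1,  1, 1,  NA),
  B_r1 = c(2, NA, 0, 1,  NA),
  B_r2 = c(2, 0,  0, NA, 1)
)
colnames(geno) <- paste0("snp", 1:5)

max_missing <- 0.3   # hypothetical threshold; choose one for your own data

# 1) drop SNPs, then samples, whose missing rate exceeds the threshold
geno_f <- geno[, colMeans(is.na(geno)) <= max_missing, drop = FALSE]
geno_f <- geno_f[rowMeans(is.na(geno_f)) <= max_missing, , drop = FALSE]

# 2) genotype discordance between two replicates, counted only over
#    SNPs that are called (non-missing) in both
discordance <- function(g1, g2) {
  called <- !is.na(g1) & !is.na(g2)
  mean(g1[called] != g2[called])
}

discordance(geno_f["A_r1", ], geno_f["A_r2", ])   # 1 mismatch in 4 SNPs: 0.25
discordance(geno_f["B_r1", ], geno_f["B_r2", ])   # 0 mismatches in 2 shared SNPs: 0
```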

Thanks, Meriam

R SNP genotyping genetic diversity

Posted 5.3 years ago by meriam.nef