Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. [...] applies to high-coverage sequencing data, requires no prior genotype-frequency estimates, and makes no assumption on the number of alleles at a polymorphic site. Using computer simulations, we determine the depth of coverage necessary to accurately characterize polymorphisms using this second method. We applied the proposed method to high-coverage (mean 18×) sequencing data of 83 clones from a population of [...] 2010) start from genotype calls at each SNP site. In addition, when accurate genotypes are called at each SNP site, traditional statistical methods, including the four-gamete test (Hudson and Kaplan 1985) and composite disequilibrium measures (Cockerham and Weir 1977; Weir 1996), can be used to examine the pattern of linkage disequilibrium.
Despite these advantages, high-throughput sequencing technologies present some difficulties. First, sequencing error rates are high, typically ranging from 0.001 to 0.01 per read per site with commonly used sequencing platforms (Glenn 2011; Quail 2012). Second, because sequencing occurs randomly among sites, individuals, and chromosomes in diploid organisms, depths of coverage are variable at all levels. As a result, when depths of coverage are low, data are often missing, which introduces biases in subsequent population-genetic analyses unless the missing data are statistically accounted for (Pool 2010).
Many statistical methods have recently been developed to call genotypes from high-throughput sequencing data ([...] 2008, 2009b; Hohenlohe 2010; Martin 2010; McKenna 2010; Catchen 2011, 2013; DePristo 2011; Li 2011; Nielsen 2012; Vieira 2013). The performance of the widely used genotype callers in population-genomic analyses is not well understood, especially when the population deviates from Hardy-Weinberg equilibrium (HWE).
Recent studies (Kim 2011; Han 2014) found that allele frequencies estimated directly from the sequence reads are unbiased, whereas those estimated via genotype calling are biased when depths of coverage are low. These studies assumed a population in HWE. Vieira (2013) recently showed that the performance of genotype calling can be improved by first estimating inbreeding coefficients from the sequencing data and then calling genotypes using the estimated inbreeding coefficients. Unfortunately, their method is applicable only when the inbreeding coefficients are non-negative (Maruki and Lynch 2015), and does not always take full advantage of the population-level information. Negative inbreeding coefficients are common in some organisms, including asexual aphids (Delmotte 2002), [...] in permanent ponds/lakes (Hebert 1978), fruit bats (Storz 2001), partially inbreeding plant species (Brown 1979), Laysan finches (Tarr 1998), prairie dogs (Foltz and Hoogland 1983), rhesus monkeys (Melnick 1987), and water voles (Aars 2006). Furthermore, some regions under balancing selection may contain excess heterozygotes and therefore show negative inbreeding coefficients (Black and Salzano 1981; Markow 1993; Hedrick 1998; Black 2001; Ferreira and Amos 2006; Tollenaere 2008).
In this study, we develop a maximum-likelihood (ML) method for calling genotypes from high-throughput sequencing data that incorporates the prior information from a genotype-frequency estimator (GFE) (Maruki and Lynch 2015). We examine the performance of the proposed method using computer simulations under different genetic conditions, including those where HWE is violated, and compare its performance with that of other widely used methods. The results show that, at low or moderately high depths of coverage, our method yields more accurate genotype calls than the currently widely used methods, a conclusion supported by analysis of human data.
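To make the role of genotype-frequency priors concrete, the following is a minimal Python sketch of prior-weighted genotype calling at a single biallelic site. It is not the authors' implementation and not the GFE itself. The function names (genotype_priors, read_likelihoods, call_genotype), the two-allele symmetric error model, and the parameterization of priors by an allele frequency p and an inbreeding coefficient f are all assumptions made for illustration.

```python
from math import comb


def genotype_priors(p, f):
    """Genotype frequencies for major-allele frequency p and inbreeding coefficient f.

    Standard parameterization: f = 0 gives HWE, f < 0 gives excess heterozygotes.
    """
    q = 1.0 - p
    priors = {
        "MM": p * p + f * p * q,
        "Mm": 2.0 * p * q * (1.0 - f),
        "mm": q * q + f * p * q,
    }
    if min(priors.values()) < 0.0:
        raise ValueError("f is outside the valid range for this allele frequency")
    return priors


def read_likelihoods(n_major, n_minor, error_rate):
    """Binomial likelihood of the read counts under each genotype.

    Assumption (simplified): only two alleles are observed, and an error
    turns a read of one allele into the other with probability error_rate.
    """
    n = n_major + n_minor
    # Probability that a single read shows the minor allele, per genotype.
    p_minor = {"MM": error_rate, "Mm": 0.5, "mm": 1.0 - error_rate}
    return {
        g: comb(n, n_minor) * pm ** n_minor * (1.0 - pm) ** n_major
        for g, pm in p_minor.items()
    }


def call_genotype(n_major, n_minor, error_rate, priors):
    """Return the genotype with the highest prior-weighted likelihood."""
    lik = read_likelihoods(n_major, n_minor, error_rate)
    weighted = {g: priors[g] * lik[g] for g in lik}
    total = sum(weighted.values())
    posterior = {g: w / total for g, w in weighted.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Four reads, all showing the major allele, error rate 0.01.
hwe_call, _ = call_genotype(4, 0, 0.01, genotype_priors(0.5, 0.0))
clonal_call, _ = call_genotype(4, 0, 0.01, genotype_priors(0.5, -1.0))
print(hwe_call, clonal_call)
```

With the same four reads, the call changes from a homozygote under HWE priors to a heterozygote when f = -1 (every individual heterozygous), which illustrates why a caller restricted to non-negative inbreeding coefficients can mislead at low coverage.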
In addition, we develop another ML method for calling genotypes from high-coverage sequencing data, which relaxes the assumption of biallelic polymorphisms made in many existing methods. Using computer simulations, we also examine the depth of coverage necessary for identifying triallelic sites and for accurate genotype calling with the proposed method. Taking the results of the performance evaluation into account, we apply the proposed method to high-throughput sequencing data of 83 clones from a population of the microcrustacean [...] 2016). The results show that the proposed [...]
