We demonstrate the potential of differentiating embryonic and induced pluripotent stem

We demonstrate the potential of differentiating embryonic and induced pluripotent stem cells by the regularized linear and decision tree machine learning classification algorithms, based on a number of intragene methylation measures.
circumventing ethical and logistical issues of obtaining and supplying stem cells for therapy. At the same time, the functional equivalence of ESCs and iPSCs for experimental, therapeutic, or diagnostic purposes remains in question, since noticeable differences in gene expression and methylation profiles have been reported, along with a considerably higher heterogeneity of iPSCs [5]. Candidate underlying mechanisms are somatic memory [6], laboratory-specific stochasticity [7], and reprogramming aberrations [8]. Importantly, it was found that the reprogramming process produces deletions of tumor-suppressor genes, and passaging tends to produce duplications of oncogenic genes [9], which raises the question of the stability and clinical safety of iPSCs. Moreover, it was demonstrated that DNA hypermethylation in cancers preferentially targets the development-associated polycomb group (PcG) proteins and other stemness-related loci, and that the expression patterns of particularly poorly differentiated tumors are similar to those of ESCs, including repression of PcG targets (PCGTs) [10-13]. In this light, identifying markers that discriminate ESCs from iPSCs and analyzing their potential functional impact, including oncogenic impact, appears to be a promising approach. Considerable advances have been achieved by analyzing variations in the methylation profiles of ESCs and iPSCs, which revealed dozens of markers that would account for the differences [14-16]. Furthermore, there is increasing evidence of the collective nature of such methylation markers, and the first successes of large-scale machine learning analysis have been reported [17].
These studies, however, concentrated on the variations of methylation levels at separate CpG dinucleotides, which by themselves do not characterize the aggregate changes in gene methylation or its coordinated variations in groups of genes. Here, led by the results of [13], where intragene methylation measures were introduced to efficiently discriminate cancerous and normal samples by machine learning techniques, we explore their potential as descriptors for discriminating ESCs from iPSCs. We assess the applicability of the well-established regularized linear and random forest models to confirm their performance. We implement feature selection and analyze the derived sets of top-ranked genes for ESCs/iPSCs for enrichment in stemness genes and the top cancer gene methylation markers [13]. Altogether, this provides a consistent approach to uncover coordinated variations in the gene methylation profiles between embryonic and induced pluripotent stem cells and to quantify the similarity of the best discriminators found to other gene sets of known or hypothesized functionality, aiding the quality assessment of reprogramming.
2. Materials and Methods
2.1. DNA Methylation Data and Descriptors
We analyze genome-wide DNA methylation data collected via the Illumina Infinium Human Methylation 450 BeadChip [14] and available at the NCBI GEO database under the accession number GSE30654. They contain DNA methylation levels at >450,000 CpG sites, mapped onto 18,272 genes, for 31 ESC and 35 iPSC samples. The vast number of methylation values as potential features yields an extremely high-dimensional space for machine learning algorithms, additionally complicated by the relatively small number of available samples. Another difficulty is the biological interpretation of the importance of a single CpG site's methylation in distinguishing between cell types.
To overcome these difficulties we propose to describe methylation patterns at the gene level. Following [13], we implement the mean (MEAN), variance (VAR), and mean derivative (DERIV) measures, which have proved valid in cancer/normal discrimination tasks. In addition, we introduce deviation from a linear pattern (DEV) and asymmetry (ASYMM) measures. The raw methylation values are arranged as they appear along the DNA strand and identify the probes
p-value, that is, the probability that the genes from the functional group found among the best classifiers have entered this set by a random choice from the whole pool of genes. The null hypothesis is rejected if p < 0.01. To probe.
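
The two computations described in this subsection can be illustrated with a minimal Python sketch, offered under stated assumptions rather than as the authors' implementation. The function names `gene_measures` and `enrichment_p_value` are hypothetical. MEAN and VAR are taken as the plain mean and population variance of a gene's probe values. DERIV is read here as the mean absolute difference between neighbouring probes, which is an assumption about what "mean derivative" means. DEV and ASYMM are omitted because the text does not define them. The enrichment p-value is modelled as an upper-tail hypergeometric probability, which matches the "random choice from the whole pool of genes" null hypothesis but is not named in the text.

```python
from math import comb
from statistics import fmean, pvariance


def gene_measures(betas):
    """Summarise one gene's probe methylation values, ordered along the strand.

    DERIV is an ASSUMED reading of "mean derivative": the mean absolute
    difference between neighbouring probes.
    """
    if len(betas) < 2:
        raise ValueError("need at least two probes per gene")
    steps = [abs(b - a) for a, b in zip(betas, betas[1:])]
    return {
        "MEAN": fmean(betas),
        "VAR": pvariance(betas),  # population variance (assumption)
        "DERIV": fmean(steps),
    }


def enrichment_p_value(pool_size, group_size, top_size, overlap):
    """Upper-tail hypergeometric probability (ASSUMED test).

    Probability that a random draw of `top_size` genes from a pool of
    `pool_size` genes contains at least `overlap` members of a functional
    group of `group_size` genes.
    """
    upper = min(group_size, top_size)
    tail = sum(
        comb(group_size, k) * comb(pool_size - group_size, top_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(pool_size, top_size)


if __name__ == "__main__":
    # Toy probe values for a single gene (made up for illustration).
    print(gene_measures([0.1, 0.3, 0.2, 0.6]))
    # Pool size 18,272 is from the text; the other three numbers are hypothetical.
    p = enrichment_p_value(pool_size=18272, group_size=200, top_size=100, overlap=8)
    print("reject null hypothesis" if p < 0.01 else "do not reject", p)
```

Exact integer arithmetic via `math.comb` avoids a dependency on a statistics library. For large-scale use, a library routine for the hypergeometric survival function would likely be the more practical choice.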
