Person: Korotkov, Evgeny Vadimovich (Коротков, Евгений Вадимович)
Search results
Search for potential reading frameshifts in cds from Arabidopsis thaliana and other genomes
2019, Suvorova, Y. M., Skryabin, K. G., Korotkova, M. A., Korotkov, E. V., Короткова, Мария Александровна, Коротков, Евгений Вадимович
© The Author(s) 2019. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. A new mathematical method for detecting potential reading frameshifts in protein-coding sequences (cds) was developed. The algorithm is adjusted to the triplet periodicity of each analysed sequence using dynamic programming and a genetic algorithm, and it does not require any preliminary training. Using the developed method, cds from the Arabidopsis thaliana genome were analysed. In total, the algorithm found 9,930 sequences containing one or more potential reading frameshifts, which is ∼21% of all analysed sequences of the genome. The Type I and Type II error rates were estimated as 11% and 30%, respectively. Similar results were obtained for the genomes of Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Rattus norvegicus and Xenopus tropicalis. The developed algorithm was also tested on 17 bacterial genomes. We compared our results with previously obtained data on the search for potential reading frameshifts in these genomes. This study discusses the possibility that the reading frameshift is a relatively frequent mutation that could participate in the creation of new genes and proteins.
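The notion of triplet periodicity used in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm, which relies on dynamic programming and a genetic algorithm; it only assumes that triplet periodicity is summarised by a 3x4 table of base counts per codon position, measured here with mutual information (an illustrative choice). The function names `triplet_counts`, `periodicity_information` and `best_phase_shift` are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def triplet_counts(seq):
    """Return counts[phase][base], where phase = position mod 3."""
    counts = [Counter() for _ in range(3)]
    for k, base in enumerate(seq.upper()):
        if base in BASES:
            counts[k % 3][base] += 1
    return counts

def periodicity_information(seq):
    """Mutual information (in nats) between codon phase and base identity."""
    counts = triplet_counts(seq)
    total = sum(sum(c.values()) for c in counts)
    if total == 0:
        return 0.0
    phase_total = [sum(c.values()) for c in counts]
    base_total = Counter()
    for c in counts:
        base_total.update(c)
    info = 0.0
    for phase, c in enumerate(counts):
        for base, n in c.items():
            p_joint = n / total
            p_phase = phase_total[phase] / total
            p_base = base_total[base] / total
            info += p_joint * math.log(p_joint / (p_phase * p_base))
    return info

def best_phase_shift(reference, window):
    """Return the cyclic shift (0, 1 or 2) at which the window best
    matches the triplet table of the reference sequence.
    Assumption: add-one smoothing keeps every logarithm finite."""
    counts = triplet_counts(reference)
    scores = []
    for shift in range(3):
        score = 0.0
        for k, base in enumerate(window.upper()):
            if base not in BASES:
                continue
            column = counts[(k + shift) % 3]
            score += math.log((column[base] + 1) / (sum(column.values()) + 4))
        scores.append(score)
    return max(range(3), key=scores.__getitem__)
```

In this toy setting, a non-zero shift reported for a window downstream of some position hints that the reading phase has changed there, which is the kind of signal a frameshift search looks for.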
A mathematical method for the classification of promoter sequences from the A.thaliana genome
2020, Kamionskya, A. M., Korotkova, M. A., Korotkov, E. V., Короткова, Мария Александровна, Коротков, Евгений Вадимович
© Published under licence by IOP Publishing Ltd. A mathematical method for creating classes of promoter sequences has been developed. The method was used to calculate the classes of promoter sequences from the A.thaliana genome. A total of 16 statistically significant classes of promoter sequences were obtained with class sizes ranging from 8,000 to 100 promoters. The classes obtained allow us to identify potential promoter sequences in various genomes with the number of false positives not exceeding 103 per genome.
Developing mathematical method for multi alignment of DNA sequences with weak similarity
2019, Korotkov, E. V., Korotkova, M. A., Коротков, Евгений Вадимович, Короткова, Мария Александровна
© 2019 Published under licence by IOP Publishing Ltd. A new mathematical method is proposed for constructing multiple alignments of weakly similar amino acid or nucleotide sequences. The method represents a multiple alignment as a position-weight matrix (PWM) and uses global dynamic programming. An optimization procedure for the PWM is developed with the aim of finding a multiple alignment with maximum statistical significance. The method can find multiple alignments of sequences in which the number of nucleotide or amino acid substitutions exceeds 2.5. The developed approach is applied to obtain a multiple alignment of promoter sequences 600 DNA bases in length.
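As a rough illustration of the position-weight matrix representation mentioned above, the following sketch builds a log-odds PWM from a few already aligned, ungapped sequences and scores candidate windows against it. It is not the authors' method: it omits the global dynamic programming and the PWM optimization step, and it assumes uppercase A/C/G/T input, a uniform background of 0.25 and a pseudocount of 1. The names `build_pwm`, `score` and `best_window` are hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(aligned, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (one dict per column) from equal-length sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    pwm = []
    for col in range(length):
        column = [s[col] for s in aligned]
        denom = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / denom) / background)
            for b in BASES
        })
    return pwm

def score(pwm, seq):
    """Sum the column weights of seq; seq is assumed to be as long as the PWM."""
    return sum(col[b] for col, b in zip(pwm, seq))

def best_window(pwm, seq):
    """Return the start index of the highest-scoring ungapped window in seq."""
    width = len(pwm)
    return max(range(len(seq) - width + 1),
               key=lambda i: score(pwm, seq[i:i + width]))
```

For example, `best_window(build_pwm(["ACGT", "ACGA", "ACGT", "TCGT"]), "GGGACGTGG")` locates the window that matches the consensus of the four training sequences.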
Search for Tandem Repeats in the First Chromosome from the Rice Genome
2020, Kamionskaya, A. M., Korotkov, E. V., Korotkova, M. A., Коротков, Евгений Вадимович, Короткова, Мария Александровна
© 2020, Springer Nature Switzerland AG. Using the RPWM method, we searched for tandem repeats 2 to 50 nucleotides long in the rice genome. We compared the effectiveness of the RPWM method with Mreps, T-reks, Tandem Repeat Finder and ATR Hunter. About 70% of the tandem repeats found could not be found by the other algorithms. The correlation of dispersed repeats and transposons with tandem repeats was also studied in this work. We assumed that some of the dispersed repeats and transposons originated from tandem repeats.
Detection of highly divergent tandem repeats in the rice genome
2021, Kamionskya, A. M., Korotkov, E. V., Korotkova, M. A., Коротков, Евгений Вадимович, Короткова, Мария Александровна
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. Currently, there is a lack of bioinformatics approaches to identify highly divergent tandem repeats (TRs) in eukaryotic genomes. Here, we developed a new mathematical method to search for TRs, which uses a novel algorithm for constructing multiple alignments based on the generation of random position weight matrices (RPWMs), and applied it to detect TRs 2 to 50 nucleotides long in the rice genome. The RPWM method could find highly divergent TRs in the presence of insertions or deletions. Comparison of the RPWM algorithm with other methods of TR identification showed that RPWM could detect TRs in which the average number of base substitutions per nucleotide (x) was between 1.5 and 3.2, whereas the T-REKS and TRF methods could not detect divergent TRs with x > 1.5. Applied to the search for TRs in the rice genome, the RPWM method revealed that TRs occupied 5% of the genome and that most of them were 2 and 3 bases long. Using RPWM, we also revealed the correlation of TRs with dispersed repeats and transposons, suggesting that some transposons originated from TRs. Thus, the novel RPWM algorithm is an effective tool to search for highly divergent TRs in genomes.
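To make the difficulty with divergent repeats concrete, here is a deliberately naive exact-match scan for tandem repeats with periods of 2 to 50 bases. It is not RPWM and is far simpler than any of the tools named above; the function name `find_perfect_tandem_repeats` and the `min_copies` threshold are assumptions made for illustration only.

```python
def find_perfect_tandem_repeats(seq, min_period=2, max_period=50, min_copies=3):
    """Return (start, period, copies) for runs where seq[j] == seq[j - period].

    Only exact repeats are found; a run that is periodic with period p
    may also be reported for multiples of p if it is long enough.
    """
    results = []
    n = len(seq)
    for period in range(min_period, max_period + 1):
        i = 0
        while i + period * min_copies <= n:
            j = i + period
            # Extend the run while each base equals the base one period earlier.
            while j < n and seq[j] == seq[j - period]:
                j += 1
            run = j - i
            if run >= period * min_copies:
                results.append((i, period, run // period))
                i = j
            else:
                i += 1
    return results
```

A single substitution is enough to break a run: `"ACACACAGACACAC"` is reported as two separate three-copy repeats rather than one seven-copy repeat, which is why exact matching cannot recover repeats that have accumulated many substitutions.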
Multiple alignment of promoter sequences from the human genome
2020, Kamionskaya, A. M., Korotkov, E. V., Korotkova, M. A., Коротков, Евгений Вадимович, Короткова, Мария Александровна
© 2020. A new algorithm for multiple alignment of nucleotide sequences, MAHDS, has been developed. Using this algorithm, a statistically significant multiple alignment of promoter sequences from the human genome was created for the first time. Based on the constructed alignments, 25 classes of promoter sequences were created, each containing more than 100 sequences. The classes of promoters can be used to search for promoter sequences in eukaryotic genomes.