Publication:
Mathematical algorithm for identification of eukaryotic promoter sequences

dc.contributor.authorKorotkov, E. V.
dc.contributor.authorSuvorova, Y. M.
dc.contributor.authorNezhdanova, A. V.
dc.contributor.authorGaidukova, S. E.
dc.contributor.authorKorotkova, M. A.
dc.contributor.authorКороткова, Мария Александровна
dc.date.accessioned2024-12-02T10:11:33Z
dc.date.available2024-12-02T10:11:33Z
dc.date.issued2021
dc.description.abstract© 2021 by the authors. Licensee MDPI, Basel, Switzerland. Identification of promoter sequences in eukaryotic genomes by computational methods is an important task in bioinformatics. However, this problem remains unsolved, since the best algorithms have a false positive probability of 10^-3 to 10^-4 per nucleotide. As a result, a full-genome analysis may yield more false positives than annotated gene promoters. The probability of a false positive should be reduced to 10^-6 to 10^-8 to lower the number of false positives and increase the reliability of the prediction. A method for multiple alignment of promoter sequences was developed. Then, mathematical methods were developed for calculating the statistically important classes of promoter sequences. Five promoter classes were created from the rice genome. We used these classes to search for potential promoter sequences in the rice genome with fewer than 10^-8 false positives per nucleotide. The five classes contain 1740, 222, 199, 167 and 130 promoters, respectively. A total of 145,277 potential promoter sequences (PPSs) were identified. Of these, 18,563 are promoters of known genes, 87,233 PPSs intersect with transposable elements, and 37,390 PPSs were found in previously unannotated sequences. The number of false positives for a randomly shuffled rice genome is less than 10^-8 per nucleotide. The method developed for detecting PPSs was compared with some previously used approaches. The developed mathematical method can be used to search for genes, transposable elements, and transcription start sites in eukaryotic genomes.
dc.identifier.citationMathematical algorithm for identification of eukaryotic promoter sequences / Korotkov, E.V. [et al.] // Symmetry. - 2021. - 13. - № 6. - 10.3390/sym13060917
dc.identifier.doi10.3390/sym13060917
dc.identifier.urihttps://www.doi.org/10.3390/sym13060917
dc.identifier.urihttps://www.scopus.com/record/display.uri?eid=2-s2.0-85107178104&origin=resultslist
dc.identifier.urihttp://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=Alerting&SrcApp=Alerting&DestApp=WOS_CPL&DestLinkType=FullRecord&UT=WOS:000666779300001
dc.identifier.urihttps://openrepository.mephi.ru/handle/123456789/25531
dc.relation.ispartofSymmetry
dc.titleMathematical algorithm for identification of eukaryotic promoter sequences
dc.typeArticle
dspace.entity.typePublication
oaire.citation.issue6
oaire.citation.volume13
relation.isAuthorOfPublication409f2a9c-cb66-4177-bc3c-ed7c1027fe27
relation.isAuthorOfPublication.latestForDiscovery409f2a9c-cb66-4177-bc3c-ed7c1027fe27
relation.isOrgUnitOfPublication010157d0-1f75-46b2-ab5b-712e3424b4f5
relation.isOrgUnitOfPublication.latestForDiscovery010157d0-1f75-46b2-ab5b-712e3424b4f5
Files
Original bundle
Name:
W3165627755.pdf
Size:
2.52 MB
Format:
Adobe Portable Document Format