Publication:
Applying probabilistic models to C++ code on an industrial scale

Дата
2020
Авторы
Palachev, I.
Kvochko, A.
Semenov, A.
Sun, K.
Shedko, A.
Journal Title
Journal ISSN
Volume Title
Издатель
Научные группы
Организационные подразделения
Организационная единица
Институт интеллектуальных кибернетических систем
Цель ИИКС и стратегия развития - это подготовка кадров, способных противостоять современным угрозам и вызовам, обладающих знаниями и компетенциями в области кибернетики, информационной и финансовой безопасности для решения задач разработки базового программного обеспечения, повышения защищенности критически важных информационных систем и противодействия отмыванию денег, полученных преступным путем, и финансированию терроризма.
Выпуск журнала
Аннотация
© 2020 ACM.Machine learning approaches are widely applied to different research tasks of software engineering, but C/C++ code presents a challenge for these approaches because of its complex build system. However, C and C++ languages still remain two of the most popular programming languages, especially in industrial software, where a big amount of legacy code is still used. This fact prevents the application of recent advances in probabilistic modeling of source code to the C/C++ domain. We demonstrate that it is possible to at least partially overcome these difficulties by the use of a simple token-based representation of C/C++ code that can be used as a possible replacement for more precise representations. Enriched token representation is verified at a large scale to ensure that its precision is good enough to learn rules from. We consider two different tasks as an application of this representation: coding style detection and API usage anomaly detection. We apply simple probabilistic models to these tasks and demonstrate that even complex coding style rules and API usage patterns can be detected by the means of this representation. This paper provides a vision of how different research ML-based methods for software engineering could be applied to the domain of C/C++ languages and show how they can be applied to the source code of a large software company like Samsung.
Описание
Ключевые слова
Цитирование
Applying probabilistic models to C++ code on an industrial scale / Palachev, I. [et al.] // Proceedings - 2020 IEEE/ACM 42nd International Conference on Software Engineering Workshops, ICSEW 2020. - 2020. - P. 595-602. - 10.1145/3387940.3391477
Коллекции