Determining the Authorship of a Ukrainian-Language Literary Text by Means of Artificial Intelligence from Ultra-Short Excerpts
DOI:
https://doi.org/10.15802/stp2023/288289
Keywords:
authorship detection, natural language text, artificial intelligence, generative language models, ChatGPT, Bing bot, Skype, Microsoft, Bard, Google
Abstract
Purpose. The intelligent search engine Bing can be used as a method and a means of determining the author of a Ukrainian-language text. Bing helps to find information about a text fragment and its author, but the search results may be inaccurate or incomplete. The main purpose of the paper is to study how effectively state-of-the-art artificial intelligence tools establish the authorship of literary texts from ultra-short excerpts.
Methodology. Ten Ukrainian authors with a rich body of fiction reflecting various aspects of Ukrainian culture and history were selected, and random fragments of 3–7 words were drawn from different works by these authors. An experiment was conducted to determine the authorship of 2,000 fragments.
Findings. Using the Python programming language and the skpy package, we developed software that sends questions to, and receives answers from, the Bing bot built into Microsoft Skype. Each answer was checked for the name of the fragment's author and the title of the corresponding work. Ivan Franko had the highest percentage of answers in which the author's name was mentioned (65%), and Oleksandr Dovzhenko the lowest (23%). The answers were also analyzed by fragment length: as expected, the longer the fragment, the greater the likelihood of accurately identifying its authorship. Features of the author's style are manifested in 20–40% of short fragments. The remaining 60–80% may be commonly used language constructions that the author adopted from the external environment.
Originality. This work presents, for the first time, a method of checking the authorship of fragments of Ukrainian-language text using the AI-based Bing bot. A comparative analysis was performed and experiments were conducted to determine the authorship of short fragments of 3–7 words. It has been established that even quite small fragments of text bear features characteristic of the original style of an author of literary works.
Practical value. The study determines the extent to which experts in establishing the authorship of natural language texts can rely on existing state-of-the-art artificial intelligence tools combined with the extensive body of texts available on the Internet.
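The offline parts of the procedure described in the abstract (drawing random 3–7 word fragments, checking each answer for the author's name, and aggregating the share of hits by author or by fragment length) can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the record layout, and the plain case-insensitive substring match are assumptions made for illustration, and the step that sends questions to the Bing bot through skpy is omitted. The sample records are invented placeholders, not data from the study.

```python
import random
from collections import defaultdict


def sample_fragment(text, rng, min_words=3, max_words=7):
    """Return a random run of min_words..max_words consecutive words from text."""
    words = text.split()
    if len(words) < min_words:
        raise ValueError("text is shorter than the minimum fragment length")
    length = rng.randint(min_words, min(max_words, len(words)))
    start = rng.randint(0, len(words) - length)
    return " ".join(words[start:start + length])


def mentions(answer, needle):
    """Assumed check: case-insensitive substring match.

    A real check for Ukrainian would likely also need to handle
    inflected forms of names and alternative spellings.
    """
    return needle.casefold() in answer.casefold()


def hit_rates(records, key):
    """Percentage of answers that mention the author, grouped by key(record)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        hits[group] += mentions(rec["answer"], rec["author"])
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


# Invented placeholder records (hypothetical authors and answers).
demo_records = [
    {"author": "Author A", "fragment": "one two three", "answer": "This is by Author A."},
    {"author": "Author A", "fragment": "four five six seven", "answer": "I could not find it."},
    {"author": "Author B", "fragment": "eight nine ten", "answer": "Possibly author b."},
]

by_author = hit_rates(demo_records, key=lambda r: r["author"])
by_length = hit_rates(demo_records, key=lambda r: len(r["fragment"].split()))
```

Grouping through a `key` function keeps one aggregation routine for both views reported in the abstract (per author and per fragment length); a check for the title of the work could be added in the same way.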
Copyright (c) 2023 Science and Transport Progress
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright and Licensing
This journal provides open access to all of its content.
As such, copyright for articles published in this journal is retained by the authors, under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0). The CC BY license permits commercial and non-commercial reuse. Such access is associated with increased readership and increased citation of an author's work. For more information on this approach, see the Public Knowledge Project, the Directory of Open Access Journals, or the Budapest Open Access Initiative.
The CC BY 4.0 license allows users to copy, distribute, and adapt the work in any way, provided that they properly credit the author. Therefore, the editorial board of the journal does not object to published materials being placed in third-party repositories. To protect manuscripts from misappropriation, any reuse should reference the original version of the work.