AI sentiment analysis and Russia’s war in Ukraine 

By Justin Young  

AI sentiment analysis offers a potential solution to the problem of determining the attitudes of the Russian public toward the war in Ukraine. Traditional polling in an authoritarian state is limited because respondents cannot safely report opinions hostile to the state. Polls conducted in Russia are no exception: under state censorship, respondents are reluctant to voice their genuine views, potentially producing an inflated image of widespread public support for the war. AI sentiment analysis can provide more direct insight into public opinion by analyzing social media posts, but it suffers from its own biases depending on sample size, model accuracy, and translation flaws. This post examines the potential and drawbacks of AI sentiment analysis compared to traditional polling in determining Russian public opinion on the war in Ukraine.

  

Objective public polling is practically nonexistent in Russia, with the Levada Center being the only independent polling group in the country. However, the Levada Center has been listed as a foreign agent by the government and thus faces state pressure regarding its continued activity. The center polls a representative sample of 1,600 people across a broad swath of Russian cities using in-person interviews. Levada's polling points toward overwhelming public support for the war, a trend that has remained relatively constant over the twenty months since Russia began its invasion of Ukraine. According to Levada, about 40 percent of respondents strongly support the war and another 30 percent support it, with outright opposition at under 20 percent. [1] While Levada's polling is often quoted in Western media as an indicator of Russian domestic support for the war, its accuracy depends on the willingness of respondents to answer honestly, which cannot be guaranteed when legal penalties for expressing dissent in Russia have become increasingly harsh and punitive. These considerations indicate that the results of traditional polling on public opinion toward the war should be treated with a degree of skepticism.

  

In contrast, AI sentiment analysis of social media attempts to gauge the feelings of Russians toward the war by using neural networks to mine the text of social media posts. One case study published in Western media is an analysis of public opinion in Buryatia by FilterLabs AI, a for-profit organization, which concluded that Russian government propaganda was losing effectiveness at controlling local sentiment regarding mobilization. [2] Since then, FilterLabs data has been the basis of several articles in prominent Western news media such as the New York Times, where it has been used to draw conclusions about Russian public opinion on major events such as Prigozhin's coup. Articles informed by the FilterLabs data generally promote the narrative that Russian public sentiment toward government actions is more negative than might be expected, but they lack detailed methodology or any explanation of what specific data their conclusions rest on. Although FilterLabs claims to have found meaningful conclusions in its data, its lack of transparency about methodology undermines their usefulness.

  

Academic research on AI sentiment analysis offers more objective insight into its reliability. Although there is substantial scholarship on English-language sentiment analysis, the effectiveness of sentiment analysis on Russian text has been studied relatively little. Sentiment analysis has proven significantly more effective on English-language datasets than on Russian ones, since models have difficulty interpreting meaning in Russian's complicated grammatical structures. As a result, relatively few Russian-language datasets have been compiled for sentiment analysis. These complications can be partially resolved by machine translation of the source texts, although flawed translations can reduce the fidelity of the source data. Despite this drawback, models run on ChatGPT-assisted translations have often outperformed models run on the untranslated Russian text.[3] Sentiment analysis in Russian is more effective when sentiment is explicit in the dataset, as in customer reviews, achieving an F1 score (the harmonic mean of precision and recall) as high as .841.[9] However, with more ambiguous texts where sentiment is implicit or indirectly expressed, such as news, the F1 score drops to about .6, meaning a large share of the content is misclassified.[3] Social media falls between these extremes, but the fidelity of sentiment analysis also varies with the number of sentiment classes in the classification scheme and the size of the training data. A study of Russian-language tweets by Smetanin found that the F1 score of the RuBERT sentiment classification model dropped to .6675 when a smaller dataset was used, compared to a high of .82 achieved with the same model on a larger dataset in Popova and Spitsyn's earlier study.[7]
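The F1 scores cited above combine precision (what fraction of predicted positives are correct) and recall (what fraction of true positives are found) as a harmonic mean; the numeric values below are illustrative, not from the cited studies:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. a classifier with 0.85 precision and 0.83 recall
print(round(f1_score(0.85, 0.83), 3))  # 0.84
```

Because the harmonic mean is dominated by the smaller of the two inputs, a model cannot hide poor recall behind high precision (or vice versa), which is why F1 is the standard summary metric in these studies.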

 

The table below summarizes the F1 scores reported by the main studies of Russian sentiment analysis cited above. From it, it is evident that sentiment analysis of news yields conclusions not much better than random guessing, while higher fidelity can be achieved with social media and reviews. Scores can reach as high as .82 on Twitter posts with "fine tuning" of the sentiment analysis model, which adapts the model's parameters to a smaller task-specific subset of the data for greater accuracy.[6] This demonstrates that more precisely targeted sentiment analysis methods tend to show greater effectiveness.

Text type | Study | F1 score
News (implicit sentiment) | Golubev et al. [3] | ~.60
Tweets, smaller dataset | Smetanin, RuBERT [7] | .6675
Tweets, fine-tuned, larger dataset | Popova and Spitsyn [6] | .82
Customer reviews (explicit sentiment) | Kotelnikov [9] | .841
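"Fine tuning" means continuing to train an already-trained model's parameters on a small task-specific labeled set, rather than training from scratch. The following is a toy sketch of that idea using a one-feature logistic regression with invented data; real fine-tuning of models like RuBERT adjusts millions of parameters, but the mechanism is the same:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(data, w, b, lr=0.5, epochs=200):
    """Gradient-descent updates for logistic regression y ~ sigmoid(w*x + b)."""
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)
            # gradient of the log-loss with respect to w and b
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# "Pre-training" on a larger generic dataset (invented: x > 0 tends positive).
generic = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
w, b = train(generic, w=0.0, b=0.0)

# "Fine-tuning": start from the pre-trained (w, b) and adapt on a small
# task-specific subset whose decision boundary is shifted.
task = [(0.5, 0), (1.5, 1)]
w_ft, b_ft = train(task, w, b, lr=0.1, epochs=100)

# The fine-tuned model keeps the general trend but shifts its boundary,
# so it still classifies the task's positive example correctly.
print(sigmoid(w_ft * 1.5 + b_ft) > 0.5)
```

Starting from the pre-trained parameters lets the small task dataset adjust the model without having to relearn the general pattern, which is why fine-tuned models can reach higher accuracy than models trained only on the small set.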

One of the broadest Russian-language sentiment analysis datasets is RuSentiment, which draws on posts from VKontakte, the Russian analogue to Facebook.[8] With a corpus of over 20,000 posts, RuSentiment contains a broad variety of vocabulary divided into positive, negative, and neutral categories. This vocabulary is in turn reduced to vector embeddings, enabling computerized processing of large amounts of natural language. A sentiment analysis of the RuSentiment dataset by Sidorov and Slastnikov (2021) concluded that text embeddings were only effective when used on the dataset they were intended for, so the embeddings must be chosen to match the nature of the text. When used on the appropriate datasets, they achieved relatively high textual fidelity and effectiveness.[4] By leveraging AI sentiment analysis, researchers can paint a more accurate picture of Russian public opinion regarding the war, although a margin of error must be considered when drawing conclusions from the data.
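The embedding approach described above can be sketched as nearest-centroid classification: each word maps to a vector, a post is represented by the average of its word vectors, and the post is assigned to the sentiment class whose centroid is most similar. The words, two-dimensional vectors, and centroids below are invented for illustration and bear no relation to RuSentiment's actual high-dimensional embeddings:

```python
import math

# Toy 2-D "embeddings" (invented; real embeddings are learned, high-dimensional).
EMBEDDINGS = {
    "хорошо": (0.9, 0.1),    # "good"
    "отлично": (0.95, 0.05), # "excellent"
    "плохо": (0.1, 0.9),     # "bad"
    "ужасно": (0.05, 0.95),  # "terrible"
    "сегодня": (0.5, 0.5),   # "today" (neutral)
}

CENTROIDS = {
    "positive": (0.9, 0.1),
    "negative": (0.1, 0.9),
    "neutral": (0.5, 0.5),
}

def embed(text: str):
    """Average the vectors of known words; texts with no known words are neutral."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return CENTROIDS["neutral"]
    n = len(vecs)
    return tuple(sum(v[i] for v in vecs) / n for i in range(2))

def cosine(a, b) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(text: str) -> str:
    """Assign the class whose centroid is most cosine-similar to the text vector."""
    v = embed(text)
    return max(CENTROIDS, key=lambda c: cosine(v, CENTROIDS[c]))

print(classify("сегодня отлично"))  # positive
```

The mismatch problem Sidorov and Slastnikov describe is visible even in this toy: if the embedding table was built from a different domain, most words in a post fall outside the vocabulary and the averaged vector carries little signal.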

 

Resources  

[1] Levada. "Конфликт с Украиной: оценки конца августа 2023 года" [The Conflict with Ukraine: Assessments from Late August 2023], September 5, 2023. https://www.levada.ru/2023/09/05/konflikt-s-ukrainoj-otsenki-kontsa-avgusta-2023-goda/.

[2] Erol, Yayboke, Teubner, Edwards, and Stroubolis. “AI Can Tell Us How Russians Feel About the War. Putin Won’t Like the Results.” Politico, February 25, 2023. https://www.politico.com/news/magazine/2023/02/25/ai-russians-feel-war-putin-ukraine-00084145.  

[3] Golubev, Rusachenko, and Loukachevitch. "RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts." arXiv preprint, 2023. https://arxiv.org/abs/2305.17679.

[4] Sidorov, and Slastnikov. “Some Features of Sentiment Analysis for Russian Language Posts and Comments from Social Networks.” Journal of Physics: Conference Series, 2021. https://doi.org/10.1088/1742-6596/1740/1/012036.  

[5] Kotelnikova, Paschenko, and Razova. “Lexicon-Based Methods and BERT Model for Sentiment Analysis of Russian Text Corpora.” CEUR Workshop Proceedings, 2021.   

[6] Popova, and Spitsyn. "Sentiment Analysis of Short Russian Texts Using BERT and Word2Vec Embeddings." GraphiCon 2021: 31st International Conference on Computer Graphics and Vision, September 27-30, 2021, Nizhny Novgorod, Russia, 2021. https://ceur-ws.org/Vol-3027/paper109.pdf.

[7] Smetanin. "RuSentiTweet: A Sentiment Analysis Dataset of General Domain Tweets in Russian." PeerJ Computer Science, 2022. https://doi.org/10.7717/peerj-cs.1039.

[8] Rogers, Romanov, Rumshisky, Volkova, Gronas, and Gribov. "RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian." Proceedings of the 27th International Conference on Computational Linguistics, 2018. https://aclanthology.org/C18-1064.pdf.

[9] Kotelnikov. “Current Landscape of the Russian Sentiment Corpora.” Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2021,” 2021. https://arxiv.org/pdf/2106.14434.pdf.