By Krish Ganotra, Vicky Lam, Matthew Brenningmeyer - Lab Undergraduate Affiliates
Disinformation has become a much larger problem globally within the past decade, and with the increasing volume of disinformation comes a technological revolution. Automated disinformation has taken over the field, with bots amplifying, spreading, and even creating content autonomously. New technical advancements have allowed individuals as well as state actors to produce unprecedented volumes of content and reach larger audiences than ever before. These effects were visible during the 2016 election in the US, during the ongoing COVID-19 pandemic globally, in Chinese political efforts to undermine Taiwan, and in many other instances around the world.
Automated and semi-automated disinformation methods generally have three main goals: to amplify human-generated content, foster human-machine generated content, and develop machine-generated content. Bots are increasingly adept at amplifying human-created content on social media to build user audiences and overwhelm other content. Automation has also been used to create content in a human-bot paired cycle, a system often referred to as a 'cyborg', that augments the disinformation production and dissemination chain. Lastly, in emerging cases, bots can be used to generate large amounts of content autonomously after being trained on data about the most effective and convincing forms of disinformation. This last category of content is used to create a feedback loop that continuously refines disinformation methodology and content.
Content amplification by bots is by far the most common method of automation. This method relies on bots as simple amplifiers and spreaders of information. Content is created by a human, then bots provide feedback and share the content, even going so far as to tag popular users in an attempt to further amplify a message. By exploiting the algorithms of many social media platforms, bots push disinformation to a stage where unwitting humans will further spread a given piece of information they believe to be true. Bot amplification is the least sophisticated method of employing bots, but it can still enable a single individual to broadcast disinformation to thousands quickly. However, the ease of use comes at a price, as the predictable, routine activity of such bots, coupled with the extremely tight networks they form with fellow bots, makes detection easier.
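That predictability can be quantified. As a minimal illustrative sketch (not a description of any platform's actual detection pipeline), the heuristic below flags accounts whose posting intervals are unusually regular and whose retweets are concentrated on a small, repetitive cluster of partner accounts; the field names and thresholds are assumptions chosen for illustration, not tuned values.

```python
from statistics import mean, pstdev

def looks_like_amplifier(post_times, retweet_partners):
    """Heuristic bot score: near-fixed posting schedules plus a tight,
    repetitive retweet network are typical of amplification bots.
    Thresholds are illustrative assumptions, not tuned values."""
    if len(post_times) < 10 or not retweet_partners:
        return False  # not enough activity to judge

    # Regularity: coefficient of variation of the gaps between posts.
    gaps = [t2 - t1 for t1, t2 in zip(post_times, post_times[1:])]
    avg_gap = mean(gaps)
    regularity = pstdev(gaps) / avg_gap if avg_gap > 0 else 0.0

    # Network tightness: how concentrated retweets are on a few partners.
    concentration = 1 - len(set(retweet_partners)) / len(retweet_partners)

    # Low variation in timing and a recycled cluster of partner accounts
    # together suggest automated amplification rather than organic use.
    return regularity < 0.2 and concentration > 0.8
```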
The second method is more complex and uses natural language processing (NLP), a family of technologies that can analyze and subsequently create novel texts autonomously from a simple input to a pre-trained model. Generally, these cyborg setups function by having a bot create a large amount of disinformation content from a single prompt, such as 30 or 40 posts for the social media platform of choice. A human actor familiar with the technology then refines these posts, choosing the ones they believe will be the most successful and adjusting the bot's initial prompt in an attempt to produce more posts like them. Through this iterative process, the human collaborator eventually develops a large inventory of usable disinformation posts and a bot that can create them much faster than the human actor alone. These posts are then amplified using the methods described earlier, with the bottleneck of content creation minimized by the addition of autonomous content creation.
While both of the above methods help these actors create and spread content more quickly in the short term, the final method has a much longer-term goal. It utilizes a method similar to the cyborg method above, but with substantially less human input. By prioritizing quantity over quality and creating and spreading large amounts of disinformation content, it generates a large amount of data on what makes a post, article, or picture truly effective disinformation. The data generated in this process allows bot creators to manipulate input prompts to create more effective bots and to develop a feedback loop in which the data-gathering process can repeat itself and improve iteratively over time.
Understanding the NLP frameworks that these bots use for content creation is essential when analyzing the strengths and weaknesses of these strategies. Each of these models involves a constant trade-off between required processing power and efficacy. One leading private NLP model, GPT-3, is an artificial intelligence trained on over 45 terabytes of text and containing 175 billion parameters. While GPT-3 was created for general NLP applications, some malicious actors may choose to exploit the software to spread disinformation. State actors can feasibly recreate and host this type of software, but an individual or private group would have a difficult time and would most likely resort to less effective models that require fewer financial resources to create and host.
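A rough back-of-envelope calculation illustrates why hosting a model at this scale is out of reach for most individuals. Assuming 16-bit weights (an assumption made here for illustration; actual deployments vary), storing the 175 billion parameters cited above requires hundreds of gigabytes of memory before any training or serving overhead is counted:

```python
# Back-of-envelope memory footprint for a GPT-3-scale model.
PARAMS = 175e9          # 175 billion parameters, as cited above
BYTES_PER_PARAM = 2     # assumption: 16-bit (half-precision) weights

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"~{weights_gb:.0f} GB just to store the weights")  # ~350 GB
```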
One challenge is to develop models that offset human costs efficiently while still generating comprehensible content. The bar for comprehensible content is low, particularly as it pertains to disinformation disseminated via social media platforms. The shorter form of social media posts plays to the strengths of often weakly constructed NLP models. Despite their relative successes, the strengths of these models lie in quantity, not quality, and this fact is constantly reflected in their output.
Efforts undertaken by both government entities and social media platforms to monitor and regulate inauthentic behavior are consistently overwhelmed by the sheer quantity of disinformation produced and the rapidly changing nature of the social media landscape. The weaknesses inherent to content-generation bots aid human detection. As a result, some social media platforms have established war rooms to identify and respond to disinformation on their platforms. These counter-bot strategies present new problems, however, as they place the platforms in a perpetual game of after-the-fact whack-a-mole. Liberal democracies in particular are often more restricted by their laws and policies on free speech and are forced to try to staunch the tide of disinformation via other means. Unfortunately, the responses of both social media firms and governments have been largely ineffective, leaving the majority of the burden to users and creators on social media platforms.
Without users supporting and spreading disinformation, disinformation campaigns lose their momentum before they can reach their audience. Educated users may help eradicate more basic forms of disinformation, but this is becoming increasingly difficult. Recent research indicates that human users can only identify fake articles generated by GPT-3 52% of the time. Despite these individual cognitive failures to identify machine-generated disinformation, it might be possible to use larger numbers of users to identify disinformation and, in effect, foster a smart crowd. If such a crowd could harness its combined knowledge, it might be able to identify disinformation more effectively. Twitter is already beginning to test similar concepts. While these programs might be more effective, they are akin to a modified, crowd-enhanced game of whack-a-mole. Countering bots with AI is another concept increasingly gaining favor and has found corporate backing in several start-ups. The future is likely to be a mix of operations on both sides: human, human-machine, and machine. The automation of disinformation is likely in its infancy and will see improvements in both the creation and countering of disinformation.
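As a toy sketch of the smart-crowd idea (not a description of Twitter's actual system), the snippet below surfaces a label on a post only once a sufficiently large and sufficiently unanimous set of independent raters agrees; the rating values, minimum crowd size, and agreement threshold are illustrative assumptions.

```python
from collections import Counter

def crowd_label(ratings, min_raters=20, agreement=0.75):
    """Toy 'smart crowd' aggregator: return a label such as 'misleading'
    only when enough independent raters agree. Rating values, minimum
    crowd size, and agreement threshold are illustrative assumptions."""
    if len(ratings) < min_raters:
        return None  # crowd too small to trust
    counts = Counter(ratings)  # e.g. {"misleading": 18, "not_misleading": 6}
    label, votes = counts.most_common(1)[0]
    return label if votes / len(ratings) >= agreement else None
```

Even a simple aggregation rule like this shifts identification from a single fallible reader to a crowd, though it inherits the same after-the-fact limitation described above.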