Forecasting with Frequency: How I used radio broadcasts and Python to predict the 2023 DRC Election

We uncovered early warning signs using media monitoring tools.

Jan 22, 2024

On December 20, 2023, the Congolese public went to the polls amid a tense and uncertain election cycle. The incumbent president, Félix Tshisekedi, was tasked with holding free and fair elections while also securing his own political future. The alleged bias in favor of the ruling party was a key grievance of the opposition, and the weeks preceding the election were rife with violence, scandal, and logistical challenges.

Over the past several months, I have been conducting research for Rootwise, a humanitarian start-up building innovative media monitoring tools for missions in the Global South. Prior to the election, I was tasked with gathering some election insights to share with non-governmental organizations (NGOs) on the ground in the DRC.

To provide these NGOs with unique insights, I gathered data from Rootwise Frequency. Frequency is a radio broadcast monitoring tool that uses AI and NLP to automate the collection, transcription, and translation of radio news and information. It is the only scalable tool of its kind that can listen to streaming and airwaves-only broadcasts and deliver the results in near real time. With our partners, we cross-referenced insights from Frequency with what's available on social media and confirmed that radio focuses on hyperlocal issues and concerns in a way that national news and social media do not.

The report has since been published, and the election has unsurprisingly been called in favor of the incumbent president, while political opponents and some election observers have questioned the legitimacy of the vote. Many of the insights gathered in our report predicted this outcome, so I wanted to share more about the software and methodology used during this project.

Research questions

Prior to beginning the research in earnest, I worked with a small, distributed team of collaborators who were also involved with humanitarian work in the Congo. Because the audience for our report was aid groups on the ground, we wanted to focus on insights concerning political violence, ethnic fault lines, and perceptions of NGOs operating in the region. After some discussion and initial research, we decided that we wanted to know:

What are the main topics being discussed in the election? (armed actors, corruption, etc.)
Where are there geographical hotspots that might face security/access constraints due to political violence, hate speech and stigmatization?
What is the current perception of NGOs + the UN mission on the eve of the election?

With our research questions sorted, it was time to conduct some exploratory data analysis and compile the report.

Exploratory Data Analysis

The core functionality of Frequency allows users to perform a Boolean string search for transcribed and translated radio broadcast snippets and the corresponding audio clips. To whittle this large body of audio and text data down to a relevant sample, we applied a date range (the entire month of November) and a simple keyword taxonomy targeting our research topics. Since all of the original audio broadcasts are in French, I listened to each snippet to verify the machine translation and the context of each clip. After combing through the Frequency tool and exploring the relevant keywords for each research question, I exported the files in CSV format to begin the next phase of my research in a Jupyter notebook.
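The filtering described above happened inside Frequency itself, but the same idea can be sketched offline against an exported CSV. This is a minimal illustration, not Frequency's actual search logic: the column names (`date`, `translation`), the taxonomy keywords, and the sample rows are all made up for the example.

```python
import csv
import io
import re
from datetime import date, datetime

# Hypothetical taxonomy: research topic -> keywords to look for.
TAXONOMY = {
    "armed actors": ["militia", "rebels"],
    "corruption": ["corruption", "fraud"],
    "ngo perception": ["ngo", "monusco", "humanitarian"],
}

def match_topics(text, taxonomy=TAXONOMY):
    """Return the sorted topics whose keywords appear as whole words in text."""
    lowered = text.lower()
    return sorted(
        topic
        for topic, keywords in taxonomy.items()
        if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords)
    )

def filter_snippets(rows, start, end, taxonomy=TAXONOMY):
    """Keep rows dated within [start, end] that match at least one topic."""
    kept = []
    for row in rows:
        day = datetime.strptime(row["date"], "%Y-%m-%d").date()
        topics = match_topics(row["translation"], taxonomy)
        if start <= day <= end and topics:
            kept.append({**row, "topics": topics})
    return kept

# Made-up rows standing in for an exported CSV (assumed column names).
SAMPLE_CSV = """date,translation
2023-10-30,Rebels were seen near the town
2023-11-04,The candidate denounced corruption in the electoral commission
2023-11-12,The weather will be rainy this week
2023-11-20,MONUSCO convoy blocked by militia supporters
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
november = filter_snippets(rows, date(2023, 11, 1), date(2023, 11, 30))
for row in november:
    print(row["date"], row["topics"])
```

Matching on whole words matters here: a plain substring test for "ngo" would fire on every mention of "Congo".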

Low-tech topic modeling

Once the CSVs were imported into my notebook, I wanted to understand how these topics stacked up against each other in terms of relevance. I converted the CSV into a Pandas DataFrame and then split the transcribed audio snippets into individual, countable strings. While this approach gave me a high-level understanding of the most-discussed topics, it did not filter out any stopwords, making the data very noisy. The remainder of this topic modeling took place in a spreadsheet, where I could more quickly manipulate and sum the mentions. Upon completion, I used these tabulations to create a bar chart for our final report.
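A word count of this kind, with the stopword filter that was missing from the notebook, might look roughly like the sketch below. It uses the standard library's `Counter` rather than a DataFrame to stay self-contained, and the stopword list and snippets are illustrative placeholders, not project data.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "by", "on", "for"}

def count_terms(snippets, stopwords=STOPWORDS):
    """Count lowercase word tokens across snippets, skipping stopwords."""
    counts = Counter()
    for text in snippets:
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

# Made-up snippets standing in for the translated transcript column.
snippets = [
    "The militia attacked a village in the east",
    "Voters fear the militia and the delay of voter cards",
    "Corruption allegations dominate the campaign",
]

counts = count_terms(snippets)
for term, n in counts.most_common(3):
    print(term, n)
```

Note that "voters" and "voter" are counted separately here. Grouping such variants would need stemming or a hand-built mapping, which is the kind of tidying a spreadsheet pass can also handle.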

Mapping political unrest

To answer our second research question, I needed to identify clips that referenced specific locations in the DRC. Frequency does a fine job of recognizing and transcribing many locations (and also allows you to filter by location), but I made sure to conduct a manual review of locations as well. Our topic modeling exercise also helped us determine some of the hotspots. Once we had those areas identified, I took to SnazzyMaps to create a small area map for the report. In a future iteration, I would want to create a Python dictionary with all of the towns and cities in the country and use that as a lookup table for the audio clips.

Locations of potential unrest in Eastern DRC mentioned on radio, Nov 2023
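The lookup-table idea mentioned above could start out as something like the following. This is only a sketch under stated assumptions: the table holds four real towns in eastern DRC rather than the whole country, the snippets are invented, and matching is limited to single-word place names.

```python
import re
from collections import Counter

# Hypothetical lookup table: place name -> province.
# A real table would cover every town and city in the country.
PLACES = {
    "goma": "North Kivu",
    "beni": "North Kivu",
    "bukavu": "South Kivu",
    "bunia": "Ituri",
}

def find_places(text, places=PLACES):
    """Return known place names mentioned in text, in order of appearance."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t in places]

def count_place_mentions(snippets, places=PLACES):
    """Count how many snippets mention each place (once per snippet)."""
    counts = Counter()
    for text in snippets:
        counts.update(set(find_places(text, places)))
    return counts

# Made-up snippets standing in for translated radio clips.
snippets = [
    "Clashes reported near Goma and Beni overnight",
    "Polling kits have not arrived in Bunia",
    "Goma residents protest insecurity",
]

mentions = count_place_mentions(snippets)
for place, n in mentions.most_common():
    print(place.title(), PLACES[place], n)
```

Multi-word place names would need phrase matching instead of token lookup, and variant spellings in machine transcripts would still call for the manual review described earlier.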

Translation + Teleprompter

One of the final research tasks I wanted to complete before drafting the report was re-reading all of the snippets to ensure that I was thoroughly analyzing our target data. Since the Frequency browser tool has limitations concerning pagination and displaying translations alongside each other, I wanted to make the CSV data easier to digest. The end product was a short Python script that used a translation library along with a timed sleep method to go through the CSV row by row and print any headlines missed in my previous exploration of the data. This allowed our team to quickly extract official statements, figures, and key details for the final report.
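A teleprompter-style reader along these lines can be sketched in a few lines. This is not the original script: the `translate` parameter is a placeholder for whichever translation library is in use (no specific library is assumed), and the column name and French sample rows are invented for the example.

```python
import csv
import io
import time

def teleprompter(rows, column, translate=None, delay=2.0, out=print):
    """Show one row at a time, pausing between rows so a reader can keep up.

    translate is a placeholder callable (str -> str) for a translation
    library of your choice; with None, the text is shown unchanged.
    Returns the number of rows shown.
    """
    shown = 0
    for i, row in enumerate(rows, start=1):
        text = (row.get(column) or "").strip()
        if not text:
            continue  # skip rows with no transcript
        if translate is not None:
            text = translate(text)
        out(f"[{i}] {text}")
        shown += 1
        time.sleep(delay)
    return shown

# Made-up rows standing in for an exported CSV (assumed column names).
SAMPLE_CSV = """station,transcript
Radio A,Le candidat a promis la paix
Radio B,
Radio C,Les bureaux de vote ouvriront tard
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
teleprompter(rows, "transcript", delay=0.1)
```

Keeping the row number in the output makes it easy to jump back to the matching line in the CSV when a headline deserves a closer look.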

Improving conflict forecasting

In conclusion, I was able to apply a mix of data science and analytical approaches to a unique data set sourced from local radio broadcasts. The result was a set of key contributions to a report that correctly predicted several features of the election, including very poor access to polling in some regions and election-related violence in the days leading up to the vote. In future reporting, I would like to monitor a longer time horizon and apply other data science tools such as geomapping, named entity recognition, or sentiment analysis for early warning applications. The biggest takeaway from this project is that radio analysis is a critical element of any conflict forecasting framework in the Global South, and this work allows humanitarian operations teams to plan around the risks they face in the field.

AJ’s Substack
