Big Data ‘Early Alarm’ for Ukraine Abuses
System Analyzes Millions of Tweets Daily to Quickly Pinpoint Atrocities, Refugee Flows
By Chris Carroll
Photo by Rodrigo Abd/AP
From searing images of mass killings by Russian forces to accounts of families struggling to flee frontline fighting, journalists have created a kaleidoscopic view of the suffering that has engulfed Ukraine since Russia invaded—but the news media can’t be everywhere.
Social media faces no such limitations, however, and a University of Maryland researcher is part of a U.S.-Ukrainian multi-institutional team harvesting data from Twitter and analyzing it with machine-learning algorithms. The result is a real-time system that maps out humanitarian needs, displaced people, civilian resistance and human rights violations—constructed from the accounts of people in the path of the war.
The project, Data for Ukraine, sprang to life in mid-March, and has shown itself able to reveal important events a few hours ahead of Western or even Ukrainian media sources. In one instance, its tracking of civilian resistance and human rights abuses immediately identified the beginning of a major event—Russian forces firing on peaceful protesters in the southern city of Kherson—that registered as a spike on one of the main graphs on the project’s public website. The group is also providing reports to a range of nonprofit and governmental organizations seeking to aid refugees and track war crimes.
“It’s an early alarm system for human rights abuses,” says Ernesto Calvo, professor of government and politics and director of UMD’s Interdisciplinary Lab for Computational Social Science. “For it to work, we need to know two basic things: what is happening or being reported, and who is reporting those things.”
He and his lab focus on the second of those two requirements, and constructed a “community detection” system to identify the groups of Twitter users whose data the project should draw on.
Calvo, who honed his approach analyzing social media from political and environmental crises in Latin America, started with a list of about 400 verified users who tweet on relevant topics. He and his team deepened the collection by drawing on connections and followers so that millions of tweets per day now feed the system.
Knowing who to exclude—accounts started the day before the invasion, for instance, or with few long-term connections—is key, Calvo says.
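The exclusion rules Calvo describes can be pictured as a simple filter. The sketch below is illustrative only and is not the project's actual code: the `Account` fields, the `is_trusted` helper, the sample handles and both thresholds are assumptions made for the example. Only the date of the full-scale invasion, Feb. 24, 2022, is a known fact.

```python
from dataclasses import dataclass
from datetime import date

# Date of Russia's full-scale invasion of Ukraine.
INVASION_DATE = date(2022, 2, 24)


@dataclass
class Account:
    handle: str                  # hypothetical sample handle
    created: date                # account creation date
    long_term_connections: int   # assumed measure: ties that predate the war


def is_trusted(account: Account, min_age_days: int = 30,
               min_connections: int = 10) -> bool:
    """Keep accounts that existed well before the invasion and are
    well connected. Both thresholds are illustrative assumptions."""
    age_days = (INVASION_DATE - account.created).days
    return (age_days >= min_age_days
            and account.long_term_connections >= min_connections)


accounts = [
    Account("local_reporter", date(2015, 6, 1), 250),
    Account("new_account", date(2022, 2, 23), 3),      # created the day before
    Account("old_but_isolated", date(2012, 1, 1), 2),  # few long-term ties
]

trusted = [a.handle for a in accounts if is_trusted(a)]
```

In this toy data only `local_reporter` passes both checks; the account opened the day before the invasion and the long-standing but poorly connected account are dropped.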
“The objective was not to capture as much data as possible, but to make sure it’s quality data,” he says.
Other team members hail from Duke University, the University of North Carolina at Chapel Hill and the Kyiv School of Economics. Another, Olga Onuch, a Ukrainian associate professor of politics at the University of Manchester in the U.K., helped guide the initial selection of Twitter accounts and shape the list of more than 600 Ukrainian and Russian keywords the system monitors. It captures “living language,” she says: a protest, for instance, might be referred to in Ukrainian or Russian by the Soviet-era colloquialism “a meeting.”
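As a rough illustration of keyword monitoring, the sketch below tags a tweet with a topic when it contains any term from a small multilingual list. It is not the project's code or its keyword list: the three sample terms, the `KEYWORDS` table and the `tag_topics` helper are assumptions chosen for the example, with “мітинг” (Ukrainian) and “митинг” (Russian) standing in for the colloquial “meeting” Onuch mentions.

```python
import re

# Tiny illustrative keyword table; the real list has more than 600 terms.
KEYWORDS = {
    "protest": ["протест", "мітинг", "митинг"],
}


def build_pattern(terms):
    """Compile one case-insensitive pattern matching any of the terms.
    Substring matching also catches inflected forms such as 'мітингу'."""
    return re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)


PATTERNS = {topic: build_pattern(terms) for topic, terms in KEYWORDS.items()}


def tag_topics(text):
    """Return the sorted list of topics whose keywords appear in the text."""
    return sorted(topic for topic, pattern in PATTERNS.items()
                  if pattern.search(text))


sample = "У Херсоні Мітинг проти окупації"
topics = tag_topics(sample)
```

A production system would need far more than substring matching, such as handling of spelling variants and context, but the example shows why colloquial terms have to be on the list: a tweet that never uses the formal word for protest is still caught.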
Onuch says the work can help aid agencies direct resources to people fleeing fighting and, in the long term, provide documentation of abuses and atrocities for eventual justice.
“Social scientists have a duty in a time of crisis—if they have special or technical knowledge that can be useful—to use it,” she says. “Even if they can’t directly save human lives, they can use it to record what happened.”