Decoding the Narrative: Using Machine Learning to Uncover Trends in News Coverage of Police Use of Force

Last updated on May 2, 2023 15 min read topic modeling, text analysis, police, bayesian

“We shall assume that what each man [believes] is based not on direct and certain knowledge, but on pictures…given to him.” - Lippmann, 1922

Media Narratives of Police Use-of-Force

What do we know about media coverage of police use-of-force and its effect on public opinion? While it is axiomatic to say that news media influences public opinion, surprisingly little work has been done to understand how the news media frames the narrative(s) surrounding police use-of-force over time. This is not to say that no one has studied media effects on public opinion surrounding police use-of-force. Lawrence (2000)¹ and Fridkin and colleagues (2017)² are just a couple of works I have found immensely influential. However, these works typically examine media surrounding a specific controversial use-of-force event. Or, if they examine news media coverage over a longer period of time, the sample size is relatively small. This ‘small-n’ problem is a function of the constraints that conventional qualitative work places on researchers, namely time.

What is largely missing from the literature, and what impedes a better understanding of media effects on public opinion of police use-of-force, is longitudinal work with large corpora of news stories. I aimed to begin to fill in that gap in our understanding with my dissertation work. A brief and consolidated portion of that work makes up this post.

A New Corpus of News Stories

I have compiled a new dataset of 259,575 English-language media stories mentioning police use of force within the United States, covering the period of 1990 to 2021. The sources for these stories came from 2,770 different publications/platforms, including national, regional, and local newspapers (such as Wall Street Journal, Washington Post, New York Times, Tampa Tribune, Wisconsin State Journal, etc.), cable news television transcripts (FOX News, CNN, MSNBC, etc.), broadcast news transcripts (NBC, ABC, CBS, FOX), radio transcripts (NPR, etc.), magazines (Newsweek, American Rifleman, etc.), news wires (e.g., AP), and government press releases (Department of Justice, Federal Register, etc.).

Due to the computational demand for such a large corpus, I restricted the corpus to the following newspaper publications: USA Today, New York Times, Washington Post, St. Louis Post, Atlanta Journal-Constitution, and Wall Street Journal. I chose these publications guided by Enns’ (2016)³ work. Enns explains that using newspapers, while not perfect, is often the best option when working with news time-series data.

The corpus was further narrowed by restricting the analysis to headlines. Headlines were chosen as the unit of analysis for three reasons. First, research has found that more than half of news consumers do not attend to information beyond headlines. Accordingly, if there is an association with public opinion, one may expect headlines to have more of an effect than the actual stories since the majority of headline viewers do not read the body of the story and thus form their impression of the information being provided from the headlines. Second, Lawrence (2000) recognizes this in her research by suggesting that framing effects may be more detectable in the headlines than in the whole story. Finally, although of lesser importance, reducing the corpus to headlines reduces the computational demand on the machine-learning algorithms employed.

When the corpus is narrowed in the manner described above, I am left with 29,035 headlines consisting of 331,593 words, spanning 1990 to 2021. The mean word count per headline is 11.42, with a standard deviation of 7.33.

One simple but informative way to gain an initial understanding of a corpus is to plot the most frequent terms within it. The figure below plots the twenty most frequent terms within the headline corpus. One can begin to discern prevalent themes in headlines. For example, the five most frequent terms in the headlines are police officer shooting black man. Looking at the other fifteen most frequent words, one can see that ‘ferguson’ was a frequent topic, as well as ‘trump’ and ‘protests.’

However, unigram frequency analysis, while helpful, is an unsharpened instrument for understanding a corpus. Next, I move beyond using simple term frequencies as a measure by identifying latent themes that consider probabilistic word relationships.

Topic Modeling

For the particular question at hand, topic modeling is ideal. I prefer using structural topic modeling (STM) for several reasons I do not take the time to explain in this post. If you are interested in learning more about STM, you can read more about it in this paper I co-authored with Ian T. Adams. In a nutshell, STM identifies underlying latent variables through an unsupervised machine learning process in which themes are inferred from the distribution of words across topics. The topics are generated inductively from the data, utilizing a probabilistic approximation of Bayesian inference. Essentially, STM counts words, determines the probability of words appearing with other words within and across documents (i.e., headlines), and estimates the best fit for latent themes based on those probabilities.

Determining the Number of Topics

A data-driven approach was used to determine the appropriate number of topics. Measures of held-out likelihood, residual analysis, and semantic coherence were examined for models ranging from 3 to 50 topics. Based on these measures, the models containing 3 to 10 topics were retained. While these measures are efficient and interpretable, they cannot replace human judgment. Accordingly, I performed a manual examination of semantic coherence and the exclusivity of words to topics within the remaining models, finding the eight-topic model to have the best goodness of fit.

What are the News Media Narratives about Police Use-of-Force?

I examined fifty highly associated documents for each topic to better identify their meanings. Through several close readings of these documents, I identified an underlying theme for each topic.

Topic 1 — Police Controversies

Comments highly associated with Topic 1 mostly appear to report on various police-related controversies (not related to Ferguson or George Floyd):

“Fatal beating spurs brutality claims; Tape shows that police did nothing improper, Cincinnati mayor says.”

“Police don’t always tell the truth. Where are South Carolina’s body cameras? South Carolina has been thinking very slowly about equipping every officer with body cameras.”

“Jury awards $3.9 million in Rodney King beating.”

“Philadelphia tries to calm furor over police beating that was shown on tv; This is not like Rodney King case, official says”

“Body-camera footage of N.C. deputies fatally shooting Black man will not be released to the public, judge rules.”

Topic 2 — Race

Comments highly associated with Topic 2 tend to racialize headlines about police use-of-force:

“Why SPLC says White Lives Matter is a hate group but Black Lives Matter is not”

“The Capitol siege shows how White Americans can express anger that Black Americans cannot.”

“In Seattle’s Capitol Hill autonomous protest zone, some Black leaders express doubt about white allies.”

“Black American has been playing by white America’s rules. If we want reconciliation, it’s time white America shared the burden; For too long, after tragedies like Charleston, black America has had to mute our unspoken anger. We must be able to express it, and white America must confront the residual racism that still persists in our society.”

“From the Black Panthers to Black Lives Matter, the ongoing fight to end police violence against black Americans.”

Topic 3 — Protest & Reform

Comments highly associated with Topic 3 describe protests surrounding police use-of-force and calls for policing reforms:

“Protesters Sue New York City Mayor, Police Officials Over Handling of Protests; The protests earlier this year were sparked by the May 25 killing of George Floyd.”

“Marches, Rallies and Vigils Settle In as Part of Daily Life in New York”

“Louisville Reaches $12 Million Settlement With Breonna Taylor’s Family; Agreement also requires city to implement police reforms, including mandate for supervisors to sign off on search warrants.”

“As Cuomo Pushes Police Reforms, State Police Funding Rises; New York’s governor has signed an executive order mandating local departments develop a reform plan by next year or risk losing state funding.”

“A City-by-City Look at Where Things Stand Amid Days of Unrest; Cities, states impose curfews after days of protests sparked by the death of George Floyd.”

Topic 4 — Garbage Can Topic

Topic 4 is a miscellaneous topic that gathers headlines that do not contain any substance:

“Across the USA: News from every state.”

“Metro digest news from around the region”

“What’s News: World-Wide”

Topic 5 - Just the Facts

Topic 5 describes instances of police use-of-force without any editorialization and simply outlines the actions taken by officers and the tangible outcomes of those actions. Frequently, these headlines are based on statements from official sources.

“Police shoot, kill man outside Glenn Dale daycare; A man who was wielding an axe outside an in-home daycare in the Glenn Dale area was shot and killed by three Prince George’s County Police officers Thursday, authorities said”

“D.C. Police shoot, kill man in Southeast; A D.C. police officer shot and killed a man Wednesday afternoon after the man pointed a handgun at the officer, authorities said”

“Fairfax police knew that man shot by officer is mentally ill; 25-year-old held BB replica of handgun as he allegedly lunged at authorities”

“Man kills youth, injures six in attack in Kansas police kill assailant holding knife on child.”

“Man Shot Police Had Gun, Drugs; Crack, Bullets Found in Car, Officers Said”

Topic 6 - Ferguson

Topic 6 highlights the Michael Brown shooting in Ferguson, Missouri:

“Michael Brown grand jury to reconvene Monday.”

“Court: Judge hasn’t agreed to release Ferguson grand jury evidence if no indictment.”

“Marchers in St. Louis, Ferguson call for justice in death of Michael Brown.”

“Al Sharpton returning to St. Louis for Michael Brown tribute”

“Police, demonstrators gearing up for Michael Brown anniversary weekend in Ferguson.”

Topic 7 - National Politics

Topic 7 indicates the subject of police use-of-force being used in national politics:

“Trump simply doesn’t get how to talk to and about people of color; President Trump seems not to understand the relationship between police and people of color in cities.”

“Biden: Trump ignores pandemic, stokes unrest, solves neither.”

“‘Now is not the time for divisiveness’: Wisconsin governor urges Trump not to visit Kenosha.”

“The Republican Convention: Biden, Harris Reject GOP Criticism About Civil Unrest”

“Why Democrats don’t fear Trump’s ‘law and order’ bluster.”

Topic 8 - George Floyd

Topic 8 highlights the murder of George Floyd:

“What is justice? In Derek Chauvin case, a weary city that wears George Floyd’s face waits for an answer.”

“‘A clarion call for justice’: Tracee Ellis Ross, Michelle Obama, more stars honor George Floyd.”

“Derek Chauvin trial day 12: Defense witness says Chauvin’s use of force against George Floyd was ‘justified’”

“Nancy Pelosi describes the impact the Derek Chauvin trial has had on her, calls George Floyd’s death a ‘public assassination’”

“U.S. Launches Probe Into Minneapolis Police Practices After George Floyd Killing; Justice Department will investigate whether the city’s police department has engaged in a pattern of unconstitutional policing.”

What do the News Media’s Narrative Look Like Across Time?

There are two different ways of looking at the prevalence of each topic across the 32-year timespan studied (the Garbage Can Topic is omitted). Recall that structural topic modeling allows each document to represent a proportion of topics (i.e., a document may have multiple topics within it). For the purpose of the first figure below, the histograms represent the distribution of topics each document is most strongly associated with. The next figure plots loess smoothed estimates of topic proportions across time; that is, how prevalent a topic is across time and documents, rather than forcing each document into one topic like in the first figure.

When documents are not forced into one topic, we see some interesting patterns with the Controversies Topic (ebbs and flows across time with additional peaks in 1990 and 2000), the Race Topic (peaking during the Ferguson and Floyd periods), and the Protest and Reform Topic (showing a positive trend across time with a strong peak during the Floyd period).

Furthermore, the Just the Facts Topic appears to be the inverse of the Controversies Topic. That is, when policing controversies are not occurring, the Just the Facts Topic dominates, and vice versa. It is also noteworthy that the Just the Facts Topic drops to its lowest proportional levels during the Ferguson and Floyd periods. This is a different pattern than what is seen when forcing each document into one topic.

When examining the Just the Facts Topic histogram, it suggests that Just the Facts-related documents increase along with the controversial incidents of Ferguson and Floyd, suggesting that the media pays more attention to police use-of-force in general following significant incidents—a sort of forward-feeding effect. However, when examining topic proportions, a more nuanced pattern emerges. While more attention is paid to additional incidents of police use-of-force, even when official statements are being made, proportionally speaking, those headlines contain fewer official statements within them than during other periods. This suggests that other topics (i.e., Race Topic, Protest and Reform Topic, National Politics Topic, Ferguson Topic, Floyd Topic) are being combined with official statements, providing a narrative of race issues, political issues, and protest issues much more than in other periods.

Do News Media Narratives have an Association with Public Opinion?

In previous work⁴, Ian T. Adams and I demonstrated how public opinion of legally reasonable police use-of-force has steadily become more disapproving across the past four decades. We used three General Social Survey (GSS) questions that have been consistently asked since 1973 to examine public attitudes toward police use-of-force. We constructed an additive scale measuring public perceptions of legally reasonable police use-of-force by combining the following three questions:

POLHITOK: Are there are situations you can imagine in which you would approve of a policeman striking an adult male citizen?
POLESCAP: Would you approve of a policeman striking a citizen who was attempting to escape from custody?
POLATTAK: Would you approve of a policeman striking a citizen who was attacking the policeman with his fists?

What these three questions share is the presumption of legal reasonableness under the legal and professional guidelines by which police officers’ actions are judged.

The table below provides the minimum and maximum values seen across the studied period for each question. POLESCAP experienced the largest percentage increase (19.91%), with POLHITOK and POLATTAK experiencing similar increases (12.54% and 13.43%, respectively). While POLATTAK has the lowest maximum disapproval rating of all three measures, it is a stark observation that in 2021, one in five respondents in a nationally representative survey answered that they disapproved of a police officer striking someone who was physically attacking the officer.

	Minimum	Maximum	Change
POLHITOK	23.12%	35.66%	+12.54%
POLESCAP	21.92%	41.83%	+19.91%
POLATTAK	5.7%	19.13%	+13.43%

The figure below plots all three disaggregated measures alongside the public opinion measure validated through Confirmatory Factor Analysis for comparison purposes. The public opinion measure is what is used in the next analysis.

Finally, we regress the topics onto the public opinion measure. Recall, the public opinion measure is disapproval of legally reasonable police use-of-force. Thus, a positive coefficient indicates that a narrative increases public disapproval of legally reasonable use-of-force. A negative coefficient indicates that a narrative decreases disapproval of legally reasonable use-of-force.

As you can see in the plot above, all narratives except for ‘Just the Facts’ have a significant effect on disapproval of legally reasonable use-of-force. Bayesian methods are used, so one can think of the error bars as 95% of the probability distribution around the mean for each coefficient. Further, all variables are centered, so coefficients are interpreted as standard deviations.

The Floyd, Ferguson, Protest & Reform, and National Politics Topics have increased public disapproval of legally reasonable use-of-force. The Race and Controversies Topics decreased public disapproval of legally reasonable use-of-force.

Conclusion

What do the above findings tell us? There is much more than I can go into here (which is why I wrote a dissertation on this topic!). However, I’ll end with a few thoughts.

Public opinion tends to be stable, with substantial, sustained movement in one direction being rare. Public disapproval of legally reasonable police use-of-force has not been stable and consistently increasing over the past three decades.
The National Politics Topic had the largest effect on public opinion. Increasing partisanship within the US may be driving individuals to ‘take sides’ on the issue of police use-of-force in line with their identified political party’s views. However, public disapproval of legally reasonable police use-of-force has been growing amongst individuals of all partisan identifications (not shown here).
Interestingly, the Race Topic coefficient was negative. A rebound effect has been identified in past literature whereby calling attention to a particular undesirable topic (i.e., alleged racism in policing) may lead individuals to stronger resistance to that idea. I do not know if that is what is going on here, but it is one possible explanation.
Other research has found that there is often a sizable increase in unfavorable attitudes toward the police in the wake of a highly publicized use-of-force event. Therefore, the findings that the elevated attention news media provided to the events in Ferguson and events surrounding the murder of George Floyd had a high probability of affecting public opinion is plausible.

My dissertation work does not provide conclusive answers to these questions, except that there is clearly a relationship between media coverage of police use-of-force and public disapproval of legally reasonable use-of-force (and that, likely, public opinion follows media coverage and framing rather than vice versa; analyses not shown here). However, for the purposes of this post, I hope one of the primary takeaways is that while we know words and narratives matter, criminal justice research has struggled with ways in which to measure narratives. I hope that the tools briefly showcased here will lead to further development of quantitative text-based methods. I hope to have more to share on these methods in the near future.

Lawrence, R. G. (2000). The politics of force: Media and the construction of police brutality. University of California Press.↩︎
Race and Police Brutality: The Importance of Media Framing. International Journal of Communication, 11, 3394–3414.↩︎
Enns, P. K. (2016). Incarceration nation. Cambridge University Press.↩︎
Assessing public perceptions of police use-of-force: Legal reasonableness and community standards. Justice Quarterly, 37(5), 869–899. https://doi.org/10.1080/07418825.2019.1679864 ↩︎

topic modeling text analysis police bayesian

Scott M. Mourtgos, Ph.D.

Assistant Professor (Incoming)

Scott M. Mourtgos is an incoming Assistant Professor in the Department of Criminology & Criminal Justice at the University of South Carolina and a National Institute of Justice LEADS scholar. His research focuses on policing and criminal justice policy, specifically public perceptions of police use-of-force and the criminal justice system, police personnel issues and policy, investigative techniques in sexual assault cases, and crime deterrence.