Methodology Media Bias Detector

Bonisiwe Shabane

The initial focus of the Media Bias Detector is to showcase real-time dashboards of what we deem most important and tractable about the news being produced this week. Initially, we aim to analyze 10 online newspapers: Associated Press News, Breitbart News, CNN, Fox News, The Guardian, The Huffington Post, The New York Times, USA Today, The Wall Street Journal, and The Washington... We chose these publications for their mix of reach and agenda-setting influence, and we plan to add more publications going forward. We capture each publisher’s homepage five times daily, at 6 AM, 10 AM, 2 PM, 6 PM, and 10 PM EST. We then use article placement on the homepage to identify the 20 most prominent articles displayed to readers. We score each article’s prominence as a combination of its distance from the top of the page, the size of its font, and the inclusion and size of figures.
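The ranking step described above can be sketched in Python. This is a minimal illustration, not the project's actual implementation: the `HomepageItem` fields, the normalization, and the weights (0.5, 0.3, 0.2) are all assumptions made for the example, since the text names the three signals but not how they are combined.

```python
from dataclasses import dataclass

# Snapshot hours from the text (EST, 24-hour clock).
SNAPSHOT_HOURS_EST = (6, 10, 14, 18, 22)


@dataclass
class HomepageItem:
    """One article link found on a homepage snapshot (hypothetical fields)."""
    url: str
    top_px: float          # distance from the top of the page, in pixels
    font_px: float         # headline font size, in pixels
    figure_area_px: float  # area of the attached figure, 0 if none


def prominence(item, page_height, max_font, max_figure, weights=(0.5, 0.3, 0.2)):
    """Combine the three signals into one score in [0, 1]. Weights are assumed."""
    w_pos, w_font, w_fig = weights
    position = 1.0 - min(item.top_px / page_height, 1.0)  # higher on page scores more
    font = item.font_px / max_font
    figure = item.figure_area_px / max_figure if max_figure > 0 else 0.0
    return w_pos * position + w_font * font + w_fig * figure


def top_articles(items, k=20):
    """Return the k most prominent items from one homepage snapshot."""
    if not items:
        return []
    page_height = max(i.top_px for i in items) or 1.0
    max_font = max(i.font_px for i in items) or 1.0
    max_figure = max(i.figure_area_px for i in items)
    ranked = sorted(
        items,
        key=lambda i: prominence(i, page_height, max_font, max_figure),
        reverse=True,
    )
    return ranked[:k]
```

Normalizing each signal against the largest value seen on the same page keeps the score comparable across publishers with very different layouts; other normalizations would work equally well for this sketch.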

Currently, we disregard content that is not primarily text, such as videos, podcasts, and photo galleries. Next, we recover the article text, title, and date of publication. We preprocess the text so that superfluous material is not passed to the large language model: we remove advertisements, direct mentions of the publisher outside the context of the story, and repetitive phrases that are irrelevant to the article (e.g. “Listen 5 minutes”, “Click Here for More Information”, “Enter your email address”). We continue to grow this dictionary of boilerplate phrases as we collect more data.
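A dictionary-based cleanup like the one described above might look as follows. This is a sketch under stated assumptions: the three phrases come from the text, but the function names, the case-insensitive matching, and the whitespace handling are illustrative choices, not the project's documented behavior.

```python
import re

# Example phrases quoted in the text; the real dictionary is larger and growing.
BOILERPLATE_PHRASES = [
    "Listen 5 minutes",
    "Click Here for More Information",
    "Enter your email address",
]


def strip_boilerplate(text, phrases=BOILERPLATE_PHRASES):
    """Remove known boilerplate phrases, then collapse leftover whitespace."""
    if phrases:
        # Longest phrases first, so a longer phrase wins over a shorter prefix.
        escaped = sorted((re.escape(p) for p in phrases), key=len, reverse=True)
        pattern = re.compile("|".join(escaped), flags=re.IGNORECASE)
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `strip_boilerplate("Listen 5 minutes The council voted on Tuesday.")` returns `"The council voted on Tuesday."`. Literal phrase matching is deliberately conservative: it will not remove variants such as a different listening time unless they are added to the dictionary.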

Next, we use GPT to generate labels for each document. The labels are generated at two levels: the article level and the sentence level. For simpler tasks, such as determining the topic, we use GPT-3.5 Turbo; for more complex tasks, we use GPT-4o. In the following sections, we summarize the extracted features and provide the exact prompts used for each task. Starting in 2025, we use a new methodology that aims to systematically assess media outlets’ ideological bias and factual reliability. It uses a comprehensive, weighted scoring system to evaluate political, social, and journalistic dimensions.
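The two-level labeling and the split between a cheaper and a stronger model, described above, can be sketched as a planning step that decides which model handles which task. The sketch makes no API calls. Only the "topic" task and the two model names come from the text; the `LabelRequest` record, the function names, and any other task name (such as "tone" in the usage note) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SIMPLE_MODEL = "gpt-3.5-turbo"   # used for simpler tasks such as topic
COMPLEX_MODEL = "gpt-4o"         # used for more complex tasks
SIMPLE_TASKS = frozenset({"topic"})  # assumption: only "topic" is named in the text


@dataclass(frozen=True)
class LabelRequest:
    """One planned labeling call (hypothetical record, not a real API object)."""
    article_id: str
    level: str                       # "article" or "sentence"
    task: str
    model: str
    text: str
    sentence_index: Optional[int] = None


def model_for_task(task):
    """Route a task to the simple or the complex model."""
    return SIMPLE_MODEL if task in SIMPLE_TASKS else COMPLEX_MODEL


def plan_requests(article_id, article_text, sentences, article_tasks, sentence_tasks):
    """Build article-level requests first, then one request per sentence and task."""
    requests = [
        LabelRequest(article_id, "article", task, model_for_task(task), article_text)
        for task in article_tasks
    ]
    for index, sentence in enumerate(sentences):
        for task in sentence_tasks:
            requests.append(
                LabelRequest(article_id, "sentence", task, model_for_task(task), sentence, index)
            )
    return requests
```

Calling `plan_requests("a1", text, sentences, ["topic"], ["tone"])` yields one article-level request routed to the simpler model and one sentence-level request per sentence routed to the stronger one. Keeping routing separate from the actual model call makes it easy to change which tasks count as simple.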

The 2025 methodology is designed to give an accurate and transparent assessment of a source’s political alignment and commitment to factual reporting, so that readers gain a better understanding of media credibility and bias. (All reviewed and re-reviewed sources are subject to this methodology beginning Jan 1, 2025.) Bias is inherently subjective, and no universally accepted scientific formula exists to measure it, so our methodology uses objective indicators to approximate and represent bias. Each evaluated source is placed on a bias scale, with a yellow dot marking its position. This is complemented by a “Detailed Report” that explains the source’s characteristics and the reasoning behind its bias rating. While this updated methodology reduces the influence of a strictly U.S.-centric political spectrum, it remains primarily tailored to the political landscape of the United States.

Tailoring the scale to the United States keeps evaluations relevant to a significant audience, while acknowledging that some biases in the U.S. context may not apply exactly in other countries, where terms like “Liberal” may have a different meaning. Readers should keep this in mind when comparing sources from countries with different political systems.

[Figure: bias scale for a left-leaning source]

[Figure: bias scale for a strongly right-leaning source]
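To make the idea of a weighted score placed on a bias scale concrete, here is a small sketch. Everything numeric in it is an assumption for illustration: the text names the political, social, and journalistic dimensions but gives neither their weights nor the range of the scale, so the weights in the usage note and the -10 (left) to +10 (right) range are hypothetical.

```python
def weighted_bias_score(scores, weights):
    """Weighted average of per-dimension scores; weights need not sum to 1."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weight for name, weight in weights.items()) / total


def scale_position(score, low=-10.0, high=10.0):
    """Map a score to a 0..1 position along the scale (0 = far left, 1 = far right).

    The -10..+10 range is an assumed convention, not taken from the text.
    """
    clamped = max(low, min(high, score))
    return (clamped - low) / (high - low)
```

With hypothetical scores `{"political": -6, "social": -4, "journalistic": -2}` and hypothetical weights `{"political": 0.5, "social": 0.3, "journalistic": 0.2}`, the weighted score is about -4.6, which `scale_position` places roughly 27% of the way from the left edge of the scale.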
