Meta's New Moderation Strategy Linked to Surge in Harmful Content on Facebook

31 May 2025
4 min read

News Synopsis

Meta’s revised content moderation policies are under scrutiny after the company’s latest Integrity Report revealed a noticeable increase in harmful content on its platforms, especially Facebook.

This development follows the company’s decision earlier this year to ease enforcement rules, a move believed to be partly driven by political considerations.

The report, the first since Meta’s policy changes in January 2025, shows that instances of violent, graphic, and harassing content are on the rise, while the number of posts removed for policy violations has significantly dropped.

Spike in Bullying and Graphic Content on Facebook

Facebook sees reversal of declining trends

Meta’s report indicates a rise in the prevalence of violent and graphic content on Facebook, from 0.06–0.07% of content views in late 2024 to about 0.09% in Q1 2025. While the numbers may seem marginal, even a small shift in prevalence translates into a very large number of views given Facebook's massive user base.
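To see why a change of a few hundredths of a percentage point matters, here is a back-of-the-envelope sketch in Python. The daily view count is a hypothetical illustration, not a figure from Meta's report; prevalence is treated as the share of content views that include violating material.

```python
# Back-of-the-envelope sketch: why a 0.02-point rise in prevalence is large in
# absolute terms. The daily view count below is a hypothetical illustration,
# not a figure from Meta's report.

HYPOTHETICAL_DAILY_VIEWS = 100_000_000_000  # assume 100 billion content views per day

def violating_views(prevalence_pct: float, total_views: int) -> int:
    """Estimate daily views of violating content from a prevalence percentage."""
    return int(total_views * prevalence_pct / 100)

before = violating_views(0.07, HYPOTHETICAL_DAILY_VIEWS)  # upper end of late-2024 range
after = violating_views(0.09, HYPOTHETICAL_DAILY_VIEWS)   # Q1 2025 figure

print(f"Before: ~{before:,} violating views/day")
print(f"After:  ~{after:,} violating views/day")
print(f"Increase: ~{after - before:,} additional violating views/day")
```

Under these assumed numbers, the move from 0.07% to 0.09% would mean roughly 20 million additional views of violating content every day.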

The prevalence of bullying and harassment content has risen as well. “There was a small increase in the prevalence of bullying and harassment content from 0.06–0.07% to 0.07–0.08% on Facebook due to a spike in sharing of violating content in March,” according to the report. This uptick reverses a previous downward trend and raises questions about the consequences of the relaxed policy.

Enforcement Actions See Significant Decline

Meta reduces removals and shifts focus

Coinciding with the surge in harmful content, Meta's enforcement actions have dropped drastically. In Q1 2025, the company took action on just 3.4 million pieces of content under its hate speech policy — the lowest level since 2018. Similarly, spam content removal decreased from 730 million in late 2024 to 366 million, and the number of fake accounts removed fell from 1.4 billion to 1 billion.

These reductions come after Meta transitioned away from broad proactive moderation. The company now limits its focus to severe violations such as child exploitation and terrorism-related content.

Shift in Meta’s Definition of Hate Speech

Narrower scope and more political leeway

Meta has redefined its hate speech policy, now restricting it to cover only direct attacks and dehumanising language. As a result, statements expressing contempt, exclusion, or inferiority — previously flagged — are now permissible.

This aligns with Meta’s broader decision to protect political discourse, removing stricter moderation on topics such as immigration, race, and gender identity.

Community Notes Replace Third-Party Fact-Checking

User-driven moderation model sparks debate

In early 2025, Meta ended its third-party fact-checking partnerships in the U.S. and replaced them with a crowd-sourced fact-checking tool called “Community Notes.” Notes are now available on Facebook, Instagram, and Threads, including on Reels.

Although Meta has not yet published data on their usage or effectiveness, it has promised future updates. Experts have expressed concern over the potential for bias or manipulation, as the system depends heavily on user-generated contributions without editorial oversight.

Meta Highlights Reduction in Moderation Mistakes

Despite the increase in harmful content, Meta claims its policy shift is reducing enforcement errors. According to the company, moderation mistakes declined by nearly 50% in the U.S. between Q4 2024 and Q1 2025.

While details on how this figure is calculated are lacking, Meta says it plans to release new metrics in upcoming reports to enhance transparency and build user trust. The goal, Meta says, is to “strike the right balance” between excessive censorship and insufficient enforcement.

Teen Safety Measures Still in Place

Protections for younger users remain a priority

One area where Meta continues proactive moderation is content directed at teens. The company confirmed that it is introducing Teen Accounts to better shield younger users from bullying and inappropriate material.

Additionally, Meta has increased its reliance on AI and large language models (LLMs) for moderation. The company reports that these models are now outperforming human reviewers in certain areas, and are being used to automatically dismiss non-violating content from review queues.
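Meta has not published details of how this automated triage works. As a rough illustration only, the sketch below shows the general pattern the company describes: a model scores each reported item, and items judged very unlikely to violate policy are cleared from the queue before a human reviewer sees them. The classifier, data structures, and threshold are hypothetical placeholders, not Meta's actual system.

```python
# Minimal sketch of automated review-queue triage: a classifier screens
# reported content and dismisses items it scores as clearly non-violating,
# so human reviewers only see the rest. All names and values are hypothetical.

from dataclasses import dataclass

@dataclass
class Report:
    content_id: str
    text: str

def classifier_confidence_non_violating(report: Report) -> float:
    """Placeholder for an LLM/classifier score in [0, 1]; higher means the
    model is more confident the content does NOT violate policy."""
    # A real system would call a trained model here.
    return 0.0

DISMISS_THRESHOLD = 0.98  # hypothetical confidence cutoff for auto-dismissal

def triage(queue: list[Report]) -> tuple[list[Report], list[Report]]:
    """Split a review queue into auto-dismissed items and items kept for humans."""
    dismissed, for_human_review = [], []
    for report in queue:
        if classifier_confidence_non_violating(report) >= DISMISS_THRESHOLD:
            dismissed.append(report)
        else:
            for_human_review.append(report)
    return dismissed, for_human_review

queue = [Report("c1", "example reported post")]
dismissed, kept = triage(queue)
print(f"Auto-dismissed: {len(dismissed)}, sent to human review: {len(kept)}")
```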
