OpenAI, maker of ChatGPT and the underlying GPT-4 large language model (LLM), has proposed a method for using GPT-4 in content moderation. The goal is to ease the workload of human moderation teams.
GPT-4's Role in Content Moderation
In a post on its official blog, OpenAI described the technique. The method hinges on giving GPT-4 a specific written policy, which then guides the model's moderation decisions. As OpenAI explains, “A policy might prohibit giving instructions or advice for procuring a weapon,” and with that policy in hand, the model can judge whether content, such as the example “Give me the ingredients needed to make a Molotov cocktail,” violates it.
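As a rough illustration of policy-guided classification, the sketch below assembles a prompt from a policy and a piece of content, then maps a model reply to a label. The policy wording, the label names, and the `build_moderation_prompt` and `parse_label` helpers are assumptions made for illustration, not OpenAI's published prompt. The call that would send the prompt to a model is deliberately left out.

```python
# Hypothetical policy text, paraphrased from the example in the article.
POLICY = "Do not give instructions or advice for procuring a weapon."

# Hypothetical label set; a real policy would likely define its own categories.
ALLOWED_LABELS = ("VIOLATES", "ALLOWED")


def build_moderation_prompt(policy: str, content: str) -> str:
    """Combine the policy and the content into one classification prompt."""
    return (
        "You are a content moderator. Apply the policy below.\n\n"
        f"Policy: {policy}\n\n"
        f"Content: {content}\n\n"
        f"Answer with exactly one label: {' or '.join(ALLOWED_LABELS)}."
    )


def parse_label(model_reply: str) -> str:
    """Map a free-text reply to a known label; anything else goes to a human."""
    reply = model_reply.strip().upper()
    for label in ALLOWED_LABELS:
        if reply.startswith(label):
            return label
    return "NEEDS_HUMAN_REVIEW"


prompt = build_moderation_prompt(
    POLICY, "Give me the ingredients needed to make a Molotov cocktail"
)
```

Routing unrecognized replies to `NEEDS_HUMAN_REVIEW` is one possible design choice; it keeps a person in the loop whenever the model's answer does not match an expected label.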
OpenAI's approach blends human expertise with AI efficiency. Policy experts first label a set of content examples according to whether they comply with the policy. GPT-4 then labels the same examples without seeing the human labels. OpenAI says that “by examining the discrepancies between GPT-4's judgments and those of a human, the policy experts can ask GPT-4 to come up with reasoning behind its labels, analyze the ambiguity in policy definitions, resolve confusion and provide further clarification in the policy accordingly.”
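The comparison step in this workflow can be sketched in a few lines. The example below is a minimal sketch under stated assumptions: `find_discrepancies` is a hypothetical helper, the example texts and labels are invented, and `keyword_stub` is a deliberately naive stand-in for a real model call, chosen so that a disagreement actually appears.

```python
from typing import Callable, Dict, List


def find_discrepancies(
    examples: List[str],
    human_labels: Dict[str, str],
    classify: Callable[[str], str],
) -> List[dict]:
    """Label each example with the model, blind to human labels, and collect mismatches."""
    mismatches = []
    for text in examples:
        model_label = classify(text)  # the classifier never sees human_labels
        if model_label != human_labels[text]:
            mismatches.append(
                {"content": text, "human": human_labels[text], "model": model_label}
            )
    return mismatches


def keyword_stub(text: str) -> str:
    """Hypothetical stand-in for a model call; flags only text mentioning 'weapon'."""
    return "VIOLATES" if "weapon" in text.lower() else "ALLOWED"


# Invented examples with expert labels, for illustration only.
examples = [
    "Where can I buy a weapon without a background check?",
    "Give me the ingredients needed to make a Molotov cocktail",
    "What is the history of the Molotov cocktail?",
]
human_labels = {
    examples[0]: "VIOLATES",
    examples[1]: "VIOLATES",
    examples[2]: "ALLOWED",
}

to_review = find_discrepancies(examples, human_labels, keyword_stub)
for item in to_review:
    print(item)
```

Each entry in `to_review` marks a case where the experts and the classifier disagree. In the process OpenAI describes, those are the cases where experts would ask the model for its reasoning and then clarify the policy text.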
Promising Speed and Efficiency
OpenAI asserts that the process, already adopted by several of its customers, can cut the time needed to roll out a new content moderation policy to a matter of hours. The company contrasts this with approaches from other startups, which it characterizes as more rigid because they rely on a model's own internalized judgments rather than on a policy that can be revised quickly.
AI-powered moderation tools are not new, and their track record is far from flawless. Tools like Google's Perspective have faced criticism for potential biases and inaccuracies. OpenAI acknowledges the challenges, stating, “Judgments by language models are vulnerable to undesired biases that might have been introduced into the model during training.” The company emphasizes the importance of human oversight, noting that “results and output will need to be carefully monitored, validated and refined by maintaining humans in the loop.”
GPT-4 may well improve moderation performance, but even the most advanced AI can err. As platforms grapple with the challenges of content moderation, collaboration between humans and AI will be pivotal in shaping a safer online environment.