OpenAI has released new model safety guidance in a document called the Model Spec, which sets out how models in the OpenAI API and ChatGPT are expected to behave. The guidance is intended to help machine learning researchers and data labelers refine models through reinforcement learning from human feedback (RLHF). It stipulates that generative AI assistants should avoid producing not-safe-for-work (NSFW) content, including erotica, extreme gore, slurs, and unsolicited profanity.
Exploring NSFW Content Generation
Despite these limits, OpenAI has begun exploring whether such content could be generated in age-appropriate contexts. The organization aims to give developers and users the flexibility to use its services according to their specific needs, within the bounds of OpenAI's usage policies. The exploration is meant to help the company better understand what users and society expect of AI models when it comes to NSFW content.
Partnerships and Policy Implications
OpenAI is actively forming partnerships with publishers to boost content discoverability through ChatGPT and other services, proposing attribution links in AI-generated responses in exchange for licensing their content. This approach suggests a shift toward a pay-to-play monetization model similar to that of web search engines. OpenAI's usage policies currently restrict sexually explicit or suggestive content for minors but permit legal NSFW material. The company has also run into problems such as inappropriate material in training datasets, which underscores how difficult content management is for AI models.
OpenAI says it is committed to preventing the creation of AI-generated pornography and deepfakes, with a focus on robust safeguards and the protection of children. The company says its exploration of NSFW content generation takes into account how sexuality can be discussed in contexts appropriate to different age groups.