These metrics, benchmarked against a diverse dataset of academic and proprietary (privately generated) samples, allow us and others to assess how well our models perform across a range of tasks and use cases, and they provide a clear baseline for future improvements. This transparency is especially important in the context of generative AI, where outputs are often complex, nuanced, and subject to interpretation.