Cloudflare Unveils Unified Inference Layer: One API to Access All AI Models
Cloudflare has launched a unified inference layer that lets developers reach AI models from many providers through a single API. The new feature, called AI Gateway, aims to simplify building and deploying AI-powered applications by removing the need to integrate a separate API for each provider.
With AI Gateway, developers can call third-party models using the same AI.run() binding they already use for Workers AI. Switching from a Cloudflare-hosted model to one from OpenAI, Anthropic, or another supported provider is a one-line change. The platform currently supports over 70 models across more than a dozen providers.
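In a Worker, that one-line change is the model string passed to the binding. The sketch below uses a plain function with an `aiBinding` parameter standing in for the `env.AI` binding, and the model identifiers are illustrative examples rather than a guaranteed list; consult Cloudflare's documentation for the exact names.

```javascript
// Sketch: the only change when moving between providers is the model
// identifier passed to the binding. `aiBinding` stands in for env.AI.
async function summarize(aiBinding, prompt, model = "@cf/meta/llama-3.1-8b-instruct") {
  // Switching to a third-party model is a one-line change at the call site,
  // e.g. summarize(env.AI, prompt, "openai/gpt-4o-mini")
  return aiBinding.run(model, { prompt });
}
```

Because the provider choice is just an argument, swapping models requires no change to request handling, parsing, or error paths.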
The unified API also lets developers manage all their AI spend in one place, providing a centralized view of costs for better budgeting and resource allocation. The platform additionally supports custom metadata, so developers can break down costs by attributes such as user type or workflow.
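One way this can work is by attaching metadata to each gateway request as a JSON header. The sketch below assumes a `cf-aig-metadata` header carrying the attributes; the header name and attribute keys are assumptions based on AI Gateway's custom-metadata feature, so check the current docs before relying on them.

```javascript
// Sketch: attach cost-tracking metadata to an AI Gateway request.
// The "cf-aig-metadata" header name and the attribute keys below are
// illustrative assumptions; verify against Cloudflare's documentation.
function withCostMetadata(headers, metadata) {
  return {
    ...headers,
    "cf-aig-metadata": JSON.stringify(metadata), // e.g. user type, workflow
  };
}

const headers = withCostMetadata(
  { "content-type": "application/json" },
  { userType: "free-tier", workflow: "ticket-summarization" }
);
```

Tagging requests this way means spend can later be filtered or aggregated by any attribute the application cares about, rather than only by provider or model.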
Cloudflare is also expanding its model catalog to include image, video, and speech models, enabling multimodal applications. The company has partnered with model providers including Alibaba Cloud, AssemblyAI, ByteDance, and Google to bring their models onto the platform.
Cloudflare is also working on letting developers "bring their own model" to Workers AI using Replicate's Cog packaging technology. This will let customers package their own machine learning models and deploy them through the Workers AI API.
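A Cog package is typically described by a `cog.yaml` file that pins the build environment and points at a predictor class. The fragment below is a minimal sketch; the Python version, package pins, and predictor path are illustrative, not requirements from Cloudflare.

```yaml
# cog.yaml — minimal sketch; versions and packages are illustrative
build:
  python_version: "3.11"
  python_packages:
    - torch==2.3.0
predict: "predict.py:Predictor"
```

Packaging the model this way bundles its runtime dependencies with it, which is what makes it portable enough to hand off to a hosted platform.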
The launch of AI Gateway comes as part of Cloudflare's ongoing effort to make it easier for developers to build and deploy AI-powered applications. The platform is designed to be fast, reliable, and scalable, with automatic failover capabilities in case one provider goes down.