Akash Network Rolls Out AkashML, First Fully Managed AI Inference Service On Decentralized GPUs

In Brief

Akash Network has launched AkashML, offering OpenAI‑compatible APIs, global low‑latency access, and up to 85% cost savings for deploying LLMs.

Akash Network, a cloud computing marketplace, has introduced the first fully managed AI inference service operating entirely on decentralized GPUs. This new service removes the operational challenges previously faced by developers in managing production-grade inference on Akash, providing the advantages of decentralized cloud computing without the need for hands-on infrastructure management.

At launch, AkashML offers managed inference for models including Llama 3.3-70B, DeepSeek V3, and Qwen3-30B-A3B, available for immediate deployment and scalable across more than 65 datacenters globally. The setup delivers instant global inference, predictable pay-per-token pricing, and improved developer productivity.

Akash has supported early AI developers and startups since the rise of AI applications following OpenAI’s initial advancements. Over the past few years, the Akash Core team has collaborated with clients such as brev.dev (acquired by Nvidia), VeniceAI, and Prime Intellect to launch products serving tens of thousands of users. While these early adopters were technically proficient and could manage infrastructure themselves, feedback indicated a preference for API-driven access without handling the underlying systems. This input guided the development of a non-public AkashML version for select users, as well as the creation of AkashChat and AkashChat API, paving the way for the public launch of AkashML.

AkashML To Cut LLM Deployment Costs By Up To 85%

The new solution addresses several key challenges that developers and businesses encounter when deploying large language models. Traditional cloud solutions often involve high costs, with reserved instances for a 70B-class model exceeding $0.13 per million input tokens and $0.40 per million output tokens, while AkashML leverages marketplace competition to reduce expenses by 70-85%. Operational overhead is another barrier: packaging models, configuring vLLM or TGI servers, managing shards, and handling failovers can take weeks of engineering time. AkashML simplifies this with OpenAI-compatible APIs that allow migration in minutes without code changes.
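To put the quoted rates in concrete terms, the following sketch applies the article's figures to a hypothetical monthly workload. The workload size is an illustrative assumption; only the per-million-token rates and the 70-85% savings range come from the article.

```python
# Centralized reserved-instance rates for a 70B-class model, per the article.
CENTRALIZED_INPUT_PER_M = 0.13   # USD per million input tokens
CENTRALIZED_OUTPUT_PER_M = 0.40  # USD per million output tokens

def monthly_cost(input_m_tokens, output_m_tokens, in_rate, out_rate):
    """Total monthly cost in USD for a given token volume (in millions)."""
    return input_m_tokens * in_rate + output_m_tokens * out_rate

# Hypothetical example workload: 500M input tokens, 100M output tokens/month.
centralized = monthly_cost(500, 100, CENTRALIZED_INPUT_PER_M, CENTRALIZED_OUTPUT_PER_M)

# Applying the article's claimed 70-85% marketplace savings band:
cost_at_70_pct_savings = centralized * (1 - 0.70)
cost_at_85_pct_savings = centralized * (1 - 0.85)

print(f"Centralized:            ${centralized:.2f}/month")
print(f"At 70% savings:         ${cost_at_70_pct_savings:.2f}/month")
print(f"At 85% savings:         ${cost_at_85_pct_savings:.2f}/month")
```

For this example workload, the centralized bill comes to $105.00/month, versus roughly $15.75-$31.50/month across the claimed savings band.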

Latency is also a concern with centralized platforms that require requests to traverse long distances. AkashML directs traffic to the nearest of over 80 global datacenters, delivering sub-200ms response times suitable for real-time applications. Vendor lock-in limits flexibility and control over models and data; AkashML uses only open models such as Llama, DeepSeek, and Qwen, giving users full control over versioning, upgrades, and governance. Scalability challenges are mitigated by auto-scaling across decentralized GPU resources, maintaining 99% uptime and removing capacity limits while avoiding sudden price spikes.

AkashML is designed for fast onboarding and immediate ROI. New users receive $100 in AI token credits to experiment with all supported models through the Playground or API. A single API endpoint supports all models and integrates with frameworks like LangChain, Haystack, or custom agents. Pricing is transparent and model-specific, preventing unexpected costs. High-impact deployments can gain exposure through Akash Star, and upcoming network upgrades including BME, virtual machines, and confidential computing are expected to reduce costs further. Early users report three- to five-fold reductions in expenses and consistent global latency under 200ms, creating a reinforcing cycle of lower costs, increased usage, and expanded provider participation.

Getting started is simple: users can create a free account at playground.akashml.com in under two minutes, explore the model library including Llama 3.3-70B, DeepSeek V3, and Qwen3-30B-A3B, and see pricing upfront. Additional models can be requested directly from the platform. Users can test models instantly in the Playground or via the API, monitor usage, latency, and spending through the dashboard, and scale to production with region pinning and auto-scaling.
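Because the service advertises OpenAI-compatible APIs, migrating an existing integration should mostly amount to pointing a client at a different base URL. The sketch below builds a standard OpenAI-style chat-completions request using only the standard library; the base URL and model identifier are illustrative assumptions, not confirmed values, so check the dashboard for the real ones.

```python
import json
import urllib.request

# Hypothetical endpoint and key -- replace with values from your dashboard.
BASE_URL = "https://api.akashml.com/v1"
API_KEY = "your-api-key-here"

def build_chat_request(model, messages):
    """Build an OpenAI-style chat-completions request (constructed, not sent)."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "llama-3.3-70b",  # illustrative model ID; see the model library for exact names
    [{"role": "user", "content": "Hello from a decentralized GPU!"}],
)
# urllib.request.urlopen(req) would send it; omitted here because the
# endpoint above is an assumption, not a documented URL.
```

Tools that accept a configurable OpenAI base URL (the official `openai` SDK, LangChain's chat model wrappers, and similar) would follow the same pattern: swap the base URL and API key, keep the request shape unchanged.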

Centralized inference remains costly, slow, and restrictive, whereas AkashML delivers fully managed, API-first, decentralized access to top open models at marketplace-driven prices. Developers and businesses seeking to reduce inference costs by up to 85% can begin using the platform immediately.
