8 steps to manage AI costs and resources

It’s no secret that AI, especially generative AI, is the technological wunderkind of the moment. The potential benefits are enormous, although many of them are not yet fully realized. Across industries, the use cases for AI seem limitless – it often feels as though AI is a hammer looking for nails. However, it’s important to determine the right size of that hammer and whether that nail is really a nail.

Given the huge potential of AI, companies often experiment with different use cases, leading to over-provisioning of resources and therefore unnecessary spending. Due to the dynamic way in which AI models are trained and used, resource consumption can be difficult to predict and control. Every new dataset could lead to a breakthrough – or send you down an expensive rabbit hole.

As enterprises expand their cloud footprint, the need for cost accountability and optimization grows. The experimental nature of AI can work against these goals, making transparency, accountability and optimization more important than ever. Like other cloud services, AI offerings tend to be easy to use—and even easier to overuse, leading to inflated bills.

This is where FinOps comes in. Given the current maturity of the discipline, FinOps practices and processes are most likely already in place in most enterprise organizations; if not, they are probably on the near-term radar. Although AI is different from traditional cloud-based workloads, the core principles of FinOps still apply. To truly optimize AI costs, you need visibility into resource usage, accountability that attributes that usage (and cost) to the appropriate parties, and optimization opportunities based on your observations. Fortunately, FinOps provides the framework for all three.

Managing AI costs step by step

Managing AI costs happens in parallel with managing other cloud costs, which is helpful because many companies already have processes in place for the latter. To control and optimize your AI costs, make sure you have the following elements in place:

Step 1: Visibility
Transparency is key. Being able to see all the AI-related resources consumed across the organization is the foundation for everything that comes after. Keep in mind that some PaaS AI offerings provide limited transparency because fees can appear as a single line item, so your mileage may vary.
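To make this concrete, here is a minimal sketch of what a first pass at visibility might look like, assuming you can pull a billing export with per-line service and cost columns. The file name, column names, and service list are illustrative and will differ by provider.

```python
# Minimal sketch: summarize AI-related spend from a (hypothetical) billing export.
# Assumes a CSV with "service" and "cost" columns; adjust names to your provider's export.
import csv
from collections import defaultdict

AI_SERVICES = {"openai", "azure-openai", "bedrock", "vertex-ai", "sagemaker"}  # example list

def ai_spend_by_service(path: str) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            service = row["service"].lower()
            if service in AI_SERVICES:
                totals[service] += float(row["cost"])
    return dict(totals)

if __name__ == "__main__":
    for service, cost in sorted(ai_spend_by_service("billing_export.csv").items()):
        print(f"{service}: ${cost:,.2f}")
```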

Step 2: Accountability
Once you have visibility into usage, the next step is to figure out who is driving it. Identifying the users or groups responsible for resource consumption can reveal potential overuse or inefficiencies.
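As a hypothetical illustration, if your billing export carries a team tag on each line, attribution can start as simply as grouping spend by that tag and surfacing anything that can’t be attributed. The column name and file here are assumptions, not a specific provider’s schema.

```python
# Minimal sketch: attribute AI spend to teams using a (hypothetical) "tag_team" column
# carried in the billing export. Rows without the tag land in "unattributed".
import csv
from collections import defaultdict

def spend_by_team(path: str) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            team = row.get("tag_team") or "unattributed"
            totals[team] += float(row["cost"])
    return dict(totals)

# Example: print the biggest consumers first to spot potential overuse.
for team, cost in sorted(spend_by_team("billing_export.csv").items(), key=lambda kv: -kv[1]):
    print(f"{team}: ${cost:,.2f}")
```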

Step 3: Governance
Accountability is strengthened by governance. Excitement about the potential of AI can fuel experimentation that results in overuse and overspending. Governance controls prevent uncontrolled consumption of AI and other cloud resources. It is important that these controls act as guardrails, not obstacles: you don’t want to stifle the efforts of well-meaning users, but you do want to keep them on a responsible path (from a financial perspective).
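One way such a guardrail can look in practice is sketched below: a pre-provisioning policy check against an allow-list. The GPU types and limits are purely illustrative; a real implementation would typically live in your provisioning pipeline or a policy engine.

```python
# Minimal sketch of a guardrail: check a provisioning request against an allow-list
# before it reaches the cloud provider. All names and limits here are illustrative.
ALLOWED_GPU_TYPES = {"nvidia-t4", "nvidia-l4"}   # approved for experimentation
MAX_GPUS_PER_REQUEST = 4

def validate_request(gpu_type: str, gpu_count: int) -> list[str]:
    """Return a list of policy violations (an empty list means the request passes)."""
    violations = []
    if gpu_type not in ALLOWED_GPU_TYPES:
        violations.append(f"GPU type '{gpu_type}' requires an exception approval")
    if gpu_count > MAX_GPUS_PER_REQUEST:
        violations.append(f"Requested {gpu_count} GPUs, limit is {MAX_GPUS_PER_REQUEST}")
    return violations

print(validate_request("nvidia-a100", 8))  # two violations: type and count
```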

Step 4: Tagging
Tagging, tagging, and more tagging. When resources are properly tagged, they can be attributed to the right user, team, project, application, business unit, etc. Tagging improves visibility and accountability and helps identify areas of potential cost overrun. However, the tagging capabilities for PaaS AI resources are not as robust as those for IaaS resources, so granular tagging may not always be possible.
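A small validation script can make tag hygiene routine. The sketch below flags resources that are missing required tags; the tag keys and resource records are hypothetical placeholders for whatever your tagging policy mandates.

```python
# Minimal sketch: flag resources missing the tags your cost-allocation scheme relies on.
# Tag keys and resource records are illustrative.
REQUIRED_TAGS = {"team", "project", "cost-center"}

def missing_tags(resources: list[dict]) -> dict[str, set[str]]:
    """Map resource id -> set of required tags it is missing."""
    report = {}
    for res in resources:
        missing = REQUIRED_TAGS - set(res.get("tags", {}))
        if missing:
            report[res["id"]] = missing
    return report

resources = [
    {"id": "endpoint-1", "tags": {"team": "ml-platform", "project": "chatbot"}},
    {"id": "endpoint-2", "tags": {"team": "ml-platform", "project": "search", "cost-center": "4201"}},
]
print(missing_tags(resources))  # {'endpoint-1': {'cost-center'}}
```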

Step 5: Budgets and related alerts
Set budgets and alerts to keep automation and “unconscious overprovisioning” from getting out of control. Make sure the teams using AI services have budgets, and that alerts are triggered when AI spending is on track to exceed them. Tagging, accountability, and governance enable more granular budgets for individual teams, business units, etc. Usage data and patterns may show that one business unit is consuming far more resources than others, but the data may also show this to be acceptable, in which case the budget for that group can be adjusted accordingly.
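A simple way to turn a budget into an alert is to project month-to-date spend forward and flag teams that are on track to overshoot. The sketch below does this with a naive linear projection; the budgets, spend figures, and projection method are illustrative only.

```python
# Minimal sketch: flag teams whose month-to-date AI spend is on track to exceed budget.
# Budgets, spend figures, and the linear projection are all illustrative.
from datetime import date
import calendar

BUDGETS = {"ml-platform": 20_000.0, "search": 8_000.0}        # monthly budgets (USD)
MTD_SPEND = {"ml-platform": 14_500.0, "search": 2_100.0}      # month-to-date spend

def projected_overruns(today: date) -> dict[str, float]:
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    alerts = {}
    for team, budget in BUDGETS.items():
        projected = MTD_SPEND.get(team, 0.0) / today.day * days_in_month
        if projected > budget:
            alerts[team] = projected
    return alerts

print(projected_overruns(date(2024, 6, 15)))  # ml-platform projects to ~29,000 vs a 20,000 budget
```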

Step 6: Optimization
With transparency into AI resource usage and accountability for it, you can now optimize. When AI is consumed as a service (rather than through cloud infrastructure you run yourself), the concept of “underutilization” does not apply in the traditional sense: resources are used when they are needed rather than sitting idle waiting for a task. Instead, it is important to evaluate whether the resources being consumed deliver a reasonable ROI. This is where organizations need to find a balance—just because you can add another document to the index doesn’t mean you should. Optimization means using data wisely to reduce training time and scaling back when needed. Optimization is an inexact science, and the decision threshold is different in every organization. There is no golden metric that applies to all use cases; the trick is to determine your own golden metric and factor it into your optimization decisions.
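If your golden metric is, say, cost per 1,000 inferences, tracking it can be as simple as the sketch below. The metric, threshold, and figures are hypothetical and stand in for whatever unit of business value fits your use case.

```python
# Minimal sketch: track a "golden metric" as cost per unit of business value,
# e.g. cost per 1,000 inferences. The threshold and numbers are illustrative.
def cost_per_thousand(total_cost: float, units_served: int) -> float:
    return total_cost / units_served * 1_000

THRESHOLD = 0.75  # acceptable cost per 1,000 inferences for this (hypothetical) workload

metric = cost_per_thousand(total_cost=4_200.0, units_served=6_500_000)
print(f"${metric:.2f} per 1,000 inferences")  # ~$0.65
if metric > THRESHOLD:
    print("Above threshold: review model size, caching, or batch settings")
```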

Step 7: Operating model
The final phase of the FinOps framework is operations: defining strategies to optimize resources and refining workflows to implement those strategies. The same principles apply to AI resource management. In this phase, you can refine processes or create new processes to implement what was learned in the earlier phases, so that those lessons can be leveraged rather than re-learned.

Step 8: Continuous optimization
The phases of the FinOps framework are circular for a reason – optimization is an ongoing process, and you’re never done. Although it gets easier with practice, true cost optimization, AI or not, involves a finish line that is often in sight but never crossed.

Final thoughts

As with other cloud resources, transparency and accountability are essential to optimizing AI usage. While there may be gaps in how AI services are offered and consumed, the FinOps framework provides a solid foundation for AI optimization. The key is to adapt the framework to fill these gaps and ensure that AI cost optimization is part of your overall FinOps practice. Read more about adapting your FinOps strategy on the Flexera blog.
