The OWASP Top 10 for LLM Applications 2025 identifies LLM10: Unbounded Consumption as the risk of allowing users or attackers to consume excessive AI resources through unrestricted inference requests. This can result in denial-of-service (DoS), skyrocketing cloud costs (“Denial of Wallet”), degraded application performance, and even intellectual property theft through model extraction.

Unlike traditional web applications, Large Language Models are computationally expensive. Every prompt consumes CPU, GPU, and memory, and on hosted services it also consumes billable API quota. Without proper controls, a single attacker or even an unintended workload can overwhelm an AI system, significantly increasing operational costs and reducing service availability.

As enterprises deploy AI copilots, chatbots, coding assistants, and autonomous agents at scale, protecting AI infrastructure from excessive consumption is becoming as important as protecting it from traditional cyberattacks.

What is Unbounded Consumption?

Inference refers to the process in which a Large Language Model (LLM) generates outputs from input queries or prompts. It is a critical function of LLMs: the model applies learned patterns and knowledge to produce relevant responses or predictions.

Unbounded Consumption occurs when a Large Language Model (LLM) application allows users to conduct excessive and uncontrolled inferences, leading to risks such as denial of service (DoS), economic losses, model theft, and service degradation.

The high computational demands of LLMs, especially in cloud environments, make them vulnerable to resource exploitation and unauthorized usage.

Common Examples of Unbounded Consumption

Variable-Length Input Flood

Attackers can overload the LLM with numerous inputs of varying lengths, exploiting processing inefficiencies. This can deplete resources and potentially render the system unresponsive, significantly impacting service availability.

Denial of Wallet (DoW)

By initiating a high volume of operations, attackers exploit the cost-per-use model of cloud-based AI services, leading to unsustainable financial burdens on the provider and risking financial ruin.

Continuous Input Overflow

Continuously sending inputs that exceed the LLM’s context window can lead to excessive computational resource use, resulting in service degradation and operational disruptions.

Resource-Intensive Queries

Submitting unusually demanding queries involving complex sequences or intricate language patterns can drain system resources, leading to prolonged processing times and potential system failures.

Model Extraction via API

Attackers may query the model API using carefully crafted inputs and prompt injection techniques to collect sufficient outputs to replicate a partial model or create a shadow model. This not only poses risks of intellectual property theft but also undermines the integrity of the original model.

Functional Model Replication

Using the target model to generate synthetic training data can allow attackers to fine-tune another foundational model, creating a functional equivalent. This circumvents traditional query-based extraction methods, posing significant risks to proprietary models and technologies.

Side-Channel Attacks

Attackers may exploit the LLM's input filtering techniques to execute side-channel attacks, harvesting model weights and architectural information. This could compromise the model’s security and lead to further exploitation.

Real-World Attack Scenarios

Scenario 1: AI API Flooding Causes Service Outage

An attacker scripts thousands of requests against a public AI chatbot or API. Although each request appears legitimate, the cumulative effect overwhelms the underlying GPUs and inference infrastructure, causing legitimate users to experience slow responses or complete service outages.

Real-World Example

In August 2023, developers of the code search platform Sourcegraph disclosed a security incident in which an attacker used a leaked admin access token to raise API rate limits and generate excessive AI requests, leading to a spike in API usage and increased operational costs.

Scenario 2: Denial of Wallet (DoW)

An attacker generates excessive operations to exploit the pay-per-use model of cloud-based AI services, causing unsustainable costs for the service provider. OWASP lists Denial of Wallet (DoW) among the common examples of Unbounded Consumption, a risk that consumption-based pricing makes especially acute for cloud-hosted AI services. Similar cost-exhaustion attacks have been demonstrated against serverless cloud workloads and AI APIs.

Real-World Example of Denial of Wallet (DoW)

A reported cloud cost incident involved a cluster hit by a DDoS attack that automatically scaled up to 2,000 instances, generating a $120,000 bill in 72 hours.

Scenario 3: Model Extraction Through API Abuse

Attackers continuously query an organization’s proprietary AI model using carefully crafted prompts. Over time, they collect enough responses to train another model that closely replicates the behaviour of the original system. Rather than stealing the source code, they effectively clone the model through repeated inference requests.

Real-World Example

Researchers have demonstrated practical model extraction attacks, showing how production language models can be partially replicated using API responses alone.

Scenario 4: Resource-Intensive Prompt Abuse

An attacker deliberately crafts prompts containing extremely long context windows, deeply nested instructions, or computationally expensive reasoning tasks. Although each request is technically valid, the cumulative resource consumption significantly slows the system.

Real-World Example

Academic researchers have demonstrated Sponge Attacks, where carefully designed inputs dramatically increase the energy consumption and latency of neural networks without triggering traditional security defences.

Prevention and Mitigation Strategies

Input Validation

Implement strict input validation to ensure that inputs do not exceed reasonable size limits.
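
As a rough illustration of a size check, the sketch below rejects prompts that exceed a character limit or an estimated token limit. The limits, the `PromptTooLargeError` class, and the four-characters-per-token heuristic are all assumptions made for this example; a real deployment would use the model's own tokenizer and limits tuned to its workload.

```python
MAX_PROMPT_CHARS = 8_000    # hypothetical limit; tune per model and use case
MAX_PROMPT_TOKENS = 1_500   # hypothetical limit


class PromptTooLargeError(ValueError):
    """Raised when a prompt exceeds the configured size limits."""


def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    # This is an assumption; use the model's tokenizer for real counts.
    return (len(text) + 3) // 4


def validate_prompt(prompt: str) -> str:
    """Return the prompt unchanged if it is within limits, else raise."""
    if not prompt.strip():
        raise ValueError("prompt is empty")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise PromptTooLargeError(
            f"prompt has {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}"
        )
    tokens = estimate_tokens(prompt)
    if tokens > MAX_PROMPT_TOKENS:
        raise PromptTooLargeError(
            f"prompt is about {tokens} tokens; limit is {MAX_PROMPT_TOKENS}"
        )
    return prompt
```

Running the check before the prompt reaches the model means an oversized request costs a string comparison rather than an inference call.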

Limit Exposure of Logits and Logprobs

Restrict or obfuscate the exposure of `logit_bias` and `logprobs` in API responses. Provide only the necessary information without revealing detailed probabilities.
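
One way to approach this is to strip probability fields from the response before it leaves your service. The sketch below assumes the response is a JSON-like structure of dicts and lists; the field names in `SENSITIVE_FIELDS` are illustrative and should be matched to whatever your provider actually returns.

```python
# Hypothetical field names; adjust to the response schema you actually receive.
SENSITIVE_FIELDS = frozenset({"logprobs", "top_logprobs", "logits"})


def strip_probabilities(payload):
    """Return a copy of `payload` with probability fields removed at any depth."""
    if isinstance(payload, dict):
        return {
            key: strip_probabilities(value)
            for key, value in payload.items()
            if key not in SENSITIVE_FIELDS
        }
    if isinstance(payload, list):
        return [strip_probabilities(item) for item in payload]
    return payload  # scalars pass through unchanged
```

The function builds a new structure rather than mutating its input, so the full response can still be logged internally while clients receive only the filtered copy.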

Rate Limiting

Apply rate limiting and user quotas to restrict the number of requests a single source entity can make in a given time period.
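
A token bucket is one common way to implement per-client limits. The following is a minimal, single-process sketch; the class names and parameters are invented for this example, and a production setup would more likely keep the counters in a shared store and guard them against concurrent access.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated_at: Optional[float] = None  # set on the first request

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.updated_at is not None:
            elapsed = max(0.0, now - self.updated_at)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client identifier (API key, user ID, and so on)."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self._capacity = capacity
        self._refill = refill_per_second
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str, cost: float = 1.0, now: Optional[float] = None) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = self._buckets[client_id] = TokenBucket(self._capacity, self._refill)
        return bucket.allow(cost, now)
```

The `cost` argument lets an expensive request, such as one with a long prompt, draw more from the bucket than a cheap one, which ties the limit to resource use rather than to raw request count.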

Resource Allocation Management

Monitor and manage resource allocation dynamically to prevent any single user or request from consuming excessive resources.

Timeouts and Throttling

Set timeouts and throttle processing for resource-intensive operations to prevent prolonged resource consumption.
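
The sketch below combines both ideas using Python's `asyncio`: a semaphore caps how many inferences run at once, and `asyncio.wait_for` cancels any call that exceeds a deadline. The function name, the default timeout, and the choice to return `None` on timeout are assumptions for illustration; `inference` stands in for whatever coroutine calls your model.

```python
import asyncio
from typing import Awaitable, Callable, Optional

INFERENCE_TIMEOUT_SECONDS = 30.0  # hypothetical default


async def run_with_limits(
    inference: Callable[[str], Awaitable[str]],
    prompt: str,
    semaphore: asyncio.Semaphore,
    timeout: float = INFERENCE_TIMEOUT_SECONDS,
) -> Optional[str]:
    """Run `inference(prompt)` under a concurrency cap and a hard deadline.

    Returns None if the call does not finish within `timeout` seconds.
    """
    async with semaphore:  # waits here while too many calls are in flight
        try:
            return await asyncio.wait_for(inference(prompt), timeout=timeout)
        except asyncio.TimeoutError:
            # wait_for has already cancelled the underlying task.
            return None
```

Create the semaphore once inside the running event loop, for example `asyncio.Semaphore(4)`, and share it across requests so the cap applies globally rather than per call.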

Sandbox Techniques

Restrict the LLM’s access to network resources, internal services, and APIs. This matters in every scenario described above because it also covers insider risks and threats. It governs how much access the LLM application has to data and resources, which makes it a crucial control for mitigating or preventing side-channel attacks.

Comprehensive Logging, Monitoring and Anomaly Detection

Continuously monitor resource usage and implement logging to detect and respond to unusual patterns of resource consumption.
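
As a very simple example of anomaly detection, the sketch below flags a usage reading that sits far above a client's recent baseline using a z-score. The function name, threshold, and minimum sample count are assumptions; real systems usually combine several signals and more robust statistics.

```python
import statistics
from typing import Sequence


def is_anomalous(
    history: Sequence[float],
    current: float,
    z_threshold: float = 3.0,   # hypothetical threshold
    min_samples: int = 5,       # hypothetical minimum baseline size
) -> bool:
    """Return True if `current` is unusually high compared with `history`.

    `history` could hold, for example, tokens consumed per hour by one client.
    """
    if len(history) < min_samples:
        return False  # not enough data to judge
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # Perfectly flat baseline: treat any increase as worth a look.
        return current > mean
    return (current - mean) / stdev > z_threshold
```

Only spikes above the baseline are flagged here, since unusually low usage is not a consumption risk.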

Watermarking

Implement watermarking frameworks that embed markers in LLM outputs so that unauthorized use of those outputs can be detected.

Graceful Degradation

Design the system to degrade gracefully under heavy load, maintaining partial functionality rather than failing completely.
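
One possible shape for this is a small policy that maps current load to a service mode. In the sketch below, the mode names, the queue-depth thresholds, and the token caps are all made up for illustration; the point is only that the system has intermediate states between "fully available" and "down".

```python
from enum import Enum


class ServiceMode(Enum):
    FULL = "full"        # normal operation
    REDUCED = "reduced"  # shorter answers, optional features off
    SHED = "shed"        # reject new work and ask clients to retry later


def choose_mode(queue_depth: int, reduced_at: int = 50, shed_at: int = 200) -> ServiceMode:
    """Pick a service mode from the number of requests currently waiting."""
    if queue_depth >= shed_at:
        return ServiceMode.SHED
    if queue_depth >= reduced_at:
        return ServiceMode.REDUCED
    return ServiceMode.FULL


def output_token_cap(mode: ServiceMode, full_cap: int = 1024, reduced_cap: int = 256) -> int:
    """Maximum completion length to request from the model in each mode."""
    if mode is ServiceMode.FULL:
        return full_cap
    if mode is ServiceMode.REDUCED:
        return reduced_cap
    return 0  # SHED: do not call the model at all
```

Capping output length in the reduced mode lowers per-request cost while still returning an answer, which is usually preferable to timing out.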

Limit Queued Actions and Scale Robustly

Implement restrictions on the number of queued actions and total actions, while incorporating dynamic scaling and load balancing to handle varying demands and ensure consistent system performance.
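
The two limits mentioned here, queued actions and total actions, can be sketched with Python's standard `queue` module. The `BoundedActionQueue` class and its parameters are hypothetical, and the total-action counter is not thread-safe as written; it is meant to show the idea rather than serve as a drop-in component.

```python
import queue
from typing import Any, Optional


class BoundedActionQueue:
    """Rejects work once the queue is full or a total-action budget is spent."""

    def __init__(self, max_queued: int, max_total: int) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=max_queued)
        self._max_total = max_total
        self._accepted = 0  # actions accepted so far (for example, per session)

    def submit(self, action: Any) -> bool:
        """Try to enqueue an action; return False instead of blocking."""
        if self._accepted >= self._max_total:
            return False
        try:
            self._queue.put_nowait(action)
        except queue.Full:
            return False
        self._accepted += 1
        return True

    def next_action(self) -> Optional[Any]:
        """Return the oldest queued action, or None if the queue is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None
```

Rejecting immediately when the queue is full gives callers a clear signal to back off, instead of letting pending work pile up without bound.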

Adversarial Robustness Training

Train models to detect and mitigate adversarial queries and extraction attempts.

Glitch Token Filtering

Build lists of known glitch tokens and scan output before adding it to the model’s context window.
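
A minimal version of such a scan is a substring check against a maintained list. In the sketch below, the list contents are only illustrative (glitch tokens are specific to each tokenizer, so the list must be built for the model you run), and the function names are invented for this example.

```python
from typing import FrozenSet, List

# Illustrative entries only; glitch tokens differ between tokenizers,
# so this list must be built and maintained for the model in use.
KNOWN_GLITCH_TOKENS: FrozenSet[str] = frozenset({
    "SolidGoldMagikarp",
    "<hypothetical-glitch-token>",
})


def find_glitch_tokens(
    text: str, glitch_tokens: FrozenSet[str] = KNOWN_GLITCH_TOKENS
) -> List[str]:
    """Return the known glitch tokens that appear in `text`, sorted."""
    return sorted(token for token in glitch_tokens if token in text)


def safe_to_append(text: str) -> bool:
    """True if `text` contains no known glitch token."""
    return not find_glitch_tokens(text)
```

Output that fails the check can be dropped, truncated, or regenerated before it is fed back into the conversation history.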

Access Controls

Implement strong access controls, including role-based access control (RBAC) and the principle of least privilege, to limit unauthorized access to LLM model repositories and training environments.
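
At its simplest, RBAC with least privilege is a deny-by-default lookup from role to permitted actions. The roles and permission strings below are invented for illustration; in practice this mapping usually lives in your identity provider or cloud IAM policy rather than in application code.

```python
from typing import Dict, FrozenSet

# Hypothetical roles and permissions, each granted only what it needs.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "ml_engineer": frozenset({"model:read", "model:deploy"}),
    "data_scientist": frozenset({"model:read", "training:run"}),
    "app_service": frozenset({"model:infer"}),
}


def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and unlisted permissions are refused."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```

Note that the application's own service role can run inference but cannot read model artifacts, which limits what a compromised application can exfiltrate.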

Centralized ML Model Inventory

Use a centralized ML model inventory or registry for models used in production, ensuring proper governance and access control.

Automated MLOps Deployment

Implement automated MLOps deployment with governance, tracking, and approval workflows to tighten access and deployment controls within the infrastructure.

Secure AI, Secure Your Budget

Many organizations focus on preventing AI from generating harmful content, but overlook a simpler question:

Can someone force your AI to become expensive?

At Reputiva, we view Unbounded Consumption as both a cybersecurity risk and a FinOps challenge.

Without proper governance, AI systems can become attractive targets for attackers seeking to disrupt operations, inflate cloud bills, or steal valuable AI capabilities through excessive API usage.

Secure AI Starts with Resource Governance

Organizations should implement controls such as:

  • API authentication and authorization
  • Rate limiting and request quotas
  • Token and context window limits
  • Budget alerts and cloud cost monitoring
  • AI workload monitoring and anomaly detection
  • Intelligent request throttling
  • RBAC for AI services
  • Continuous AI security assessments
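
To make the budget-alert control concrete, here is a small sketch of a spend tracker that raises an alert near a budget and blocks further calls once it is exhausted. The `BudgetGuard` class, the 80% alert level, and the per-1,000-token prices passed in are assumptions for the example, not any provider's actual pricing or API.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    budget_usd: float
    alert_fraction: float = 0.8   # hypothetical alert level
    spent_usd: float = 0.0

    def status(self) -> str:
        """Return "ok", "alert", or "blocked" based on spend so far."""
        if self.spent_usd >= self.budget_usd:
            return "blocked"
        if self.spent_usd >= self.alert_fraction * self.budget_usd:
            return "alert"
        return "ok"

    def record(
        self,
        prompt_tokens: int,
        completion_tokens: int,
        usd_per_1k_prompt: float,
        usd_per_1k_completion: float,
    ) -> str:
        """Add the cost of one request and return the new status."""
        cost = (
            (prompt_tokens / 1000) * usd_per_1k_prompt
            + (completion_tokens / 1000) * usd_per_1k_completion
        )
        self.spent_usd += cost
        return self.status()
```

Checking `status()` before each model call turns a Denial of Wallet attack into a bounded loss: once the budget is spent, requests are refused rather than billed.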

Securing AI isn’t just about protecting data; it’s also about protecting compute, budgets, and business continuity.

Secure Your AI Before It Impacts Your Bottom Line

Every AI request consumes valuable computing resources. Without proper safeguards, excessive inference requests can lead to service outages, runaway cloud costs, degraded performance, and intellectual property theft.

Reputiva helps organizations secure enterprise AI through:

  • AI Security Assessments
  • AI Architecture Reviews
  • Cloud Security Assessments
  • AI Governance and Risk Reviews
  • FinOps Assessments for AI Workloads
  • API Security Reviews
  • Secure AI Deployment Best Practices

Book a consultation with Reputiva to assess your AI security posture, strengthen your cloud governance, and ensure your AI systems remain secure, resilient, and cost-efficient.


Reputiva

Reputiva is a cloud, cybersecurity, and FinOps advisory firm helping SMEs reduce cyber risk, strengthen cloud environments, and manage technology costs with confidence. We publish practical insights on cloud security, identity, AI risk, compliance, and digital transformation.
