Every API has limits. The Humanizer API is no exception. Understanding how rate limits work, how to handle errors gracefully, and which patterns keep your integration reliable at scale is the difference between a production-ready system and one that breaks under load.
This guide covers everything you need: rate limit tiers, error handling patterns, retry strategies, and the best practices that experienced integrators follow.
Rate Limit Tiers
Rate limits depend on your plan tier. The free tier allows 10 requests per minute and 10,000 words per month. The Pro tier increases to 60 requests per minute with higher monthly word limits. Enterprise plans offer custom rate limits based on your volume needs.
Rate limits apply to both individual and batch endpoints. A single batch request containing 100 texts counts as 1 request against your per-minute limit, not 100. This is why the batch endpoint is so much more efficient for high-volume processing.
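To make the batch arithmetic concrete, here is a minimal sketch of grouping texts into batches before sending them. The batch size of 100 is taken from the example above; `chunked` is a hypothetical helper name, not part of the API, and the real maximum batch size should be confirmed in the API documentation.

```python
from typing import Iterator, List, Sequence


def chunked(texts: Sequence[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


# 250 texts become 3 batch requests instead of 250 single requests.
texts = [f"sample text {i}" for i in range(250)]
batches = list(chunked(texts))
print(len(batches), [len(b) for b in batches])
```

Each yielded list would be the payload of one batch request, so the per-minute budget is spent per batch rather than per text.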
Check the API documentation for current limits by plan. Plan rate limits are enforced per API key, not per IP address. If you’re running multiple services with the same key, they share the same limits.
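Because the limit is shared across everything that uses a key, it can help to throttle on the client side before the API has to. The following is a rough sketch of a rolling-window limiter; `MinuteRateLimiter` is a hypothetical name, and it only coordinates calls inside one process. Services running on separate machines would need a shared store, which is not shown here.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Allow at most `max_requests` calls in any rolling `window` seconds."""

    def __init__(self, max_requests: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        """Block until one more request fits inside the window, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self.window - (now - self._sent[0]))
```

A free-tier client would create `MinuteRateLimiter(max_requests=10)` once and call `acquire()` immediately before each request.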
HTTP 429: Too Many Requests
When you exceed your rate limit, the API returns a 429 status code with a Retry-After header indicating how many seconds to wait before retrying. Your code must handle this response correctly.
import time
import requests

response = requests.post(api_url, headers=headers, json=payload)
if response.status_code == 429:
    # Fall back to 5 seconds if the header is missing
    retry_after = int(response.headers.get('Retry-After', 5))
    print(f"Rate limited. Waiting {retry_after} seconds...")
    time.sleep(retry_after)
    # Retry the request
elif response.status_code == 200:
    result = response.json()
else:
    handle_error(response)
Never ignore 429 responses. If you keep sending requests after being rate limited, subsequent requests may be rejected for longer periods. Respect the Retry-After header and your integration will stay healthy.
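One detail worth guarding against: the HTTP specification allows Retry-After to be either a number of seconds or an HTTP date, and a plain `int()` call raises on the date form. Whether this API ever sends the date form is not stated here, so the sketch below is defensive rather than required. `parse_retry_after` is a hypothetical helper, and its 5-second default mirrors the fallback used in the snippet above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], default: float = 5.0,
                      now: Optional[datetime] = None) -> float:
    """Return the number of seconds to wait for a Retry-After header value."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "30"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means no further waiting is needed.
    return max(0.0, (retry_at - now).total_seconds())
```

With this in place, `time.sleep(parse_retry_after(response.headers.get('Retry-After')))` replaces the direct `int()` conversion.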
Exponential Backoff with Jitter
For transient errors (500, 502, 503, 504, and network timeouts), exponential backoff is the standard retry pattern. Wait 1 second after the first failure, 2 seconds after the second, 4 seconds after the third. Add random jitter to prevent multiple clients from retrying at exactly the same time.
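import random
import time
import requests

class APIError(Exception):
    """Raised when a request fails permanently."""

def request_with_backoff(url, headers, payload, max_retries=4):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                url, headers=headers, json=payload, timeout=30
            )
        except (requests.Timeout, requests.ConnectionError):
            # Network failure: back off exponentially with jitter
            wait = (2 ** attempt) + random.uniform(0, 1)
        else:
            if response.status_code == 200:
                return response.json()
            if response.status_code == 429:
                # Rate limited: wait as long as the server asks
                wait = int(response.headers.get('Retry-After', 5))
            elif response.status_code >= 500:
                # Transient server error: back off exponentially with jitter
                wait = (2 ** attempt) + random.uniform(0, 1)
            else:
                # Client error (4xx), don't retry
                raise APIError(f"Client error: {response.status_code}")
        if attempt < max_retries - 1:
            # No point sleeping after the final attempt
            time.sleep(wait)
    raise APIError("Max retries exceeded")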
The jitter (random.uniform(0, 1)) is critical. Without it, if 50 clients all fail at the same time, they all retry at the same time, causing another failure. Jitter spreads retries out and reduces contention.
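The delay calculation can also be pulled into its own function, which makes it easy to cap and to test. This is a sketch only: `backoff_delay` is a hypothetical helper, and the 30-second cap is an assumed value, not something the API prescribes.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  jitter: float = 1.0,
                  rng: Optional[random.Random] = None) -> float:
    """Delay before retry `attempt` (0-based): capped base * 2**attempt plus jitter."""
    exponential = min(cap, base * (2 ** attempt))
    # Use the module-level generator unless a seeded one is supplied.
    source = rng if rng is not None else random
    return exponential + source.uniform(0, jitter)


# The exponential part grows 1, 2, 4, 8, 16 seconds, then stays at the 30-second cap.
for attempt in range(7):
    print(attempt, round(backoff_delay(attempt), 2))
```

The cap keeps a long outage from producing multi-minute sleeps, while the jitter term still spreads clients apart once every delay has reached the cap.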
Circuit Breaker Pattern
If the API is experiencing extended downtime, retrying every request wastes resources and adds latency to your application. The circuit breaker pattern solves this by “opening” the circuit after a threshold of consecutive failures, then periodically testing whether the API has recovered.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = 0
        self.state = "closed"  # closed = normal, open = failing

    def can_request(self):
        if self.state == "closed":
            return True
        # Check if recovery period has passed
        if time.time() - self.last_failure_time > self.recovery_timeout:
            # Half-open: let a trial request through to test recovery
            self.state = "half-open"
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        self.last_failure_time = time.time()
        if self.failures >= self.threshold:
            self.state = "open"
Wrap your API calls with the circuit breaker. If the circuit is open, return a cached result or queue the request for later instead of waiting for a timeout.
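One way to do the wrapping is a small helper that takes the breaker, the request, and a fallback as arguments. This is a sketch: `call_with_breaker` is a hypothetical name, it expects any object with the `can_request`, `record_success`, and `record_failure` methods shown above, and it treats every exception as a failure, which may be broader than you want if client errors should not trip the circuit.

```python
from typing import Any, Callable


def call_with_breaker(breaker: Any, request: Callable[[], Any],
                      fallback: Callable[[], Any]) -> Any:
    """Run `request` through `breaker`, using `fallback` when the circuit is open."""
    if not breaker.can_request():
        # Circuit is open: skip the network call entirely.
        return fallback()
    try:
        result = request()
    except Exception:
        # Count the failure and degrade gracefully instead of propagating.
        breaker.record_failure()
        return fallback()
    breaker.record_success()
    return result
```

The `fallback` callable is where a cached result would be returned or the request would be pushed onto a queue for later.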
Error Codes Reference
The API returns specific error codes that tell you exactly what went wrong. Handle each category differently:
400 Bad Request
Your request body is malformed. Check that you’re sending valid JSON with the required fields. Don’t retry these; fix the request.
401 Unauthorized
Your API key is invalid or expired. Check your key and regenerate if needed. Don’t retry.
403 Forbidden
Your plan doesn’t include the feature you’re trying to use, or your account is suspended. Check your account status on the dashboard.
413 Payload Too Large
Your text exceeds the maximum length for a single request. Split it into smaller chunks and process them separately.
429 Too Many Requests
Rate limited. Wait for the Retry-After period and try again.
500 Internal Server Error
Something went wrong on our end. Retry with exponential backoff. If it persists for more than 5 minutes, check the status page.
503 Service Unavailable
The API is temporarily down for maintenance. Retry after the Retry-After period.
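The categories above can be condensed into a single dispatch function. This is an illustrative sketch, with `retry_action` and its four return labels as hypothetical names; it encodes only the guidance in this reference and treats any status not listed here as a non-retryable failure.

```python
def retry_action(status_code: int) -> str:
    """Map an HTTP status code to 'ok', 'wait', 'backoff', or 'fail'."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in (429, 503):
        return "wait"       # sleep for the Retry-After period, then retry
    if status_code >= 500:
        return "backoff"    # transient server error: exponential backoff
    return "fail"           # 400, 401, 403, 413: fix the request, don't retry


for code in (200, 400, 401, 403, 413, 429, 500, 503):
    print(code, retry_action(code))
```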
Request Validation
Validate your requests before sending them. This prevents wasting API calls on requests that will fail with 400 errors.
def validate_request(text, tone=None):
    if not text or not isinstance(text, str):
        raise ValueError("Text must be a non-empty string")
    if len(text) > 50000:
        raise ValueError("Text exceeds maximum length (50,000 chars)")
    if len(text.strip()) < 50:
        raise ValueError("Text too short for meaningful humanization")
    valid_tones = ["professional", "casual", "conversational", "academic"]
    if tone and tone not in valid_tones:
        raise ValueError(f"Invalid tone. Choose from: {valid_tones}")
    return True
Catching these errors locally is instant. Sending them to the API adds latency and counts against your rate limit.
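When text fails the length check, the usual remedy is to split it and send the pieces separately, as the 413 entry in the error reference suggests. Below is a minimal sketch that assumes the 50,000-character limit used in the validation function and assumes blank lines are acceptable split points; `split_text` is a hypothetical helper.

```python
from typing import List


def split_text(text: str, max_chars: int = 50_000) -> List[str]:
    """Split `text` into chunks of at most `max_chars`, preferring paragraph breaks."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is too long on its own.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        # Try to append the paragraph to the chunk being built.
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

A trailing chunk can end up shorter than the 50-character minimum, so each piece should still go through validation before it is sent.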
Logging Best Practices
Log every API interaction. At minimum, capture: request timestamp, response status code, response time, word count processed, and confidence score. This data helps you debug issues, track costs, and optimize performance.
import logging
import time

logger = logging.getLogger("humanizer")

def log_api_call(response, start_time, word_count):
    duration = time.time() - start_time
    try:
        confidence = response.json().get('confidence_score', 'N/A')
    except ValueError:
        # Error responses may not carry a JSON body
        confidence = 'N/A'
    logger.info(
        f"status={response.status_code} "
        f"duration={duration:.2f}s "
        f"words={word_count} "
        f"confidence={confidence}"
    )
In production, send these logs to a monitoring system like Datadog, CloudWatch, or your existing observability stack. Set up alerts for elevated error rates or response times above your threshold.
Monitoring and Alerting
Track these metrics over time: request success rate (should be above 99%), p95 response time (should be under 3 seconds for single requests), daily word count (to predict when you'll hit your quota), and error rate by type (to identify patterns).
Set up alerts for: success rate dropping below 98%, p95 response time exceeding 5 seconds, month-to-date word count reaching 80% of your monthly quota, and any spike in 500 errors.
These alerts catch problems before they impact your users. A spike in latency might mean you need to add caching. A spike in errors might mean the API is having issues. Either way, you'll know before your content pipeline stalls.
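If you are not yet shipping metrics to a monitoring system, the two headline numbers can be computed directly from logged samples. The sketch below assumes each sample is a `(status_code, duration_seconds)` pair and uses the nearest-rank method for p95; `summarize` and `check_alerts` are hypothetical names, and the thresholds are the alert values listed above.

```python
from typing import Dict, List, Sequence, Tuple


def summarize(samples: Sequence[Tuple[int, float]]) -> Dict[str, float]:
    """Compute success rate and p95 latency from (status_code, seconds) samples."""
    if not samples:
        raise ValueError("need at least one sample")
    successes = sum(1 for status, _ in samples if status == 200)
    durations = sorted(duration for _, duration in samples)
    # Nearest-rank p95: ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(durations) + 99) // 100
    return {
        "success_rate": successes / len(samples),
        "p95_seconds": durations[rank - 1],
    }


def check_alerts(summary: Dict[str, float]) -> List[str]:
    """Return alert messages for the success-rate and latency thresholds."""
    alerts = []
    if summary["success_rate"] < 0.98:
        alerts.append("success rate below 98%")
    if summary["p95_seconds"] > 5.0:
        alerts.append("p95 response time above 5 seconds")
    return alerts
```

Running `check_alerts(summarize(samples))` on a schedule gives a basic alerting loop until a full observability stack is in place.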
SDK vs. Raw HTTP
The Humanizer API provides official SDKs for Python and Node.js. These SDKs handle authentication, retries, rate limiting, and error parsing automatically. For most teams, using the SDK is the right choice.
Use raw HTTP when you need maximum control over the request lifecycle, when you're working in a language without an official SDK, or when the SDK doesn't support a feature you need (like custom retry logic or specific timeout configurations).
The SDK handles 90% of use cases correctly out of the box. Start there. Switch to raw HTTP only when you hit a limitation.
Putting It All Together
A production-ready integration combines validation, circuit breaking, exponential backoff, logging, and monitoring. It handles every error gracefully, never loses data, and gives you visibility into what's happening.
Start simple: basic request validation and retry logic. Add the circuit breaker and monitoring as your volume grows. The patterns in this guide scale from 100 requests per day to 100,000.
Check the full API documentation for endpoint details, and explore the features page for capabilities you might want to integrate.
Get your free API key to start building. 10,000 words per month, no credit card required. That's enough to test your integration end-to-end before scaling up.
Backoff patterns by client
Every HTTP client should respect 429 responses. The right pattern is to wait for the value in the Retry-After header and fall back to exponential backoff when the header is missing.
Python
import time
import requests

def humanize_with_retry(text, tone, attempts=4):
    for i in range(attempts):
        r = requests.post(URL, headers=HEADERS, json={'text': text, 'tone': tone})
        if r.status_code == 200:
            return r.json()
        if r.status_code != 429:
            r.raise_for_status()
        # Prefer the server's Retry-After; fall back to exponential backoff
        wait = int(r.headers.get('Retry-After', 2 ** i))
        time.sleep(wait)
    raise RuntimeError('rate limit persisted')
Node.js
async function humanizeWithRetry(text, tone, attempts = 4) {
  for (let i = 0; i < attempts; i++) {
    const r = await fetch(URL, { method: 'POST', headers, body: JSON.stringify({ text, tone }) });
    if (r.ok) return r.json();
    if (r.status !== 429) throw new Error(`HTTP ${r.status}`);
    // Prefer the server's Retry-After; fall back to exponential backoff
    const wait = parseInt(r.headers.get('Retry-After'), 10) || Math.pow(2, i);
    await new Promise(res => setTimeout(res, wait * 1000));
  }
  throw new Error('rate limit persisted');
}
Frequently asked questions
What rate limit applies to me?
Free tier: 10 requests/minute. Starter: 60 RPM. Pro: 300 RPM. Enterprise: custom. See pricing.
Does the batch endpoint share the same limit?
Yes - a batch counts as one request. So a 60 RPM limit × 100 items per batch = 6,000 humanizations/minute via batch.
What if my burst exceeds the per-minute cap?
The 429 response includes Retry-After. Most teams find that bursting up to 2x the cap briefly is fine - sustained over-limit triggers temporary blocks. For consistent burst patterns, upgrade your plan.
How do I distribute load across multiple API keys?
Each key has its own rate-limit pool. For higher concurrency, generate multiple keys and round-robin (Pro plans typically allow 5+ keys). Beyond that, talk to sales about Enterprise dedicated capacity.
Are there per-IP limits even on auth'd requests?
Yes - to prevent key-sharing abuse, we cap requests per IP at 10x the plan rate limit. For server-to-server traffic from a single egress IP, this isn't an issue. For end-user proxying, consider pooling keys per region.
What happens if I never back off?
Continuously ignoring 429 responses triggers progressive blocks: 5 min, 30 min, 2 hours. Repeated violations can suspend the API key. Always implement backoff.
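For the multi-key question above, round-robin rotation needs very little code. This is a sketch under the assumption that keys are interchangeable within your account; `key_rotation` is a hypothetical helper and the key strings are placeholders.

```python
import itertools
from typing import Iterator, Sequence


def key_rotation(api_keys: Sequence[str]) -> Iterator[str]:
    """Cycle through `api_keys` forever, handing out one key per request."""
    if not api_keys:
        raise ValueError("at least one API key is required")
    return itertools.cycle(api_keys)


# Placeholder keys for illustration only.
keys = key_rotation(["key-a", "key-b", "key-c"])
picked = [next(keys) for _ in range(5)]
print(picked)
```

Each key still needs its own backoff handling, since a 429 on one key says nothing about the remaining budget on the others.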
Avoid these mistakes
- Don't poll waiting for capacity - sleep for Retry-After, not Retry-After / 10.
- Don't retry 4xx errors other than 429 - retry only 429 and 5xx. Retrying 400s wastes attempts.
- Don't skip the jitter - pure exponential backoff causes thundering-herd retries. Add 10-20% random jitter.
- Don't skip response logging - request_id in v2 responses helps support trace failed calls.
Sign up for an API key to get started.