Free lesson

Practical use cases — security, parameters, observability

You can apply security best practices (API-key config, what NOT to log), tune LLM parameters intelligently (`temperature`, `top_p`, `max_tokens`, `stop`, `presence_penalty`, `frequency_penalty`, `seed`), track token usage, configure custom base URLs for proxies, and apply the chapter's production checklist.

~22 min read · Free to read — no subscription required.

Security Best Practices

API Key Management

Loading diagram...

Never hardcode API keys in your source code. The example below shows three progressively better approaches: hardcoded keys (dangerous because they can leak through version control, logs, and error messages), environment variables via os.environ (better because keys are separated from code), and Pydantic Settings with SecretStr (best because it adds type validation, automatic .env file loading, and prevents accidental logging of secrets through its masked string representation).

Code snippet python
1# BAD - Never do this 2# Keys in source code can be exposed through version control, 3# logs, error messages, and code sharing 4client = OpenAI(api_key="sk-xxxx...") 5 6# GOOD - Use environment variables 7# Environment variables are separate from code and can be 8# managed securely through deployment configurations 9import os 10client = OpenAI(api_key=os.environ["OPENAI_API_KEY"]) 11 12# BETTER - Use Pydantic Settings for typed configuration 13# This approach provides validation, type checking, and 14# automatic loading from .env files 15from pydantic_settings import BaseSettings 16from pydantic import SecretStr 17 18class LLMSettings(BaseSettings): 19 """Configuration settings for LLM clients. 20 21 Settings are loaded from environment variables or .env file. 22 SecretStr prevents accidental logging of sensitive values. 23 """ 24 openai_api_key: SecretStr 25 anthropic_api_key: SecretStr 26 gemini_api_key: SecretStr 27 28 # Optional proxy URLs 29 openai_proxy_url: str | None = None 30 anthropic_proxy_url: str | None = None 31 gemini_proxy_url: str | None = None 32 33 # Default models 34 default_openai_model: str = "gpt-4" 35 default_anthropic_model: str = "claude-3-opus-20240229" 36 default_gemini_model: str = "gemini-2.0-flash" 37 38 class Config: 39 env_file = ".env" 40 env_file_encoding = "utf-8" 41 42# Usage 43settings = LLMSettings() 44client = OpenAI( 45 api_key=settings.openai_api_key.get_secret_value() 46)
  • Lines 1-4: The BAD approach hardcodes the API key directly in the constructor call, exposing it in source code
  • Lines 6-9: The GOOD approach uses os.environ to retrieve the key at runtime, keeping it out of version control
  • Lines 11-15: The BETTER approach imports BaseSettings from pydantic_settings and SecretStr from pydantic for typed, validated configuration
  • Lines 17-30: Define LLMSettings extending BaseSettings with SecretStr fields for all three API keys, optional proxy URLs with string type, and default model strings
  • Lines 32-34: Configure the Config inner class to auto-load from a .env file with UTF-8 encoding
  • Lines 36-39: Demonstrate usage by instantiating settings and calling get_secret_value() on the SecretStr to retrieve the actual key string

Request Logging

The SecureLogger class below enables debugging and troubleshooting LLM requests without exposing sensitive message content. It uses a RequestLog dataclass to capture structured metadata (provider, model, message count, parameters) and a SHA-256 hash of the message content for request correlation instead of logging the actual text. The log_request method hashes the concatenated message content, the log_response method records token usage and latency, and log_error truncates error messages to prevent content leakage through exception strings.

Code snippet python
1import logging 2import hashlib 3from datetime import datetime 4from typing import List 5from dataclasses import dataclass, asdict 6import json 7 8@dataclass 9class RequestLog: 10 """Structured log entry for LLM requests.""" 11 timestamp: str 12 provider: str 13 model: str 14 message_count: int 15 last_role: str 16 max_tokens: int 17 temperature: float 18 request_hash: str # Hash of content for correlation without exposure 19 20 def to_dict(self): 21 return asdict(self) 22 23class SecureLogger: 24 """Logger that protects sensitive content while enabling debugging.""" 25 26 def __init__(self, logger: logging.Logger = None): 27 self.logger = logger or logging.getLogger("llm_client") 28 29 def _hash_content(self, content: str) -> str: 30 """Create a hash of content for correlation.""" 31 return hashlib.sha256(content.encode()).hexdigest()[:12] 32 33 def log_request( 34 self, 35 provider: str, 36 model: str, 37 messages: List[ChatMessage], 38 max_tokens: int, 39 temperature: float 40 ): 41 """Log a request with sensitive data protected.""" 42 # Create content hash for correlation 43 content_parts = [m.content for m in messages] 44 request_hash = self._hash_content("".join(content_parts)) 45 46 log_entry = RequestLog( 47 timestamp=datetime.utcnow().isoformat(), 48 provider=provider, 49 model=model, 50 message_count=len(messages), 51 last_role=messages[-1].role.value if messages else "none", 52 max_tokens=max_tokens, 53 temperature=temperature, 54 request_hash=request_hash 55 ) 56 57 self.logger.info(f"LLM Request: {json.dumps(log_entry.to_dict())}") 58 59 def log_response( 60 self, 61 provider: str, 62 request_hash: str, 63 input_tokens: int, 64 output_tokens: int, 65 latency_ms: float, 66 success: bool 67 ): 68 """Log a response with timing and token metrics.""" 69 self.logger.info( 70 f"LLM Response: provider={provider} " 71 f"request_hash={request_hash} " 72 f"input_tokens={input_tokens} " 73 f"output_tokens={output_tokens} " 74 f"latency_ms={latency_ms:.2f} " 75 f"success={success}" 76 ) 77 78 def log_error( 79 self, 80 provider: str, 81 request_hash: str, 82 error_type: str, 83 error_message: str 84 ): 85 """Log an error without exposing request content.""" 86 # Sanitize error message to remove any content that leaked in 87 sanitized_message = error_message[:200] # Truncate long messages 88 89 self.logger.error( 90 f"LLM Error: provider={provider} " 91 f"request_hash={request_hash} " 92 f"error_type={error_type} " 93 f"message={sanitized_message}" 94 )
  • Lines 1-6: Import logging, hashlib, datetime, List, dataclass, asdict, and json for structured logging infrastructure
  • Lines 8-18: Define RequestLog dataclass with fields for timestamp, provider, model, message count, last role, max tokens, temperature, and a content hash for correlation
  • Lines 20-22: Define SecureLogger with init accepting an optional logger instance, defaulting to the "llm_client" named logger
  • Lines 24-26: Define _hash_content that creates a 12-character SHA-256 hash prefix of the content string for safe correlation
  • Lines 28-44: Define log_request that hashes all message content, creates a RequestLog entry with metadata (no raw content), and logs it as JSON
  • Lines 46-62: Define log_response that logs provider, request hash, token counts, latency, and success status
  • Lines 64-72: Define log_error that truncates the error message to 200 characters to prevent content leakage, then logs with provider, hash, and error type

Input Validation and Sanitization

The InputValidator class below protects your LLM integration from prompt injection attacks and abuse by validating message counts, content length, and suspicious patterns before sending requests to providers. It defines configurable limits for maximum message length (100,000 characters) and maximum message count (100), plus a set of regex patterns that detect common prompt injection attempts like "ignore previous instructions". The validate_messages method returns a list of issues found, while sanitize_content removes null bytes, normalizes whitespace, and truncates oversized content.

Code snippet python
1from typing import List 2import re 3 4class InputValidator: 5 """Validator for LLM inputs to prevent injection and abuse.""" 6 7 MAX_MESSAGE_LENGTH = 100000 # Characters 8 MAX_MESSAGES = 100 9 10 def __init__(self): 11 # Patterns that might indicate prompt injection attempts 12 self.suspicious_patterns = [ 13 r"ignore\s+(previous|all)\s+instructions", 14 r"disregard\s+(previous|all)\s+instructions", 15 r"system\s*:\s*", # Attempting to inject system messages 16 ] 17 18 def validate_messages(self, messages: List[ChatMessage]) -> List[str]: 19 """Validate messages and return list of issues found.""" 20 issues = [] 21 22 if len(messages) > self.MAX_MESSAGES: 23 issues.append(f"Too many messages: {len(messages)} > {self.MAX_MESSAGES}") 24 25 for i, msg in enumerate(messages): 26 if len(msg.content) > self.MAX_MESSAGE_LENGTH: 27 issues.append( 28 f"Message {i} too long: {len(msg.content)} > {self.MAX_MESSAGE_LENGTH}" 29 ) 30 31 # Check for suspicious patterns 32 for pattern in self.suspicious_patterns: 33 if re.search(pattern, msg.content, re.IGNORECASE): 34 issues.append(f"Message {i} contains suspicious pattern") 35 36 return issues 37 38 def sanitize_content(self, content: str) -> str: 39 """Sanitize content for safe processing.""" 40 # Remove null bytes 41 content = content.replace("\x00", "") 42 43 # Normalize whitespace 44 content = " ".join(content.split()) 45 46 # Truncate if too long 47 if len(content) > self.MAX_MESSAGE_LENGTH: 48 content = content[:self.MAX_MESSAGE_LENGTH] + "... [truncated]" 49 50 return content
  • Lines 1-2: Import List for type hints and re for regex pattern matching
  • Lines 4-7: Define InputValidator with class-level constants for maximum message length (100,000 characters) and maximum message count (100)
  • Lines 9-15: In init, define a list of suspicious regex patterns that detect prompt injection attempts including "ignore previous instructions", "disregard all instructions", and attempts to inject system messages
  • Lines 17-30: Define validate_messages that checks message count against the limit, validates each message's length, and scans content against suspicious patterns using case-insensitive regex, returning a list of issue descriptions
  • Lines 32-40: Define sanitize_content that removes null bytes, normalizes whitespace by splitting and rejoining, and truncates content exceeding the maximum length with a "[truncated]" suffix

Practical Use Cases

Loading diagram...

Use Case 1: Customer Support Chatbot

A common application for LLM clients is building customer support chatbots that maintain conversation history and follow specific behavioral guidelines. The SupportBot class below uses the unified LLMClient interface to manage multi-turn conversations, storing a running conversation_history list and a system prompt with customer support guidelines. The respond method appends each user message to the history, calls generate with the full context, appends the assistant response back to the history, and returns the generated text. The reset_conversation method clears the history to start fresh.

Code snippet python
1class SupportBot: 2 """Customer support chatbot using unified LLM interface.""" 3 4 def __init__(self, client: LLMClient): 5 self.client = client 6 self.conversation_history: List[ChatMessage] = [] 7 self.system_prompt = """You are a helpful customer support agent for TechCo. 8 9 Guidelines: 10 - Be polite and professional 11 - Provide accurate information about our products 12 - Escalate to a human agent if you cannot help 13 - Never make up information about pricing or policies 14 """ 15 16 async def respond(self, user_message: str) -> str: 17 """Generate a response to a user message.""" 18 # Add user message to history 19 self.conversation_history.append( 20 ChatMessage(role=MessageRole.USER, content=user_message) 21 ) 22 23 # Generate response 24 response = await self.client.generate( 25 messages=self.conversation_history, 26 system_prompt=self.system_prompt, 27 max_tokens=500, 28 temperature=0.7 29 ) 30 31 # Add assistant response to history 32 self.conversation_history.append( 33 ChatMessage(role=MessageRole.ASSISTANT, content=response.content) 34 ) 35 36 return response.content 37 38 def reset_conversation(self): 39 """Start a new conversation.""" 40 self.conversation_history.clear()
  • Lines 1-3: Define SupportBot with init accepting an LLMClient, initializing an empty conversation_history list and a detailed system prompt with customer support guidelines
  • Lines 4-14: The system prompt instructs the model to be polite, accurate, and to escalate when unable to help, establishing behavioral guardrails
  • Lines 16-32: Define the respond method that appends the user message to history, calls client.generate with the full conversation context and system prompt, appends the assistant response, and returns the content string
  • Lines 34-36: Define reset_conversation that clears the history list to start a fresh conversation session

Use Case 2: Document Summarization Pipeline

The DocumentSummarizer class below implements a map-reduce pattern for handling documents that exceed the model's effective context window. It splits large documents into word-based chunks via _chunk_document, summarizes each chunk concurrently using asyncio.gather with the _summarize_chunk method, then combines all chunk summaries and produces a final coherent summary through _final_summary. For small documents that fit in a single chunk, it bypasses the map-reduce step and summarizes directly, optimizing for both efficiency and quality.

Code snippet python
1class DocumentSummarizer: 2 """Summarize documents using LLM with chunking support.""" 3 4 def __init__( 5 self, 6 client: LLMClient, 7 max_chunk_tokens: int = 3000 8 ): 9 self.client = client 10 self.max_chunk_tokens = max_chunk_tokens 11 12 async def summarize(self, document: str) -> str: 13 """Summarize a document, handling large texts with chunking.""" 14 chunks = self._chunk_document(document) 15 16 if len(chunks) == 1: 17 # Small document, summarize directly 18 return await self._summarize_chunk(chunks[0]) 19 20 # Large document, summarize chunks then combine 21 chunk_summaries = await asyncio.gather( 22 *[self._summarize_chunk(chunk) for chunk in chunks] 23 ) 24 25 # Combine summaries 26 combined = "\n\n".join(chunk_summaries) 27 return await self._final_summary(combined) 28 29 def _chunk_document(self, document: str) -> List[str]: 30 """Split document into chunks.""" 31 # Simple word-based chunking (in practice, use token counting) 32 words = document.split() 33 chunk_size = self.max_chunk_tokens # Approximation 34 35 chunks = [] 36 for i in range(0, len(words), chunk_size): 37 chunk = " ".join(words[i:i + chunk_size]) 38 chunks.append(chunk) 39 40 return chunks 41 42 async def _summarize_chunk(self, chunk: str) -> str: 43 """Summarize a single chunk.""" 44 response = await self.client.generate( 45 messages=[ 46 ChatMessage( 47 role=MessageRole.USER, 48 content=f"Summarize the following text concisely:\n\n{chunk}" 49 ) 50 ], 51 system_prompt="You are a professional summarization assistant.", 52 max_tokens=500, 53 temperature=0.3 54 ) 55 return response.content 56 57 async def _final_summary(self, summaries: str) -> str: 58 """Create final summary from chunk summaries.""" 59 response = await self.client.generate( 60 messages=[ 61 ChatMessage( 62 role=MessageRole.USER, 63 content=f"Combine these summaries into a coherent overall summary:\n\n{summaries}" 64 ) 65 ], 66 system_prompt="You are a professional summarization assistant.", 67 max_tokens=1000, 68 temperature=0.3 69 ) 70 return response.content
  • Lines 1-5: Define DocumentSummarizer with init accepting an LLMClient and a configurable max_chunk_tokens (default 3000) for controlling chunk size
  • Lines 7-18: Define the summarize method that chunks the document, returns a direct summary for single-chunk documents, or runs concurrent chunk summarizations via asyncio.gather followed by a final combining summary
  • Lines 20-30: Define _chunk_document that splits the document into word-based chunks of approximately max_chunk_tokens words each
  • Lines 32-42: Define _summarize_chunk that sends a single chunk to the LLM with a summarization prompt and low temperature (0.3) for consistent, factual output
  • Lines 44-53: Define _final_summary that combines chunk summaries into a coherent overall summary using a higher token limit (1000) to accommodate the synthesis

Use Case 3: Code Review Assistant

The CodeReviewAssistant class below automates code review by sending code snippets to the LLM with a detailed system prompt specifying review criteria including bugs, security vulnerabilities, performance issues, style problems, and test coverage gaps. The review method builds a formatted prompt with the code wrapped in a language-specific code block and optional context, while the suggest_improvements method focuses on readability, performance optimization, error handling, and modern language feature adoption. Both methods use low temperature values to produce consistent, factual analysis.

Code snippet python
1class CodeReviewAssistant: 2 """Automated code review using LLM analysis.""" 3 4 def __init__(self, client: LLMClient): 5 self.client = client 6 self.system_prompt = """You are an expert code reviewer. 7 8 Review code for: 9 - Bugs and potential errors 10 - Security vulnerabilities 11 - Performance issues 12 - Code style and best practices 13 - Test coverage gaps 14 15 Provide specific, actionable feedback with line references. 16 """
  • Lines 1-2: Define the CodeReviewAssistant class with a docstring describing its purpose as an automated code review tool powered by LLM analysis
  • Lines 4-6: The init method accepts an LLMClient instance and stores it as an instance variable for making API calls throughout the class
  • Lines 7-16: The system prompt is a multi-line string that instructs the model to review code across five dimensions: bugs, security vulnerabilities, performance issues, code style, and test coverage gaps, and to provide specific actionable feedback with line references

The review method below constructs a prompt that wraps the submitted code in a language-specific code block and optionally includes additional context. It sends this prompt to the LLM using a low temperature value of 0.3, which encourages deterministic and factually grounded analysis rather than creative or varied responses. The method returns the raw content from the LLM response, which the caller can then parse or display. Notice how the f-string template embeds the code between triple-backtick fences so the model sees properly formatted source code in its input.

Code snippetpython
1 async def review( 2 self, 3 code: str, 4 language: str, 5 context: str = "" 6 ) -> str: 7 """Review a code snippet.""" 8 prompt = f"""Review this {language} code: 9 10```{language} 11{code} 12``` 13 14{f"Context: {context}" if context else ""} 15 16Provide detailed feedback on any issues found.""" 17 18 response = await self.client.generate( 19 messages=[ChatMessage(role=MessageRole.USER, content=prompt)], 20 system_prompt=self.system_prompt, 21 max_tokens=2000, 22 temperature=0.3 23 ) 24 25 return response.content
  • Lines 1-6: Define the review method signature accepting the source code string, the programming language identifier, and an optional context string that provides additional information about the code's purpose or environment
  • Lines 8-16: Build the prompt using an f-string that wraps the code in a language-tagged code block, includes the optional context, and asks for detailed feedback on any issues found
  • Lines 18-23: Call client.generate with the constructed prompt, passing the system prompt, a 2000 token maximum to allow thorough analysis, and temperature 0.3 for consistent output
  • Line 25: Return the raw response content from the LLM for the caller to process or display

The suggest_improvements method follows a similar pattern but focuses the prompt on four specific improvement categories: readability, performance optimization, error handling enhancements, and adoption of modern language features. It uses a slightly higher temperature of 0.5 compared to the review method, which allows the model to generate more creative and diverse suggestions rather than strictly deterministic analysis. This temperature difference reflects the different nature of the tasks: reviewing code for bugs requires precision, while suggesting improvements benefits from broader thinking.

Code snippetpython
1 async def suggest_improvements(self, code: str, language: str) -> str: 2 """Suggest improvements for code.""" 3 prompt = f"""Suggest improvements for this {language} code: 4 5```{language} 6{code} 7``` 8 9Focus on: 101. Readability improvements 112. Performance optimizations 123. Error handling enhancements 134. Modern language features that could be used""" 14 15 response = await self.client.generate( 16 messages=[ChatMessage(role=MessageRole.USER, content=prompt)], 17 system_prompt=self.system_prompt, 18 max_tokens=2000, 19 temperature=0.5 20 ) 21 22 return response.content
  • Lines 1-2: Define suggest_improvements accepting the code string and language identifier, with a docstring indicating its purpose
  • Lines 3-13: Construct the prompt that wraps the code in language-tagged fences and lists four focus areas for improvement suggestions
  • Lines 15-20: Send the prompt through client.generate with temperature 0.5 for more creative suggestions and the same 2000 token limit
  • Lines 22-23: Return the generated improvement suggestions from the LLM response

Keep going with GenAI Agent Engineering

Create a free account to track your progress and open this lesson in the full learning view. Subscribe to unlock the entire path — every goal, the hands-on labs, quizzes, and your verifiable skill graph — from . Cancel anytime.