Prerequisites

Before starting this chapter, you should have completed the following courses or have equivalent experience:

  • GenAI Agent Engineering: Familiarity with building agent workflows using LangGraph, state management, and tool calling patterns. You will use LangGraph to build the DiscoveryInterviewAgent in Objective 3.

  • Enterprise LLM Customization: Understanding of prompt engineering, structured outputs with Instructor and Pydantic, and multi-provider SDK usage (OpenAI, Gemini, Anthropic). The scoring engine and feasibility analyzer rely heavily on structured LLM outputs.

  • GenAI Architecture & Design Patterns: Knowledge of RAG pipelines, embedding strategies, and API design patterns. The data readiness profiler evaluates datasets for embedding suitability, and all components expose FastAPI endpoints.

Additionally, you should be comfortable with:

  • Python 3.11+: All labs use Python with type hints, dataclasses, and async patterns.
  • FastAPI: Every objective includes REST API endpoints. You should understand path operations, dependency injection, and Pydantic request/response models.
  • Pydantic v2: Used extensively for data validation, structured LLM outputs via Instructor, and API schemas.
  • Basic data analysis: The data readiness profiler works with pandas DataFrames for schema analysis and statistical profiling.

AI Use Case Discovery & Data Readiness Assessment: Learning Goals

By the end of this chapter, you will be able to:

  1. Use Case Prioritization and Scoring

    • Build a use case prioritization engine that scores AI opportunities by feasibility, impact, and data readiness using LLM-powered analysis.
    • This skill is fundamental for structured client discovery and prevents wasted effort on low-value AI initiatives.
    • You will practice this through hands-on labs building a UseCaseScoringEngine with Instructor and Pydantic models for weighted multi-criteria evaluation.
    • Understanding structured scoring enables you to produce defensible, data-driven recommendations for executive stakeholders.
  2. Data Readiness Profiling and PII Detection

    • Implement a data readiness assessment pipeline that profiles customer datasets for schema quality, volume, PII exposure, and embedding suitability.
    • Data readiness is the single biggest predictor of AI project success, making this assessment critical before any technical commitment.
    • You will practice this through hands-on labs building a DataReadinessProfiler with Presidio PII detection and pandas-based schema analysis.
    • Understanding data profiling enables you to identify blockers early and set realistic expectations with clients about data preparation effort.
  3. LLM-Powered Discovery Interviews

    • Design a structured discovery interview framework with LLM-generated follow-up questions and automated insight extraction.
    • Discovery interviews are the primary mechanism for understanding client needs, and automating follow-up generation ensures comprehensive coverage.
    • You will practice this through hands-on labs building a DiscoveryInterviewAgent as a LangGraph state machine with multi-turn conversation management.
    • Understanding automated interview flows enables you to scale discovery across multiple stakeholders without losing depth or consistency.
  4. Multi-Provider Feasibility Analysis

    • Build a technical feasibility analyzer that maps use cases to provider capabilities (OpenAI vs. Gemini vs. Anthropic) with cost and latency estimates.
    • Provider selection directly impacts project cost, performance, and feature availability, making objective comparison essential for scoping.
    • You will practice this through hands-on labs building a ProviderFeasibilityAnalyzer with LiteLLM for parallel multi-provider benchmarking.
    • Understanding provider trade-offs enables you to make architecture recommendations grounded in measured performance rather than assumptions.
  5. Automated Discovery Report Generation

    • Create a discovery report generator that produces executive-ready summaries from structured assessment data.
    • Translating technical assessment data into business-oriented reports is essential for stakeholder buy-in and project approval.
    • You will practice this through hands-on labs building a DiscoveryReportGenerator with Jinja2 templates and cross-component data aggregation.
    • Understanding report generation enables you to package discovery insights into artifacts that drive informed decision-making.
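The report-generation step above boils down to filling a template from aggregated assessment data. As a minimal sketch, the snippet below uses the standard library's `string.Template` in place of Jinja2 (which the labs use), with hypothetical assessment fields:

```python
from string import Template

# Hypothetical assessment data aggregated from the other components.
assessment = {
    "client": "Acme Corp",
    "top_use_case": "Support ticket triage",
    "score": 8.2,
    "data_readiness": "Ready (minor PII redaction needed)",
}

# A minimal report template; the labs use Jinja2, but string.Template
# illustrates the same substitution idea with only the standard library.
report_template = Template(
    "Discovery Summary for $client\n"
    "Recommended use case: $top_use_case (score: $score/10)\n"
    "Data readiness: $data_readiness\n"
)

report = report_template.substitute(assessment)
print(report)
```

Jinja2 adds loops, conditionals, and filters on top of this substitution idea, which is what makes it practical for full multi-section reports.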

Key Terminology

Use Case Scoring
A structured evaluation method that assigns numerical scores to potential AI use cases across multiple weighted criteria such as business impact, technical feasibility, data readiness, and time-to-value, producing a ranked prioritization list.
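The core of weighted multi-criteria scoring can be sketched in a few lines. The weights and use cases below are hypothetical placeholders, not values from the labs:

```python
from dataclasses import dataclass

# Illustrative criteria weights (hypothetical values; weights sum to 1.0).
WEIGHTS = {"impact": 0.35, "feasibility": 0.30, "data_readiness": 0.25, "time_to_value": 0.10}

@dataclass
class UseCase:
    name: str
    scores: dict  # criterion -> score on a 0-10 scale

def weighted_score(uc: UseCase) -> float:
    """Combine per-criterion scores into one weighted total."""
    return sum(WEIGHTS[c] * s for c, s in uc.scores.items())

candidates = [
    UseCase("Invoice extraction", {"impact": 7, "feasibility": 9, "data_readiness": 8, "time_to_value": 9}),
    UseCase("Demand forecasting", {"impact": 9, "feasibility": 5, "data_readiness": 4, "time_to_value": 3}),
]

# Rank highest-scoring use cases first to produce the prioritization list.
ranked = sorted(candidates, key=weighted_score, reverse=True)
for uc in ranked:
    print(f"{uc.name}: {weighted_score(uc):.2f}")
```

In the labs, the per-criterion scores come from LLM evaluation via Instructor rather than being hand-assigned as here.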
Data Readiness Assessment
A systematic profiling of customer datasets that evaluates schema completeness, null rates, cardinality, data volume, PII exposure, and embedding suitability to determine whether data assets can support the proposed AI solution.
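Two of the simplest readiness metrics, null rate and cardinality, can be computed directly with pandas. The toy dataset below is hypothetical:

```python
import pandas as pd

# A toy customer dataset (hypothetical columns and values).
df = pd.DataFrame({
    "ticket_id": [1, 2, 3, 4],
    "subject": ["Login fails", None, "Refund request", "Login fails"],
    "body": ["Cannot sign in", "Order arrived damaged", "Please refund order 123", None],
})

# Per-column readiness metrics: type, null rate, and cardinality.
profile = {
    col: {
        "dtype": str(df[col].dtype),
        "null_rate": float(df[col].isna().mean()),
        "unique_values": int(df[col].nunique()),
    }
    for col in df.columns
}
print(profile)
```

A full assessment layers volume checks, PII scanning, and embedding-suitability heuristics on top of this per-column profile.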
Presidio
An open-source PII detection and anonymization framework by Microsoft that uses named entity recognition, pattern matching, and context-aware analysis to identify sensitive data types including emails, phone numbers, SSNs, and names across multiple languages.
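To convey the idea without the full framework, here is a deliberately simplified regex-only detector. Presidio's real recognizers combine NER models, patterns, and context scoring, so treat this purely as a conceptual stand-in:

```python
import re

# Simplified regex detectors standing in for Presidio's recognizers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def detect_pii(text: str) -> list:
    """Return (entity_type, matched_text) pairs found in `text`."""
    findings = []
    for entity, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((entity, match.group()))
    return findings

sample = "Contact jane.doe@example.com or 555-867-5309; SSN 123-45-6789."
print(detect_pii(sample))
```

Regexes alone miss names and context-dependent identifiers, which is exactly the gap Presidio's NER-based recognizers close.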
Instructor
A Python library that patches LLM client libraries (OpenAI, Anthropic, Google) to return structured Pydantic model instances instead of raw text, enabling type-safe extraction of complex data structures from LLM responses.
LangGraph
A framework for building stateful, multi-actor agent workflows as directed graphs, where nodes represent processing steps and edges define transitions, with built-in support for state persistence, streaming, and human-in-the-loop patterns.
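The node-and-edge idea can be illustrated in plain Python before touching the library. This sketch is not the LangGraph API; it only shows the pattern (nodes transform state, conditional edges pick the next node) that LangGraph formalizes and extends with persistence, streaming, and human-in-the-loop hooks:

```python
# A minimal node-and-edge workflow in plain Python (not the LangGraph API).

def ask_question(state: dict) -> dict:
    # Node: append the next interview question to the conversation state.
    state["turns"].append(f"Q{len(state['turns']) + 1}")
    return state

def check_done(state: dict) -> str:
    # Conditional edge: loop back to the question node until 3 turns.
    return "end" if len(state["turns"]) >= 3 else "ask_question"

NODES = {"ask_question": ask_question}
EDGES = {"ask_question": check_done}

def run(state: dict, entry: str = "ask_question") -> dict:
    node = entry
    while node != "end":
        state = NODES[node](state)
        node = EDGES[node](state)
    return state

final = run({"turns": []})
print(final["turns"])
```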
LiteLLM
A unified interface library that provides a single API to call 100+ LLM providers (OpenAI, Anthropic, Google, and others) with consistent request/response formats, automatic retry logic, and built-in cost tracking per provider.
Feasibility Analysis
A technical evaluation that benchmarks a proposed use case against multiple LLM providers by comparing response quality, latency (time-to-first-token and total), token usage, and estimated cost to determine the optimal provider fit.
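The cost half of a feasibility comparison reduces to simple per-token arithmetic. The prices below are invented placeholders (real provider pricing changes often), so only the comparison logic matters here:

```python
# Hypothetical per-1M-token prices in USD; placeholders, not real rates.
PRICING = {
    "provider_a": {"input": 3.00, "output": 15.00},
    "provider_b": {"input": 1.25, "output": 5.00},
}

def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request, given per-1M-token prices."""
    p = PRICING[provider]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Compare providers for a typical discovery-analysis request size.
costs = {name: estimate_cost(name, 2_000, 500) for name in PRICING}
best = min(costs, key=costs.get)
print(costs, best)
```

A full analyzer would weigh these estimates against measured latency and response quality rather than cost alone.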
Discovery Workshop
A structured client engagement session where stakeholders describe business problems, data assets, and technical constraints, producing documented insights that feed into use case scoring and solution scoping.
Embedding Suitability
An assessment of whether a dataset's content characteristics (text length, vocabulary diversity, domain specificity, and language) make it appropriate for vector embedding and semantic search applications.
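Two of those characteristics, average text length and vocabulary diversity, can be measured with simple heuristics. The type-token ratio used below is one rough diversity signal among many, and the documents are invented examples:

```python
def suitability_metrics(docs: list) -> dict:
    """Simple heuristics for embedding suitability (illustrative only)."""
    lengths = [len(d.split()) for d in docs]
    vocab = {w.lower() for d in docs for w in d.split()}
    total_words = sum(lengths)
    return {
        "avg_length_words": total_words / len(docs),
        # Type-token ratio as a rough vocabulary-diversity signal.
        "vocab_diversity": len(vocab) / total_words,
    }

docs = [
    "Reset your password from the account settings page",
    "Refunds are processed within five business days",
]
print(suitability_metrics(docs))
```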
Confidence Interval
In the context of use case scoring, a statistical range around each criterion score that reflects the LLM's certainty in its evaluation, calculated from consistency across multiple scoring passes.
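Given scores from multiple passes, the interval can be derived with the standard library's `statistics` module. The pass scores and the ±2-sigma band below are illustrative choices, not the labs' exact method:

```python
import statistics

# Hypothetical scores for one criterion across five scoring passes.
passes = [7.5, 8.0, 7.0, 8.0, 7.5]

mean = statistics.mean(passes)
stdev = statistics.stdev(passes)  # sample standard deviation
# A simple +/- 2-sigma band as the reported confidence interval.
interval = (mean - 2 * stdev, mean + 2 * stdev)
print(f"score {mean:.2f}, interval {interval[0]:.2f}-{interval[1]:.2f}")
```

A wide interval signals that the LLM's judgments are inconsistent across passes, which is itself useful information when presenting scores to stakeholders.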
Schema Profiling
The automated analysis of dataset structure including column types, null percentages, unique value counts, value distributions, and referential integrity that determines data quality for downstream AI processing.
PII Exposure
The presence of personally identifiable information in customer datasets that must be detected, quantified, and addressed through redaction or anonymization before the data can be used in AI training or inference pipelines.
