This article summarizes best practices, drawn from practical experience, for building effective AI agents on top of large language models (LLMs). It covers agent system architectures ranging from simple workflows to autonomous agents and gives guidance on when to use each. It also discusses the role of frameworks and emphasizes simplicity, transparency, and well-designed agent-computer interfaces (ACI).
Key Summary
1. What are AI Agents?
- Workflows: LLMs and tools are coordinated through predefined code paths.
- AI Agents: LLMs dynamically guide their own processes and tool usage, controlling how tasks are executed.
Key Difference: Workflows follow fixed paths, while AI agents can adapt flexibly.
2. When (and When Not) to Use AI Agents
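- Prioritize Simplicity: Start with the simplest solution. In many cases, optimizing a single LLM call with retrieval and in-context examples is sufficient, and agentic systems typically trade latency and cost for better task performance.
- Workflows for Predictable Tasks: Workflows are ideal for well-defined tasks where consistency and predictability are crucial.
- AI Agents for Flexible Tasks: AI agents are the better choice when flexibility and model-driven decision-making are needed at scale.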
3. When and How to Use Frameworks
- Advantages of Frameworks: Simplify standard low-level tasks such as calling LLMs, defining and parsing tools, and chaining calls.
- Disadvantages of Frameworks: May add abstraction layers, making debugging more difficult and encouraging over-complexity.
- Recommendation: Prioritize direct use of LLM APIs, as many patterns can be implemented with just a few lines of code. If using frameworks, ensure you understand the underlying code.
4. Building Blocks, Workflows, and AI Agents
a) Building Blocks: Augmented LLM
- Retrieval: Access external knowledge.
- Tools: Interact with external systems (e.g., APIs, databases).
- Memory: Retain information across interactions.
Key Point: Tailor these augmentations to the specific use case, and make sure they present a clear, well-documented interface to the LLM.
Figure: The augmented LLM
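As a rough illustration of the three augmentations, the sketch below wraps a model call with retrieval, tools, and memory. Everything here is an assumption made for the example: `AugmentedLLM`, `fake_llm`, and the keyword-overlap ranking are hypothetical stand-ins, not part of any real library, and a production system would likely use embeddings for retrieval and a real model API.

```python
from dataclasses import dataclass, field
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: reports how much context it saw."""
    return f"[model saw {len(prompt.splitlines())} lines of context]"


@dataclass
class AugmentedLLM:
    llm: Callable[[str], str]
    documents: list[str]                              # retrieval corpus
    tools: dict[str, Callable[[str], str]]            # named external actions
    memory: list[str] = field(default_factory=list)   # earlier user turns

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Naive keyword-overlap ranking (an assumption for this sketch).
        words = set(query.lower().split())
        ranked = sorted(
            self.documents,
            key=lambda doc: len(words & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def call_tool(self, name: str, argument: str) -> str:
        # Tools are plain callables looked up by name.
        return self.tools[name](argument)

    def build_prompt(self, query: str) -> str:
        # The prompt is the interface: tools, retrieved context, memory, query.
        parts = ["Known tools: " + ", ".join(sorted(self.tools))]
        parts += [f"Context: {doc}" for doc in self.retrieve(query)]
        parts += [f"Earlier: {turn}" for turn in self.memory]
        parts.append(f"User: {query}")
        return "\n".join(parts)

    def ask(self, query: str) -> str:
        answer = self.llm(self.build_prompt(query))
        self.memory.append(query)  # retain the turn for later interactions
        return answer
```

The point of the sketch is that the model only ever sees what `build_prompt` assembles, so the clarity of that assembled interface matters as much as the individual augmentations.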
b) Workflows: Prompt Chaining
Overview: Break down tasks into a series of steps, with each LLM call handling the output of the previous step.
When to Use: Suitable for tasks that can be clearly broken down into fixed subtasks, trading latency for higher accuracy.
Example Use Cases:
- Content Creation and Localization:
  - Generate English marketing copy first
  - Check if the copy aligns with brand tone
  - Translate approved copy into target language
  - Adjust translation to fit local cultural norms
- Technical Documentation Writing:
  - Generate document outline based on requirements
  - Check if the outline covers all necessary topics
  - Write detailed content based on the approved outline
  - Conduct technical accuracy review
- Product Description Generation:
  - Extract key product features
  - Generate an appealing product title
  - Write detailed description
  - Optimize SEO keywords
Figure: The prompt chaining workflow
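The marketing-copy use case can be sketched as a chain of three calls with a programmatic gate between them. This is a minimal sketch under stated assumptions: `fake_llm` is a hypothetical stub returning canned replies, and `marketing_chain` and the prompt wording are invented for illustration.

```python
from typing import Callable

LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Hypothetical stub: canned replies keyed on the instruction prefix."""
    if prompt.startswith("Write marketing copy"):
        return "Our boots keep your feet dry."
    if prompt.startswith("Does this copy match"):
        return "PASS"
    if prompt.startswith("Translate"):
        return "Unsere Stiefel halten Ihre Füße trocken."
    return "UNKNOWN"


def marketing_chain(product: str, language: str, llm: LLM) -> str:
    # Step 1: generate the English copy.
    copy = llm(f"Write marketing copy for: {product}")

    # Step 2: gate. Stop the chain early if the copy is off-brand.
    verdict = llm(
        "Does this copy match a friendly brand tone? Answer PASS or FAIL.\n" + copy
    )
    if verdict.strip().upper() != "PASS":
        raise ValueError("copy failed the brand-tone gate")

    # Step 3: translate only the approved copy.
    return llm(f"Translate into {language}:\n{copy}")
```

Each step receives only the previous step's output, which keeps every prompt small and focused. The gate in step 2 is ordinary code, so a failure is cheap to detect and log.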
c) Workflows: Routing
Overview: Classify input and direct it to specialized subsequent tasks, achieving prompt specialization and separation of concerns.
When to Use: Suitable for complex tasks with different categories that are best handled separately and can be accurately classified.
Example Use Cases:
- Customer Service Inquiry Classification:
  - Refund requests → Finance department-specific prompt
  - Technical issues → Technical support prompt
  - Product inquiries → Sales department prompt
  - Complaint handling → Customer relations prompt
- Content Moderation Routing:
  - General content → Basic moderation process
  - Sensitive content → Enhanced moderation process
  - Urgent content → Priority handling process
- Multilingual Support:
  - Simple queries → Direct translation
  - Technical issues → Professional translators
  - Culture-related queries → Localization experts
Figure: The routing workflow
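A routing step can be sketched as a classifier followed by a lookup into specialized prompts. In the sketch below, `keyword_classifier` is a hypothetical stand-in for an LLM classification call, and `ROUTE_PROMPTS`, `FALLBACK`, `route`, and `echo_llm` are names made up for the example.

```python
from typing import Callable

LLM = Callable[[str], str]

# One specialized system prompt per category (wording is illustrative).
ROUTE_PROMPTS = {
    "refund": "You handle refund requests. Be precise about amounts.",
    "technical": "You are technical support. Ask for error messages.",
    "sales": "You answer product questions.",
    "complaint": "You are customer relations. Apologize and de-escalate.",
}
FALLBACK = "sales"  # assumption: unclassifiable messages go to sales


def keyword_classifier(message: str) -> str:
    """Hypothetical stand-in for an LLM classifier call."""
    text = message.lower()
    if "refund" in text or "money back" in text:
        return "refund"
    if "error" in text or "crash" in text:
        return "technical"
    if "unacceptable" in text or "complain" in text:
        return "complaint"
    return "other"


def route(message: str, classify: Callable[[str], str], llm: LLM) -> tuple[str, str]:
    label = classify(message)
    if label not in ROUTE_PROMPTS:
        label = FALLBACK  # never trust the classifier to stay inside the label set
    return label, llm(f"{ROUTE_PROMPTS[label]}\n\nCustomer: {message}")


def echo_llm(prompt: str) -> str:
    """Hypothetical stub model: returns the system prompt it was given."""
    return prompt.splitlines()[0]
```

Because each category has its own prompt, improving the refund prompt cannot degrade technical support answers, which is the separation of concerns the pattern is after.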
d) Workflows: Parallelization
Overview: Multiple LLM calls work on a task simultaneously, and their outputs are aggregated programmatically.
Two Main Variants:
- Sectioning: Break down a task into independent subtasks and execute them in parallel.
- Voting: Run the same task multiple times to obtain diverse outputs that can be compared or combined.
When to Use: Effective when subtasks can be processed in parallel to increase speed, or when multiple perspectives or attempts are needed for higher confidence results.
Sectioning Use Cases:
- Content Moderation:
  - Simultaneously check:
    - Content appropriateness
    - Fact accuracy
    - Grammar correctness
    - Brand consistency
- Code Review:
  - Parallel evaluation of:
    - Security vulnerabilities
    - Performance issues
    - Code style
    - Documentation completeness
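The code review case above could be sketched with a thread pool, one call per review aspect. This is only a sketch: `fake_reviewer` is a hypothetical stub model, and `REVIEW_ASPECTS` and `review_in_sections` are names chosen for the example. Threads are used on the assumption that real model calls are network-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LLM = Callable[[str], str]

REVIEW_ASPECTS = [
    "security vulnerabilities",
    "performance issues",
    "code style",
    "documentation completeness",
]


def fake_reviewer(prompt: str) -> str:
    """Hypothetical stub: flags eval() in the security pass, approves the rest."""
    aspect_line = prompt.splitlines()[0]
    if "security" in aspect_line and "eval(" in prompt:
        return "ISSUE: eval() on untrusted input"
    return "OK"


def review_in_sections(code: str, llm: LLM) -> dict[str, str]:
    # One independent prompt per aspect, so each call has a single focus.
    prompts = [f"Review this code for {aspect}.\n{code}" for aspect in REVIEW_ASPECTS]
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        results = list(pool.map(llm, prompts))  # map() preserves input order
    # Aggregate programmatically: aspect -> finding.
    return dict(zip(REVIEW_ASPECTS, results))
```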
Voting Use Cases:
- Content Rating:
  - Multiple models evaluate content ratings simultaneously
  - Determine final rating based on majority decision
  - Escalate to human review if inconsistent
- Translation Quality:
  - Generate multiple translation versions
  - Cross-evaluate quality of each version
  - Choose the best or combine advantages
Figure: The parallelization workflow
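The content rating case, including escalation when raters disagree, might look like the sketch below. The three rater functions are hypothetical stand-ins for separate model calls, and the `min_agreement` threshold of 0.6 is an arbitrary assumption. The raters run sequentially here for brevity; in practice they could be dispatched in parallel.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Rater = Callable[[str], str]


def majority_vote(
    content: str, raters: Sequence[Rater], min_agreement: float = 0.6
) -> Optional[str]:
    """Return the winning label, or None to signal escalation to human review."""
    if not raters:
        raise ValueError("at least one rater is required")
    votes = Counter(rater(content) for rater in raters)
    label, count = votes.most_common(1)[0]
    if count / len(raters) >= min_agreement:
        return label
    return None  # no sufficient consensus


# Hypothetical raters standing in for differently prompted model calls.
def strict_rater(text: str) -> str:
    return "mature" if "blood" in text else "general"


def lenient_rater(text: str) -> str:
    return "general"


def keyword_rater(text: str) -> str:
    return "mature" if "blood" in text or "fight" in text else "general"
```

Returning `None` rather than a best guess keeps the "escalate if inconsistent" rule explicit in the caller's code.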
e) Workflows: Orchestrator-Workers
Overview: A central LLM dynamically breaks down tasks, delegates them to worker LLMs, and integrates their results.
When to Use: Suitable for complex tasks where the required subtasks cannot be predicted (e.g., in programming, the number of files to be changed and the nature of changes in each file may depend on the task).
Example Use Cases:
- Website Content Update:
  - Orchestrator analyzes update requirements
  - Assigns tasks to specialized workers:
    - SEO optimization
    - Content writing
    - Image caption generation
    - Tag management
- Research Report Generation:
  - Orchestrator plans report structure
  - Workers handle different sections:
    - Data analysis
    - Literature review
    - Trend analysis
    - Recommendation writing
Figure: The orchestrator-workers workflow
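What separates this pattern from sectioning is that the subtasks are decided at run time by the orchestrator rather than fixed in code. A minimal sketch of the research report case follows. It assumes a single hypothetical stub, `fake_llm`, that plays orchestrator, worker, and synthesizer depending on a made-up `PLAN`/`WORK`/`MERGE` prompt prefix.

```python
from typing import Callable

LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Hypothetical stub playing all three roles, keyed on the prompt prefix."""
    role, _, body = prompt.partition(":")
    body = body.strip()
    if role == "PLAN":
        # The plan depends on the input, so it cannot be hard-coded by the caller.
        tasks = ["data analysis", "literature review"]
        if "forecast" in body:
            tasks.append("trend analysis")
        return "\n".join(tasks)
    if role == "WORK":
        return f"<section on {body}>"
    if role == "MERGE":
        return body.replace("\n", " ")
    return ""


def write_report(topic: str, llm: LLM) -> str:
    # 1. Orchestrator: decide which subtasks this particular topic needs.
    plan = llm(f"PLAN: {topic}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Workers: one call per subtask the orchestrator asked for.
    sections = [llm(f"WORK: {task} for {topic}") for task in subtasks]

    # 3. Synthesis: integrate the worker outputs into one result.
    return llm("MERGE: " + "\n".join(sections))
```

Calling `write_report` with a topic that mentions a forecast yields three sections, while other topics yield two, without any change to the calling code.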
f) Workflows: Evaluator-Optimizer
Overview: One LLM generates a response, while another LLM provides evaluation and feedback in a loop.
When to Use: Particularly effective when there are clear evaluation criteria, and iterative improvement can yield measurable value.
Example Use Cases:
- Article Optimization:
  - Generate initial article
  - Evaluator checks:
    - Readability
    - Logical flow
    - Argument support
  - Optimizer improves based on feedback
- Ad Copy Optimization:
  - Create multiple copy versions
  - Evaluate key metrics:
    - Persuasiveness
    - Relevance to target
    - Call-to-action effectiveness
  - Iteratively improve the best version
Figure: The evaluator-optimizer workflow
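The generate, evaluate, and revise loop can be sketched as below. Both `fake_generator` and `fake_evaluator` are hypothetical stand-ins for model calls, and the `max_rounds` cap is an assumption added so the loop always terminates even if the evaluator is never satisfied.

```python
from typing import Callable, Optional

Generator = Callable[[str, list[str]], str]  # (task, feedback so far) -> draft
Evaluator = Callable[[str], Optional[str]]   # draft -> critique, or None to accept


def fake_generator(task: str, feedback: list[str]) -> str:
    """Hypothetical stub: applies whatever feedback it has been given."""
    draft = f"{task}: comfortable shoes"
    if "mention the price" in feedback:
        draft += " for $49"
    if "add a call to action" in feedback:
        draft += ". Order today!"
    return draft


def fake_evaluator(draft: str) -> Optional[str]:
    """Hypothetical stub: checks fixed criteria and reports one problem at a time."""
    if "$" not in draft:
        return "mention the price"
    if "Order" not in draft:
        return "add a call to action"
    return None


def refine(task: str, generate: Generator, evaluate: Evaluator, max_rounds: int = 3) -> str:
    feedback: list[str] = []
    draft = generate(task, feedback)
    for _ in range(max_rounds):
        critique = evaluate(draft)
        if critique is None:  # evaluator accepts the draft
            break
        feedback.append(critique)
        draft = generate(task, feedback)  # regenerate with accumulated feedback
    return draft
```

With these stubs, `refine("Ad", fake_generator, fake_evaluator)` takes two revision rounds before the evaluator accepts the draft. The pattern only pays off when the evaluator's criteria are as explicit as the two checks in the stub.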
g) AI Agents
Overview: AI agents start from a command or an interactive discussion with the user. Once the task is clear, they plan and execute independently and may return to the human for clarification or judgment.
When to Use: Suitable for open-ended problems where the number of steps is hard to predict and cannot be hard-coded into a fixed path. Requires a certain level of trust in the LLM’s decision-making.
Example Use Cases:
- Research Assistant:
  - Conduct literature searches autonomously
  - Summarize key findings
  - Identify research gaps
  - Generate research recommendations
- Data Analyst:
  - Automatically clean data
  - Perform statistical analysis
  - Generate visualizations
  - Provide insight reports
- Content Curator:
  - Monitor content trends
  - Filter relevant content
  - Organize content themes
  - Generate content recommendations
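At its core, an agent of this kind is a loop in which the model picks the next action, a tool executes it, and the observation is fed back in. The sketch below assumes a hypothetical `scripted_policy` in place of the LLM and two toy tools. The `max_steps` limit is an assumption added as a basic guardrail, since the number of steps is not known in advance.

```python
from typing import Callable

Tool = Callable[[str], str]
Step = tuple[str, str, str]  # (action, argument, observation)
# A policy maps (goal, history) to (action, argument); "finish" ends the run.
Policy = Callable[[str, list[Step]], tuple[str, str]]


def run_agent(goal: str, policy: Policy, tools: dict[str, Tool], max_steps: int = 5) -> str:
    history: list[Step] = []
    for _ in range(max_steps):
        action, argument = policy(goal, history)
        if action == "finish":
            return argument
        if action in tools:
            observation = tools[action](argument)
        else:
            # Feed errors back to the policy instead of crashing the loop.
            observation = f"error: unknown tool {action!r}"
        history.append((action, argument, observation))
    return "stopped: step limit reached"


def scripted_policy(goal: str, history: list[Step]) -> tuple[str, str]:
    """Hypothetical stand-in for the LLM deciding the next step."""
    if not history:
        return "search", goal
    last_action, _, last_observation = history[-1]
    if last_action == "search":
        return "summarize", last_observation
    return "finish", last_observation


TOOLS: dict[str, Tool] = {
    "search": lambda query: f"3 papers about {query}",
    "summarize": lambda text: f"summary of {text}",
}
```

Unlike the workflows above, nothing in `run_agent` fixes the order or number of tool calls. That flexibility is why the step limit and the visible `history` matter.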
5. Combining and Customizing Patterns
These patterns are not rigid rules but starting points that can be combined and adjusted according to specific needs. The key is to measure performance and iterate, only adding complexity when it significantly improves results.
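One way to make "measure and iterate" concrete is to score the simple and the more complex variant on the same labeled examples and adopt the complex one only if the gain clears a threshold. The sketch below is illustrative only: the two pipelines are hypothetical keyword rules standing in for a single LLM call and a multi-step workflow, and the 0.05 `min_gain` default is an arbitrary assumption.

```python
from typing import Callable, Sequence

Pipeline = Callable[[str], str]
Example = tuple[str, str]  # (input, expected output)


def accuracy(pipeline: Pipeline, examples: Sequence[Example]) -> float:
    correct = sum(pipeline(text) == expected for text, expected in examples)
    return correct / len(examples)


def choose_pipeline(
    simple: Pipeline, complex_: Pipeline, examples: Sequence[Example], min_gain: float = 0.05
) -> str:
    """Prefer the simple pipeline unless the complex one is clearly better."""
    gain = accuracy(complex_, examples) - accuracy(simple, examples)
    return "complex" if gain >= min_gain else "simple"


# Hypothetical evaluation set and pipelines for a sentiment task.
EXAMPLES: list[Example] = [
    ("great product", "positive"),
    ("terrible support", "negative"),
    ("not good at all", "negative"),
    ("love it", "positive"),
]


def single_call(text: str) -> str:
    return "negative" if "terrible" in text else "positive"


def chained(text: str) -> str:
    return "negative" if "terrible" in text or "not " in text else "positive"
```

A real comparison would also track latency and cost per request, since those are what the added complexity is paid for in.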
6. Conclusion
- Simplicity: Maintain a simple design.
- Transparency: Clearly show the planning steps of AI agents.
- Well-Designed ACI: Thoroughly document and test tools.
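For the ACI point, a tool definition can carry its own documentation and validate its inputs, so the model gets both a clear description and actionable errors. The sketch below is one possible shape, not a standard: `ToolSpec`, its fields, and the `word_count` tool are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str            # what the model reads to decide when to call the tool
    parameters: dict[str, str]  # parameter name -> plain-language description
    handler: Callable[..., str]

    def render(self) -> str:
        """Text shown to the model: name, purpose, and every parameter."""
        lines = [f"{self.name}: {self.description}"]
        lines += [f"  - {param}: {doc}" for param, doc in self.parameters.items()]
        return "\n".join(lines)

    def call(self, **kwargs: str) -> str:
        missing = [p for p in self.parameters if p not in kwargs]
        unknown = [k for k in kwargs if k not in self.parameters]
        if missing or unknown:
            # Return an actionable message so the model can correct its call.
            return (
                f"error: missing={missing} unknown={unknown}; "
                f"expected {sorted(self.parameters)}"
            )
        return self.handler(**kwargs)


def word_count(text: str) -> str:
    return str(len(text.split()))


WORD_COUNT = ToolSpec(
    name="word_count",
    description="Count whitespace-separated words. Use for length checks, not token counts.",
    parameters={"text": "The full text to count, passed verbatim."},
    handler=word_count,
)
```

Testing a tool then means checking both paths: `WORD_COUNT.call(text="a b c")` for the normal case, and a deliberately wrong call such as `WORD_COUNT.call(txt="a")` to confirm the error message tells the model what to fix.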