Part 1: Discussion of Steps
- Creating Documentation in Markdown
Each of the main files (index.py, types.dt.py, files from the lib/ directory) should be described in the technical documentation. Below, I present Markdown files with full content.
README.md
README.md contains general information about the project, its operation, dependencies, and startup instructions.
Description
The AI Dev Agent project is an application that allows interaction with an AI model to perform specific tasks. It is configured to work with various text processing tools and integrate with the Anthropic Claude model.
Project Structure
- index.py - the main application file that starts the Quart server and handles requests.
- lib/ - folder containing the additional modules:
  - ai.py - API handling for the Anthropic model.
  - agent.py - decision-making logic and task execution by the agent.
  - prompts.py - prompt definitions for the agent.
  - tools.py - set of tools for content retrieval, file uploading, and other actions.
- ssh_manager.py - managing SSH connections and executing commands.
- task_manager.py - managing tasks.
- types.dt.py - definitions of data types and agent state structure.
- config.py - environment variables.
- config.yml - environment variables.
- .env - API keys.
Installation
1. Clone the repository.
2. Install Ansible and run the playbook site.yml.
3. Activate the virtual environment.
4. Start the server.
Usage
Send POST requests to the main endpoint / with the appropriate message content.
File log.md
log.md is used to record the agent’s actions in Markdown format.
Agent Operation Log
Here, all operations performed by the agent will be logged in real time.
Log Structure
Each operation will be described along with its type, header, and content.
Example
[Operation Type] Header
Operation content…
Part 2: Manually Creating the Project Structure and Configuring the Environment
1. Creating the project directory structure and files
- Create the main project directory and navigate to it:
    mkdir aidevs
    cd aidevs
- Create the lib directory and its files:
    mkdir -p lib
    cd lib
    touch agent.py ai.py prompts.py tools.py
- Return to the main project directory and create the remaining project files:
    cd ..
    touch .env asgi_app.py config.py config.yml index.py requirements.txt ssh_manager.py task_manager.py types.dt.py
These files are created in the main project directory (ai_dev_agent), as they serve fundamental functions for the entire project and are not specific to any subdirectory.
2. Configuring the virtual environment and requirements.txt file
- Create and activate the virtual environment:
    python3 -m venv venv
    source venv/bin/activate  # Linux/macOS
Or add this to .bashrc:
|
|
And then execute this:
|
|
- Add the required packages to the requirements.txt file:
    openai
    asyncssh
    markdown2
    python-dotenv
    ansible-lint
    flask
    anthropic
    playwright
    markdownify
    httpx
    quart
    uvicorn
- Install the packages from requirements.txt:
    pip install -r requirements.txt
3. Configuring the .env file
- Create the .env file in the main project directory with API keys:
    OPENAI_API_KEY=your_openai_api_key
    OPENROUTER_API_KEY=your_openrouter_api_key
    ANTHROPIC_API_KEY=your_anthropic_api_key
    LLAMA_PATH=path_to_your_local_llama_model
Part 3: Creating the main logic of the agent and other files
- File index.py – serves as the main entry point for the Quart-based server and handles various HTTP requests related to the AI agent’s functionality.
Full Explanation of the index.py File
1. Initialization and Configuration
- Quart as the framework
  - Quart is the asynchronous version of Flask and supports both synchronous and asynchronous endpoints.
  - The application is defined and assigned to the variable app.
- Loading environment variables
  - dotenv is used to load essential environment variables like the Anthropic API key.
- Initialization of the AIAgent class
  - The AIAgent class manages the AI agent’s logic and communicates with the model via the AnthropicCompletion client.
2. Structure of the AIAgent Class
- State attributes (state)
  - Store information about the current stage (currentStage), user messages (messages), and actions taken by the agent (actionsTaken).
- Processing loop (run)
  - Processes the user message in successive stages: plan, decide, describe, execute, reflect.
  - Ends when the final_answer stage is reached or the step limit is exceeded.
- Stage methods
  - _plan creates a plan of action based on the user’s message.
  - _decide selects the next tool or decides to end the process.
  - _describe generates input data for the selected tool.
  - _execute runs the selected tool and records the results.
  - _reflect analyzes the results and updates the plan.
- final_answer method
  - Creates the final answer from the collected data and sends it to the user.
- Debugging
  - Debug logs like [DEBUG] help track the agent’s operation at each step.
3. process_request Endpoint
- Handles POST requests to the / endpoint
  - Retrieves data in JSON format and processes it using AIAgent.
  - Creates an instance of AIAgent and calls the run method with the user’s message.
  - Returns the agent’s response or an error code if something goes wrong.
4. Error Handling
- Exception handling
  - Each stage and endpoint is wrapped in try-except blocks to handle errors and return appropriate information in the HTTP response.
5. Logging
- _log_to_markdown
  - Agent actions are logged to the log.md file, allowing for analysis.
- Debug logs
  - [DEBUG] shows details about processed data and execution stages.
6. Key Elements
- Asynchronous operations
  - All operations are asynchronous, allowing the system to handle multiple requests concurrently.
- Integration with AnthropicCompletion
  - The AnthropicCompletion class handles communication with the AI model, processing input data and generating responses.
7. Potential Extensions
- Adding new endpoints
  - You can add functions to handle new tools or agent functionalities.
- Optimizing decision logic
  - You could implement more advanced decision-making mechanisms to increase flexibility.
- Better logging
  - Implement a logging system using the logging module rather than just print statements.
The index.py file serves as the core of the application, managing data flow between the user, the AI agent, and the server, enabling scalable and efficient task execution.
- File types.dt.py – contains data type definitions:
Explanation
- Stage: A Literal type that defines the various stages the agent can go through (init, plan, decide, describe, reflect, execute, and final).
- ITool: A TypedDict that represents a tool used by the agent, with three fields:
  - name: The name of the tool.
  - instruction: The instructions for using the tool.
  - description: A brief description of what the tool does.
- IAction: A TypedDict representing an action taken by the AI agent, containing:
  - name: The name of the action.
  - payload: The data input for the action.
  - result: The outcome or result of the action.
  - reflection: The reflection or additional notes on the action.
  - tool: The tool used to perform the action.
- IState: A TypedDict defining the complete state of the AI agent, including:
  - systemPrompt: The current system prompt guiding the agent’s behavior.
  - messages: A list of all messages exchanged in the conversation.
  - currentStage: The stage the agent is currently in.
  - currentStep: The current step in the agent’s process.
  - maxSteps: The maximum number of steps allowed.
  - activeTool: The currently active tool being used by the agent.
  - activeToolPayload: The payload for the active tool.
  - plan: The current plan for the agent’s actions.
  - actionsTaken: A list of all actions the agent has taken.
This file defines all the necessary types to manage the agent’s state and track the actions throughout its workflow in a structured and type-safe manner.
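The type descriptions above could translate into Python roughly as follows. This is a minimal sketch, not the project's actual file: the field types (for example `Any` for payloads and `str` for the plan) and the `initial_state` helper are assumptions made for illustration.

```python
from typing import Any, List, Literal, Optional, TypedDict

# Stage names as listed in the explanation above.
Stage = Literal["init", "plan", "decide", "describe", "reflect", "execute", "final"]


class ITool(TypedDict):
    name: str
    instruction: str
    description: str


class IAction(TypedDict):
    name: str
    payload: Any      # assumed: tool input can be any JSON-like value
    result: Any       # assumed: tool output can be any JSON-like value
    reflection: str
    tool: str


class IState(TypedDict):
    systemPrompt: str
    messages: List[dict]
    currentStage: Stage
    currentStep: int
    maxSteps: int
    activeTool: Optional[ITool]
    activeToolPayload: Any
    plan: str
    actionsTaken: List[IAction]


def initial_state(user_message: str, max_steps: int = 5) -> IState:
    """Hypothetical helper: build a fresh state for one user message."""
    return IState(
        systemPrompt="",
        messages=[{"role": "user", "content": user_message}],
        currentStage="init",
        currentStep=0,
        maxSteps=max_steps,
        activeTool=None,
        activeToolPayload=None,
        plan="",
        actionsTaken=[],
    )
```

At runtime a `TypedDict` is a plain `dict`, so `initial_state("Hello")["currentStage"]` is simply the string `"init"`; the type information only helps static checkers and editors.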
- File task_manager.py – task management:
Explanation of task_manager.py
- TaskManager class: This class is designed to handle task management. It includes methods to retrieve a task by name (get_task) and execute an example task (example_task).
  - get_task: Returns a task based on the task_name. It currently has a placeholder for tasks, with example_task as a dummy function.
  - example_task: A placeholder task that simply returns a success message.
This file will be expanded later to handle more complex task management logic, such as interacting with external systems or APIs.
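A class matching this description might look like the sketch below. The registry dictionary and the exact success message are assumptions; only the class and method names come from the text.

```python
from typing import Callable, Dict, Optional


class TaskManager:
    """Minimal task registry; the project's real class may differ."""

    def __init__(self) -> None:
        # Map task names to callables. Only the placeholder task is registered.
        self._tasks: Dict[str, Callable[[], str]] = {
            "example_task": self.example_task,
        }

    def get_task(self, task_name: str) -> Optional[Callable[[], str]]:
        """Return the task registered under task_name, or None if unknown."""
        return self._tasks.get(task_name)

    def example_task(self) -> str:
        """Placeholder task that only reports success."""
        return "Example task executed successfully"
```

Returning `None` for unknown names keeps the lookup free of exceptions, so the caller decides how to report a missing task.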
- File ssh_manager.py – asynchronous SSH connection management:
Explanation of ssh_manager.py
- AsyncSSHManager class: This class handles SSH connections asynchronously using the asyncssh library.
  - connect: Establishes an SSH connection to a remote host using the provided hostname, username, and password.
  - execute_command: Executes a command on the remote server via SSH and returns the output.
  - close_connection: Closes the SSH connection.
This class is crucial for managing remote executions in the agent system, allowing it to run commands on remote servers securely and asynchronously.
Files inside lib directory
- File ai.py – code responsible for handling the Anthropic API:
Full Explanation of prompts.py
1. Key Functionalities
- Generating prompts for the AI agent’s system
  - The prompts.py file is responsible for generating the system prompts used by the agent in various phases of its operation, such as planning, decision-making, describing, reflecting, and generating the final answer.
  - Each phase has predefined rules and structures for generating the prompt.
- Dynamic adjustment of prompt content
  - The prompt adjusts based on the system state (state), allowing the generation of context-sensitive responses tailored to the current interaction and history.
2. Functions in prompts.py
tools_instruction()
- Returns a dictionary describing instructions for the available tools, such as:
  - get_html_contents: Fetch HTML content from a given URL.
  - game_submit_form: Submit files or data to a game.
  - upload_text_file: Create and upload text files.
  - final_answer: Generate the final answer to the user’s query.
  - play_music: Handle operations related to the Spotify API.
available_tools()
- Returns a list of available tools in a simplified format used in prompts like decide_prompt.
plan_prompt(state)
- Creates the prompt for the planning stage.
- Takes into account the current system state (state), including:
  - User messages.
  - Previous actions (actionsTaken).
  - Current plan (plan), if any.
- The prompt describes the planning goal and the agent’s operating rules:
  - Recognizing straightforward questions and responding directly.
  - Creating a plan if the question requires more complex analysis.
decide_prompt(state)
- Generates the prompt for the decision-making stage.
- Considers the current plan, list of actions taken, and available tools.
- Determines the next step in the process or selects the appropriate tool.
describe_prompt(state)
- Creates the prompt for the description stage (describe).
- Requires the state to define the active tool (activeTool.name) and its instructions (activeTool.instruction).
- The prompt defines the rules for generating the appropriate data to execute the tool.
reflection_prompt(state)
- Creates the prompt for the reflection stage.
- Allows the agent to analyze the actions it has taken and suggest improvements or adjustments.
final_answer_prompt(state)
- Generates the prompt for the final answer to the user’s query.
- Takes into account:
  - The initial plan (plan), if available.
  - All actions taken (actionsTaken).
  - The user’s query as the starting point.
- The prompt’s rules guide the agent to provide clear, actionable, and concise answers.
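To make the relationship between these functions concrete, here is a simplified sketch of `tools_instruction()`, `available_tools()`, and `decide_prompt(state)`. The function names and tool names come from the description above; the prompt wording, the return formats, and the JSON reply convention are assumptions, not the project's actual prompts.

```python
from typing import Dict, List


def tools_instruction() -> Dict[str, str]:
    """Map each tool name to a short usage instruction (wording is illustrative)."""
    return {
        "get_html_contents": "Fetch HTML content from a given URL.",
        "game_submit_form": "Submit files or data to a game.",
        "upload_text_file": "Create and upload a text file.",
        "final_answer": "Generate the final answer to the user's query.",
        "play_music": "Handle operations related to the Spotify API.",
    }


def available_tools() -> List[str]:
    """Return one '- name: instruction' line per tool for embedding in a prompt."""
    return [f"- {name}: {text}" for name, text in tools_instruction().items()]


def decide_prompt(state: dict) -> str:
    """Build the decision-stage prompt from the plan, past actions, and tools."""
    actions = state.get("actionsTaken", [])
    history = "\n".join(f"- {a['name']}: {a['result']}" for a in actions) or "- none"
    tools = "\n".join(available_tools())
    return (
        "Decide on the next step.\n"
        f"Current plan:\n{state.get('plan') or 'No plan yet.'}\n"
        f"Actions taken:\n{history}\n"
        f"Available tools:\n{tools}\n"
        'Reply with JSON: {"tool": "<tool name>"}'
    )
```

Because the prompt is rebuilt from `state` on every call, each decision sees the latest plan and action history without any extra bookkeeping.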
3. Key Benefits of prompts.py
- Modularity
  - Each phase of the agent’s process has a dedicated function, making the code more maintainable and extendable.
- Dynamic Content
  - Prompts are generated based on the current state of the system, providing flexibility and precision in responses.
- Handling Complex Queries
  - The system can handle both simple questions and more complex scenarios requiring multiple steps, making it adaptable to various tasks.
4. Potential Extensions
- Adding New Tools
  - New tools can easily be added by extending the tools_instruction() and available_tools() functions.
- Advanced Natural Language Handling
  - Additional rules for more complex natural language structures can be incorporated to improve the agent’s understanding.
- Better Error Logging
  - A more robust error handling system can be implemented, possibly replacing the current debug print statements with a formal logging framework.
The prompts.py file is a crucial part of the agent system, defining the structure and rules for each phase of the agent’s interaction with the user. It’s a Python adaptation of the prompts.ts file.
- File agent.py – Agent’s Logic for Decision Making, Reflection, and Action Execution
Full Explanation of the agent.py File
1. Key Functionalities
- Managing AI Agent Stages
  - The agent.py file implements the AIAgent class, which controls the flow of the AI agent through the various stages:
    - Planning (plan)
    - Decision-making (decide)
    - Description generation (describe)
    - Action execution (execute)
    - Reflection on the result (reflect)
    - Generating the final answer (final_answer)
- Logging Progress in a Markdown File
  - Every important action is logged in the log.md file, enabling easy tracking of the agent’s actions.
2. Key Elements of the File
AIAgent Class
- The main class responsible for handling all stages of the agent’s operation.
__init__() - Initializing the Agent’s State
- currentStage: The current stage of processing (e.g., plan, decide).
- currentStep: The current step in the overall process.
- maxSteps: The maximum number of steps to avoid infinite loops.
- messages: The user messages that guide the agent’s behavior.
- actionsTaken: A history of actions taken by the agent.
- api_key: The Anthropic API key used to communicate with the AI model.
log_to_markdown()
- A function that logs the results of each stage to the log.md file.
- It takes:
  - header: The section header.
  - content: The content to be logged.
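A logging helper along these lines could be sketched as below. The `header` and `content` parameters come from the description; the `op_type` and `log_file` parameters, the `##` heading level, and the default file location are assumptions added so the entry follows the "[Operation Type] Header" layout shown for log.md.

```python
from pathlib import Path

LOG_FILE = Path("log.md")  # assumed location: the current working directory


def log_to_markdown(
    header: str,
    content: str,
    op_type: str = "info",
    log_file: Path = LOG_FILE,
) -> None:
    """Append one '[Operation Type] Header' entry followed by its content."""
    entry = f"## [{op_type}] {header}\n\n{content}\n\n"
    # Append mode keeps earlier entries, so the file grows as the agent works.
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

For example, `log_to_markdown("Planning", "Step 1: fetch the page", op_type="plan")` appends a `## [plan] Planning` section to log.md.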
Asynchronous Stage Methods
- Each stage of the process is handled by a dedicated method.
plan()
- Generates an action plan based on the prompt.
- Sends a query to the AI model using the generated plan_prompt.
decide()
- Decides the next step or tool to use.
- Uses the decide_prompt to determine the best course of action.
- The result is processed as JSON, which helps in precisely selecting the next tool or action.
describe()
- Generates the input (payload) required to execute the tool.
- Uses the describe_prompt, and requires that the tool (activeTool) be defined in the agent’s state.
execute()
- Executes the selected tool or action.
- Stores the action result in the state (state['actionsTaken']).
reflect()
- Analyzes the last action taken by the agent.
- Uses the reflection_prompt to suggest improvements or adjustments to the plan.
final_answer()
- Generates the final response to the user’s query.
- Uses the final_answer_prompt and returns the response as the result of the agent’s actions.
3. The Processing Loop in the AIAgent Class
- Description
  - The loop iterates through a maximum of maxSteps steps.
  - The stages (plan, decide, describe, execute, reflect) are executed in a set order.
  - The loop ends when the final_answer stage is reached or the maximum steps are exceeded.
- Error Handling
  - If an error occurs at any stage, the process stops and the error is logged.
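The loop described above can be illustrated with a stripped-down sketch. It is not the project's AIAgent class: `run_agent`, the `ask_model` callback, the prompt strings, and the scripted `fake_model` are all hypothetical stand-ins, and tool execution is replaced by a placeholder string. What it shows is the stage order, the step limit, the JSON decision, and the early exit on final_answer.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict

# A model call is any coroutine mapping a prompt to text (stand-in for the real client).
ModelCall = Callable[[str], Awaitable[str]]


async def run_agent(message: str, ask_model: ModelCall, max_steps: int = 5) -> str:
    """Run plan -> decide -> describe -> execute -> reflect until final_answer."""
    state: Dict = {"plan": "", "actionsTaken": [], "currentStep": 0}
    try:
        while state["currentStep"] < max_steps:
            state["plan"] = await ask_model(f"plan: {message}")
            # The decide stage is expected to answer with JSON such as {"tool": "browse"}.
            decision = json.loads(await ask_model(f"decide: {state['plan']}"))
            if decision["tool"] == "final_answer":
                break
            payload = await ask_model(f"describe: {decision['tool']}")
            result = f"ran {decision['tool']} with {payload}"  # placeholder for real tool execution
            reflection = await ask_model(f"reflect: {result}")
            state["actionsTaken"].append(
                {"name": decision["tool"], "payload": payload, "result": result, "reflection": reflection}
            )
            state["currentStep"] += 1
    except (json.JSONDecodeError, KeyError) as exc:
        # Stop on a malformed decision instead of looping on bad data.
        return f"error: {exc}"
    return await ask_model(f"final_answer: {len(state['actionsTaken'])} actions")


async def fake_model(prompt: str) -> str:
    """Scripted stand-in for the model: picks a tool once, then finishes."""
    if prompt.startswith("decide"):
        fake_model.calls = getattr(fake_model, "calls", 0) + 1
        return json.dumps({"tool": "browse" if fake_model.calls == 1 else "final_answer"})
    return prompt.upper()
```

Running `asyncio.run(run_agent("How far is the Moon?", fake_model))` performs one tool step and then produces the final answer. Passing the model call in as a parameter keeps the loop testable without an API key.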
4. Key Advantages
- Asynchronicity
  - All methods are asynchronous, so the agent can wait on I/O (such as model calls) without blocking other requests.
- Flexibility and Modularity
  - Each stage is defined separately, making it easier to expand and modify functionalities.
- Handling Complex Scenarios
  - The agent can handle both simple user queries and more complex tasks requiring multi-step planning and reflection.
5. Potential Improvements
- Exception Handling
  - More detailed error messages could be added for each stage.
- Advanced Logging
  - Logging to separate files or external monitoring systems (e.g., Elasticsearch, Sentry) could enhance analysis capabilities.
- Enhancing Action History
  - Storing more detailed data in actionsTaken can aid in debugging and analyzing results.
The agent.py file is a central component of the system, managing the agent’s processing flow and integrating with the Anthropic model via asynchronous queries.
- File tools.py – Functions for Handling HTML Content Fetching, File Uploading, Music Playback, etc.
Explanation of tools.py File
1. Main Functions
The tools.py file provides implementations for various tools used in the AI agent system. Each tool is represented by a function that performs a specific task. These functions allow operations such as fetching HTML content, uploading files, and integrating with music services.
Explanation of Each Function
- browse(url)
  - Description: Fetches HTML content from the provided URL.
  - Behavior:
    - Sends an HTTP GET request to the provided URL.
    - Converts the fetched HTML content to markdown using the markdownify library.
    - Returns the formatted result or an error message if the operation fails.
  - Error Handling:
    - Handles RequestException exceptions, returning a detailed message in case of a connection error.
- upload_file(data)
  - Description: Uploads a text file to a remote server.
  - Behavior:
    - Expects a dictionary data containing keys:
      - content: The content of the file.
      - file_name: The name of the file.
    - Uses the environment variable UPLOAD_DOMAIN as the endpoint for the server.
    - Sends a POST request with the file content as the payload.
    - Returns the uploaded file’s URL if successful or an error message if the upload fails.
  - Error Handling:
    - Checks if the UPLOAD_DOMAIN environment variable is set. If not, it returns an error message.
    - Handles exceptions related to the connection or server response.
- play_music(data)
  - Description: Sends a request to a music playback service.
  - Behavior:
    - Expects a dictionary data containing details for the request, such as songs to be played.
    - Uses the environment variable MUSIC_URL as the endpoint for the music service.
    - Sends a POST request with the music data.
    - Returns the server’s response, which may contain details about the music being played.
  - Error Handling:
    - Checks if the MUSIC_URL environment variable is set. If not, it returns an error message.
    - Handles errors related to connection and server responses.
2. tools Dictionary
- Description:
  - A mapping of tool names (e.g., "browse", "upload_file", "play_music") to their respective functions in Python.
  - This dictionary facilitates access to functions by their name, which is useful for dynamically executing tools within the AI agent.
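The name-to-function mapping and the environment-variable checks can be sketched as follows. The tool functions here are stubs that skip the real HTTP requests; the error message wording, the returned URL format, and the `run_tool` helper are assumptions for illustration.

```python
import os
from typing import Callable, Dict


def upload_file(data: dict) -> str:
    """Stub: validates configuration and input, then reports the target URL."""
    domain = os.environ.get("UPLOAD_DOMAIN")
    if not domain:
        return "Error: UPLOAD_DOMAIN environment variable is not set."
    if "content" not in data or "file_name" not in data:
        return "Error: data must contain 'content' and 'file_name'."
    # A real implementation would send a POST request with the content here.
    return f"{domain.rstrip('/')}/{data['file_name']}"


def play_music(data: dict) -> str:
    """Stub: validates configuration; the real tool would POST to MUSIC_URL."""
    url = os.environ.get("MUSIC_URL")
    if not url:
        return "Error: MUSIC_URL environment variable is not set."
    return f"Would POST {data} to {url}"


# Tool names mapped to their functions, so the agent can dispatch by name.
tools: Dict[str, Callable[[dict], str]] = {
    "upload_file": upload_file,
    "play_music": play_music,
}


def run_tool(name: str, payload: dict) -> str:
    """Look up a tool by name and call it; unknown names yield an error string."""
    tool = tools.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'."
    return tool(payload)
```

Returning error strings rather than raising lets the agent feed a failure back into its reflection stage like any other tool result.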
3. Key Features
- Environment Variable Handling:
  - The upload_file and play_music functions rely on environment variables (UPLOAD_DOMAIN, MUSIC_URL) to determine the server endpoints.
- Error Handling:
  - Each function includes detailed error handling, ensuring the user receives readable messages in case of issues.
- Flexibility:
  - The tools dictionary allows easy addition of new tools or modification of existing ones.
4. Key Benefits
- Integration with External Services:
  - Supports operations requiring interaction with external services, such as uploading files or playing music.
- Data Conversion:
  - The browse function allows automatic conversion of HTML content to markdown, which is useful for processing content.
5. Potential Extensions
- Functionality Expansion:
  - New tool functions (e.g., file editing, handling other data formats) can be added.
- Improved Logging:
  - A logging system (e.g., to a file or external monitoring system) could replace simple error messages.
- Unit Testing:
  - Unit tests for each function could be added to ensure greater reliability.
The tools.py file provides essential functions for handling tools within the AI agent system, enabling integration with various external services and facilitating data processing.
Part 4: Automation Using Ansible
1. Create Ansible Playbook – site.yml
Let’s break the playbook into logical parts and create an Ansible project where each functionality (e.g., environment setup, file copying, application configuration) will be a separate playbook or role. This will make the project flexible, easy to install, modify, and expand.
Plan
- Ansible Project Structure:
  - We will create a main Ansible project directory with subdirectories like roles (where we place individual Ansible roles) and playbooks.
  - We will divide tasks into roles:
    - Roles for environment: Creating the virtual environment, installing packages.
    - Roles for application files: Creating each application file with complete code.
    - Roles for server configuration: Configuring and running the application server.
- Main Ansible Project Structure:
  - site.yml – The main playbook file that runs all roles.
  - roles/environment – Role that creates the virtual environment and installs required packages.
  - roles/application_files – Role that creates application files with the full code.
  - roles/server_configuration – Role that configures and runs the application server.
Step 1: Create Directory Structure
In the Ansible project directory, execute the following steps:
In the roles directory, create subdirectories for each role:
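One way to script this layout, sketched in Python rather than shell, is shown below. The role names and the tasks/main.yml path come from this guide; the `create_role_dirs` function and the choice to create empty main.yml placeholders are assumptions.

```python
from pathlib import Path
from typing import List

ROLES = ["environment", "application_files", "server_configuration"]


def create_role_dirs(project_root: Path) -> List[Path]:
    """Create roles/<role>/tasks/main.yml for each role and return the files."""
    created = []
    for role in ROLES:
        tasks_dir = project_root / "roles" / role / "tasks"
        tasks_dir.mkdir(parents=True, exist_ok=True)
        main_yml = tasks_dir / "main.yml"
        main_yml.touch(exist_ok=True)  # empty placeholder, to be filled in Step 2
        created.append(main_yml)
    return created
```

Calling `create_role_dirs(Path("ansible_project"))` is safe to repeat, because existing directories and files are left untouched.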
Step 2: Create Ansible Files for Each Role
1. Role environment: Create Virtual Environment and Install Packages
In roles/environment/tasks/main.yml:
2. Role application_files: Create Application Files
In roles/application_files/tasks/main.yml, add the full code for each application file.
3. Role server_configuration: Server Configuration
In roles/server_configuration/tasks/main.yml:
Step 3: Main Playbook site.yml
In the main ansible_project directory, create site.yml to run all roles:
Step 4: Preparing the Project ZIP Archive
Once you’ve created the full structure and added the complete code files in the appropriate places, you can create a ZIP archive:
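As an illustration, the archive can also be produced from Python with the standard library. The `zip_project` function name and its arguments are hypothetical; `shutil.make_archive` is the standard-library call doing the work.

```python
import shutil
from pathlib import Path


def zip_project(project_dir: Path, archive_stem: Path) -> Path:
    """Zip project_dir (including its top-level folder) into <archive_stem>.zip."""
    archive = shutil.make_archive(
        base_name=str(archive_stem),        # output path without the .zip suffix
        format="zip",
        root_dir=str(project_dir.parent),   # paths in the archive are relative to this
        base_dir=project_dir.name,          # only this folder is added
    )
    return Path(archive)
```

For example, `zip_project(Path("/tmp/ansible_project"), Path("/tmp/ansible_project_v1"))` would write `/tmp/ansible_project_v1.zip` with `ansible_project/` as its top-level entry. Using absolute paths avoids surprises, since the archive location is resolved independently of `root_dir`.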
Summary
- The project structure is divided into logical roles.
- Each role performs specific tasks, making it easier to manage and develop.
- The main site.yml file coordinates all roles, creating a fully functional application environment.
- Once the structure is complete, you can zip the entire project and deploy it easily.
1. Running the Playbook
To run the playbook and automate the project setup process, use:
|
|
2. Debugging and Testing the Virtual Environment
If you want to verify if Ansible is correctly creating and using the virtual environment:
- Run the following manually in the project directory to check if the process works:
    python3 -m venv venv
    source venv/bin/activate          # Activate the virtual environment
    pip install -r requirements.txt   # Install dependencies
- If there are issues, check the Ansible logs after running the playbook.
Debugging Uvicorn and Quart-based Application
Running the Server Manually
The app uses the Uvicorn server to run the Quart framework. To test its functionality:
- Activate the virtual environment:
    source venv/bin/activate
- Start the application:
    uvicorn index:app --host 0.0.0.0 --port 3000
  - index:app refers to the index.py module and the Quart app instance within that file.
  - Port 3000 is the port used throughout this guide (Uvicorn itself defaults to 8000). Make sure it’s available.
- Check if the server is working:
  - Check open ports:
        ss -tuln | grep 3000
  - Send a test HTTP request:
        curl -X POST http://localhost:3000 -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Hello, World!"}]}'
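The same test request can be sent from Python using only the standard library. This is a sketch: the function names `build_agent_request` and `send` are hypothetical, and the payload shape simply mirrors the curl command in this guide.

```python
import json
import urllib.request
from typing import Dict, List


def build_agent_request(base_url: str, messages: List[Dict[str, str]]) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(build_agent_request("http://localhost:3000", [{"role": "user", "content": "Hello, World!"}]))` returns the agent's reply as text. Separating construction from sending makes the request easy to inspect in a test without a live server.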
Logs and Debugging
- Uvicorn Logs:
  - The Uvicorn server logs contain information about errors and HTTP traffic.
  - Run the app with debug logging enabled:
        uvicorn index:app --host 0.0.0.0 --port 3000 --log-level debug
- App Logs:
  - The Quart app contains debug print statements for most operations. Ensure the Quart debug mode is enabled:
        export QUART_ENV=development
        export QUART_DEBUG=1
- Check log.md File:
  - Check if log.md is properly logging data for each stage of the agent’s operation:
        tail -f log.md
Verifying Environment Variables
Make sure the .env file contains the correct values:
Verify if the variables are loaded:
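A small check like the following can report which variables are missing. It is a sketch under assumptions: the list of required keys is taken from the .env example in Part 2 (LLAMA_PATH is treated as optional here), and `missing_env_vars` is a hypothetical helper name.

```python
import os
from typing import Iterable, List, Mapping

# Assumed to be required; adjust to the keys your setup actually uses.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY")


def missing_env_vars(
    keys: Iterable[str] = REQUIRED_KEYS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [key for key in keys if not env.get(key, "").strip()]
```

Note that this inspects the process environment, so the .env file has to be loaded first (for example with `load_dotenv()` from python-dotenv, which is already listed in requirements.txt). An empty list from `missing_env_vars()` means all required keys are present.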
Verifying After Installation
- Verify the Process:
  - Check if the Uvicorn process is running:
        ps aux | grep uvicorn
- Test Endpoint:
  - Send a request to the server:
        curl -X POST http://localhost:3000 -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "How far is the Moon?"}]}'
- Restart the App:
  - If there are issues, stop and restart the process:
        pkill -f uvicorn
        uvicorn index:app --host 0.0.0.0 --port 3000
This set of steps should help with debugging and testing the Uvicorn and Quart-based application.