Task Execution
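Tasks run automatically in an observe-think-act loop: the agent captures a screenshot, sends it to the VLM together with the instruction, and executes the actions the VLM returns, repeating for up to max_steps steps.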
Execution Flow
sequenceDiagram
    User->>Agent: Task instruction
    loop Each step (max_steps)
        Agent->>Env: Capture screenshot
        Agent->>VLM: Screenshot + instruction
        VLM->>Agent: Thinking + actions
        Agent->>Env: Execute actions
        Env->>Agent: New state
    end
    Agent->>User: Task complete
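The flow above can be sketched in a few lines of Python. This is only an illustration, not the project's actual implementation: `run_task`, `StepResult`, and the `capture`, `ask_vlm`, and `execute` callables are hypothetical names, and the `done` flag is an assumed way for the VLM to signal completion, which the diagram does not spell out.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class StepResult:
    """One VLM reply (hypothetical shape): its reasoning plus actions to run."""
    thinking: str
    actions: List[str]
    done: bool = False  # assumed completion signal, not confirmed by the docs


def run_task(
    instruction: str,
    capture: Callable[[], Any],
    ask_vlm: Callable[[Any, str, List[StepResult]], StepResult],
    execute: Callable[[str], None],
    max_steps: int = 15,
) -> Tuple[bool, int]:
    """Run the observe-think-act loop; return (completed, steps_used)."""
    history: List[StepResult] = []
    for step in range(1, max_steps + 1):
        screenshot = capture()                              # observe
        result = ask_vlm(screenshot, instruction, history)  # think
        for action in result.actions:                       # act
            execute(action)
        history.append(result)
        if result.done:
            return True, step
    return False, max_steps
```

Passing the three callables in keeps the loop independent of any particular screenshot tool or model client, which also makes it easy to exercise with stubs.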
Task Examples
Good Tasks:
"Open browser and search Python"
"Create a new folder called test"
"Open terminal and type ls"
Complex Tasks:
"Open browser, visit google.com, search 'Python tutorial',
and open the first result"
Configuration
TASK_CONFIG = {
    "max_steps": 15,          # Maximum steps
    "step_delay": 1.5,        # Delay between steps (seconds)
    "enable_thinking": True,  # Show VLM thinking process
    "use_trajectory": True,   # Use history context
}
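If you override these settings per task, a small helper can catch typos and out-of-range values early. The sketch below assumes the four keys shown above are the complete set; `DEFAULT_TASK_CONFIG` and `make_config` are hypothetical names for illustration and not part of the project.

```python
# Defaults copied from the configuration above.
DEFAULT_TASK_CONFIG = {
    "max_steps": 15,
    "step_delay": 1.5,
    "enable_thinking": True,
    "use_trajectory": True,
}


def make_config(**overrides):
    """Return a copy of the defaults with validated overrides applied."""
    unknown = set(overrides) - set(DEFAULT_TASK_CONFIG)
    if unknown:
        raise KeyError(f"Unknown TASK_CONFIG keys: {sorted(unknown)}")
    config = {**DEFAULT_TASK_CONFIG, **overrides}
    if config["max_steps"] < 1:
        raise ValueError("max_steps must be at least 1")
    if config["step_delay"] < 0:
        raise ValueError("step_delay must not be negative")
    return config


# Example: a shorter, faster run that hides the thinking output.
quick = make_config(max_steps=5, step_delay=0.5, enable_thinking=False)
```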
Monitoring
- 📍 Step progress notifications
- 🧠 Model thinking process display
- 🤖 Action execution feedback
- ✅ Task completion status
Interrupt Task
- Press the ESC key (requires accessibility permission)
- Start a new task (automatically interrupts the current one)
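One common way to get both behaviors is a shared stop flag that the step loop checks between steps. The sketch below uses `threading.Event` from the standard library; `TaskRunner` and `step_fn` are hypothetical names, and the ESC key hook, which would simply call `interrupt()`, is left out because it depends on the platform. The project may implement interruption differently.

```python
import threading


class TaskRunner:
    """Runs one task at a time; starting a new task interrupts the old one."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = None

    def start(self, step_fn, max_steps=15, step_delay=1.5):
        self.interrupt()                # auto-interrupt the current task
        stop = threading.Event()        # fresh flag for the new task
        self._stop = stop

        def loop():
            for step in range(1, max_steps + 1):
                if stop.is_set():
                    return
                step_fn(step)
                # Sleep for step_delay, but wake up at once if interrupted.
                if stop.wait(step_delay):
                    return

        self._thread = threading.Thread(target=loop, daemon=True)
        self._thread.start()

    def interrupt(self):
        """Signal the running task to stop and wait until it has exited."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

Waiting on the event instead of calling `time.sleep` means an interrupt takes effect during the delay between steps rather than only after it.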