Automation Examples
Real-world automation tasks and implementations.
Example 1: Open Calculator
task = "Open the calculator application"
# VLM decides actions
actions = [
{"type": "CLICK", "x": 50, "y": 900}, # Click Activities
{"type": "WAIT", "seconds": 1},
{"type": "TYPE", "text": "calculator"},
{"type": "WAIT", "seconds": 1},
{"type": "CLICK", "x": 200, "y": 150}, # Click Calculator icon
{"type": "DONE"}
]
for action in actions:
execute_action(action)
Example 2: Create Text File
task = "Create a file called 'hello.txt' with content 'Hello World'"
# Execution trace
1. CLICK(50, 900) # Open Files app
2. WAIT(2)
3. CLICK(100, 200) # New file button
4. TYPE("hello.txt")
5. CLICK(300, 400) # Confirm
6. TYPE("Hello World")
7. CTRL+S # Save
8. DONE
Example 3: Web Search
task = "Search for 'Python tutorials' on Firefox"
def execute_web_search(query):
agent = GUIAgent()
# Open Firefox
agent.execute({"type": "CLICK", "x": firefox_icon_coords})
agent.wait(3)
# Click address bar
agent.execute({"type": "CLICK", "x": 400, "y": 100})
# Type search query
agent.execute({"type": "TYPE", "text": query})
# Press Enter
agent.execute({"type": "KEY", "key": "enter"})
return agent.capture_screenshot()
Batch Automation
tasks = [
"Open calculator and compute 25 + 37",
"Take a screenshot and save it as 'desktop.png'",
"Open Firefox and navigate to github.com"
]
for task in tasks:
print(f"Executing: {task}")
agent = GUIAgent()
success = agent.execute_task(task, max_steps=20)
print(f"Result: {'Success' if success else 'Failed'}")
Error Handling
def robust_execution(task, max_retries=3):
"""Execute task with automatic retry"""
for attempt in range(max_retries):
try:
agent = GUIAgent()
result = agent.execute_task(task)
if result.success:
return result
# Reset environment
reset_desktop_state()
except Exception as e:
print(f"Attempt {attempt + 1} failed: {e}")
if attempt == max_retries - 1:
raise
return None
Performance Tips
1. Reduce Screenshots
# Only capture when needed
if action_type_needs_vision(action):
screenshot = capture()
else:
screenshot = last_screenshot
2. Batch Actions
# Group actions that don't need visual feedback
batched_actions = [
{"type": "TYPE", "text": "filename"},
{"type": "KEY", "key": "enter"}
]
execute_batch(batched_actions)
3. Parallel Execution
from concurrent.futures import ThreadPoolExecutor
def execute_multiple_tasks(tasks):
with ThreadPoolExecutor(max_workers=4) as executor:
futures = [executor.submit(execute_task, t) for t in tasks]
results = [f.result() for f in futures]
return results
Use Cases
| Scenario | Complexity | Success Rate |
|---|---|---|
| Open Applications | Low | 95% |
| File Operations | Medium | 85% |
| Web Navigation | Medium | 80% |
| Form Filling | High | 70% |
| Multi-App Workflows | High | 65% |
Best Practices
✅ Do:
- Start with simple, deterministic tasks
- Add verification steps
- Implement timeouts
- Log all actions for debugging
- Use VM for isolation
❌ Don’t:
- Assume UI is always in same state
- Skip error handling
- Use hardcoded coordinates (screen resolution varies)
- Execute untrusted tasks without sandboxing