OSWorld Integration

OSWorld provides a virtual Ubuntu desktop environment for safe GUI agent experimentation.


Setup

Download VM Image

# Download pre-configured Ubuntu image
# (the URL is quoted so the shell does not treat "?" as a glob character)
wget "https://drive.usercontent.google.com/download?id=1y9FocOC0y5R78sqhD24j0P5a7KXSLFJ4" \
    -O Ubuntu.qcow2.zip

# Extract
unzip Ubuntu.qcow2.zip -d data/osworld_vm/

Start VM

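# VNC display :0 listens on TCP port 5900; host port 2222 is forwarded to
# the guest's SSH port (22) so the Automation step can connect
qemu-system-x86_64 \
    -enable-kvm \
    -m 4096 \
    -smp 2 \
    -drive file=data/osworld_vm/Ubuntu.qcow2,format=qcow2 \
    -vnc :0 \
    -device e1000,netdev=net0 \
    -netdev user,id=net0,hostfwd=tcp::2222-:22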
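
When the VM is launched from a script rather than by hand, the same command can be assembled in Python. The following is a minimal sketch, not part of OSWorld: `VMConfig`, `vnc_port`, and `build_qemu_command` are hypothetical names, and the optional SSH port forward is an assumption chosen to match the port 2222 used under Automation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VMConfig:
    """Hypothetical container for the QEMU options shown above."""
    image: str = "data/osworld_vm/Ubuntu.qcow2"
    memory_mb: int = 4096
    cpus: int = 2
    vnc_display: int = 0
    # Assumption: if set, forward this host port to the guest's SSH port (22)
    ssh_host_port: Optional[int] = None


def vnc_port(display: int) -> int:
    """VNC display N listens on TCP port 5900 + N."""
    return 5900 + display


def build_qemu_command(config: VMConfig) -> List[str]:
    """Return the QEMU argument list without starting the VM."""
    netdev = "user,id=net0"
    if config.ssh_host_port is not None:
        netdev += f",hostfwd=tcp::{config.ssh_host_port}-:22"
    return [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-m", str(config.memory_mb),
        "-smp", str(config.cpus),
        "-drive", f"file={config.image},format=qcow2",
        "-vnc", f":{config.vnc_display}",
        "-device", "e1000,netdev=net0",
        "-netdev", netdev,
    ]
```

The returned list can be passed to `subprocess.Popen` to boot the VM, for example `subprocess.Popen(build_qemu_command(VMConfig(ssh_host_port=2222)))`. Building the list separately keeps the options easy to inspect and log before anything is started.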

Screenshot Capture

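import subprocess

from PIL import Image

def capture_vm_screenshot(path="temp_screenshot.jpg"):
    """Capture a screenshot of the VM over VNC."""
    # "localhost:0" is VNC display :0 (TCP port 5900), matching "-vnc :0"
    # in the Start VM command. vncsnapshot saves its output as JPEG.
    subprocess.run(["vncsnapshot", "localhost:0", path], check=True)

    return Image.open(path)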

Action Execution

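import pyautogui

def execute_action(action):
    """Execute a GUI action on the local display.

    pyautogui controls the display of the machine it runs on, so this
    function has to run inside the VM to act on the guest desktop.
    """
    if action['type'] == 'CLICK':
        pyautogui.click(action['x'], action['y'])

    elif action['type'] == 'TYPE':
        pyautogui.write(action['text'])

    elif action['type'] == 'SCROLL':
        # Negative values scroll down, positive values scroll up
        pyautogui.scroll(-3 if action['direction'] == 'down' else 3)

    else:
        raise ValueError(f"Unsupported action type: {action['type']}")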
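
Actions produced by an agent may be malformed, so it can help to check them before they reach pyautogui. The sketch below is one possible validator, not part of OSWorld: `validate_action` and `REQUIRED_FIELDS` are hypothetical names, the field list is taken from `execute_action`, and restricting scroll directions to "up" and "down" is an assumption (the function above treats anything other than "down" as up).

```python
from typing import Any, Dict, Tuple

# Fields each action type must carry, taken from execute_action
REQUIRED_FIELDS = {
    "CLICK": ("x", "y"),
    "TYPE": ("text",),
    "SCROLL": ("direction",),
}


def validate_action(action: Dict[str, Any], screen_size: Tuple[int, int]) -> None:
    """Raise ValueError if the action is malformed; return None otherwise."""
    action_type = action.get("type")
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unsupported action type: {action_type!r}")

    missing = [f for f in REQUIRED_FIELDS[action_type] if f not in action]
    if missing:
        raise ValueError(f"{action_type} action is missing fields: {missing}")

    if action_type == "CLICK":
        # Reject clicks that fall outside the screen
        width, height = screen_size
        if not (0 <= action["x"] < width and 0 <= action["y"] < height):
            raise ValueError(f"Click ({action['x']}, {action['y']}) is off-screen")

    # Assumption: only "up" and "down" are valid scroll directions
    if action_type == "SCROLL" and action["direction"] not in ("up", "down"):
        raise ValueError(f"Unknown scroll direction: {action['direction']!r}")
```

A caller would run `validate_action(action, (1920, 1080))` before `execute_action(action)`, with the screen size set to the guest's actual resolution.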

VM Management

Snapshot & Restore

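# Shut down the VM first: qemu-img should not modify an image that is in use

# Create snapshot
qemu-img snapshot -c clean_state data/osworld_vm/Ubuntu.qcow2

# Restore snapshot
qemu-img snapshot -a clean_state data/osworld_vm/Ubuntu.qcow2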
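
To reset the VM from a task runner, the snapshot commands can be wrapped in Python. This is a minimal sketch under stated assumptions: `snapshot_command` and `reset_vm` are hypothetical helpers, and the `dry_run` flag exists only so the command can be inspected without touching the image.

```python
import subprocess
from typing import List

# qemu-img snapshot flags: -c create, -a apply (restore), -d delete
_SNAPSHOT_FLAGS = {"create": "-c", "apply": "-a", "delete": "-d"}


def snapshot_command(operation: str, name: str, image: str) -> List[str]:
    """Build the qemu-img argument list for one snapshot operation."""
    if operation not in _SNAPSHOT_FLAGS:
        raise ValueError(f"Unknown snapshot operation: {operation!r}")
    return ["qemu-img", "snapshot", _SNAPSHOT_FLAGS[operation], name, image]


def reset_vm(image: str, name: str = "clean_state", dry_run: bool = False) -> List[str]:
    """Restore the image to a named snapshot. The VM must be shut down first."""
    command = snapshot_command("apply", name, image)
    if not dry_run:
        # check=True raises CalledProcessError if qemu-img fails
        subprocess.run(command, check=True)
    return command
```

Calling `reset_vm("data/osworld_vm/Ubuntu.qcow2")` between tasks restores the `clean_state` snapshot created above.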

Automation

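import paramiko

def setup_vm_automation():
    """Connect to the VM over SSH and install dependencies."""
    ssh = paramiko.SSHClient()
    # Accept the VM's unknown host key; without a policy, connect() raises SSHException
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect('localhost', port=2222, username='user', password='password')

    # Install dependencies one at a time. exec_command returns immediately,
    # so wait for each exit status (assumes passwordless sudo in the guest).
    for command in ('sudo apt-get update',
                    'sudo apt-get install -y python3-tk'):
        _, stdout, _ = ssh.exec_command(command)
        if stdout.channel.recv_exit_status() != 0:
            raise RuntimeError(f"Command failed: {command}")

    return ssh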

Best Practices

  • ✅ Use snapshots to reset the VM between tasks
  • ✅ Set a reasonable timeout for each task (for example, 5 minutes)
  • ✅ Monitor VM resource usage
  • ❌ Don’t run untrusted code without isolation
  • ❌ Don’t exceed VM memory limits
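
The per-task timeout can be enforced with a simple deadline loop. The sketch below is illustrative only: `run_task` is a hypothetical helper, `step` stands for one capture-and-act iteration of the agent, and the injectable `clock` parameter is an assumption added to make the loop easy to test.

```python
import time
from typing import Callable


def run_task(step: Callable[[], bool], timeout_s: float = 300.0,
             clock: Callable[[], float] = time.monotonic) -> bool:
    """Call step() until it returns True (task done) or the time budget runs out.

    Returns True if the task finished, False if it timed out.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if step():
            return True
    return False
```

The deadline is only checked between steps, so a single step that hangs is not interrupted. If `run_task` returns False, restoring the clean snapshot before the next task keeps a stalled run from affecting later ones.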