Using Replit AI as a Browser Scraping Agent

Firefox Browser Automation - Direct Command Execution

This document provides examples of browser automation using direct command execution with xdotool. These commands can be executed directly in the terminal for browser automation without scripts.

Key Benefit: Using direct command execution allows for quick automation without frameworks like Selenium or Playwright, and provides immediate feedback during execution.

1. Browser Launch and Navigation

Task                                Command
Launch Firefox with a URL           firefox https://www.google.com &
Get Firefox window ID               xdotool search --name "Firefox" | head -1
Get window name from ID             xdotool getwindowname WINDOW_ID
Activate window by ID               xdotool windowactivate WINDOW_ID
Navigate to URL (address bar)       xdotool key ctrl+l && xdotool type "https://example.com" && xdotool key Return
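
A fixed sleep after launching Firefox can be too short on a slow machine. As a minimal sketch, assuming bash, the window lookup above can be wrapped in a polling loop. The function name wait_for_window and the 15-second default timeout are illustrative choices, not part of xdotool.

```shell
# Hypothetical helper: poll until a visible window whose title matches $1
# exists, then print its ID. Gives up after $2 seconds (default 15).
wait_for_window() {
  local pattern="$1" timeout="${2:-15}" waited=0 id
  while [ "$waited" -lt "$timeout" ]; do
    # Same lookup as in the table, restricted to visible windows
    id=$(xdotool search --onlyvisible --name "$pattern" 2>/dev/null | head -n 1)
    if [ -n "$id" ]; then
      printf '%s\n' "$id"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "No window matching '$pattern' after ${timeout}s" >&2
  return 1
}

# Example:
#   FIREFOX_ID=$(wait_for_window "Firefox") && xdotool windowactivate "$FIREFOX_ID"
```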

2. Mouse and Keyboard Interaction

Task                                Command
Click at coordinates                xdotool mousemove X Y click 1
Right-click at coordinates          xdotool mousemove X Y click 3
Type text                           xdotool type "text to type"
Press keyboard shortcut             xdotool key ctrl+t
Press multiple keys                 xdotool key ctrl+shift+i
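
When the same click and type commands are repeated many times, small wrappers can catch mistakes such as a missing coordinate. The following is a sketch, assuming bash. The names click_at and type_text are hypothetical, and the 50 ms default typing delay is an arbitrary starting point.

```shell
# Hypothetical helper: click at absolute screen coordinates.
# Usage: click_at X Y [BUTTON]   (1 = left, 2 = middle, 3 = right)
click_at() {
  local x="$1" y="$2" button="${3:-1}" v
  for v in "$x" "$y" "$button"; do
    case "$v" in
      ''|*[!0-9]*)
        echo "click_at: X, Y and BUTTON must be non-negative integers" >&2
        return 2
        ;;
    esac
  done
  xdotool mousemove "$x" "$y" click "$button"
}

# Hypothetical helper: type text with a per-keystroke delay in milliseconds,
# which can help when a page drops characters typed too quickly.
# Usage: type_text TEXT [DELAY_MS]
type_text() {
  local text="$1" delay_ms="${2:-50}"
  xdotool type --delay "$delay_ms" "$text"
}

# Example:
#   click_at 500 300 && type_text "browser automation examples"
```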

3. Tab Management

Task                                Command
Open new tab                        xdotool key ctrl+t
Switch to specific tab              xdotool key alt+1 (first tab; alt+2 for the second, etc. Firefox on Linux uses Alt here, not Ctrl)
Close current tab                   xdotool key ctrl+w
Switch to next tab                  xdotool key ctrl+Tab
Switch to previous tab              xdotool key ctrl+shift+Tab

4. Screenshot Capture

Task                                Command
Take full screenshot                gnome-screenshot -f /tmp/screenshot.png
Take screenshot with delay          sleep 3 && gnome-screenshot -f /tmp/screenshot.png
Take screenshot of window           gnome-screenshot -w -f /tmp/window.png
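
Writing every capture to the same path overwrites the previous one. A possible way around this, sketched here under the assumption of bash, is to timestamp each file name. The function name take_screenshot and the SCREENSHOT_CMD override variable are hypothetical; any replacement command would need to accept the same -f FILE argument.

```shell
# Hypothetical helper: save a full-screen screenshot under a timestamped
# name and print the resulting path.
# Usage: take_screenshot [LABEL] [DIRECTORY]
take_screenshot() {
  local label="${1:-screenshot}" dir="${2:-/tmp}" path
  path="$dir/${label}_$(date +%Y%m%d_%H%M%S).png"
  # SCREENSHOT_CMD is an assumed override; it defaults to gnome-screenshot
  "${SCREENSHOT_CMD:-gnome-screenshot}" -f "$path" || return 1
  printf '%s\n' "$path"
}

# Example:
#   sleep 3 && take_screenshot search_results
```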

5. Advanced Interactions

Task                                Command
Open context menu and select item   xdotool mousemove X Y click 3 && sleep 1 && xdotool key Down Down Return
Page scroll down                    xdotool key Page_Down
Open developer tools                xdotool key F12
Reload page                         xdotool key ctrl+r
Search on page                      xdotool key ctrl+f && xdotool type "search term" && xdotool key Return
Navigate back in history            xdotool key alt+Left
Navigate forward in history         xdotool key alt+Right

6. Working with Context Menu and Clipboard

Task                                Command
Open context menu                   xdotool click 3
Select all text                     xdotool key ctrl+a
Copy selected text                  xdotool key ctrl+c
Paste from clipboard                xdotool key ctrl+v
Cut selected text                   xdotool key ctrl+x
Save page as                        xdotool key ctrl+s
Print page                          xdotool key ctrl+p
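
For scraping, the select-all and copy shortcuts above only become useful once the clipboard can be read back in the terminal. xdotool does not do that itself, so the sketch below assumes the separate xclip utility is installed. The function name copy_page_text and the half-second pauses are illustrative.

```shell
# Hypothetical helper: select everything in the focused window, copy it,
# and print the clipboard contents to stdout. Assumes xclip is installed.
copy_page_text() {
  xdotool key ctrl+a
  sleep 0.5
  xdotool key ctrl+c
  sleep 0.5
  # Read the CLIPBOARD selection (the one filled by ctrl+c)
  xclip -selection clipboard -o
}

# Example:
#   copy_page_text > /tmp/page_text.txt
```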

7. Handling Browser Dialogs

Task                                Command
Accept alert dialog                 xdotool key Return
Dismiss confirmation dialog         xdotool key Tab Return
Cancel prompt dialog                xdotool key Escape
Handle authentication dialog        xdotool type "username" && xdotool key Tab && xdotool type "password" && xdotool key Return
Close a tab with unsaved changes    xdotool key ctrl+w && sleep 1 && xdotool key Tab Return

8. Example Workflows

Google Search Workflow

# Launch Firefox with Google
firefox https://www.google.com &

# Wait for Firefox to load and get window ID
sleep 5
FIREFOX_ID=$(xdotool search --onlyvisible --name "Firefox" | head -1)

# Focus the window
xdotool windowactivate "$FIREFOX_ID"

# Click on search box (example coordinates; adjust for your screen resolution and window position)
xdotool mousemove 500 300 click 1

# Type search query
xdotool type "browser automation examples"

# Submit search
xdotool key Return

# Take screenshot of results
sleep 3
gnome-screenshot -f /tmp/search_results.png

Form Submission Workflow

# Launch Firefox with form page
firefox https://httpbin.org/forms/post &

# Wait for Firefox to load and get window ID
sleep 5
FIREFOX_ID=$(xdotool search --onlyvisible --name "Firefox" | head -1)

# Focus the window
xdotool windowactivate "$FIREFOX_ID"

# Wait for page to fully load
sleep 3

# Fill out Customer Name
xdotool key Tab Tab && sleep 0.5
xdotool type "John Doe"

# Select Pizza Size (Medium)
xdotool key Tab && sleep 0.5
xdotool key Down

# Select Pizza Topping (Onion)
xdotool key Tab Tab && sleep 0.5
xdotool key Down Down && sleep 0.5
xdotool key space

# Best Pizza dropdown - select "New Haven Style"
xdotool key Tab && sleep 0.5
xdotool key Down Down

# Enter comment in textarea
xdotool key Tab && sleep 0.5
xdotool type "This is a complete form submission test."

# Take screenshot before submission
gnome-screenshot -f /tmp/form_filled.png

# Tab to the Submit button and click it
xdotool key Tab Tab && sleep 0.5
xdotool key space
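
The workflows above repeat one pattern: send some keys, then sleep briefly. As a sketch, assuming bash, that pattern can be folded into a single function. The name press and the STEP_DELAY variable are hypothetical, and the 0.5-second default simply mirrors the pauses used in the form workflow.

```shell
# Hypothetical helper: press one or more keys, then pause so the browser
# can react. The pause is skipped if xdotool reports a failure.
# STEP_DELAY (seconds) is an assumed, tunable setting.
press() {
  xdotool key "$@" && sleep "${STEP_DELAY:-0.5}"
}

# Example (equivalent to the "Fill out Customer Name" step):
#   press Tab Tab
#   xdotool type "John Doe"
```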

9. Troubleshooting Notes

  • The message XGetWindowProperty[_NET_WM_DESKTOP] failed (code=1) is usually harmless and can be ignored.
  • For more reliable window identification, pass --onlyvisible to xdotool search.
  • Allow sufficient time between commands with sleep so the browser can respond.
  • Screen coordinates are absolute pixel positions, so they depend on the display resolution and on where the browser window sits.
  • Chain commands with && so that each step runs only if the previous one succeeded.
  • If a website's structure changes, Tab navigation sequences may need adjustment.
  • Browser version differences may affect keyboard shortcuts and dialog behavior.
  • Context menu structures vary by browser, OS, and website.

Important: Always include sufficient delay (sleep) between commands when automating browser actions. Browser response times can vary based on page complexity and system load.
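
Because hard-coded screen coordinates break when the window moves, one option is to measure clicks from the window's own top-left corner using the --window option of xdotool mousemove. This is a sketch, assuming bash. The function name click_in_window is hypothetical.

```shell
# Hypothetical helper: click at coordinates measured from the top-left
# corner of the given window rather than from the screen origin.
# Usage: click_in_window WINDOW_ID X Y [BUTTON]
click_in_window() {
  local win="$1" x="$2" y="$3" button="${4:-1}"
  xdotool mousemove --window "$win" "$x" "$y" click "$button"
}

# Example:
#   FIREFOX_ID=$(xdotool search --onlyvisible --name "Firefox" | head -1)
#   click_in_window "$FIREFOX_ID" 500 300
```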

For more detailed examples and workflows, see the GitHub repository or refer to the direct_commands.md file.
