Firefox Browser Automation - Direct Command Execution


This document provides examples of browser automation using direct command execution with xdotool on an X11 desktop. Each command can be run directly in a terminal, with no script file or automation framework required.

Key Benefit: Using direct command execution allows for quick automation without frameworks like Selenium or Playwright, and provides immediate feedback during execution.

1. Browser Launch and Navigation

Task	Command
Launch Firefox with a URL	`firefox https://www.google.com &`
Get Firefox window ID	`xdotool search --name "Firefox" | head -1`
Get window name from ID	`xdotool getwindowname WINDOW_ID`
Activate window by ID	`xdotool windowactivate WINDOW_ID`
Navigate to URL (address bar)	`xdotool key ctrl+l && xdotool type "https://example.com" && xdotool key Return`
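
The workflows later in this document wait a fixed `sleep 5` before looking up the window ID. As a rough alternative, the lookup can be retried until a window actually appears. The sketch below is one way to do that; the function name `wait_for_window` and the 15-second default timeout are assumptions made for illustration, not part of xdotool.

```shell
# Hypothetical helper: poll until a visible window whose title matches
# the pattern exists, then print its ID. Returns 1 if none shows up
# within the timeout (in seconds; 15 is an arbitrary default).
wait_for_window() {
    pattern=$1
    timeout=${2:-15}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # xdotool search exits non-zero when nothing matches, so ignore
        # its status and check the captured output instead.
        id=$(xdotool search --onlyvisible --name "$pattern" 2>/dev/null | head -n 1) || true
        if [ -n "$id" ]; then
            printf '%s\n' "$id"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

A possible use: `FIREFOX_ID=$(wait_for_window "Firefox") && xdotool windowactivate "$FIREFOX_ID"`. A window existing does not mean the page has finished loading, so a short `sleep` afterwards may still be needed.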

2. Mouse and Keyboard Interaction

Task	Command
Click at coordinates	`xdotool mousemove X Y click 1`
Right-click at coordinates	`xdotool mousemove X Y click 3`
Type text	`xdotool type "text to type"`
Press keyboard shortcut	`xdotool key ctrl+t`
Press multiple keys	`xdotool key ctrl+shift+i`
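
Some pages drop characters when text arrives too quickly, and a modifier key that is still held down can turn typed letters into shortcuts. `xdotool type` accepts `--delay` (milliseconds between keystrokes) and `--clearmodifiers` for these cases. The wrapper below is a small sketch; the name `type_slowly` and the 50 ms default are assumptions chosen for illustration.

```shell
# Hypothetical wrapper around `xdotool type`.
#   $1 = text to type
#   $2 = delay between keystrokes in milliseconds (default 50, arbitrary)
type_slowly() {
    xdotool type --clearmodifiers --delay "${2:-50}" "$1"
}
```

For example, `type_slowly "text to type" 120` types the same text as the table entry above with a 120 ms gap between keystrokes.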

3. Tab Management

Task	Command
Open new tab	`xdotool key ctrl+t`
Switch to specific tab	`xdotool key alt+1` (first tab; alt+2 for the second, and so on up to alt+8. Firefox on Linux uses Alt for this shortcut; Ctrl+1 is the Windows binding)
Close current tab	`xdotool key ctrl+w`
Switch to next tab	`xdotool key ctrl+Tab`
Switch to previous tab	`xdotool key ctrl+shift+Tab`
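
The tab shortcuts combine naturally with the address-bar typing from section 1. The sketch below opens a URL in a new tab; it relies on Firefox placing keyboard focus in the address bar of a freshly opened tab. The function name `open_in_new_tab` and the half-second pause are assumptions for illustration.

```shell
# Hypothetical helper: open a URL in a new Firefox tab.
# Each step runs only if the previous one succeeded.
open_in_new_tab() {
    url=$1
    xdotool key ctrl+t &&
    sleep 0.5 &&            # give the new tab time to appear
    xdotool type "$url" &&
    xdotool key Return
}
```

Usage would look like `open_in_new_tab "https://example.com"`, run while the Firefox window has focus.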

4. Screenshot Capture

Task	Command
Take full screenshot	`gnome-screenshot -f /tmp/screenshot.png`
Take screenshot with delay	`sleep 3 && gnome-screenshot -f /tmp/screenshot.png`
Take screenshot of window	`gnome-screenshot -w -f /tmp/window.png`
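
Writing every capture to the same path overwrites earlier screenshots. One option is to put a timestamp in the file name, as in the sketch below. The function name `take_shot` and the `SHOT_CMD` variable are assumptions introduced here; `SHOT_CMD` only exists so that another capture tool accepting `-f PATH` could be substituted.

```shell
# Capture command; defaults to gnome-screenshot as used in this document.
SHOT_CMD=${SHOT_CMD:-gnome-screenshot}

# Hypothetical helper: save a screenshot as DIR/LABEL_YYYYMMDD-HHMMSS.png
# and print the path that was written.
#   $1 = label for the file name
#   $2 = target directory (default /tmp)
take_shot() {
    label=$1
    dir=${2:-/tmp}
    path="$dir/${label}_$(date +%Y%m%d-%H%M%S).png"
    "$SHOT_CMD" -f "$path" && printf '%s\n' "$path"
}
```

For instance, `take_shot search_results` would write a file such as `/tmp/search_results_<timestamp>.png` and print its path.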

5. Advanced Interactions

Task	Command
Open context menu and select item	`xdotool mousemove X Y click 3 && sleep 1 && xdotool key Down Down Return`
Page scroll down	`xdotool key Page_Down`
Open developer tools	`xdotool key F12`
Reload page	`xdotool key ctrl+r`
Search on page	`xdotool key ctrl+f && xdotool type "search term" && xdotool key Return`
Navigate back in history	`xdotool key alt+Left`
Navigate forward in history	`xdotool key alt+Right`

6. Working with Context Menu and Clipboard

Task	Command
Open context menu	`xdotool click 3`
Select all text	`xdotool key ctrl+a`
Copy selected text	`xdotool key ctrl+c`
Paste from clipboard	`xdotool key ctrl+v`
Cut selected text	`xdotool key ctrl+x`
Save page as	`xdotool key ctrl+s`
Print page	`xdotool key ctrl+p`
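
Copying is only useful to a terminal session if the clipboard can be read back. The sketch below assumes the separate `xclip` utility is installed (it is not part of xdotool and is not used elsewhere in this document); the function name `copy_page_text` is also an assumption.

```shell
# Hypothetical helper: select everything in the focused element, copy it,
# and print the clipboard contents to stdout. Requires xclip.
copy_page_text() {
    xdotool key ctrl+a &&
    xdotool key ctrl+c &&
    sleep 0.5 &&                      # let the browser update the clipboard
    xclip -selection clipboard -o
}
```

Note that Ctrl+A selects within whatever currently has focus, so if a text field is focused only that field's contents are copied. A typical call might be `copy_page_text > /tmp/page.txt`.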

7. Handling Browser Dialogs

Task	Command
Accept alert dialog	`xdotool key Return`
Dismiss confirmation dialog	`xdotool key Tab Return`
Cancel prompt dialog	`xdotool key Escape`
Handle authentication dialog	`xdotool type "username" && xdotool key Tab && xdotool type "password" && xdotool key Return`
Close a tab with unsaved changes	`xdotool key ctrl+w && sleep 1 && xdotool key Tab Return`

8. Example Workflows

Google Search Workflow

# Launch Firefox with Google
firefox https://www.google.com &

# Wait for Firefox to load and get window ID
sleep 5
FIREFOX_ID=$(xdotool search --onlyvisible --name "Firefox" | head -1)

# Focus the window
xdotool windowactivate "$FIREFOX_ID"

# Click on search box (500 300 is a placeholder; adjust for your screen resolution and window position)
xdotool mousemove 500 300 click 1

# Type search query
xdotool type "browser automation examples"

# Submit search
xdotool key Return

# Take screenshot of results
sleep 3
gnome-screenshot -f /tmp/search_results.png
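
The click at 500 300 in the workflow above only lands on the search box for one particular screen layout. A somewhat more portable approach is to express the target as a percentage of the window size, using `xdotool getwindowgeometry --shell` to read the window dimensions and `mousemove --window` to move relative to the window. This is a sketch under those assumptions; the function name `click_fraction` is hypothetical, and the right percentages still have to be found by trial for each page.

```shell
# Hypothetical helper: click at a position given as percentages of the
# window's width and height.
#   $1 = window ID, $2 = horizontal percent (0-100), $3 = vertical percent
click_fraction() {
    win=$1
    fx=$2
    fy=$3
    # --shell prints WINDOW=, X=, Y=, WIDTH=, HEIGHT=, SCREEN= assignments.
    geom=$(xdotool getwindowgeometry --shell "$win") || return 1
    eval "$geom"
    x=$((WIDTH * fx / 100))
    y=$((HEIGHT * fy / 100))
    # Coordinates passed with --window are relative to that window's
    # top-left corner rather than the screen's.
    xdotool mousemove --window "$win" "$x" "$y" click 1
}
```

With this in place, a call such as `click_fraction "$FIREFOX_ID" 50 40` would click halfway across the window and 40% of the way down.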

Form Submission Workflow

# Launch Firefox with form page
firefox https://httpbin.org/forms/post &

# Wait for Firefox to load and get window ID
sleep 5
FIREFOX_ID=$(xdotool search --onlyvisible --name "Firefox" | head -1)

# Focus the window
xdotool windowactivate "$FIREFOX_ID"

# Wait for page to fully load
sleep 3

# Fill out Customer Name
xdotool key Tab Tab && sleep 0.5
xdotool type "John Doe"

# Select Pizza Size (Medium)
xdotool key Tab && sleep 0.5
xdotool key Down

# Select Pizza Topping (Onion)
xdotool key Tab Tab && sleep 0.5
xdotool key Down Down && sleep 0.5
xdotool key space

# Best Pizza dropdown - select "New Haven Style"
xdotool key Tab && sleep 0.5
xdotool key Down Down

# Enter comment in textarea
xdotool key Tab && sleep 0.5
xdotool type "This is a complete form submission test."

# Take screenshot before submission
gnome-screenshot -f /tmp/form_filled.png

# Tab to the Submit button and click it
xdotool key Tab Tab && sleep 0.5
xdotool key space
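
The form workflow repeats the same pattern many times: press a key some number of times, then pause. A small helper can make the sequence easier to adjust when the page's tab order changes. The sketch below is one possible shape; the name `press_key` and its defaults are assumptions for illustration.

```shell
# Hypothetical helper: press a key COUNT times, pausing after each press.
#   $1 = key name as understood by xdotool (for example Tab or Down)
#   $2 = number of presses (default 1)
#   $3 = pause after each press in seconds (default 0.5)
press_key() {
    key=$1
    count=${2:-1}
    pause=${3:-0.5}
    i=0
    while [ "$i" -lt "$count" ]; do
        xdotool key "$key" || return 1
        sleep "$pause"
        i=$((i + 1))
    done
}
```

The "Fill out Customer Name" step could then be written as `press_key Tab 2` followed by `xdotool type "John Doe"`.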

9. Troubleshooting Notes

If XGetWindowProperty[_NET_WM_DESKTOP] failed (code=1) appears, it's usually harmless and can be ignored
For more reliable window identification, use --onlyvisible with xdotool search
Allow sufficient time between commands with sleep to ensure the browser has time to respond
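Screen coordinates are absolute pixel positions measured from the top-left corner of the screen, so they change with display resolution and window placement
Chain dependent commands with && so that each one runs only if the previous command succeeded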
If website structure changes, tab navigation sequences may need adjustment
Browser version differences may affect keyboard shortcuts and dialog behavior
Context menu structures vary by browser, OS, and website

Important: Always include sufficient delay (sleep) between commands when automating browser actions. Browser response times can vary based on page complexity and system load.
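
The two habits recommended above, stopping when a command fails and pausing after each action, can be folded into one wrapper. The following is a minimal sketch; the function name `step` and the `STEP_DELAY` variable are assumptions, and the one-second default is arbitrary.

```shell
# Pause (in seconds) applied after every successful step.
STEP_DELAY=${STEP_DELAY:-1}

# Hypothetical wrapper: run a command, report a failure on stderr and
# return non-zero, otherwise wait STEP_DELAY seconds before returning.
step() {
    if ! "$@"; then
        printf 'step failed: %s\n' "$*" >&2
        return 1
    fi
    sleep "$STEP_DELAY"
}
```

Steps can then be chained, for example `step xdotool key ctrl+l && step xdotool type "https://example.com" && step xdotool key Return`. Raising `STEP_DELAY` slows the whole sequence down on a loaded system without editing each line.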

For more detailed examples and workflows, see the GitHub repository or refer to the direct_commands.md file.
