Firefox Browser Automation - Direct Command Execution
This document provides examples of automating Firefox with xdotool. Each command can be run directly in a terminal, without writing a script. xdotool drives X11 windows, so the examples assume an X11 session rather than native Wayland.
Key Benefit: Using direct command execution allows for quick automation without frameworks like Selenium or Playwright, and provides immediate feedback during execution.
1. Browser Launch and Navigation
| Task | Command |
|---|---|
| Launch Firefox with a URL | firefox https://www.google.com & |
| Get Firefox window ID | xdotool search --name "Firefox" \| head -1 |
| Get window name from ID | xdotool getwindowname WINDOW_ID |
| Activate window by ID | xdotool windowactivate WINDOW_ID |
| Navigate to URL (address bar) | xdotool key ctrl+l && xdotool type "https://example.com" && xdotool key Return |
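The workflows below use a fixed sleep 5 before looking up the window, which is either wasted time or too short on a slow machine. A minimal sketch of an alternative, assuming an X11 session, is to poll xdotool search until a visible matching window exists, with a timeout. wait_for_window is a hypothetical helper name, not part of xdotool. (xdotool search --sync also waits for a match, but it does not take a timeout.)

```shell
# Poll until a visible window whose title matches PATTERN exists, then print its ID.
# Assumes an X11 session; xdotool cannot see native Wayland windows.
# Usage: wait_for_window PATTERN [TIMEOUT_SECONDS]
wait_for_window() {
  local pattern="$1" timeout="${2:-15}" waited=0 id
  while [ "$waited" -lt "$timeout" ]; do
    # --onlyvisible skips hidden windows; head -1 keeps the first match
    id=$(xdotool search --onlyvisible --name "$pattern" 2>/dev/null | head -1)
    if [ -n "$id" ]; then
      printf '%s\n' "$id"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "wait_for_window: no window matching '$pattern' after ${timeout}s" >&2
  return 1
}
```

Usage: FIREFOX_ID=$(wait_for_window "Firefox" 20) || exit 1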
2. Mouse and Keyboard Interaction
| Task | Command |
|---|---|
| Left-click at screen coordinates | xdotool mousemove X Y click 1 |
| Right-click at screen coordinates | xdotool mousemove X Y click 3 |
| Type text | xdotool type "text to type" |
| Press a keyboard shortcut (new tab) | xdotool key ctrl+t |
| Press a shortcut with several modifiers (developer tools) | xdotool key ctrl+shift+i |
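mousemove X Y uses absolute screen coordinates. xdotool mousemove also accepts --window WINDOW_ID, which measures X and Y from that window's top-left corner, so a click still lands correctly after the window is moved (though not after it is resized or the page layout changes). The sketch below wraps this and rejects non-numeric coordinates; is_uint and click_in_window are hypothetical helper names.

```shell
# Return success if $1 is a non-negative integer.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
}

# Click at (X, Y) measured from WINDOW_ID's top-left corner instead of the screen's.
# BUTTON uses X11 numbering: 1 = left, 2 = middle, 3 = right (default 1).
click_in_window() {
  local win="$1" x="$2" y="$3" button="${4:-1}"
  if ! is_uint "$x" || ! is_uint "$y"; then
    echo "click_in_window: X and Y must be non-negative integers" >&2
    return 2
  fi
  xdotool mousemove --window "$win" "$x" "$y" click "$button"
}
```

Usage: click_in_window "$FIREFOX_ID" 500 300 right-clicks nothing; add 3 as a fourth argument for a right-click.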
3. Tab Management
| Task | Command |
|---|---|
| Open new tab | xdotool key ctrl+t |
| Switch to specific tab | xdotool key ctrl+1 (ctrl+2 for the second tab, and so on; if ctrl+N has no effect on Linux, try alt+N) |
| Close current tab | xdotool key ctrl+w |
| Switch to next tab | xdotool key ctrl+Tab |
| Switch to previous tab | xdotool key ctrl+shift+Tab |
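Selecting a numbered tab from a script is easier with a helper that validates the number. This sketch maps 1 to 8 (or "last") to a key name and sends it. TAB_MOD is a hypothetical variable that defaults to ctrl, as in the table, and can be set to alt for builds that expect Alt+number. In Firefox, modifier+9 jumps to the last tab rather than the ninth, so 9 is exposed as "last".

```shell
# Print the xdotool key name that selects tab N (1-8) or the last tab ("last").
# TAB_MOD is a hypothetical setting: ctrl by default, alt if your build expects it.
tab_key() {
  local mod="${TAB_MOD:-ctrl}"
  case "$1" in
    [1-8]) printf '%s+%s\n' "$mod" "$1" ;;
    last)  printf '%s+9\n' "$mod" ;;   # 9 jumps to the last tab, not the ninth
    *)     echo "tab_key: expected 1-8 or 'last', got '$1'" >&2; return 2 ;;
  esac
}

# Send the key that selects the requested tab to the focused Firefox window.
select_tab() {
  local key
  key=$(tab_key "$1") || return
  xdotool key "$key"
}
```

Usage: select_tab 3, or TAB_MOD=alt select_tab last.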
4. Screenshot Capture
| Task | Command |
|---|---|
| Take full screenshot | gnome-screenshot -f /tmp/screenshot.png |
| Take screenshot with delay | sleep 3 && gnome-screenshot -f /tmp/screenshot.png |
| Take screenshot of window | gnome-screenshot -w -f /tmp/window.png |
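Fixed names such as /tmp/screenshot.png are overwritten on every run. A small sketch, assuming gnome-screenshot is installed and using the hypothetical helper name snap, writes timestamped files and prints the path so later steps can use it. Two captures within the same second would still share a name.

```shell
# Save a screenshot with a timestamped name in DIR (default /tmp) and print its path.
# MODE "window" captures the active window (-w); any other value captures the screen.
# Assumes gnome-screenshot is installed; two shots in the same second share a name.
snap() {
  local mode="${1:-full}" dir="${2:-/tmp}" file
  file="$dir/shot_$(date +%Y%m%d_%H%M%S).png"
  if [ "$mode" = window ]; then
    gnome-screenshot -w -f "$file" || return
  else
    gnome-screenshot -f "$file" || return
  fi
  printf '%s\n' "$file"
}
```

Usage: shot=$(snap window) && echo "saved $shot". gnome-screenshot also has its own -d SECONDS option if you prefer it to a separate sleep.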
5. Advanced Interactions
| Task | Command |
|---|---|
| Open context menu at (X, Y) and select an item | xdotool mousemove X Y click 3 && sleep 1 && xdotool key Down Down Return |
| Scroll down one page | xdotool key Page_Down |
| Open developer tools | xdotool key F12 |
| Reload page | xdotool key ctrl+r |
| Find text on page | xdotool key ctrl+f && xdotool type "search term" && xdotool key Return |
| Navigate back in history | xdotool key alt+Left |
| Navigate forward in history | xdotool key alt+Right |
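The context-menu row hard-codes two Down presses. A sketch that takes the number of presses as a parameter makes that count explicit; context_menu_pick is a hypothetical helper name, and the right number of presses still has to be found by hand for each page and Firefox version.

```shell
# Right-click at screen position (X, Y), press Down N times, then Return.
# N is page- and version-specific: count the menu entries by hand first.
context_menu_pick() {
  local x="$1" y="$2" n="$3" keys="" i=0
  while [ "$i" -lt "$n" ]; do
    keys="$keys Down"
    i=$((i + 1))
  done
  xdotool mousemove "$x" "$y" click 3
  sleep 1   # give the menu time to open before sending keys
  # $keys is deliberately unquoted so that each Down becomes its own argument
  xdotool key $keys Return
}
```

Usage: context_menu_pick 500 300 2 reproduces the table row above.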
6. Working with Context Menu and Clipboard
| Task | Command |
|---|---|
| Open context menu at the current pointer position | xdotool click 3 |
| Select all text | xdotool key ctrl+a |
| Copy selected text | xdotool key ctrl+c |
| Paste from clipboard | xdotool key ctrl+v |
| Cut selected text | xdotool key ctrl+x |
| Save page as | xdotool key ctrl+s |
| Print page | xdotool key ctrl+p |
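xdotool can press ctrl+c but cannot read the clipboard back. A minimal sketch, assuming xclip is installed (it is a separate tool, not part of xdotool), combines select-all and copy and then prints the copied text; copy_page_text is a hypothetical helper name.

```shell
# Select everything on the focused page, copy it, and print the clipboard text.
# Assumes xclip is installed; it is a separate tool, not part of xdotool.
copy_page_text() {
  xdotool key ctrl+a ctrl+c
  sleep 0.5   # give Firefox a moment to publish the clipboard contents
  xclip -selection clipboard -o
}
```

Usage: copy_page_text > /tmp/page.txt. If a text field has focus, ctrl+a selects only that field's contents, so click on the page body first.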
7. Handling Browser Dialogs
| Task | Command |
|---|---|
| Accept alert dialog | xdotool key Return |
| Dismiss confirmation dialog | xdotool key Tab Return |
| Cancel prompt dialog | xdotool key Escape |
| Handle authentication dialog | xdotool type "username" && xdotool key Tab && xdotool type "password" && xdotool key Return |
| Close a tab with unsaved changes | xdotool key ctrl+w && sleep 1 && xdotool key Tab Return |
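Typing a password as a command-line argument leaves it in shell history and makes it visible in the process list while xdotool runs. A sketch of the authentication row that reads credentials from environment variables and feeds the password on stdin follows. AUTH_USER, AUTH_PASS, and fill_auth_dialog are hypothetical names, and the stdin trick assumes an xdotool version whose type command supports --file - (check man xdotool).

```shell
# Fill an HTTP authentication dialog from environment variables instead of
# typing credentials on the command line. AUTH_USER/AUTH_PASS are hypothetical names.
fill_auth_dialog() {
  if [ -z "${AUTH_USER:-}" ] || [ -z "${AUTH_PASS:-}" ]; then
    echo "fill_auth_dialog: set AUTH_USER and AUTH_PASS first" >&2
    return 2
  fi
  xdotool type "$AUTH_USER"
  xdotool key Tab
  # Feed the password on stdin so it does not show up in the process list;
  # assumes an xdotool version whose "type" command supports --file -
  printf '%s' "$AUTH_PASS" | xdotool type --file -
  xdotool key Return
}
```

Like the original row, this assumes the username field has focus when the dialog opens.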
8. Example Workflows
Google Search Workflow
# Launch Firefox with Google
firefox https://www.google.com &
# Wait for Firefox to load and get window ID
sleep 5
FIREFOX_ID=$(xdotool search --name "Firefox" | head -1)
# Focus the window
xdotool windowactivate "$FIREFOX_ID"
# Click on search box
xdotool mousemove 500 300 click 1
# Type search query
xdotool type "browser automation examples"
# Submit search
xdotool key Return
# Take screenshot of results
sleep 3
gnome-screenshot -f /tmp/search_results.png
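The Google workflow can be wrapped in one function that polls for the window instead of sleeping a fixed 5 seconds and stops at the first failure. This is a sketch under the same assumptions as the original (X11, gnome-screenshot, a search box at screen position 500,300); google_search_shot is a hypothetical name, and windowactivate --sync waits until the window is actually active.

```shell
# Run the Google search workflow; print the screenshot path on success.
# The 500,300 click position is kept from the original and is screen-specific.
google_search_shot() {
  local query="$1" out="${2:-/tmp/search_results.png}" id="" tries=0
  # Discard Firefox's output so a $(...) caller does not wait for Firefox to exit
  firefox "https://www.google.com" >/dev/null 2>&1 &
  # Poll for a visible Firefox window for up to about 15 s
  while [ -z "$id" ]; do
    id=$(xdotool search --onlyvisible --name "Firefox" 2>/dev/null | head -1)
    [ -n "$id" ] && break
    tries=$((tries + 1))
    if [ "$tries" -ge 15 ]; then
      echo "google_search_shot: Firefox window not found" >&2
      return 1
    fi
    sleep 1
  done
  xdotool windowactivate --sync "$id" || return
  sleep 2                              # let the page finish loading
  xdotool mousemove 500 300 click 1    # click the search box
  xdotool type "$query"
  xdotool key Return
  sleep 3                              # wait for the results page
  gnome-screenshot -f "$out" || return
  printf '%s\n' "$out"
}
```

Usage: google_search_shot "browser automation examples"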
Form Submission Workflow
# Launch Firefox with form page
firefox https://httpbin.org/forms/post &
# Wait for Firefox to load and get window ID
sleep 5
FIREFOX_ID=$(xdotool search --onlyvisible --name "Firefox" | head -1)
# Focus the window
xdotool windowactivate "$FIREFOX_ID"
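# Wait for page to fully load
sleep 3
# The Tab counts below assume a specific field order and starting focus;
# verify them against the live form, since one extra field shifts every later step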
# Fill out Customer Name
xdotool key Tab Tab && sleep 0.5
xdotool type "John Doe"
# Select Pizza Size (Medium)
xdotool key Tab && sleep 0.5
xdotool key Down
# Select Pizza Topping (Onion)
xdotool key Tab Tab && sleep 0.5
xdotool key Down Down && sleep 0.5
xdotool key space
# Best Pizza dropdown - select "New Haven Style"
xdotool key Tab && sleep 0.5
xdotool key Down Down
# Enter comment in textarea
xdotool key Tab && sleep 0.5
xdotool type "This is a complete form submission test."
# Take screenshot before submission
gnome-screenshot -f /tmp/form_filled.png
# Tab to the Submit button and click it
xdotool key Tab Tab && sleep 0.5
xdotool key space
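The form workflow repeats one pattern: press Tab some number of times, pause, then type. A sketch of two hypothetical helpers, tab_forward and fill_field, keeps each Tab count in one visible number so it is easier to adjust when the form changes.

```shell
# Move focus forward N fields with Tab, then pause so Firefox keeps up.
tab_forward() {
  local n="$1" keys="" i=0
  while [ "$i" -lt "$n" ]; do
    keys="$keys Tab"
    i=$((i + 1))
  done
  # Unquoted on purpose (one argument per Tab); skip the call when N is 0
  [ -n "$keys" ] && xdotool key $keys
  sleep 0.5
}

# fill_field N TEXT: press Tab N times, then type TEXT into the focused field.
fill_field() {
  tab_forward "$1"
  xdotool type "$2"
}
```

Usage: fill_field 2 "John Doe" performs the Customer Name step of the workflow above.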
9. Troubleshooting Notes
- If XGetWindowProperty[_NET_WM_DESKTOP] failed (code=1) appears, it is usually harmless and can be ignored.
- For more reliable window identification, add --onlyvisible to xdotool search so that hidden windows matching the same name are skipped.
- Allow enough time between commands with sleep so the browser can respond.
- Coordinates passed to mousemove are absolute screen positions, so they change with display resolution and window position.
- Chaining commands with && stops the chain as soon as one command fails; commands on separate lines also run in order, but keep going after an error.
- If a website's structure changes, tab navigation sequences may need adjustment.
- Browser version differences may affect keyboard shortcuts and dialog behavior.
- Context menu entries vary by browser, OS, and website.
Important: Always include sufficient delay (sleep) between commands when automating browser actions. Browser response times can vary based on page complexity and system load.
For more detailed examples and workflows, see the GitHub repository or refer to the direct_commands.md file.