Performance & Optimization
Why Performance Matters
As your Ansible infrastructure grows, playbook execution time can increase significantly. Optimizing performance reduces deployment time, improves developer productivity, and enables faster incident response.
- Parallelization: Execute tasks across multiple hosts simultaneously
- Fact Gathering: Reduce time spent collecting system information
- Connection Overhead: Minimize SSH connection setup time
- Task Efficiency: Optimize individual task execution
Execution Strategies
Linear Strategy (Default)
Ansible's default strategy runs each task on all hosts before moving to the next task. All hosts wait for the slowest host to complete each task.
- hosts: all
  strategy: linear  # Default
  tasks:
    - name: Task 1 runs on all hosts first
    - name: Task 2 runs on all hosts second
Free Strategy
Each host runs as fast as it can without waiting for other hosts. Fastest execution for independent tasks.
- hosts: all
  strategy: free
  tasks:
    - name: Fast hosts complete all tasks first
    - name: Slow hosts catch up independently
Debug Strategy
Interactive debugging strategy for troubleshooting:
- hosts: all
  strategy: debug
  tasks:
    - name: Task to debug interactively
Configuration
# ansible.cfg - Set default strategy
[defaults]
strategy = free
# Or via environment variable
export ANSIBLE_STRATEGY=free
Parallelization with Forks
Forks control how many hosts Ansible manages simultaneously. Default is 5.
Increasing Forks
# ansible.cfg
[defaults]
forks = 50
# Command line
ansible-playbook playbook.yml -f 50
# Environment variable
export ANSIBLE_FORKS=50
Optimal Fork Settings
# Development/Testing (limited resources)
forks = 10

# Production (ample resources) - typically 50-100
forks = 50

# Large-scale deployments (powerful control node) - typically 100-500
forks = 250
- Each fork uses memory and CPU on control node
- Too many forks can overwhelm the control node
- Monitor control node resources when increasing forks
- Network bandwidth and target system capacity also matter
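Note that forks and serial compose: forks caps how many hosts Ansible touches at once globally, while serial batches hosts per play, so the effective concurrency is the smaller of the two. A sketch (the group name is illustrative):

```yaml
# With forks = 50 in ansible.cfg, this play still proceeds in
# batches of 20: effective concurrency is min(forks, serial)
- hosts: webservers
  serial: 20
  tasks:
    - name: At most 20 hosts run this at a time
      ping:
```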
Serial Execution (Rolling Updates)
Control batch sizes for rolling updates to limit risk and resource usage:
# Fixed batch size
- hosts: webservers
  serial: 5
  tasks:
    - name: Update in batches of 5

# Percentage-based
- hosts: webservers
  serial: "20%"
  tasks:
    - name: Update 20% at a time

# Progressive batches
- hosts: webservers
  serial:
    - 1      # Test on 1 host first
    - 5      # Then 5 hosts
    - 10     # Then 10 hosts
    - "30%"  # Then 30% of remaining
  tasks:
    - name: Gradual rollout
Serial with max_fail_percentage
- hosts: webservers
  serial: 10
  max_fail_percentage: 20
  tasks:
    - name: Abort if >20% of batch fails
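Serial batching pairs naturally with load-balancer draining, so only one batch is ever out of rotation. A sketch of the pattern; the helper script paths and package name are placeholders:

```yaml
- hosts: webservers
  serial: 5
  max_fail_percentage: 20
  tasks:
    - name: Drain host from the load balancer (placeholder script)
      command: /usr/local/bin/lb_disable.sh {{ inventory_hostname }}
      delegate_to: localhost

    - name: Update the application (placeholder package)
      apt:
        name: myapp
        state: latest

    - name: Return host to the load balancer (placeholder script)
      command: /usr/local/bin/lb_enable.sh {{ inventory_hostname }}
      delegate_to: localhost
```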
Asynchronous Tasks
Run long-running tasks in the background to avoid timeouts and enable parallelism:
Fire and Forget (poll: 0)
- name: Start long-running task
  command: /usr/bin/long_process.sh
  async: 3600  # Maximum runtime in seconds
  poll: 0      # Don't wait for completion
  register: long_task

- name: Continue with other tasks immediately
  debug:
    msg: "Task started, moving on"

# Check status later
- name: Check on async task
  async_status:
    jid: "{{ long_task.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 100
  delay: 10
Polling with Extended Timeout (poll > 0)
- name: Long task with polling
  shell: /usr/bin/process_data.sh
  async: 3600  # Allow up to 1 hour
  poll: 30     # Check every 30 seconds
Concurrent Async Tasks
# Start the backup on every host concurrently (poll: 0 returns
# immediately, so each host kicks off its job and the play moves on)
- name: Start backup on all databases
  command: /usr/bin/backup_database.sh
  async: 7200
  poll: 0
  register: backup_tasks

- name: Continue with other work
  debug:
    msg: "Backups running in background"

# Wait for each host's backup to complete. The start task has no loop,
# so the registered variable holds a single job id per host.
- name: Wait for all backups
  async_status:
    jid: "{{ backup_tasks.ansible_job_id }}"
  register: backup_results
  until: backup_results.finished
  retries: 120
  delay: 60
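Async jobs leave status files under ~/.ansible_async on the targets; once polling is done, the async_status module's cleanup mode removes them:

```yaml
- name: Clean up the async job cache
  async_status:
    jid: "{{ backup_tasks.ansible_job_id }}"
    mode: cleanup
```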
Fact Caching
Gathering facts can take 2-5 seconds per host. Cache facts to avoid repeated gathering:
Configuration
# ansible.cfg
[defaults]
gathering = smart # Only gather if not cached
fact_caching = jsonfile # Cache backend
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400 # 24 hours
# Other cache backends
fact_caching = redis
fact_caching_connection = localhost:6379:0
fact_caching = memcached
fact_caching_connection = localhost:11211
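If cached facts go stale before the timeout expires (for example after resizing a VM), they can be cleared and re-gathered explicitly; the group name here is illustrative:

```yaml
- hosts: resized_vms
  gather_facts: no
  tasks:
    - name: Drop the cached facts for these hosts
      meta: clear_facts

    - name: Re-gather and repopulate the cache
      setup:
```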
Gathering Strategies
# Don't gather facts at all
- hosts: all
  gather_facts: no
  tasks:
    - name: Tasks that don't need facts

# Gather only the needed facts explicitly
- hosts: all
  gather_facts: no
  tasks:
    - name: Gather specific facts only
      setup:
        filter: ansible_distribution*

# Fact gathering is already parallel, bounded by the fork count
- hosts: all
  gather_facts: yes
  tasks:
    - name: Facts gathered in parallel across forks
Gathering Options
# ansible.cfg gathering options
gathering = implicit # Always gather (default)
gathering = explicit # Only when gather_facts: yes
gathering = smart # Use cache if available
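Plays can also trim gathering with the gather_subset keyword; excluding the hardware subset often saves the most time:

```yaml
- hosts: all
  gather_facts: yes
  gather_subset:
    - "!hardware"   # skip the slow hardware probe, keep network facts
  tasks:
    - debug:
        var: ansible_default_ipv4
```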
SSH Optimization
SSH Pipelining
Reduces SSH operations by sending commands without creating temporary files on targets:
# ansible.cfg
[ssh_connection]
pipelining = True
Pipelining requires requiretty to be disabled in /etc/sudoers on target hosts; most modern distributions disable it by default.
ControlMaster & ControlPersist
Reuse SSH connections to avoid repeated authentication:
# ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r
Disable Host Key Checking (Development Only)
# ansible.cfg
[defaults]
host_key_checking = False
Task-Level Optimizations
Throttle
Limit concurrent tasks to avoid overwhelming target systems:
- name: Resource-intensive task
command: /usr/bin/heavy_process
throttle: 5 # Max 5 hosts concurrently
Run Once
Execute task on only one host:
- name: Database migration (only needs to run once)
command: /usr/bin/migrate_db.sh
run_once: true
delegate_to: "{{ groups['databases'][0] }}"
Delegate to Localhost
Run API calls or commands from control node instead of all targets:
- name: Call API once instead of from every host
uri:
url: https://api.example.com/notify
method: POST
delegate_to: localhost
run_once: true
Profiling and Measurement
Profile Tasks Callback
# ansible.cfg
[defaults]
callbacks_enabled = profile_tasks, timer  # callback_whitelist on Ansible < 2.11
# Output shows task execution times
TASK [Install nginx] ********************************************************
ok: [web01] => (elapsed: 0:00:12.345)
# Summary at end
PLAY RECAP *******************************************************************
Playbook run took 0 days, 0 hours, 2 minutes, 34 seconds
Profile Roles Callback
# ansible.cfg
[defaults]
callbacks_enabled = profile_roles
# Shows time per role
ROLE RECAP *****************************************************************
webserver : 0 days, 0 hours, 1 minutes, 23 seconds
database : 0 days, 0 hours, 2 minutes, 45 seconds
Timer Callback
# ansible.cfg
[defaults]
callbacks_enabled = timer
# Shows total playbook execution time
Playbook run took 0 days, 0 hours, 3 minutes, 45 seconds
Environment Variable
export ANSIBLE_CALLBACKS_ENABLED=profile_tasks,timer
ansible-playbook playbook.yml
Module-Specific Optimizations
Package Management
# Slow - one package manager transaction per item
- name: Install packages one by one
  apt:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - postgresql
    - redis

# Fast - Single package manager invocation
- name: Install packages in one transaction
  apt:
    name:
      - nginx
      - postgresql
      - redis
    state: present
File Operations
# Use synchronize (rsync) to push large trees from the control node;
# rsync only transfers differences, so re-runs are fast
- name: Fast large directory copy
  synchronize:
    src: /source/large_dir/
    dest: /dest/large_dir/

# Use copy module's backup feature instead of separate stat+copy
- name: Copy with built-in backup
  copy:
    src: config.conf
    dest: /etc/app/config.conf
    backup: yes
Advanced Performance Techniques
Mitogen Strategy Plugin
Mitogen is a third-party strategy plugin that can provide a 1.25x-7x speedup; verify that it supports your Ansible version first, as Mitogen releases often lag new core releases:
# Install
pip install mitogen
# ansible.cfg
[defaults]
strategy_plugins = /path/to/mitogen/ansible_mitogen/plugins/strategy
strategy = mitogen_linear
# Or use free strategy
strategy = mitogen_free
Reduce Module Round-trips
# Slow - three separate module round-trips
- stat:
    path: /etc/file1
- stat:
    path: /etc/file2
- stat:
    path: /etc/file3

# Fast - Single find module call
- find:
    paths: /etc
    patterns: "file*"
  register: files
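The registered find result exposes matches under files.files, so a single follow-up task can act on all of them:

```yaml
- name: Report every matched path in one task
  debug:
    msg: "{{ files.files | map(attribute='path') | list }}"
```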
Pre-task Optimization
- name: Optimization pre-tasks
  hosts: all
  gather_facts: no
  tasks:
    # Gather only the minimal fact subset, and only when needed
    - setup:
        gather_subset: min
      when: needed_facts is defined

    # Pre-warm the ControlPersist connection pool
    - ping:
Best Practices Summary
- Increase Forks: Match your control node capacity
- Enable Pipelining: Reduce SSH overhead
- Cache Facts: Use smart gathering and caching
- Use Free Strategy: For independent tasks
- Batch with Serial: For rolling updates
- Async for Long Tasks: Prevent timeouts
- Profile Regularly: Identify slow tasks
- Optimize Loops: Use modules' native list support
- Delegate Wisely: Run API calls from control node
- Control Parallelism: Use throttle for resource-heavy tasks
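Several of these practices combine naturally in a single play; a sketch, with placeholder package names and script path:

```yaml
- hosts: webservers
  strategy: free       # tasks are independent, no cross-host ordering
  gather_facts: no     # nothing below needs facts
  tasks:
    - name: Install packages in one transaction
      apt:
        name: [nginx, redis]
        state: present

    - name: Heavy task with bounded concurrency (placeholder script)
      command: /usr/bin/reindex.sh
      throttle: 5

    - name: Notify the deployment API once, from the control node
      uri:
        url: https://api.example.com/notify
        method: POST
      delegate_to: localhost
      run_once: true
```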
Performance Tuning Checklist
# Optimal ansible.cfg for performance
[defaults]
forks = 50
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
host_key_checking = False  # development only
callbacks_enabled = profile_tasks, timer
stdout_callback = yaml
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r
Measuring Performance
Benchmark Playbook
- name: Performance baseline
  hosts: all
  gather_facts: yes
  tasks:
    - name: Record start time
      set_fact:
        start_time: "{{ ansible_date_time.epoch }}"

    - name: Run your tasks here
      debug:
        msg: "Task execution"

    - name: Refresh date/time facts (otherwise epoch still holds the start value)
      setup:
        filter: ansible_date_time

    - name: Calculate duration
      debug:
        msg: "Duration: {{ ansible_date_time.epoch | int - start_time | int }} seconds"
Compare Strategies
# Test 1: Linear strategy (default)
time ansible-playbook -f 5 playbook.yml

# Test 2: Free strategy (strategy is a play keyword, so set it via the env var)
time ANSIBLE_STRATEGY=free ansible-playbook -f 5 playbook.yml

# Test 3: Increased forks
time ansible-playbook -f 50 playbook.yml

# Test 4: With Mitogen configured in ansible.cfg
time ansible-playbook playbook.yml
Troubleshooting Slow Playbooks
- Fact Gathering: Disable or cache if not needed
- Loops with Modules: Use module's native list support
- Serial Execution: Switch to free strategy if possible
- Low Fork Count: Increase forks to match resources
- No Pipelining: Enable SSH pipelining
- Slow Modules: Replace shell/command with native modules
- Excessive Logging: Reduce verbosity in production
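On the "Slow Modules" point, replacing shell/command with a native module also buys idempotence, so unchanged hosts skip work on reruns:

```yaml
# Runs every time and always reports "changed"
- name: Restart nginx via shell
  shell: systemctl restart nginx

# Native module checks current state first and reports changes accurately
- name: Ensure nginx is running
  service:
    name: nginx
    state: started
    enabled: yes
```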
Quick Reference
# Execution strategies
strategy: linear # Default, sequential
strategy: free # Parallel, independent
strategy: debug # Interactive debugging
# Parallelization
forks = 50 # In ansible.cfg
ansible-playbook -f 50 # Command line
# Serial batches
serial: 5 # Fixed batch size
serial: "20%" # Percentage
serial: [1, 5, "50%"] # Progressive
# Async tasks
async: 3600 # Max runtime
poll: 0 # Fire and forget
poll: 30 # Poll every 30s
# Fact caching
gathering = smart # Use cache
fact_caching = jsonfile # Cache backend
fact_caching_timeout = 86400 # 24 hours
# SSH optimization
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
# Profiling
callbacks_enabled = profile_tasks,timer
# Task controls
throttle: 5 # Limit concurrency
run_once: true # Run on one host only
Next Steps
- Learn about Testing & Debugging to validate optimizations
- Explore Best Practices for efficient playbooks
- Master CI/CD Integration for automated deployments
- Try the Playground to experiment with strategies