Performance & Optimization

Why Performance Matters

As your Ansible infrastructure grows, playbook execution time can increase significantly. Optimizing performance reduces deployment time, improves developer productivity, and enables faster incident response.

Performance Impact Areas:
  • Parallelization: Execute tasks across multiple hosts simultaneously
  • Fact Gathering: Reduce time spent collecting system information
  • Connection Overhead: Minimize SSH connection setup time
  • Task Efficiency: Optimize individual task execution

Execution Strategies

Linear Strategy (Default)

Ansible's default strategy runs each task on all hosts before moving to the next task. All hosts wait for the slowest host to complete each task.

- hosts: all
  strategy: linear  # Default
  tasks:
    - name: Task 1 runs on all hosts first
      ping:  # placeholder module
    - name: Task 2 starts only after every host finishes Task 1
      ping:

Free Strategy

Each host runs through the play as fast as it can, without waiting for other hosts at each task. This is usually the fastest option when tasks on one host do not depend on results from another.

- hosts: all
  strategy: free
  tasks:
    - name: Fast hosts complete all tasks first
      ping:  # placeholder module
    - name: Slow hosts catch up independently
      ping:

Use Case: Free strategy is ideal when tasks don't depend on other hosts (package updates, log collection, backups).

Debug Strategy

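The debug strategy runs like linear but drops into an interactive debugger when a task fails, which helps with troubleshooting:

- hosts: all
  strategy: debug
  tasks:
    - name: Task to debug interactively
      ping:  # placeholder module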
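
The same debugger can also be enabled for a single task with the debugger keyword, without switching the whole play to the debug strategy. A minimal sketch, assuming a hypothetical script path /usr/bin/flaky_step.sh:

```yaml
- hosts: all
  tasks:
    - name: Drop into the debugger only if this task fails
      command: /usr/bin/flaky_step.sh   # hypothetical path, for illustration only
      debugger: on_failed
```

This keeps the default linear behavior for every other task, so it is less disruptive when only one step needs inspection.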

Configuration

# ansible.cfg - Set default strategy
[defaults]
strategy = free

# Or via environment variable
export ANSIBLE_STRATEGY=free

Parallelization with Forks

Forks set the maximum number of hosts Ansible acts on in parallel. The default is 5.

Increasing Forks

# ansible.cfg
[defaults]
forks = 50

# Command line
ansible-playbook playbook.yml -f 50

# Environment variable
export ANSIBLE_FORKS=50

Optimal Fork Settings

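# Illustrative starting points, not fixed rules. forks takes a single integer,
# so pick one value and measure.

# Development/Testing (limited resources)
forks = 10

# Production (ample resources): typically in the 50-100 range
forks = 50

# Large-scale deployments (powerful control node): typically in the 100-500 range
forks = 100

Fork Considerations: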
  • Each fork uses memory and CPU on control node
  • Too many forks can overwhelm the control node
  • Monitor control node resources when increasing forks
  • Network bandwidth and target system capacity also matter

Serial Execution (Rolling Updates)

Control batch sizes for rolling updates to limit risk and resource usage:

# Fixed batch size
- hosts: webservers
  serial: 5
  tasks:
    - name: Update in batches of 5
      ping:  # placeholder; replace with real update tasks

# Percentage-based
- hosts: webservers
  serial: "20%"
  tasks:
    - name: Update 20% at a time
      ping:  # placeholder

# Progressive batches
- hosts: webservers
  serial:
    - 1    # Test on 1 host first
    - 5    # Then 5 hosts
    - 10   # Then 10 hosts
    - "30%" # Then 30% of all hosts in the play per batch, repeated until done
  tasks:
    - name: Gradual rollout
      ping:  # placeholder

Serial with max_fail_percentage

- hosts: webservers
  serial: 10
  max_fail_percentage: 20
  tasks:
    - name: Abort if >20% of batch fails
      ping:  # placeholder; replace with real update tasks
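
To make the interaction between batch size and failure threshold concrete, here is a sketch of a rolling upgrade. The package and service name myapp is hypothetical, and the arithmetic in the comments assumes an inventory of 20 web servers:

```yaml
- hosts: webservers
  serial: "25%"              # 20 hosts -> batches of 5
  max_fail_percentage: 10    # 10% of a 5-host batch is 0.5 hosts
  tasks:
    - name: Upgrade the application package
      apt:
        name: myapp          # hypothetical package name
        state: latest

    - name: Restart the service
      service:
        name: myapp          # hypothetical service name
        state: restarted
```

Because the threshold must be exceeded rather than met, a single failed host in a batch of 5 (20%) is already above 10% and stops the rollout before the next batch starts.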

Asynchronous Tasks

Run long-running tasks in the background to avoid timeouts and enable parallelism:

Fire and Forget (poll: 0)

- name: Start long-running task
  command: /usr/bin/long_process.sh
  async: 3600  # Maximum runtime in seconds
  poll: 0      # Don't wait for completion
  register: long_task

- name: Continue with other tasks immediately
  debug:
    msg: "Task started, moving on"

# Check status later
- name: Check on async task
  async_status:
    jid: "{{ long_task.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 100
  delay: 10
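
With poll: 0, Ansible does not remove the async job cache file on the target by itself. A short sketch of cleaning it up once the job has finished, reusing the long_task variable registered above:

```yaml
- name: Remove the async job cache file after the job is done
  async_status:
    jid: "{{ long_task.ansible_job_id }}"
    mode: cleanup
```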

Polling with Extended Timeout (poll > 0)

- name: Long task with polling
  shell: /usr/bin/process_data.sh
  async: 3600    # Allow up to 1 hour
  poll: 30       # Check every 30 seconds

Concurrent Async Tasks

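# Start multiple tasks concurrently
# Assumptions for illustration: database_names is a hypothetical list variable,
# and the backup script accepts a database name as its argument.
- name: Start backup on all databases
  command: /usr/bin/backup_database.sh {{ item }}
  async: 7200
  poll: 0
  loop: "{{ database_names }}"
  register: backup_tasks   # The loop is what makes backup_tasks.results exist

- name: Continue with other work
  debug:
    msg: "Backups running in background"

# Wait for all to complete
- name: Wait for all backups
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: backup_results
  until: backup_results.finished
  retries: 120
  delay: 60
  loop: "{{ backup_tasks.results }}"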

Fact Caching

Gathering facts often adds a few seconds per host (roughly 2-5 seconds is common, depending on the target). Cache facts to avoid repeated gathering:

Configuration

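# ansible.cfg
# Note: inline comments in ansible.cfg must start with ";", so the
# explanations below sit on their own lines.
[defaults]
# Skip gathering when valid cached facts exist
gathering = smart
# Cache backend
fact_caching = jsonfile
# Directory for the cache files
fact_caching_connection = /tmp/ansible_facts
# 24 hours
fact_caching_timeout = 86400

# Other cache backends (set only one fact_caching value at a time)
# fact_caching = redis
# fact_caching_connection = localhost:6379:0

# fact_caching = memcached
# fact_caching_connection = localhost:11211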

Gathering Strategies

# Don't gather facts at all
- hosts: all
  gather_facts: no
  tasks:
    - name: Tasks that don't need facts
      ping:  # placeholder module

# Gather only needed facts
- hosts: all
  gather_facts: no   # Skip the full automatic gather, then collect selectively
  tasks:
    - name: Gather specific facts only
      setup:
        gather_subset: min               # Limits which collectors run
        filter: "ansible_distribution*"

# Parallel fact gathering
- hosts: all
  gather_facts: yes
  tasks:
    - name: Facts gathered in parallel across forks
      ping:  # placeholder module

Gathering Options

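# ansible.cfg gathering options (choose one)
# implicit: always gather unless a play sets gather_facts: no (default)
gathering = implicit
# explicit: gather only when a play sets gather_facts: yes
# gathering = explicit
# smart: gather, but skip hosts whose facts are already cached
# gathering = smart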
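
One practical effect of caching is that a play can read facts for hosts it does not contact, as long as an earlier run populated the cache. A minimal sketch, assuming fact caching is enabled and the inventory has a databases group:

```yaml
- hosts: webservers
  gather_facts: no
  tasks:
    - name: Read a cached fact for a database host without connecting to it
      debug:
        msg: "{{ hostvars[groups['databases'][0]]['ansible_distribution'] | default('no cached facts yet') }}"
```

If the cache is empty or expired, the default filter supplies the fallback text instead of failing the task.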

SSH Optimization

SSH Pipelining

Pipelining reduces the number of SSH operations per task by sending module code over the existing connection instead of first copying temporary files to the target:

# ansible.cfg
[defaults]
pipelining = True

[ssh_connection]
pipelining = True

Note: When tasks use sudo, pipelining requires requiretty to be disabled in /etc/sudoers on target hosts.
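
Pipelining can also be scoped to part of the inventory rather than enabled globally, which helps when only some hosts have a compatible sudoers configuration. A minimal sketch, assuming a hypothetical group_vars/webservers.yml file for a group named webservers:

```yaml
# group_vars/webservers.yml (hypothetical file for the webservers group)
ansible_ssh_pipelining: true
```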

ControlMaster & ControlPersist

Reuse SSH connections to avoid repeated authentication:

# ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r

Disable Host Key Checking (Development Only)

# ansible.cfg
[defaults]
host_key_checking = False

Task-Level Optimizations

Throttle

Limit concurrent tasks to avoid overwhelming target systems:

- name: Resource-intensive task
  command: /usr/bin/heavy_process
  throttle: 5  # Max 5 hosts concurrently
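
A throttle of 1 is a common way to serialize a single sensitive step while the rest of the play stays parallel. A sketch, assuming a hypothetical rate-limited endpoint at https://api.example.com/register:

```yaml
- hosts: all
  tasks:
    - name: Register with a rate-limited API one host at a time
      uri:
        url: https://api.example.com/register   # hypothetical endpoint
        method: POST
      throttle: 1
```

The effective concurrency is still capped by forks and serial, so throttle only helps when its value is lower than both.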

Run Once

Execute a task on a single host and apply its result to every host in the batch (with serial, it runs once per batch):

- name: Database migration (only needs to run once)
  command: /usr/bin/migrate_db.sh
  run_once: true
  delegate_to: "{{ groups['databases'][0] }}"

Delegate to Localhost

Run API calls or commands from control node instead of all targets:

- name: Call API once instead of from every host
  uri:
    url: https://api.example.com/notify
    method: POST
  delegate_to: localhost
  run_once: true

Profiling and Measurement

Profile Tasks Callback

# ansible.cfg
[defaults]
# callbacks_enabled requires ansible-core 2.11+; older releases use callback_whitelist
callbacks_enabled = profile_tasks, timer

# Output shows task execution times
TASK [Install nginx] ********************************************************
ok: [web01] => (elapsed: 0:00:12.345)

# Summary at end
PLAY RECAP *******************************************************************
Playbook run took 0 days, 0 hours, 2 minutes, 34 seconds

Profile Roles Callback

# ansible.cfg
[defaults]
callbacks_enabled = profile_roles

# Shows time per role
ROLE RECAP *****************************************************************
webserver : 0 days, 0 hours, 1 minutes, 23 seconds
database  : 0 days, 0 hours, 2 minutes, 45 seconds

Timer Callback

# ansible.cfg
[defaults]
callbacks_enabled = timer

# Shows total playbook execution time
Playbook run took 0 days, 0 hours, 3 minutes, 45 seconds

Environment Variable

export ANSIBLE_CALLBACKS_ENABLED=profile_tasks,timer
ansible-playbook playbook.yml

Module-Specific Optimizations

Package Management

# Slow - Multiple package manager invocations
- name: Install packages one by one
  apt:
    name: "{{ item }}"
  loop:
    - nginx
    - postgresql
    - redis

# Fast - Single package manager invocation
- name: Install packages in one transaction
  apt:
    name:
      - nginx
      - postgresql
      - redis
    state: present

File Operations

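# Use synchronize (rsync) for large file copies
# By default src is read from the control node. Delegating to the target host
# itself makes both src and dest paths on that host.
- name: Fast large directory copy
  synchronize:
    src: /source/large_dir/
    dest: /dest/large_dir/
  delegate_to: "{{ inventory_hostname }}"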

# Use copy module's backup feature instead of separate stat+copy
- name: Copy with built-in backup
  copy:
    src: config.conf
    dest: /etc/app/config.conf
    backup: yes

Advanced Performance Techniques

Mitogen Strategy Plugin

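Mitogen is a third-party strategy plugin whose documentation reports speedups of roughly 1.25x-7x. Actual gains depend on the workload, and Mitogen must support the Ansible version you run: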

# Install
pip install mitogen

# ansible.cfg
[defaults]
strategy_plugins = /path/to/mitogen/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

# Or use free strategy
strategy = mitogen_free

Reduce Module Round-trips

# Slow - Multiple stat checks
- stat: path=/etc/file1
- stat: path=/etc/file2
- stat: path=/etc/file3

# Fast - Single find module call
- find:
    paths: /etc
    patterns: "file*"
  register: files

Pre-task Optimization

- name: Optimization pre-tasks
  hosts: all
  gather_facts: no
  tasks:
    # Gather minimal facts in parallel
    - setup:
        gather_subset: min
      when: needed_facts is defined

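    # Pre-warm the SSH connection so later tasks reuse it via ControlPersist
    # (async: 0 and poll: 0 had no effect here, so a plain ping is enough)
    - ping: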

Best Practices Summary

  1. Increase Forks: Match your control node capacity
  2. Enable Pipelining: Reduce SSH overhead
  3. Cache Facts: Use smart gathering and caching
  4. Use Free Strategy: For independent tasks
  5. Batch with Serial: For rolling updates
  6. Async for Long Tasks: Prevent timeouts
  7. Profile Regularly: Identify slow tasks
  8. Optimize Loops: Use modules' native list support
  9. Delegate Wisely: Run API calls from control node
  10. Control Parallelism: Use throttle for resource-heavy tasks

Performance Tuning Checklist

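# Example ansible.cfg tuned for performance (adjust values to your environment)
[defaults]
forks = 50
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
# Development only: leave host key checking enabled in production
host_key_checking = False
# callbacks_enabled requires ansible-core 2.11+; older releases use callback_whitelist
callbacks_enabled = profile_tasks, timer
stdout_callback = yaml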

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r

Measuring Performance

Benchmark Playbook

- name: Performance baseline
  hosts: all
  gather_facts: yes
  tasks:
    # ansible_date_time is captured once at fact-gathering time and never
    # changes during the run, so read the control node's clock at each step.
    - name: Record start time
      set_fact:
        start_time: "{{ lookup('pipe', 'date +%s') }}"

    - name: Run your tasks here
      debug:
        msg: "Task execution"

    - name: Calculate duration
      debug:
        msg: "Duration: {{ (lookup('pipe', 'date +%s') | int) - (start_time | int) }} seconds"

Compare Strategies

# Test 1: Linear strategy
time ansible-playbook -f 5 playbook.yml

# Test 2: Free strategy (applies only if the play does not set strategy itself)
time ANSIBLE_STRATEGY=free ansible-playbook -f 5 playbook.yml

# Test 3: Increased forks
time ansible-playbook -f 50 playbook.yml

# Test 4: With Mitogen
time ansible-playbook playbook.yml  # With mitogen configured

Troubleshooting Slow Playbooks

Common Slow Patterns:
  • Fact Gathering: Disable or cache if not needed
  • Loops with Modules: Use module's native list support
  • Lockstep Execution: Switch from linear to free strategy if hosts are independent
  • Low Fork Count: Increase forks to match resources
  • No Pipelining: Enable SSH pipelining
  • Slow Modules: Replace shell/command with native modules
  • Excessive Logging: Reduce verbosity in production
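
As an illustration of the "Slow Modules" item, the sketch below contrasts a shell call with the equivalent native module. The path /opt/app/releases is hypothetical:

```yaml
# Runs the command on every invocation and always reports "changed"
- name: Create directory with shell
  shell: mkdir -p /opt/app/releases

# Native module: checks current state first and only acts when needed
- name: Create directory with the file module
  file:
    path: /opt/app/releases
    state: directory
    mode: "0755"
```

The main gain is usually idempotence and accurate change reporting, which also keeps handlers and downstream tasks from firing unnecessarily.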

Quick Reference

# Execution strategies
strategy: linear              # Default, task-by-task lockstep
strategy: free                # Parallel, independent
strategy: debug               # Interactive debugging

# Parallelization
forks = 50                    ; In ansible.cfg (inline comments need ";")
ansible-playbook -f 50        # Command line

# Serial batches
serial: 5                     # Fixed batch size
serial: "20%"                 # Percentage
serial: [1, 5, "50%"]         # Progressive

# Async tasks
async: 3600                   # Max runtime
poll: 0                       # Fire and forget
poll: 30                      # Poll every 30s

# Fact caching (ansible.cfg; inline comments there need ";")
gathering = smart             ; Use cache
fact_caching = jsonfile       ; Cache backend
fact_caching_timeout = 86400  ; 24 hours

# SSH optimization
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

# Profiling
callbacks_enabled = profile_tasks,timer

# Task controls
throttle: 5                   # Limit concurrency
run_once: true                # Run on one host only

Next Steps