Troubleshooting Ansible

Introduction

Troubleshooting is an essential skill for working with Ansible. This guide covers debugging techniques, common errors and their fixes, performance problems, connectivity problems, and logging, so you can diagnose and resolve issues quickly.

Troubleshooting Approach:
  • Increase verbosity: Use -v, -vv, -vvv flags for more details
  • Check mode: Test changes without applying them
  • Debug module: Print variables and intermediate results
  • Syntax check: Validate playbook syntax before execution
  • Step mode: Execute tasks one at a time

Debugging Techniques

Verbosity Levels

Increase output verbosity to see more details:

# Normal execution (minimal output)
ansible-playbook playbook.yml

# -v: Show task results
ansible-playbook playbook.yml -v

# -vv: Show task input parameters
ansible-playbook playbook.yml -vv

# -vvv: Show connection debugging
ansible-playbook playbook.yml -vvv

# -vvvv: Show SSH protocol details and extra debugging
ansible-playbook playbook.yml -vvvv

# Example output with -vvv:
# Shows:
# - SSH commands executed
# - Python modules transferred
# - Module arguments
# - Connection details
# - Timing information

Debug Module

Use the debug module to inspect variables and execution flow:

---
- name: Debugging examples
  hosts: localhost
  gather_facts: yes

  vars:
    app_name: myapp
    app_port: 8080

  tasks:
    - name: Print simple message
      debug:
        msg: "Application: {{ app_name }} on port {{ app_port }}"

    - name: Print variable value
      debug:
        var: ansible_distribution

    - name: Print all variables for host
      debug:
        var: hostvars[inventory_hostname]

    - name: Print specific fact
      debug:
        var: ansible_default_ipv4.address

    - name: Conditional debugging (only with -v)
      debug:
        msg: "This appears only with verbose flag"
        verbosity: 1

    - name: Debug with verbosity level 2 (-vv required)
      debug:
        msg: "This needs -vv or higher"
        verbosity: 2

    - name: Print complex data structure
      debug:
        var: ansible_facts
        verbosity: 2

    - name: Format output nicely
      debug:
        msg: |
          Host Information:
          - Name: {{ inventory_hostname }}
          - IP: {{ ansible_default_ipv4.address }}
          - OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
          - Memory: {{ ansible_memtotal_mb }}MB

    - name: Debug registered variable
      command: uptime
      register: uptime_result

    - name: Show registered variable
      debug:
        var: uptime_result

    - name: Show specific key from result
      debug:
        msg: "Output: {{ uptime_result.stdout }}"

Check Mode (Dry Run)

Test playbooks without making actual changes:

# Run in check mode
ansible-playbook playbook.yml --check

# Check mode with diff (show what would change)
ansible-playbook playbook.yml --check --diff

# Example output with --diff shows file changes:
# --- before: /etc/nginx/nginx.conf
# +++ after: /etc/nginx/nginx.conf
# @@ -1,3 +1,4 @@
# +worker_processes 4;
#  user nginx;

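# Not every module supports check mode; unsupported tasks are skipped.
# The check_mode keyword overrides the command-line setting per task:
---
- name: Check mode example
  hosts: all
  tasks:
    - name: Always run in check mode (never changes anything)
      file:
        path: /tmp/test
        state: directory
      check_mode: yes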

    - name: This always runs (even in check mode)
      command: echo "Always runs"
      check_mode: no

    - name: Skip in check mode
      command: /usr/local/bin/risky-operation
      when: not ansible_check_mode

Step Mode

Execute tasks interactively, one at a time:

# Run in step mode
ansible-playbook playbook.yml --step

# For each task, you'll be prompted:
# Perform task: TASK: Install packages (N)o/(y)es/(c)ontinue:
# y = yes, execute this task
# n = no, skip this task (the default)
# c = continue, execute remaining tasks without prompting

# Start at specific task
ansible-playbook playbook.yml --start-at-task="Configure service"

# Limit to specific hosts
ansible-playbook playbook.yml --limit "web01,web02"

# Combine options
ansible-playbook playbook.yml --step --check --diff -vv

Syntax Checking

Validate playbook syntax without executing:

# Check syntax
ansible-playbook playbook.yml --syntax-check

# Example error:
# ERROR! Syntax Error while loading YAML.
#   expected <block end>, but found '<block mapping start>'
# The error appears to be in '/path/to/playbook.yml': line 15, column 3

# Check inventory syntax
ansible-inventory -i inventory --list

# Validate a specific playbook
ansible-playbook site.yml --syntax-check

# Lint playbooks with ansible-lint
pip install ansible-lint
ansible-lint playbook.yml

# Role task files are not playbooks, so --syntax-check rejects them when
# passed directly; lint them instead
ansible-lint roles/webserver/tasks/main.yml

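# Example ansible-lint output (numeric rule IDs come from older releases;
# newer releases report named rules such as yaml[trailing-spaces] and jinja[spacing]):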
# [201] Trailing whitespace
# playbook.yml:23
#     name: Install packages

# [206] Variables should have spaces before and after: {{ var_name }}
# playbook.yml:45
# msg: "Value is {{my_var}}"

Common Errors and Solutions

Connection Issues

SSH Connection Refused

Error: Failed to connect to the host via ssh: Connection refused

Solutions:

  • Verify SSH service is running: systemctl status sshd
  • Check firewall allows SSH: firewall-cmd --list-all
  • Verify correct hostname/IP: ping hostname
  • Check SSH port: set ansible_port=2222 in the inventory if non-standard
  • Test manual SSH: ssh -vvv user@host

# Debug SSH connection
ansible all -m ping -vvv

# Test with specific user
ansible all -m ping -u myuser

# Test with specific key
ansible all -m ping --private-key=~/.ssh/mykey

# Bypass host key checking (development only)
export ANSIBLE_HOST_KEY_CHECKING=False
ansible all -m ping

# Or in ansible.cfg
[defaults]
host_key_checking = False

Permission Denied Errors

Error: Permission denied or You need to be root

Solutions:

---
- name: Fix permission issues
  hosts: all
  become: yes  # Enable privilege escalation

  tasks:
    - name: This requires root
      yum:
        name: httpd
        state: present

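# Check the connecting user's sudo rights on the remote host
# (no -b here; -n makes sudo fail instead of waiting for a password)
ansible all -m command -a "sudo -n -l"

# Test with password prompt
ansible-playbook playbook.yml --ask-become-pass

# Configure sudoers on remote host (edit with: visudo -f /etc/sudoers.d/ansible)
# /etc/sudoers.d/ansible
ansible ALL=(ALL) NOPASSWD: ALL

# Restricting sudo to specific commands does not work with become:
# Ansible runs every module through a shell and Python, not as
# /usr/bin/systemctl or /usr/bin/yum directly.

# Disable requiretty for pipelining
Defaults:ansible !requiretty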

Module Not Found

Error: The module xyz was not found

# List available modules
ansible-doc -l

# Check if module exists
ansible-doc module_name

# Install collection containing module
ansible-galaxy collection install community.general

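# Install required Python library on control node
pip install paramiko  # For the paramiko connection plugin (OpenSSH is the default)
pip install pywinrm   # For Windows hosts over WinRM
pip install boto3     # For AWS modules (needed wherever the module runs)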

# Install Python library on managed nodes
---
- name: Install required Python modules
  hosts: all
  become: yes
  tasks:
    - name: Install pip
      package:
        name: python3-pip
        state: present

    - name: Install Python library
      pip:
        name: docker
        state: present

# Check Python interpreter
ansible all -m setup -a "filter=ansible_python*"

# Set Python interpreter (as an inventory variable)
ansible_python_interpreter=/usr/bin/python3

Variable Not Defined

Undefined Variable

Error: 'my_variable' is undefined

---
- name: Handle undefined variables
  hosts: all
  tasks:
    # Use default filter
    - name: Variable with default
      debug:
        msg: "Port: {{ app_port | default(8080) }}"

    # Check if defined
    - name: Only run if defined
      debug:
        msg: "Database: {{ db_host }}"
      when: db_host is defined

    # Fail if not defined
    - name: Ensure variable is defined
      assert:
        that:
          - required_var is defined
          - required_var | length > 0
        fail_msg: "required_var must be defined"

    # Set default in play
    - name: Set defaults
      set_fact:
        app_port: "{{ app_port | default(8080) }}"
        app_host: "{{ app_host | default('localhost') }}"

    # Debug all variables
    - name: Show all variables
      debug:
        var: vars
        verbosity: 2

# Override a variable from the command line (extra vars take the highest precedence)
ansible-playbook playbook.yml -e "my_var=value" -vv

YAML Syntax Errors

Error: Syntax Error while loading YAML

# Common YAML mistakes:

# 1. Incorrect indentation
tasks:
  - name: Task one
    debug:
      msg: "Hello"
   - name: Task two  # Wrong indentation
     debug:
       msg: "World"

# 2. Missing colon
tasks
  - name: Install package  # Missing colon after tasks
    yum:
      name httpd  # Missing colon after name

# 3. Mixing tabs and spaces (use spaces only!)
# Set editor to insert spaces: set expandtab in vim

# 4. Unquoted special characters
tasks:
  - name: Use quotes
    debug:
      msg: "Value: {{ my_var }}"  # Correct
      # msg: {{ my_var }}  # Wrong - needs quotes

# 5. Incorrect list syntax
packages:
  - nginx
  - mysql
  - redis
# Not: packages: nginx, mysql, redis

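# Validate YAML (requires PyYAML, which is installed with Ansible)
python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' playbook.yml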

# Use yamllint
pip install yamllint
yamllint playbook.yml

# .yamllint configuration
---
extends: default
rules:
  line-length:
    max: 120
  indentation:
    spaces: 2
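
Mistake 3 above (tabs mixed with spaces) is hard to spot by eye. The following is a minimal sketch of a check, assuming a POSIX shell. The function name find_tabs is hypothetical and is not part of Ansible or yamllint:

```shell
# find_tabs FILE
# Hypothetical helper: prints "line-number:content" for every line in FILE
# that contains a tab character. Exit status is 0 if any tab was found and
# 1 if the file is clean (the usual grep convention).
find_tabs() {
  tab=$(printf '\t')   # a literal tab, portable across POSIX shells
  grep -n "$tab" "$1"
}

# Usage (not executed here):
#   find_tabs playbook.yml && echo "tabs found, replace them with spaces"
```

Because the function returns grep's exit status, it can also serve as a quick pre-commit check before running yamllint.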

Performance Issues

Slow Playbook Execution

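# Profile playbook performance (ansible-playbook has no callback flag;
# enable callbacks through the environment or ansible.cfg)
ANSIBLE_CALLBACKS_ENABLED=timer,profile_tasks,profile_roles \
  ansible-playbook playbook.yml
# Releases before ansible-core 2.11 use ANSIBLE_CALLBACK_WHITELIST instead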

# Enable SSH pipelining (fewer SSH operations per task)
[ssh_connection]
pipelining = True

# If sudoers enables requiretty, also add: Defaults:ansible !requiretty

# Increase parallel execution and cache facts between runs
[defaults]
forks = 50
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/facts_cache
fact_caching_timeout = 86400

# Disable fact gathering when not needed
---
- name: Fast playbook
  hosts: all
  gather_facts: no
  tasks:
    - name: Quick task
      ping:

# Use async for long-running tasks
- name: Long running task
  command: /usr/bin/long_running_operation
  async: 3600
  poll: 0
  register: long_task

- name: Check on async task later
  async_status:
    jid: "{{ long_task.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 100
  delay: 10

# Use strategy plugins
- name: Free strategy (don't wait for all hosts)
  hosts: all
  strategy: free
  tasks:
    - name: Independent task
      command: /usr/bin/task

Identifying Slow Tasks

# Enable profiling (use callback_whitelist on releases before ansible-core 2.11)
[defaults]
callbacks_enabled = profile_tasks

# After the play recap, profile_tasks prints a summary sorted by duration,
# slowest task first, for example:
# Install packages ------------------------------------------- 15.23s

# Show the 10 slowest tasks from that summary
ansible-playbook site.yml 2>&1 | \
  grep -E -- '-+ +[0-9.]+s$' | \
  head -10

# Monitor system resources during execution
---
- name: Monitor performance
  hosts: all
  tasks:
    - name: Capture start metrics
      shell: |
        free -m
        uptime
        ps aux | grep ansible
      register: start_metrics
      delegate_to: localhost
      run_once: true  # measure the control node once, not once per host

    - name: Your tasks here
      include_tasks: main_tasks.yml

    - name: Capture end metrics
      shell: |
        free -m
        uptime
        ps aux | grep ansible
      register: end_metrics
      delegate_to: localhost
      run_once: true

    - name: Compare metrics
      debug:
        msg: "{{ start_metrics.stdout }} vs {{ end_metrics.stdout }}"
      run_once: true

Debugging Complex Issues

Using Debugger

---
- name: Interactive debugging
  hosts: localhost
  gather_facts: no
  debugger: on_failed  # Options: always, never, on_failed, on_unreachable, on_skipped

  tasks:
    - name: Task that might fail
      command: /bin/false
      # When this fails, drops into debugger

# Debugger commands:
# p task                 - Print task
# p task.args            - Print task arguments
# p task_vars            - Print all variables
# p task_vars['my_var']  - Print specific variable
# task.args['arg'] = value - Modify task argument
# r                      - Redo task
# c                      - Continue
# q                      - Quit debugger

# Enable debugger in ansible.cfg
[defaults]
enable_task_debugger = True

# Use with strategy
- name: Debug with strategy
  hosts: all
  strategy: debug
  tasks:
    - name: Debug this
      shell: echo "test"

Tracing Execution Flow

---
- name: Trace execution
  hosts: all
  tasks:
    - name: Log play start
      debug:
        msg: "=== Starting play: {{ ansible_play_name }} ==="
      tags: always

    - name: Show current host
      debug:
        msg: "Processing {{ inventory_hostname }}"

    - name: Show variables in scope
      debug:
        var: vars
        verbosity: 2

    - name: Error handling with debug
      block:
        - name: Risky operation
          command: /usr/bin/risky-op
          register: result

        - name: Debug result
          debug:
            var: result
      rescue:
        - name: Error occurred
          debug:
            msg: "Failed: {{ ansible_failed_result }}"

        - name: Show error details
          debug:
            var: ansible_failed_result
      always:
        - name: Cleanup debug
          debug:
            msg: "Task completed"

# Log to file
- name: Log to file
  hosts: all
  tasks:
    - name: Log execution
      lineinfile:
        path: /tmp/ansible-debug.log
        line: "{{ ansible_date_time.iso8601 }} - {{ inventory_hostname }} - {{ ansible_play_name }}"
        create: yes
      delegate_to: localhost

Testing Conditions

---
- name: Test conditionals
  hosts: localhost
  gather_facts: yes

  vars:
    test_var: "hello"
    test_number: 42
    test_list: [1, 2, 3]

  tasks:
    - name: Test string
      debug:
        msg: "String test: {{ test_var is string }}"

    - name: Test number
      debug:
        msg: "Number test: {{ test_number is number }}"

    - name: Test list
      debug:
        msg: "List test: {{ test_list is iterable }}"

    - name: Test version
      debug:
        msg: "Version test: {{ ansible_version.full is version('2.9', '>=') }}"

    - name: Test defined
      debug:
        msg: "Defined test: {{ undefined_var is defined }}"

    - name: Test facts
      debug:
        msg: |
          OS Family: {{ ansible_os_family }}
          Distribution: {{ ansible_distribution }}
          Version: {{ ansible_distribution_version }}
          Is RedHat: {{ ansible_os_family == 'RedHat' }}
          Is Debian: {{ ansible_os_family == 'Debian' }}

    - name: Assert expectations
      assert:
        that:
          - ansible_os_family in ['RedHat', 'Debian']
          - ansible_memtotal_mb >= 1024
          - ansible_processor_vcpus >= 1
        fail_msg: "System requirements not met"
        success_msg: "System checks passed"

Network and Connectivity

Testing Connectivity

# Test ping
ansible all -m ping

# Test with specific user
ansible all -m ping -u deploy

# Test with sudo
ansible all -m ping -b

# Test SSH connectivity
ansible all -m shell -a "echo 'SSH works'"

# Detailed connection test
ansible all -m setup -a "filter=ansible_hostname" -vvv

# Test from playbook
---
- name: Connectivity tests
  hosts: all
  gather_facts: no
  tasks:
    - name: Ping test
      ping:
      register: ping_result

    - name: Show ping result
      debug:
        var: ping_result

    - name: Test DNS resolution
      command: nslookup google.com
      register: dns_test
      changed_when: false
      ignore_errors: yes  # keep going so the report below can show FAILED

    - name: Test outbound connectivity
      uri:
        url: https://www.google.com
        timeout: 10
      register: http_test
      ignore_errors: yes

    - name: Report results
      debug:
        msg: |
          Ping: {{ 'OK' if ping_result is succeeded else 'FAILED' }}
          DNS: {{ 'OK' if dns_test.rc == 0 else 'FAILED' }}
          HTTP: {{ 'OK' if http_test.status | default(0) == 200 else 'FAILED' }}

Proxy and Firewall Issues

# Configure proxy
export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
export no_proxy=localhost,127.0.0.1,.local

# In playbook
---
- name: Use proxy
  hosts: all
  environment:
    http_proxy: http://proxy.example.com:8080
    https_proxy: http://proxy.example.com:8080
  tasks:
    - name: Download with proxy
      get_url:
        url: https://example.com/file
        dest: /tmp/file

# Test firewall (pass target_host and target_port with -e)
- name: Check firewall
  hosts: all
  tasks:
    - name: Check if port is open
      wait_for:
        host: "{{ target_host }}"
        port: "{{ target_port }}"
        timeout: 5
      register: port_check
      ignore_errors: yes

    - name: Report port status
      debug:
        msg: "Port {{ target_port }} is {{ 'OPEN' if port_check is succeeded else 'CLOSED' }}"

Logging for Troubleshooting

Enable Detailed Logging

# ansible.cfg (use callback_whitelist on releases before ansible-core 2.11)
[defaults]
log_path = /var/log/ansible/ansible.log
callbacks_enabled = log_plays, timer, profile_tasks

# View logs in real-time
tail -f /var/log/ansible/ansible.log

# Search for errors
grep -i error /var/log/ansible/ansible.log
grep -i failed /var/log/ansible/ansible.log

# Analyze specific playbook run
grep -A 100 "PLAY \[My Playbook\]" /var/log/ansible/ansible.log

# Create detailed debug log
ansible-playbook site.yml -vvvv 2>&1 | tee debug.log

# Log with timestamps
ansible-playbook site.yml -vv 2>&1 | while IFS= read -r line; do
  echo "$(date '+%Y-%m-%d %H:%M:%S') $line"
done | tee timestamped.log
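
The grep searches above match every line that mentions an error. To list only the hosts that finished a run with failed or unreachable tasks, the PLAY RECAP lines can be parsed instead. The following is a sketch under two assumptions: recap lines have the default "host : ok=N changed=N unreachable=N failed=N ..." layout, and lines in ansible.log carry a prefix that ends in " | ". The function name failed_hosts is hypothetical:

```shell
# failed_hosts
# Hypothetical helper: reads playbook output or an ansible.log on stdin and
# prints the name of every host whose recap line has a non-zero failed= or
# unreachable= counter.
failed_hosts() {
  # Drop the log prefix up to " | " when present (assumed log-file layout),
  # then keep recap lines with a non-zero failure counter and print the host.
  sed 's/^.* | //' |
    awk '/ok=[0-9]+/ && /(failed|unreachable)=[1-9]/ { print $1 }'
}

# Usage (not executed here):
#   failed_hosts < /var/log/ansible/ansible.log
#   ansible-playbook site.yml | failed_hosts
```

The resulting host list can be passed back to --limit to rerun the playbook only where it failed.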

Common Troubleshooting Commands

# Check inventory
ansible-inventory --list
ansible-inventory --graph
ansible-inventory --host hostname

# Test module execution
ansible localhost -m debug -a "msg='Test'"
ansible all -m setup
ansible all -m command -a "uptime"

# Verify configuration
ansible-config dump
ansible-config view
ansible --version

# Test connection
ansible all -m ping -vvv
ansible all -m shell -a "hostname" -o

# List available modules
ansible-doc -l
ansible-doc -t lookup -l
ansible-doc -t callback -l

# Check facts
ansible hostname -m setup
ansible hostname -m setup -a "filter=ansible_distribution*"

# Test playbook
ansible-playbook playbook.yml --syntax-check
ansible-playbook playbook.yml --check
ansible-playbook playbook.yml --list-tasks
ansible-playbook playbook.yml --list-tags
ansible-playbook playbook.yml --list-hosts

# Debug playbook
ansible-playbook playbook.yml -vvv
ansible-playbook playbook.yml --step
ansible-playbook playbook.yml --start-at-task="Task name"
ansible-playbook playbook.yml --limit "host1,host2"

Troubleshooting Checklist

  1. Verify connectivity: Can you SSH to the host?
  2. Check syntax: Run --syntax-check
  3. Increase verbosity: Add -v, -vv, or -vvv flags
  4. Test in check mode: Use --check --diff
  5. Check permissions: Verify sudo/become settings
  6. Review logs: Check ansible.log for errors
  7. Verify variables: Use debug module to print values
  8. Check facts: Run setup module to verify facts
  9. Test modules: Run ad-hoc commands to test
  10. Review documentation: Check ansible-doc for module details
  11. Check versions: Ensure Ansible and Python versions are compatible
  12. Isolate problem: Use --limit and --step to narrow down issue
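
The first checklist steps can be chained so that a run stops at the first failing step. The following is a rough sketch, not a complete triage tool. The names run and triage and the DRY_RUN variable are hypothetical, and the commands are the ones already used in this guide:

```shell
# run CMD...
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# triage PLAYBOOK [HOST_PATTERN]
# Walks checklist steps 1 to 4 in order and returns at the first failure.
triage() {
  playbook="$1"
  limit="${2:-all}"

  # Step 1: verify connectivity
  run ansible "$limit" -m ping || return 1
  # Step 2: check syntax
  run ansible-playbook "$playbook" --syntax-check || return 1
  # Steps 3 and 4: check mode with diff and extra verbosity, limited to the
  # chosen hosts (step 12: isolate the problem)
  run ansible-playbook "$playbook" --limit "$limit" --check --diff -v
}

# Usage (not executed here):
#   DRY_RUN=1 triage site.yml web01   # print the commands only
#   triage site.yml web01             # run them
```

Printing the commands first makes it easy to confirm the host pattern before anything touches the managed hosts.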

Getting Help

Resources for Help:
  • Documentation: ansible-doc module_name
  • Official Docs: https://docs.ansible.com
  • Community: Ansible mailing list and IRC #ansible
  • Stack Overflow: Tag your questions with [ansible]
  • GitHub: Report bugs at github.com/ansible/ansible
  • Galaxy: Check role documentation at galaxy.ansible.com

Next Steps