Troubleshooting Ansible
Introduction
Troubleshooting is an essential skill for working with Ansible. This guide covers debugging techniques, common errors and their fixes, performance and connection problems, and the tools that help you diagnose and resolve issues quickly.
Troubleshooting Approach:
- Increase verbosity: Use -v, -vv, -vvv flags for more details
- Check mode: Test changes without applying them
- Debug module: Print variables and intermediate results
- Syntax check: Validate playbook syntax before execution
- Step mode: Execute tasks one at a time
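The first stages of this approach can be chained so that a playbook only runs for real after the cheaper checks pass. The following is a minimal sketch, not a prescribed workflow: `preflight` and the `AP` override variable are hypothetical names introduced here for illustration, and only the `--syntax-check`, `--check`, and `--diff` flags come from this guide.

```shell
#!/usr/bin/env bash
# Staged pre-flight: stop at the first stage that fails.
# AP is a hypothetical override so the command can be swapped or stubbed.
AP="${AP:-ansible-playbook}"

preflight() {
  local playbook="$1"
  # Stage 1: parse-only validation, nothing is executed on any host.
  "$AP" "$playbook" --syntax-check || { echo "syntax check failed" >&2; return 1; }
  # Stage 2: dry run that reports what would change.
  "$AP" "$playbook" --check --diff || { echo "check mode failed" >&2; return 2; }
  echo "pre-flight passed: $playbook"
}
```

A typical call would be `preflight site.yml && ansible-playbook site.yml -v`, adding more `v` flags only when a stage fails.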
Debugging Techniques
Verbosity Levels
Increase output verbosity to see more details:
# Normal execution (minimal output)
ansible-playbook playbook.yml
# -v: Show task results
ansible-playbook playbook.yml -v
# -vv: Show task input parameters
ansible-playbook playbook.yml -vv
# -vvv: Show connection debugging
ansible-playbook playbook.yml -vvv
# -vvvv: Show SSH protocol details and extra debugging
ansible-playbook playbook.yml -vvvv
# Output with -vvv includes:
# - SSH commands executed
# - Python modules transferred
# - Module arguments
# - Connection details
# - Timing information
Debug Module
Use the debug module to inspect variables and execution flow:
---
- name: Debugging examples
  hosts: localhost
  gather_facts: yes
  vars:
    app_name: myapp
    app_port: 8080
  tasks:
    - name: Print simple message
      debug:
        msg: "Application: {{ app_name }} on port {{ app_port }}"
    - name: Print variable value
      debug:
        var: ansible_distribution
    - name: Print all variables for host
      debug:
        var: hostvars[inventory_hostname]
    - name: Print specific fact
      debug:
        var: ansible_default_ipv4.address
    - name: Conditional debugging (only with -v)
      debug:
        msg: "This appears only with verbose flag"
        verbosity: 1
    - name: Debug with verbosity level 2 (-vv required)
      debug:
        msg: "This needs -vv or higher"
        verbosity: 2
    - name: Print complex data structure
      debug:
        var: ansible_facts
        verbosity: 2
    - name: Format output nicely
      debug:
        msg: |
          Host Information:
          - Name: {{ inventory_hostname }}
          - IP: {{ ansible_default_ipv4.address }}
          - OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
          - Memory: {{ ansible_memtotal_mb }}MB
    - name: Debug registered variable
      command: uptime
      register: uptime_result
    - name: Show registered variable
      debug:
        var: uptime_result
    - name: Show specific key from result
      debug:
        msg: "Output: {{ uptime_result.stdout }}"
Check Mode (Dry Run)
Test playbooks without making actual changes:
# Run in check mode
ansible-playbook playbook.yml --check
# Check mode with diff (show what would change)
ansible-playbook playbook.yml --check --diff
# Example output with --diff shows file changes:
# --- before: /etc/nginx/nginx.conf
# +++ after: /etc/nginx/nginx.conf
# @@ -1,3 +1,4 @@
# +worker_processes 4;
# user nginx;
# Some modules don't support check mode
---
- name: Check mode example
  hosts: all
  tasks:
    - name: Always run in check mode (never makes changes, even without --check)
      file:
        path: /tmp/test
        state: directory
      check_mode: yes
    - name: Always run for real (even when --check is given)
      command: echo "Always runs"
      check_mode: no
    - name: Skip in check mode
      command: /usr/local/bin/risky-operation
      when: not ansible_check_mode
Step Mode
Execute tasks interactively, one at a time:
# Run in step mode
ansible-playbook playbook.yml --step
# For each task, you'll be prompted:
# Perform task: TASK: Install packages (N)o/(y)es/(c)ontinue:
# y = yes, execute this task
# n = no, skip this task
# c = continue, execute remaining tasks without prompting
# Start at specific task
ansible-playbook playbook.yml --start-at-task="Configure service"
# Limit to specific hosts
ansible-playbook playbook.yml --limit "web01,web02"
# Combine options
ansible-playbook playbook.yml --step --check --diff -vv
Syntax Checking
Validate playbook syntax without executing:
# Check syntax
ansible-playbook playbook.yml --syntax-check
# Example error:
# ERROR! Syntax Error while loading YAML.
# expected <block end>, but found '<block mapping start>'
# The error appears to be in '/path/to/playbook.yml': line 15, column 3
# Check inventory syntax
ansible-inventory -i inventory --list
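# Validate specific files
ansible-playbook site.yml --syntax-check
# Note: --syntax-check expects a playbook. A bare task file such as
# roles/webserver/tasks/main.yml is not a playbook; check it through a
# playbook that uses the role.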
# Lint playbooks with ansible-lint
pip install ansible-lint
ansible-lint playbook.yml
# Example ansible-lint output:
# [201] Trailing whitespace
# playbook.yml:23
# name: Install packages
# [206] Variables should have spaces before and after: {{ var_name }}
# playbook.yml:45
# msg: "Value is {{my_var}}"
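When a repository holds several playbooks, the syntax check can be looped over all of them. The sketch below assumes every top-level `.yml` file in the given directory is a playbook, which will not hold for files such as a requirements file; `syntax_check_all` and the `AP` override are hypothetical names used only for this illustration.

```shell
#!/usr/bin/env bash
# Run --syntax-check on each top-level .yml file in a directory.
# Assumption: every such file is a playbook.
AP="${AP:-ansible-playbook}"

syntax_check_all() {
  local dir="${1:-.}" failed=0 f
  for f in "$dir"/*.yml; do
    [ -e "$f" ] || continue   # no matches: the glob pattern stays literal
    if ! "$AP" "$f" --syntax-check >/dev/null 2>&1; then
      echo "FAIL $f"
      failed=1
    fi
  done
  return "$failed"
}
```

The function prints one `FAIL` line per file that does not pass and returns non-zero if any file failed, so it can gate a CI step.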
Common Errors and Solutions
Connection Issues
SSH Connection Refused
Error: Failed to connect to the host via ssh: Connection refused
Solutions:
- Verify SSH service is running: systemctl status sshd
- Check firewall allows SSH: firewall-cmd --list-ports
- Verify correct hostname/IP: ping hostname
- Check SSH port: set ansible_port=2222 if non-standard
- Test manual SSH: ssh -vvv user@host
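Before digging into Ansible itself, it can help to confirm that the SSH port is reachable from the control node at all. The sketch below is one way to do that; `port_open` is a hypothetical helper name, and it assumes bash with `/dev/tcp` redirection support and the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within the timeout.
# Defaults: port 22, 3 second timeout.
port_open() {
  local host="$1" port="${2:-22}" secs="${3:-3}"
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `port_open web01 2222 || echo "web01: port 2222 unreachable"` separates a network or firewall problem from an authentication problem before any playbook runs.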
# Debug SSH connection
ansible all -m ping -vvv
# Test with specific user
ansible all -m ping -u myuser
# Test with specific key
ansible all -m ping --private-key=~/.ssh/mykey
# Bypass host key checking (development only)
export ANSIBLE_HOST_KEY_CHECKING=False
ansible all -m ping
# Or in ansible.cfg
[defaults]
host_key_checking = False
Permission Denied Errors
Permission Denied
Error: Permission denied or You need to be root
Solutions:
---
- name: Fix permission issues
  hosts: all
  become: yes # Enable privilege escalation
  tasks:
    - name: This requires root
      yum:
        name: httpd
        state: present
# Check sudo configuration on remote host
# (run without -b so the connecting user's rights are listed; -n avoids a password prompt)
ansible all -m shell -a "sudo -n -l"
# Test with password prompt
ansible-playbook playbook.yml --ask-become-pass
# Configure sudoers on remote host
# /etc/sudoers.d/ansible
ansible ALL=(ALL) NOPASSWD: ALL
# Or for specific commands
ansible ALL=(ALL) NOPASSWD: /usr/bin/systemctl, /usr/bin/yum
# Disable requiretty for pipelining
Defaults:ansible !requiretty
Module Not Found
Error: The module xyz was not found
# List available modules
ansible-doc -l
# Check if module exists
ansible-doc module_name
# Install collection containing module
ansible-galaxy collection install community.general
# Install required Python library on control node
pip install paramiko # For SSH
pip install pywinrm # For Windows
pip install boto3 # For AWS modules
# Install Python library on managed nodes
---
- name: Install required Python modules
  hosts: all
  become: yes
  tasks:
    - name: Install pip
      package:
        name: python3-pip
        state: present
    - name: Install Python library
      pip:
        name: docker
        state: present
# Check Python interpreter
ansible all -m setup -a "filter=ansible_python*"
# Set Python interpreter
ansible_python_interpreter=/usr/bin/python3
Variable Not Defined
Undefined Variable
Error: 'my_variable' is undefined
---
- name: Handle undefined variables
  hosts: all
  tasks:
    # Use default filter
    - name: Variable with default
      debug:
        msg: "Port: {{ app_port | default(8080) }}"
    # Check if defined
    - name: Only run if defined
      debug:
        msg: "Database: {{ db_host }}"
      when: db_host is defined
    # Fail if not defined
    - name: Ensure variable is defined
      assert:
        that:
          - required_var is defined
          - required_var | length > 0
        fail_msg: "required_var must be defined"
    # Set default in play
    - name: Set defaults
      set_fact:
        app_port: "{{ app_port | default(8080) }}"
        app_host: "{{ app_host | default('localhost') }}"
    # Debug all variables
    - name: Show all variables
      debug:
        var: vars
        verbosity: 2
# Check variable precedence
ansible-playbook playbook.yml -e "my_var=value" -vv
YAML Syntax Errors
YAML Syntax Error
Error: Syntax Error while loading YAML
# Common YAML mistakes:
# 1. Incorrect indentation
tasks:
  - name: Task one
    debug:
      msg: "Hello"
   - name: Task two # Wrong indentation
     debug:
       msg: "World"
# 2. Missing colon
tasks # Missing colon after tasks
  - name: Install package
    yum:
      name httpd # Missing colon after name
# 3. Mixing tabs and spaces (use spaces only!)
# Set editor to insert spaces: set expandtab in vim
# 4. Unquoted special characters
tasks:
  - name: Use quotes
    debug:
      msg: "Value: {{ my_var }}" # Correct
      # msg: {{ my_var }} # Wrong - needs quotes
# 5. Incorrect list syntax
packages:
  - nginx
  - mysql
  - redis
# Not: packages: nginx, mysql, redis
# Validate YAML (uses PyYAML, which Ansible already depends on)
python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' playbook.yml
# Use yamllint
pip install yamllint
yamllint playbook.yml
# .yamllint configuration
---
extends: default
rules:
  line-length:
    max: 120
  indentation:
    spaces: 2
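Mistake 3 above (tabs mixed with spaces) is hard to spot by eye, so a quick scan before running a playbook can save time. The sketch below is one possible check using plain grep; `check_no_tabs` is a hypothetical helper name, not part of Ansible or yamllint.

```shell
#!/usr/bin/env bash
# Print file:line:content for every line that contains a tab character.
# Returns 1 if any tab is found, 0 if the given files are clean.
check_no_tabs() {
  local tab
  tab="$(printf '\t')"
  if grep -n -H -- "$tab" "$@"; then
    echo "tab characters found; replace them with spaces" >&2
    return 1
  fi
  return 0
}
```

Running `check_no_tabs playbook.yml roles/*/tasks/*.yml` lists each offending line with its file name and line number.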
Performance Issues
Slow Playbook Execution
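# Profile playbook performance
# (callbacks are enabled through configuration or environment, not a CLI flag)
ANSIBLE_CALLBACKS_ENABLED=timer,profile_tasks,profile_roles \
ansible-playbook playbook.yml
# On Ansible 2.10 and earlier, set ANSIBLE_CALLBACK_WHITELIST instead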
# Enable SSH pipelining (huge speedup)
[ssh_connection]
pipelining = True
# Requires: Defaults:ansible !requiretty in /etc/sudoers
# Increase parallel execution
[defaults]
forks = 50
# Use fact caching
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/facts_cache
fact_caching_timeout = 86400
# Disable fact gathering when not needed
---
- name: Fast playbook
  hosts: all
  gather_facts: no
  tasks:
    - name: Quick task
      ping:
    # Use async for long-running tasks
    - name: Long running task
      command: /usr/bin/long_running_operation
      async: 3600
      poll: 0
      register: long_task
    - name: Check on async task later
      async_status:
        jid: "{{ long_task.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 360 # 360 x 10s covers the full 3600s async window
      delay: 10
# Use strategy plugins
- name: Free strategy (don't wait for all hosts)
  hosts: all
  strategy: free
  tasks:
    - name: Independent task
      command: /usr/bin/task
Identifying Slow Tasks
# Enable profiling
[defaults]
# callbacks_enabled requires Ansible 2.11+; older releases use callback_whitelist
callbacks_enabled = profile_tasks
# profile_tasks prints a timestamp under each task header and, at the end
# of the run, a summary sorted by duration, for example:
# Install packages ------------------------------------------------ 15.23s
# Extract the ten slowest tasks from the summary
ansible-playbook site.yml 2>&1 | \
grep -E -- '-+ [0-9.]+s$' | \
awk '{print $NF, $0}' | \
sort -rn | \
head -10 | \
cut -d' ' -f2-
# Monitor system resources during execution
---
- name: Monitor performance
  hosts: all
  tasks:
    - name: Capture start metrics
      shell: |
        free -m
        uptime
        ps aux | grep ansible
      register: start_metrics
      delegate_to: localhost
    - name: Your tasks here
      include_tasks: main_tasks.yml
    - name: Capture end metrics
      shell: |
        free -m
        uptime
        ps aux | grep ansible
      register: end_metrics
      delegate_to: localhost
    - name: Compare metrics
      debug:
        msg: "{{ start_metrics.stdout }} vs {{ end_metrics.stdout }}"
Debugging Complex Issues
Using Debugger
---
- name: Interactive debugging
  hosts: localhost
  gather_facts: no
  debugger: on_failed # Options: always, never, on_failed, on_unreachable, on_skipped
  tasks:
    - name: Task that might fail
      command: /bin/false
      # When this fails, drops into debugger
# Debugger commands:
# p task - Print task
# p task.args - Print task arguments
# p task_vars - Print all variables
# p task_vars['my_var'] - Print specific variable
# task.args['arg'] = value - Modify task argument
# r - Redo task
# c - Continue
# q - Quit debugger
# Enable debugger in ansible.cfg
[defaults]
enable_task_debugger = True
# Use with strategy
- name: Debug with strategy
  hosts: all
  strategy: debug
  tasks:
    - name: Debug this
      shell: echo "test"
Tracing Execution Flow
---
- name: Trace execution
  hosts: all
  tasks:
    - name: Log play start
      debug:
        msg: "=== Starting play: {{ ansible_play_name }} ==="
      tags: always
    - name: Show current host
      debug:
        msg: "Processing {{ inventory_hostname }}"
    - name: Show variables in scope
      debug:
        var: vars
        verbosity: 2
    - name: Conditional with debug
      block:
        - name: Risky operation
          command: /usr/bin/risky-op
          register: result
        - name: Debug result
          debug:
            var: result
      rescue:
        - name: Error occurred
          debug:
            msg: "Failed: {{ ansible_failed_result }}"
        - name: Show error details
          debug:
            var: ansible_failed_result
      always:
        - name: Cleanup debug
          debug:
            msg: "Task completed"
# Log to file
- name: Log to file
  hosts: all
  tasks:
    - name: Log execution
      lineinfile:
        path: /tmp/ansible-debug.log
        line: "{{ ansible_date_time.iso8601 }} - {{ inventory_hostname }} - {{ ansible_play_name }}"
        create: yes
      delegate_to: localhost
Testing Conditions
---
- name: Test conditionals
  hosts: localhost
  gather_facts: yes
  vars:
    test_var: "hello"
    test_number: 42
    test_list: [1, 2, 3]
  tasks:
    - name: Test string
      debug:
        msg: "String test: {{ test_var is string }}"
    - name: Test number
      debug:
        msg: "Number test: {{ test_number is number }}"
    - name: Test list
      debug:
        msg: "List test: {{ test_list is iterable }}"
    - name: Test version
      debug:
        msg: "Version test: {{ ansible_version.full is version('2.9', '>=') }}"
    - name: Test defined
      debug:
        msg: "Defined test: {{ undefined_var is defined }}"
    - name: Test facts
      debug:
        msg: |
          OS Family: {{ ansible_os_family }}
          Distribution: {{ ansible_distribution }}
          Version: {{ ansible_distribution_version }}
          Is RedHat: {{ ansible_os_family == 'RedHat' }}
          Is Debian: {{ ansible_os_family == 'Debian' }}
    - name: Assert expectations
      assert:
        that:
          - ansible_os_family in ['RedHat', 'Debian']
          - ansible_memtotal_mb >= 1024
          - ansible_processor_vcpus >= 1
        fail_msg: "System requirements not met"
        success_msg: "System checks passed"
Network and Connectivity
Testing Connectivity
# Test ping
ansible all -m ping
# Test with specific user
ansible all -m ping -u deploy
# Test with sudo
ansible all -m ping -b
# Test SSH connectivity
ansible all -m shell -a "echo 'SSH works'"
# Detailed connection test
ansible all -m setup -a "filter=ansible_hostname" -vvv
# Test from playbook
---
- name: Connectivity tests
  hosts: all
  gather_facts: no
  tasks:
    - name: Ping test
      ping:
      register: ping_result
    - name: Show ping result
      debug:
        var: ping_result
    - name: Test DNS resolution
      command: nslookup google.com
      register: dns_test
    - name: Test outbound connectivity
      uri:
        url: https://www.google.com
        timeout: 10
      register: http_test
    - name: Report results
      debug:
        msg: |
          Ping: {{ 'OK' if ping_result is succeeded else 'FAILED' }}
          DNS: {{ 'OK' if dns_test.rc == 0 else 'FAILED' }}
          HTTP: {{ 'OK' if http_test.status == 200 else 'FAILED' }}
Proxy and Firewall Issues
# Configure proxy
export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
export no_proxy=localhost,127.0.0.1,.local
# In playbook
---
- name: Use proxy
  hosts: all
  environment:
    http_proxy: http://proxy.example.com:8080
    https_proxy: http://proxy.example.com:8080
  tasks:
    - name: Download with proxy
      get_url:
        url: https://example.com/file
        dest: /tmp/file
# Test firewall
- name: Check firewall
  hosts: all
  tasks:
    - name: Check if port is open
      wait_for:
        host: "{{ target_host }}"
        port: "{{ target_port }}"
        timeout: 5
      register: port_check
      ignore_errors: yes
    - name: Report port status
      debug:
        msg: "Port {{ target_port }} is {{ 'OPEN' if port_check is succeeded else 'CLOSED' }}"
Logging for Troubleshooting
Enable Detailed Logging
# ansible.cfg
[defaults]
log_path = /var/log/ansible/ansible.log
# callbacks_enabled requires Ansible 2.11+; older releases use callback_whitelist
callbacks_enabled = log_plays, timer, profile_tasks
# View logs in real-time
tail -f /var/log/ansible/ansible.log
# Search for errors
grep -i error /var/log/ansible/ansible.log
grep -i failed /var/log/ansible/ansible.log
# Analyze specific playbook run
grep "PLAY \[My Playbook\]" /var/log/ansible/ansible.log -A 100
# Create detailed debug log
ansible-playbook site.yml -vvvv 2>&1 | tee debug.log
# Log with timestamps
ansible-playbook site.yml -vv 2>&1 | while IFS= read -r line; do
  echo "$(date '+%Y-%m-%d %H:%M:%S') $line"
done | tee timestamped.log
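Once a run has been captured to a file, a per-host failure count is often a faster starting point than reading the log top to bottom. The sketch below assumes the log contains Ansible's usual `fatal: [host]: ...` result lines; `failures_by_host` is a hypothetical helper name introduced for illustration.

```shell
#!/usr/bin/env bash
# Count "fatal:" results per host in a saved Ansible log, most failures first.
failures_by_host() {
  local log="$1"
  grep -o 'fatal: \[[^]]*\]' "$log" \
    | sed -e 's/^fatal: \[//' -e 's/\]$//' \
    | sort | uniq -c | sort -rn
}
```

For example, `failures_by_host debug.log` prints one line per host with its failure count, which pairs well with `--limit` to rerun only the hosts that failed.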
Common Troubleshooting Commands
# Check inventory
ansible-inventory --list
ansible-inventory --graph
ansible-inventory --host hostname
# Test module execution
ansible localhost -m debug -a "msg='Test'"
ansible all -m setup
ansible all -m command -a "uptime"
# Verify configuration
ansible-config dump
ansible-config view
ansible --version
# Test connection
ansible all -m ping -vvv
ansible all -m shell -a "hostname" -o
# List available modules
ansible-doc -l
ansible-doc -t lookup -l
ansible-doc -t callback -l
# Check facts
ansible hostname -m setup
ansible hostname -m setup -a "filter=ansible_distribution*"
# Test playbook
ansible-playbook playbook.yml --syntax-check
ansible-playbook playbook.yml --check
ansible-playbook playbook.yml --list-tasks
ansible-playbook playbook.yml --list-tags
ansible-playbook playbook.yml --list-hosts
# Debug playbook
ansible-playbook playbook.yml -vvv
ansible-playbook playbook.yml --step
ansible-playbook playbook.yml --start-at-task="Task name"
ansible-playbook playbook.yml --limit "host1,host2"
Troubleshooting Checklist
- Verify connectivity: Can you SSH to the host?
- Check syntax: Run --syntax-check
- Increase verbosity: Add -v, -vv, or -vvv flags
- Test in check mode: Use --check --diff
- Check permissions: Verify sudo/become settings
- Review logs: Check ansible.log for errors
- Verify variables: Use debug module to print values
- Check facts: Run setup module to verify facts
- Test modules: Run ad-hoc commands to test
- Review documentation: Check ansible-doc for module details
- Check versions: Ensure Ansible and Python versions are compatible
- Isolate problem: Use --limit and --step to narrow down issue
Getting Help
Resources for Help:
- Documentation: ansible-doc module_name
- Official Docs: https://docs.ansible.com
- Community: Ansible mailing list and IRC #ansible
- Stack Overflow: Tag your questions with [ansible]
- GitHub: Report bugs at github.com/ansible/ansible
- Galaxy: Check role documentation at galaxy.ansible.com
Next Steps
- Learn about Logging and Monitoring for better debugging
- Explore Configuration for optimization
- Review Security Best Practices for secure troubleshooting
- Master Advanced Topics for complex scenarios
- Practice in the Playground with interactive examples