Logging and Monitoring
Introduction
Effective logging and monitoring are crucial for maintaining Ansible automation at scale. Comprehensive logging enables troubleshooting, audit trails, compliance reporting, and performance analysis. Ansible provides multiple mechanisms for capturing and analyzing playbook execution data.
- File Logging: Write playbook execution logs to files
- Callback Plugins: Customize output and logging behavior
- Syslog Integration: Send logs to centralized syslog servers
- External Systems: Integration with Splunk, ELK, Datadog, etc.
- Performance Profiling: Track task timing and resource usage
Basic File Logging
Enabling Log Files
Configure Ansible to write logs to a file:
# In ansible.cfg
[defaults]
log_path = /var/log/ansible/ansible.log
# Ensure the directory exists and is writable
mkdir -p /var/log/ansible
chmod 755 /var/log/ansible
# Or use environment variable
export ANSIBLE_LOG_PATH=/var/log/ansible/ansible.log
# Run playbook (output will be logged)
ansible-playbook site.yml
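To keep a separate log per run instead of one ever-growing file, the environment variable can also be set per invocation; the path pattern below is only an example:
# One log file per run, stamped with the start time
export ANSIBLE_LOG_PATH="/var/log/ansible/site-$(date +%F-%H%M%S).log"
ansible-playbook site.yml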
Log Rotation
Configure logrotate to manage Ansible log files:
# /etc/logrotate.d/ansible
/var/log/ansible/ansible.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 0640 ansible ansible
sharedscripts
postrotate
/usr/bin/systemctl reload rsyslog > /dev/null 2>&1 || true
endscript
}
# Test logrotate configuration
logrotate -d /etc/logrotate.d/ansible
# Force rotation
logrotate -f /etc/logrotate.d/ansible
Per-Playbook Logging
Log different playbooks to different files:
---
- name: Playbook with custom logging
hosts: all
vars:
ansible_log_path: "/var/log/ansible/{{ playbook_name }}-{{ ansible_date_time.date }}.log"
playbook_name: deployment
pre_tasks:
- name: Ensure log directory exists
file:
path: /var/log/ansible
state: directory
mode: '0755'
delegate_to: localhost
run_once: true
tasks:
- name: Log playbook start
lineinfile:
path: "{{ ansible_log_path }}"
line: "{{ ansible_date_time.iso8601 }} - Playbook started on {{ inventory_hostname }}"
create: yes
delegate_to: localhost
- name: Your tasks here
debug:
msg: "Executing tasks"
post_tasks:
- name: Log playbook completion
lineinfile:
path: "{{ ansible_log_path }}"
line: "{{ ansible_date_time.iso8601 }} - Playbook completed on {{ inventory_hostname }}"
delegate_to: localhost
Callback Plugins for Logging
Log Plays Callback
The log_plays callback records task results to per-host log files (by default under /var/log/ansible/hosts/):
# ansible.cfg
[defaults]
callback_whitelist = log_plays
# Note: on ansible-core 2.11+ the equivalent setting is "callbacks_enabled"
# Each log entry records:
# - Playbook name
# - Play name
# - Task name
# - Host
# - Task result (ok, changed, failed, skipped)
# View the per-host log files written by log_plays
sudo tail -f /var/log/ansible/hosts/web01
# Example entry (format is illustrative):
# Jan 01 2024 12:00:00 - site.yml - Install nginx - OK - {"changed": false}
# To send events to syslog instead, see the Syslog Integration section below
# or enable the community.general.syslog_json callback.
JSON Callback
Output playbook results in JSON format for parsing:
# ansible.cfg
[defaults]
stdout_callback = json
# Run playbook with JSON output
ansible-playbook site.yml > output.json
# Parse JSON output with jq
cat output.json | jq '.plays[].tasks[] | {task: .task.name, hosts: .hosts | keys}'
# Example JSON structure:
{
"plays": [
{
"play": {
"name": "Configure servers",
"id": "uuid"
},
"tasks": [
{
"task": {
"name": "Install packages",
"id": "uuid"
},
"hosts": {
"web01": {
"changed": true,
"msg": "Package installed"
}
}
}
]
}
],
"stats": {
"web01": {
"ok": 10,
"changed": 5,
"unreachable": 0,
"failed": 0,
"skipped": 2
}
}
}
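With this structure, hosts that had failures can be pulled straight from the stats block, for example (assuming the output file from above and the "failures" counter):
# List hosts with at least one failed task
jq -r '.stats | to_entries[] | select(.value.failures > 0) | .key' output.json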
Profile Tasks Callback
Measure and log task execution times:
# ansible.cfg
[defaults]
callback_whitelist = profile_tasks, profile_roles
# At the end of the run, the callbacks print a timing summary with the
# slowest tasks first (illustrative output):
#
# Install packages ---------------------------- 45.23s
# Copy configuration files ------------------- 12.34s
# Restart services --------------------------- 8.91s
# Update application code -------------------- 5.67s
# Capture the full run, including the timing summary, for later analysis
ansible-playbook site.yml 2>&1 | tee task_times.log
Timer Callback
Log total playbook execution time:
# ansible.cfg
[defaults]
callback_whitelist = timer
# Output at end of playbook:
# Playbook run took 0 days, 0 hours, 3 minutes, 45 seconds
# Combine multiple callbacks
callback_whitelist = timer, profile_tasks, profile_roles, log_plays
Custom Logging with Callback Plugins
Custom Callback Plugin
Create a custom callback plugin for specialized logging:
# callback_plugins/custom_logger.py
from ansible.plugins.callback import CallbackBase
from datetime import datetime
import json
import os
class CallbackModule(CallbackBase):
CALLBACK_VERSION = 2.0
CALLBACK_TYPE = 'notification'
CALLBACK_NAME = 'custom_logger'
CALLBACK_NEEDS_WHITELIST = True
def __init__(self):
super(CallbackModule, self).__init__()
self.log_file = os.environ.get('CUSTOM_LOG_FILE', '/var/log/ansible/custom.log')
self.start_time = None
def log(self, data):
"""Write data to log file"""
with open(self.log_file, 'a') as f:
f.write(json.dumps(data) + '\n')
def v2_playbook_on_start(self, playbook):
"""Called when playbook starts"""
self.start_time = datetime.now()
self.log({
'timestamp': self.start_time.isoformat(),
'event': 'playbook_start',
'playbook': playbook._file_name
})
def v2_playbook_on_task_start(self, task, is_conditional):
"""Called when task starts"""
self.log({
'timestamp': datetime.now().isoformat(),
'event': 'task_start',
'task': task.get_name()
})
def v2_runner_on_ok(self, result):
"""Called when task succeeds"""
self.log({
'timestamp': datetime.now().isoformat(),
'event': 'task_ok',
'host': result._host.get_name(),
'task': result._task.get_name(),
'changed': result._result.get('changed', False)
})
def v2_runner_on_failed(self, result, ignore_errors=False):
"""Called when task fails"""
self.log({
'timestamp': datetime.now().isoformat(),
'event': 'task_failed',
'host': result._host.get_name(),
'task': result._task.get_name(),
'error': result._result.get('msg', 'Unknown error')
})
def v2_playbook_on_stats(self, stats):
"""Called when playbook completes"""
end_time = datetime.now()
duration = (end_time - self.start_time).total_seconds()
hosts = sorted(stats.processed.keys())
summary = {}
for host in hosts:
summary[host] = stats.summarize(host)
self.log({
'timestamp': end_time.isoformat(),
'event': 'playbook_end',
'duration_seconds': duration,
'summary': summary
})
Enable and use the custom callback:
# ansible.cfg
[defaults]
callback_plugins = ./callback_plugins
callback_whitelist = custom_logger
# Set log file location
export CUSTOM_LOG_FILE=/var/log/ansible/detailed.log
# Run playbook
ansible-playbook site.yml
# View structured logs
cat /var/log/ansible/detailed.log | jq '.'
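Each line of detailed.log is a single JSON object emitted by the plugin above; a task_ok event looks roughly like this:
{"timestamp": "2024-05-01T10:15:32.123456", "event": "task_ok", "host": "web01", "task": "Install nginx", "changed": true}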
Slack Notification Callback
Send notifications to Slack:
# ansible.cfg
[defaults]
callback_whitelist = slack
# Environment variables for Slack integration
export SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
# callback_plugins/slack.py (simplified example)
from ansible.plugins.callback import CallbackBase
import os
import requests
class CallbackModule(CallbackBase):
CALLBACK_VERSION = 2.0
CALLBACK_TYPE = 'notification'
CALLBACK_NAME = 'slack'
CALLBACK_NEEDS_WHITELIST = True
def __init__(self):
super(CallbackModule, self).__init__()
self.webhook_url = os.environ.get('SLACK_WEBHOOK_URL')
def send_slack(self, message):
if self.webhook_url:
payload = {'text': message}
requests.post(self.webhook_url, json=payload)
def v2_playbook_on_start(self, playbook):
self.send_slack(f"Playbook started: {playbook._file_name}")
def v2_playbook_on_stats(self, stats):
hosts = sorted(stats.processed.keys())
summary = []
for host in hosts:
s = stats.summarize(host)
summary.append(f"{host}: ok={s['ok']} changed={s['changed']} failed={s['failures']}")
message = "Playbook completed:\n" + "\n".join(summary)
self.send_slack(message)
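Before wiring the webhook into a callback, it is worth verifying it directly with the standard incoming-webhook test (the URL comes from the environment variable above):
# Send a test message to the Slack webhook
curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "Ansible Slack callback test"}' \
  "$SLACK_WEBHOOK_URL"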
Syslog Integration
Sending Logs to Syslog
# ansible.cfg (the syslog_json callback ships with community.general)
[defaults]
callback_whitelist = community.general.syslog_json
# Events are sent to the local syslog daemon as structured JSON
# Configure rsyslog to forward to central server
# /etc/rsyslog.d/10-ansible.conf
# Forward Ansible logs to remote server
if $programname == 'ansible-playbook' then @@log-server.example.com:514
& stop
# Restart rsyslog
sudo systemctl restart rsyslog
# View Ansible logs in syslog
sudo tail -f /var/log/syslog | grep ansible
# Send a test message to verify the syslog pipeline
logger -t ansible-playbook -p local0.info "Test message"
Structured Syslog Logging
---
- name: Playbook with structured logging
hosts: all
tasks:
- name: Log structured event
shell: |
logger -t ansible-playbook -p local0.info \
"event=task_start playbook={{ playbook_dir | basename }} \
host={{ inventory_hostname }} task={{ ansible_play_name }}"
- name: Your actual tasks
debug:
msg: "Doing work"
- name: Log completion
shell: |
logger -t ansible-playbook -p local0.info \
"event=task_complete playbook={{ playbook_dir | basename }} \
host={{ inventory_hostname }} status=success"
Integration with External Systems
ELK Stack Integration
Send Ansible logs to Elasticsearch, Logstash, Kibana:
# Filebeat configuration for Ansible logs
# /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/ansible/ansible.log
fields:
app: ansible
environment: production
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
multiline.negate: true
multiline.match: after
output.logstash:
hosts: ["logstash.example.com:5044"]
# Logstash pipeline
# /etc/logstash/conf.d/ansible.conf
input {
beats {
port => 5044
}
}
filter {
if [fields][app] == "ansible" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} - %{GREEDYDATA:log_message}" }
}
date {
match => [ "timestamp", "ISO8601" ]
}
}
}
output {
elasticsearch {
hosts => ["elasticsearch.example.com:9200"]
index => "ansible-logs-%{+YYYY.MM.dd}"
}
}
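Once Filebeat and Logstash are running, a quick document count against the index pattern confirms that events are arriving (hostname and index pattern match the configuration above):
# Count indexed Ansible log events
curl -s 'http://elasticsearch.example.com:9200/ansible-logs-*/_count?pretty'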
Splunk Integration
# Configure Splunk Universal Forwarder
# /opt/splunkforwarder/etc/system/local/inputs.conf
[monitor:///var/log/ansible/ansible.log]
disabled = false
index = ansible
sourcetype = ansible:log
# Custom callback for Splunk HEC
# callback_plugins/splunk_hec.py
from ansible.plugins.callback import CallbackBase
import os
import requests
class CallbackModule(CallbackBase):
CALLBACK_VERSION = 2.0
CALLBACK_TYPE = 'notification'
CALLBACK_NAME = 'splunk_hec'
CALLBACK_NEEDS_WHITELIST = True
def __init__(self):
super(CallbackModule, self).__init__()
self.hec_url = os.environ.get('SPLUNK_HEC_URL')
self.hec_token = os.environ.get('SPLUNK_HEC_TOKEN')
def send_to_splunk(self, event_data):
if self.hec_url and self.hec_token:
headers = {
'Authorization': f'Splunk {self.hec_token}',
'Content-Type': 'application/json'
}
event = {
'event': event_data,
'sourcetype': 'ansible:callback'
}
requests.post(self.hec_url, headers=headers, json=event)
def v2_runner_on_ok(self, result):
event = {
'event_type': 'task_ok',
'host': result._host.get_name(),
'task': result._task.get_name(),
'changed': result._result.get('changed', False)
}
self.send_to_splunk(event)
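To use this sketch, point it at your HEC endpoint and enable it like any other local callback; all values below are placeholders:
# ansible.cfg
[defaults]
callback_plugins = ./callback_plugins
callback_whitelist = splunk_hec
# HEC endpoint and token expected by the plugin
export SPLUNK_HEC_URL=https://splunk.example.com:8088/services/collector/event
export SPLUNK_HEC_TOKEN=your-hec-token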
Datadog Integration
# Custom callback for Datadog
# callback_plugins/datadog.py
from ansible.plugins.callback import CallbackBase
from datadog import initialize, api
import os
import time
class CallbackModule(CallbackBase):
CALLBACK_VERSION = 2.0
CALLBACK_TYPE = 'notification'
CALLBACK_NAME = 'datadog'
CALLBACK_NEEDS_WHITELIST = True
def __init__(self):
super(CallbackModule, self).__init__()
options = {
'api_key': os.environ.get('DATADOG_API_KEY'),
'app_key': os.environ.get('DATADOG_APP_KEY')
}
initialize(**options)
self.start_time = None
def v2_playbook_on_start(self, playbook):
self.start_time = time.time()
title = f"Ansible Playbook Started: {playbook._file_name}"
text = f"Playbook {playbook._file_name} execution started"
tags = ['ansible', 'playbook:start']
api.Event.create(title=title, text=text, tags=tags)
def v2_playbook_on_stats(self, stats):
duration = time.time() - self.start_time
hosts = sorted(stats.processed.keys())
for host in hosts:
s = stats.summarize(host)
# Send metrics to Datadog
api.Metric.send(
metric='ansible.tasks.ok',
points=s['ok'],
tags=[f'host:{host}']
)
api.Metric.send(
metric='ansible.tasks.changed',
points=s['changed'],
tags=[f'host:{host}']
)
api.Metric.send(
metric='ansible.tasks.failed',
points=s['failures'],
tags=[f'host:{host}']
)
# Send duration metric
api.Metric.send(
metric='ansible.playbook.duration',
points=duration,
tags=['playbook:completed']
)
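Enabling the Datadog sketch follows the same pattern; it needs the official datadog Python package and API credentials (values are placeholders):
# Install the client library used by the plugin
pip install datadog
# ansible.cfg
[defaults]
callback_plugins = ./callback_plugins
callback_whitelist = datadog
# Credentials read by the plugin
export DATADOG_API_KEY=your-api-key
export DATADOG_APP_KEY=your-app-key
ansible-playbook site.yml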
Audit Logging
Comprehensive Audit Trail
---
- name: Playbook with audit logging
hosts: all
vars:
audit_log: "/var/log/ansible/audit/{{ ansible_date_time.date }}.log"
tasks:
- name: Create audit log entry - Start
lineinfile:
path: "{{ audit_log }}"
line: |
{{ ansible_date_time.iso8601 }} | START | {{ ansible_play_name }} |
User: {{ ansible_user_id }} | Host: {{ inventory_hostname }} |
Playbook: {{ playbook_dir }}/{{ ansible_play_name }}
create: yes
mode: '0600'
delegate_to: localhost
- name: Sensitive operation
user:
name: dbadmin
state: present
register: user_result
- name: Audit log - User created
lineinfile:
path: "{{ audit_log }}"
line: |
{{ ansible_date_time.iso8601 }} | ACTION | User Management |
User: {{ ansible_user_id }} | Host: {{ inventory_hostname }} |
Action: User created - dbadmin | Changed: {{ user_result.changed }}
delegate_to: localhost
when: user_result.changed
- name: Configuration change
template:
src: config.j2
dest: /etc/app/config.yml
register: config_result
- name: Audit log - Configuration changed
lineinfile:
path: "{{ audit_log }}"
line: |
{{ ansible_date_time.iso8601 }} | ACTION | Configuration |
User: {{ ansible_user_id }} | Host: {{ inventory_hostname }} |
File: /etc/app/config.yml | Changed: {{ config_result.changed }}
delegate_to: localhost
- name: Create audit log entry - Complete
lineinfile:
path: "{{ audit_log }}"
line: |
{{ ansible_date_time.iso8601 }} | COMPLETE | {{ ansible_play_name }} |
User: {{ ansible_user_id }} | Host: {{ inventory_hostname }} |
Status: Success
delegate_to: localhost
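Because every entry carries an event type (START, ACTION, COMPLETE), the day's privileged actions can be reviewed with a simple grep; the path matches the audit_log variable above:
# Show today's audited actions
grep " ACTION " "/var/log/ansible/audit/$(date +%F).log"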
Compliance Logging
---
- name: Compliance audit logging
hosts: all
vars:
compliance_log: "/var/log/ansible/compliance.json"
tasks:
- name: Log compliance check
lineinfile:
path: "{{ compliance_log }}"
line: "{{ log_entry | to_json }}"
create: yes
delegate_to: localhost
vars:
log_entry:
timestamp: "{{ ansible_date_time.iso8601 }}"
host: "{{ inventory_hostname }}"
check: "{{ item.name }}"
status: "{{ item.status }}"
operator: "{{ ansible_user_id }}"
playbook: "{{ ansible_play_name }}"
loop:
- { name: "SSH Key Authentication", status: "compliant" }
- { name: "Password Policy", status: "compliant" }
- { name: "Firewall Status", status: "non-compliant" }
Performance Monitoring
Task Performance Analysis
# ansible.cfg
[defaults]
callback_whitelist = profile_tasks, profile_roles, cgroup_perf_recap
# profile_tasks and profile_roles report per-task and per-role execution time.
# cgroup_perf_recap can additionally report CPU and memory usage, but it
# requires a dedicated cgroup and running ansible-playbook inside that cgroup
# (see the callback's documentation for the required setup).
# Export performance data
ansible-playbook site.yml 2>&1 | tee performance.log
# The profile_tasks summary at the end of the run already lists the slowest
# tasks first, so a quick review is simply:
tail -n 30 performance.log
Resource Usage Tracking
---
- name: Monitor resource usage during playbook
hosts: all
tasks:
- name: Capture pre-execution metrics
shell: |
echo "CPU: $(top -bn1 | grep 'Cpu(s)' | awk '{print $2}')"
echo "Memory: $(free -m | awk 'NR==2{printf "%.2f%%", $3*100/$2 }')"
echo "Disk: $(df -h / | awk 'NR==2{print $5}')"
register: pre_metrics
- name: Your resource-intensive tasks
include_tasks: heavy_tasks.yml
- name: Capture post-execution metrics
shell: |
echo "CPU: $(top -bn1 | grep 'Cpu(s)' | awk '{print $2}')"
echo "Memory: $(free -m | awk 'NR==2{printf "%.2f%%", $3*100/$2 }')"
echo "Disk: $(df -h / | awk 'NR==2{print $5}')"
register: post_metrics
- name: Log resource usage
copy:
content: |
Before:
{{ pre_metrics.stdout }}
After:
{{ post_metrics.stdout }}
dest: "/var/log/ansible/resources-{{ inventory_hostname }}.txt"
delegate_to: localhost
Log Analysis and Reporting
Parsing Ansible Logs
#!/bin/bash
# analyze-ansible-logs.sh
LOG_FILE="/var/log/ansible/ansible.log"
echo "=== Ansible Log Analysis ==="
echo
echo "Total playbook runs:"
grep -c "PLAY \[" "$LOG_FILE"
echo "Failed tasks:"
grep -c "fatal:" "$LOG_FILE"
echo "Changed tasks:"
grep -c "changed:" "$LOG_FILE"
echo "Most common tasks:"
grep "TASK \[" "$LOG_FILE" | awk -F'[\\[\\]]' '{print $2}' | sort | uniq -c | sort -rn | head -10
echo
echo "Failures by host:"
grep "fatal:" "$LOG_FILE" | awk '{print $2}' | sort | uniq -c | sort -rn
echo
echo "Recent errors (last 24 hours):"
find "$LOG_FILE" -mtime -1 -exec grep "ERROR" {} \;
Generating Reports
#!/usr/bin/env python3
# generate-ansible-report.py
import json
import sys
from datetime import datetime
from collections import defaultdict
def parse_json_log(log_file):
"""Parse JSON callback output"""
with open(log_file) as f:
data = json.load(f)
report = {
'playbook': data.get('playbook', 'Unknown'),
'start_time': None,
'end_time': None,
'duration': 0,
'total_tasks': 0,
'hosts': defaultdict(lambda: {'ok': 0, 'changed': 0, 'failed': 0, 'skipped': 0})
}
# Process plays and tasks
for play in data.get('plays', []):
for task in play.get('tasks', []):
report['total_tasks'] += 1
            for host, result in task.get('hosts', {}).items():
                # Count failures before changes: a failed task may also report changed=true
                if result.get('failed'):
                    report['hosts'][host]['failed'] += 1
                elif result.get('skipped'):
                    report['hosts'][host]['skipped'] += 1
                elif result.get('changed'):
                    report['hosts'][host]['changed'] += 1
                else:
                    report['hosts'][host]['ok'] += 1
# Print report
print("=" * 60)
print(f"Ansible Playbook Report: {report['playbook']}")
print("=" * 60)
print(f"\nTotal Tasks: {report['total_tasks']}")
print(f"\nHost Summary:")
for host, stats in report['hosts'].items():
print(f" {host}:")
print(f" OK: {stats['ok']}, Changed: {stats['changed']}, "
f"Failed: {stats['failed']}, Skipped: {stats['skipped']}")
if __name__ == '__main__':
if len(sys.argv) < 2:
print("Usage: generate-ansible-report.py ")
sys.exit(1)
parse_json_log(sys.argv[1])
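A typical way to feed the script is to capture a run with the JSON stdout callback; the file name is only an example:
# Produce JSON output, then build the report
ANSIBLE_STDOUT_CALLBACK=json ansible-playbook site.yml > run.json
python3 generate-ansible-report.py run.json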
Best Practices
- Always enable logging: Set log_path in ansible.cfg
- Rotate logs: Use logrotate to manage log file sizes
- Structured logging: Use JSON format for machine parsing
- Centralized logging: Forward logs to central server
- Sensitive data: Use no_log to prevent logging secrets (see the example after this list)
- Audit trail: Log all privileged operations
- Performance monitoring: Enable profile callbacks
- Retention policy: Define how long to keep logs
- Access control: Restrict log file permissions (600 or 640)
- Monitoring alerts: Set up alerts for failures
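A minimal illustration of the no_log point above, using the built-in user module; the variable name is a placeholder:
- name: Create service account without logging the password hash
  user:
    name: dbadmin
    password: "{{ vault_dbadmin_password_hash }}"
  no_log: true   # keeps the value out of ansible.log, syslog, and callback output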
Next Steps
- Learn about Ansible Configuration for logging settings
- Explore Security Best Practices for protecting logs
- Master Troubleshooting with log analysis
- Study Plugins for custom callback development
- Practice in the Playground with logging examples