Logging and Monitoring

Introduction

Effective logging and monitoring are crucial for maintaining Ansible automation at scale. Comprehensive logging enables troubleshooting, audit trails, compliance reporting, and performance analysis. Ansible provides multiple mechanisms for capturing and analyzing playbook execution data.

Logging Mechanisms:
  • File Logging: Write playbook execution logs to files
  • Callback Plugins: Customize output and logging behavior
  • Syslog Integration: Send logs to centralized syslog servers
  • External Systems: Integration with Splunk, ELK, Datadog, etc.
  • Performance Profiling: Track task timing and resource usage

Basic File Logging

Enabling Log Files

Configure Ansible to write logs to a file:

# In ansible.cfg
[defaults]
log_path = /var/log/ansible/ansible.log

# Ensure the directory exists and is writable
mkdir -p /var/log/ansible
chmod 755 /var/log/ansible

# Or use environment variable
export ANSIBLE_LOG_PATH=/var/log/ansible/ansible.log

# Run playbook (output will be logged)
ansible-playbook site.yml

Log Rotation

Configure logrotate to manage Ansible log files:

# /etc/logrotate.d/ansible
/var/log/ansible/ansible.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 ansible ansible
    sharedscripts
    postrotate
        /usr/bin/systemctl reload rsyslog > /dev/null 2>&1 || true
    endscript
}

# Test logrotate configuration
logrotate -d /etc/logrotate.d/ansible

# Force rotation
logrotate -f /etc/logrotate.d/ansible

Per-Playbook Logging

Log different playbooks to different files:

---
- name: Playbook with custom logging
  hosts: all
  vars:
    ansible_log_path: "/var/log/ansible/{{ playbook_name }}-{{ ansible_date_time.date }}.log"
    playbook_name: deployment

  pre_tasks:
    - name: Ensure log directory exists
      file:
        path: /var/log/ansible
        state: directory
        mode: '0755'
      delegate_to: localhost
      run_once: true

  tasks:
    - name: Log playbook start
      lineinfile:
        path: "{{ ansible_log_path }}"
        line: "{{ ansible_date_time.iso8601 }} - Playbook started on {{ inventory_hostname }}"
        create: yes
      delegate_to: localhost

    - name: Your tasks here
      debug:
        msg: "Executing tasks"

  post_tasks:
    - name: Log playbook completion
      lineinfile:
        path: "{{ ansible_log_path }}"
        line: "{{ ansible_date_time.iso8601 }} - Playbook completed on {{ inventory_hostname }}"
      delegate_to: localhost

Callback Plugins for Logging

Log Plays Callback

The log_plays callback logs all play execution to syslog:

# ansible.cfg
[defaults]
callback_whitelist = log_plays

# Logs are sent to syslog with these details:
# - Playbook name
# - Play name
# - Task name
# - Host
# - Task result (ok, changed, failed, skipped)

# View syslog output
sudo tail -f /var/log/syslog | grep ansible

# Example syslog entry:
# ansible-playbook[12345]: PLAY [Configure web servers]
# ansible-playbook[12345]: TASK [Install nginx] ***
# ansible-playbook[12345]: ok: [web01]

JSON Callback

Output playbook results in JSON format for parsing:

# ansible.cfg
[defaults]
stdout_callback = json

# Run playbook with JSON output
ansible-playbook site.yml > output.json

# Parse JSON output with jq
cat output.json | jq '.plays[].tasks[] | {task: .task.name, hosts: .hosts | keys}'

# Example JSON structure:
{
  "plays": [
    {
      "play": {
        "name": "Configure servers",
        "id": "uuid"
      },
      "tasks": [
        {
          "task": {
            "name": "Install packages",
            "id": "uuid"
          },
          "hosts": {
            "web01": {
              "changed": true,
              "msg": "Package installed"
            }
          }
        }
      ]
    }
  ],
  "stats": {
    "web01": {
      "ok": 10,
      "changed": 5,
      "unreachable": 0,
      "failed": 0,
      "skipped": 2
    }
  }
}

Profile Tasks Callback

Measure and log task execution times:

# ansible.cfg
[defaults]
callback_whitelist = profile_tasks, profile_roles

# Output shows timing for each task:
# PLAY RECAP *********************************************
# web01 : ok=15   changed=5    unreachable=0    failed=0
#
# TASK TIMING ********************************************
# Install packages ---------------------------- 45.23s
# Copy configuration files ------------------- 12.34s
# Restart services --------------------------- 8.91s
# Update application code -------------------- 5.67s

# Export timing data for analysis
ansible-playbook site.yml | grep -A 100 "TASK TIMING" > task_times.txt

Timer Callback

Log total playbook execution time:

# ansible.cfg
[defaults]
callback_whitelist = timer

# Output at end of playbook:
# Playbook run took 0 days, 0 hours, 3 minutes, 45 seconds

# Combine multiple callbacks
callback_whitelist = timer, profile_tasks, profile_roles, log_plays

Custom Logging with Callback Plugins

Custom Callback Plugin

Create a custom callback plugin for specialized logging:

# callback_plugins/custom_logger.py
from ansible.plugins.callback import CallbackBase
from datetime import datetime
import json
import os

class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'notification'
    CALLBACK_NAME = 'custom_logger'
    CALLBACK_NEEDS_WHITELIST = True

    def __init__(self):
        super(CallbackModule, self).__init__()
        self.log_file = os.environ.get('CUSTOM_LOG_FILE', '/var/log/ansible/custom.log')
        self.start_time = None

    def log(self, data):
        """Write data to log file"""
        with open(self.log_file, 'a') as f:
            f.write(json.dumps(data) + '\n')

    def v2_playbook_on_start(self, playbook):
        """Called when playbook starts"""
        self.start_time = datetime.now()
        self.log({
            'timestamp': self.start_time.isoformat(),
            'event': 'playbook_start',
            'playbook': playbook._file_name
        })

    def v2_playbook_on_task_start(self, task, is_conditional):
        """Called when task starts"""
        self.log({
            'timestamp': datetime.now().isoformat(),
            'event': 'task_start',
            'task': task.get_name()
        })

    def v2_runner_on_ok(self, result):
        """Called when task succeeds"""
        self.log({
            'timestamp': datetime.now().isoformat(),
            'event': 'task_ok',
            'host': result._host.get_name(),
            'task': result._task.get_name(),
            'changed': result._result.get('changed', False)
        })

    def v2_runner_on_failed(self, result, ignore_errors=False):
        """Called when task fails"""
        self.log({
            'timestamp': datetime.now().isoformat(),
            'event': 'task_failed',
            'host': result._host.get_name(),
            'task': result._task.get_name(),
            'error': result._result.get('msg', 'Unknown error')
        })

    def v2_playbook_on_stats(self, stats):
        """Called when playbook completes"""
        end_time = datetime.now()
        duration = (end_time - self.start_time).total_seconds()

        hosts = sorted(stats.processed.keys())
        summary = {}
        for host in hosts:
            summary[host] = stats.summarize(host)

        self.log({
            'timestamp': end_time.isoformat(),
            'event': 'playbook_end',
            'duration_seconds': duration,
            'summary': summary
        })

Enable and use the custom callback:

# ansible.cfg
[defaults]
callback_plugins = ./callback_plugins
callback_whitelist = custom_logger

# Set log file location
export CUSTOM_LOG_FILE=/var/log/ansible/detailed.log

# Run playbook
ansible-playbook site.yml

# View structured logs
cat /var/log/ansible/detailed.log | jq '.'

Slack Notification Callback

Send notifications to Slack:

# ansible.cfg
[defaults]
callback_whitelist = slack

# Environment variables for Slack integration
export SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL

# callback_plugins/slack.py (simplified example)
from ansible.plugins.callback import CallbackBase
import json
import requests

class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'notification'
    CALLBACK_NAME = 'slack'
    CALLBACK_NEEDS_WHITELIST = True

    def __init__(self):
        super(CallbackModule, self).__init__()
        self.webhook_url = os.environ.get('SLACK_WEBHOOK_URL')

    def send_slack(self, message):
        if self.webhook_url:
            payload = {'text': message}
            requests.post(self.webhook_url, json=payload)

    def v2_playbook_on_start(self, playbook):
        self.send_slack(f"Playbook started: {playbook._file_name}")

    def v2_playbook_on_stats(self, stats):
        hosts = sorted(stats.processed.keys())
        summary = []
        for host in hosts:
            s = stats.summarize(host)
            summary.append(f"{host}: ok={s['ok']} changed={s['changed']} failed={s['failures']}")

        message = "Playbook completed:\n" + "\n".join(summary)
        self.send_slack(message)

Syslog Integration

Sending Logs to Syslog

# ansible.cfg
[defaults]
callback_whitelist = syslog

# Logs are sent to local syslog daemon
# Configure rsyslog to forward to central server

# /etc/rsyslog.d/10-ansible.conf
# Forward Ansible logs to remote server
if $programname == 'ansible-playbook' then @@log-server.example.com:514
& stop

# Restart rsyslog
sudo systemctl restart rsyslog

# View Ansible logs in syslog
sudo tail -f /var/log/syslog | grep ansible

# Filter by severity
logger -t ansible-playbook -p local0.info "Test message"

Structured Syslog Logging

---
- name: Playbook with structured logging
  hosts: all
  tasks:
    - name: Log structured event
      shell: |
        logger -t ansible-playbook -p local0.info \
          "event=task_start playbook={{ playbook_dir | basename }} \
           host={{ inventory_hostname }} task={{ ansible_play_name }}"

    - name: Your actual tasks
      debug:
        msg: "Doing work"

    - name: Log completion
      shell: |
        logger -t ansible-playbook -p local0.info \
          "event=task_complete playbook={{ playbook_dir | basename }} \
           host={{ inventory_hostname }} status=success"

Integration with External Systems

ELK Stack Integration

Send Ansible logs to Elasticsearch, Logstash, Kibana:

# Filebeat configuration for Ansible logs
# /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/ansible/ansible.log
    fields:
      app: ansible
      environment: production
    multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
    multiline.negate: true
    multiline.match: after

output.logstash:
  hosts: ["logstash.example.com:5044"]

# Logstash pipeline
# /etc/logstash/conf.d/ansible.conf
input {
  beats {
    port => 5044
  }
}

filter {
  if [fields][app] == "ansible" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} - %{GREEDYDATA:log_message}" }
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch.example.com:9200"]
    index => "ansible-logs-%{+YYYY.MM.dd}"
  }
}

Splunk Integration

# Configure Splunk Universal Forwarder
# /opt/splunkforwarder/etc/system/local/inputs.conf
[monitor:///var/log/ansible/ansible.log]
disabled = false
index = ansible
sourcetype = ansible:log

# Custom callback for Splunk HEC
# callback_plugins/splunk_hec.py
from ansible.plugins.callback import CallbackBase
import json
import requests

class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'notification'
    CALLBACK_NAME = 'splunk_hec'
    CALLBACK_NEEDS_WHITELIST = True

    def __init__(self):
        super(CallbackModule, self).__init__()
        self.hec_url = os.environ.get('SPLUNK_HEC_URL')
        self.hec_token = os.environ.get('SPLUNK_HEC_TOKEN')

    def send_to_splunk(self, event_data):
        if self.hec_url and self.hec_token:
            headers = {
                'Authorization': f'Splunk {self.hec_token}',
                'Content-Type': 'application/json'
            }
            event = {
                'event': event_data,
                'sourcetype': 'ansible:callback'
            }
            requests.post(self.hec_url, headers=headers, json=event)

    def v2_runner_on_ok(self, result):
        event = {
            'event_type': 'task_ok',
            'host': result._host.get_name(),
            'task': result._task.get_name(),
            'changed': result._result.get('changed', False)
        }
        self.send_to_splunk(event)

Datadog Integration

# Custom callback for Datadog
# callback_plugins/datadog.py
from ansible.plugins.callback import CallbackBase
from datadog import initialize, api
import time

class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'notification'
    CALLBACK_NAME = 'datadog'
    CALLBACK_NEEDS_WHITELIST = True

    def __init__(self):
        super(CallbackModule, self).__init__()
        options = {
            'api_key': os.environ.get('DATADOG_API_KEY'),
            'app_key': os.environ.get('DATADOG_APP_KEY')
        }
        initialize(**options)
        self.start_time = None

    def v2_playbook_on_start(self, playbook):
        self.start_time = time.time()
        title = f"Ansible Playbook Started: {playbook._file_name}"
        text = f"Playbook {playbook._file_name} execution started"
        tags = ['ansible', 'playbook:start']
        api.Event.create(title=title, text=text, tags=tags)

    def v2_playbook_on_stats(self, stats):
        duration = time.time() - self.start_time

        hosts = sorted(stats.processed.keys())
        for host in hosts:
            s = stats.summarize(host)
            # Send metrics to Datadog
            api.Metric.send(
                metric='ansible.tasks.ok',
                points=s['ok'],
                tags=[f'host:{host}']
            )
            api.Metric.send(
                metric='ansible.tasks.changed',
                points=s['changed'],
                tags=[f'host:{host}']
            )
            api.Metric.send(
                metric='ansible.tasks.failed',
                points=s['failures'],
                tags=[f'host:{host}']
            )

        # Send duration metric
        api.Metric.send(
            metric='ansible.playbook.duration',
            points=duration,
            tags=['playbook:completed']
        )

Audit Logging

Comprehensive Audit Trail

---
- name: Playbook with audit logging
  hosts: all
  vars:
    audit_log: "/var/log/ansible/audit/{{ ansible_date_time.date }}.log"

  tasks:
    - name: Create audit log entry - Start
      lineinfile:
        path: "{{ audit_log }}"
        line: |
          {{ ansible_date_time.iso8601 }} | START | {{ ansible_play_name }} |
          User: {{ ansible_user_id }} | Host: {{ inventory_hostname }} |
          Playbook: {{ playbook_dir }}/{{ ansible_play_name }}
        create: yes
        mode: '0600'
      delegate_to: localhost

    - name: Sensitive operation
      user:
        name: dbadmin
        state: present
      register: user_result

    - name: Audit log - User created
      lineinfile:
        path: "{{ audit_log }}"
        line: |
          {{ ansible_date_time.iso8601 }} | ACTION | User Management |
          User: {{ ansible_user_id }} | Host: {{ inventory_hostname }} |
          Action: User created - dbadmin | Changed: {{ user_result.changed }}
      delegate_to: localhost
      when: user_result.changed

    - name: Configuration change
      template:
        src: config.j2
        dest: /etc/app/config.yml
      register: config_result

    - name: Audit log - Configuration changed
      lineinfile:
        path: "{{ audit_log }}"
        line: |
          {{ ansible_date_time.iso8601 }} | ACTION | Configuration |
          User: {{ ansible_user_id }} | Host: {{ inventory_hostname }} |
          File: /etc/app/config.yml | Changed: {{ config_result.changed }}
      delegate_to: localhost

    - name: Create audit log entry - Complete
      lineinfile:
        path: "{{ audit_log }}"
        line: |
          {{ ansible_date_time.iso8601 }} | COMPLETE | {{ ansible_play_name }} |
          User: {{ ansible_user_id }} | Host: {{ inventory_hostname }} |
          Status: Success
      delegate_to: localhost

Compliance Logging

---
- name: Compliance audit logging
  hosts: all
  vars:
    compliance_log: "/var/log/ansible/compliance.json"

  tasks:
    - name: Log compliance check
      lineinfile:
        path: "{{ compliance_log }}"
        line: "{{ log_entry | to_json }}"
        create: yes
      delegate_to: localhost
      vars:
        log_entry:
          timestamp: "{{ ansible_date_time.iso8601 }}"
          host: "{{ inventory_hostname }}"
          check: "{{ item.name }}"
          status: "{{ item.status }}"
          operator: "{{ ansible_user_id }}"
          playbook: "{{ ansible_play_name }}"
      loop:
        - { name: "SSH Key Authentication", status: "compliant" }
        - { name: "Password Policy", status: "compliant" }
        - { name: "Firewall Status", status: "non-compliant" }

Performance Monitoring

Task Performance Analysis

# ansible.cfg
[defaults]
callback_whitelist = profile_tasks, profile_roles, cgroup_perf_recap

# Output shows:
# - Task execution time
# - Role execution time
# - CPU usage
# - Memory usage

# Export performance data
ansible-playbook site.yml 2>&1 | tee performance.log

# Analyze slow tasks
grep "TASK" performance.log | sort -t '-' -k2 -n | tail -10

Resource Usage Tracking

---
- name: Monitor resource usage during playbook
  hosts: all
  tasks:
    - name: Capture pre-execution metrics
      shell: |
        echo "CPU: $(top -bn1 | grep 'Cpu(s)' | awk '{print $2}')"
        echo "Memory: $(free -m | awk 'NR==2{printf "%.2f%%", $3*100/$2 }')"
        echo "Disk: $(df -h / | awk 'NR==2{print $5}')"
      register: pre_metrics

    - name: Your resource-intensive tasks
      include_tasks: heavy_tasks.yml

    - name: Capture post-execution metrics
      shell: |
        echo "CPU: $(top -bn1 | grep 'Cpu(s)' | awk '{print $2}')"
        echo "Memory: $(free -m | awk 'NR==2{printf "%.2f%%", $3*100/$2 }')"
        echo "Disk: $(df -h / | awk 'NR==2{print $5}')"
      register: post_metrics

    - name: Log resource usage
      copy:
        content: |
          Before:
          {{ pre_metrics.stdout }}

          After:
          {{ post_metrics.stdout }}
        dest: "/var/log/ansible/resources-{{ inventory_hostname }}.txt"
      delegate_to: localhost

Log Analysis and Reporting

Parsing Ansible Logs

#!/bin/bash
# analyze-ansible-logs.sh

LOG_FILE="/var/log/ansible/ansible.log"

echo "=== Ansible Log Analysis ==="
echo

echo "Total playbook runs:"
grep -c "PLAY \[" "$LOG_FILE"

echo "Failed tasks:"
grep -c "fatal:" "$LOG_FILE"

echo "Changed tasks:"
grep -c "changed:" "$LOG_FILE"

echo "Most common tasks:"
grep "TASK \[" "$LOG_FILE" | awk -F'[\\[\\]]' '{print $2}' | sort | uniq -c | sort -rn | head -10

echo
echo "Failures by host:"
grep "fatal:" "$LOG_FILE" | awk '{print $2}' | sort | uniq -c | sort -rn

echo
echo "Recent errors (last 24 hours):"
find "$LOG_FILE" -mtime -1 -exec grep "ERROR" {} \;

Generating Reports

#!/usr/bin/env python3
# generate-ansible-report.py

import json
import sys
from datetime import datetime
from collections import defaultdict

def parse_json_log(log_file):
    """Parse JSON callback output"""
    with open(log_file) as f:
        data = json.load(f)

    report = {
        'playbook': data.get('playbook', 'Unknown'),
        'start_time': None,
        'end_time': None,
        'duration': 0,
        'total_tasks': 0,
        'hosts': defaultdict(lambda: {'ok': 0, 'changed': 0, 'failed': 0, 'skipped': 0})
    }

    # Process plays and tasks
    for play in data.get('plays', []):
        for task in play.get('tasks', []):
            report['total_tasks'] += 1
            for host, result in task.get('hosts', {}).items():
                if result.get('changed'):
                    report['hosts'][host]['changed'] += 1
                elif result.get('failed'):
                    report['hosts'][host]['failed'] += 1
                elif result.get('skipped'):
                    report['hosts'][host]['skipped'] += 1
                else:
                    report['hosts'][host]['ok'] += 1

    # Print report
    print("=" * 60)
    print(f"Ansible Playbook Report: {report['playbook']}")
    print("=" * 60)
    print(f"\nTotal Tasks: {report['total_tasks']}")
    print(f"\nHost Summary:")
    for host, stats in report['hosts'].items():
        print(f"  {host}:")
        print(f"    OK: {stats['ok']}, Changed: {stats['changed']}, "
              f"Failed: {stats['failed']}, Skipped: {stats['skipped']}")

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("Usage: generate-ansible-report.py ")
        sys.exit(1)

    parse_json_log(sys.argv[1])

Best Practices

Logging Best Practices:
  • Enable logging always: Set log_path in ansible.cfg
  • Rotate logs: Use logrotate to manage log file sizes
  • Structured logging: Use JSON format for machine parsing
  • Centralized logging: Forward logs to central server
  • Sensitive data: Use no_log to prevent logging secrets
  • Audit trail: Log all privileged operations
  • Performance monitoring: Enable profile callbacks
  • Retention policy: Define how long to keep logs
  • Access control: Restrict log file permissions (600 or 640)
  • Monitoring alerts: Set up alerts for failures

Next Steps