12 Common Ansible Mistakes and How to Fix Them

Teach me Ansible | 2025-01-18 | 18 min read

Avoid these 12 common Ansible mistakes that trip up beginners and experienced users alike. Learn the right way to write maintainable, efficient, and reliable automation.

1. Using Shell/Command Instead of Modules

❌ Wrong Way

- name: Install nginx
  shell: apt-get install nginx -y

✅ Right Way

- name: Install nginx
  apt:
    name: nginx
    state: present

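Why? Modules such as apt are idempotent: they compare the current state with the requested one and report "changed" only when they actually change something. They also return structured results and clearer error messages. A shell task runs on every play and always reports "changed", so Ansible cannot tell whether anything really happened. Note that apt itself only works on Debian-family hosts; the generic package module covers several package managers.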
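Sometimes no module fits and a command is unavoidable. The sketch below shows two options under stated assumptions: the generic package module for the nginx case, and a command task guarded with creates. The installer script and marker file are hypothetical names used only for illustration.

```yaml
# Option 1: let Ansible pick the host's package manager.
- name: Install nginx
  package:
    name: nginx
    state: present

# Option 2: a command with no matching module, made safe to re-run.
# /opt/vendor/install.sh and /opt/vendor/.installed are hypothetical paths.
- name: Run vendor installer only if it has not run before
  command: /opt/vendor/install.sh --silent
  args:
    creates: /opt/vendor/.installed   # task is skipped when this file exists
```

With creates, the command is skipped once the marker file exists, which gives a command task roughly the same run-once behavior a module would have.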

2. Not Using Check Mode Before Production

❌ Wrong Approach

# Running directly in production
ansible-playbook production.yml

✅ Right Approach

# Always dry-run first
ansible-playbook production.yml --check --diff

# Then run for real
ansible-playbook production.yml

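Why? Check mode reports what would change without changing anything, and the --diff flag shows the exact differences for files and templates. Check mode is not a full simulation: command and shell tasks are skipped by default, and tasks that depend on their results may be skipped or fail, so a clean dry run does not guarantee a clean real run.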
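Read-only commands can be told to run even during a dry run, so later tasks still have data to work with. A minimal sketch, assuming nginx is already installed on the host; the registered variable name is arbitrary.

```yaml
- name: Read the installed nginx version (read-only, safe in check mode)
  command: nginx -v
  register: nginx_version
  check_mode: false      # run for real even under --check
  changed_when: false    # a read-only command never changes the host

- name: Show what was found
  debug:
    var: nginx_version
```

Only use check_mode: false on tasks that cannot modify the host, otherwise the dry run stops being dry.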

3. Hardcoding Values Instead of Using Variables

❌ Wrong Way

- name: Configure nginx
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  # worker_processes hardcoded in template

✅ Right Way

- name: Configure nginx
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  vars:
    nginx_worker_processes: 4
    nginx_worker_connections: 1024

# In nginx.conf.j2:
worker_processes {{ nginx_worker_processes }};
events {
    worker_connections {{ nginx_worker_connections }};
}

Why? Variables make playbooks reusable across different environments and easier to maintain.
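Once the template reads from variables, each environment can supply its own values without touching the task. The sketch below assumes the inventory defines groups named staging and production; the file names and numbers are illustrative, not recommendations.

```yaml
# group_vars/staging.yml (hypothetical file)
---
nginx_worker_processes: 2
nginx_worker_connections: 512

# group_vars/production.yml (hypothetical file)
---
nginx_worker_processes: 8
nginx_worker_connections: 4096
```

With values in group_vars, the task-level vars block in the example above is no longer needed; the same task renders a different nginx.conf depending on which group the host belongs to.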

4. Not Using Ansible Vault for Secrets

❌ Wrong Way

# vars/database.yml - PLAIN TEXT PASSWORDS!
db_password: "SuperSecret123"
api_key: "sk_live_abc123xyz"

✅ Right Way

# Encrypt sensitive files
ansible-vault encrypt vars/secrets.yml

# Use vault in playbook
ansible-playbook site.yml --ask-vault-pass

# Or use vault password file
ansible-playbook site.yml --vault-password-file ~/.vault_pass

Why? Never commit plain-text passwords to version control. Vault encrypts sensitive data securely.
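One common way to organize vaulted values is to keep the encrypted variables in their own file under a vault_ prefix and reference them from a readable file. This is a sketch of that convention, not a requirement of Ansible; every file name, variable name, and path below is hypothetical.

```yaml
# vars/secrets.yml (encrypted with ansible-vault; shown decrypted here)
---
vault_db_password: "change-me"

# vars/database.yml (plain text, safe to read in code review)
---
db_password: "{{ vault_db_password }}"

# In a task file: keep the secret out of task output and logs
---
- name: Render database config
  template:
    src: db.conf.j2
    dest: /etc/app/db.conf
    mode: "0600"
  no_log: true
```

The indirection lets reviewers see which variables exist without decrypting anything, and no_log prevents the rendered values from appearing in the play output.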

5. Ignoring Idempotency

❌ Wrong Way

- name: Add line to config
  shell: echo "export PATH=/opt/app/bin:$PATH" >> ~/.bashrc

- name: Create backup
  shell: cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.$(date +%s)

✅ Right Way

- name: Add line to config
  lineinfile:
    path: ~/.bashrc
    line: "export PATH=/opt/app/bin:$PATH"
    state: present

- name: Create backup
  copy:
    src: /etc/nginx/nginx.conf
    dest: /etc/nginx/nginx.conf.backup
    remote_src: yes
    force: no  # Don't overwrite if exists

Why? The shell versions append another line and create another backup file on every run. Idempotent tasks can be run repeatedly and change the host only when its state differs from what the task describes.
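If the goal of the backup is to keep the previous version whenever the config changes, the backup parameter of template (also available on copy and lineinfile) may be a better fit than a separate copy task. A minimal sketch reusing the file names from the examples above:

```yaml
- name: Update nginx config and keep the previous version
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    backup: true   # a timestamped copy is written only when the file changes
```

Because the backup is made only when the destination actually changes, repeated runs do not pile up identical copies.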

6. Not Using Roles for Organization

❌ Wrong Way

# One massive 500-line playbook
- name: Configure entire infrastructure
  hosts: all
  tasks:
    # 100 tasks here...
    # Hard to maintain, reuse, or test

✅ Right Way

- name: Configure web servers
  hosts: webservers
  roles:
    - common
    - nginx
    - ssl
    - monitoring

- name: Configure database servers
  hosts: databases
  roles:
    - common
    - postgresql
    - backup

Why? Roles make playbooks modular, reusable, and easier to test. Each role has a single responsibility.
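For orientation, here is roughly what the nginx role referenced above could contain. This is a sketch of the standard role directory layout with illustrative contents, not the author's actual role.

```yaml
# roles/nginx/defaults/main.yml
---
nginx_worker_processes: 4

# roles/nginx/tasks/main.yml
---
- name: Install nginx
  apt:
    name: nginx
    state: present

- name: Configure nginx
  template:
    src: nginx.conf.j2          # looked up in roles/nginx/templates/
    dest: /etc/nginx/nginx.conf
  notify: restart nginx

# roles/nginx/handlers/main.yml
---
- name: restart nginx
  service:
    name: nginx
    state: restarted
```

Defaults, tasks, handlers, and templates each live in their own directory, so the role can be dropped into any play that lists it under roles.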

7. Not Handling Changed vs Failed States Properly

❌ Wrong Way

- name: Check if service is running
  command: systemctl is-active nginx
  register: result
  # This fails if nginx is stopped!

✅ Right Way

- name: Check if service is running
  command: systemctl is-active nginx
  register: result
  changed_when: false
  failed_when: false

- name: Start nginx if not running
  service:
    name: nginx
    state: started
  when: result.rc != 0

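Why? A read-only command should never report "changed", and a non-zero exit code from a status check is an answer, not a failure. Set changed_when and failed_when explicitly so the play output reflects what actually happened and avoids false positives and negatives.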
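Setting failed_when: false hides every error, including a missing systemctl binary. A stricter variant is sketched below; it assumes that systemctl is-active exits with 0 for an active unit and 3 for an inactive one, which is worth confirming on your distribution.

```yaml
- name: Check if service is running
  command: systemctl is-active nginx
  register: result
  changed_when: false
  # Assumption: 0 = active, 3 = inactive; any other code is treated as a real error.
  failed_when: result.rc not in [0, 3]
```

In this particular case the check can often be dropped entirely, since service with state: started is already idempotent and does nothing when nginx is running.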

8. Using sudo Instead of become

❌ Wrong Way (Deprecated)

- name: Install package
  apt:
    name: nginx
  sudo: yes
  sudo_user: root

✅ Right Way

- name: Install package
  apt:
    name: nginx
  become: yes
  become_user: root  # Optional, defaults to root

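Why? The sudo and sudo_user keywords were deprecated when become was introduced in Ansible 1.9, and current releases no longer accept them. Use become for privilege escalation; it also supports methods other than sudo (for example su) through become_method.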
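become can be set once for a whole play and overridden per task. The following sketch assumes a Debian-family host and that pg_isready is on the PATH after the package is installed; the group name matches the roles example earlier in the article.

```yaml
- name: Configure database servers
  hosts: databases
  become: true                # escalate for every task in the play
  tasks:
    - name: Install PostgreSQL
      apt:
        name: postgresql
        state: present

    - name: Check that the server accepts connections
      command: pg_isready
      become_user: postgres   # run this one task as the postgres user
      changed_when: false
```

Setting become at play level keeps individual tasks short, while become_user on a single task limits the use of other accounts to where it is needed.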

9. Not Using Tags for Selective Execution

❌ Wrong Way

# Run entire playbook even for small changes
ansible-playbook site.yml  # Takes 30 minutes

✅ Right Way

tasks:
  - name: Install packages
    apt:
      name: "{{ item }}"
    loop: [nginx, postgresql]
    tags: [install, packages]

  - name: Configure services
    template:
      src: "{{ item }}.conf.j2"
      dest: "/etc/{{ item }}/{{ item }}.conf"
    loop: [nginx, postgresql]
    tags: [config]
    notify: restart services

# Run only config tasks
ansible-playbook site.yml --tags config

Why? Tags allow you to run specific parts of playbooks, saving time during testing and deployment.
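Two special tags, always and never, help when a tagged run still needs some setup tasks or when a task should stay dormant unless asked for. A minimal sketch; env_name and the vars file path are hypothetical.

```yaml
tasks:
  - name: Load environment-specific variables
    include_vars: "vars/{{ env_name }}.yml"   # env_name is a hypothetical variable
    tags: [always]          # runs no matter which --tags are given

  - name: Dump nginx configuration for debugging
    command: nginx -T
    changed_when: false
    tags: [never, debug]    # runs only when --tags debug is requested
```

Without the always tag, a run limited to --tags config would skip the variable loading and the config templates could fail on undefined variables.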

10. Poor Error Handling

❌ Wrong Way

- name: Deploy application
  command: /deploy.sh
  ignore_errors: yes  # Swallowing all errors!

✅ Right Way

- name: Deploy application
  block:
    - name: Run deployment
      command: /deploy.sh
      register: deploy_result

    - name: Verify deployment
      uri:
        url: "http://localhost:8080/health"
        status_code: 200
      register: health
      until: health is succeeded  # older Ansible versions ignore retries without until
      retries: 5
      delay: 10

  rescue:
    - name: Rollback on failure
      command: /rollback.sh

    - name: Send alert
      mail:
        subject: "Deployment Failed"
        body: "{{ deploy_result.stderr | default('no stderr captured') }}"

  always:
    - name: Clean up temp files
      file:
        path: /tmp/deploy
        state: absent

Why? Proper error handling with blocks allows recovery and cleanup, rather than silently ignoring failures.
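One detail that is easy to miss: when the rescue section completes successfully, Ansible treats the host as recovered and the play continues. If a failed deployment should still stop the play, the rescue section has to fail explicitly. The sketch below also uses the ansible_failed_task and ansible_failed_result variables that Ansible sets inside rescue; the messages are illustrative.

```yaml
- name: Deploy application
  block:
    - name: Run deployment
      command: /deploy.sh
  rescue:
    - name: Report which task failed and why
      debug:
        msg: >-
          Task '{{ ansible_failed_task.name }}' failed:
          {{ ansible_failed_result.msg | default('no message') }}

    - name: Fail the play after reporting
      fail:
        msg: "Deployment failed"
```

Ending rescue with fail keeps the cleanup and alerting benefits of the block while making sure the overall run is still reported as failed.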

11. Not Using Handlers for Service Restarts

❌ Wrong Way

- name: Update nginx config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf

- name: Restart nginx
  service:
    name: nginx
    state: restarted  # Restarts even if config didn't change!

✅ Right Way

- name: Update nginx config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: restart nginx

handlers:
  - name: restart nginx
    service:
      name: nginx
      state: restarted  # Only runs if notified

Why? Handlers run only when a task that notifies them reports "changed". They run once, after the tasks in the play have finished, even if notified multiple times.
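Because handlers wait until the tasks have finished, a later task that depends on the restarted service would still see the old configuration. meta: flush_handlers runs any pending handlers immediately. A minimal sketch; the URL is a hypothetical health endpoint, and the restart nginx handler is the one defined above.

```yaml
tasks:
  - name: Update nginx config
    template:
      src: nginx.conf.j2
      dest: /etc/nginx/nginx.conf
    notify: restart nginx

  - name: Run pending handlers now instead of at the end of the play
    meta: flush_handlers

  - name: Verify nginx answers after the restart
    uri:
      url: "http://localhost/"     # hypothetical health URL
      status_code: 200
```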

12. Not Testing Playbooks

❌ Wrong Way

# Write playbook, commit, deploy to production
git add playbook.yml
git commit -m "New playbook"
ansible-playbook -i production playbook.yml  # 🔥 YOLO!

✅ Right Way

# Test progression
ansible-playbook playbook.yml --syntax-check
ansible-lint playbook.yml
ansible-playbook playbook.yml --check
ansible-playbook -i staging playbook.yml
# Only then:
ansible-playbook -i production playbook.yml

Why? Testing catches errors early and prevents production outages. Always test in staging first.
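A staging run is more useful when something checks the result afterwards. One possible approach is a small verification play, sketched here under the assumption that the hosts use systemd, where service_facts reports units under names such as nginx.service.

```yaml
- name: Smoke-test web servers after a staging run
  hosts: webservers
  gather_facts: false
  tasks:
    - name: Collect service state
      service_facts:

    - name: Assert nginx is running
      assert:
        that:
          - ansible_facts.services['nginx.service'].state == 'running'
        fail_msg: "nginx is not running on {{ inventory_hostname }}"
```

Running this play against staging turns "it seemed to work" into a pass or fail result that can gate the production run.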

Bonus: Performance Mistakes

Mistake: Serial Execution When Parallel is Possible

# Slow - runs on one host at a time
- hosts: webservers
  serial: 1  # Why?!
  tasks:
    - name: Update package cache
      apt:
        update_cache: yes

Better: Use Appropriate Forks

# Fast - runs on 20 hosts at once
- hosts: webservers
  tasks:
    - name: Update package cache
      apt:
        update_cache: yes

# In ansible.cfg:
[defaults]
forks = 20
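serial is not always a mistake. It is the right tool for rolling updates, where only part of the group should be out of service at any time. A sketch of that case, with an illustrative batch size:

```yaml
- name: Rolling restart of web servers
  hosts: webservers
  serial: "25%"              # work through the group a quarter at a time
  max_fail_percentage: 0     # abort the rollout as soon as any host in a batch fails
  tasks:
    - name: Restart nginx
      service:
        name: nginx
        state: restarted
```

The distinction is intent: use forks to control how many hosts Ansible talks to in parallel, and serial when the order and size of batches matter for availability.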

Pro Tip

Use ansible-lint to automatically catch many of these mistakes: pip install ansible-lint && ansible-lint playbook.yml

Quick Reference Checklist

  • ✅ Use modules instead of shell/command
  • ✅ Always test with --check before production
  • ✅ Use variables for all configurable values
  • ✅ Encrypt secrets with ansible-vault
  • ✅ Write idempotent tasks
  • ✅ Organize code with roles
  • ✅ Set changed_when/failed_when on command tasks
  • ✅ Use become instead of sudo
  • ✅ Tag tasks for selective execution
  • ✅ Handle errors with blocks/rescue/always
  • ✅ Use handlers for service restarts
  • ✅ Test playbooks before deploying

Conclusion

Avoiding these common mistakes will make your Ansible automation more reliable, maintainable, and efficient. Start implementing these best practices today and watch your infrastructure automation improve!