What is YAML?

YAML is a human-readable data serialization standard that can be used in conjunction with all programming languages and is often used to write configuration files.

Overview

The recursive YAML acronym stands for “YAML Ain’t Markup Language,” emphasizing that it is data-oriented rather than a document markup language. It can be used with nearly any application that needs to store or transmit data. Part of its flexibility comes from the fact that YAML borrows concepts from other languages. A few examples of these similarities include:

  • Scalars, lists, and associative arrays are based on Perl.
  • The document separator “---” is based on MIME.
  • Escape sequences are based on C.
  • Whitespace wrapping is based on HTML.

Features of YAML

Delimiter collision resistance

In its block style, YAML relies on indentation rather than brackets or closing tags for structure, which makes it resistant to delimiter collision. Some languages require escape characters or sequences, padded quotation marks, and other workarounds for handling special characters. In YAML, quotation marks and braces inside most plain and block scalars are treated as ordinary characters, which makes special characters easier to write, particularly in strings.
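As a small sketch of what this means in practice, the fragment below stores text containing quotation marks, braces, and backslashes without any escaping. The key names and values are invented for illustration.

```yaml
# Quotes and braces are ordinary characters inside plain and block scalars.
quote: She said "stop" and it's fine
pattern: |
  ^\{"id": "\d+"\}$
# The same pattern as a JSON string would need escaping:
#   "^\\{\"id\": \"\\d+\"\\}$"
```

The block literal introduced by | is taken as-is, so the regular expression can be pasted in unchanged.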

Security

In and of itself, YAML has no executable commands; it is simply a data-representation language. However, its integration with other languages means that a parser may be able to construct language-specific objects or run code while loading a document. Some Perl parsers, for example, can execute Perl code. PyYAML, a parser and emitter for Python, includes documentation specifically warning against this security vulnerability and provides a built-in function, yaml.safe_load, that protects against dangerous Python objects.
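As a rough illustration of the risk described above, the document below uses a PyYAML-specific tag that asks the loader to call a Python function while parsing. The key name and the shell command are made up for this sketch; the tag syntax belongs to PyYAML, and other parsers are not expected to honor it.

```yaml
# Looks like data, but the tag requests a function call at load time.
# Hypothetical example: do not load this with an unrestricted loader.
greeting: !!python/object/apply:os.system ["echo this ran a shell command"]
```

Loading this with yaml.safe_load should fail with a constructor error instead of running the command, whereas an unrestricted loader such as PyYAML's UnsafeLoader would execute it.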

How YAML Works

Full documentation for YAML can be found on its official site, but outlined below are some simple concepts that are important to understand when starting to use YAML.

  1. Scalars, or simple values, are assigned to a key using a colon and a space:
integer: 25
string: "25"
float: 25.0
boolean: true   # YAML 1.1 parsers also read Yes/No as booleans
  2. Associative arrays and lists can be defined using a conventional block format or an inline format that is similar to JSON.
--- # Shopping List in Block Format
- milk
- eggs
- juice

--- # Shopping List in Inline Format
[milk, eggs, juice]
  3. Multi-line strings can be written with a | character, which preserves newlines, or a > character, which folds newlines into spaces.
data: |
   Each of these
   Newlines
   Will be broken up

data: >
   This text is
   wrapped and will
   be formed into
   a single paragraph
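The shopping list above only shows lists. As a complementary sketch, the same block and inline formats apply to associative arrays (mappings); the keys and values here are invented for illustration.

```yaml
--- # Address in Block Format
address:
  street: 12 Example Road
  city: Springfield

--- # Address in Inline Format
address: {street: 12 Example Road, city: Springfield}
```

Both documents describe the same mapping; the inline form is the one that resembles JSON.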

YAML vs. JSON

YAML 1.2 is a superset of JavaScript Object Notation (JSON) but has some built-in advantages. For example, YAML supports comments, self-references (anchors and aliases), complex datatypes, and embedded block literals. Overall, YAML tends to be more readable than JSON as well. Below you can see the same data shown in JSON and YAML.

JSON version

{
  "json": [
    "rigid",
    "better for data interchange"
  ],
  "yaml": [
    "slim and flexible",
    "better for configuration"
  ],
  "object": {
    "key": "value",
    "array": [
      {
        "null_value": null
      },
      {
        "boolean": true
      },
      {
        "integer": 1
      }
    ]
  },
  "paragraph": "Blank lines denote\nparagraph breaks\n",
  "content": "Or we\ncan auto\nconvert line breaks\nto save space"
}

YAML version

---
# <- yaml supports comments, json does not
# did you know you can embed json in yaml?
# try uncommenting the next line
# { foo: 'bar' }

json:
  - rigid
  - better for data interchange
yaml:
  - slim and flexible
  - better for configuration
object:
  key: value
  array:
    - null_value:
    - boolean: true
    - integer: 1
paragraph: >
   Blank lines denote

   paragraph breaks
content: |-
   Or we
   can auto
   convert line breaks
   to save space

Most of the time, JSON can be converted to YAML and vice versa. Earlier versions of YAML are not entirely compatible with JSON, but most JSON documents can still be parsed using Syck or XS.
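One YAML feature with no JSON counterpart is self-referencing, which is done with anchors (&) and aliases (*). A minimal sketch, with key names invented for illustration:

```yaml
# &ports names the list once; *ports reuses it wherever it is needed.
base_ports: &ports [80, 443]
staging:
  ports: *ports
production:
  ports: *ports
```

Converting a document like this to JSON typically expands each alias into a full copy of the anchored value, since JSON cannot express the reference.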

Examples of YAML

Ansible, an open source software provisioning, configuration management, and application deployment tool maintained by Red Hat, is built around YAML. Ansible temporarily connects to servers via Secure Shell (SSH) to perform management tasks using playbooks, which are YAML files that automate manual tasks.

In the example below, the playbook verify-apache.yml has been defined.

---
- hosts: webservers

  vars:
    http_port: 80
    max_clients: 200

  remote_user: root

  tasks:
  - name: ensure apache is at the latest version
    yum:
      name: httpd
      state: latest

  - name: write the apache config file
    template:
      src: /srv/httpd.j2
      dest: /etc/httpd.conf
    notify:
    - restart apache

  - name: ensure apache is running
    service:
      name: httpd
      state: started

  handlers:
    - name: restart apache
      service:
        name: httpd
        state: restarted

This play specifies that it runs only on the hosts in the webservers group and that it connects as the remote user root. There are three tasks in this playbook:

  1. The first task updates Apache to the latest version using Ansible’s yum module, which drives Red Hat’s yum package manager.
  2. The second task uses the template module to write the Apache configuration file. If the file changes, the restart apache handler is notified and restarts the Apache service.
  3. The third task ensures that the Apache service is running, in case it did not come back up.
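The vars section defines http_port and max_clients, but none of the three tasks refers to them directly; presumably the template consumes them. As a hedged sketch of how a task could reference one, the extra task below (not part of the original playbook) prints the port with Ansible's debug module.

```yaml
# Hypothetical extra task: prints a variable defined under vars.
- name: show the configured port
  debug:
    msg: "Apache will listen on port {{ http_port }}"
```

Quoting the value is a common habit, and it becomes necessary when a value starts with {{, because YAML would otherwise read the brace as the start of an inline mapping.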

Now that the playbook has been written, it has to be run from the command line. Although the paths will vary based on the environment, the playbook can be run using this command:

ansible-playbook -i hosts/groups verify-apache.yml

The -i option names the inventory file that lists the servers in the webservers group, which determines the servers the playbook runs on.
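For illustration, an inventory that defines the webservers group could look like the following if it were written in Ansible's YAML inventory format. The host names are placeholders, and the hosts/groups file used above may well use the INI format instead.

```yaml
# Hypothetical inventory: two hosts in the webservers group.
all:
  children:
    webservers:
      hosts:
        web1.example.com:
        web2.example.com:
```

With an inventory like this, the play's hosts: webservers line would select both machines.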

Key Takeaways

  • YAML is a data-oriented language that has features derived from Perl, C, HTML, and other languages.
  • YAML 1.2 is a superset of JSON that comes with built-in advantages such as comments, self-referencing, and support for complex datatypes.
  • Multiple software packages have implemented YAML to create powerful configuration management tools such as Red Hat’s Ansible.