Structuring configuration data in a hierarchy
In the previous section, we reduced the data problem to a simple need for key/value pairs that are specific to each node under Puppet management. Puppet and its manifests then serve as the engine that generates actual configuration from these minimalistic bits of information.
A simplistic approach to this problem is an ini-style configuration file that has a section for each node, which sets values for all configurable keys. Shared values are declared in one or more general sections:
    [mysql]
    buffer_pool=15G
    log_file_size=500M
    ...
    [xndp12-sql01.example.net]
    psk=xneFGl%23ndfAWLN34a0t9w30.zges4
    server_id=1
Rails applications customarily do something similar and store their configuration in a YAML format. The user can define different environments, such as production, staging, and testing. The values that are defined per environment override the global setting values.
This is quite close to the type of hierarchical configuration that Puppet allows through its Hiera binding. The hierarchies that the mentioned Rails applications and ini files achieve through configuration environments are quite flat: there is a global layer and an overlay for specialized configuration. With Hiera and Puppet, a single configuration database will typically handle whole clusters of machines and entire networks of such clusters. This implies the need for a more elaborate hierarchy.
Hiera allows you to define your own hierarchical layers. There are some typical, proven examples, which are found in many configurations out there:
- The common layer holds default values for all agents
- A location layer can override some values in accordance with the data center that houses each respective node
- Each agent machine typically fills a distinct role in your infrastructure, such as wordpress_appserver or puppetdb_server
- Some configuration is specific to each single machine
For example, consider the configuration of a hypothetical reporting client. Your common layer will hold lots of presets such as default verbosity settings, the transport compression option, and other choices that should work for most machines. On the location layer, you ensure that each machine checks in to the respective local server, because reporting should not use WAN resources.
Settings per role are perhaps the most interesting part. They allow fine-grained settings that are specific to a class of servers. Perhaps your application servers should monitor their memory consumption at very short intervals. For the database servers, you will want a closer look at hard drive operations and performance. For your Puppet servers, there might be special plugins that gather specific data.
The machine layer is very useful for declaring any exceptions to the rule. There are always some machines that require special treatment for one reason or another. With a top hierarchy layer that holds data for each single agent, you get full control over all the data that an agent uses.
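One rough way to picture these four layers is as dictionaries that are applied from the least to the most specific. The following Python sketch is only an illustration of that idea: the setting names and values are invented for the hypothetical reporting client, and the code does not reflect how Hiera itself is implemented.

```python
# Hypothetical reporting-client settings per layer, ordered from the
# least specific to the most specific. None of these keys or values
# come from a real module.
LAYERS = [
    ("common", {"verbosity": "info", "compression": True,
                "server": "stats.example.net"}),
    ("location", {"server": "stats.dc1.example.net"}),
    ("role", {"memory_check_interval": 10}),
    ("machine", {"verbosity": "debug"}),
]


def effective_settings(layers):
    """Apply the layers in order so that more specific layers win."""
    settings = {}
    for _name, values in layers:
        settings.update(values)
    return settings


if __name__ == "__main__":
    for key, value in sorted(effective_settings(LAYERS).items()):
        print(f"{key} = {value}")
```

In this sketch, the machine layer overrides the verbosity preset, the location layer replaces the server, and everything else falls through from the layers below.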
These ideas are still quite abstract, so let's finally look at the actual application of Hiera.
Configuring Hiera
The support for retrieving data values from Hiera has been built into Puppet since version 3. All you need in order to get started is a hiera.yaml file in the configuration directory.
As the filename extension suggests, the configuration is in the YAML format and contains a hash with keys for the backends, the hierarchy, and backend-specific settings. The keys are noted as Ruby symbols with a leading colon:
    # /etc/puppetlabs/code/hiera.yaml
    :backends:
      - yaml
    :hierarchy:
      - node/%{::clientcert}
      - role/%{::role}
      - location/%{::datacenter}
      - common
    :yaml:
      :datadir: /etc/puppetlabs/code/environments/%{::environment}/hieradata
Note that the value of :backends is actually a single-element array. You can pick multiple backends. The significance will be explained later. The :hierarchy value contains a list of the actual layers that were described earlier. Each entry is the name of a data source. When Hiera retrieves a value, it searches each data source in turn. The %{} expression allows you to access the values of Puppet variables. Use only facts or global scope variables here; anything else will make Hiera's behavior quite confusing.
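To make the effect of the %{} expressions more tangible, here is a small Python sketch that turns the hierarchy entries into concrete data source names for one agent. It is an approximation for illustration only: the node name, role, and data center are assumed values, and the substitution rule is a simplification rather than Hiera's actual interpolation code.

```python
import re

# Hierarchy entries as they appear in the example hiera.yaml.
HIERARCHY = [
    "node/%{::clientcert}",
    "role/%{::role}",
    "location/%{::datacenter}",
    "common",
]

# Assumed facts and global variables for one hypothetical agent.
FACTS = {
    "clientcert": "db01.example.net",
    "role": "postgres",
    "datacenter": "dc1",
}

# Matches %{name} and %{::name}; the leading "::" is optional.
_TOKEN = re.compile(r"%\{(?:::)?([^}]+)\}")


def resolve_sources(hierarchy, variables):
    """Replace each %{} token with the value of the named variable."""
    def substitute(match):
        # The sketch fails with a KeyError on unknown variables;
        # Hiera has its own rules for that case.
        return str(variables[match.group(1)])

    return [_TOKEN.sub(substitute, entry) for entry in hierarchy]


if __name__ == "__main__":
    for source in resolve_sources(HIERARCHY, FACTS):
        print(source)
```

Because every agent reports different values, the same four hierarchy entries resolve to a different list of data sources for each node.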
Finally, you will need to include configurations for each of your backends. The configuration above uses the YAML backend only, so there is only a hash for :yaml with the one supported :datadir key. This is where Hiera will expect to find YAML files with data. For each data source, the datadir can contain one .yaml file. As the names of the sources are dynamic, you will typically create many more data source files than the four entries in the hierarchy. Let's create some examples before we have a short discussion on the combination of multiple backends.
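The relationship between data sources and files can be sketched in a few lines of Python. This is a simplified model, not Hiera code: it assumes the production environment for the datadir, and the two database nodes with their roles and data centers are hypothetical.

```python
import posixpath

# The datadir from the example configuration, with %{::environment}
# assumed to resolve to "production".
DATADIR = "/etc/puppetlabs/code/environments/production/hieradata"

# Two hypothetical agents; names, roles, and data centers are made up.
NODES = [
    {"clientcert": "db01.example.net", "role": "postgres", "datacenter": "dc1"},
    {"clientcert": "db02.example.net", "role": "postgres", "datacenter": "dc2"},
]


def data_file(source, datadir=DATADIR, extension="yaml"):
    """Map a data source name to the one file that can hold its data."""
    return posixpath.join(datadir, f"{source}.{extension}")


def files_for(nodes):
    """Collect the distinct data files that a set of nodes can use."""
    files = set()
    for node in nodes:
        sources = (
            f"node/{node['clientcert']}",
            f"role/{node['role']}",
            f"location/{node['datacenter']}",
            "common",
        )
        files.update(data_file(source) for source in sources)
    return sorted(files)


if __name__ == "__main__":
    for path in files_for(NODES):
        print(path)
```

Even with only two agents, the four hierarchy entries already map to six possible files, because the node and location sources differ per machine.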
Storing Hiera data
The backend of your Hiera setup determines how you have to store your configuration values. For the YAML backend, you fill datadir with files that each hold a hash of values. Let's put some elements of the reporting engine configuration into the example hierarchy:
    # /etc/puppetlabs/code/environments/production/hieradata/common.yaml
    reporting::server: stats01.example.net
    reporting::server_port: 9033
The values in common.yaml are defaults that are used for all agents. They are at the broad base of the hierarchy. Values that are specific to a location or role apply to smaller groups of your agents. For example, the database servers of the postgres role should run some special reporting plugins:
    # /etc/puppetlabs/code/environments/production/hieradata/role/postgres.yaml
    reporting::plugins:
      - iops
      - cpuload
On such a higher layer, you can also override the values from the lower layers. For example, a role-specific data source such as role/postgres.yaml can set a value for reporting::server_port as well. The layers are searched from the most to the least specific, and the first value found is used. This is why it is a good idea to have a node-specific data source at the top of the hierarchy. On this layer, you can override any value for each agent. In this example, the reporting node can use the loopback interface to reach itself:
    # /etc/puppetlabs/.../hieradata/node/stats01.example.net.yaml
    reporting::server: localhost
Each agent receives a patchwork of configuration values according to the concrete YAML files that make up its specific hierarchy.
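The search order can be modeled with a short Python sketch. The data is copied from the three example files above, while the node db01.example.net, the role named stats, and the data center dc1 are assumptions made for the sake of the example. The function illustrates the first-match rule only and is not how Hiera is implemented.

```python
# Contents of the three example data files, keyed by data source name.
DATA = {
    "common": {
        "reporting::server": "stats01.example.net",
        "reporting::server_port": 9033,
    },
    "role/postgres": {"reporting::plugins": ["iops", "cpuload"]},
    "node/stats01.example.net": {"reporting::server": "localhost"},
}


def sources_for(node, role, datacenter):
    """List an agent's data sources, most specific first."""
    return [f"node/{node}", f"role/{role}", f"location/{datacenter}", "common"]


def lookup(key, sources, data=DATA):
    """Return the value from the first source that defines the key."""
    for source in sources:
        values = data.get(source, {})  # a missing file is simply skipped
        if key in values:
            return values[key]
    raise KeyError(key)


if __name__ == "__main__":
    # Hypothetical database server: falls back to the common server.
    db = sources_for("db01.example.net", "postgres", "dc1")
    print(lookup("reporting::server", db))
    print(lookup("reporting::plugins", db))

    # The reporting node itself: the node layer overrides the default.
    stats = sources_for("stats01.example.net", "stats", "dc1")
    print(lookup("reporting::server", stats))
```

The database server picks up its plugins from the role layer and the server name from the common layer, whereas stats01.example.net finds its own override before the common layer is ever consulted.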
Don't worry if all this feels a bit overwhelming. There are more examples in this chapter. Hiera also has the charming characteristic of seeming rather complicated on paper, but it feels very natural and intuitive once you try using it yourself.
Choosing your backends
There are two built-in backends: YAML and JSON. This chapter will focus on YAML, because it's a very convenient and efficient form of data notation. The JSON backend is very similar to YAML. It looks for data in .json files instead of .yaml for each data source; these files use a different data notation format.
The use of multiple backends should never be truly necessary. In most cases, a well-thought-out hierarchy will suffice for your needs. With a second backend, data lookup will traverse your hierarchy once per backend. This means that the lowest level of your primary backend will rank higher than any layer from additional backends.
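The consequence of traversing the hierarchy once per backend can be shown with a minimal Python model. The backend contents, the node name web01.example.net, and the reporting::compression key are all invented for this illustration, and the nested loop is a simplification of the lookup order described above rather than Hiera's real code.

```python
# Backends in the order they would be listed under :backends.
# Each maps data source names to the values stored there.
# All contents are hypothetical.
BACKENDS = [
    ("yaml", {
        "common": {"reporting::server_port": 9033},
    }),
    ("json", {
        "node/web01.example.net": {"reporting::server_port": 9999},
        "common": {"reporting::compression": "gzip"},
    }),
]

HIERARCHY = ["node/web01.example.net", "common"]


def lookup(key, backends, hierarchy):
    """Walk the whole hierarchy once per backend; first hit wins."""
    for backend_name, data in backends:
        for source in hierarchy:
            values = data.get(source, {})
            if key in values:
                return values[key], backend_name, source
    return None


if __name__ == "__main__":
    print(lookup("reporting::server_port", BACKENDS, HIERARCHY))
    print(lookup("reporting::compression", BACKENDS, HIERARCHY))
```

In this model, the port comes from the common layer of the first backend even though the second backend holds a node-specific value, which is exactly the ranking effect described above. The second backend only contributes keys that the first one does not define at any level.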
In some cases, it might be worthwhile to add another backend just to get the ability to define even more basic defaults in an alternative location—perhaps a distributed filesystem or a source control repository with different commit privileges.
Also, note that you can add custom backends to Hiera, so these might also be sensible choices for secondary or even tertiary backends. A Hiera backend is written in Ruby, like the Puppet plugins. The details of creating such a backend are beyond the scope of this book.
You have studied the theory of storing data in Hiera at length, so it's finally time to see how to make use of this in Puppet.