Honeybee Server Cluster Provisioning
View Project Source: https://github.com/jessecascio/honeybee
As a developer it's important to focus on rapid, scalable development. When a client or colleague has an idea to work on, the goal is to get a prototype up and running quickly to start testing demand. If the prototype is successful, the infrastructure needs to grow quickly with the application without detracting from development time. Larger teams have more specialty roles, i.e. database admin, server admin, QA/release manager, etc. When working alone, or as a pair, that luxury cannot be afforded. The knowledge of the team needs to be wide, and sacrifices need to be made during development and deployment in order to keep the project moving forward.
Simply put, Honeybee is a templating structure built around the Python Fabric library to organize the shell provisioning of various servers in a cloud cluster. By setting a standard for where provision commands are stored, and placing installation instructions in centralized locations, Honeybee allows installation instructions to be reused across various web application servers. This facilitates rapid, provider-agnostic server provisioning, easy infrastructure expansion, and seamless server migrations.
The Problem
The inspiration for Honeybee came from a combination of Amazon's AWS OpsWorks and Chef. OpsWorks allows Chef cookbooks, which are installation instructions written in Ruby, to be defined for a layer of an application stack. When a new server instance is brought up into that layer, the instance is automatically provisioned as defined by the Chef recipes.
The obvious issue with the OpsWorks approach is the restriction to using Amazon. Amazon really thrives with its various managed services and reliability, but its smallest instances have poorer network performance than the larger ones, and the OpsWorks agent takes about 100-150MB of RAM, which on a micro instance is 10-15% of the available RAM, not an option for a smaller stack.
The next option was to use Puppet or Chef as a standalone service, i.e. have a server in the cluster that handles propagating the instructions to the various nodes, but again this is not practical in a smaller stack. The Chef server cannot be run on a 512MB instance, and installing, configuring, and maintaining a Chef server has a considerable learning curve; when the goal is getting some servers up and running for development, that is not worth the time investment to learn or incorporate into the architecture.
There are also numerous smaller-scale libraries available to aid with provisioning, such as Capistrano, Capifony, Phing, and Fabric. However, provisioning a server generally requires researching how to configure the necessary services, testing in a VM, then applying the instructions to the servers, i.e. all command-line bash. So the patience for learning new syntaxes, languages, tools, and configurations, which detract from the simplicity of running shell commands, is limited.
Python Fabric
Fabric is a Python library for running shell commands across multiple servers in a cluster. Fabric will SSH into a defined hosts list and run the specified bash commands. Outside of brushing up on Python, which is very similar to PHP and JavaScript, there was very little learning curve to use the library and all the server instructions can be executed as shell commands.
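For context, a minimal fabfile along these lines (the host IP and task name are made up for illustration) is all that is needed to run a command across a group of servers:

from fabric.api import env, roles, run, task

# hypothetical host list; replace with real server IPs
env.roledefs = {
    'web': ['192.0.2.10'],
}

@task()
@roles('web')
def check_uptime():
    # Fabric SSHes into every host in the 'web' role and runs this shell command
    run('uptime')

Running fab check_uptime (or beekeeper check_uptime with the alias set up below) connects to each web host over SSH and prints its uptime.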
Honeybee
As mentioned before, Honeybee is just a template built around Fabric. There are various tasks that are common across different application stacks, and Honeybee strives to centralize these commands so that they can be reused on future projects. By defining how a server is provisioned programmatically, when another server needs to be brought up it can be done using the exact same instructions, ensuring servers of similar types are configured the same.
Another benefit of having all the provisioning saved and mapped out is that if the stack needs to be migrated to another cloud provider or OS, all the configurations, settings, and services will be installed automatically onto the new application stack, or at least be readily available to aid in the migration.
Getting Started
Honeybee requires both Python 2.7 and Fabric:
sudo apt-get install python fabric
Clone the code and add the directory to the Python path:
vi ~/.bashrc

alias beekeeper='fab'
export PYTHONPATH=$PYTHONPATH:/path/to/honeybee
OPTIONAL: Setting an alias to the Fabric command is not required, just done to keep the Honeybee theme going for this write up. The fab command can be run directly instead.
# reload to enforce
source ~/.bashrc

# verify
beekeeper --version
echo $PYTHONPATH
Before proceeding, there should be a basic understanding of how Fabric works. Take the time to read through the Fabric tutorial to get an idea of how the tool is used.
Folder Structure
At the root of the Honeybee directory there are three folders, applications, src, and test. Applications is where the various application instructions live, src is where the bootstrapping files and shared utility files are located, and test currently just holds a Vagrantfile for a local cluster.
src
src is broken into different server types. Currently only Ubuntu has any instructions, but this is where additional server types would go. The utility folder holds shared functionality used throughout the applications that is not reliant on the server type. There is only a single file in there, common.py, as there is not much shared functionality yet, but as more functionality is added, more granular files will be created.
Hive
Hive is the Python script for generating the application folder structure. Since Honeybee is just a template system built around Fabric, it's important to keep folders and files consistent among projects for ease of use. Hive will copy files and update many of the configs to minimize additional work the developer has to do.
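Putting the pieces together, the root of the repository looks roughly like this (an approximate sketch based on the description above, not an exact listing):

honeybee/
    applications/      - generated application stacks (e.g. lamp)
    src/
        ubuntu/        - Ubuntu-specific installation instructions
        utility/
            common.py  - shared helpers that are not tied to a server type
    test/              - Vagrant configuration for a local test cluster
    hive.py            - generates the application folder structure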
LAMP Cluster
Let's start with a two server cluster for setting up two PHP websites, applicationsite.com and samplesite.com. The first server, web, will use Apache2 to serve the sites, and the second server, mysql, will run the MySQL server. The source code for this example can be found on GitHub. To create the lamp application with two servers, web and mysql, use hive.py:

./hive.py generate app lamp
./hive.py generate server lamp web
./hive.py generate server lamp mysql
NOTE: If applications/lamp dir already exists, the above commands will throw an error, in which case continue reading.
Now the source files for the lamp project are located in the applications folder, along with two servers. At the root of the lamp directory is the fabfile.py, which is the file Fabric uses to run commands. It should be minimal, only defining functionality that applies to the whole application stack: IP addresses, users, etc. In each of the server directories, servers/web and servers/mysql, there is a honeycomb.py file which is the "controller" for how the servers are provisioned.
The honeycomb.py file only contains and exports three functions, which are also Fabric tasks, as denoted by @task(): plant, pollinate, and harvest:
plant - This is the initial provision of the server when it is first started. Everything needed to set up the server that does not need to happen more than once should be part of this function: installing packages, setting up users, creating the base folder structure, defining mounts/swap/firewall rules, etc.
pollinate - This function is for importing configurations that may change, such as database or web server configurations. Actions in this function should be non-destructive, i.e. avoid service restarts or anything else that would cause downtime, as it is meant to be run multiple times as configurations change.
harvest - This is the actual deployment process. If another tool is used for deployment, this can be ignored, but as a simple use case this function could pull code to a server, run unit tests, and merge.
Additional helper functions can be added to the honeycomb files, but they should not be exported in order to keep the fab list minimal.
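A freshly generated honeycomb.py is therefore little more than a skeleton along the following lines (a rough sketch, not the exact generated file):

from fabric.api import roles, task

@task()
@roles('web')
def plant():
    # one-time provisioning: packages, users, folders, swap, firewall rules
    pass

@task()
@roles('web')
def pollinate():
    # repeatable, non-destructive configuration pushes
    pass

@task()
@roles('web')
def harvest():
    # deployment steps, if not handled by another tool
    pass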
In situations where specialty tasks are needed, say to SSH in and run a script, those should be defined in lamp/tasks; this helps keep the function list consistent for all server types. Each server will have plant, pollinate, and harvest, and that is it. Specialty tasks would then be denoted by task.run_script, etc.
Web Provision
In the web server honeycomb file, update the plant function to do all the necessary initial provisioning:
NOTE: Depending on the host provider, not all of these steps may be relevant. This example is written for provisioning on DigitalOcean.
# import needed src files
from src.ubuntu.apt import *
from src.ubuntu.git import *
from src.ubuntu.apache import *
from src.ubuntu.php import *
from src.ubuntu.iptables import *

@task()
@roles('web')
def plant():
    # (1) enable swap
    swap('2G')

    # (2) set up ssh rules
    scp('templates/sshd_config', '/etc/ssh/sshd_config', 'root', '644')
    sudo('service ssh reload')

    # (3) set the iptables
    iptables_web()

    # (4) install services
    apt_update()
    git()
    apache2()
    php55()

    # (5) set default apache configurations
    sudo('a2dismod status')
    sudo('a2enmod headers')
    sudo('a2enmod rewrite')
    sudo('service apache2 restart')

    # (6) pull code
    git_clone('git@github.com:symfony/symfony.git', '/var/www/symfony', 'www-data', '750')
- Set up the swap space, since DigitalOcean does not offer any by default. SEE: src/utility/common.py#swap()
- For security reasons it's good practice to disable root SSH access and require an SSH key. By using the scp() function, local config files are pushed to the server. Get a fresh copy of what an sshd_config should look like and place it in the lamp/templates folder, as it will be used with all server types. Update it to disable root access and require SSH keys (see the snippet after this list). This also ensures all servers in the stack will have the same SSH rules. SEE: src/utility/common.py#scp(), applications/lamp/templates/sshd_config
- Update the iptables, i.e. the firewall settings, restricting the server to what a web server needs. At a minimum, a web server should only allow access via 80/443 for web and 22 for SSH, as well as the port for git, 9418. Currently there are some generic iptables rules set up in the src/ubuntu dir, but these should be tailored to custom needs. SEE: src/ubuntu/iptables.py
- Update Ubuntu and install the base services. SEE src/ubuntu
- Do some custom Apache configurations, turn on modules, etc.
- Pull the code to the server. This does not have to be done here, but remember the pollinate function is meant to be run multiple times, and the harvest is for deploying, whereas the code base only needs to be cloned once. SEE src/ubuntu/git.py#git_clone()
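The sshd_config changes referenced in step (2) are standard OpenSSH directives; a minimal hardening of the template might be:

# disable root logins and password authentication, allow keys only
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes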
These are all the configurations that are needed for the web server provisioning. In the pollinate function we will place actions that could happen more than once, such as configuration changes. Apache and PHP custom configurations will go there:
from os import walk

@task()
@roles('web')
def pollinate():
    # (1) set up the php config files
    rsync(templates + '/php/', '/etc/php5/mods-available/', 'root', '644')

    # (2) set up the apache config files
    scp(templates + '/apache/apache2.conf', '/etc/apache2/apache2.conf', apache)
    scp(templates + '/apache/security.conf', '/etc/apache2/conf-available/security.conf', apache)
    scp(templates + '/apache/mpm_prefork.conf', '/etc/apache2/mods-available/mpm_prefork.conf', apache)

    # (3) move the vhosts and enable them
    rsync(templates + '/apache/vhosts/', '/etc/apache2/sites-available/', apache, '644')
    for (dirpath, dirnames, filenames) in walk(templates + '/apache/vhosts/'):
        for file in filenames:
            sudo('a2ensite %(file)s' % {'file': file})

    # (4) reload server
    sudo('service apache2 reload')
- The custom PHP files all live in the same location and can therefore be moved up in batch via the rsync() function. SEE: src/utility/common.py#rsync()
- Each of the Apache conf files is moved separately since they live in different locations
- Move all the Apache vhost files up, and recursively enable them
- Reload the server
Now whenever pollinate is called, all of the service configurations will be up to date. Whether they have been changed, or new ones are added, the server is never restarted to ensure no down time on configuration updates. As new web servers are brought up, we can ensure they all have the exact same base configurations.
The final function, harvest, will not be set up here as deployment procedures vary quite a bit from project to project, but a basic flow (sketched after this list) would be:
- Pull code base into a temp directory and run unit tests
- Update composer dependencies
- Merge code into working folder
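As a rough sketch only, assuming the Symfony repository from the plant step and that composer and phpunit are available on the server, such a harvest could look like:

from fabric.api import cd, roles, run, sudo, task

@task()
@roles('web')
def harvest():
    # (1) pull the code base into a temp directory
    run('rm -rf /tmp/harvest && git clone git@github.com:symfony/symfony.git /tmp/harvest')

    with cd('/tmp/harvest'):
        # (2) update composer dependencies, then run the unit tests
        run('composer install')
        run('./vendor/bin/phpunit')

    # (3) merge the tested code into the working folder
    sudo('rsync -a --delete /tmp/harvest/ /var/www/symfony/')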
MySQL Provision
The MySQL provision will be very similar to the web provision, with the difference of the SSH tunnel. Typically in a server cluster, the only outward-facing servers, i.e. open to internet traffic, would be the web servers. All internal servers should be closed off and only allow access from specific servers in the cluster. This makes provisioning a bit tricky, as the database server should not be directly accessible. This is where the SSH tunnel comes into play.
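The actual helpers live in src/utility/common.py (tunnel() and detunnel()); purely as an illustration of the mechanism, and with the user name and flags as assumptions, a local port forward through the web server could be opened and closed like this:

from fabric.api import local

def open_tunnel(web_public_ip, local_port, mysql_public_ip, user='jesse'):
    # forward 127.0.0.1:<local_port> on the workstation, through the web
    # server, to port 22 on the mysql server
    local('ssh -f -N -L %(port)s:%(mysql)s:22 %(user)s@%(web)s' % {
        'port': local_port, 'mysql': mysql_public_ip, 'user': user, 'web': web_public_ip})

def close_tunnel(local_port):
    # kill the background ssh process holding the forward open
    local("pkill -f 'ssh -f -N -L %s:'" % local_port)

With the tunnel open, Fabric simply treats 127.0.0.1:2024 as the mysql host, which is why the mysql roledef later in the fabfile uses that address.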
from src.ubuntu.apt import *
from src.ubuntu.mysql import *
from src.ubuntu.iptables import *

@task()
@roles('mysql')
def plant():
    web_pub   = env.roledefs['web'][0]
    web_pri   = env.private['web'][0]
    mysql_pub = env.tunnel['mysql'][0]
    mysql_pri = env.private['mysql'][0]

    # (1) set up tunnel info
    tunnel(web_pub, '2024', mysql_pub)

    # add swap
    swap('2G')

    # set up ssh rules
    scp('templates/sshd_config', '/etc/ssh/sshd_config', 'root', '644')
    sudo('service ssh reload')

    # (2) install services
    apt_update()
    mysql56(env.database_password)

    # (3) default db/users
    run('mysql -uroot -p%(pwd)s -e "CREATE DATABASE datahouse"' % {'pwd': env.database_password})
    run('mysql -uroot -p%(pwd)s -e "CREATE USER \'rick\'@\'%(ip)s\' IDENTIFIED BY \'supersecret\'"' % {'pwd': env.database_password, 'ip': web_pri})
    run('mysql -uroot -p%(pwd)s -e "GRANT ALL PRIVILEGES ON datahouse.* TO \'rick\'@\'%(ip)s\'"' % {'pwd': env.database_password, 'ip': web_pri})
    run('mysql -uroot -p%(pwd)s -e "FLUSH PRIVILEGES"' % {'pwd': env.database_password})

    # (4) default db settings
    template(templates + '/my.cnf', '/etc/mysql/my.cnf', {'%%BIND-IP%%': mysql_pri}, 'root', '644')
    sudo('service mysql restart')

    # (5) build allowable ips to access server, set iptables
    ips = env.roledefs['web'] + env.private['web']
    iptables_database(ips)

    # (6) kill tunnel
    detunnel('2024')
- In order to set up a tunnel, the SSH connection is run through an "open" server such as a web server. The tunnel needs the web server's public IP, the local port to attach, and the public IP of the mysql server. When tunneling, the roledef of the tunneled server looks like 127.0.0.1:2024, and there is an additional env dict for the public IP addresses of the tunneled servers. SEE: src/utility/common.py#tunnel(), applications/lamp/fabfile.py
- When installing the MySQL service, a default password is passed into the install, which is defined in config/config.ini. In the fabfile.py located in applications/lamp there is a config() call which loads all the vars from the config file into the env dict using the subsection_option naming convention (see the example after this list).
- Create the default database users, only allowing access from the webserver's private IP. If there were multiple web servers there would have to be more advanced access control.
- Upload the default MySQL conf settings. The template() function is used as opposed to scp() so that the conf file is correctly updated with the MySQL IP address. SEE: src/utility/common.py#template()
- The iptables for the database should only allow access to the database and SSH ports, and only from specific IPs, i.e. the web servers' IPs. SEE: src/ubuntu/iptables.py
- Remove the SSH tunnel
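The exact contents of config/config.ini are not reproduced here, but following the subsection_option convention, a database password entry would look something like this (placeholder values):

[database]
password = supersecret

After the config() call in the fabfile, that option is then available as env.database_password, which is what the plant function above passes to the MySQL install.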
For the pollinate function, just push the database configuration file so new configurations can be applied to the server:
@task()
@roles("mysql")
def pollinate():
    # set up tunnel info
    tunnel(env.roledefs['web'][0], '2024', env.tunnel['mysql'][0])

    # reload the conf file
    template(templates + '/my.cnf', '/etc/mysql/my.cnf', {'%%BIND-IP%%': env.private['mysql'][0]}, 'root', '644')
    sudo('service mysql reload')

    # kill tunnel
    detunnel('2024')
The harvest function would likely not be needed with a database server.
Now running beekeeper --list (i.e. fab --list) from the application directory will show all the available provisioning functions:
cd applications/lamp
beekeeper --list

Available commands:

    mysql.harvest
    mysql.plant
    mysql.pollinate
    web.harvest
    web.plant
    web.pollinate
Testing
In order to run the testing, Vagrant must be installed.
In the test/lamp dir, update the Vagrantfile and make sure there are two servers under the VM Configs section:
config.vm.define "web" do |web|
  web.vm.network "forwarded_port", guest: "80", host: "1028", host_ip: "127.0.0.1"
  web.vm.network "private_network", ip: "10.2.2.2"
end

config.vm.define "mysql" do |mysql|
  mysql.vm.network "forwarded_port", guest: "3306", host: "1032", host_ip: "127.0.0.1"
  mysql.vm.network "private_network", ip: "10.2.2.4"
end
After the Vagrant boxes have been brought up, a couple of things should be done to make provisioning easier (the default Vagrant password is vagrant).
# copy key to vagrant boxes
ssh-copy-id vagrant@10.2.2.2
ssh-copy-id vagrant@10.2.2.4

# copy a github key to the server
scp /home/jesse/.ssh/id_rsa vagrant@10.2.2.2:/home/vagrant/.ssh
The lamp/fabfile.py also needs to be updated to reflect the IP addresses:
env.roledefs = {
    'web'   : ['10.2.2.2'],
    'mysql' : ['127.0.0.1:2024'],
}

env.private = {
    'web'   : ['10.2.2.2'],
    'mysql' : ['10.2.2.4'],
}

env.tunnel = {
    'mysql' : ['10.2.2.4']
}
roledefs - The public IP addresses of the different server types. If tunneling, use the local IP address and port used with the tunnel
private - The private IP addresses of the servers
tunnel - If tunneling is used, this is the actual public IP address of the tunneled machine
Once the configurations are set, the servers can be provisioned.
NOTE: Monitor the output; there will be a prompt when the provisioning attempts to clone the code base.
beekeeper web.plant --user=vagrant
beekeeper web.pollinate --user=vagrant
beekeeper mysql.plant --user=vagrant
beekeeper mysql.pollinate --user=vagrant
Both servers are now provisioned; SSH in and verify all the settings, services, and configs.
NOTE: More automated testing could be done, similar to the way Test Kitchen works, using bats or another similar library.
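As a lightweight alternative, a hypothetical verification task (not part of Honeybee) could run a few smoke checks over SSH once provisioning finishes:

from fabric.api import roles, run, task

@task()
@roles('web')
def verify():
    # hypothetical smoke checks for the web role
    run('php -v')                   # PHP is installed
    run('service apache2 status')   # Apache is running
    run('wget -qO- http://127.0.0.1 > /dev/null && echo "default vhost responding"')

Because Fabric aborts when a remote command returns a non-zero exit code, any failing check stops the run and points at the broken service.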
Going Live
Now that the instructions are set up and tested, the servers are ready to go live. This example will use DigitalOcean droplets, with an afterthought for Amazon's AWS.
Log into a DigitalOcean account and create two 512MB Ubuntu 14.04 droplets, lamp.web and lamp.mysql, in the same region. Be sure to enable Private Networking when setting the droplet options.
Unlike Amazon, DigitalOcean gives full root access to each droplet. This is useful but also a hazard. The first thing to do with the DO droplets is to create a user with sudo access to limit the use of the root user. This will be done manually: SSH into each of the boxes and create the user (replace with the correct IP addresses):
# server 1
ssh root@104.236.73.117

# server 2
ssh root@104.236.73.118

adduser jesse
adduser jesse sudo

# prevent password
echo "jesse ALL=(ALL:ALL) NOPASSWD:ALL" | sudo tee -a /etc/sudoers
NOTE: For AWS users, Amazon does not give root access, just a sudo user, ubuntu. There is no reason to create a new user; simply run the fab file as the ubuntu user. There is also no need to set up swap or configure the iptables, as that is handled differently in AWS, i.e. security groups. Those instructions can be removed from the honeycomb files.
After creating the users, from the local machine copy the SSH key and the github key to the servers:
# copy ssh key to both servers
ssh-copy-id jesse@104.236.73.117
ssh-copy-id jesse@104.236.73.118

# copy github key to web
scp /home/jesse/.ssh/id_rsa jesse@104.236.73.117:/home/jesse/.ssh
NOTE: This is a perfect example of a task. It is an action that has to be done on each server, so add a Python file to lamp/tasks with a @task() for adding the user with the correct permissions and keys, as sketched below.
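As a sketch of what that could look like (the file name, user name, and key path are assumptions), a file under lamp/tasks might contain:

from os.path import expanduser

from fabric.api import put, run, task

@task()
def create_user(username='jesse'):
    # run once per fresh droplet, connected as root
    run('adduser --disabled-password --gecos "" %s' % username)
    run('adduser %s sudo' % username)
    run('echo "%s ALL=(ALL:ALL) NOPASSWD:ALL" >> /etc/sudoers' % username)

    # push the local public key so later runs can authenticate as the new user
    run('mkdir -p /home/%s/.ssh' % username)
    put(expanduser('~/.ssh/id_rsa.pub'), '/home/%s/.ssh/authorized_keys' % username)
    run('chown -R %(u)s:%(u)s /home/%(u)s/.ssh' % {'u': username})
    run('chmod 600 /home/%s/.ssh/authorized_keys' % username)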
Update the lamp/fabfile.py with all the correct IP addresses:
env.roledefs = {
    'web'   : ['104.236.73.117'],
    'mysql' : ['127.0.0.1:2024']
}

env.private = {
    'web'   : ['10.132.238.232'],
    'mysql' : ['10.132.139.57']
}

env.tunnel = {
    'mysql' : ['104.236.73.118']
}

And provision:
beekeeper web.plant --user=jesse
beekeeper web.pollinate --user=jesse
beekeeper mysql.plant --user=jesse
beekeeper mysql.pollinate --user=jesse
Conclusion
What Honeybee offers is quick, simple cloud cluster provisioning. I created it to aid in the rapid deployment of applications to web farms. Compared with the various enterprise-level deployment options such as Chef, Puppet, and Docker, Honeybee strives to be basic, with a minimal learning curve, in order to keep the focus on development.
Although it's fairly basic at this point, my goal with this tutorial was to outline how other server types and configurations could be added to make the tool more robust for various application types.