Cloud == ?
Friday, July 20, 2012
Moved to mattoncloud.org
I've moved my blogging to Matt On Cloud. Things will be quiet here now, so please come visit me on the new site!
Thursday, August 11, 2011
Java EE6 / CDI running in OpenShift
Java EE6 running in JBoss AS7 brings a lot of interesting new technology to the table. It brings the capabilities of EJB into a dependency injection model with a ton of simplifications. Think Ruby on Rails simplicity coupled with the power of Java EE... Add to that OpenShift's ability to build and deploy code server side, and you can start writing an EE6 CDI application with the ease of writing a PHP or Rails application. The syntax is even simplified to the point where I am actually able to get a decent amount of work done in vim. Don't believe me? Well, walk through an example with me and I'll let you decide for yourself.
Also, if you'd rather watch a screencast on this, check out Pete Muir's excellent one on the topic:
http://www.vimeo.com/27605997
Step 1. Register on OpenShift
A free account on OpenShift will provide you with a free runtime environment for this demo. Yep, a free EE6 runtime environment that is publicly accessible. Cool, huh? First, sign up for a new account. You'll need to validate your email address after registering - just click the link in the email you get. After that, you should get an email once you have been granted access. Should be quick - we are sending people through as fast as we can.
For some help setting up the client tools, you can check out the following screencasts that walk you through the process for various operating systems:
http://vimeo.com/27478061
http://vimeo.com/27444290
http://vimeo.com/27493566
Step 2. Create your domain
A domain will be used in your URL. I wanted http://<app>-oncloud.rhcloud.com so I ran:
rhc-create-domain -n oncloud -l <my-email>
Step 3. Create a JBoss application template
I want my URL to be http://seamrocks-oncloud.rhcloud.com so I ran:
rhc-create-app -a seamrocks -t jbossas-7.0
cd seamrocks
At this point, you will have our 'Hello World' application running. It's a very simple, Maven-based application. You will see a pom.xml that specifies how to build and deploy the application as well as a 'src' directory that contains the application structure.
Step 4. Switch your application to the Seam Booking Quickstart
We maintain a lot of OpenShift quickstart applications out on GitHub. This gives people an easy way to get fairly sophisticated applications running in minutes. Here is how to get the Seam Booking example running (also documented in the GitHub readme):
git remote add upstream -m master git://github.com/openshift/seambooking-example.git
git pull -s recursive -X theirs upstream master
git push
Step 5. Wait for it...
Now is where the really cool part happens. You just pushed up a bunch of Java source files with a POM file. OpenShift will detect this and automatically kick off your Maven build. Be warned: downloading all the dependencies and doing the build will take about 20 minutes the first time you build. After the dependencies are downloaded, subsequent builds take just seconds. Kick back, grab a cup of coffee and wonder what you are going to do with that old build system...
When the build is done, your application will be automatically deployed and running. In my case, that application is running at http://seamrocks-oncloud.rhcloud.com.
Step 6. Fire up your text editor
That's right - let's actually do some Java editing with just a text editor! To keep this simple, we will just change something that requires a new build to be visible. You could just as easily change a JSF template, or one of the Java files - it all works the same. In this case, we are going to change the message bundle to put our mark on the demo app.
Open up your favorite editor and load the file src/main/resources/messages.properties. Next, change the following text in the file:
from:
home_header=About this example application
to:
home_header=MY APP!!
Now save the file and close it. Lastly, let's commit and push the change:
git commit -a -m "Making my mark"
git push
Gotchas
One gotcha is that while JBoss is deploying, an error response (or 404) can sometimes be cached at our proxy and you'll see an erroneous error when you hit your app in the browser. We're working on correcting this, but in the meantime, if you hit it just run:
rhc-ctl-app -a <app> -c restart
Also, if something strange is happening, you can always tail the server logs with:
rhc-tail-files -a <app>
What now? Read more about OpenShift at http://red.ht/nP7D2t
Thursday, July 28, 2011
Selenium WebDriver + Linux (Headless) + Ruby + Jenkins == Awesome
Continuous Integration is a big part of our development process in OpenShift. In this post, I'll talk about our setup to allow us to run automated Selenium tests for our website on a headless Linux machine, integrated with Jenkins.
For our web testing, we use the brand new Selenium WebDriver, integrated into Jenkins with our tests in Ruby. The trick is that we are running Jenkins and the Selenium tests on headless (i.e. no graphical desktop) machines, but we still need to launch a real browser like Firefox, run the tests and capture screenshots if anything goes wrong. The following setup is working like a charm for us, so I figured I would share:
Step 1 - Install Prerequisites
These prereqs are for a Fedora or RHEL 6.x machine (this was tested on RHEL 6, 6.1 and Fedora 14). For RHEL, make sure you are subscribed to the Optional channel if you are using RHN to get the Ruby dependencies like rubygems. For Fedora 14 & 15, I had some issues building the ffi rubygem, so make sure to install the latest ffi rubygem from Koji.
Fedora and RHEL Prereqs:
sudo yum -y install make rubygems ruby-devel xorg-x11-font* wget
Fedora 14 only:
32 bit:
sudo yum install --nogpgcheck -y http://kojipkgs.fedoraproject.org/packages/rubygem-ffi/1.0.9/2.fc14/i686/rubygem-ffi-1.0.9-2.fc14.i686.rpm
64 bit:
sudo yum install --nogpgcheck -y http://kojipkgs.fedoraproject.org/packages/rubygem-ffi/1.0.9/2.fc14/x86_64/rubygem-ffi-1.0.9-2.fc14.x86_64.rpm
Fedora 15 only:
32 bit:
sudo yum install --nogpgcheck -y http://kojipkgs.fedoraproject.org/packages/rubygem-ffi/1.0.9/2.fc15/i686/rubygem-ffi-1.0.9-2.fc15.i686.rpm
64 bit:
sudo yum install --nogpgcheck -y http://kojipkgs.fedoraproject.org/packages/rubygem-ffi/1.0.9/2.fc15/x86_64/rubygem-ffi-1.0.9-2.fc15.x86_64.rpm
Step 2 - Install Xvfb and Firefox
This essentially allows a graphical display to run on your headless node. It's lightweight though, so you don't have to install GNOME or a full desktop environment. This will also install Firefox so you have a browser to launch. I prefer this to the Xvnc-based environments.
sudo yum -y install xorg-x11-server-Xvfb firefox
Step 3 - Install Selenium WebDriver
These rubygems package up the Ruby bindings to the WebDriver libraries and let you write your Selenium tests in Ruby without running SeleniumRC. SeleniumRC has treated us well for many years, but WebDriver is going to be that much better.
sudo gem install selenium-webdriver
Step 4 - Install Headless
This allows your tests to directly spawn a headless display automatically and tear it down at the end of the tests. With this gem, you don't need any special Jenkins plugins to manage the display on each test run.
sudo gem install headless
Step 5 - Write Your First Test
Put this in a file called openshift_test:
#!/usr/bin/env ruby
require 'rubygems'
require 'headless'
require 'selenium-webdriver'
# Create a headless display
headless = Headless.new
headless.start
# Create a firefox Selenium driver
driver = Selenium::WebDriver.for :firefox
# Navigate to OpenShift
puts "Navigating to OpenShift"
driver.navigate.to 'http://openshift.redhat.com'
# Find an element on the page
puts "Finding an element on the page by id"
element = driver.find_element(:id, 'app_promos')
puts element
# Now, save a screenshot
driver.save_screenshot("openshift.png")
# Clean up the browser and the display
driver.quit
headless.destroy
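One optional tweak: if the page is slow to render, find_element can fail before the element shows up. A minimal variation is to swap the find_element line above for an explicit wait from the same gem (the 10 second timeout is just an arbitrary choice):
# Wait up to 10 seconds for the element to appear instead of failing immediately
wait = Selenium::WebDriver::Wait.new(:timeout => 10)
element = wait.until { driver.find_element(:id, 'app_promos') }
That keeps the occasional slow page load from turning into a false failure once this is running in CI.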
Step 6 - Run Your Test
Assuming the file is saved in the current directory and named openshift_test, run:
chmod 755 openshift_test
./openshift_test
You should see some output and a png image of the screenshot in the current directory as well.
Step 7 - Install Jenkins
Now, let's install Jenkins and fire it up. We'll install Jenkins from their yum repo:
sudo wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo
sudo rpm --import http://pkg.jenkins-ci.org/redhat/jenkins-ci.org.key
sudo yum -y install java-1.6.0-openjdk jenkins
sudo service jenkins start
sudo chkconfig jenkins on
Jenkins will now be running on your hostname, port 8080. For example: http://localhost:8080. If you are accessing it remotely, make sure 8080 is allowed on your firewall.
Step 8 - Setup Jenkins
Now, let's make Jenkins run this thing. You'll just need to create a new Jenkins job that runs the file you just created and archives the screenshots. Since lots of screenshots can eat up space, we usually use the Advanced settings and only keep archived screenshots for 10 days or so.
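For the job to actually go red when something breaks, the test needs to exit non-zero. Here's a rough sketch of one way to wire that up (the rescue structure and the failure.png name are just an example, pick whatever convention you like): capture a screenshot of the failure for Jenkins to archive, then fail the build.
#!/usr/bin/env ruby
require 'rubygems'
require 'headless'
require 'selenium-webdriver'
headless = Headless.new
headless.start
driver = Selenium::WebDriver.for :firefox
begin
  driver.navigate.to 'http://openshift.redhat.com'
  driver.find_element(:id, 'app_promos')
rescue StandardError => e
  # Save what the browser was showing so Jenkins can archive it, then fail the build
  driver.save_screenshot("failure.png")
  puts "Test failed: #{e.message}"
  exit 1
ensure
  # Always clean up the browser and the display
  driver.quit
  headless.destroy
end
Point the job's artifact archiving at *.png and the failure screenshot shows up right on the broken build.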
Conclusion
Hope this helps some people out. Here are links to the projects referenced above, plus more info on the Ruby WebDriver project:
Projects:
- Selenium WebDriver - http://code.google.com/p/selenium
- WebDriver RubyGem - http://rubygems.org/gems/selenium-webdriver
- Headless RubyGem - http://github.com/leonid-shevtsov/headless
- Jenkins - http://jenkins-ci.org
Reference Info:
- WebDriver Getting Started - http://code.google.com/p/selenium/wiki/GettingStarted
- WebDriver Ruby Bindings - http://code.google.com/p/selenium/wiki/RubyBindings
Wednesday, June 29, 2011
Because Everyone Loves HTML5...
Since HTML5 is the hot thing these days, I figured I'd better get some hands-on practice. However, let me first explain why I think HTML5 is a key technology. Browser-based technology is becoming more and more pervasive. We spend a significant portion of our time each day in browsers, whether we are on our desktop, laptop, tablet or phone. However, there is still a fairly stark contrast between a browser-based application and a true native or 'rich' application. Browser applications typically have limitations such as only working when you are connected, having minimal video and graphics rendering support, and offering limited data storage options. Flash and other native additions to browsers have been built to try and compensate for these shortcomings, but they are non-standard and usually closed systems that only cover a subset of browsers or operating systems. Oh yeah, and Steve Jobs appears to not like Flash...
Okay, you understand - browsers are the limiting factor today. However, the HTML5 specification changes all of that. Once browsers have full support for the various HTML5 components, you will be able to manage SQL-like data storage, render complicated graphics (SVG, 2D / 3D drawing, and video), operate offline, and even provide context-based information with geolocation. Browsers are quickly going to become our client application platforms.
So what do we need all this cloud stuff for then? Well, I agree that some of the traditional workload is going to shift from server to client given the new browser capabilities. However, I think there is also going to be a drastic shift in application development away from the 'native' approach. What's on your desktop today is going to be running in your browser tomorrow. Also, HTML5 isn't going to be the only driver in this movement. Native mobile apps are still very popular (given the limited computing power) and they are all going to depend on that same server infrastructure. In other words, I think the server-side infrastructure supporting all these new applications is going to grow. And it's going to grow by a lot.
This is where the cloud and specifically Platform as a Service (PaaS) comes in. The cloud provides utility-like resources on demand - a great way to quickly get all those servers you need. A Platform as a Service builds on that capability and further abstracts you from a lot of the traditional operational management you have to do. While far from a standardized service, a really good Platform as a Service should make you really efficient at both developing code and getting it to production.
Have I convinced you? Well then let's get cookin'! Before you get into all the whiz-bang features of HTML5, it starts with the basics - markup and hosting. Since it's 2011, I'm going to use a PaaS for my hosting, and even though I'm going to use all this newfangled HTML5 markup, I still need it to work in IE{6,7,8} and look decent. This leads me to HTML5 Boilerplate and OpenShift. HTML5 Boilerplate is going to make my next-gen development work in the browsers of today. OpenShift is going to be my PaaS. I'll use the OpenShift PHP runtime for this example, but this post is generally applicable to all the OpenShift runtimes (Perl, Ruby, Python, etc.).
Step 1. Register on OpenShift
A free account on OpenShift will provide you with a free runtime environment for this demo. First, sign up for a new account. You'll need to validate your email address after registering - just click the link in the email you get. Once your account is validated and you've gotten the note that you've been approved, you're off to the races - or more accurately, Step 2.
Step 2. Create your domain
A domain will be used in your URL. I wanted http://<app>-nextgen.rhcloud.com so I ran:
rhc-create-domain -n nextgen -l <my-email>
Step 3. Create your application
I'll use PHP for this example and I wanted my URL to be http://html5-nextgen.rhcloud.com so I ran:
rhc-create-app -a html5 -t php-5.3
Step 4. Merge in HTML5 Boilerplate
Now I've got an html5 directory in my current directory. Time to pull in HTML5 Boilerplate. In short, this project has all the fanciness to get you off to the races with HTML5, including some chance of supporting older browsers as well. I want to pull the latest HTML5 Boilerplate into the 'php' directory that OpenShift sets up. Time for some git magic:
# Go into your app directory - mine is named 'html5'
cd html5
# Now get the HTML5 Boilerplate content
git remote add boilerplate git://github.com/paulirish/html5-boilerplate.git
git fetch boilerplate
git read-tree --prefix=php/ -u boilerplate/master
git commit -a -m "Merging in HTML5 Boilerplate"
Step 5. Switch over index.php
First, let's switch over to use the index.php file so we can use a php function for the example.
cd php
cp index.html index.php
git rm index.html
git commit -a -m "Switching over to use index.php"
Now, open up index.php and change the section after '<div id="main" role="main">' to add '<?php phpinfo(); ?>':
<div id="main" role="main">
<?php phpinfo(); ?>
</div>
Don't forget to commit:
git commit -a -m "Added some fancy php info"
Step 6. Publish
Now, let's see how easy publishing changes can be.
git push
Yep, that's it - open your browser to your URL. Not convinced it's working? Highlight something. Yep, hot pink :) You are officially an OpenShift / HTML5 Boilerplate user now.
Experiment some on your own while I write my next blog post. Next one will be about creating something a little more involved with this base setup.
Referenced Projects
- HTML5 Boilerplate - http://github.com/paulirish/html5-boilerplate
- OpenShift - http://openshift.redhat.com
Tuesday, June 28, 2011
Cloud, Packaging and State (Part 1 - Embrace RPM's)
Well, let's first discuss which side of the Platform as a Service you are on. In this post I'm going to talk about the people building and running the Platform as a Service and why technologies like RPM's are still very, very relevant. However, a brief detour for the users of a PaaS. As a developer on OpenShift I don't want my users to have to worry about RPM's. Heck, I don't want them to have to worry about WARs, EARs, SARs, Gems or Eggs either. I want our users to spend as much time working with the thing they created - their source code. That is where our users put their creativity and hard work and I want to remove as many barriers as possible in their road from source code to running project. To our users I say 'You are correct, you will not need RPM's in our cloud!!' (crowd cheers...)
Now, onto the main focus group of this article - those maintaining a Platform as a Service or similar supporting infrastructure in the cloud. Server-side cloud technologies like Infrastructure as a Service (IaaS) bring a tremendous amount of power, but they also change some design patterns. Today's infrastructures need to be designed to respond. You no longer stand up 2 servers and throttle traffic. You spin up as many servers as you need and respond to your changing needs. If done well, your computing components appear very fluid.
Let's face it, this is seldom done well and it's a hard problem. One of the most common things I see go wrong is that people lose track of the underlying state of the system, and fluidity quickly turns to chaos. The underlying architectures and processes needed to support a truly fluid infrastructure actually have to be even more disciplined than they have ever needed to be in the past. Knowing the exact state at any point in time is critical to the consistency of a cloud. It lets you make all the management decisions in a very dynamic manner - whether it's applying routine updates or coordinating a new major release.
In general, I break state into three sections: what developers control, what operations controls, and what the end users control. Your understanding of each of these areas is extremely critical to being able to manage a highly dynamic system. In this post, I'm going to focus more on the first two - the sometimes contentious development / operations relationship.
Now, as developers we often focus a little too much on our Source Code Management (SCM) system. We spend our days in git or SVN, we tag or branch releases, and that's where we tend to put our effort. Sometimes... we even assume that all the important 'state' for a system is in the SCM... (crowd gasps...) I know... I've seen it happen. However, the stark reality is that at some point, code has to leave the SCM and get packaged up to be deployed. It usually gets deployed in QA and Staging environments before it reaches Production. Developers - don't lose track of your code when it leaves the nest! Keep track of that code!
For OpenShift, RPM's are a key part of keeping a handle on that transition. Wait! Aren't RPM's old, mysterious, and a little bit evil? Look, I'll be the first to admit that I've had a long love / hate relationship with RPM's. It's not the easiest technology to learn and there is a little bit of magic to them. A great resource that will help demystify them is Maximum RPM, and many of the Fedora pages like their macros page or their Ruby packaging guidelines help too, but it's cumbersome reading. However, the real reason to use RPM's is that I just haven't found a better tool for the job. Yes, there is some pain that goes into the upfront process of getting everything into RPMs. It makes you really think through your packaging, permissions and system layout - stuff development often ignores. However, I promise if you put in this work upfront, you'll never go back. To try and convince you, let's talk about some of the things we actually use them for.
One nice thing about RPM's is that in addition to being a packaging specification, each machine maintains a database of what's installed. And it's not just a database of package names - it includes all the details of how the software was installed. Let's go through some real OpenShift use cases. Real people, real packages, real questions...
Basic Investigation
"What package manages the file /usr/bin/rhc-create-app?"
[root@ip-10-85-70-89 ~]# rpm -qf /usr/bin/rhc-create-app
rhc-0.72.29-1.el6_1.noarch
We can walk from an installed file to a package. Pretty nice if you didn't do all the packaging and are doing some investigation.
"What's currently installed on hostXYZ?"
[root@ip-10-85-70-89 ~]# rpm -qa | grep rhc
rhc-cartridge-php-5.3-0.73.4-1.el6_1.noarch
rhc-devenv-0.73.3-1.el6_1.noarch
rhc-0.73.5-1.el6_1.noarch
rhc-cartridge-perl-5.10-0.4.5-1.el6_1.noarch
rhc-cartridge-rack-1.1-0.73.4-1.el6_1.noarch
rhc-server-common-0.73.6-1.el6_1.noarch
rhc-common-0.73.2-1.el6_1.noarch
rhc-cartridge-jbossas-7.0-0.73.6-1.el6_1.noarch
rhc-broker-0.73.5-1.el6_1.noarch
rhc-selinux-0.73.2-1.el6_1.noarch
rhc-cartridge-wsgi-3.2-0.73.4-1.el6_1.noarch
rhc-site-0.73.5-1.el6_1.noarch
rhc-node-0.73.5-1.el6_1.noarch
With one command, we can easily see the details about every custom component that we install. We use the prefix 'rhc-' for our packages to make these queries really easy.
"What other software does the 'rhc' package depend on?"
[root@ip-10-85-70-89 ~]# yum deplist rhc | grep -v provider:
Loaded plugins: product-id, subscription-manager
Updating Red Hat repositories.
Repository jenkins is listed more than once in the configuration
Finding dependencies:
package: rhc.noarch 0.73.5-1.el6_1
dependency: ruby >= 1.8.6
dependency: rubygem-parseconfig
dependency: /usr/bin/ruby
dependency: git
dependency: /usr/bin/env
dependency: rubygem-json
I filtered the output with grep to only get the dependencies; by default, it will show you what provides those dependencies as well. Now, it's subtle, but you'll notice I'm using yum here instead of rpm. Yum manages the relationships and metadata between packages: what your package needs installed in order to work, and so on. Yum will also nicely manage the installation of all those dependencies for you.
"Ahh, it needs Ruby. What version of Ruby are we running?"
[root@ip-10-85-70-89 ~]# rpm -q ruby
ruby-1.8.7.299-7.el6_1.1.x86_64
Because unfortunately there's a big difference between Ruby 1.8.6, 1.8.7 and 1.9...
Little More Advanced
"I don't remember if I installed the 'production' or 'development' rhc package..."
[root@ip-10-85-70-89 ~]# rpm -qi rhc | grep Signature
Signature : RSA/8, Thu 23 Jun 2011 04:05:57 PM EDT, Key ID 938a80caf21541eb
In our process, we only sign packages going to production, so since this RPM has a signature, it came from the real production build system, not a local laptop build. You can also use different signatures for each environment. That adds some overhead but is a nice way to tell where packages came from.
"Have any files been modified in that package?"
[root@ip-10-85-70-89 ~]# rpm -V rhc
S.5....T. c /etc/openshift/express.conf
The rpm man page has all the gory details on the format here but this basically says that this one file has changed. The 'c' denotes it as a config file and the other letters mean the [S]ize has changed, the MD[5] sum is different and the modified [T]ime is different.
Changed config files are pretty normal. If a binary file had changed, that might be another story...
Put On Your Seatbelt...
"How do I really know where that Signature came from?"
Well, let's figure it out. First, let's get the signature again.
[root@ip-10-85-70-89 ~]# rpm -qi rhc | grep Signature
Signature : RSA/8, Thu 23 Jun 2011 04:05:57 PM EDT, Key ID 938a80caf21541eb
Okay, so this thing has a signature Key ID of 938a80caf21541eb. Let's see what MIT's PGP server says about that key. Open up http://pgp.mit.edu and enter '0x938a80caf21541eb' in the search box. Don't forget that '0x' at the beginning of the string.
pub 4096R/F21541EB 2009-02-24 Red Hat, Inc. (beta key 2) <security@redhat.com>
Mark Cox Internal RSA 4096 test key <mjc@redhat.com>
Fingerprint=B08B 659E E86A F623 BC90 E8DB 938A 80CA F215 41EB
Okay, the security@redhat.com email address looks promising. But how do I really trust that? Well, let's just verify that fingerprint on Red Hat's site as well. Go to https://access.redhat.com/security/team/key/ and search on the page for the fingerprint:
B08B 659E E86A F623 BC90 E8DB 938A 80CA F215 41EB
You should see a match at the bottom of the page. Good news, the originator of this package was Red Hat.
"Is there anything installed on my system that isn't signed or I don't have a public key for?"
This fancy command is courtesy of Mike McGrath.
[root@ip-10-85-70-89 ~]# rpm -q --queryformat '%{NAME} %{SIGPGP:pgpsig}\n' -a | sort | egrep -v "$(rpm -qa gpg-pubkey* | awk -F'-' '{ print $3 }' | tr '\n' '\|' | sed 's/|$//')"
jboss-as7 (none)
jenkins (none)
maven3 (none)
mcollective-client (none)
mcollective-common (none)
...
"Wow. Can I completely depend on this for security?"
Technically someone could really exploit your system and alter your RPM DB to hide any changes. In those cases, you are probably looking at installing something like Tripwire to help even detect those cleanup efforts. Security is always sort of a cat and mouse game but I'm going to try and gracefully dodge the deep security questions since the focus of this article is on system state. Use RPM's for state but don't assume they give you a free pass to ignore security.
Linking this to Development
Now, the above gives you a great view into your operational state, but how do you tie this back into development? I can only really describe what we do since there are an infinite number of ways to approach this. First, we use git for our SCM and we use tito to standardize our link between the state of the code and the RPMs.
I won't go into why we use git too much. It's a distributed revision control system and it's wonderful. Enough said. Let's talk about tito though. The real mechanics that link source code to an RPM for us are git tags. We have a single git repository with lots of separate components, each as a top-level folder. For example:
openshift/
/client
/site
/broker
...
Now there are lots of ways you could approach tagging that would work, and there are lots of ways you can build RPM's. The key is consistency - you want to mark the code at a point in time for a release. Then you just need to pick an approach and stick with it. Tito essentially tags the git repo for each RPM build that you do. It increments the version in the RPM spec, adds the changelog comments, tags the git repository with the full package name (e.g. rhc-0.72.22-1) and submits the build. We use an internal Koji system to make sure we have reproducible RPM builds with all the dependencies nice and orderly.
With that approach, you can walk back to an exact state in the SCM just given a package version. Whether that package is running in production or any other environment, you know the exact code state that created it and you can also easily make a small patch to it and rebuild it. Development to operations with full visibility - it's a beautiful thing.
Honestly, this is just the tip of the iceberg. The point is that this diligence in development, packaging and versioning allows you to tell the exact, painful details about any running system. That understanding will make your releases smoother, updates easier to manage, and your users happier. And in the end, hopefully this knowledge will help your fluid architecture stay far away from chaos.
Appendix
I was recently burned again by trying to send package names through the standard sort. This works fine for 0.1, 0.2 ... 0.9 but when you add 0.10, the normal sort puts it right after 0.1. Let's start with a simple example:
irb(main):055:0> a = ['0.1', '0.2']
=> ["0.1", "0.2"]
irb(main):056:0> a.sort
=> ["0.1", "0.2"]
Yep, that's what I would expect. Now, let's see what happens when I add '0.10'.
irb(main):053:0> a = ['0.1', '0.2', '0.10']
=> ["0.1", "0.2", "0.10"]
irb(main):054:0> a.sort
=> ["0.1", "0.10", "0.2"]
Ouch. Since each entry is treated as a string, the sort has no concept that '10' is greater than '2'. This is further complicated by the fact that most packages are named with the convention package-major.minor.patch-revision (e.g. mypackage-0.1.23-2).
Since I always end up digging for a while to try and get the regexes and sorts right, I figured I would write it down this time.
First, let's build ourselves a package list:
pkg_list = 25.times.collect {|num| "mypackage-0.#{num+1}.1-1"}
Now, let's sort it:
pkg_list.sort_by {|pkg| /(.*)-(\d+)\.(\d+)\.(\d+)-?(\d+)?/.match(pkg); [$1, $2.to_i, $3.to_i, $4.to_i, $5.to_i]}
This uses the regular expression to match each component of the package name. Next, it returns the values in the proper forms in the correct priority order. The array returned above essentially says 'compare the name first, then the major version (as an integer), then the minor version (as an integer), then the patch version (as an integer), then the revision (as an integer)'. This will make sort work correctly and hopefully save you a bug down the road.
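If you'd rather not hand-maintain that ordering, RubyGems ships a version class that already compares dotted versions numerically. Here's a minimal sketch along those lines, assuming the same name-version-release naming convention as above:
require 'rubygems'
pkg_list = 25.times.collect {|num| "mypackage-0.#{num+1}.1-1"}
sorted = pkg_list.sort_by do |pkg|
  # Pull out the name, the dotted version and the release, e.g. mypackage-0.1.23-2
  name, version, release = /(.*)-(\d+(?:\.\d+)*)-(\d+)/.match(pkg).captures
  # Gem::Version knows that 0.10 is greater than 0.9
  [name, Gem::Version.new(version), release.to_i]
end
puts sorted.last    # => mypackage-0.25.1-1
Same idea, just less regex bookkeeping for the numeric comparisons.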