Juno Design Summit has ended
Tuesday, May 13 • 12:05pm - 12:45pm
Ironic Performance and Scalability


This session will cover efforts by multiple teams to begin profiling Ironic's performance and scalability, and identifying areas that need improvement.

* Does Ironic's code scale to hundreds of conductors and tens of thousands of nodes? What about Ironic's interaction with other services?
* What bottlenecks do we hit when deploying to bare metal in parallel? What limits do we hit with network and disk I/O, IPMI latency and instability, etc.? What recommendations can we make to deployers to mitigate these?
* What optimizations can be done to minimize the end-to-end time to boot a single bare metal machine? Are they worth the investment?

The session will be co-presented by Mirantis, Rackspace, and HP.


This session will include the following subject(s):

Benchmarking Ironic:

Let's discuss the following aspects of benchmarking Ironic:

- Measuring API performance (easy)
- Ironic-Nova interaction
- Ironic-Neutron interaction
- Provisioning nodes. This might be useful for owners of specific hardware.
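
As a starting point for the API item above, here is a minimal sketch of a latency harness. It only times an arbitrary callable and reports a few summary statistics; it is not part of Ironic or any existing benchmarking tool. The names `benchmark` and `fake_list_nodes` are hypothetical, and `fake_list_nodes` is a stand-in that would need to be replaced with a real API request.

```python
import statistics
import time


def benchmark(call, repeats=100):
    """Time `call` `repeats` times and return latency statistics in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the last sample.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "count": len(samples),
        "min": samples[0],
        "median": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }


def fake_list_nodes():
    # Hypothetical stand-in for a real request (e.g. listing nodes);
    # it only sleeps so that the harness has something to measure.
    time.sleep(0.001)


if __name__ == "__main__":
    for name, value in benchmark(fake_list_nodes, repeats=50).items():
        print(name, value)
```

Reporting the median and a high percentile rather than only the mean keeps occasional slow requests visible, which is usually where scaling problems show up first.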

(Session proposed by Roman Prykhodchenko)

Deploying as fast as possible:

Today Ironic can deploy a server in around 22 minutes (citation needed). We can do better than this. Let's talk about strategies to speed up deployments to less than 10 minutes, or ideally the same speed as Nova. Here are some ideas to get us started:

* Caching popular images on boot devices
* Leaving servers powered on and ready to be deployed to
* Using kexec instead of reboot

What else can we come up with?

(Session proposed by Jim Rollenhagen)

Running Ironic at scale:

Ironic is still in its infancy. The architecture is pretty good, the HA model should work well, and the API is fairly fast. However, nobody is running it at scale yet, and there could be problems yet to be uncovered.

Let's take a deep dive into Ironic's architecture and talk about how it might be improved to support large deployments. We could cover items such as:

* Hash ring mapping - how to find nodes for a conductor more efficiently
* Making API calls more async - Nova doesn't ever wait for an RPC response - why does Ironic?
* Making the conductor more efficient - today the conductor can easily be overloaded with work - are there ways to make this better besides spinning up more conductors?
* Imaging efficiency - how can many machines be provisioned at once without saturating Glance or the network itself?

Or any other topics folks want to cover.
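
To make the hash ring item concrete, here is a minimal sketch of a generic consistent hash ring that maps node UUIDs to conductors. It is an illustration of the general technique under assumed names (`HashRing`, `conductor_for`, `nodes_by_conductor`, and the `replicas` parameter are all hypothetical) and is not Ironic's actual implementation.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent hash ring (illustrative, not Ironic's code)."""

    def __init__(self, conductors, replicas=64):
        conductors = list(conductors)
        if not conductors:
            raise ValueError("at least one conductor is required")
        # Each conductor gets `replicas` points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in conductors
            for i in range(replicas)
        )
        self._keys = [position for position, _ in self._ring]

    def conductor_for(self, node_uuid):
        # The first ring point past the node's hash owns it (wrapping around).
        index = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._keys)
        return self._ring[index][1]


def nodes_by_conductor(ring, node_uuids):
    """Group nodes by owning conductor in a single pass over the nodes."""
    mapping = {}
    for uuid in node_uuids:
        mapping.setdefault(ring.conductor_for(uuid), []).append(uuid)
    return mapping


if __name__ == "__main__":
    ring = HashRing(["conductor-1", "conductor-2", "conductor-3"])
    nodes = [f"node-{i}" for i in range(10)]
    for conductor, owned in sorted(nodes_by_conductor(ring, nodes).items()):
        print(conductor, owned)
```

The property that matters at scale is that removing a conductor only reassigns the nodes it owned; every other node keeps its mapping, so the rest of the cluster does not have to reshuffle.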

(Session proposed by Jim Rollenhagen)

Tuesday May 13, 2014 12:05pm - 12:45pm EDT
