OpenStack API Benchmarking and Scaling — 3 Test Cases

Have you ever been curious how much of a workload the OpenStack control plane can handle before needing to scale horizontally? How will API performance change based on the load? How much overhead will load on the OpenStack APIs add to my application deployment timeline? What behavior should I look for to determine it’s time to add more control plane resources?

These are just a few of the questions I have been asked about operating OpenStack clouds. While API benchmarking can be approached in many different ways, it is a good idea to have some high-level frame of reference.

There are about a million ways to measure performance, and your approach may differ from mine. The tests I created, conducted and describe below are meant to provide a starting reference for you to build upon. Don’t treat them as a baseline standard; you’ll want to adjust the tests based on your objectives, environments and scenarios.

Also, it’s important to remember that no matter how an API is built, every API has a point where load will cause performance to degrade. The objective is to find that point and scale before reaching it. I like to stay 30 percent away from the breaking point at all times: if your API begins to fail when hit with more than 30 tps (transactions per second), you need to add enough API resources to keep sustained load under 21 tps.
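
To make the headroom math concrete, here is a trivial sketch; the 30 tps breaking point is just the example figure above, not a measured value:

    # Hypothetical figures from the example above: if the API degrades past
    # 30 transactions per second, stay 30 percent below that point.
    breaking_point_tps = 30
    headroom = 0.30
    safe_limit_tps = breaking_point_tps * (1 - headroom)  # 21 tps
    print(f"Scale out before sustained load exceeds {safe_limit_tps:.0f} tps")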

Adding resources could mean adding more servers to run your API on and placing them behind a load balancer. While I didn’t find that breaking point during my tests (mainly due to not having a lab environment that could accommodate such a test), I did discover a pattern that can be used to determine expected API behavior. Again, these test results are directly related to my lab environment’s setup and resources. Your results will differ based on the environment your cloud is running on.

Testing Strategy

If you’ve heard any of my Meetup or Summit presentations, you know I’m big on having an objective or strategy for the task at hand. Testing an API may sound simple in theory, but it’s not just about sending large amounts of traffic at a single API and collecting results. I prefer to look at it from the user’s perspective. With OpenStack, every request, outside of requesting an authentication token, is really a series of API requests needed to return the information you asked for. Listing the servers within your project takes at least two to three API requests, depending on the parameters passed. So I decided to create a test script that mimics normal cloud utilization.

The script steps (a rough Python sketch follows the list):

  • list servers within the project (mainly meant to test authentication),
  • create instances in that project (the number of instances created is configurable),
  • allow for instance build time (the delay is configurable),
  • list servers within the project again to obtain the new instance IDs (used in subsequent requests),
  • create a snapshot of one of the new instances,
  • resize one of the new instances,
  • confirm the resize of the instance resized above,
  • delete one of the new instances.
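
To make those steps concrete, here is a rough Python sketch of one iteration against Nova’s REST API. The actual tests in this post were built in SoapUI; the endpoint, token, image, flavor and network IDs below are placeholders you would swap for your own environment:

    # One iteration of the test script, sketched against Nova's REST API.
    # NOVA, the token and the various IDs are hypothetical placeholders.
    import time
    import requests

    NOVA = "http://192.168.0.10:8774/v2.1"  # hypothetical Nova endpoint
    HEADERS = {"X-Auth-Token": "TOKEN"}     # token retrieval shown later

    def run_iteration(num_instances=5, build_delay=20, resize_delay=10):
        # Step 1: list servers within the project (exercises authentication)
        requests.get(f"{NOVA}/servers", headers=HEADERS)

        # Step 2: create instances (count is configurable)
        for _ in range(num_instances):
            requests.post(f"{NOVA}/servers", headers=HEADERS, json={
                "server": {"name": "api-test", "imageRef": "IMAGE_ID",
                           "flavorRef": "1", "networks": [{"uuid": "NET_ID"}]}})

        # Step 3: allow for instance build time (configurable delay)
        time.sleep(build_delay)

        # Steps 4-5: list servers again and pull out the new instance IDs
        servers = requests.get(f"{NOVA}/servers",
                               headers=HEADERS).json()["servers"]
        ids = [s["id"] for s in servers if s["name"] == "api-test"]

        # Step 6: snapshot one of the new instances
        requests.post(f"{NOVA}/servers/{ids[0]}/action", headers=HEADERS,
                      json={"createImage": {"name": "api-test-snap"}})

        # Steps 7-9: resize one instance, wait, then confirm the resize
        requests.post(f"{NOVA}/servers/{ids[1]}/action", headers=HEADERS,
                      json={"resize": {"flavorRef": "2"}})
        time.sleep(resize_delay)
        requests.post(f"{NOVA}/servers/{ids[1]}/action", headers=HEADERS,
                      json={"confirmResize": None})

        # Step 10: delete one of the new instances
        requests.delete(f"{NOVA}/servers/{ids[2]}", headers=HEADERS)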

That script can be executed continuously with any number of simulated users to gather API metrics and performance patterns. It made sense to stay focused on the Compute service (Nova), since it calls almost all of the other services to perform its actions and is the most heavily utilized service in an OpenStack cloud focused on providing computing resources. We will cover Object and Block storage focused clouds in future blog posts.
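
SoapUI drives the simulated users for you; if you wanted to approximate the same thing yourself, a minimal load loop (reusing the hypothetical run_iteration() above) might look like this:

    # Minimal load-generation sketch: N virtual users each loop the script
    # until a deadline, mirroring the fixed-duration SoapUI load tests.
    import time
    from concurrent.futures import ThreadPoolExecutor

    def virtual_user(duration_s):
        iterations = 0
        deadline = time.time() + duration_s
        while time.time() < deadline:
            run_iteration()  # the hypothetical sketch above
            iterations += 1
        return iterations

    def load_test(virtual_users=10, duration_s=300):
        with ThreadPoolExecutor(max_workers=virtual_users) as pool:
            futures = [pool.submit(virtual_user, duration_s)
                       for _ in range(virtual_users)]
            return sum(f.result() for f in futures)  # total iterations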

My objective was to put varying loads on the OpenStack APIs, based on a set time limit, using the above script to observe the API and control plane server performance.

I was looking to answer questions like the following (a small metrics-collection sketch follows the list):

  • How long did each step take to complete?
  • Were the requested tasks completed as expected?
  • How much CPU utilization on the control plane did it cause?
  • How many iterations of the script did it complete?
  • How many instances were created successfully?
  • What was the overall instance provisioning time?
  • What were the API response metrics?
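
SoapUI records most of these numbers for you; purely as an illustration of what is being measured, a hand-rolled version of the per-step timing could look like this (timed_step and report are hypothetical helpers, not part of any OpenStack or SoapUI API):

    # Hedged sketch of per-step metric collection: wrap each test step,
    # record its duration, then report median response time and count.
    import time
    from collections import defaultdict
    from statistics import median

    step_timings = defaultdict(list)  # step name -> list of durations (ms)

    def timed_step(name, func, *args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        step_timings[name].append((time.perf_counter() - start) * 1000)
        return result

    def report():
        for name, samples in sorted(step_timings.items()):
            print(f"{name}: median {median(samples):.0f} ms, "
                  f"count {len(samples)}")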

Tools

The next step in the process was to determine which tools I would use to perform the testing. I wanted something open source and relatively easy to use, so I went back to my application support roots and went with my old friend SoapUI.

If you’re not familiar with SoapUI, I suggest downloading it and giving it a try. The tool has changed since the last time I used it; there is now a fancier version called SoapUI NG Pro. It’s not free, but it includes more testing and load features. Since I can’t look away from shiny new things, I went ahead and tried it. (The series of test requests within SoapUI works with both the free and licensed versions.)

It was fairly simple to create each step of the script outlined above within SoapUI. I did a few things to simplify the setup:

  • Requested the authentication token separately from the actual test script, because the token is returned inside the HTTP response header and SoapUI was not able to read the response header in the manner I needed it to (see the sketch below the screenshot). Since the token lasts for a day, this was not a big deal.
  • Created a custom property within the project named ‘X-Auth-Token’ to pass the token value to each test request. That way, I only had to update the token in one place and it would be inherited by all the other test requests.
  • Configured each test request to pass the custom property created above into the HTTP header of the request. I did this by defining a parameter in each request whose value referenced the custom property variable. The screenshot below provides more details.
[Screenshot: passing the X-Auth-Token custom property into a SoapUI test request header]
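
For reference, the token handling described above looks roughly like this outside of SoapUI. Keystone v3 returns the token in the X-Subject-Token response header rather than the body, which is why reading the response header mattered; the endpoint and credentials below are placeholders:

    # Request a Keystone v3 token; the token arrives in the
    # X-Subject-Token response header, not the response body.
    import requests

    KEYSTONE = "http://192.168.0.10:5000/v3"  # hypothetical endpoint
    payload = {"auth": {
        "identity": {"methods": ["password"], "password": {
            "user": {"name": "admin", "domain": {"id": "default"},
                     "password": "secret"}}},
        "scope": {"project": {"name": "admin",
                              "domain": {"id": "default"}}}}}

    resp = requests.post(f"{KEYSTONE}/auth/tokens", json=payload)
    token = resp.headers["X-Subject-Token"]

    # Each subsequent request carries the token in its X-Auth-Token header,
    # mirroring the custom property passed into every SoapUI test request.
    print(requests.get("http://192.168.0.10:8774/v2.1/servers",
                       headers={"X-Auth-Token": token}).status_code)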

Execution and Results

The three test scenarios below are just three out of what seems like a hundred different tests I executed in the lab. For about a week I beat up the lab as much as I could: running the script, letting it create instances, alter them, delete them, and repeating the steps continuously.

I am proud to say OpenStack held up to the torment. I did not experience a single API request failure throughout my numerous load tests, yet another proof point that OpenStack is ready for enterprise/production use.

Before getting into the metrics collected, I wanted to share some behaviors and observations from this benchmarking exercise. As you might guess, script execution time grew longer as the load (simulated virtual users) increased. This is expected, of course: more users = more load = slower API response times.

The good news is that the difference in API response times was marginal, not a nose-dive degradation. Going from 10 to 20 virtual users almost exactly doubled the script’s total execution time (see charts below). Also worth noting: the response time for the test step that lists the instances within the project grows as virtual users and iterations increase, because the API query takes longer as the number of running and in-flight instances climbs. Again, to me, this is expected API behavior.

From the bare metal controller node perspective, the increased load did raise CPU utilization, also expected behavior (see charts below). Overall utilization stayed under 50 percent.

The details

Lab Environment:

  • 2 x HAProxy Nodes: Hex Core processor with 128GB of RAM
  • 3 x Controller Nodes: Dual Hex Core processor with 256GB of RAM and 2.4TB ephemeral storage
  • 6 x Compute Nodes: Dual Hex Core processor with 256GB of RAM and 2.4TB ephemeral storage

OpenStack Deployment Method: Rackspace Private Cloud powered by OpenStack (openstack-ansible, aka OSA)

HA Approach: API traffic was actively balanced across all three controller nodes by the HAProxy nodes, which ran Keepalived for high availability.

Cloud Network Setup: a single provider network; each instance was configured to connect to this network and was assigned an address via DHCP

Test Script Configuration:

  • 5 instances created
  • instance build time delay of 20 seconds
  • instance resize time delay of 10 seconds

Test Scenario #0 – intended to provide the baseline script benchmark

Virtual Users   Duration   Iterations   Test Steps   Total API Requests   Instances Created
1               35 s       1            10           18                   5

Test Step                        API Response (Median)   Count   Errors
Step 1  [init-server-list]       259 ms                  1       0
Step 2  [create-server]          1228 ms                 1       0
Step 3  [build-time]             19999 ms                1       0
Step 4  [server-list]            235 ms                  1       0
Step 5  [get-server-id]          9 ms                    1       0
Step 6  [create-snapshot]        368 ms                  1       0
Step 7  [resize-server]          557 ms                  1       0
Step 8  [resize-delay]           9999 ms                 1       0
Step 9  [confirm-resize-server]  120 ms                  1       0
Step 10 [delete-server]          209 ms                  1       0

Test Scenario #1

Virtual Users   Duration   Iterations   Test Steps   Total API Requests   Instances Created
5               5 min      40           10           720                  200

Test Step                        API Response (Median)   Count   Errors
Step 1  [init-server-list]       888 ms                  45      0
Step 2  [create-server]          1271 ms                 45      0
Step 3  [build-time]             19999 ms                45      0
Step 4  [server-list]            874 ms                  40      0
Step 5  [get-server-id]          760 ms                  40      0
Step 6  [create-snapshot]        180 ms                  40      0
Step 7  [resize-server]          168 ms                  40      0
Step 8  [resize-delay]           9999 ms                 40      0
Step 9  [confirm-resize-server]  108 ms                  40      0
Step 10 [delete-server]          181 ms                  40      0

[Chart: Test Scenario #1 API response stats]
[Chart: Infra CPU stats]

Test Scenario #2

Virtual Users   Duration   Iterations   Test Steps   Total API Requests   Instances Created
10              5 min      70           10           1260                 372

Test Step                        API Response (Median)   Count   Errors
Step 1  [init-server-list]       1522 ms                 77      0
Step 2  [create-server]          1589 ms                 76      0
Step 3  [build-time]             19999 ms                75      0
Step 4  [server-list]            1938 ms                 71      0
Step 5  [get-server-id]          3834 ms                 70      0
Step 6  [create-snapshot]        256 ms                  70      0
Step 7  [resize-server]          200 ms                  70      0
Step 8  [resize-delay]           9999 ms                 70      0
Step 9  [confirm-resize-server]  140 ms                  70      0
Step 10 [delete-server]          173 ms                  70      0

[Chart: Test Scenario #2 API response stats]
[Chart: Infra CPU stats]

Test Scenario #3

Virtual Users   Duration   Iterations   Test Steps   Total API Requests   Instances Created
20              5 min      95           10           1800                 399

Test Step                        API Response (Median)   Count   Errors
Step 1  [init-server-list]       3048 ms                 112     0
Step 2  [create-server]          1890 ms                 112     0
Step 3  [build-time]             19999 ms                112     0
Step 4  [server-list]            2247 ms                 112     0
Step 5  [get-server-id]          14066 ms                95      0
Step 6  [create-snapshot]        217 ms                  95      0
Step 7  [resize-server]          221 ms                  95      0
Step 8  [resize-delay]           9999 ms                 95      0
Step 9  [confirm-resize-server]  184 ms                  95      0
Step 10 [delete-server]          195 ms                  95      0

[Chart: Test Scenario #3 API response stats]
[Chart: Infra CPU stats]

Walter Bentley was a Rackspace Private Cloud Technical Marketing Engineer and author with a diverse background in production systems administration and solutions architecture. He has more than 15 years of experience in sectors such as online marketing, financial services, insurance, aviation, the food industry, education and now technology. In the past, he was typically the requestor, consumer and advisor to companies in the use of technologies such as OpenStack. Today he’s an OpenStack promoter and cloud educator. Walter helped customers build, design and deploy private clouds built on OpenStack, including professional services engagements around operating OpenStack clouds and DevOps engagements creating playbooks/roles with Ansible. He presented regularly at OpenStack Summits, AnsibleFest and other technology conferences, and also produced webinars, blog posts and technical reviews. His first book, ‘OpenStack Administration with Ansible’, was released in 2016.
