Python Webserver Framework Performance Benchmark

To find out which Python webserver frameworks perform best, I’m going to benchmark several of them, measuring metrics such as requests per second and latency under a range of different conditions.

For updates and roadmap see the bottom of this page.

Benchmarking Methodology

Hardware setup

Component Specification
CPU Intel Core i7 12700H
Memory Max Frequency 3200.0 MHz

Test-specific resource limits (e.g. CPU/memory) are reported for each test. Most tests allocate 100% of a single vCPU to the application server, which leaves the rest of the machine free for the load generator and keeps the comparison fair, since every server is given the same resources. Do note, however, that your results may be very different: real deployments rarely limit a server to a single vCPU, so absolute numbers will usually be much higher. The goal of these benchmarks is to get an idea of the performance level and compare results between different frameworks given a level playing field.

Restricting the resources also caps the throughput the framework can achieve. The upside is that the testing tool itself won’t require massive amounts of resources, and since it is not resource-restricted it is free to use as much of the machine as it needs so that it is never the bottleneck. All requests are issued from the same machine hosting the server.

Testing Environment and Tools

All tests are performed inside docker to ensure consistency. You can check the docker environment used by each test by viewing its source code.

Metrics are measured using the wrk tool. The configuration used for each test is defined in the test’s docker-compose file in the source code (coming soon); any test-specific deviations are noted in the test itself.

Tests

Simple “Hello World” / Echo application

This simple test does as little work as possible: the application simply returns a “Hello World” string with a content type of “text/plain” where possible.
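The exact benchmark code hasn’t been published yet (see the “coming soon” note above), but stripped of any framework, the app under test reduces to something like this plain WSGI callable (an illustrative sketch, not the actual source):

```python
# A framework-free sketch of the app under test: every framework
# variant ultimately returns this same byte string and content type.
def app(environ, start_response):
    body = b"Hello World"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Each framework adds its own routing and request/response machinery on top of this, which is exactly the per-request overhead this test surfaces.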

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

Server Avg ms Stdev ms Max ms Req/Sec Total Requests
falcon 116.11 572.14 6,900.00 4,817 96,434
falcon-gunicorn 3.04 0.43 12.77 6,551 131,154
falcon-uwsgi 1.33 0.33 9.23 13,711 274,340
falcon-bjoern 0.33 0.22 8.55 62,733 1,254,669
falcon-bjoern-nuitka 0.32 0.17 7.07 62,813 1,256,254
falcon-bjoern-pypy 9.30 25.78 153.86 33,665 673,989
falcon-uvicorn 3.58 0.30 12.46 5,583 111,766
falcon-uvicorn-uvloop 2.48 0.33 10.48 8,066 161,511
cherrypy 12.21 14.32 213.46 2,537 50,799
cherrypy-uwsgi 6.61 0.50 16.99 2,997 60,023
cherrypy-tornado 9.19 0.80 19.39 2,174 43,555
cherrypy-twisted 9.99 2.09 54.90 2,003 40,112
flask 7.50 0.54 23.59 2,658 53,220
flask-fastwsgi 1.95 0.29 11.62 10,301 206,186
flask-gunicorn-eventlet 28.09 45.32 398.87 4,552 91,154
flask-gunicorn-gevent 27.99 42.84 301.40 4,502 90,124
flask-gunicorn-gthread 5.55 0.58 15.80 3,600 72,072
flask-gunicorn-meinheld 1.99 0.46 9.73 10,062 201,365
flask-gunicorn-tornado 6.32 0.68 16.84 3,164 63,347
flask-gunicorn 4.66 0.50 16.73 4,279 85,654
flask-meinheld 2.17 0.24 11.12 9,142 182,964
flask-bjoern 1.63 0.27 9.85 12,312 246,376
flask-bjoern-nuitka 1.31 0.27 9.23 15,275 305,634
fastwsgi 0.11 0.20 8.99 214,112 4,303,606
bottle 64.17 332.82 3,500.00 5,319 106,470
bjoern 0.12 0.20 7.68 178,353 3,567,317
bjoern-pypy 894.35 1,800.00 7,400.00 14,181 283,954
aiohttp 2.18 0.21 10.07 9,172 183,588
aiohttp-uvloop 1.29 0.24 8.99 15,419 309,930
aiohttp-gunicorn 1.80 0.23 10.40 11,131 222,769
aiohttp-gunicorn-uvloop 0.93 0.23 10.89 21,519 430,447
hug 369.86 1,500.00 14,000.00 4,246 85,017
meinheld 0.46 0.62 38.95 44,706 895,005
muffin-uvicorn 4.88 0.47 16.47 4,100 82,078
netius 2.08 0.20 7.88 9,598 192,114
pycnic-gunicorn 3.57 0.43 14.91 5,580 111,709
tornado 3.96 0.49 15.04 5,055 101,201

There’s no surprise that almost all servers perform well in this test, given that it is simply a max-throughput test without any real application work. Most hover in the 3-6k requests per second range. The ridiculously fast servers such as bjoern, meinheld, and fastwsgi (the latter pulling in 214k requests per second) are of course written in C, which is how they reach these numbers. When these servers are used with a framework such as flask, the framework, which is written in Python, becomes the bottleneck. Having said that, 62k requests per second from falcon on bjoern is still very high.

The async framework aiohttp performed really well compared to the WSGI-based falcon and flask when each was paired with gunicorn’s default sync worker (9.1k, 6.5k, and 4.2k requests per second respectively), although the WSGI frameworks pull ahead when paired with optimized servers such as bjoern.

The slowest framework in this test was cherrypy, which handled less than 3k requests per second regardless of the server used.

JSON serialization

This test returns a 5.5KB JSON string with the content type “application/json”.
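Since the source is not yet published, here is an illustrative sketch of what this test amounts to: parse the JSON once at start-up, then re-serialize it on every request. The payload below is a tiny stand-in, not the real 5.5KB document.

```python
import json

# Stand-in payload; the real test preloads a ~5.5KB JSON document
# from a file at server start-up.
PAYLOAD = json.loads('{"users": [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]}')

def app(environ, start_response):
    # Serialization happens per request, in Python code: this is the
    # CPU work that narrows the gap between the C-based servers.
    body = json.dumps(PAYLOAD).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```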

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

Server Avg ms Stdev ms Max ms Req/Sec Total Requests
falcon 313.75 1,400.00 13,800.00 3,899 78,077
falcon-gunicorn 4.16 0.60 19.91 4,799 96,091
falcon-uwsgi 3.02 0.31 12.32 6,528 130,908
falcon-bjoern 1.27 0.21 8.20 15,756 315,492
falcon-bjoern-nuitka 1.28 0.22 9.09 15,618 312,426
falcon-bjoern-pypy 2.07 3.74 40.97 16,466 329,427
falcon-uvicorn 4.69 0.33 9.04 4,263 85,360
falcon-uvicorn-uvloop 3.55 0.47 17.48 5,639 112,984
cherrypy 15.25 15.17 122.74 1,604 32,134
cherrypy-uwsgi 11.52 0.80 22.17 1,728 34,609
cherrypy-tornado 13.58 0.93 23.05 1,471 29,476
cherrypy-twisted 14.52 2.64 74.54 1,377 27,595
flask 9.49 0.68 22.27 2,102 42,082
flask-fastwsgi 3.39 0.43 12.06 5,902 118,178
flask-gunicorn-eventlet 17.50 24.33 188.12 3,100 62,061
flask-gunicorn-gevent 18.85 25.62 178.69 3,125 62,578
flask-gunicorn-gthread 7.50 0.63 14.66 2,663 53,347
flask-gunicorn-meinheld 3.43 0.81 15.26 5,834 116,763
flask-gunicorn-tornado 8.00 0.74 17.59 2,499 50,070
flask-gunicorn 6.48 0.53 17.17 3,074 61,556
flask-meinheld 3.73 0.35 15.14 5,335 106,797
flask-bjoern 3.08 0.38 21.98 6,498 130,064
flask-bjoern-nuitka 2.86 0.35 20.94 6,997 140,071
fastwsgi 1.41 0.33 10.57 14,251 285,268
bottle 307.21 1,500.00 14,000.00 4,085 81,826
bjoern 1.13 0.22 8.93 17,803 356,312
bjoern-pypy 1.29 1.38 31.92 18,514 370,420
aiohttp 3.36 0.40 14.77 5,964 119,389
aiohttp-uvloop 2.39 0.35 10.88 8,389 167,978
aiohttp-gunicorn 2.94 0.25 10.07 6,790 135,875
aiohttp-gunicorn-uvloop 2.01 0.24 8.00 9,952 199,244
hug 323.22 1,500.00 14,100.00 3,193 63,935
meinheld 1.58 0.20 10.55 12,537 250,936
muffin-uvicorn 5.76 0.56 10.13 3,473 69,504
netius 5.23 0.46 10.65 3,802 76,134
pycnic-gunicorn 4.76 0.51 15.72 4,188 83,847
tornado 5.06 0.42 13.90 3,954 79,164

As in the previous test, it is no surprise that most servers perform quite well in this JSON serialization test, with most handling a respectable 2-6k requests per second. Although the JSON is small, all operations happen in memory since the JSON is preloaded from a file on server start-up. Serializing the JSON object is a CPU-intensive operation that runs in Python code, so the gains the C-based servers (bjoern, fastwsgi, and meinheld) showed in the previous test become less pronounced.

The winner, although not by much, is the bjoern server running on pypy, whose JIT optimizes the Python code enough to handle 18.5k requests per second. The slowest framework in this test was again cherrypy, which handled fewer than 2k requests per second regardless of the server used.

Simulated CPU Bound

This test returns a “Hello World” string with a content type “text/plain” where possible, but also runs a loop that occupies the CPU for around 90-100ms before returning.
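The exact loop isn’t published; a minimal sketch of the idea is to spin on the clock until roughly the target time has passed:

```python
import time

def burn_cpu(seconds=0.095):
    # Busy-loop (not sleep!) until ~seconds of wall time has elapsed,
    # keeping the CPU fully occupied the whole time.
    deadline = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count

def app(environ, start_response):
    burn_cpu()  # ~90-100 ms of real CPU work per request
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]
```

Because the loop holds both the GIL and the CPU, no server architecture can overlap this work on a single vCPU, which is why nearly every row below lands at the same throughput.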

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

Server Avg ms Stdev ms Max ms Req/Sec Total Requests
falcon 897.93 1,600.00 14,500.00 11 220
falcon-gunicorn 1,600.00 285.83 1,800.00 11 226
falcon-uwsgi 1,600.00 269.28 1,700.00 12 233
falcon-bjoern 1,700.00 867.70 7,000.00 11 223
falcon-bjoern-nuitka 1,700.00 868.65 7,000.00 11 223
falcon-bjoern-pypy 50.62 5.19 222.49 395 7,924
falcon-uvicorn 1,600.00 273.51 1,800.00 11 228
falcon-uvicorn-uvloop 1,700.00 253.09 2,900.00 11 218
cherrypy 1,800.00 268.35 2,100.00 10 209
cherrypy-uwsgi 1,600.00 274.33 1,700.00 11 230
cherrypy-tornado 1,800.00 51.31 1,900.00 11 220
cherrypy-twisted 1,900.00 409.84 3,200.00 10 198
flask 1,700.00 250.09 2,000.00 11 222
flask-fastwsgi 1,600.00 445.82 3,300.00 11 228
flask-gunicorn-eventlet 1,500.00 1,100.00 4,700.00 11 227
flask-gunicorn-gevent 1,600.00 1,000.00 4,200.00 11 227
flask-gunicorn-gthread 1,600.00 281.04 1,700.00 11 227
flask-gunicorn-meinheld 1,500.00 618.52 3,100.00 11 227
flask-gunicorn-tornado 1,700.00 16.39 1,700.00 11 220
flask-gunicorn 1,700.00 295.80 1,900.00 11 220
flask-meinheld 1,600.00 274.71 1,700.00 11 228
flask-bjoern 1,800.00 912.25 7,400.00 11 217
flask-bjoern-nuitka 1,000.00 481.17 4,800.00 20 395
fastwsgi 1,600.00 429.57 3,300.00 11 229
bottle 865.07 1,600.00 14,500.00 12 233
bjoern 1,800.00 882.96 7,100.00 11 223
bjoern-pypy 19.33 1.97 106.06 1,035 20,744
aiohttp 1,700.00 292.94 1,800.00 11 220
aiohttp-uvloop 1,700.00 161.27 1,800.00 11 222
aiohttp-gunicorn 1,700.00 292.77 1,800.00 11 222
aiohttp-gunicorn-uvloop 1,700.00 242.12 2,200.00 11 220
hug 1,200.00 2,000.00 14,100.00 8 156
meinheld 1,800.00 317.47 1,900.00 10 208
muffin-uvicorn 1,700.00 290.47 1,900.00 11 220
netius 1,700.00 156.77 1,900.00 11 222
pycnic-gunicorn 1,600.00 278.18 1,800.00 11 230
tornado 1,600.00 276.69 1,800.00 11 230

For this test, perhaps unsurprisingly, most servers performed almost identically, since the majority of CPU time is spent in the artificial 90-100ms loop on each request. This leaves most servers hovering around the 10-11 requests per second mark.

The exception is running the server under pypy. falcon with bjoern on pypy ran around 36 times faster than most servers, and bjoern alone on pypy ran around 94 times faster, topping 1,000 requests per second.

In the real world, your mileage with pypy will vary depending on your application, but I wanted to highlight that when a request is CPU-bound, most framework and server combinations will produce very similar results no matter the framework (no magic here, I’m afraid).

Simulated IO Bound

This test returns a “Hello World” string with a content type “text/plain” where possible but also sleeps the thread for 100ms before returning (note that the sleep method is not guaranteed to wake the thread up after exactly 100ms).

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

Server Avg ms Stdev ms Max ms Req/Sec Total Requests
falcon 1,100.00 1,800.00 13,900.00 7 135
falcon-gunicorn 9,900.00 5,700.00 19,900.00 10 196
falcon-uwsgi 7,600.00 3,300.00 10,300.00 10 197
falcon-bjoern 4,200.00 2,300.00 19,000.00 9 183
falcon-bjoern-nuitka 4,300.00 2,400.00 19,200.00 9 185
falcon-bjoern-pypy 4,300.00 2,400.00 19,300.00 9 185
falcon-bjoern-fork 222.41 255.88 2,600.00 1,791 35,992
falcon-uvicorn 114.16 5.32 134.19 2,603 52,267
falcon-uvicorn-uvloop 113.81 10.04 269.77 2,611 52,463
cherrypy 2,100.00 1,700.00 16,200.00 95 1,917
cherrypy-uwsgi 7,700.00 3,400.00 10,500.00 10 194
cherrypy-tornado 18,500.00 0.84 18,500.00 1 30
cherrypy-twisted 1,400.00 292.01 2,000.00 192 3,848
flask 129.03 18.06 363.61 1,448 29,030
flask-fastwsgi 8,300.00 6,200.00 19,800.00 10 196
flask-gunicorn-eventlet 101.94 2.23 143.24 2,919 58,587
flask-gunicorn-gevent 101.68 1.90 136.13 2,921 58,674
flask-gunicorn-gthread 9,900.00 5,700.00 19,900.00 10 195
flask-gunicorn-meinheld 5,400.00 4,800.00 17,900.00 10 197
flask-gunicorn-tornado N/A N/A N/A N/A N/A
flask-gunicorn 9,900.00 5,700.00 19,900.00 10 195
flask-meinheld 10,000.00 5,700.00 19,900.00 10 197
flask-bjoern 4,000.00 2,000.00 18,200.00 9 178
flask-bjoern-nuitka 4,300.00 2,500.00 19,400.00 9 185
fastwsgi 9,200.00 6,300.00 19,800.00 10 198
bottle 815.78 1,100.00 14,900.00 10 196
bjoern 4,200.00 2,200.00 18,500.00 9 184
bjoern-pypy 3,800.00 2,300.00 19,700.00 9 177
aiohttp 127.75 8.99 151.92 2,316 46,508
aiohttp-uvloop 122.31 9.85 255.23 2,434 48,920
aiohttp-gunicorn 120.12 8.67 141.70 2,476 49,752
aiohttp-gunicorn-uvloop 111.93 6.03 135.04 2,656 53,320
hug 932.48 1,400.00 14,800.00 10 194
meinheld 9,900.00 5,700.00 19,800.00 10 198
muffin-uvicorn 112.23 5.09 140.91 2,647 53,190
netius 5,100.00 2,000.00 8,100.00 4 81
pycnic-gunicorn 9,900.00 5,700.00 19,900.00 10 197
tornado 9,700.00 5,900.00 19,900.00 10 197

This test is designed to simulate a server request that does not do any local processing on the server itself but makes requests to other servers or resources instead as a middleman. In real-world scenarios, this intermediate server will usually run some lightweight logic to do things such as figure out where to fetch data from, aggregate data once fetched, or any other “stitching” logic required.

This test category is where async frameworks shine, since each request can be handled independently of the others, whereas WSGI applications typically block the request thread while waiting for “external” resources to respond (in this test, the worker thread is blocked for the entire 100ms of the mock resource fetch).
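A tiny asyncio sketch of why the async side wins here: ten simulated 100ms “fetches” overlap on a single event loop and finish in roughly the time of one, whereas ten blocking calls on one thread would take a full second.

```python
import asyncio
import time

async def fetch():
    # Stands in for this test's 100 ms mock upstream call.
    await asyncio.sleep(0.1)
    return b"Hello World"

async def handle_concurrently(n=10):
    # The event loop suspends each coroutine at the await, so the
    # waits overlap instead of queueing behind one another.
    start = time.perf_counter()
    await asyncio.gather(*(fetch() for _ in range(n)))
    return time.perf_counter() - start

elapsed = asyncio.run(handle_concurrently())
```

This is the same mechanism that lets the async rows below sustain thousands of requests per second while each individual request still takes ~100ms.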

The winner in this test was flask with gunicorn and either gevent or eventlet, at 2.9k requests per second. falcon with uvicorn (with or without uvloop), handling 2.6k requests per second, is a close runner-up. aiohttp, whether standalone, with uvloop, or under gunicorn, handled 2.3-2.7k requests per second, and muffin with uvicorn handled 2.6k.

Using bjoern’s forking mode spawns more “workers” to handle more requests simultaneously, but this does not appear to be as efficient as the async servers.

The slowest servers were the WSGI servers, as expected, since each worker is blocked for the full duration of every request.

Todo

  • Add more frameworks and servers, and combinations of frameworks and servers
  • Publish source code
  • Explore more parameters to play with such as nuitka compilation options and pypy
  • Publish charts for visual comparison

Changelog

  • 11/10/2023 – Initial post.