Poor performance for large (~100MB) payloads #443

Closed
dmontagu opened this issue Oct 1, 2019 · 4 comments

dmontagu commented Oct 1, 2019

(Cross posted to encode/starlette#651; I'm not sure where the issue lies, but I currently think it is either starlette or uvicorn.)

I'm noticing especially slow handling of large request bodies when running in uvicorn, and I'm trying to get to the bottom of it.

If this kind of performance is expected for payloads of this size for any reason, please let me know.


The script below posts a large payload (~100MB in size) to a starlette.applications.Starlette endpoint, which just returns a success response. Running via the starlette TestClient, I get a response in ~0.65 seconds; running via uvicorn it takes ~17.5 seconds (or ~27x slower).

(I'll note that this discrepancy becomes much smaller as the size of the payload decreases -- I think it was about 3-5x for 10MB, and not really significant below 1MB.)

I was able to get speeds comparable to the TestClient run using a flask implementation. I also get similar slowdowns when running via gunicorn with a uvicorn worker (I haven't tested other servers; not sure if there are recommended alternatives).

import sys
from datetime import datetime

import requests
import uvicorn
from requests import Session
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import Response
from starlette.testclient import TestClient

app = Starlette()


@app.route("/", methods=["POST"])
async def endpoint(request: Request):
    payload = await request.json()
    assert isinstance(payload, dict)
    return Response("success")


def _speed_test(session: Session, url: str):
    payload = {"payload": "a" * 100_000_000}
    start = datetime.utcnow()
    response = session.post(url=url, json=payload)
    elapsed = datetime.utcnow() - start
    assert response.status_code == 200
    assert response.content == b"success"
    print(elapsed)


def asgi_test():
    client = TestClient(app)
    _speed_test(client, "/")


def uvicorn_test():
    session = requests.Session()
    _speed_test(session, f"http://127.0.0.1:8000/")


def main():
    if "--asgi-test" in sys.argv:
        asgi_test()
        # 0:00:00.650825
    elif "--uvicorn-test" in sys.argv:
        uvicorn_test()
        # 0:00:17.502396
        # cProfile:
        # Name   Call Count   Time (ms)  Own Time (ms)
        # body      391         16670        16649
    else:
        uvicorn.run(app)


if __name__ == "__main__":
    main()

This script can perform three actions:

  • Start the uvicorn server if run without arguments
  • Hit the uvicorn server if run with the argument --uvicorn-test (requires the server to have been previously started)
  • Use starlette's TestClient if executed with the argument --asgi-test

The script performs only a single request, but the speed difference is very consistently this extreme.


I ran cProfile over the server while the slow response (to a single request) was being generated, and by far the line that stood out was:

Name   Call Count   Time (ms)  Own Time (ms)
body      391         16670        16649

where body here is a reference to the starlette.requests.Request.body method. Nothing else was remotely close in the Own Time column. (Only uvloop.loop.Loop.run_until_complete was more than 1%, and I think that was just downtime while waiting for me to trigger the request.)


This was originally posted as a fastapi issue (fastapi/fastapi#360), but it seems to be an issue with either uvicorn or starlette. (I am going to cross post this issue to uvicorn as well.) In that issue, a (more complex) script was posted comparing the performance to that of flask; flask was able to achieve similar performance to what I can get using the ASGI TestClient.

dmontagu commented Oct 1, 2019

I set this up to run using hypercorn (with and without uvloop) by changing the else clause in the main() function to:

        # Uncomment as appropriate

        # Hypercorn:
        import asyncio
        from hypercorn.asyncio import serve, Config

        config = Config()
        config.bind = ["localhost:8000"]  # As an example configuration setting
        asyncio.run(serve(app, config))

        # Hypercorn + uvloop:
        # import asyncio
        # import uvloop
        # from hypercorn.asyncio import serve, Config
        # config = Config()
        # config.bind = ["localhost:8000"]  # As an example configuration setting
        # asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
        # loop = asyncio.new_event_loop()
        # asyncio.set_event_loop(loop)
        # loop.run_until_complete(serve(app, config))

        # Uvicorn:
        # import uvicorn
        # uvicorn.run(app)

With this setup, uvicorn handles the request in ~16-17s, hypercorn without uvloop handles it in ~1m6s, and hypercorn with uvloop handles it even slower at ~1m13s 👀. So maybe this is a starlette issue after all...

Either way, any insight or suggested lines of investigation would be appreciated!

gvbgduh commented Oct 2, 2019

Hey @dmontagu, nice investigation! I'll try to have a deeper look a bit later this week.

It's not the most common scenario, as some cloud providers and services will simply return a 413 for a payload of that size, but it's definitely an interesting one.

On the other hand, I have some doubts about whether the test client provides a "fair" time; I have a sneaking suspicion it may reuse the already-allocated memory for the payload, avoiding an expensive operation.

Also, just to clarify: are we talking only about the json payload, or are you interested in multi-part, binary, and streaming as well?

dmontagu commented Oct 2, 2019

I am actually more interested in the multi-part speed, since I already make use of the file submission capabilities for large data transfers in some of my projects, but I have only noticed (and tested for) the issue with the json body.

I haven't noticed any serious issues in my projects due to this issue (although perhaps some of my endpoints could be more responsive...). It's just that the ~40x slowdown I saw in fastapi/fastapi#360 scared me, and I was hoping to get in front of potentially serious performance issues.

(More generally, from an advocacy perspective, I would rather just prevent situations where someone could start arguing that starlette is 50 times slower than flask...)

I wouldn't mind if the end conclusion was that streaming / file submissions currently have no performance issues, and bodies should be limited in size for performance. (I would be especially fine with this if it was due to a tradeoff that made handling small bodies more performant). I just want to understand what is going on.

(And fix it if something has a problem.)


It wouldn't surprise me if the TestClient was "cheating" a little bit, but the flask server in the fastapi issue I linked above was able to handle the exact same request that was being sent to fastapi only about 10% slower than the asgi client. That example involved a ~175MB payload; on my computer it took ~1.3 seconds via asgi, ~1.4 seconds via flask (running in a separate process, listening to a local port), and ~52 seconds via fastapi (and fastapi was doing basically nothing on top of starlette).

If necessary, I can port the code in my script above to work with flask for comparison, but I'd rather not if possible 😬 -- I have very little experience with flask and just relied on the implementation provided by @MarlieChiller.


I agree that these payloads are unreasonably large, but even in the high single-digit megabyte range the slowdown is substantial (just not a factor of 30+ 😄); I think I was still seeing a ~2x slowdown (it is easy to test by changing the number of "a"s in the script above). And I think this size might be realistic for json payloads in some data-oriented use cases.

dmontagu commented Oct 2, 2019

@gvbgduh I got to the bottom of it -- it was in fact the body function from starlette.

This line was causing quadratic scaling for request building, which was becoming very significant with large request sizes:
https://github.com/encode/starlette/blob/6a1c7d38c3c95cddfc76791d595ccfb96157e598/starlette/requests.py#L183
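
For context, the accumulation there looks roughly like this (a paraphrased sketch of the loop inside Request.body, not the exact source):

body = b""
async for chunk in self.stream():
    body += chunk  # each += copies everything received so far into a new bytes object
self._body = body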

I did testing with smaller payloads, and even at the 5MB mark changing this caused the server to handle the request ~15-20% faster.
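
To illustrate why this scales quadratically, here is a minimal standalone comparison (hypothetical chunk size and count adding up to roughly 100MB; not a benchmark of starlette itself):

chunks = [b"a" * 65_536 for _ in range(1_600)]  # ~100MB split into 64KB chunks

# Quadratic: every += copies the entire accumulated body again.
body = b""
for chunk in chunks:
    body += chunk

# Linear: collect the chunks and join them once at the end.
body = b"".join(chunks)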

I opened encode/starlette#653 to fix it.
