This issue was moved to a discussion.
enhance serialization speed #360
Comments
Just to confirm, you do have ujson installed? |
Route 1 is going to be the fastest, as it is just returning the JSONResponse directly. Route 2 is taking the dict and running [...]. I could be wrong on this one, I have never really looked at the response portion of the FastAPI code, but from a quick glance [...] |
FWIW, with ujson on an MBP:
|
I don't have ujson installed, and my machine was a dual i5-2697v2 (until today, when it seems I fried my motherboard BIOS...) [...] well. |
I'm investigating it a bit with line_profiler. Maybe calling [...]
|
Maybe it is good that running [...] |
And ujson's effect is limited, maybe, because calling fastapi/routing.py:get_app() [...]
|
And this test case's input values are not a validly encodable expression. So, at the bottom of each recursive call, an exception is raised again and again, and, in general, exception handling is costly, so this code is very slow. |
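A rough illustration of that cost (my own sketch, not the actual FastAPI code): an encoder that hits a try/except fallback on every element pays for each raised exception, and over a large nested structure that can dominate the runtime.

```python
import timeit

data = list(range(100_000))


def encode_happy(x):
    # common path: no exception is raised
    return int(x)


def encode_fallback(x):
    # forces the exception path on every element, mimicking an encoder that
    # tries one representation, fails, and falls back to another
    try:
        raise TypeError("not directly encodable")
    except TypeError:
        return int(x)


print(timeit.timeit(lambda: [encode_happy(x) for x in data], number=10))
print(timeit.timeit(lambda: [encode_fallback(x) for x in data], number=10))
```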
Yep, it's a mistake. I corrected it and don't see the same discrepancies, sorry for that! It seems the original code I based this test case on might have the same sort of bug, which I didn't detect because of the Any type, which as you said raises exceptions. Thanks for the hint, I will dig more and see what's happening. Any is evil. |
I am also experiencing the same/related issue, whereby a large nested payload takes ~10 mins to reach a point in the API where I can assign it to a variable. I have removed pydantic validation etc. to just leave the bare endpoint and tried ujson, but this doesn't seem to have helped significantly. For reference, the same payload is processed in ~1.8 seconds in Flask. |
@Charlie-iProov any chance you could put together a minimal example that is much faster in flask? It could be a good starting point for performance work. |
Hi, here's a quick example. Not sure where to host the files, so I just made a public repo: https://github.com/MarlieChiller/api-serialisation-comparison The test uses a payload that contains a nested base64-encoded image string, as that was my initial use case where I discovered the difference. However, you can swap the image out for an array, and I found the difference is still present (although the time difference was reduced). I think the larger the array length, the larger the time discrepancy, though. |
For reference, in the base example in that repo I was getting approximately 44 seconds in FastAPI versus 1.4 seconds in Flask. When I switched test cases to an array, I used a length of 10000 instead of the image string. Hope that helps. |
@MarlieChiller Something is definitely behaving strangely here. I'm getting similar results to you when I run your script. On the other hand, if I make use of the ASGI test client, I get performance in line with Flask (maybe a little faster) -- ~1.3s of execution time on my machine (vs ~1.4s for Flask going through the server):

```python
import base64
import json
import sys
from datetime import datetime

import requests as r
from fastapi import FastAPI
from starlette.testclient import TestClient

app = FastAPI(title="fast_api_speed_test")


@app.post("/test")
async def endpoint(payload: dict):
    if payload:
        print(type(payload))
    return 200


test_client = TestClient(app)


def main():
    iterations = 1000
    with open("black.png", "rb") as image_file:
        img = image_file.read()
    img = base64.b64encode(img)
    fast_api = send_request(iterations, img, 8000)
    print("fastapi speed >>> ", fast_api)


def send_request(iterations, encoded_string_img, port):
    payload = {"count": iterations, "payload": []}
    for i in range(iterations):
        payload["payload"].append(
            {"arbitrary_field": f"{i}", "image": encoded_string_img.decode("utf-8")}
        )
    print(sys.getsizeof(json.dumps(payload)))
    x = datetime.utcnow()
    response = test_client.post("/test", json=payload)
    y = datetime.utcnow() - x
    print(response.content)
    return y


if __name__ == "__main__":
    main()
```

Because of this, I don't think the performance issue is with fastapi, but maybe uvicorn instead? I'm looking into it some more... |
Out of the blue, can it be sync stuff?
|
@euri10 I don't think that should be the issue (again, the ASGI TestClient speed seems to indicate the problem is not with application-level stuff). But this is deeply disturbing. To be fair, it is a ~175MB payload, but I still think it should be significantly faster to process. |
I didn't see the Flask code, but printing stuff takes lots of time.
|
Okay, so if you change the annotation from [...]. I think it may be worth trying to run the app using a different server (e.g., hypercorn) to see if that has any impact. |
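For what it's worth, a minimal sketch of trying the same kind of app under Hypercorn instead of Uvicorn (the endpoint here is a stand-in, not the code from the comparison repo):

```python
import asyncio

from fastapi import FastAPI
from hypercorn.asyncio import serve
from hypercorn.config import Config

app = FastAPI()


@app.post("/test")
async def endpoint(payload: dict):
    # same trivial handler shape as the comparison script
    return 200


config = Config()
config.bind = ["127.0.0.1:8000"]

# Hypercorn's programmatic entry point; `hypercorn module:app` from the CLI also works
asyncio.run(serve(app, config))
```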
@MarlieChiller I simplified your script a bit, isolating the problem as specifically the payload size (and using just starlette, not even fastapi):

```python
import sys
from datetime import datetime

import requests
import uvicorn
from requests import Session
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import Response
from starlette.testclient import TestClient

app = Starlette()


@app.route("/", methods=["POST"])
async def endpoint(request: Request):
    payload = await request.json()
    assert isinstance(payload, dict)
    return Response("success")


def _speed_test(session: Session, url: str):
    payload = {"payload": "a" * 100_000_000}
    start = datetime.utcnow()
    response = session.post(url=url, json=payload)
    elapsed = datetime.utcnow() - start
    assert response.status_code == 200
    assert response.content == b"success"
    print(elapsed)


def asgi_test():
    client = TestClient(app)
    _speed_test(client, "/")


def uvicorn_test():
    session = requests.Session()
    _speed_test(session, f"http://127.0.0.1:8000/")


def main():
    if "--asgi-test" in sys.argv:
        asgi_test()
        # 0:00:00.650825
    elif "--uvicorn-test" in sys.argv:
        uvicorn_test()
        # 0:00:17.502396
        # cProfile:
        # Name    Call Count    Time (ms)    Own Time (ms)
        # body    391           16670        16649
    else:
        uvicorn.run(app)


if __name__ == "__main__":
    main()
```

I'm going to post an issue on the starlette and the uvicorn repos about this. |
Sounds good, thanks for the help. |
I posted to starlette and uvicorn just now, I guess we'll see if there is any response there! |
@MarlieChiller I got to the bottom of this -- it was due to how the request body was being built by starlette. I opened a PR to fix it: encode/starlette#653 |
Nice! Why was the += approach so much slower? |
It has to do with the way strings work in Python: a new string is created in memory out of the two inputs every time you call +=. So you are basically making a copy of everything you've seen every time += is called. The list-joining approach doesn't do any copying until it puts all the pieces together at the end. |
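A minimal sketch of that difference (not the actual Starlette code): accumulating the body with += copies everything received so far on every chunk, so the total work grows quadratically with body size, while appending chunks to a list and joining once copies each byte only one time.

```python
import timeit

chunks = [b"x" * 4096] * 1000  # ~4 MB arriving in 4 KB chunks


def build_with_concat():
    body = b""
    for chunk in chunks:
        body += chunk  # re-copies the whole accumulated body on each iteration
    return body


def build_with_join():
    parts = []
    for chunk in chunks:
        parts.append(chunk)  # no copying yet, just collect the pieces
    return b"".join(parts)  # one final copy


print("+=  :", timeit.timeit(build_with_concat, number=3))
print("join:", timeit.timeit(build_with_join, number=3))
```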
Just ran into this as well. The example below is taking about 30ms:

```python
@app.get('/', response_class=UJSONResponse)
async def root():
    return {'results': list(range(10000))}
```

Doing the ujson dumps in the body cuts that down to around 3-5ms:

```python
@app.get('/', response_class=UJSONResponse)
async def root():
    return ujson.dumps({'results': list(range(10000))})
```
|
Here's some strange behavior I don't understand:

```python
import time

from fastapi import FastAPI
from starlette.responses import UJSONResponse, Response
from starlette.testclient import TestClient

app = FastAPI()


@app.get('/a', response_class=UJSONResponse)
async def root():
    content = {'results': list(range(10000))}
    return content


@app.get('/b', response_class=Response)
async def root():
    content = {'results': list(range(10000))}
    return UJSONResponse.render(None, content)
    # return ujson.dumps(content, ensure_ascii=False).encode("utf-8")


client = TestClient(app)

t0 = time.time()
for _ in range(100):
    client.get("/a")
t1 = time.time()
print(t1 - t0)
# 1.7897768020629883

t0 = time.time()
for _ in range(100):
    client.get("/b")
t1 = time.time()
print(t1 - t0)
# 0.32788991928100586
```

Seems like it's not using UJSONResponse properly; might be a bug. EDIT: It is using UJSONResponse; the problem is that it is also applying the relatively poorly performing jsonable_encoder. |
I investigated -- the problem is that [...]. This seems like a pretty substantial shortcoming -- I think there should be a way to override the use of jsonable_encoder. A 6x overhead is not good! |
@dmontagu I was about to mention that it's most likely the jsonable_encoder and all of the validation code. |
That's why I stopped using response_class, but I reckon people may want to use the Swagger niceness of having it documented.
|
Yeah, it's also easy enough to write a decorator that performs the conversion to a response for endpoints you know are safe. Something like:

```python
from functools import wraps

from starlette.responses import UJSONResponse


def go_fast(f):
    @wraps(f)
    async def wrapped(*args, **kwargs):
        return UJSONResponse(await f(*args, **kwargs))
    return wrapped
```

(Might want to use [...]) |
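A hypothetical usage of that decorator (route path and handler are made up; assumes ujson is installed, and repeats the helper so the snippet is self-contained):

```python
from functools import wraps

from fastapi import FastAPI
from starlette.responses import UJSONResponse

app = FastAPI()


def go_fast(f):
    @wraps(f)
    async def wrapped(*args, **kwargs):
        # wrapping the result in a Response means FastAPI returns it as-is,
        # skipping jsonable_encoder and response-model validation
        return UJSONResponse(await f(*args, **kwargs))
    return wrapped


@app.get("/results")
@go_fast
async def results():
    return {"results": list(range(10000))}
```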
Description
I have to return sometimes big objects, and I'm constrained in that chunking them is not an option.
An example of such an object would be a dict
{"key": value}
where value is a list of lists, 20 lists of 10k elements.
I wrote this simple test case that shows quite clearly the massive hit in several scenarios (run with
pytest tests/test_serial_speed.py --log-cli-level=INFO
). Here's the output: [...]
All routes do the same thing with slight variations:
as you can see, the time taken to build such an object is small, around 0.05s, but...
route1 just returns it; it takes 9s
route2 returns it but has response_model=BigData in the signature; it takes 1s more
route3 is not intuitive to me: I thought that by already building a BigData object and returning it, there would be no penalty, but it's again slower
How can I improve performance?
edit: the tests are available at this branch, I can PR should you want to: https://github.com/euri10/fastapi/tree/slow_serial
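For context, a rough reconstruction of what the three routes might look like (my own sketch; the paths, the BigData shape, and the use of GET are assumptions, not the contents of the linked branch):

```python
from typing import Dict, List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class BigData(BaseModel):
    key: List[List[int]]  # assumed shape: 20 lists of 10k elements


def build_big_data() -> Dict[str, List[List[int]]]:
    return {"key": [list(range(10_000)) for _ in range(20)]}


@app.get("/route1")
async def route1():
    # plain dict, no response_model: still goes through jsonable_encoder
    return build_big_data()


@app.get("/route2", response_model=BigData)
async def route2():
    # response_model adds pydantic validation on top of the encoding
    return build_big_data()


@app.get("/route3", response_model=BigData)
async def route3():
    # returning an already-built BigData instance is not free either:
    # it is re-validated and re-encoded on the way out
    return BigData(key=build_big_data()["key"])
```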