vllm.entrypoints.api_server

NOTE: This API server is used only for demonstrating usage of AsyncEngine and for simple performance benchmarks. It is not intended for production use; for production, we recommend our OpenAI-compatible server. We also do not accept PRs modifying this file; please change vllm/entrypoints/openai/api_server.py instead.

generate (async)

generate(request: Request) -> Response

Generate completion for the request.

The request should be a JSON object with the following fields:

- prompt: the prompt to use for the generation.
- stream: whether to stream the results or not.
- other fields: the sampling parameters (see SamplingParams for details).
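As a minimal client sketch, a non-streaming request might look like the following. The host and port (localhost:8000) are assumptions about how the server was launched; max_tokens and temperature are standard SamplingParams fields forwarded alongside the prompt.

import requests

# Send a non-streaming generation request to the demo server
# (assumed to be running at localhost:8000).
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "San Francisco is a",
        "stream": False,
        # Remaining fields are treated as sampling parameters.
        "max_tokens": 32,
        "temperature": 0.0,
    },
)
# The response body carries the generated text under the "text" key.
print(response.json()["text"])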

Source code in vllm/entrypoints/api_server.py
@app.post("/generate")
async def generate(request: Request) -> Response:
    """Generate completion for the request.

    The request should be a JSON object with the following fields:
    - prompt: the prompt to use for the generation.
    - stream: whether to stream the results or not.
    - other fields: the sampling parameters (See `SamplingParams` for details).
    """
    request_dict = await request.json()
    return await _generate(request_dict, raw_request=request)
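When stream is true, this demo server emits incremental results as JSON chunks separated by NUL bytes ("\0"); a sketch of a streaming client under that assumption, again with localhost:8000 as an assumed address:

import json

import requests

# Stream incremental results; each chunk is assumed to be a JSON
# object terminated by a "\0" delimiter.
with requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "San Francisco is a", "stream": True, "max_tokens": 32},
    stream=True,
) as response:
    for chunk in response.iter_lines(delimiter=b"\0"):
        if chunk:
            print(json.loads(chunk)["text"])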

health (async)

health() -> Response

Health check.

Source code in vllm/entrypoints/api_server.py
@app.get("/health")
async def health() -> Response:
    """Health check."""
    return Response(status_code=200)
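For completeness, a liveness probe against this endpoint is a plain GET; a sketch, with the address again assumed:

import requests

# The server answers with HTTP 200 and an empty body when it is up.
assert requests.get("http://localhost:8000/health").status_code == 200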