Jina-Serve

Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.

Key Features

  • Native support for all major ML frameworks and data types
  • High-performance service design with scaling, streaming, and dynamic batching
  • LLM serving with streaming output
  • Built-in Docker integration and Executor Hub
  • One-click deployment to Jina AI Cloud
  • Enterprise-ready with Kubernetes and Docker Compose support
Comparison with FastAPI

Key advantages over FastAPI:

  • DocArray-based data handling with native gRPC support
  • Built-in containerization and service orchestration
  • Seamless scaling of microservices
  • One-command cloud deployment
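
Since gRPC support is native, the serving protocol is a constructor argument rather than an architectural decision. A minimal sketch (MyExecutor is a hypothetical Executor; 'grpc', 'http' and 'websocket' are the supported protocol values):

from jina import Deployment

# Hypothetical: expose an Executor over HTTP instead of the default gRPC
dep = Deployment(uses=MyExecutor, protocol='http', port=12345)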

Install

pip install jina

See guides for Apple Silicon and Windows.
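
To confirm the install, a quick sanity check (the jina package exposes its version string as __version__):

import jina

print(jina.__version__)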

Core Concepts

Three main layers:

  • Data: BaseDoc and DocList for input/output
  • Serving: Executors process Documents, Gateway connects services
  • Orchestration: Deployments serve Executors, Flows create pipelines
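
As a quick illustration of the data layer: a BaseDoc subclass defines a typed schema, and DocList batches documents with column-style attribute access (used later as docs.text):

from docarray import BaseDoc, DocList


class Greeting(BaseDoc):
    text: str


batch = DocList[Greeting]([Greeting(text='hello'), Greeting(text='world')])
print(batch.text)  # column access across the batch: ['hello', 'world']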

Build AI Services

Let's create a gRPC-based AI service using StableLM:

from jina import Executor, requests
from docarray import DocList, BaseDoc
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        for prompt, output in zip(prompts, llm_outputs):
            # the pipeline returns a list of candidate dicts per prompt;
            # keep the generated text of the first candidate
            generations.append(
                Generation(prompt=prompt, text=output[0]['generated_text'])
            )
        return generations

Deploy with Python or YAML:

from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()

Or, equivalently, in YAML:

jtype: Deployment
with:
  uses: StableLM
  py_modules:
    - executor.py
  timeout_ready: -1
  port: 12345
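
To run the YAML variant, load it from file; a sketch assuming the config above is saved as deployment.yml (load_config is how Jina objects are built from YAML):

from jina import Deployment

dep = Deployment.load_config('deployment.yml')
with dep:
    dep.block()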

Use the client:

from jina import Client
from docarray import DocList
from executor import Prompt, Generation

prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])
print(response[0].text)

Build Pipelines

Chain services into a Flow:

from jina import Flow
from executor import StableLM
from text_to_image import TextToImage  # a second Executor (see the containerization example below)

flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)

with flow:
    flow.block()
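
A Flow can equally be defined in YAML; a sketch mirroring the pipeline above, where executor-level keys follow the Deployment schema. Saved as flow.yml, this is the file the export commands below operate on:

jtype: Flow
with:
  port: 12345
executors:
  - uses: StableLM
    py_modules:
      - executor.py
  - uses: TextToImage
    py_modules:
      - text_to_image.py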

Scaling and Deployment

Local Scaling

Boost throughput with built-in features:

  • Replicas for parallel processing
  • Shards for data partitioning
  • Dynamic batching for efficient model inference
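
In Python these are plain constructor arguments; a minimal sketch:

from jina import Deployment
from text_to_image import TextToImage

# two parallel replicas of the same Executor; shards=N would partition data instead
dep = Deployment(uses=TextToImage, replicas=2)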

Example scaling a Stable Diffusion deployment:

jtype: Deployment
with:
  uses: TextToImage
  timeout_ready: -1
  py_modules:
    - text_to_image.py
  env:
    CUDA_VISIBLE_DEVICES: RR
  replicas: 2
  uses_dynamic_batching:
    /default:
      preferred_batch_size: 10
      timeout: 200

Here CUDA_VISIBLE_DEVICES: RR assigns available GPUs to the replicas round-robin, and dynamic batching holds incoming requests until preferred_batch_size documents arrive or timeout milliseconds elapse.

Cloud Deployment

Containerize Services

  1. Structure your Executor:
TextToImage/
├── executor.py
├── config.yml
└── requirements.txt
  2. Configure:
# config.yml
jtype: TextToImage
py_modules:
  - executor.py
metas:
  name: TextToImage
  description: Text to Image generation Executor
  3. Push to Hub:
jina hub push TextToImage
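
Once pushed, the Executor can be referenced by a Hub URI instead of local source; a sketch, assuming the jinaai scheme of recent releases and a placeholder account name:

from jina import Deployment

# Hypothetical Hub reference; <username> is your Jina AI account.
# The +docker variant runs the Executor from its Hub container image.
dep = Deployment(uses='jinaai+docker://<username>/TextToImage')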

Deploy to Kubernetes

jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s

Use Docker Compose

jina export docker-compose flow.yml docker-compose.yml
docker-compose up

JCloud Deployment

Deploy with a single command:

jina cloud deploy jcloud-flow.yml

LLM Streaming

Enable token-by-token streaming for responsive LLM applications:

  1. Define schemas:
from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str
  2. Initialize the service:
import torch
from jina import Executor, requests
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# module-level tokenizer, shared by the streaming endpoint below
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
  3. Implement streaming:
@requests(on='/stream')
async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
    input = tokenizer(doc.prompt, return_tensors='pt')
    input_len = input['input_ids'].shape[1]
    for _ in range(doc.max_tokens):
        # generate exactly one new token per iteration
        output = self.model.generate(**input, max_new_tokens=1)
        if output[0][-1] == tokenizer.eos_token_id:
            break
        # stream the newly decoded text back to the client
        yield ModelOutputDocument(
            token_id=output[0][-1],
            generated_text=tokenizer.decode(
                output[0][input_len:], skip_special_tokens=True
            ),
        )
        # feed the extended sequence back in as the next input
        input = {
            'input_ids': output,
            'attention_mask': torch.ones(1, len(output[0])),
        }
  4. Serve and use:
# Server
from jina import Deployment

with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()


# Client
import asyncio

from jina import Client


async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France ?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)


asyncio.run(main())

Support

Jina-serve is backed by Jina AI and licensed under Apache-2.0.