Deep dive into the technical architecture of ZhiShuYun's million-scale batch code generation engine, including async task scheduling, Redis queue optimization, database sharding strategies, and maintaining code generation stability under high concurrency.

In anti-counterfeit traceability systems, the code generation engine is one of the most critical infrastructure components. ZhiShuYun's platform generates over 5 million anti-counterfeit codes daily, with peaks reaching tens of millions, and has cumulatively generated over 10 billion codes. This article provides an in-depth look at the architecture design and practical experience of our million-scale async code generation engine.

Traditional synchronous code generation hits clear bottlenecks at large volumes: single-request timeouts, database write hotspots, and conflicts when pre-allocating code resources. We therefore adopted an async task-queue architecture. When the frontend submits a code generation request, the system writes the task to a Redis queue and returns immediately; backend workers then consume and process tasks asynchronously. Users can follow code generation progress in real time through the SaaS backend and are notified automatically when a task completes.
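The submit-then-consume flow above can be sketched as follows. This is a minimal illustration, not the platform's actual code: the key names `codegen:tasks` and `codegen:progress:{id}`, the task payload fields, and the `generate_chunk` callback are all hypothetical. The `client` argument is assumed to expose the redis-py methods `lpush`, `brpop`, `hset`, and `hincrby`.

```python
import json
import uuid

QUEUE_KEY = "codegen:tasks"            # hypothetical queue key (Redis List)
PROGRESS_KEY = "codegen:progress:{}"   # hypothetical progress key (Redis Hash)


def submit_task(client, tenant_id, quantity):
    """Enqueue a generation task and return its ID without doing the work."""
    task_id = uuid.uuid4().hex
    # Create the progress record first so a status query never misses the task.
    client.hset(PROGRESS_KEY.format(task_id),
                mapping={"total": quantity, "done": 0, "status": "queued"})
    payload = {"task_id": task_id, "tenant_id": tenant_id, "quantity": quantity}
    client.lpush(QUEUE_KEY, json.dumps(payload))
    return task_id


def process_next(client, generate_chunk, chunk_size=1000, timeout=1):
    """Pop one task (blocking up to `timeout` seconds) and process it in chunks.

    Returns the processed task ID, or None if the queue stayed empty.
    """
    popped = client.brpop(QUEUE_KEY, timeout=timeout)
    if popped is None:
        return None
    _key, raw = popped
    task = json.loads(raw)
    progress_key = PROGRESS_KEY.format(task["task_id"])
    client.hset(progress_key, mapping={"status": "running"})
    remaining = task["quantity"]
    while remaining > 0:
        n = min(chunk_size, remaining)
        generate_chunk(task["tenant_id"], n)     # the actual code generation
        client.hincrby(progress_key, "done", n)  # progress visible to the UI
        remaining -= n
    client.hset(progress_key, mapping={"status": "done"})
    return task["task_id"]
```

With redis-py installed, `client` could be `redis.Redis()`, and a worker process would call `process_next` in a loop. Pairing `lpush` with `brpop` gives first-in, first-out ordering, and updating the hash per chunk is what lets a progress page poll a single key.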

At the database level, we implemented a sharding strategy. With tenant ID as the sharding key, each enterprise's code data is distributed across different database instances. Each table is kept under 5 million rows, which, combined with read-write separation and index optimization, keeps scan query response times stable within 150 ms. We also use the Snowflake algorithm to generate globally unique code IDs, avoiding ID conflicts in a distributed environment.
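To make the two ideas in this paragraph concrete, here is a small sketch of a Snowflake-style ID generator and a tenant-to-shard routing function. The bit layout (41-bit timestamp, 10-bit worker ID, 12-bit sequence) follows the commonly published Snowflake layout; the custom epoch, the CRC32-modulo routing rule, and all names are assumptions for illustration, since the article does not specify them.

```python
import threading
import time
import zlib

EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC; the custom epoch is an assumption
WORKER_BITS = 10
SEQ_BITS = 12
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    """Generates 64-bit IDs: timestamp | worker ID | per-millisecond sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock          # injectable so tests can control time
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                    | (self.worker_id << SEQ_BITS)
                    | self._seq)


def shard_for_tenant(tenant_id, num_shards):
    """Map a tenant ID to a shard index with a stable hash (hypothetical rule)."""
    return zlib.crc32(tenant_id.encode("utf-8")) % num_shards
```

A stable hash such as CRC32 is used instead of Python's built-in `hash`, which is randomized per process for strings and would route the same tenant differently across workers. Plain modulo routing makes adding shards expensive; a lookup table or consistent hashing is a common alternative when resharding is expected.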

Redis plays multiple roles in the system: task queue (List structure), code generation progress cache (Hash structure), and hot code data cache (String structure). For batch code generation by large tenants, we implemented a Pipeline batch write mechanism: instead of one network round trip per command (N round trips for N commands), commands are sent in batches of B, which cuts the number of round trips to roughly N/B and significantly lowers network overhead.
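As a rough illustration of the pipeline write path, the sketch below flushes one pipeline per batch, so N writes cost about ceil(N / batch_size) request/response cycles rather than N. The function name, batch size, and TTL are hypothetical; `client` is assumed to be a redis-py client, whose `pipeline(transaction=False)` buffers commands until `execute()` is called.

```python
def write_codes_pipelined(client, codes, batch_size=1000, ttl_seconds=3600):
    """Write code -> payload pairs, one pipeline flush per batch.

    Returns the number of flushes, i.e. ceil(len(codes) / batch_size).
    """
    items = list(codes.items())
    round_trips = 0
    for start in range(0, len(items), batch_size):
        # transaction=False: plain pipelining, no MULTI/EXEC wrapper.
        pipe = client.pipeline(transaction=False)
        for key, value in items[start:start + batch_size]:
            pipe.set(key, value, ex=ttl_seconds)  # buffered, not yet sent
        pipe.execute()                            # one flush for the whole batch
        round_trips += 1
    return round_trips
```

For example, writing 2,500 codes with a batch size of 1,000 takes 3 flushes instead of 2,500 individual commands. Bounding the batch size keeps client-side buffers and server reply queues from growing without limit on very large jobs.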

For fault tolerance and reliability, we designed a three-tier retry mechanism: worker exception retry (3 times, with exponential backoff), timeout re-queue (a task is automatically re-queued after 30 minutes without a response), and a dead letter queue as the final fallback (handled by manual intervention). Since system launch, the code generation success rate has remained above 99.99%. We also implemented end-to-end monitoring with Prometheus + Grafana, tracking queue backlog, worker load, and code generation latency in real time.
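The three tiers can be sketched in a few lines. This is an assumption-laden illustration rather than the production logic: it reads "3 times" as three retries after the first attempt, uses a 1-second base delay that doubles each time, and models the queues and the in-flight heartbeat table as plain Python containers with hypothetical names.

```python
import time


def run_with_retry(task, handler, dead_letters,
                   max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Tiers 1 and 3: retry with exponential backoff, then dead-letter."""
    for attempt in range(max_retries + 1):
        try:
            return handler(task)
        except Exception as exc:
            if attempt == max_retries:
                # Retries exhausted: park the task for manual intervention.
                dead_letters.append({"task": task, "error": repr(exc)})
                return None
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults


def requeue_stale(in_flight, queue, now, timeout_seconds=30 * 60):
    """Tier 2: re-queue tasks with no heartbeat for `timeout_seconds`.

    `in_flight` maps task ID -> last heartbeat time (seconds).
    Returns the IDs that were re-queued.
    """
    stale = [tid for tid, last_seen in in_flight.items()
             if now - last_seen > timeout_seconds]
    for tid in stale:
        del in_flight[tid]
        queue.append(tid)
    return stale
```

Injecting `sleep` keeps the backoff testable without real waiting. Because a re-queued task may have been partially processed by a stalled worker, the task handler should be idempotent so a second run does not produce duplicate codes.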

Design and Practice of a Million-Scale Async Code Generation Engine
