Benchmark: Stable Diffusion Image Generation

The Challenge

Generative AI applications, especially those providing real-time image generation, face immense computational demands. Services built on models like Stable Diffusion must handle concurrent user requests with minimal latency to provide a good user experience. Scaling these services with traditional cloud instances can be prohibitively expensive and complex.

The DGX Spark Solution

The NVIDIA DGX Spark is engineered for high-throughput inference. Built on the NVIDIA Blackwell architecture and supported by optimized software such as TensorRT, it can process large batches of generation requests in parallel. This makes it a cost-effective option for deploying generative AI models at the edge, or as a dedicated desk-side development and testing platform before scaling to the cloud.

Technical Deep Dive

The benchmark used Stable Diffusion v1.5. To maximize performance, we ran PyTorch with xFormers for memory-efficient attention and used AITemplate to compile the model into an optimized inference engine. The test measured sustained throughput when generating 512x512 pixel images with 50 inference steps each, simulating a real-world API workload.
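As an illustration of how a sustained-throughput figure of this kind can be measured, the sketch below times repeated batch generation after a short warmup. It is not the harness used for this benchmark: `measure_throughput`, `fake_generate_batch`, the batch size, and the durations are all hypothetical names and values chosen for the example. In a real run, the stand-in callable would wrap the compiled Stable Diffusion v1.5 pipeline and wait for the GPU to finish before returning.

```python
import time
from typing import Callable


def measure_throughput(
    generate_batch: Callable[[int], int],
    batch_size: int,
    warmup_batches: int = 2,
    duration_s: float = 10.0,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return sustained images per second over at least `duration_s` seconds.

    `generate_batch(batch_size)` must block until the batch is complete and
    return the number of images it produced. For GPU work this usually means
    synchronizing the device before returning, otherwise only the kernel
    launch time is measured.
    """
    # Warmup runs are excluded so one-time costs (compilation, memory
    # allocation, cache fills) do not distort the sustained figure.
    for _ in range(warmup_batches):
        generate_batch(batch_size)

    images = 0
    start = clock()
    while True:
        images += generate_batch(batch_size)
        elapsed = clock() - start
        if elapsed >= duration_s:
            return images / elapsed


def fake_generate_batch(batch_size: int) -> int:
    """Hypothetical stand-in for the model: sleeps instead of generating."""
    time.sleep(0.001)
    return batch_size


if __name__ == "__main__":
    rate = measure_throughput(fake_generate_batch, batch_size=8, duration_s=0.5)
    print(f"sustained throughput: {rate:.1f} images/second")
```

Measuring over a fixed time window rather than a fixed number of batches keeps the result comparable across batch sizes, which is useful when searching for the batch size that gives the best throughput.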

Quantifiable Results

Under these conditions, a single DGX Spark unit achieved a sustained generation speed of 112 images per second. This level of performance enables the development of robust, responsive applications that can serve thousands of users without the high costs of large-scale cloud infrastructure.
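To put the headline figure in context, the short calculation below derives a few quantities from the reported 112 images per second and 50 inference steps. The 10-second interval between requests per user is purely an assumption made for illustration and is not part of the benchmark; the function names are likewise hypothetical.

```python
IMAGES_PER_SECOND = 112   # reported sustained throughput
STEPS_PER_IMAGE = 50      # reported inference steps per image


def denoising_steps_per_second(images_per_second: float, steps_per_image: int) -> float:
    """Total denoising steps executed per second across all images."""
    return images_per_second * steps_per_image


def amortized_ms_per_image(images_per_second: float) -> float:
    """Average time per image in milliseconds when work is batched.

    This is an amortized figure, not the latency of a single request:
    images in the same batch finish together.
    """
    return 1000.0 / images_per_second


def images_per_hour(images_per_second: float) -> float:
    """Images produced in one hour at the sustained rate."""
    return images_per_second * 3600


def supported_users(images_per_second: float, seconds_between_requests: float) -> int:
    """Users served at steady state if each requests one image per interval.

    `seconds_between_requests` is an assumed usage pattern, not measured data.
    """
    return int(images_per_second * seconds_between_requests)


if __name__ == "__main__":
    print(denoising_steps_per_second(IMAGES_PER_SECOND, STEPS_PER_IMAGE))
    print(round(amortized_ms_per_image(IMAGES_PER_SECOND), 2))
    print(images_per_hour(IMAGES_PER_SECOND))
    print(supported_users(IMAGES_PER_SECOND, seconds_between_requests=10))
```

Under that assumed pattern of one request per user every 10 seconds, the reported rate corresponds to roughly 1,120 concurrently active users, which is the sense in which a single unit could serve "thousands" only with lighter per-user demand.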

Key Result: 112 Images/Second
