AI Accelerator for AI CCTV
iSpur NPU
Overview
Dedicated accelerator for AI
The Supergate NPU is an AI processor designed for high-performance vision AI and generative AI inference. Based on Supergate's custom ISA and memory architecture, it is optimized for high-speed on-device processing of various computer vision and natural language processing models.
Key Features
Up to 64 TOPS / 32 TFLOPS performance
Dedicated compute engines and a parallel architecture process high-resolution image analysis and large-scale language model inference simultaneously.
Multi-channel simultaneous processing
Analyzes multiple real-time video streams in parallel at 15 to 30 FPS, accurately detecting more objects per scene.
Small language model (sLM) support
Supports generative AI inference with an in-house AI compiler and a low-level (C/C++) optimization backend.
Designed for power efficiency
A hierarchical embedded SRAM structure optimizes data locality, minimizing DRAM access and enabling low-power, high-efficiency computation.
On-device inference optimization
Supports FP16, FP8, and INT8 data types, allowing models to be optimized through quantization.
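As a format-level illustration of what INT8 quantization involves, here is a minimal sketch of symmetric per-tensor quantization and its reconstruction error. This is generic NumPy arithmetic, not the Supergate toolchain itself; the function names are illustrative.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x is approximated by scale * q."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Example: quantize a weight tensor and check the reconstruction error.
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = float(np.max(np.abs(w - w_hat)))
# Rounding error is bounded by half a quantization step.
assert max_err <= s / 2 + 1e-6
```

In practice, per-channel scales and calibration data tighten this error further; the bound above is the worst case for a single shared scale.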
AI-specific ISA implementation
- Users can directly control operators specialized for generative AI operations.
- Open-source and extensible hardware ISA design.
Various AI Model Zoo
A zoo of diverse AI models, ported to the Supergate NPU and validated for performance, helps you develop and deploy AI applications effectively.
- On-device serving through direct optimization of PyTorch- and GGUF-based models.
- Supports more than 60 recent sLM models.
AI development environment
A developer-friendly environment, including an AI compiler, enables effective development and deployment of various AI models.
Specifications
AI processing capabilities
The Supergate NPU's AI vision processor runs optimized generative language models alongside a range of vision and video processing algorithms.
Memory architecture
Maximizes data transfer speeds and minimizes power consumption through a unique memory architecture.
AI development environment and ecosystem
- Model conversion tools: support for GGUF, Hugging Face, safetensors, and PyTorch formats.
- Unified SDK: command-line API, performance analysis tools, debugging tools.
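GGUF is one of the supported input formats. As a format-level illustration (this is the public GGUF header layout, not the Supergate converter), a short sketch that validates a GGUF file header:

```python
import struct

def read_gguf_header(data: bytes):
    """Parse the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return version, n_tensors, n_kv

# Build a minimal GGUF v3 header in memory and parse it back.
header = struct.pack("<4sIQQ", b"GGUF", 3, 0, 0)
version, n_tensors, n_kv = read_gguf_header(header)
```

A conversion pipeline would read this header first to decide how to interpret the tensor and metadata sections that follow.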
Supported Models
Vision Model (Image-Based AI)
- YOLOv5/v8
- ViT (Vision Transformer)
- DeepSORT
- SAM (Segment Anything)
Generative AI Model (LLM)
- LLaMA 2/3
- TinyLlama
- Mistral
- Qwen
- CodeLlama
- Baichuan, Gemma, etc.
Application Cases
Smart City
CCTV object detection + automatic generation of natural language descriptions
Industrial sites
Equipment status recognition + maintenance alert generation
Drone Analyzer
Real-time video analysis and command-based narrative generation
Cashierless Store
Behavior detection + automatic product description provision
Performance
Computational performance
Up to 64 TOPS / 32 TFLOPS
Image inference
8 channels @ 30 FPS
Language processing speed
Up to 420 tokens/sec (based on LLaMA 2 7B)
Power consumption
Approx. 3–5 W (in an edge camera environment)
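The spec-sheet figures above imply the following per-frame and per-token latency budgets. This is simple arithmetic derived from the stated numbers, not additional measured data:

```python
# Back-of-envelope latency budgets implied by the spec figures.
channels, fps = 8, 30
frames_per_sec = channels * fps           # 240 inferences per second
per_frame_ms = 1000.0 / frames_per_sec    # ~4.17 ms per frame

tokens_per_sec = 420                      # LLaMA 2 7B figure
per_token_ms = 1000.0 / tokens_per_sec    # ~2.38 ms per token
```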
Additional Support
01
PCIe-based host integration
(x86 host → NPU command transfer)
02
LLM quantization and fixed-point support
03
Model Zoo continuous updates and 24/7 support