Demo Apps

Face Recognition

Voice Verification

eKYC API Tester

Speech to Text

Video Summarize

Benchmarks

This section provides benchmark information for the open-source components used in this demo. Results may vary depending on hardware and configuration.

Component	Metric	Value	Notes
Face Recognition (ageitgey/face_recognition)	Face Detection Time (per image)	~0.1-0.3 seconds	Varies with image size, number of faces, and hardware. Measured on an Intel i7-8700K CPU with an NVIDIA GeForce RTX 2080 Ti GPU.
Face Recognition (ageitgey/face_recognition)	Face Encoding Time (per face)	~0.5-1.5 seconds	Using 128-dimensional face encodings. Measured on hardware similar to the above.
Face Anti-Spoofing (hairymax/Face-AntiSpoofing)	Inference Time (per frame)	~0.05-0.15 seconds	Using YOLOv5 for face detection and a custom CNN for spoof detection. Measured on an NVIDIA GeForce GTX 1660 Ti GPU.
Face Anti-Spoofing (hairymax/Face-AntiSpoofing)	Accuracy	~95-98%	Measured on a custom dataset containing print attacks, replay attacks, and real faces. Accuracy can vary based on the quality of the training data.
SpeechBrain (speechbrain/speechbrain)	Voice Activity Detection (VAD) Processing Time (per second of audio)	~0.02-0.05 seconds	Using the CRDNN-based VAD model. Performance measured on an Intel Xeon CPU.
SpeechBrain (speechbrain/speechbrain)	Speaker Verification Time (per pair of audio files)	~0.3-0.7 seconds	Using the ECAPA-TDNN model trained on VoxCeleb. Performance can improve with GPU acceleration.
Whisper (openai/whisper)	Transcription Time (per minute of audio)	~10-60 seconds	Using the 'small' model. Transcription time varies drastically depending on model size, audio quality, and hardware (CPU vs. GPU).
Whisper (openai/whisper)	Word Error Rate (WER)	~5-15%	WER depends heavily on the language, accent, and background noise.
Gemma (huggingface.co/blog/gemma3)	Summary Generation Time (per summary, gemma3:4b)	~10-120 seconds	Using the 'gemma3:4b' model through the Ollama API. Generation time varies drastically depending on the vision content, the voice context size, and hardware (CPU).
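
The Word Error Rate row above is the one metric in the table defined by a formula rather than a stopwatch: WER = (substitutions + deletions + insertions) / number of reference words. The following is a minimal sketch of that calculation using a word-level edit distance. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not part of this demo or of Whisper, and the normalization (lowercasing and whitespace splitting) is deliberately simplistic.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Hypothetical helper for illustration only. Text is lowercased and
    split on whitespace; punctuation is NOT stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sample: one substituted word out of ten reference words.
    reference = "please verify my identity before opening the new bank account"
    hypothesis = "please verify my identity before opening a new bank account"
    print(f"WER: {word_error_rate(reference, hypothesis):.0%}")  # prints "WER: 10%"
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words. Published WER figures usually apply heavier text normalization than this sketch does, so results are only comparable when the same normalization is used.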

Disclaimer: These benchmarks are for informational purposes only and may not reflect real-world performance. Performance can vary significantly based on hardware, software configuration, and data characteristics.
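
Since the figures depend so heavily on the machine, it can be useful to re-measure them locally. The following is a minimal timing sketch under stated assumptions: the `benchmark` helper and the stand-in workload are hypothetical and are not how the numbers above were produced. In practice the callable would wrap one step of a demo, such as detecting faces in a single image or transcribing one audio clip.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 1, repeats: int = 5) -> Dict[str, float]:
    """Time `fn` and return min/median/max wall-clock seconds.

    Hypothetical helper for illustration. Warm-up runs are discarded so
    that one-off costs (model loading, caches) do not skew the samples.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the component under test.
    def workload() -> int:
        return sum(i * i for i in range(100_000))

    stats = benchmark(workload, warmup=1, repeats=5)
    print(f"min={stats['min']:.4f}s median={stats['median']:.4f}s max={stats['max']:.4f}s")
```

Reporting a range (min to max) alongside the median mirrors how the table presents its values. For GPU workloads, asynchronous execution may need to be synchronized before the timer is read, which this sketch does not handle.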
