
Scalable ML Infrastructure for CV Model Development
SQUAD designed and deployed a cloud-native ML infrastructure for computer vision model development that reduced training time, lowered infrastructure overhead, and improved the efficiency of large-scale data processing.
7x faster model training
from 21 days to 3 days
$3.2M annual cost savings
$2M on GPU compute and $1.2M on storage
55-point drop in preprocessing load
from 70 percent to 15 percent of sprint capacity
Client at a Glance
Service Type
MLOps and infrastructure for computer vision model development
Industry
Consumer electronics and smart security cameras
Engagement
Long-term collaboration on AI infrastructure and model development
Region
Global
The client is a global consumer electronics brand that produces smart indoor and outdoor security cameras.
Challenge
The client’s computer vision team was working with a very large video dataset and increasingly complex training pipelines, but the supporting ML infrastructure was not scaling efficiently.
This created several issues:
A 14 PB video dataset required heavy preprocessing, which consumed about 70 percent of the AI team’s sprint capacity.
The training and evaluation cycle for computer vision models took up to a month, slowing down iteration and validation.
Uncontrolled training runs led to significant time and cost overruns.
Manual cloud resource management caused compute downtime and reduced cost efficiency.
The client needed an infrastructure layer to scale computer vision development, reduce operational overhead, and improve distributed training and cloud resource utilization.

Solution
SQUAD designed and implemented a cloud-native ML infrastructure for scalable computer vision model development and evaluation.
The main elements of the solution were:
Deployment of a Kubernetes-based ML toolkit with automated resource management, supporting PyTorch, NVIDIA DALI, and OpenMMLab.
Development of a specialized data loading library to streamline frame extraction, data transformations, and multi-modal input for HPC training pipelines (see the loader sketch after this list).
Implementation of an Infrastructure-as-Code stack for automated AWS pipeline management, with cost-optimized instances and automatic termination controls (see the watchdog sketch below).
Introduction of a standardized approach to distributed training and evaluation, improving repeatability across the team (see the launcher sketch below).
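As an illustration of the data loading approach, the sketch below shows a minimal PyTorch video-frame dataset built only from tools in the stack listed below (OpenCV, Albumentations, PyTorch). The class name, even-stride frame sampling, and transform choices are assumptions made for the example; the client's actual library also covers multi-modal inputs and accelerated decoding (Turbo-JPEG, NVIDIA DALI) not shown here.

```python
import cv2
import torch
import albumentations as A
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset


class VideoFrameDataset(Dataset):
    """Illustrative video-frame loader; not the client's actual library.

    Decodes a fixed number of frames per clip with OpenCV and applies an
    Albumentations augmentation pipeline before handing tensors to PyTorch.
    """

    def __init__(self, video_paths, frames_per_clip=8):
        self.video_paths = list(video_paths)
        self.frames_per_clip = frames_per_clip
        # Resize/flip/normalize are typical CV defaults, chosen for the example.
        self.transform = A.Compose([
            A.Resize(224, 224),
            A.HorizontalFlip(p=0.5),
            A.Normalize(),
            ToTensorV2(),
        ])

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, idx):
        cap = cv2.VideoCapture(self.video_paths[idx])
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        # Sample frames at an even stride across the clip (assumes readable video).
        step = max(total // self.frames_per_clip, 1)
        frames = []
        for i in range(self.frames_per_clip):
            cap.set(cv2.CAP_PROP_POS_FRAMES, i * step)
            ok, frame = cap.read()
            if not ok:
                break
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(self.transform(image=frame)["image"])
        cap.release()
        return torch.stack(frames)  # shape: (frames_per_clip, C, H, W)
```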
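The automatic termination controls can likewise be sketched as a small watchdog that stops idle GPU instances. In the hedged example below, the Training/GPU namespace, GPUUtilization metric, and role=training tag are hypothetical names a training fleet might publish; CloudWatch does not report GPU utilization without a custom metrics agent.

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")


def average_gpu_util(instance_id, minutes=30):
    # "Training/GPU" and "GPUUtilization" are hypothetical custom metrics
    # assumed to be pushed by the training nodes themselves.
    stats = cloudwatch.get_metric_statistics(
        Namespace="Training/GPU",
        MetricName="GPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=datetime.utcnow() - timedelta(minutes=minutes),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else 0.0


def stop_idle_instances(threshold=5.0):
    # Find running instances tagged as training workers (tag is illustrative).
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:role", "Values": ["training"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    for res in reservations:
        for inst in res["Instances"]:
            if average_gpu_util(inst["InstanceId"]) < threshold:
                # Stop, rather than terminate, so the node can be resumed.
                ec2.stop_instances(InstanceIds=[inst["InstanceId"]])
```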
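Finally, standardizing distributed training often comes down to a single entrypoint that every worker launches via torchrun. The sketch below shows that common pattern with PyTorch DistributedDataParallel; the model and training loop are placeholders rather than the client's code.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun exports RANK, LOCAL_RANK, and WORLD_SIZE for each worker.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; a real job would build its PyTorch/OpenMMLab model here.
    model = torch.nn.Linear(512, 10).to(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(10):  # stand-in training loop with random data
        x = torch.randn(32, 512, device=local_rank)
        y = torch.randint(0, 10, (32,), device=local_rank)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    # Launch with: torchrun --nproc_per_node=<gpus> train.py
    main()
```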
Technologies and frameworks
The work relied on the following tools and platforms:
Core technologies: AWS, PyTorch, NVIDIA DALI, OpenMMLab, Kubeflow, Python
Data processing: OpenCV, Turbo-JPEG, Albumentations, Kornia, FFmpeg
Optimization and monitoring: Optuna, CloudWatch
Results & Impact
Technical outcomes
Seven times faster model training
The overall model training cycle was reduced by a factor of seven, from 21 days to 3 days, allowing the client to iterate on computer vision models much faster.
Faster data access and loading
Data fetching time dropped nearly fourfold, from 14 seconds to 3.8 seconds for 8k images, and sensor pipeline data loading became five times faster, from 150 minutes to 30 minutes.
Business outcomes
Lower compute and storage costs
Annual GPU-based EC2 costs were cut by $2 million and storage costs by $1.2 million, the latter by optimizing 4 PB of processed data out of the 14 PB dataset.
Better cloud resource utilization
Cloud resource utilization was stabilized at about 80 percent load on GPU-based instances, improving efficiency and reducing waste across training workloads.
Customer outcomes
Faster delivery of AI-based features
By shortening training and evaluation cycles, the client was able to move computer vision features through development more quickly and support a broader product roadmap.
More engineering time focused on model development
Dataset preprocessing effort fell from 70 percent to 15 percent of sprint capacity, giving the team more time for model quality and feature development.