Tag

Machine Learning Posts

Vector Database Architecture: How Vector Search Powers RAG Systems

March 09, 2026 at 00:00

I built my first vector search system with a flat NumPy array and brute-force cosine similarity. Three hundred fifty chunks, 1024 dimensions, under 2 MB. Search completed in microseconds. That works fine for a few hundred documents. It stops working when you hit millions of vectors, need sub-10 ms latency at thousands of queries per second, and can no longer fit the index in memory on a single node. That is where vector databases earn their place: they solve the hard problem of approximate nearest neighbor search at scale, and they form the retrieval backbone of most production RAG (Retrieval-Augmented Generation) systems today.
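A minimal sketch of that brute-force approach, assuming float32 embeddings (which is what keeps 350 x 1024 values under 2 MB). The random vectors stand in for real chunk embeddings, and the function name `top_k_cosine` is hypothetical, not taken from the original system.

```python
import numpy as np


def top_k_cosine(index: np.ndarray, query: np.ndarray, k: int = 5):
    """Return (row indices, scores) of the k rows most similar to `query`."""
    # Normalize rows and the query so a dot product equals cosine similarity.
    index_unit = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    # One matrix-vector product scores every stored vector: O(n * d) work.
    scores = index_unit @ query_unit
    # Full sort is fine at this size; it is part of what stops scaling later.
    top = np.argsort(-scores)[:k]
    return top, scores[top]


# Illustrative data only: 350 random "chunks" with 1024 dimensions each.
rng = np.random.default_rng(0)
chunks = rng.standard_normal((350, 1024)).astype(np.float32)

# Query with a slightly perturbed copy of chunk 42.
query = chunks[42] + 0.01 * rng.standard_normal(1024).astype(np.float32)

ids, sims = top_k_cosine(chunks, query, k=3)
print(chunks.nbytes)  # 350 * 1024 * 4 bytes = 1433600
print(ids[0])         # the perturbed source chunk ranks first
```

Every query touches every stored vector, so cost grows linearly with the size of the index. That is acceptable at a few hundred chunks and is the cost that approximate nearest neighbor indexes are built to avoid.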

Video Content Moderation with SageMaker Pipelines and Open-Source Models

February 25, 2026 at 00:00

I have built video analysis pipelines that process thousands of uploads per day, routing each file through multiple ML models for content moderation, face recognition, transcription, and object detection. The architecture I keep returning to uses SageMaker Pipelines as the orchestration backbone, with open-source models deployed across Processing Jobs and Batch Transform steps. This approach gives you full control over model versions, GPU instance selection, and inference logic without the per-API-call pricing of managed AI services. The trade-off is real: you own every container, every model artifact, and every failure mode. This article is the architecture reference for building that pipeline. I cover model selection for each analysis domain, the SageMaker Pipeline DAG design, GPU instance sizing, and the operational patterns that keep it running at scale. If you need a deeper understanding of how SageMaker Pipelines work under the hood, start with SageMaker Pipelines: An Architecture Deep-Dive.

Video Content Moderation: AWS Managed Services vs. Open-Source Models

February 25, 2026 at 00:00

I have built video content moderation pipelines both ways: one using AWS managed AI services orchestrated by Step Functions, another using open-source models running on SageMaker endpoints orchestrated by SageMaker Pipelines. Both architectures process uploaded video, detect unsafe visual content, transcribe audio for toxic language analysis, and route flagged material to human reviewers. They solve the same problem with fundamentally different trade-offs in cost, accuracy, operational overhead, customization depth, and data control. This article is the comparative analysis. I break down every dimension that matters when making this architectural decision, with real pricing data, accuracy benchmarks, and operational experience from running both approaches in production. For the full implementation details, see the companion articles: Video Content Moderation with Step Functions and AWS AI Services for the managed services approach and Video Content Moderation with SageMaker Pipelines and Open-Source Models for the open-source approach.

Video Content Moderation with Step Functions and AWS AI Services

February 25, 2026 at 00:00

Every platform that accepts user-uploaded video faces the same operational reality: a single piece of unmoderated content can produce legal liability, advertiser flight, and reputational damage that takes months to repair. I have built content moderation systems for platforms processing thousands of hours of video per day, and the architectural pattern I keep returning to is a Step Functions orchestration layer coordinating AWS managed AI services. Rekognition scans frames for nudity, violence, drugs, and other policy violations; it also identifies celebrities and labels objects and scenes. Transcribe pulls the audio track into a timestamped transcript. Step Functions ties these asynchronous, variable-duration jobs into a single deterministic pipeline that writes a structured metadata package back to S3 alongside the original video. This article is the architecture reference for that pipeline: the service integrations, the ASL definitions, the failure modes, the cost model, and the operational lessons that only surface under production load.

SageMaker Pipelines: An Architecture Deep-Dive

February 01, 2026 at 08:30

I have deployed SageMaker Pipelines across production ML platforms ranging from simple training-to-deployment workflows to multi-model ensembles with conditional quality gates. It is a fundamentally different orchestration paradigm from what most teams expect. The SDK trades orchestration flexibility for zero-cost execution, native SageMaker integration, and first-class support for the ML lifecycle patterns that actually matter in production: parameterization, caching, experiment tracking, and model registration. This article goes deep on the internal workings: how the execution engine resolves dependencies, how caching decisions happen, how data moves between steps, and how to design pipelines that hold up under real operational pressure. If you are still deciding between Pipelines and Step Functions, I cover that comparison in Building Large-Scale SageMaker Training Pipelines with Step Functions. I assume here that you have already committed to Pipelines and want to know what is actually going on beneath the Python API.

Building Large-Scale SageMaker Training Pipelines with Step Functions

January 15, 2026 at 07:45

I have spent the last several months orchestrating ML training pipelines that coordinate dozens of SageMaker jobs: preprocessing, feature engineering, distributed training, hyperparameter tuning, evaluation, conditional deployment. The pattern I keep seeing is that teams pour effort into model architecture and training code while treating the orchestration layer as an afterthought. Then the orchestration layer is exactly where the ugliest production failures happen. This article is my architecture reference for building training pipelines on AWS Step Functions at scale. If you have already read my AWS Step Functions: An Architecture Deep-Dive, the execution model and state types will be familiar. Here we get into the problems specific to ML pipelines: training jobs that run for hours, spot instances that vanish mid-epoch, models that need human sign-off before they touch production traffic, and the retraining loops that keep everything from going stale.

Best Practices for Networking in AWS SageMaker

October 09, 2025 at 15:20

Three years of locking down SageMaker environments across regulated industries taught me one thing early: your networking decisions on day one determine whether the ML infrastructure passes an audit six months later. Teams treat SageMaker networking as an afterthought. Notebook instances get default settings. Models train with full internet access. Then the security review arrives and everybody scrambles. Give the networking layer the same rigor you'd give any production VPC workload.