Start with the Pipeline in Mind

Author: Torin Bakos Published: 2025-12-04

A series on data science process management.

Welcome to Start with the Pipeline in Mind, a series on data science process management, and to the inaugural post for EmpiriQuant.com!

Data science is a team endeavor, both within the data science group and across the broader organization in which that group operates. This series aims to lay out a road map that data science teams of all sizes can reference when configuring research and development workflows, with two main goals in mind.

The first goal is to streamline the research process by making collaboration and iteration easy to carry out and track. Distilled to its core, such a workflow rests on two properties: portability and reproducibility. By leveraging a small set of tools, we’ll be able to reliably track experiments, optimize models, and ship deliverables with minimal headache.

The second goal is to make it as painless as possible for DevOps and MLOps teams to take the data science team’s models from R&D to production. Fortunately, this goal can largely be achieved by working towards the first, as the tools that compose our framework carry over to an engineering or operations context.

Readers may notice that the framework favors tools that do one thing well over monolithic solutions. This is for two reasons. First, if something breaks, it’s best to contain the fallout to as few systems as possible. That way it’s much easier to find and implement a solution when issues arise. Second, I’ve found that a tool designed for one purpose usually, though not always, does a better job of fulfilling that purpose than a tool designed to do many things.

Check back in a few days for the complete series!