MLE · ML Systems

Felipe Felix Arias

MLE on Uber's ML Training team. I build GPU training infrastructure and GenAI systems, and I support LLM fine-tuning, working across infra, applied ML, and production deployment.

Founder of Marovi AI, a research comprehension and translation platform that makes knowledge accessible across languages. Former NSF Graduate Research Fellow at UIUC.

MS CS, UIUC (2023) · BS CS Honors (2019) · NSF GRF (2021)
ML Training Infrastructure · Evaluation & Data Pipelines · Agentic Systems · Distributed Training · Ray / Kubernetes · Python · PyTorch · Go

San Francisco · Fluent in English and Spanish · Born in La Paz, Bolivia

Impact at a Glance

Signals from production systems, research, and Marovi AI.

0

Downtime migrating Uber's Ray training controller across 2,000+ ML pipelines to regional federated compute.

3.7%

Drop in unsafe Uber driver behavior from a GenAI + telemetry-based distracted driving detection system.

Top 10% x2

Y Combinator selections for Marovi AI, a research comprehension and translation platform across 9+ languages.

4 Labs

Research at Stanford, UC Berkeley, and UIUC (NSF fellow), plus a publication with Google Brain (ICRA 2021).

Technical Focus

Systems areas I work in.

ML Training Infrastructure

Job controllers, GPU scheduling, training platform reliability, and support for LLM fine-tuning (LoRA, QLoRA, and full fine-tuning) at scale.

Evaluation & Agentic Systems

Agentic systems, RAG, structured evaluation, and production GenAI with safety guardrails.

Distributed Training Infrastructure

GPU cluster orchestration, Ray on Kubernetes, DDP/FSDP, and federated compute at scale.

Research Comprehension & Translation

Crowdsourced translation and AI-powered learning platforms that make research accessible across languages.

Experience Highlights

Selected work across LLM systems, GenAI, and research.

Uber — ML Training Team

Software Engineer · San Francisco · Mar 2024 — Present

  • Build job controllers and GPU training infrastructure; led the move from zonal to regional federated compute with zero downtime.
  • Co-designed Ray job submission across federated Kubernetes clusters and shipped AI safety systems plus internal assistants for incident response.
  • Leading mTLS rollout for batch workloads and serving as primary point of contact for LLM fine-tuning partnerships.

Marovi AI

Founder · San Francisco · 2024 — Present

  • Building a research comprehension and translation platform with a provider-agnostic API across OpenAI, Anthropic, Google, and DeepL.
  • Designed crowdsourced correction loops and agentic translation pipelines, with content in 9+ languages; twice selected in the top 10% at Y Combinator.
  • Full-stack: AWS deployment, MediaWiki frontend, Python orchestration, Pydantic schemas, and benchmark evaluation (FLORES, XL-SUM).

Research — Stanford, UC Berkeley, UIUC, Google Brain

NSF Graduate Research Fellow · 2017 — 2023

  • UIUC Parasol Lab (NSF fellow): Self-supervised data labeling via simulation for multi-agent motion planning (5x speedup). Published with Google Brain at IEEE ICRA 2021. Master's thesis on motion pattern prediction.
  • Stanford Hazy Research: Extended Snorkel's programmatic data labeling for multi-sentence relation extraction (+12% F1). Collaborated with Alex Ratner (now CEO of Snorkel AI) and Christopher Ré.
  • UC Berkeley: Built shake-gesture detection for E-mission, an open-source mobility platform (+7% F1 via SVM on imbalanced signals).

Selected Projects

A few representative projects.

NavIRL: Indoor Social Navigation Toolkit

Research Tool · Rooted in thesis work

Agent-driven indoor social-navigation toolkit with deterministic simulation, verification gates, and an Aegis overseer pipeline for VLM-backed qualitative realism checks and tuning.

Avoidance Critical Probabilistic Roadmaps

IEEE ICRA 2021 · With Brian Ichter, Aleksandra Faust, Nancy M. Amato

Self-supervised programmatic data labeling through simulation: learns avoidance-critical regions from generated collision data and distills them into spatial models for scalable multi-agent motion planning.

Video as a Sensor for Urban Risk

With Daniel Carmody, Richard Sowers, Jayati Singh, Kevin So

Weak supervision and behavioral heuristics surface risky roadway events when labeled data is scarce.

Publications

Selected academic work.

Now

Building ML training infrastructure and GenAI systems at Uber. Growing Marovi AI on the side. Experimenting with agentic research via NavIRL.

Outside Work

I play guitar and drums, dance Latin styles, and travel. Covers and clips on Instagram.