First End-to-End Machine Learning Platform Is Embraced by the Community with over 2 Million Downloads Per Month and over 200 Contributors in Only 2 Years
San Francisco, June 25, 2020 – The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced that MLflow, an open source machine learning (ML) platform created by Databricks, will join the Linux Foundation. Since its introduction at Spark + AI Summit two years ago, MLflow has experienced impressive community engagement from over 200 contributors and is downloaded more than 2 million times per month, with a 4x annual growth rate in downloads. The Linux Foundation provides a vendor-neutral home with an open governance model to broaden adoption and contributions to the MLflow project even further.
“The steady increase in community engagement shows the commitment data teams have to building the machine learning platform of the future. The rate of adoption demonstrates the need for an open source approach to standardizing the machine learning lifecycle,” said Michael Dolan, VP of Strategic Programs at the Linux Foundation. “Our experience in working with the largest open source projects in the world shows that an open governance model allows for faster innovation and adoption through broad industry contribution and consensus building.”
Databricks created MLflow in response to the complicated process of ML model development. Traditionally, the process to build, train, tune, deploy, and manage machine learning models was extremely difficult for data scientists and developers. Unlike traditional software development, which is concerned only with versions of code, ML development must also track versions of data sets, model parameters, and algorithms, which creates an exponentially larger set of variables to track and manage. In addition, ML is highly iterative and relies on close collaboration between data teams and application teams. MLflow keeps this process from becoming overwhelming by providing a platform to manage the end-to-end ML development lifecycle from data preparation to production deployment, including experiment tracking, packaging code into reproducible runs, and model sharing and collaboration.
Matei Zaharia, the original creator of Apache Spark and creator of MLflow, shared the news with the data community during his keynote presentation today at Spark + AI Summit. “MLflow has become the open source standard for machine learning platforms because of the community of contributors, which consists of hundreds of engineers from over a hundred companies. Machine learning is transforming all major industries and driving billions of decisions in retail, finance, and health care. Our move to contribute MLflow to the Linux Foundation is an invitation to the machine learning community to incorporate the best practices for ML engineering into a standard platform that is open, collaborative, and end-to-end.”
Organizations including Starbucks, ExxonMobil, T-Mobile, and Accenture are presenting their experience with MLflow at Spark + AI Summit. New features that continue to simplify MLflow and the ML lifecycle are also being announced today, including autologging for experiments and enhanced model management and deployment in the MLflow Model Registry.