Data lake and data science platform vendors have made a lot of promises over the last 10–12 years, but most of them (around 85 percent) have yet to be realized. Headlines like “Hadoop has failed us” abound, and so do stories of frustrated data scientists quitting their jobs because the infrastructure they need to do their best work is insufficient or nonexistent.
Lentiq, a spin-off of bare metal cloud provider Bigstep, aims to change all of that with a first-of-its-kind, flexible cloud data lake service that promises to give data scientists everything they need to get started right away: literally on the same day, maybe in under an hour, according to Cristina Grosu, product manager at the Chicago-based company.
Lentiq accomplishes this with flexible, interconnected “data pools” configured specifically for the task at hand, instead of the single, massive data lakes in use today. Data lakes, according to Grosu, tend to be overgeneralized (so it’s hard to choose the right tools for solving a specific problem), over-centralized (same technologies, same schema model regardless of organizational impact), complex (built for all possible use cases: Hadoop, key-value stores, advanced data management and data lineage) and expensive.
Data pools, on the other hand, serve individual use cases, have their own budgets and resources, and aim to be closer to both the data and the end user. They can live across multiple clouds (AWS, Azure and GCP) and regions. They communicate through a central data catalog where information is governed and managed. Data is shared in a publish/subscribe manner that respects PII rules.
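To make the publish/subscribe idea concrete, here is a minimal Python sketch of how pools might exchange data through a central catalog while honoring PII rules. It is an illustration only, not Lentiq's API: the names `DataSet`, `DataPool`, `Catalog`, `pii_columns` and `allow_pii` are all hypothetical, and the PII rule assumed here (drop flagged columns unless a subscriber is explicitly cleared) is just one plausible policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class DataSet:
    """A curated data set with tags and the columns flagged as PII (hypothetical model)."""
    name: str
    rows: List[dict]
    pii_columns: Set[str] = field(default_factory=set)
    tags: Set[str] = field(default_factory=set)


class DataPool:
    """A pool scoped to one use case; the cloud label is informational only."""

    def __init__(self, name: str, cloud: str) -> None:
        self.name = name
        self.cloud = cloud
        self.received: Dict[str, List[dict]] = {}

    def receive(self, dataset_name: str, rows: List[dict]) -> None:
        self.received[dataset_name] = rows


class Catalog:
    """Central catalog: pools subscribe by data set name, publishers push through it."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, List[Tuple[DataPool, bool]]] = {}

    def subscribe(self, dataset_name: str, pool: DataPool, allow_pii: bool = False) -> None:
        self._subscriptions.setdefault(dataset_name, []).append((pool, allow_pii))

    def publish(self, dataset: DataSet) -> None:
        for pool, allow_pii in self._subscriptions.get(dataset.name, []):
            if allow_pii:
                rows = [dict(row) for row in dataset.rows]
            else:
                # Assumed policy: strip PII columns for subscribers not cleared to see them.
                rows = [
                    {k: v for k, v in row.items() if k not in dataset.pii_columns}
                    for row in dataset.rows
                ]
            pool.receive(dataset.name, rows)


catalog = Catalog()
marketing = DataPool("marketing", cloud="AWS")
compliance = DataPool("compliance", cloud="Azure")
catalog.subscribe("customers", marketing)
catalog.subscribe("customers", compliance, allow_pii=True)

catalog.publish(DataSet(
    name="customers",
    rows=[{"id": 1, "email": "a@example.com", "plan": "pro"}],
    pii_columns={"email"},
    tags={"crm"},
))

print(marketing.received["customers"])   # [{'id': 1, 'plan': 'pro'}]
print(compliance.received["customers"])  # [{'id': 1, 'email': 'a@example.com', 'plan': 'pro'}]
```

In this sketch the catalog, not the individual pool, decides what each subscriber may see, which mirrors the article's point that governance is centralized even though the pools themselves are spread across clouds.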
Moreover, work is kept simple because data workers are free to pick precisely the tools (Apache Spark, Apache Kafka, StreamSets) and notebooks they prefer for the specific project at hand. Seaborn, Keras/TensorFlow, Bokeh, Dash and Plotly are integrated out of the box. Analytics and visualization tools like Looker, QlikView and Tableau are also seamlessly available.
Collaboration between data workers is key in analytics and data science. Lentiq aims to make it seamless by making it easy to curate and publish data sets, complete with comments and tags.
The beauty of all of this is that Lentiq eliminates much of the grunt work on which data analysts and scientists spend their valuable (and expensive) time, freeing them up to work with the data and bring forth insights sooner.
While Lentiq is a no-brainer for small and medium-sized businesses, it will likely also prove useful to any organization that is strapped for data scientists, data engineers and DevOps workers.