The name ‘Pendleton’ has been floating around for a while in press circles, but we’ve only just discovered its true function. Thanks to Twitter’s WalkingCat, we have access to its ‘Getting Started Guide’, and it should be particularly exciting to data scientists.
Microsoft describes Pendleton as “a set of flexible and scalable tools to help you explore, discover, understand and fix problems in your data. It allows you to consume data in many forms and to transform that data into new forms that are better suited for your usage.”
According to research from Microsoft earlier this year, data scientists spend up to 80% of their working hours wrangling data. The raw material must be extracted, cleaned, and formatted correctly, and there’s simply no way to avoid it.
Pendleton, however, could speed it up significantly. The machine learning tool runs on Windows 10 and macOS, using Python to correctly format columns and flag missing data. Bundled are analytics tools and support for Azure Blobs, SQL Server, and Data Lakes.
Project Pendleton New Features
According to ZDNet’s Mary Jo Foley, Microsoft has been testing the tool for over a year and making constant improvements. Recent additions include column metrics for data views, sampling, random percentage sampling, and the ability to open remote Excel files.
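To give a rough idea of what column metrics and random percentage sampling mean in practice, here is a minimal sketch in plain Python. It is not Pendleton's API, which has not been published; the function names `column_metrics` and `sample_percentage` and the sample rows are hypothetical, made up purely for illustration.

```python
import random
from statistics import mean

def column_metrics(rows, column):
    """Count present and missing values in a column; add numeric stats when possible."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing data.
    present = [v for v in values if v not in (None, "")]
    metrics = {"count": len(values), "missing": len(values) - len(present)}
    try:
        numbers = [float(v) for v in present]
    except (TypeError, ValueError):
        return metrics  # non-numeric column: only the counts apply
    if numbers:
        metrics.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
    return metrics

def sample_percentage(rows, percent, seed=None):
    """Keep each row independently with probability percent / 100."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percent]

# Hypothetical sample data with a gap in each column.
rows = [
    {"city": "Seattle", "temp_c": "11.5"},
    {"city": "Redmond", "temp_c": ""},
    {"city": "", "temp_c": "9.0"},
    {"city": "Bellevue", "temp_c": "12.5"},
]

print(column_metrics(rows, "temp_c"))
print(column_metrics(rows, "city"))
print(len(sample_percentage(rows, 50, seed=1)))
```

Because each row is kept or dropped independently, a 50% sample returns roughly, not exactly, half of the rows. That is one common reading of "random percentage sampling"; whether Pendleton works this way is an assumption.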
The most recent iteration is preview version 24, so it’s clear Microsoft has come a long way in that time. What’s missing is a release date, but it should reach the public eventually.
Until then, you can discover how to use Pendleton via a leaked video. It’s short, but should be enough to give users a solid idea of the features: