What's New in Skytree 16.0

By Nick Bal, Staff Data Scientist, Skytree

January 30, 2017

Skytree is announcing a major new release, version 16.0, which continues to combine automation, scalability, and ease of use in a powerful data science tool for experts and non-experts alike.

In our last release (http://www.skytree.net/2016/10/10/creating-data-prep-pipelines-interactively) we added transform snippets to the Skytree Platform. Snippets let users perform the arbitrary data preparation that real-world dataflows demand while taking advantage of Spark's scalability, without needing to write Spark code. Furthermore, once snippets have been added to the system, GUI users can apply them without knowing how to code at all.

In 16.0, we continue our trend of making the platform easier to use through automation, further enabling non-expert users to apply machine learning to add value to their business. Having made a major stride in data preparation, in this release we turn our attention to another major demand on data scientists' time: feature engineering.

As is well known, feature engineering is important for getting the best results from data, but there is no systematic, agreed-upon way to approach it because it depends so strongly on domain expertise. For example, a dataset may contain a customer's credit balance and credit limit, yet a model may get better results when the ratio of the balance to the limit is explicitly added as a feature.
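
To make the ratio example concrete, here is a minimal sketch in plain Python of deriving a balance-to-limit feature. It is not Skytree's transform-snippet API; the column names "balance" and "limit" and the output column "utilization" are hypothetical, and rows with a missing balance or a zero limit are assumed to get a missing ratio.

```python
# Minimal sketch of a derived ratio feature (plain Python, not Skytree's
# snippet API). Column names "balance", "limit", and "utilization" are
# hypothetical illustrations.


def add_utilization_ratio(rows):
    """Return copies of the rows with a 'utilization' column = balance / limit."""
    out = []
    for row in rows:
        limit = row.get("limit")
        balance = row.get("balance")
        if balance is None or not limit:  # missing value, or zero/missing limit
            ratio = None
        else:
            ratio = balance / limit
        out.append({**row, "utilization": ratio})  # leave the input rows untouched
    return out


customers = [
    {"balance": 500.0, "limit": 2000.0},
    {"balance": 1800.0, "limit": 2000.0},
    {"balance": 0.0, "limit": 0.0},
]
for r in add_utilization_ratio(customers):
    print(r["utilization"])
```

Returning new rows rather than mutating the input mirrors how a transform produces a new dataset from an existing one.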

In this initial release, the featurization available is basic: normalization, horizontalization (aka one-hot encoding or dummy variables), mean imputation, and removal of zero-variance columns. But because this functionality is also available via transform snippets, there is a clear path to a full suite of automated feature engineering transforms that will go a long way towards fully automating this part of data science. The ratio mentioned above, for example, is straightforward to implement as a transform snippet.
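
The four basic featurization steps can be sketched in plain Python on a column-oriented table. This is an illustration, not Skytree's implementation, and it rests on several assumptions: None marks a missing value, "normalization" is read as z-score standardization, string columns are treated as categorical, and a missing categorical value becomes an all-zero row of indicators.

```python
# Minimal sketch of basic featurization on a dict of column name -> values.
# Assumptions (not Skytree's implementation): None = missing, normalization
# = z-score standardization, string columns are categorical.
import statistics


def mean_impute(col):
    """Replace None with the mean of the observed values."""
    observed = [v for v in col if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in col]


def standardize(col):
    """Scale a numeric column to zero mean and unit (population) variance."""
    mean = statistics.fmean(col)
    std = statistics.pstdev(col)
    return [(v - mean) / std for v in col]


def one_hot(name, col):
    """Expand a categorical column into one 0/1 indicator column per level.
    A missing (None) value gets 0 in every indicator column."""
    levels = sorted({v for v in col if v is not None})
    return {f"{name}={lvl}": [1 if v == lvl else 0 for v in col] for lvl in levels}


def featurize(table):
    out = {}
    for name, col in table.items():
        if any(isinstance(v, str) for v in col):
            out.update(one_hot(name, col))  # horizontalization
            continue
        col = mean_impute(col)
        if statistics.pvariance(col) == 0:  # drop zero-variance columns
            continue
        out[name] = standardize(col)
    return out


table = {
    "income": [40.0, None, 60.0, 50.0],
    "const": [1.0, 1.0, 1.0, 1.0],
    "region": ["east", "west", "east", "north"],
}
features = featurize(table)
print(sorted(features))
```

Imputing before the variance check matters: a column is only judged constant once its gaps have been filled.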

Having now addressed data preparation (Skytree 15.6) and featurization (this release), we therefore have in place a full end-to-end platform. While there is always more to be done, the framework is in place for scalable end-to-end machine learning and data science usable by experts and non-experts alike.

Figure: The full data science dataflow is now available within the Skytree Platform.

Besides feature engineering, Skytree 16.0 adds several other new capabilities.

Major new features in Skytree 16.0 include:

  • Ability to apply feature engineering to your dataset before running machine learning
  • Automatic prerequisite transforms for AutoModel
  • Major redesign of the Skytree GUI to improve usability and navigation
  • Additional search iterations for AutoModel and Smart Search

Feature engineering was described above. Expanding on the other new features:

AutoModel prerequisite transforms: Until now, to get the full set of algorithms from AutoModel rather than a subset, the input dataset had to be numerical and free of missing values, to satisfy the requirements of algorithms such as generalized linear models and support vector machines. The necessary transforms could be applied, but only as separate steps. Now AutoModel detects columns that are non-numerical or contain missing values and applies these transforms automatically as part of running the model. A user can, for example, load a dataset with numerical and categorical columns and missing values, immediately run AutoModel, and still have it try the full range of machine learning algorithms. This makes AutoModel considerably more convenient and flexible, saving user time and getting the most value from the data.
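
The detection step can be pictured with a small hypothetical sketch: inspect each column and list the transforms an algorithm that needs numeric, complete input (such as a GLM or SVM) would require. The function name and the transform labels are illustrative assumptions, not AutoModel's actual internals.

```python
# Hypothetical sketch of prerequisite-transform detection. The function name
# and the labels "impute_missing" / "one_hot_encode" are illustrative
# assumptions, not Skytree's API.


def plan_prerequisite_transforms(table):
    """Map each column name to the transforms it needs before a GLM/SVM can use it."""
    plan = {}
    for name, col in table.items():
        steps = []
        if any(v is None for v in col):
            steps.append("impute_missing")
        if any(isinstance(v, str) for v in col):
            steps.append("one_hot_encode")
        plan[name] = steps
    return plan


table = {
    "age": [34, None, 51],
    "plan": ["basic", "premium", None],
    "tenure": [12, 30, 7],
}
print(plan_prerequisite_transforms(table))
```

An empty list means a column can be passed to every algorithm unchanged; only the flagged columns need a transform before the full algorithm set can run.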

Skytree GUI redesign: The graphical interface to Skytree has been overhauled for 16.0 to give a more consistent and friendly user experience, both in the way information is displayed and in navigation between different parts of a project. The most obvious differences are that the dataset, model, and results lists now have a sidebar with available actions, rather than a menu that pops up on mouseover, and that the audit trail (DAG) is displayed in a more deterministic way and with much greater clarity than before. Many smaller improvements have also been made.

Figure: Screenshots of the new Skytree GUI. The first panel (list) shows a model list with the new sidebar of allowed actions. The second panel (DAG) shows the new audit trail, with more deterministic display and improved clarity.

Additional search iterations: For AutoModel and Smart Search, these let you take an existing search of the machine learning parameter space and continue it, using the information found by the search so far. You are no longer forced to guess in advance how many iterations are needed for your best result: you can specify a relatively small number and then continue the search if needed. This improves accuracy while reducing wasted time.
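
The idea of continuing a search can be sketched as a search object that keeps its state between calls. Plain random search is swapped in here purely for illustration; Smart Search presumably uses a more sophisticated strategy. The class name, objective, and parameter range are all hypothetical.

```python
# Minimal sketch of a resumable parameter search. Plain random search is an
# illustrative stand-in; the class, objective, and ranges are hypothetical.
# The search keeps its history, best result, and random-number generator, so
# a later call to run() continues where the previous one stopped.
import random


class ResumableRandomSearch:
    def __init__(self, objective, space, seed=0):
        self.objective = objective  # callable: params dict -> score (higher is better)
        self.space = space          # dict: param name -> (low, high)
        self.rng = random.Random(seed)
        self.history = []           # (params, score) pairs evaluated so far
        self.best = None            # (params, score) with the highest score

    def run(self, n_iter):
        """Evaluate n_iter more candidates, extending the existing history."""
        for _ in range(n_iter):
            params = {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.space.items()}
            score = self.objective(params)
            self.history.append((params, score))
            if self.best is None or score > self.best[1]:
                self.best = (params, score)
        return self.best


def objective(p):
    # Toy objective with its maximum at x = 0.3
    return -(p["x"] - 0.3) ** 2


search = ResumableRandomSearch(objective, {"x": (0.0, 1.0)}, seed=42)
first = search.run(5)     # a small initial budget
second = search.run(20)   # continue the same search if the result is not good enough
print(len(search.history), second[1] >= first[1])
```

Because the generator state carries over, running 5 iterations and then 20 more reaches the same result as a single 25-iteration run with the same seed, so nothing evaluated in the first call is wasted.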

In addition to the above new features, a number of other enhancements have been added:

  • Generate test result dataset with predictions alongside the data in one output
  • Tune a model to maximize the area under the curve (AUC), precision at k, or recall at k (k being a point along the curve)
  • Obtain area under the curve (AUC), precision at k, and recall at k as metrics with model tuning results
  • Export Skytree native model files in the GUI and SDK
  • Context-based filtering for the DAG, plots view and list views
  • Visualization of the best model when selected by AutoModel
  • Enhancements to streaming functionality to support sparse features and a separate JSON structure for each data point in batch queries
  • Enhancements to performance of concurrent streaming queries
  • New implementations of kernel density estimation and kernel discriminant analysis on the command line
  • Easily access variable importances from the SDK as well as the GUI
  • New transform snippets to expand data preparation functionality
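
The AUC, precision at k, and recall at k metrics can be sketched in plain Python for a binary classifier. This assumes labels are 0/1, both classes are present, higher scores mean "more likely positive", and k is read as the number of top-ranked predictions, a common definition that may differ from Skytree's exact one.

```python
# Minimal sketch of AUC, precision at k, and recall at k. Assumptions:
# 0/1 labels with both classes present, higher score = more likely positive,
# k = number of top-ranked predictions (Skytree's definition may differ).


def auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative, with ties counting one half (quadratic; fine for illustration)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_k_hits(labels, scores, k):
    """Number of positives among the k highest-scoring examples."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k])


def precision_at_k(labels, scores, k):
    return top_k_hits(labels, scores, k) / k


def recall_at_k(labels, scores, k):
    return top_k_hits(labels, scores, k) / sum(labels)


labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(auc(labels, scores), precision_at_k(labels, scores, 2), recall_at_k(labels, scores, 2))
```

Tuning "to maximize" one of these simply means using it as the objective in a parameter search in place of, say, accuracy.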

You can download Skytree Express GUI, Python SDK and Command Line Interface (CLI) for free at this link. Enjoy, and as always please let us know how we can make it better.
