Kris Sharma
on 21 February 2022
Open source Machine Learning toolkit for financial services
The financial services sector is adopting Artificial Intelligence technologies at a growing rate. Areas such as asset management, algorithmic trading, credit underwriting, blockchain-based finance solutions, fraud detection and claims processing have all seen increased adoption of Machine Learning to drive more robust data-driven decision processes and a better understanding of customer needs. This shift is mainly driven by the emergence of more cost-effective computing capacity and the abundance of available data.
Artificial intelligence and Machine Learning
Put simply, Artificial Intelligence (AI) is the broader field concerned with developing systems that can perform tasks that would normally require human intelligence. Machine Learning (ML) is a sub-category of AI that focuses on developing models for prediction and pattern recognition from data, with limited human intervention.
In the financial services industry, the application of ML methods has the potential to improve outcomes for both businesses and consumers. ML also holds promise to make financial services and markets more efficient, accessible and tailored to consumer needs.
Before we dig deeper into how financial institutions can leverage open source technologies for their ML needs, let us look at the high-level ML process.
The ML process
A typical ML process consists of a few steps. It starts with writing an ML algorithm that analyses the data fed into it to create an ML model. The next step is to find a data set, or create one if none exists, to ‘train’ the model. Training lets the ML algorithm find patterns or other insights within the data set, which the model can then use to make predictions or generate recommendations. Once the model is trained, it needs to be tested to ensure that it accurately delivers the insights it was intended to provide. After the model has been trained and tested, it is deployed in production to generate insights and serve the business use case.
For financial institutions to reap the rewards of their machine learning efforts, models must be developed within a repeatable process using an ML toolkit that empowers data scientists to manage the end-to-end ML process efficiently.
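The train, test and deploy steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction data is synthetic, the "model" is a single amount threshold for flagging fraud, and the function names (make_transactions, train, predict, accuracy) are hypothetical rather than part of any toolkit mentioned in this post.

```python
import random


def make_transactions(n, seed=0):
    """Generate hypothetical (amount, is_fraud) pairs; this is not real data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_fraud = rng.random() < 0.2
        # Assumption for illustration: fraudulent amounts tend to be larger.
        amount = rng.gauss(900, 150) if is_fraud else rng.gauss(200, 100)
        rows.append((amount, is_fraud))
    return rows


def train(rows):
    """Pick the amount threshold that makes the fewest errors on the training rows."""
    candidates = sorted(amount for amount, _ in rows)
    return min(candidates, key=lambda t: sum((a > t) != y for a, y in rows))


def predict(threshold, amount):
    """Flag a transaction as fraud when its amount exceeds the learned threshold."""
    return amount > threshold


def accuracy(threshold, rows):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(threshold, a) == y for a, y in rows) / len(rows)


# 1. Find or create a data set, then hold part of it back for testing.
data = make_transactions(1000)
split = int(0.8 * len(data))
train_rows, test_rows = data[:split], data[split:]

# 2. Train the model on the training rows only.
threshold = train(train_rows)

# 3. Test the model on rows it has not seen.
test_accuracy = accuracy(threshold, test_rows)
print(f"threshold={threshold:.2f} test accuracy={test_accuracy:.3f}")

# 4. "Deploy": use the trained model to score a new transaction.
print("flag new transaction:", predict(threshold, 1200.0))
```

Real models and data are far more complex, but the shape is the same: the test rows are kept out of training so that the accuracy figure reflects how the model behaves on data it has not seen.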
Open source ML toolkit for financial services
Open-source machine learning software has enabled the rapid growth and evolution of ML frameworks and libraries, and thus made it possible for financial institutions to solve increasingly complex challenges and foster a mindset of innovation, growth and community. Additionally, open-source ML platforms will help accelerate AI adoption within the financial services sector, which in turn makes AI better and smarter, benefiting everyone.
Data scientists at financial institutions are always looking for ways to deploy, scale and distribute ML models across clusters of servers, and to optimise models using techniques like GPU offloading.
Machine Learning on Kubernetes
Kubernetes (K8s) offers some key advantages as a platform for training and deploying machine learning models, and addresses some of the key challenges faced by data scientists. It offers reusable deployment resources, automated scaling, multi-tenancy, GPU access and the ability to redistribute workloads if a node in the cluster fails.
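As a rough illustration of GPU access, the sketch below builds a Kubernetes Pod manifest that asks the scheduler for one GPU through the nvidia.com/gpu resource limit. It assumes the cluster runs the NVIDIA device plugin that exposes that resource; the helper name gpu_training_pod, the Pod name and the image URL are hypothetical placeholders.

```python
import json


def gpu_training_pod(name, image, gpus=1):
    """Build a Pod manifest (as a plain dict) that requests NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # A training job should run once rather than restart forever.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    # Assumes the NVIDIA device plugin is installed so that
                    # the cluster advertises the nvidia.com/gpu resource.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical name and image, for illustration only.
manifest = gpu_training_pod(
    "fraud-model-training",
    "registry.example.com/fraud-trainer:latest",
)
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts manifests in JSON as well as YAML, so the printed output could be saved to a file and applied with kubectl apply -f. The same definition can be reused across clusters, which is what makes deployment resources reusable.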
Kubeflow – the ML toolkit on Kubernetes
Kubeflow is a free and open-source machine learning platform designed to orchestrate complicated workflows on Kubernetes using machine learning pipelines. It provides components for each stage in the ML lifecycle, from exploration through to training and deployment. Kubeflow is based on TensorFlow Extended, Google’s internal method for deploying TensorFlow models.
Kubeflow provides the cloud-native interface between Kubernetes and data science tools: libraries, frameworks, pipelines, and notebooks.
Read more about what Kubeflow is.
Kubeflow supports a TensorFlow Serving container to export trained TensorFlow models to Kubernetes. Kubeflow is also integrated with Seldon Core, an open source platform for deploying machine learning models on Kubernetes, NVIDIA Triton Inference Server for maximised GPU utilisation when deploying ML models at scale, and MLRun Serving, an open-source serverless framework for deployment and monitoring of real-time ML pipelines.
Kubeflow includes services to create and manage interactive Jupyter notebooks. Data scientists can customise the notebook deployment and compute resources to suit their data science needs.
Kubeflow Pipelines is a comprehensive solution for deploying and managing end-to-end ML workflows. Data scientists at financial institutions can use Kubeflow Pipelines for rapid and reliable experimentation – schedule and compare runs, and examine detailed reports on each run.
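Conceptually, an end-to-end ML workflow is a graph of steps in which each step runs only after the steps it depends on have finished. The sketch below illustrates that idea with Python's standard library alone. It is not the Kubeflow Pipelines SDK, and the step names and the run_pipeline helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
steps = {
    "load_data": set(),
    "prepare_features": {"load_data"},
    "train_model": {"prepare_features"},
    "evaluate_model": {"train_model", "prepare_features"},
    "deploy_model": {"evaluate_model"},
}


def run_pipeline(steps):
    """Run every step after its dependencies and return the execution order."""
    order = list(TopologicalSorter(steps).static_order())
    for step in order:
        # A real orchestrator would launch a container for each step here.
        print(f"running {step}")
    return order


order = run_pipeline(steps)
```

A pipeline orchestrator adds what this sketch leaves out: running each step in its own container on the cluster, passing artefacts between steps, and recording every run so that runs can be scheduled and compared.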
The Kubeflow community extends support to PyTorch, Apache MXNet, MPI, XGBoost, Chainer, and more. Kubeflow also provides integration with Istio and Ambassador for ingress, Nuclio as a fast multi-purpose serverless framework, and Pachyderm for managing data science pipelines.
To read more about the components and architecture of Kubeflow, please see the Kubeflow Architecture page.
MLOps at any scale – Charmed Kubeflow
Despite the clear benefits of Kubeflow for ML operations, deploying, configuring, and maintaining Kubeflow is still hard. The number of applications and potential scenarios makes it difficult for the Kubeflow community to provide a one-size-fits-all solution for data scientists at financial institutions to consume.
Canonical addresses this issue by packaging each application inside Kubeflow and providing a fully supported MLOps platform for any cloud – Charmed Kubeflow.
Charmed Kubeflow packages the 20-plus applications and services that make up the latest version of Kubeflow, to make deployment and operations even faster and simpler anywhere: on workstations, on-premises, and on public, private and edge clouds.
Charmed Kubeflow is driven by Juju – an enterprise Operator Lifecycle Manager (OLM) that provides model-driven application management and next-generation infrastructure-as-code. In Juju, operators and applications are bundled as Charms – packages that include an operator together with metadata that supports the integration of many operators in a coherent aggregated system.
Juju provides a central view of Kubernetes operators in a deployment, the configuration, scale and status of each of them, and the integration lines between them. It keeps track of potential updates and upgrades for each operator and coordinates the flow of events and messages between operators.
Let us take a look at some of the key benefits of Charmed Kubeflow.
Multi-cloud portability
Many financial institutions today choose to operate in hybrid-cloud or multi-cloud scenarios, enjoying the lower-cost compute of on-prem and the elasticity of public clouds. Thanks to Ubuntu, Charmed Kubernetes and Charmed Kubeflow provide portability of ML workloads across infrastructures, from the data centre to the public cloud.
Kubernetes agnostic
Charmed Kubeflow is compatible with any conformant Kubernetes, including AKS, EKS, GKE, MicroK8s, Charmed Kubernetes, and any kubeadm-deployed cluster.
GPU acceleration
Detect and configure GPUs automatically on MicroK8s and on Charmed Kubernetes for high-throughput training and inference. Data scientists at financial institutions can accelerate the ML workloads on their Kubeflow pipeline with GPU pass-through from machine to Kubernetes to Kubeflow.
Easy to get started
Canonical provides a full set of enterprise services, from evaluation to day-2 operations, which include on-site training, deployment, enterprise-grade support and fully-managed Kubeflow. Alternatively, if you want to start small, while evaluating Kubeflow against other technologies, try out Kubeflow on MicroK8s.
Visit ubuntu.com/kubeflow/install for a quick trial.
Get in touch with us to learn how enterprises across the globe are using open source Machine Learning technologies to drive competitive advantage.