The Netezza Performance Server for IBM Cloud Pak for Data is an amalgam of leading-edge technologies with lots of new concepts to get your head around, so we thought it would be helpful to explain the platform’s features in layman’s terms.
A Brief History of Netezza
Launched in 2003, legacy Netezza started out as a strong analytics database for structured data. In 2010 it became enabled for Artificial Intelligence via the INZA engine, as well as for geospatial analysis via the ESRI engine. When IBM acquired Netezza, they added Fluid Query, which is a script-based virtualization concept, and then Big SQL, a massively parallel processing (MPP) SQL engine for accessing Hadoop Big Data. With such rich analytical capability and processing power, Netezza became one of the leading platforms for building traditional data warehouses, data marts and big data analytical systems.
How Netezza Was Relaunched
Data warehouse appliances such as Netezza were once the industry standard for performance and ease of use, but times have changed and hardware-bundled appliances have had their day. That led to IBM announcing in 2018 that it would be ending Netezza production. However, in keeping with the trend towards cloud computing and containerisation, IBM acquired Red Hat, and in 2019 Netezza was resurrected as Netezza Performance Server for IBM Cloud Pak for Data (hereafter referred to simply as NPS). In 2020 the first Netezza Performance Server was implemented in the Cloud on Amazon Web Services.
Even though die-hard Netezza fans will take a while to see it as anything else, NPS is no longer a stand-alone data warehouse appliance but a component part of a greatly enhanced modularised solution. It has been incorporated into IBM Cloud Pak for Data (CP4D), with broad new analytics and data processing capabilities. So, in addition to traditional data warehousing, with NPS you can do data science and machine learning as well as a lot of other analytical tasks. There are in fact modules catering for practically any analytical requirement an enterprise, big or small, will have - either on-premises, or in the Cloud.
An Overview of the NPS Platform
In this system overview diagram, NPS comprises 5 layers:
- The Services Ecosystem
- Data Virtualization
- The Platform Interface
- Red Hat OpenShift
- Any Cloud
The Services Ecosystem
This layer is an ecosystem of analytics services and templates from IBM and third parties that can be added to CP4D to run alongside Netezza. The services catalog contains the following types of services: AI (Watson); analytics services such as Cognos Analytics and Db2® Big SQL; data governance services such as IBM InfoSphere and a Regulatory Accelerator; data sources such as IBM Db2 and MongoDB; developer tools such as Jupyter Notebooks and RStudio; and some industry solutions such as customer churn predictions and AI weather insights. So quite an impressive range of products and services has been containerized and can be added, as modules, to the solution.
The main benefit of a unified platform is that you don’t need to install the software somewhere else, and you don’t need to buy new hardware either. You can just spin all these features up within your existing CP4D system and they can run anywhere, including clouds from other service providers. The downside is that these services share the same computing resources as Netezza, so there will be a performance trade-off, and of course you have to pay extra for most of them. Nonetheless, the ability to bring a range of analytical services and development tools together in a single environment is very attractive for businesses that already use several IBM products and services.
If you are interested in seeing what services are available for CP4D see https://www.ibm.com/uk-en/products/cloud-pak-for-data/services.
Data Virtualization and the Platform Interface
A typical enterprise will have a number of disparate analytical systems as well as systems for data management and data governance. From IBM’s perspective, the ideal environment will integrate all of these functions. So, for example, an installation may integrate, via the platform interface, IBM DataStage (for DataOps), IBM Cognos Analytics (for business analytics) and IBM Watson Knowledge Studio (for machine learning).
With Data Virtualization, you can query data across many systems in real time, without having to copy or replicate it. So, whereas data management, data governance and data analysis systems may access a multitude of different databases, they can appear to be accessing the same dataset, thus providing a single user experience across all the functions. This is true even when the enterprise continues to use its non-IBM data integration and BI tools rather than those available via the NPS services layer. As far as Data Virtualization is concerned, those system databases are simply other sources that are connected via standard ODBC, JDBC and OLE DB drivers. Data Virtualization also makes a hybrid cloud solution possible, as you will read later.
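To make the idea of querying data in place more concrete, here is a minimal sketch in Python. It is only an analogy and not how IBM's Data Virtualization service is implemented: two separate SQLite files stand in for two source systems, and a third, empty connection plays the role of the virtual layer, using SQLite's ATTACH DATABASE to join across both without copying any rows. The table names, file names and sample data are all made up for illustration.

```python
import os
import sqlite3
import tempfile


def make_source(path, ddl, insert_sql, rows):
    """Create a stand-alone database file that plays the role of one source system."""
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()


def federated_revenue(crm_path, sales_path):
    """Join customers (source 1) to orders (source 2) where they live, with no copy step."""
    hub = sqlite3.connect(":memory:")  # the "virtual" layer holds no data of its own
    hub.execute("ATTACH DATABASE ? AS crm", (crm_path,))
    hub.execute("ATTACH DATABASE ? AS sales", (sales_path,))
    rows = hub.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM crm.customers AS c
        JOIN sales.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    hub.close()
    return rows


def demo():
    """Build two hypothetical source systems and run one query across both."""
    with tempfile.TemporaryDirectory() as tmp:
        crm_path = os.path.join(tmp, "crm.db")      # hypothetical CRM system
        sales_path = os.path.join(tmp, "sales.db")  # hypothetical sales system
        make_source(
            crm_path,
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
            "INSERT INTO customers VALUES (?, ?)",
            [(1, "Acme"), (2, "Globex")],
        )
        make_source(
            sales_path,
            "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
            "INSERT INTO orders VALUES (?, ?)",
            [(1, 100.0), (1, 50.0), (2, 75.0)],
        )
        return federated_revenue(crm_path, sales_path)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the caller writes a single query and sees a single result set, even though the customers and the orders never leave their own databases. A real virtualization layer does the same job across different database products and networks, typically through the ODBC or JDBC drivers mentioned above.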
If you’d like to see an interesting video about Data Virtualization, click here.
IBM Red Hat OpenShift
NPS is built on Red Hat OpenShift, a leading hybrid cloud, enterprise container platform from Red Hat, the company that IBM bought in 2019.
Containerization is the packaging up of software code and all its related configuration files, libraries and dependencies so that it can run uniformly and consistently on any infrastructure. So, whereas with traditional programming methods there would be considerable effort, both in terms of coding and testing, to transfer code between two computing environments, this is not the case with containerized applications. The “container” is abstracted away from the host operating system, so it stands alone, is portable and can run across any platform or cloud.
Whereas early-gen Netezza appliances also ran on Red Hat Enterprise Linux, the Netezza processing nodes are no longer hardwired to the operating system but have been containerized and added to the Cloud Pak for Data (CP4D) System base. So Netezza can be deployed on-prem or in any cloud.
Kubernetes is an orchestration tool that allows you to run and manage all container-based workloads. The name Kubernetes originates from Greek, meaning helmsman or pilot, which aptly describes the function it performs. Kubernetes was announced by Google in 2014, with version 1.0 following in 2015, and it drew on Google's experience of running its own services across a massive number of clusters. Alongside the 1.0 release, Google donated Kubernetes to the Cloud Native Computing Foundation (CNCF) that it had set up with the Linux Foundation to promote container technology.
Within the containerized Netezza environment, Kubernetes monitors all the data nodes and host nodes, checking their health and restarting any that have failed. Underneath the Kubernetes layer are all the physical resources, and Kubernetes decides which of them will run a given process. Without Kubernetes your host would be running on a dedicated processor which, if it failed, would need replacing and would result in downtime. With Kubernetes, a failed workload can simply be restarted elsewhere, which provides a much higher level of fault tolerance and reliability.
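The monitor-and-restart behaviour described above can be pictured as a simple control loop. The following Python sketch is a toy model written for this article, not the Kubernetes API or its real scheduler: the Node and Cluster classes, the reconcile function and the workload names such as "nps-host" are all hypothetical. Each pass checks where every workload is running and, if its node is unhealthy (or it has not been placed yet), moves it to the healthy node with the fewest workloads.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A physical or virtual machine that can run workloads (hypothetical model)."""
    name: str
    healthy: bool = True


@dataclass
class Cluster:
    """Tracks the nodes and which node each workload is currently placed on."""
    nodes: list
    placements: dict = field(default_factory=dict)  # workload name -> node name


def reconcile(cluster, workloads):
    """One pass of a toy control loop: make sure every workload is on a healthy node.

    Returns a list of (workload, old_node, new_node) for everything that was moved.
    """
    healthy = [n.name for n in cluster.nodes if n.healthy]
    if not healthy:
        raise RuntimeError("no healthy nodes available")

    moved = []
    for workload in workloads:
        current = cluster.placements.get(workload)
        if current in healthy:
            continue  # already running somewhere healthy, nothing to do

        # Count the workloads on each healthy node, then pick the least loaded one.
        load = {name: 0 for name in healthy}
        for placed_on in cluster.placements.values():
            if placed_on in load:
                load[placed_on] += 1
        target = min(healthy, key=lambda name: load[name])

        cluster.placements[workload] = target
        moved.append((workload, current, target))
    return moved


if __name__ == "__main__":
    cluster = Cluster(nodes=[Node("a"), Node("b"), Node("c")])
    workloads = ["nps-host", "spu-1", "spu-2"]  # hypothetical workload names

    print(reconcile(cluster, workloads))  # initial placement
    cluster.nodes[0].healthy = False      # simulate node "a" failing
    print(reconcile(cluster, workloads))  # the workload on "a" is restarted elsewhere
```

In a real cluster this kind of loop runs continuously and deals with far more than node health, but the principle is the same: you declare what should be running, and the orchestrator keeps moving things around until reality matches.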
We have barely scratched the surface here, so if you are interested to know more, here are two informative videos about containerization and Kubernetes.
Any Cloud
As you will know, hybrid cloud is the talk of the town. Many organizations don’t want to commit to a fully cloud or fully on-premises solution. They want to be able to combine the two, using each for the workloads and processes to which it is best suited.
One of the headline features of NPS is that it can be deployed in the public cloud, any cloud. Well, that is not strictly the case. Whilst it is true that Data Virtualization enables you to put your data in whatever cloud you like, actually hosting Netezza in the cloud is a different proposition. It is containerization that makes this possible, by allowing IBM to deploy the solution on the infrastructure of major cloud providers via Infrastructure as a Service (IaaS). At present, those providers are IBM Cloud, AWS and, as of December 2020, Azure. What it does mean, however, is that the NPS platform is truly multi-cloud, as you are able to choose from many permutations, including retaining some or all of your hardware on-prem, which is the current trend.
Indeed, when it was first launched, some elements of NPS were still in development and as such many of its headlined features weren’t actually available. For example, at launch the Netezza host was still x86 (32-bit) hardware and it wasn’t until release 11.2 in December 2020 that it became 64-bit and fully containerized. So many businesses will have installed NPS on-prem rather than waiting for it to be cloud-enabled and, given that NPS has 100% compatibility with the early-gen Netezza systems, they may have regarded the move to the new system as more of an upgrade than a migration. These clients have a range of options available to them. They can migrate NPS to the public cloud and keep their on-prem infrastructure for development or as a backup. Or they can completely decommission their on-prem infrastructure (it probably has a good resale value). Or they can divide their services between the two, or just use the cloud for storage. That’s the beauty of a modular, containerized, cloud-enabled solution.
A One-Stop-Shop for Data Analytics
In wrapping up, you can see that, as a solution, the possibilities are endless with NPS. It really is a one-stop-shop for all things data. At one end of the spectrum are organizations that view NPS only as an upgrade to their existing early-gen Netezza appliance, and that will continue running their existing ETL, data and AI applications alongside Netezza in their existing data centre. At the other end of the spectrum are big IBM clients that use a number of additional IBM services, such as data management and data governance, and that have moved, or intend to move, their systems to the Cloud.
To be able to cater for the whole spectrum, the system is modular and containerized. So in the same way that a car manufacturer makes a chassis that is common to many car models that all appear to be very different, so too the NPS platform is a shell that is enhanced and scaled to meet the diverse needs of each client.
If you would like to find out more about IBM Netezza Performance Server for IBM Cloud Pak for Data, click here.