IBM Integrated Analytics System - Do Data Science Faster
In the last few years, data has evolved rapidly. The need to embrace the growing volume, velocity and variety of data from new technologies such as artificial intelligence (AI) and the Internet of Things (IoT) has accelerated.
The ability to explore, store, and manage your data and therefore drive new levels of analytics and decision-making can make the difference between being an industry leader and being left behind by the competition. The solution you choose must be able to:
The IBM Integrated Analytics System meets these needs and includes embedded IBM Netezza Analytics technology with multiple algorithms, including linear regression, decision tree clustering, k-means clustering and Esri-compatible geospatial extensions. The system is designed to work with business analytics and visualisation tools, including IBM Cognos, SAP BusinessObjects, Kognitio, Microsoft Excel, QlikView, SAS, Microsoft SQL Server Reporting Services (SSRS) and Tableau. The system also handles model-building and scoring tools such as IBM SPSS, Fuzzy Logix, open source R and SAS.
The IBM Integrated Analytics System drives the insights needed to increase your competitiveness by matching accelerated development and deployment times for your data scientists with a high-performance, optimised and cloud-ready data platform.
As part of a unified data science solution, the built-in IBM Watson Studio lets your data scientists connect with your organisation’s data in place. This connection helps data scientists develop machine learning analytics that benefit from a performance-optimised common SQL engine with embedded Apache Spark processing.
From the start, the IBM Integrated Analytics System requires little or no tuning and maintenance to deploy and manage even demanding workloads that require high performance and petabyte-level scalability. The IBM Integrated Analytics System enables machine learning with the Apache Spark processing engine embedded on the system for higher performance analytics. At the same time, this feature can help reduce the complexity of moving analytics and data to separate environments. A common SQL engine shared across the IBM hybrid data management offering family lets you work with your existing on-premises and cloud applications. This flexibility allows you to pick the right environment for the right tasks.
The integrated architecture combines software enhancements, such as asymmetric massively parallel processing (AMPP), with IBM Power technology and flash memory storage hardware. The IBM Integrated Analytics System handles traditional data warehouse workloads and operational mixed workloads. These workloads often require processing queries against large data volumes, quick point queries on small data sets and multiple concurrent operational accesses. As a result, the IBM Integrated Analytics System supports a wide variety of analytics use cases across broad data types and locations on a single solution. This flexibility provides your data scientists with almost endless possibilities.
The IBM Integrated Analytics System helps simplify data scientists’ efforts to train and evaluate predictive models with embedded Apache Spark processing. This feature helps eliminate the need for time-consuming movement and transformation of data to other systems. Once the models are developed using the tools of the data scientists’ choice, the testing, deployment and training can be done where the data resides. With each node containing its own Spark executor process, latency is minimised, which helps speed data access and calculations compared to a stand-alone Spark cluster. In those cases where data scientists need to take the workloads off the system, industry-standard tools and the common SQL engine provide the option to seamlessly move models to a Spark cluster.
In addition to streamlining processes, this ability can also provide advanced performance and flexibility for analytics, including machine learning capabilities. Your data scientists can immediately connect to data in the system and begin building models with the five authorised user licenses included with the IBM Watson Studio.
This interactive, collaborative, cloud-based environment allows data scientists to use multiple tools to activate their insights. Data scientists also have the option of working in Python, R or Scala through the Jupyter Notebook container included on the system. Jupyter can be used to execute interactive code with one-click deployment that transforms the code into a compiled and deployed Spark application.
In addition to prebuilt functions for data mining, prediction, transformations, statistics, geospatial data and data preparation, the Spark capability embedded in the IBM Integrated Analytics System supports open source R and other programming languages like Python, Java, C, C++ and Lua.
IBM helps simplify the deployment and management of the analytics system using a design based on more than 20 years of experience with thousands of clients across multiple industries and regions. The software and hardware arrive at your data centre configured to work together as a single performance-optimised solution. Within hours, you can load data without creating database indexes or struggling to tune and retune the data warehouse once it’s operational.
Clients using the IBM Integrated Analytics System and the included IBM Db2 Warehouse technology should immediately recognise the common SQL engine used across the entire IBM hybrid data management solution portfolio. IBM Db2 Warehouse is designed for data warehouse and analytics workloads. The common SQL engine uses dynamic in-memory columnar technologies for multi-workloads based on IBM Db2 and IBM BLU Acceleration technology. The BLU Acceleration massively parallel processing (MPP) architecture is designed for rapid and deep analysis of data that can scale into the petabytes, with query response times up to 100 times faster than earlier systems.¹ BLU Acceleration columnar tables can coexist with traditional row tables in the same schema, storage and memory, so you can query both row and columnar tables at the same time. Adding BLU Acceleration technology to traditional in-memory capabilities can accelerate performance even when data sets exceed the size of the memory. The dynamic in-memory columnar technologies of BLU Acceleration with data skipping offer an efficient method to scan and find relevant data even when the data is compressed.
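To illustrate the general idea behind data skipping, here is a minimal Python sketch. It is not IBM's implementation. It assumes a column stored in fixed-size blocks, each with its minimum and maximum value recorded as metadata, so that a range query reads only the blocks whose range overlaps the predicate. The function names `build_synopsis` and `scan_with_skipping`, the block size and the sample data are all hypothetical.

```python
from typing import List, Tuple

# (min value, max value, values stored in the block)
Block = Tuple[int, int, List[int]]


def build_synopsis(values: List[int], block_size: int = 4) -> List[Block]:
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append((min(chunk), max(chunk), chunk))
    return blocks


def scan_with_skipping(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block_min, block_max, chunk in blocks:
        if block_max < low or block_min > high:
            continue  # no value in this block can match, so it is skipped
        blocks_read += 1
        matches.extend(v for v in chunk if low <= v <= high)
    return matches, blocks_read


# Hypothetical column values, already clustered so that block ranges do not overlap.
column = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
blocks = build_synopsis(column)
matches, blocks_read = scan_with_skipping(blocks, 11, 12)
print(matches, blocks_read, len(blocks))
```

In this toy example only one of the three blocks is read. The benefit in practice depends on how well the stored data is clustered on the filtered column.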
The IBM Integrated Analytics System leverages IBM Power Systems and IBM FlashSystem technology to improve reliability and performance at the hardware level. Today’s IBM Power architecture enables denser systems that can achieve similar performance with fewer nodes than previous offerings. As the default storage for the system, IBM FlashSystem offers ultra-low latency and high, near-in-memory I/O speeds with outstanding reliability.
While the analytics applications run at peak performance, the IBM Integrated Analytics System also brings new levels of reliability to help you meet or exceed your service level agreements. Power Systems and IBM FlashSystem storage are rated with increased uptime thanks to a fault-tolerant design that helps eliminate single points of failure. According to a 2017 Information Technology Intelligence Consulting (ITIC) survey, IBM Power Systems has the least unplanned downtime of any mainstream Linux server platform, at 2.5 minutes per server per year.²
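To put the survey figure in perspective, the following short Python sketch converts 2.5 minutes of unplanned downtime per server per year into an availability percentage. It assumes a 365-day year and counts only unplanned downtime. The function name is hypothetical, and the result is an illustration rather than a figure quoted by the survey.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def availability_percent(downtime_minutes_per_year: float) -> float:
    """Convert yearly downtime in minutes into an availability percentage."""
    return 100.0 * (1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR)


# 2.5 minutes per server per year is the figure cited from the ITIC survey.
availability = availability_percent(2.5)
print(f"{availability:.4f}%")
```

Under these assumptions, the result is roughly 99.9995 percent availability per server.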
Redundancies are built into components throughout the system, helping ensure continued operation in case of a hardware failure. The system also includes additional built-in, high-availability features to provide automated failovers for performance continuity. Monitoring and management for all components — hardware and software — is provided by a built-in console powered by IBM Data Server Manager that’s used across the Db2 family.
When it comes to your data, a one-size-fits-all approach rarely works. The IBM Integrated Analytics System is built on the common SQL engine, a set of shared components and capabilities across the IBM hybrid data management offering family that helps deliver seamless interoperability throughout your infrastructure.
For example, a data warehouse that your team has been using might need to be moved to the cloud to meet seasonal capacity demands. Migrating this workload to IBM Db2 Warehouse on Cloud can be done seamlessly with tools like IBM Bluemix Lift. The common SQL engine helps ensure no application rewrites are required on your part.
The common SQL engine provides a view of your data regardless of where it physically sits or whether it’s unstructured or semi-structured data. The system’s built-in data virtualisation service in the common SQL engine helps unify data access across the logical data warehouse, allowing you to federate across Db2, Hadoop and even third-party data sources.
IBM Data Replication for Db2 Continuous Availability, a new, optional service for IBM Integrated Analytics System customers, is also available. It supports highly available Db2 Warehouse environments by synchronising data over both row-organised and column-organised tables and schemas, whether on the same platform, across the data centre, or around the world. This software replication offering also supports active and standby replicas for workload balancing, shifting workloads during planned outages while also dramatically reducing the time to recovery for unplanned outages. This offering is pre-integrated into the IBM Integrated Analytics System and lets you get started quickly with a 90-day ‘Try it Now’ license.
Both the software and hardware architecture have been designed to grow and scale as you bring more workloads to support your business onto the system. Compute and storage capacity can be expanded independently, providing almost cloud-like levels of flexibility and elasticity. Hardware expansion is non-disruptive to your business and can be done in place on the system.
The IBM Integrated Analytics System also supports multi-temperature tiered storage to help ensure the highest levels of performance, even with large volumes of data. The system manages the most recently used and active hot data directly on the system storage nodes, while older, less active cooling data resides on more cost-efficient, high-density IBM Storwize storage devices.
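As a rough illustration of the multi-temperature idea, the Python sketch below assigns data to a hot or cool tier based on how recently it was accessed. It is a simplified, hypothetical policy and not the system's actual placement logic. The 90-day window, the function name `choose_tier` and the partition names are all assumptions made for the example.

```python
from datetime import date, timedelta

HOT_WINDOW_DAYS = 90  # hypothetical threshold; a real policy may use other criteria


def choose_tier(last_accessed: date, today: date,
                hot_window_days: int = HOT_WINDOW_DAYS) -> str:
    """Return 'hot' for recently accessed data, otherwise 'cool'."""
    age = today - last_accessed
    return "hot" if age <= timedelta(days=hot_window_days) else "cool"


# Hypothetical partitions and their last access dates.
today = date(2018, 1, 1)
partitions = {
    "sales_2017_q4": date(2017, 12, 15),
    "sales_2015_q1": date(2015, 3, 31),
}
placement = {name: choose_tier(accessed, today) for name, accessed in partitions.items()}
print(placement)
```

In this sketch, the hot tier stands for the system storage nodes and the cool tier for the higher-density, more cost-efficient storage described above.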
The common SQL engine used in the IBM Integrated Analytics System lets you match the right workload with the right deployment platform, while helping ensure that data is accessible regardless of type, location or size. The following are a few use cases to inspire you in getting started:
The IBM Integrated Analytics System integrates and optimises all compute, storage and networking resources with analytics and data warehouse software. It’s available in rack configurations as shown in the table.
1. Assume up to 4x compression to calculate user data capacity from the approximate uncompressed user data. For example, a full rack’s user data capacity would be 4 x 81 TB, resulting in 324 TB.
2. Dimensions are given per rack.
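The capacity arithmetic in footnote 1 can be sketched in a few lines of Python. The 4x ratio and the 81 TB full-rack figure come from the footnote; the constant and function names are hypothetical. Because the footnote says "up to 4x", the result is best read as an upper bound.

```python
COMPRESSION_RATIO = 4           # "up to 4x" compression, per footnote 1
FULL_RACK_UNCOMPRESSED_TB = 81  # full-rack figure used in footnote 1


def user_data_capacity_tb(uncompressed_tb: float,
                          compression_ratio: float = COMPRESSION_RATIO) -> float:
    """Estimate user data capacity at the assumed compression ratio."""
    return uncompressed_tb * compression_ratio


full_rack_tb = user_data_capacity_tb(FULL_RACK_UNCOMPRESSED_TB)
print(full_rack_tb)  # 324
```

Actual compression depends on the data, so the capacity you achieve may be lower than this estimate.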