The Benefits of Simplicity in Data Warehouse Architecture

Written by: Roy Hammett | July 21, 2023 | Yellowbrick Data Warehouse | Data Migration, Data Security, Disaster Recovery, GDPR, HIPAA, ISO 27001, LDAP, PostgreSQL, Postgres, Proof of Value, Sarbanes-Oxley, Netezza, Business Continuity

In this, the third in our series of blogs accompanying our joint webinars with Yellowbrick about data warehouse modernization, we look at how and why the simplicity of the Yellowbrick database can have a positive impact on many different roles and functions within the organisation.

Administrators

On the surface, you might think that administrators could feel threatened by a system that is so relatively simple for them to maintain, but we have experienced a different reaction. In their roles, DBAs have lots of things to do, and having one less thing to do is typically no bad thing. Yellowbrick is designed so that some of those housekeeping tasks, such as monitoring, maintaining, and grooming the database, disappear.

What this means for DBAs is that they can now focus on delivering the business service: increasing uptime, delivering better performance, and attending to things that sometimes get pushed into the background, like Disaster Recovery and Business Continuity. These things are quite hard to do on big data platforms, but you can achieve them when the DBAs have more time to focus on them, and this helps the business get the most out of its data.

Developers

Migrating to a new system usually requires a lot of upskilling of developers, but this is not the case with Yellowbrick, as it’s a SQL database with a very familiar PostgreSQL interface. Because it is fully Postgres compatible, you can use your tools of choice: whatever you are using will almost certainly work against Yellowbrick.

Yellowbrick has no indexes, and you don't need them. It’s designed to take away that tuning headache and deliver performance out of the box. Your developers can build SQL queries with low-code or no-code tools, your favourite BI tool or your data science tool, connecting to Yellowbrick through standard JDBC, ODBC and ADO interfaces to get at the data in the way that they need to.

The developers don't have to jump through any hoops or do any kind of weird, complicated machinations to get that raw performance; it’s basically just code and go.

Business Users

So why is it that business users like the simplicity that Yellowbrick offers, and what does this simplicity mean for them?

Simplicity means connecting from the tool of their choice, whether it's a low code, no code, or a BI tool. Users should be able to get access to that data in the way that makes most sense for them.

Business users have come to expect performance. They don't want to be waiting for hours for reports to refresh. Having faster results is important, but even more important are consistent run times. If you’re running some complex analytics and you know it's going to take a certain length of time because you're limited by the resources that your administrators have provided to you, that’s okay. You know that's how long it's going to take every day.

However, if it takes 10 seconds one day and 20 minutes the next day because 10 people are logged on to the platform, then that’s not okay. Nor is it okay when suddenly the capacity of the platform is being taken up by another set of users, such as a data science team or your data ingestion team, and suddenly your performance falls through the floor. Those are the things that really annoy business teams.
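One way to put a number on the consistency described above is the coefficient of variation of a report's run times (standard deviation divided by mean). The sketch below is only an illustration: the function name and the sample timings are made up, and it is not tied to any Yellowbrick feature.

```python
from statistics import mean, pstdev


def run_time_spread(seconds):
    """Return (mean, coefficient of variation) for a list of run times in seconds."""
    avg = mean(seconds)
    return avg, pstdev(seconds) / avg


# Hypothetical daily run times for the same report, in seconds.
steady = [600, 610, 595, 605]   # slow, but the same every day
erratic = [10, 1200, 15, 900]   # 10 seconds one day, 20 minutes the next

for label, timings in (("steady", steady), ("erratic", erratic)):
    avg, cv = run_time_spread(timings)
    print(f"{label}: mean {avg:.0f}s, variation {cv:.0%}")
```

A report with a low variation figure is one that business users can plan around, even when its average run time is the longer of the two.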

Yellowbrick delivers faster results and consistent performance. The scenario where badly written queries can slow down the system for everybody is a thing of the past. Yellowbrick has a good query optimizer: even the most badly written queries are rewritten behind the scenes, and sometimes brute force is used to deliver them.

Business users shouldn’t need to be writing SQL or using the command line to get results from their Data Warehouse, but they do. And they do this because the platform fails to deliver fast, consistent results. They must get their job done, so for many it has been a temptation to just suck out the data and put it somewhere else, which results in little shadow IT systems. These may be embedded in a BI tool or, in the worst case, an Excel spreadsheet. This isn’t good for anybody: not for data governance and not for the organization. Most of all, it’s not great for the end users, who have to use up valuable hours of their time marshaling, manipulating and shaping this data. Inevitably things go wrong in shadow IT systems because they are not tied into the whole IT development process.

Simplification means picking the right tool for the job and the data warehouse doing its job of delivering the queries quickly and consistently, so that these shadow systems become a thing of the past.

Security

Security is a very important function within most modern enterprises. Customers are increasingly concerned about information systems security, particularly given the potentially huge reputational and financial penalties that, for example, a GDPR customer data breach could result in. So, how does Yellowbrick help the Chief Security Officer and their team make sure their data is secure?

Yellowbrick offers simplicity in this area in that it is designed to always keep you in control of your data. Yellowbrick is deployed completely in your domain, inside your network, whether that is in your data center or in the Cloud. This is very different from a lot of the other data warehouse vendors out there, particularly the SaaS vendors. There is no multi-tenancy, so it's always your own data and always your own single-purpose installation inside your environment. You have full visibility of how it's running and where it's running.

Yellowbrick does not see any of your data, or any of your users. Providing you have a decent security setup and policies both in Cloud and on-premises, a decent landing zone and policies in place for how network traffic flows through, enforcement of firewall rules and all those types of things, Yellowbrick just sits inside one of those containers and doesn't introduce any new security concerns for architects.

Managing the increasing number of security standards, certifications and frameworks that customers must comply with just to be able to operate, such as HIPAA in the health industry, ISO 27001 and Sarbanes-Oxley, becomes a lot simpler. This is because all you have to do is prove that Yellowbrick runs inside a secure environment that you own and maintain.

Disaster Recovery and Business Continuity

As we mentioned previously, Disaster Recovery and Business Continuity often get pushed aside because of the routine tasks that keep DBAs fully occupied.

Yellowbrick has baked-in replication, which means it can synchronise across two or more different environments. This could be between different Cloud environments, from on-prem to the Cloud, or from on-prem to another on-prem data center. This near real-time replication is a simple way to facilitate Disaster Recovery. In the event of a failure, you can have another platform up and running almost on demand.

Replication is unidirectional from one database to another database, so you can't update things on both sides, but if you've got two different sets of databases and you want to use each of them as a Disaster Recovery target for the other one, you will have two active platforms with each backing up the other. You can also use this configuration for load balancing or you can use replication to share subsets of data, so you can pick a particular database and replicate that to a third-party partner. There are several uses for it.
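The two-way Disaster Recovery arrangement above can be pictured as a small data model. This is a sketch under stated assumptions: `ReplicationLink`, `validate`, `failover_sites` and the site and database names are all hypothetical, and none of it is Yellowbrick's own configuration syntax or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationLink:
    database: str  # database being replicated
    source: str    # site that accepts writes for this database
    target: str    # site holding the read-only replica


def validate(links):
    """Reject a database that is writable at more than one site (replication is one-way)."""
    writers = {}
    for link in links:
        if writers.setdefault(link.database, link.source) != link.source:
            raise ValueError(f"{link.database} has more than one writable site")


def failover_sites(links, failed_site):
    """Map each database whose writable copy sat on failed_site to its replica site."""
    return {link.database: link.target for link in links if link.source == failed_site}


# Two active platforms, each acting as the Disaster Recovery target for the other.
links = [
    ReplicationLink("sales", source="site_a", target="site_b"),
    ReplicationLink("finance", source="site_b", target="site_a"),
]
validate(links)
print(failover_sites(links, "site_a"))  # {'sales': 'site_b'}
```

Each database has exactly one writable home, which is what keeps the arrangement unidirectional even though both sites are active.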

User Management

Chief Security Officers like to have control over an enterprise-wide repository of users, groups and permissions, using Active Directory or something similar. When choosing another platform you don’t want to reinvent the wheel when you have already spent several years building up a decent Active Directory structure or group structure that represents your business hierarchy and your security domains. Likewise, why reinvent it in your database or your BI tool?

Yellowbrick fully supports Active Directory, LDAP, Okta, and some other identity management platforms. You log on through single sign-on as your normal network ID through to a start point, and then the data you will see within the database will be governed by your group membership privileges. If someone leaves and you want them to stop having access to that data, it is done in one place, so it’s one less thing to manage. Ensuring that database security is automatically maintained reduces the threat level from bad actors.
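The idea that group membership alone decides what a user can see can be sketched as follows. The group names, schema names and the `readable_schemas` helper are hypothetical; in practice the mapping would live in the directory and in database grants, not in application code.

```python
# Hypothetical mapping from directory groups to the schemas they may read.
GROUP_GRANTS = {
    "finance_analysts": {"finance"},
    "data_science": {"finance", "sales"},
}


def readable_schemas(user_groups):
    """Return the union of schemas granted to the groups a user belongs to."""
    schemas = set()
    for group in user_groups:
        schemas |= GROUP_GRANTS.get(group, set())
    return schemas


print(readable_schemas(["finance_analysts"]))  # {'finance'}
print(readable_schemas([]))                    # set()
```

Removing a leaver from their groups in the directory empties the result, with no separate clean-up inside the database.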

In Yellowbrick, data is encrypted at rest and encrypted in flight, on its way back to you, through techniques like TLS and SSL. The keys are always in your control, not managed by Yellowbrick.

Integration

So how simple is it to integrate Yellowbrick with all the other systems that relate to the data warehouse, such as ETL, data visualization and analytical tools?

Generally, when people connect to Yellowbrick they use the PostgreSQL connectivity stack which includes all the drivers, so you don't connect as Yellowbrick, but as Postgres. Arguably this is not great marketing for Yellowbrick, but from a simplicity standpoint, it really works well. If there's a good Postgres connector in the tool it just connects out of the box to Yellowbrick.
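As a minimal sketch of what "connecting as Postgres" means in practice, the snippet below builds a standard PostgreSQL connection URI using only the Python standard library. The host, database and account names are placeholders, and the port shown is simply the PostgreSQL default, assumed here for illustration.

```python
from urllib.parse import quote


def postgres_uri(host, database, user, password, port=5432):
    """Build a standard postgresql:// URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Placeholder values; substitute your own host, database and account.
uri = postgres_uri("yb.example.internal", "analytics", "report_user", "p@ss/word")
print(uri)  # postgresql://report_user:p%40ss%2Fword@yb.example.internal:5432/analytics
```

A URI of this shape is what a PostgreSQL driver or a tool's Postgres connector expects, so nothing in it names Yellowbrick at all.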

Customers coming from the Netezza platform will find it very simple, almost a case of just changing the login credentials and tweaking a few things and you are off. This is because in their early start-up phase Yellowbrick went overboard to make sure that they had a high level of compatibility with Netezza SQL functions and even their load utilities mirror those of Netezza.

If you look at an architectural diagram, Yellowbrick sits exactly where Netezza would sit. Like Netezza, they are not trying to be an ETL tool and a BI tool and a data governance tool and a data science tool; they are really a drop-in replacement for a data warehouse platform, specifically the SQL database part of it. This is what Netezza did for customers, but Yellowbrick are taking it to the next level with a huge step-up in performance.

Migration

Yellowbrick data types are highly compatible with those of Netezza, so migration is largely free of obstacles, although you must still validate and reconcile the data that has been migrated.
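A simple form of the reconciliation step is to compare a row count and an order-independent checksum for each table on both sides. The sketch below is a generic illustration, not the tool described in this blog; it assumes rows arrive as tuples and that both drivers return the same Python types for the same values, and the function names and sample rows are made up.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row count, order-independent checksum) for an iterable of row tuples."""
    count = 0
    combined = 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        # Summing the per-row hashes makes the result independent of row order.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, combined


def reconcile(source_rows, target_rows):
    """True when both sides hold the same rows, regardless of order."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)


# Hypothetical rows fetched from the source and target systems.
source = [(1, "alice", 10.5), (2, "bob", 7.0)]
target = [(2, "bob", 7.0), (1, "alice", 10.5)]
print(reconcile(source, target))  # True
```

A mismatch only says that a table differs, so a real reconciliation would follow up with per-column or per-key comparisons to find the offending rows.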

Smart Associates has developed a tool that automates the end-to-end process. It not only allows you to migrate from Netezza to Yellowbrick, but pretty much anything to Yellowbrick. This is going to be of interest to companies who, for example, have discovered that Snowflake is a lot more expensive than they anticipated and are able to drastically reduce costs by making the switch to Yellowbrick. We have created two demonstration videos that we’d encourage you to watch on how Smart Data Frameworks can undertake a very seamless migration to Yellowbrick, from Netezza and from Snowflake.

That wraps up this blog, but if you’d like to learn more about the benefits of simplicity in data warehouse architecture, why not watch the recording of the full webinar (31 mins)?

Or, if you’d like to have a chat with us about how we can help facilitate your free Yellowbrick proof of value with your own data, contact us here.

Previous blogs in this series:

A Netezza Customer Journey

Value Drivers in Data Warehouse Selection

Author Bio

Thursday August 10th, 2023