Old World
In the “old world” of on-premise equipment, physical servers were the key element of the IT infrastructure. With or without a virtualization layer, they were limited by the number of CPU cores, the throughput of their hard drives and the amount of RAM. The machine’s power was the hard limit of any performance optimisation, and one could navigate quite easily between optimising the application code and buying extra hardware. There was no risk of cost explosion, as all the investment was made in advance.
New World
The new world is slightly different. Data centres with thousands of servers still exist, but they are no longer accessible to the end user. Everything now lives in a new layer called the cloud, where the physical equipment seems completely detached from the virtual environment. You can create, drop, scale and restore your services, machines and applications through a simple web interface. The bill comes later.
Hidden Difference
What the major vendors rarely talk about is the fact that the physical machines are still there, and that the extra layer of software does nothing to speed the system up.
Case #1
Our team found this out the hard way when we migrated our telco customer’s physical data centre into Azure and saw big drops in performance, even though the VM specifications were better than those of the original machines. After more thorough testing we found that the SSDs connected directly to the motherboards of the original servers were providing over 1 GB/s read/write speeds, while Azure network-attached drives could only reach a maximum of around 250 MB/s. There were many more similar nuances which required optimisation of our code and some changes to the system architecture. Eventually we managed to resolve all the bottlenecks without incurring extra cost, moving some of the services to PaaS. Still, it showed us the path to the new world, where each checkbox of the setup process in Azure might result in extra spend of tens of thousands of euros per year.
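A quick way to reproduce this kind of comparison is a simple sequential read/write test against the disk in question. The sketch below is a minimal, illustrative Python version rather than the tooling we used at the time; the test path is a placeholder, and the file needs to be larger than the machine’s RAM for the read figure to reflect the disk rather than the OS page cache.

import os
import time

def sequential_throughput(path, size_gb=4, block_mb=64):
    # Rough sequential write/read throughput test in MB/s.
    block = os.urandom(block_mb * 1024 * 1024)
    blocks = (size_gb * 1024) // block_mb

    # Sequential write, flushed to disk so the timing includes real I/O.
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    write_mb_s = blocks * block_mb / (time.perf_counter() - start)

    # Sequential read using the same block size.
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_mb * 1024 * 1024):
            pass
    read_mb_s = blocks * block_mb / (time.perf_counter() - start)

    os.remove(path)
    return write_mb_s, read_mb_s

if __name__ == "__main__":
    # "/datadrive" is a placeholder for the data disk under test.
    w, r = sequential_throughput("/datadrive/throughput_test.bin")
    print(f"write: {w:.0f} MB/s, read: {r:.0f} MB/s")

Running the same script against a locally attached SSD and a network-attached managed disk is usually enough to surface the kind of gap described above.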
New Approach to Architecture
The goal of the big vendors is to make money, and they make it very easy for customers to spend it. The scalability of their systems is flexible only to a certain extent. Companies either have to invest in overpowered services or design their solutions around the vendor’s billing model rather than the physical capabilities of the platform.
Case #2
When building a new Azure Managed Services platform from scratch for another telco customer, we initially wanted to lift-and-shift our tried and tested ETL framework to the new platform. Experience, however, suggested that there might be better ways to design the data warehouse. We investigated the performance and cost of the PaaS Azure SQL DB vs Azure Synapse Data Warehouse vs a VM-based SQL Server. We found that Synapse really utilised multiple small SQL Server instances managed by the Synapse framework, which was quite limiting. Azure SQL DB, on the other hand, only started to be powerful enough at the P6 level, which was quite expensive across multiple instances of the database, as we needed Staging, the DWH and an Operational Data Store (ODS). Through trial and error, we found a workaround that allowed us to connect many low-tier Azure SQL DBs into the ODS platform, which loaded files from BLOB storage and performed T-SQL based processing at very low cost. We then uploaded the data into the Azure SQL DB star schema for reporting. As part of our framework, we established rules that scale the databases down when they are not in use (out of hours), as sketched below.
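To illustrate the last point, here is a minimal sketch of how an out-of-hours scale-down rule can be scripted against Azure SQL DB. It is not our production framework: it simply shells out to the Azure CLI’s az sql db update command, and all resource, server and database names below are hypothetical placeholders.

import subprocess

# Hypothetical resource names -- replace with your own.
RESOURCE_GROUP = "rg-dwh-example"
SQL_SERVER = "sql-dwh-example"
DATABASES = ["ods-shard-01", "ods-shard-02", "staging", "dwh"]

def set_service_objective(database, objective):
    # Change the service objective of one Azure SQL DB via the Azure CLI,
    # e.g. "S0" for idle periods or a higher tier for the load window.
    subprocess.run(
        [
            "az", "sql", "db", "update",
            "--resource-group", RESOURCE_GROUP,
            "--server", SQL_SERVER,
            "--name", database,
            "--service-objective", objective,
        ],
        check=True,
    )

def scale_down_out_of_hours():
    for db in DATABASES:
        set_service_objective(db, "S0")

def scale_up_for_load_window():
    for db in DATABASES:
        set_service_objective(db, "P2")

if __name__ == "__main__":
    scale_down_out_of_hours()

A scheduler such as an Azure Automation runbook or a cron job can then call the scale-down routine in the evening and the scale-up routine before the nightly load.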
Summary
The new world of “cloud everything” is quite mature now and companies rarely consider an on-premise solution. We agree with this trend; however, it is crucial to understand what is happening under the hood of cloud solutions and to set it against the cost model proposed by the vendor. This makes it possible to design an architecture that is optimal for both performance and cost.