William McKnight gave a keynote presentation on Wednesday at the Data Architecture Summit 2018 conference on creating a modern data architecture using different data platforms.
He started the presentation by saying there is a high correlation between an organization's data maturity and its business success. The more mature an organization's data architecture is, the more successful it tends to be. Low-maturity organizations tend to have data scattered across different silos. He shared some statistics on what happens in an "Internet Minute" in 2018: 187 million emails and 18 million text messages are sent, 3.7 million search queries are performed, and so on.
Artificial Intelligence (AI) is a disruptive force and data is the foundation for this revolution. A new data set is emerging in the industry: bio data. If you combine bio data with environment data (such as location), McKnight said, you'll have all the information you need to perform data analytics.
He talked about data cultivation and how solutions like data warehouses and data lakes help with these efforts. Data architects also need to decide between HDFS and cloud storage. According to McKnight, HDFS has better query performance, but cloud storage is more scalable, persistent, and available, and is less expensive.
McKnight discussed the operational big data platform selection choices by comparing relational databases and NoSQL databases in terms of data size and workload complexity.
He also talked about NewSQL databases, which are scalable, ACID-compliant, and support sharding. These databases are being used for capital market data feeds, financial trade, telco record streams, and fraud detection.
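As a rough illustration of the sharding idea mentioned above, the following Python sketch routes a record key to one of several shards using a stable hash. It was not shown in the talk; the function name `shard_for`, the key format, and the shard count are hypothetical, and real NewSQL systems use their own partitioning schemes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index in the range [0, num_shards).

    Assumption: plain hash-modulo partitioning, chosen only for brevity.
    A stable hash (SHA-256) is used so the same key always lands on the
    same shard, regardless of process or machine.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical usage: spread trade records across 4 shards.
placement = {key: shard_for(key, 4) for key in ("trade-1", "trade-2", "trade-3")}
```

Hash-modulo routing is the simplest option, but it moves most keys when the shard count changes, which is why production systems often prefer range-based or consistent-hash partitioning.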
Cloud deployment of databases offers several benefits, including on-demand and self-service data management, broad network access, resource pooling, rapid elasticity, and measured service.
Traditional ETL techniques are insufficient for data platforms operating at an enterprise-wide scale. There are a variety of data sources, and the data is being streamed in real time. Data architects should look into stream processing for these requirements.
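To make the contrast with batch ETL concrete, here is a minimal Python sketch of one common stream-processing pattern, a tumbling-window aggregation that emits results as events arrive instead of waiting for a full batch load. It is not taken from the presentation; the event shape, the `tumbling_counts` name, and the 10-second window are assumptions made for illustration, and the sketch assumes events arrive in timestamp order.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical event shape: (timestamp in whole seconds, source name).
Event = Tuple[int, str]


def tumbling_counts(
    events: Iterable[Event], window_s: int
) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, per-source counts) each time a window closes.

    Assumption: events arrive ordered by timestamp, so a window can be
    emitted as soon as an event from a later window is seen.
    """
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for ts, source in events:
        start = (ts // window_s) * window_s  # align to the window boundary
        if current_start is None:
            current_start = start
        if start != current_start:
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[source] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window


# Hypothetical usage with a small in-memory stream.
sample = [(0, "web"), (3, "web"), (9, "pos"), (10, "web"), (25, "pos")]
windows = list(tumbling_counts(sample, 10))
```

Because the function is a generator, it works the same way over an unbounded source such as a message queue consumer; dedicated stream processors add out-of-order handling, state recovery, and scaling on top of this basic idea.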
Enterprise data virtualization provides consistent and timely access to all structured and semi-structured data from various data sources in the organization like data warehouses, marts, cubes, operational data stores (ODS), transactional sources, and file systems.
He suggested that data professionals pick their battles in the data architecture transformation journey. You should play the long game, and you need to be willing to lose some battles to win the war. Align your data architecture efforts with application budgets and roadmap projects in order to make progress on implementing the data architecture.
McKnight concluded the presentation by advising data professionals to look into in-memory databases where they need performant data management solutions, and to watch for GPU databases and hybrid databases in the future.