This article first appeared in IEEE IT Professional magazine. IEEE IT Professional offers solid, peer-reviewed information about today's strategic technology issues. To meet the challenges of running reliable, flexible enterprises, IT managers and technical leads rely on IT Pro for state-of-the-art solutions.
Big data is sweeping through many areas of industry and government. In basic terms, big data refers to datasets that are large in volume, diverse in data sources and types, and created quickly, resulting in greater challenges in harvesting, managing, and processing them via traditional systems and capabilities. 1 Since the term’s recent inception, businesses, government entities, and research institutions (such as universities) have joined the wave of optimism about big data’s potential benefits, 2,3 although some have sounded cautionary notes. 4,5 Big data’s impact looks so significant that the topic has already drawn unprecedented attention from IT and business professionals, management executives, scientists, and policymakers.
Although the discussion of “the promise and peril of big data” 6 will continue, what seems increasingly clear is that big data has moved past the initial conception of datasets and is becoming a strategic imperative in many businesses and governments. Big data’s application areas go beyond “business intelligence” from corporate structured databases and now include intelligence extraction and the creation of actionable knowledge from unstructured databases, social media platforms, intelligent sensors, and digital devices for innovation in services and products. Recent forecasts and estimates of big data markets indicate promising growth in the future. 7
In this context, I present a service-oriented and evolutionary picture of big data in which we can view it more appropriately as the latest example of disruptive IT-enabled services (IESs) rather than as datasets. Big data services, as data-focused IESs, emerge from combining diverse resources (such as processing technologies, advanced algorithms, and analytical talents) from an ecosystem of technologies, market needs, social actors, and other institutional contexts. The research I present draws on more than 260,000 big-data-related tweets collected in two phases (March 2013 and June 2014) and processed via natural language processing techniques. I present this data using network visualization.
Big Data as IT-Enabled Services
Services, rather than products, are increasingly viewed as the main driver in business and economic growth. 8 Ever more services are available in industries such as healthcare, finance, education, and marketing. Even manufacturing companies are transforming their businesses to be service-oriented. 9 IT has been a powerful enabler behind these transformations and innovations. For example, the benefits of customer relationship management (CRM), a popular service for customer acquisition and retention, can’t be fully realized without IT. Similarly, contemporary services such as e-healthcare, electronic financial services, and e-logistics aren’t possible without IT’s enabling role.
IESs are unlike traditional products (or goods) in several ways. 10 At a basic level, both manufacturing goods and traditional technological products take a physical form and are generally considered complete when they’re sold from producer to consumer. Clear boundaries usually exist between where products are made and where they are consumed. Also, the interaction between producers and consumers is short (including the time of the sales transaction). On the other hand, IESs such as big data services involve the dynamic integration of tangible (for instance, technology) and intangible (analytic talents or strategies) resources from service providers and customers. They are thus value-creating networks of data-focused resources that need continuous adaptation (or learning) over time due to changes in business needs, technological infrastructures, and institutional contexts (security concerns, privacy regulations, or big data labor markets, for instance).
IT-enabled innovation is distinct from other types of innovation, and big data services would represent a powerful case of disruptive IT-enabled innovation. Although technological innovation in general has affected the nature of work, IT-enabled innovation has led to ever more data creation and acquisition. 11 Moreover, “unlike other innovations, IT analyzes such data (for example, through data mining) and generates even more information. Never before has a technological innovation given businesses the ability to incessantly assess and reinvent themselves.” 11 IT-enabled innovation has thus created both opportunities and challenges.
According to Jacques Bughin and his colleagues, the world’s stock of data is doubling every 20 months. 12 These data are large in size, diverse in format, and fast moving, and have enabled tremendous opportunities for finding patterns and relationships and for predicting future events. However, this also creates unprecedented challenges in acquiring, storing, processing, and governing such data. IT is behind this emerging phenomenon. These opportunities and challenges have led to the emergence of big data services, of which IT remains the core.
Big data services would also differ from other IT-enabled innovation. For example, CRM has focused mainly on customer data through a configuration of relational database and dashboard technologies, and its impact is limited to interactions between an organization and its customers. In contrast, big data services are considered a disruptive innovation in terms of scale and impact. Not only have they already led to some profound changes in how data are managed and decisions are made in many industries, 2 they’ve also created markets for technologies, business models, analytical talents, and education. Still, big data offers more “disruptive possibilities” 13 for business and government decision making, consumer markets, future technologies and computing, and other areas.
Big Data Service Ecosystem
Like other IESs, big data services combine heterogeneous (both tangible and intangible) resources from diverse social actors. 10 They are evolving in an ecosystem in which diverse technologies, market needs, social actors, and institutional contexts intermingle and change over time. We might be able to glimpse the big data ecosystem by observing social media conversations among people and organizations. Twitter is considered suitable for this purpose because its data are now widely used for business applications and academic research. 14
Figure 1. The big data services ecosystem as of March 2013 as reflected in Twitter hashtags. This heterogeneous network includes software, hardware, and networking resources; human capital; industry applications and methodological techniques; and social actors and the new ideas and concepts those actors coin.
In Twitter, hashtags (such as #cloud or #hadoop) mark important themes or topics. Twitter users (individuals and organizations) include #bigdata in their tweets to indicate that their messages are engaging in big-data-related conversations. Figure 1 shows the ecosystem of big data services using Twitter hashtags. The figure illustrates the big data services ecosystem as a heterogeneous network of software, hardware, and networking resources; human capital (such as skills); industry applications and methodological techniques; and social actors (for example, companies or professional associations) and the new ideas and concepts (predictive analytics) those actors coin. The network was built by analyzing and visualizing the most popular hashtags (out of more than 10,000) that coappeared with the hashtag #bigdata, and their levels of popularity based on almost 100,000 tweets containing #bigdata that were harvested in March 2013.
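The article does not publish its processing pipeline, so the following is only a minimal sketch of the co-occurrence counting step described above, under stated assumptions. The function name `cooccurring_hashtags`, the regular expression, and the sample tweets are all hypothetical and exist purely for illustration; the actual study used natural language processing and network visualization tools that are not specified here.

```python
import re
from collections import Counter

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG = re.compile(r"#(\w+)")

def cooccurring_hashtags(tweets, anchor="bigdata"):
    """Count hashtags that appear in the same tweet as the anchor hashtag."""
    counts = Counter()
    for text in tweets:
        # Lowercase and deduplicate so each tweet counts a tag at most once.
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if anchor in tags:
            counts.update(tags - {anchor})
    return counts

# Invented sample tweets, not data from the study.
sample_tweets = [
    "Scaling #Hadoop in the #cloud for #bigdata workloads",
    "#BigData needs #datascientists who know #python and #r",
    "New #cloud pricing announced today",
    "#bigdata + #hadoop + #mapreduce tutorial",
]

counts = cooccurring_hashtags(sample_tweets)
most_popular = counts.most_common(1)
```

Each resulting count can be read as the weight of an edge between #bigdata and the co-occurring hashtag, which is the kind of input a network visualization tool would need.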
From this network visualization, we can infer that diverse network technologies and computing techniques are major components of big data services. These include cloud computing (#cloud, #cloudcomputing), mobile computing (#mobile, #wireless, #mobility), network-embedded computing techniques (such as #sdn, for software-defined networking), and high-performance computing (#hpc). Also, it isn’t difficult to determine the existence of several software and programming frameworks, which would be at different levels of adoption, for storing, processing, and analyzing large datasets. For example, Hadoop appears to have already taken the position of de facto software framework for big data management, and MapReduce (#mapreduce) showed a strong presence as a big data programming framework in early 2013. Moreover, new services are appearing (#hdinsight, #ibminsight) from big businesses such as Microsoft and IBM, which creatively combine multiple resources (for example, cloud services and Hadoop). On the other hand, frameworks and technologies such as the Internet of Things (#iot, #internetofthings), wearable computing (#wearable, #wearablecomputing), and social media analytics (#api, #sentiment [for sentiment analysis], #nlproc [for natural language processing]) appear as relatively new components of big data services.
The ecosystem also shows diverse skills and talents that are intangible but integral to any big data services. Such resources include programming skills (#ruby, #java, #python, #r) and analytical knowledge (#statistics, #datamining, #machinelearning). Thus, a new category of people (called #datascientists) with these skills and talents has emerged as an important resource for any big data service. 2 Diverse applications of big data appear, including marketing (#crm, #marketing, #retail), supply chain management (#supplychain), and healthcare (#healthcare, #mhealth). Many social actors play critical roles in the ecosystem, largely as cocreators of big data services. These include IBM, Google, SAP, Oracle, SAS, and Twitter, among others. Also, we can see that some concepts (#bi [for business intelligence], #predictiveanalytics) are continuously (re)coined in the ecosystem by these social actors and help us envision how new types of resources and services are configured.
As is clear, the big data services ecosystem encompasses many technologies, frameworks, skills, application areas, and social actors. These resources can be combined to help tackle organizational problems and create business value for the cocreators of big data services, as in the example of big data analytic services for healthcare running on mobile and cloud computing. 15 This big data services ecosystem already seemed “big” in March 2013, yet new resources (and large-scale services) are continually being created from combinations of existing resources (and small-scale services), as I examine next.
Big Data Services Evolution
The big data ecosystem is ever-changing, as new resources and services emerge from existing ones; some are selectively retained and have greater influence on even newer generations of resources and services, whereas others become less attractive and fade away. This is the innovation process in IES development, which is similar to the evolutionary process in biology.
The evolutionary process in general involves variation and selective retention. 16 Variation occurs when new resources are created in the ecosystem. It is the outcome of (re)combining existing resources and services and can be incremental or radical. 10 The overall result is more diversity in the ecosystem. Diversity is pivotal in ecosystem development, yet some resources and services are more combinable than others and so get more popular and become the building blocks for new resources and services; this mechanism is generally called selective retention.
IESs and their ecosystems, including big data services, change via this evolutionary process. Some companies are on the cutting edge of understanding this process for developing IESs. Figure 2 shows a big data services ecosystem that’s different from that in Figure 1. Here, the network analysis and visualization are based on popular hashtags found in more than 160,000 tweets containing #bigdata in June 2014 (a 60 percent increase from March 2013). This signifies the increasing popularity of big data in industry. The number of hashtags that coappeared with the hashtag #bigdata was about 18,000. This is an 80 percent increase from March 2013, indicating that much more diversity exists in the big data services ecosystem as of June 2014. 2,17,18
Figure 2. The big data services ecosystem as reflected by Twitter hashtags as of June 2014. The number of hashtags coappearing with #bigdata increased 80 percent since March 2013, and the number of tweets containing #bigdata that were harvested increased by 60 percent in the same time period.
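The growth percentages quoted for the two collection phases can be checked with simple arithmetic. The sketch below assumes the rounded totals given in the text (almost 100,000 and more than 160,000 tweets; more than 10,000 and about 18,000 hashtags), so the results are approximations rather than exact figures from the dataset. The function name `percent_increase` is a hypothetical helper.

```python
def percent_increase(before, after):
    """Percentage growth from an earlier count to a later count."""
    return (after - before) * 100 / before

# Rounded totals as quoted in the article (approximate, not exact counts).
tweet_growth = percent_increase(100_000, 160_000)
hashtag_growth = percent_increase(10_000, 18_000)

print(f"Tweets containing #bigdata: +{tweet_growth:.0f}%")
print(f"Hashtags coappearing with #bigdata: +{hashtag_growth:.0f}%")
```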
Many variations in technologies, skills, market needs, social actors, and concepts are potential resources for big data services. For example, more diversity seems present in platforms and technologies for big data storage and processing, which now include Apache Spark (#spark, #apachespark), HP’s big data platform (#hphaven), and Hadoop YARN (or Hadoop2), among others. Several major variations appear in resources and services, including emerging methodologies and techniques (such as #devops, an overarching methodology for IES production and delivery, and #deeplearning, a new area of machine learning and AI research), new database engines (#mongodb), and large-scale big data services such as IBM’s Bluemix (#bluemix), which integrates a variety of technologies and tools in a cloud.
Such variation also includes emerging big data applications (#hr, #hranalytics, #smartcity, #pharma, #banking, #digitalmarketing, #apm [for application performance management], #highered), big data skills (#hive, #hbase), new social actors (#linkedin, #amazon, #nsa, #cloudera, #qlik), and novel concepts (#newsql, #sddc [for software-defined datacenter]). Other noticeable variations in this new ecosystem are the emergence of potential resources related to big data services, such as social media analytics services (#sentiment, #nlproc, #voc [for voice of customers], #json, #facebook, #internet, #digital, #api), and machine-to-machine (M2M) analytics services (#iot, #internetofthings, #m2m, #quantifiedself [which tracks data about a person’s daily life], #wearabletech, #wireless).
Although variation seems clear in this new ecosystem, selective retention is always part of this evolutionary process. The components in the ecosystem have become more or less “attractive” over time. As the ecosystem has grown, most components have obtained greater stability. Examples include some of the resources and social actors mentioned earlier, including #machinelearning, #nosql, #spark, #internetofthings, #api, #hadoop, #m2m, #amazon, and #google. Some components have become less prevalent. For example, previously popular (or mature) data-driven concepts, technologies, tools, and social actors (#bi, #datacenter, #sql, #datawarehouse) have become less combinable when compared with relatively new ones (#dataanalytics, #datascience, #sddc, #r, #python, #mongodb, #splunk). Even in this growing ecosystem, certain resources are used significantly less in configuring new generations of big data services. Examples include #erp, #excel, and #sqlserver. Also, some elements, such as #smarteranalytics, appear to be almost “selected out.” In short, the big data services ecosystem never settles.
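One way to make the notion of components becoming more or less “attractive” concrete is to compare a hashtag's share of #bigdata tweets across the two collection phases. The sketch below is an illustrative assumption, not the method used in the study: the helper names `share_change` and `classify`, the 25 percent tolerance band, the placeholder tag names, and all per-tag counts are invented. Only the two phase totals follow the rounded figures quoted in the article.

```python
def share_change(count_before, total_before, count_after, total_after):
    """Ratio of a tag's share of tweets in the later phase to the earlier one."""
    if count_before == 0:
        # A tag absent from the earlier phase counts as newly emerged.
        return float("inf") if count_after > 0 else 1.0
    share_before = count_before / total_before
    share_after = count_after / total_after
    return share_after / share_before

def classify(ratio, band=0.25):
    """Label a tag by how far its share moved (band is an arbitrary choice)."""
    if ratio > 1 + band:
        return "rising"
    if ratio < 1 - band:
        return "fading"
    return "stable"

# Rounded phase totals from the article; per-tag counts are invented.
TOTAL_2013, TOTAL_2014 = 100_000, 160_000
hypothetical_counts = {
    "tag_retained": (8_000, 12_800),
    "tag_declining": (5_000, 4_000),
    "tag_emerging": (200, 3_200),
}

labels = {
    tag: classify(share_change(before, TOTAL_2013, after, TOTAL_2014))
    for tag, (before, after) in hypothetical_counts.items()
}
```

Comparing shares rather than raw counts matters here because the total number of harvested tweets grew between phases, so a tag can gain tweets in absolute terms while still losing ground relative to the rest of the ecosystem.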
Big Data Services Coevolution
Ecosystems coevolve. 19 Thus, the big data services ecosystem doesn’t exist in a vacuum. It coevolves with other ecosystems in the broader business and technology landscape. For example, the cloud services ecosystem includes service-oriented architecture (SOA), hardware, delivery methodologies, process modeling techniques, and people skills, making it different from the big data services ecosystem. However, not only are the boundaries between the two not clear-cut, but the two ecosystems have coevolved over the years (and decades). Just as cloud computing has been integrated into (ERP-based) enterprise integration services as a service delivery “platform,” this cloud-based innovation has enabled the delivery of big data services to shift from client/server-based to cloud-based. This innovation has brought big data services (for instance, Microsoft Azure, Amazon Web Services, and Google BigQuery) to the masses, who don’t have physical hardware, software, or network infrastructure in place.
Likewise, big data services (and their ecosystem) are changing many services, products, and industries, and serving as a platform for innovation in other ecosystems. In fact, the impact of big data services as an innovation platform could be more profound than that of any preceding IES. For example, while cloud services have changed how technologies and infrastructure are offered, big data services are becoming “a set of capabilities that need to be deeply embedded across functions and operations.” 12 They have already fueled innovative operations and data-driven decision-making in many areas, including supply chain management, marketing, politics, science and technology, healthcare, R&D, security, and public safety.
Also, the latest innovation using big data services as a platform is found in the increasing diffusion of IoT-based services in supply chain management, retailing, healthcare, and other areas. The IoT, which has evolved from mobile computing and wireless network services, creates unprecedented amounts of data. 20 In fact, the business value of IoT-based services, such as smart logistics services, is in the use of this data. Thus, for IoT-based services to be effective in practice, big data capabilities (such as processing and mining data) become a pivotal component. Without such capabilities, these services can’t deliver on their promises. In turn, IoT-based services will create new requirements (for example, greater security or efficient processing of machine data), which will necessitate a novel configuration of big data resources with these new capabilities.
It is increasingly clear now that big data is becoming a strategic imperative for businesses and governments. Businesses, governments, and IT professionals are urged to understand big data resources and the ecosystem from a service-oriented, evolutionary, and coevolutionary viewpoint. They must update their capabilities to identify opportunities in the big data ecosystem and other IESs and to dynamically reconfigure heterogeneous, data-focused resources from the ever-changing ecosystem of technologies, market needs, social actors, and institutional contexts. The focus should be on actively experimenting with novel big data services for value creation. 2,17 Such capabilities and activities are critically important because big data services are likely to affect more areas (such as human resources and new product development) of organizational activities and business practices, and to become the platform for new generations of IESs (such as the IoT).
There is much room for future research. For example, future research can benefit from a multistage data sampling design, which would include Twitter data from as early as 2010 or 2011 to the present. This research design would reveal a more complete picture of big data, its ecosystem, and their coevolution over time. The findings could offer insights to predict future trends in big data, the ecosystem, and other IESs. Another research stream could focus on the intersection of big data services and other IESs (such as the IoT and the cloud) and the emergence of new IESs. This could provide insights into the evolution of IES in both technological and institutional contexts.
References
- A. Bharadwaj et al., “Digital Business Strategy: Toward a Next Generation of Insights,” MIS Quarterly, vol. 37, no. 2, 2013, pp. 471–482.
- T. Davenport, “Analytics 3.0,” Harvard Business Rev., vol. 91, no. 12, 2013, pp. 64–72.
- G.-H. Kim, S. Trimi, and J.-H. Chung, “Big-Data Applications in the Government Sector,” Comm. ACM, vol. 57, no. 3, 2014, pp. 78–85.
- D. Boyd and K. Crawford, “Critical Questions for Big Data,” Information, Communication & Society, vol. 15, no. 5, 2012, pp. 662–679.
- D. Lazer et al., “The Parable of Google Flu: Traps in Big Data Analysis,” Science, vol. 343, no. 6176, 2014, pp. 1203–1205.
- D. Bollier, The Promise and Peril of Big Data, Aspen Institute, 2010.
- L. Columbus, “Roundup of Analytics, Big Data & Business Intelligence Forecasts and Market Estimates, 2014,” Forbes, 24 June 2014.
- S. Vargo and R. Lusch, “Evolving to a New Dominant Logic for Marketing,” J. Marketing, vol. 68, 2004, pp. 1–17.
- J. Holmstrom and J. Partanen, “Digital Manufacturing-Driven Transformations of Service Supply Chains for Complex Products,” Supply Chain Management: An International J., vol. 19, no. 4, 2014, pp. 421–430.
- B. Chae, “A Complexity Theory Approach to IT-Enabled Services (IESs) and Service Innovation: Business Analytics as an Illustration of IES,” Decision Support Systems, vol. 57, 2014, pp. 1–10.
- R. Kohli and V. Grover, “Business Value of IT: An Essay on Expanding Research Directions to Keep Up with the Times,” J. Assoc. for Information Systems, vol. 9, no. 1, 2008, pp. 23–39.
- J. Bughin, M. Chui, and J. Manyika, “Ten IT-Enabled Business Trends for the Decade Ahead,” McKinsey Quarterly, May 2013, pp. 1–45.
- J. Needham, Disruptive Possibilities: How Big Data Changes Everything, O’Reilly Media, 2013.
- S. Williams, M. Terras, and C. Warwick, “What Do People Study When They Study Twitter? Classifying Twitter Related Academic Papers,” J. Documentation, vol. 69, no. 3, 2013, pp. 384–410.
- H. Demirkan, “A Smart Healthcare Systems Framework,” IEEE IT Professional, vol. 15, no. 5, 2013, pp. 38–45; doi: 10.1109/MITP.2013.35.
- D. Campbell, “Blind Varieties and Selective Retention in Creative Thought and Other Processes,” Psychological Rev., vol. 67, 1960, pp. 380–400.
- H. Chen, D. Chiang, and V. Storey, “Business Intelligence and Analytics: From Big Data to Big Impact,” MIS Quarterly, vol. 36, no. 4, 2012, pp. 1–24.
- C. O’Reilly III, J. Harreld, and M. Tushman, “Organizational Ambidexterity: IBM and Emerging Business Opportunities,” California Management Rev., vol. 51, no. 4, 2009, pp. 75–99.
- S. Kauffman, At Home in the Universe: The Search for Laws of Self-Organization and Complexity, Oxford Univ. Press, 1995.
- B. Violino, “The ‘Internet of Things’ Will Mean Really, Really Big Data,” InfoWorld, vol. 2014, June 2014, pp. 1–7.
About the Author
Bongsug (Kevin) Chae is a professor in information and operations management at Kansas State University. His research interests include big data, supply chain analytics, social media analytics, and IT-enabled service innovation. Chae has published papers in areas such as big data analytics, social media analytics, IT-enabled service innovation, supply chain analytics, and data mining. He received the Ralph Reitz Teaching Award at Kansas State University. Contact him at kevinbschae@gmail.com.