Data Analysis Content on InfoQ
-
Designing the Jit Analytics Architecture for Scale and Reuse
As a SaaS provider, Jit needs its analytical data to be useful to both its customers and its internal stakeholders. AWS services including EventBridge, Kinesis Data Firehose, and Timestream handle data ingestion, and UI platforms from Mixpanel and Segment provide data visualization.
-
Understanding and Applying Correspondence Analysis
Customer segments, personality profiles, social classes, and age generations are examples of effective references to larger groups of people sharing similar characteristics. Correspondence analysis (CA) is a multivariate analysis technique that projects categorical data into a numeric feature space, capturing most of the variability in the data in fewer dimensions.
-
Building Latency Sensitive User Facing Analytics via Apache Pinot
At QCon, a virtual conference for senior software engineers and architects covering current trends, Chinmay Soman talked about how to use Apache Pinot in data pipelines to build rich, external, or site-facing analytics.
-
How Optimizing MLOps Can Revolutionize Enterprise AI
In this article, author Monte Zweben discusses data science architecture, containerization, and how new solutions like Feature Store can help with the full lifecycle of machine learning processes.
-
The Evolution of Precomputation Technology and its Role in Data Analytics
In this article, author Yang Li discusses the importance of precomputation techniques in databases, OLAP and data cubes, and some of the trends in using precomputation in big data analytics.
-
Overcoming Data Scarcity and Privacy Challenges with Synthetic Data
In this article, the author discusses the importance of using synthetic data in data analytics projects, especially in financial institutions, to solve the problems of data scarcity and, more importantly, data privacy.
-
COVID-19 and Mining Social Media - Enabling Machine Learning Workloads with Big Data
In this article, author Adi Pollock discusses how to enable machine learning workloads with big data to query and analyze COVID-19 tweets to understand social sentiment towards COVID-19.
-
Scalable Cloud Environment for Distributed Data Pipelines with Apache Airflow
In this article, author Lena Hall discusses how to use Apache Airflow to define and execute distributed data pipelines, with an example of the workflow framework running on Kubernetes on the Azure cloud platform.
-
Easy Interpretation of a Logistic Regression Model with Delta-p Statistics
Delta-p statistics are an easier means of communicating results to a non-technical audience than the plain coefficients of a logistic regression model. In this article, authors Maarit Widmann and Alfredo Roccato discuss how to predict credit eligibility using a solution based on Delta-p statistics.
-
Data Leadership Book Review and Interview
The book Data Leadership, by Anthony Algmin, covers data leadership and how data leaders should manage and govern the data management programs in their organizations. Data leadership is how organizations choose to apply their energy and resources toward creating data capabilities to influence their business.
-
Innovation Startups Modeling Agile Culture
Innovation is not only about the most advanced technology; management and processes are the new frontier of startup innovation. Combining the power of data with the importance of people to deliver business intelligence is key today. The result is not the only thing that matters; how you achieve it matters even more. Being agile means adapting to today's market.
-
2020 State of Testing Report
The 2020 State of Testing report provides insights into the adoption of test techniques, practices, and test automation, and the challenges that testers are facing. It shares results from the 2020 testing survey organized by Joel Montvelisky from PractiTest, and Lalit Bhamare from Tea-Time with Testers.