Data Science Content on InfoQ
-
NVIDIA Kubernetes Device Plug-in Brings Temporal GPU Concurrency
As of the v12 release, the NVIDIA GPU device plug-in framework supports time-sliced sharing between CUDA workloads on Kubernetes. The feature aims to prevent under-utilization of GPUs and to make applications easier to scale by leveraging concurrently executing CUDA contexts.
-
Anaconda Publishes 2022 State of Data Science Report
Anaconda, maker of a Python distribution popular among data scientists, recently published a report on the results of its State of Data Science survey. The report summarizes responses from nearly 3,500 students, academics, and professionals from 133 countries, and covers respondent demographics and jobs as well as trends within the community.
-
Alpa: Automating Model Sharding for Distributed Deep Learning
A new open-source library called Alpa aims to automate distributed training and serving of large deep networks. It proposes a compiler where existing model-parallel strategies are combined and the usage of computing resources is optimized according to the deep network architecture.
-
TensorFlow DTensor: Unified API for Distributed Deep Network Training
The recently released TensorFlow v2.9 introduces a new API for model-, data-, and space-parallel (aka spatially tiled) deep network training. DTensor aims to decouple sharding directives from the model code by providing higher-level utilities to partition the model and batch parameters between devices.
-
Ten Lessons from Three Generations of Tensor Processing Units
A recent report published by Google’s TPU group highlights ten takeaways from developing three generations of tensor processing units. The authors also discuss how their previous experience will affect the development of future tensor processing units.
-
Evaluating Continual Deep Learning: a New Benchmark for Image Classification
Continual learning aims to preserve knowledge across deep network training iterations. A new dataset entitled "The CLEAR Benchmark: Continual LEArning on Real-World Imagery" has recently been published. The goal of the study is to establish a consistent image classification benchmark with the natural time evolution of objects for a more realistic comparison of continual learning models.
-
Apple Open Sources GCGC: a Tool to Analyze Java GC Logs
Apple has open-sourced GCGC, a tool for visualizing Java garbage collector (GC) logs, based on Python 3 and pandas. GCGC uses a Jupyter notebook to analyze and visualize GC log files.
-
Julia 1.7 Extends its Threading Capabilities, Improves Type Inference, and More
Julia 1.7 brings a number of significant enhancements, including new threading capabilities, new Package Manager features, improved type inference, and new syntactic features. It is also the first release to run natively on Apple Silicon.
-
BasisAI Open Source Boxkite Machine Learning Monitoring Tool
Boxkite is an open source instrumentation library designed to track concept drift in highly available model servers. It integrates with DevOps tools such as Grafana, Prometheus, fluentd and kubeflow, scaling horizontally to multiple replicas without needing changes to code or infrastructure. The project claims to be fast, correct and simple.
-
Pyodide Brings Python and Its Scientific Stack to the Browser with WebAssembly
Mozilla announced that Pyodide, which aims at providing a full Python data science stack running entirely in the browser, has become an independent community-driven project. Pyodide uses the CPython 3.8 interpreter compiled to WebAssembly, and thus allows using Python, NumPy, Pandas, Matplotlib, SciPy, and more in Iodide, an experimental interactive scientific computing environment for the web.
-
AWS Announces a Data Management and Analytics Solution Called Amazon FinSpace
Recently, AWS announced a data management and analytics solution purpose-built for the Financial Services Industry (FSI) called Amazon FinSpace. The service aims to reduce the time it takes for financial analysts to find and access all types of financial data for analysis.
-
Kaggle Publishes 2020 State of Machine Learning and Data Science Report
Kaggle has published a report on the State of Machine Learning and Data Science for 2020. The report is based on survey responses from over two thousand users currently employed as data scientists. It notes that the "vast majority" of data scientists are under 35 years of age, two-thirds have a graduate degree, and most have fewer than 10 years of coding experience.
-
Using Agile with a Data Science Team
Agile helped a data science team collaborate better with its stakeholders and increase its productivity. As priorities became clear, the team was able to focus and deliver. Gaining the data science team's buy-in by taking them through an agile journey was crucial to making it work.
-
NVIDIA Releases a $59 Jetson Nano 2GB Kit to Make AI More Accessible to Developers
With the Jetson series of devices and software SDKs, NVIDIA creates a coherent development environment to learn and develop GPU-based AI applications.
-
Is Julia Production Ready? Q&A with Bogumił Kamiński
On the heels of JuliaCon 2020, SGH Warsaw School of Economics professor and DataFrames.jl maintainer Bogumił Kamiński summarized the status of the language and its ecosystem and stated that Julia is finally production-ready. InfoQ took the chance to speak with Professor Kamiński.