Timescale recently expanded its PostgreSQL AI offerings with pgai Vectorizer. This update enables developers to create, store, and manage vector embeddings alongside relational data without the need for external tools or additional infrastructure.
TimescaleDB, an open-source extension for PostgreSQL tailored for time-series data, first augmented PostgreSQL with real-time analytics features. Now, Timescale is deepening its AI integration with the pgai suite and the introduction of pgai Vectorizer, enabling developers to carry out AI development directly within PostgreSQL.
Contributors have noted some challenges during the development process. One contributor, Tostino, highlighted gaps in compliance with the OpenAI API, noting that the current implementation lacks several arguments needed to use proxy solutions or custom samplers on open-source inference servers. Additionally, Tostino suggested that functions providing a "simple" wrapper should be built on top of raw functions returning JSON, rather than on strict data types, to enhance flexibility.
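To make the suggested layering concrete, the following minimal sketch is purely hypothetical: neither demo_embed_raw nor demo_embed exists in pgai, and the canned JSON stands in for a real provider response. The idea is that a raw function exposes the provider's full JSON response (and could accept extra, provider-specific options), while the "simple" wrapper only extracts a typed pgvector value from it.

-- Hypothetical layering, not pgai's actual API.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE FUNCTION demo_embed_raw(model text, input text, options jsonb DEFAULT '{}')
RETURNS jsonb LANGUAGE sql AS $$
    -- A real implementation would call an OpenAI-compatible endpoint and pass
    -- "options" through, covering proxy or sampler arguments that a strictly
    -- typed signature cannot express; a canned response stands in here.
    SELECT '{"data": [{"embedding": [0.1, 0.2, 0.3]}]}'::jsonb;
$$;

CREATE FUNCTION demo_embed(model text, input text)
RETURNS vector LANGUAGE sql AS $$
    -- The "simple" wrapper merely extracts a typed vector from the raw JSON.
    SELECT (demo_embed_raw(model, input) -> 'data' -> 0 ->> 'embedding')::vector;
$$;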
Building AI systems such as search engines and AI agents often requires complex workflows. The pgai Vectorizer streamlines this by integrating the entire AI workflow into PostgreSQL, allowing developers to build advanced AI applications quickly and efficiently using familiar SQL commands.
Timescale argues that the standard approach of treating vector embeddings as standalone data leads to synchronization issues and stale data. The Institute for Ethical AI & Machine Learning comments:
TimescaleDB proposes treating embeddings as derived data, similar to database indexes, which is interesting given recent extensions from databases like PlanetScale to integrate embeddings natively into indexes, similarly through a "native vectorizer" abstraction. In this case, however, they still leverage the OSS pgai Vectorizer for PostgreSQL, which helps automate the synchronization of embeddings with their source data within the database.
The pgvector and pgvectorscale extensions allow you to store vector embeddings in your database and perform fast and efficient vector searches. The pgai Vectorizer builds on top of these extensions to automatically create and synchronize embeddings for any text data in your database.
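As a rough sketch of what such a search looks like with pgvector, assuming a documents table with a content column and an embedding vector column (all names illustrative), and with a literal vector standing in for the embedded query text:

-- Nearest-neighbor search using pgvector's cosine distance operator (<=>);
-- the table and column names are assumed for illustration.
SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector
LIMIT 5;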
With one line of code, you can define a vectorizer that creates embeddings for data in a table. Suvarna Kadam, a machine learning consultant, comments:
pgai Vectorizer makes it possible to use one SQL command that will manage your vector embeddings "without" the usual engineering challenges to keep them in sync with your source data!
SELECT ai.create_vectorizer(
    <table_name>::regclass,
    destination => <embedding_table_name>,
    embedding => ai.embedding_openai(<model_name>, <dimensions>),
    chunking => ai.chunking_recursive_character_text_splitter(<column_name>)
);
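Filled in with illustrative values (a blog table with a content column, a blog_embeddings destination table, and OpenAI's text-embedding-3-small model at 1,536 dimensions), the call might look like this:

-- Illustrative invocation of the template above; the table, column, and
-- destination names are assumptions for the example.
SELECT ai.create_vectorizer(
    'blog'::regclass,
    destination => 'blog_embeddings',
    embedding => ai.embedding_openai('text-embedding-3-small', 1536),
    chunking => ai.chunking_recursive_character_text_splitter('content')
);

From then on, the vectorizer keeps the generated embeddings in sync as rows in the source table are inserted or updated, as described above.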
In the same week, Neon Database Labs also introduced Pgrag, an experimental PostgreSQL extension aimed at supporting end-to-end retrieval-augmented generation (RAG) pipelines, further expanding its own AI capabilities.
In addition to the recent launch of the pgai Vectorizer, there has been community interest in expanding the range of supported embedding models beyond OpenAI. Contributor claudeomusic asked about making the choice of embedding model configurable, highlighting the importance of flexibility for users. In response, alejandrodnm from Timescale confirmed that while the current Vectorizer feature supports only OpenAI models, there are plans to add other providers in the future, and the team is open to community contributions to help achieve this goal. Another contributor, wang, shared his workaround, "How to use with Openrouter".
To quickly try out embeddings using a pre-built Docker developer environment, see the Vectorizer quick start. For more detailed technical specifications, see the Vectorizer API reference.