Microsoft Introduces Vector Data Abstractions Library for .NET

Nov 04, 2024

On October 29, 2024, Microsoft released the Microsoft.Extensions.VectorData.Abstractions library for .NET in preview. It makes it easier to integrate .NET solutions with the Semantic Kernel AI SDK by providing abstractions over concrete AI implementations and models.

Microsoft Semantic Kernel is an enterprise-ready SDK that lets developers plug in different LLMs and different languages, with automatic orchestration of those plugins. The collaboration between the Semantic Kernel and .NET teams at Microsoft has produced several helper and abstraction libraries. The first was the Microsoft.Extensions.AI library, which abstracts common AI services such as a chat client.

The second package, the Microsoft.Extensions.VectorData.Abstractions library, abstracts the vector stores used for LLM embeddings. An embedding is a representation of a data record in a high-dimensional vector space. Embeddings convert discrete data into a format that the LLM's neural networks can process further. Semantically similar records are embedded closer to each other in the vector space, which enables semantic search instead of simple text matching.
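To make "closer in the vector space" concrete, the following minimal sketch computes cosine similarity between hand-written vectors. The 3-dimensional vectors and their labels are made up for illustration and do not come from any embedding model; real embeddings have hundreds of dimensions. The CosineSimilarity helper is hypothetical and not part of the library.

```csharp
using System;

// Hypothetical helper, not part of Microsoft.Extensions.VectorData:
// cosine similarity = dot(a, b) / (|a| * |b|).
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Made-up toy "embeddings" (assumption: illustrative values only).
float[] familyMovie = { 0.9f, 0.1f, 0.0f };
float[] kidsCartoon = { 0.8f, 0.2f, 0.1f };
float[] horrorMovie = { 0.0f, 0.2f, 0.9f };

double similar = CosineSimilarity(familyMovie, kidsCartoon);
double different = CosineSimilarity(familyMovie, horrorMovie);

// Vectors pointing in a similar direction score closer to 1.
Console.WriteLine($"family vs. cartoon: {similar:F3}");
Console.WriteLine($"family vs. horror:  {different:F3}");
```

A score near 1 means the two vectors point in nearly the same direction, which is the sense in which semantically similar records sit "close" to each other.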

The Microsoft.Extensions.VectorData.Abstractions library supports CRUD and search operations. Developers use .NET POCO classes annotated with vector attributes such as VectorStoreRecordKey, VectorStoreRecordData and VectorStoreRecordVector, as illustrated in the following example:

public class Movie
{
    [VectorStoreRecordKey]
    public int Key {get;set;}

    [VectorStoreRecordData] 
    public string Title {get;set;}

    [VectorStoreRecordData]
    public string Description {get;set;}

    [VectorStoreRecordVector(384, DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float> Vector {get;set;}
}

Note that the Movie class specifies a key, two string properties as data, and one derived property, Vector, which holds the record's vector embedding (with 384 dimensions and the cosine similarity distance function).

To embed and store the movie records, the sample uses the IEmbeddingGenerator and IVectorStore interfaces. The store is an in-memory vector store provided by Semantic Kernel. For the embeddings, the sample uses Ollama, a pre-made LLM package that runs on the developer's machine, with the small all-minilm embedding model, although larger embedding models can be used. This configuration is achieved with the following line of code:

IEmbeddingGenerator<string,Embedding<float>> generator = new OllamaEmbeddingGenerator(new Uri("http://localhost:11434/"), "all-minilm");

To store a record in the vector store, the Vector property of the POCO model is first generated by the IEmbeddingGenerator.GenerateEmbeddingVectorAsync method. The record is then stored in the vector store (movies is a collection of objects in the IVectorStore).

movie.Vector = await generator.GenerateEmbeddingVectorAsync(movie.Description);
await movies.UpsertAsync(movie);

Now, these embedded records can be queried. The query is an LLM prompt text, embedded as a vector using the same interface as before:

var query = "A family friendly movie";
var queryEmbedding = await generator.GenerateEmbeddingVectorAsync(query);

The vector data store interface has a VectorizedSearchAsync method that retrieves the items closest to the prompt text provided for the search.

var searchOptions = new VectorSearchOptions()
{
    Top = 1,
    VectorPropertyName = "Vector"
};

var results = await movies.VectorizedSearchAsync(queryEmbedding, searchOptions);
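Conceptually, a vectorized search ranks the stored vectors by their similarity to the query vector and returns the best matches. The sketch below shows that idea as a brute-force scan in plain C#. It is an assumption-laden illustration, not how the in-memory store or any other vector store is implemented: the SearchTop and Cosine helpers, the movie titles, and the 3-dimensional vectors are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: cosine similarity between two equal-length vectors.
double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical helper (not a library API): score every record against the
// query, sort by descending similarity, and keep the first `top` results.
List<(string Title, double Score)> SearchTop(
    IEnumerable<(string Title, float[] Vector)> records, float[] query, int top)
{
    return records
        .Select(r => (Title: r.Title, Score: Cosine(r.Vector, query)))
        .OrderByDescending(r => r.Score)
        .Take(top)
        .ToList();
}

// Made-up records with toy vectors (assumption: illustrative values only).
var stored = new List<(string Title, float[] Vector)>
{
    ("Family Adventure", new[] { 0.9f, 0.1f, 0.0f }),
    ("Space Horror",     new[] { 0.0f, 0.2f, 0.9f }),
    ("Courtroom Drama",  new[] { 0.1f, 0.9f, 0.2f }),
};

// Stand-in for the embedded query text.
float[] queryVector = { 0.8f, 0.2f, 0.1f };

var best = SearchTop(stored, queryVector, top: 1);
foreach (var hit in best)
    Console.WriteLine($"{hit.Title} (score {hit.Score:F3})");
```

The `top` argument plays the same role as the Top option in the article's search options: it limits how many of the closest records are returned.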

The full code example is provided in the Microsoft blog post, and further code samples are available on the Semantic Kernel learning site.

A direct application of the vector store abstraction library is to extend an LLM with a custom data store using retrieval-augmented generation (RAG), which lets an LLM query a specific knowledge base without the need to retrain the model. There is a full sample of vector store RAG provided by Microsoft.
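The reason RAG needs no retraining is that the retrieved records are simply placed into the prompt as context. The following sketch shows only that prompt-assembly step, under stated assumptions: BuildAugmentedPrompt is a hypothetical helper, the passages are made-up stand-ins for vector search results, and the prompt wording is illustrative rather than taken from Microsoft's sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: combine retrieved passages and the user's question
// into a single prompt that would then be sent to a chat model.
string BuildAugmentedPrompt(string question, IEnumerable<string> retrievedPassages)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the context below.");
    sb.AppendLine("Context:");
    foreach (var passage in retrievedPassages)
        sb.AppendLine($"- {passage}");
    sb.AppendLine($"Question: {question}");
    return sb.ToString();
}

// Made-up passages standing in for the top results of a vector search.
var passages = new[]
{
    "Family Adventure: a cheerful road trip story suitable for all ages.",
};

string prompt = BuildAugmentedPrompt("Which movie suits a family night?", passages);
Console.WriteLine(prompt);
```

In a complete pipeline the passages would come from the vector search step and the resulting prompt would be passed to a chat client; both are left out here to keep the sketch self-contained.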

The library is released as a preview, and it’s expected to be in preview until .NET 9 is released. Developers can provide feedback via the GitHub repository issue list.

For the future, Microsoft states that the plan is to:

Continue collaborating with Semantic Kernel to build on top of the abstractions as well as Microsoft.Extensions.VectorData to bring more streamlined experiences in RAG scenarios. (..)

Work with vector store partners in the ecosystem that offer client SDKs, library authors, and developers across the .NET ecosystem on Microsoft.Extensions.VectorData adoption.

About the Author

Edin Kapić

Edin Kapić is a Lead Software Engineer working at Vista, based in Barcelona (Spain). Edin started messing with .NET and SharePoint in 2005 and still tinkers with it. He was a SharePoint MVP until 2022. Together with two other like-minded SharePoint addicts, he founded and currently acts as president of the SharePoint User Group Catalonia (SUG.CAT). He writes and speaks about technology in numerous publications and events, in Spain and abroad. When he has some free time, between technical stuff, he enjoys flight simulation, sailing, reading and hiking.

This content is in the .NET topic
