Similar Vector Search with Faiss

Published 2019-03-09, updated 2023-09-06

Faiss stands for Facebook AI Similarity Search. It is an open-source similar-vector search tool from the Facebook AI team.

Introduction to Faiss

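Faiss is a tool built by Facebook's AI team for top-K similar-vector search over large collections of vectors. It is written in C++ with a Python interface, and can reach millisecond-level query latency on indexes at the scale of a billion vectors.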
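To make "top-K similar-vector search" concrete, here is a minimal pure-Python sketch of the brute-force version of the problem: compute the squared L2 distance from the query to every stored vector and keep the K smallest. The function name `top_k_l2` and the toy vectors are made up for illustration. This is not how Faiss is implemented internally, only the computation that an exact L2 search has to answer.

```python
import heapq


def top_k_l2(query, vectors, k):
    """Return the k closest rows as (squared L2 distance, row id) pairs."""
    scored = (
        (sum((q - v) ** 2 for q, v in zip(query, vec)), row_id)
        for row_id, vec in enumerate(vectors)
    )
    # nsmallest keeps only the k best candidates, ordered by distance
    return heapq.nsmallest(k, scored)


# Toy data, assumed purely for this example
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
nearest = top_k_l2([0.9, 0.1], vectors, k=2)
print([row_id for _, row_id in nearest])
```

This scan costs time proportional to the number of stored vectors for every query, which is why a dedicated library becomes attractive as the collection grows.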

Installing Faiss

pip install faiss-cpu  # or faiss-gpu (the CPU build is usually fast enough)

Usage Example

import pandas as pd
import numpy as np
import faiss
from text2vec import SentenceModel

samples = pd.read_excel('/root/samples/test.xlsx')

# Collect the 'text' column as a plain list of documents
doc_list = samples['text'].tolist()

embedding_model_name = 'roberta-base'
t2v_model = SentenceModel(embedding_model_name)
doc_embeddings = t2v_model.encode(doc_list, show_progress_bar=True)

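# Build the index and add the document vectors
dim = doc_embeddings.shape[1]  # embedding dimension (768 for roberta-base)
index = faiss.IndexFlatL2(dim)  # exact (brute-force) L2 index
index.add(np.asarray(doc_embeddings, dtype='float32'))  # Faiss expects float32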

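# Query for similar vectors
query = 'hello world'
query_embeddings = t2v_model.encode([query], show_progress_bar=True)
query_embeddings = np.asarray(query_embeddings, dtype='float32')
top_k = 1
# res_dist holds squared L2 distances, res_id the positions of the nearest vectors
res_dist, res_id = index.search(query_embeddings, top_k)
print(res_id[0])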

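# Remove some vectors (ids 1000 through 1110).
# Note: on a flat index, removal shifts the ids of the vectors that follow.
index.remove_ids(np.arange(1000, 1111, dtype='int64'))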

# Save the index to disk
faiss.write_index(index, "/root/db/test.index")
# Load the index from disk
loaded_index = faiss.read_index("/root/db/test.index")