DocArray 0.32 Update

DocArray is a library for representing, sending and storing multi-modal data, perfect for Machine Learning applications.

Engineering Group

May 17, 2023

Release Note (`v0.32.0`)

This release contains 4 new features, 5 bug fixes and 4 documentation improvements.

🆕 Features

Subindex for document index (#1428)

The subindex feature allows you to index documents that contain another DocList by automatically creating a separate collection/index for each such DocList:

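import numpy as np
from docarray import BaseDoc, DocList
from docarray.index import ElasticDocIndex
from docarray.typing import NdArray


# create nested document schema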
class SimpleDoc(BaseDoc):
    tensor: NdArray[10]
    text: str


class MyDoc(BaseDoc):
    docs: DocList[SimpleDoc]


# create some docs
my_docs = [
    MyDoc(
        docs=DocList[SimpleDoc](
            [
                SimpleDoc(
                    tensor=np.ones(10) * (j + 1),
                    text=f"hello {j}",
                )
                for j in range(10)
            ]
        ),
    )
]

# index them into Elasticsearch
index = ElasticDocIndex[MyDoc](index_name="idx")
index.index(my_docs)  # creates two indices, named 'idx' and 'idx__docs'

# search on the nested level (subindex)
query = np.random.rand(10)
matches_root, matches_nested, scores = index.find_subindex(
    query, search_field="docs__tensor", limit=5
)
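
To make the naming and lookup behaviour more concrete, here is a minimal, library-free sketch of the idea. It is not DocArray's implementation: `ToyIndex`, its plain-dict documents, and the cosine scoring are all hypothetical stand-ins chosen for illustration. The only details taken from the example above are the `<index_name>__<field>` naming of the nested collection and the three values returned by `find_subindex`.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical stand-in for a document index; not DocArray code."""

    def __init__(self, index_name):
        self.index_name = index_name
        self.collections = defaultdict(list)  # collection name -> stored rows

    def index(self, docs):
        for doc in docs:
            # The root document goes into the root collection.
            self.collections[self.index_name].append({"id": doc["id"]})
            # Each nested list gets its own collection: '<index_name>__<field>'.
            for field, value in doc.items():
                if isinstance(value, list):
                    sub_name = f"{self.index_name}__{field}"
                    for child in value:
                        row = {**child, "parent_id": doc["id"]}
                        self.collections[sub_name].append(row)

    def find_subindex(self, query, search_field, limit):
        # 'docs__tensor' -> nested field 'docs', vector field 'tensor'.
        sub_field, vector_field = search_field.split("__", 1)
        rows = self.collections[f"{self.index_name}__{sub_field}"]
        ranked = sorted(
            rows, key=lambda r: cosine(query, r[vector_field]), reverse=True
        )[:limit]
        roots = [r["parent_id"] for r in ranked]
        scores = [cosine(query, r[vector_field]) for r in ranked]
        return roots, ranked, scores


index = ToyIndex("idx")
index.index(
    [
        {
            "id": "root-0",
            "docs": [
                {"text": "hello 0", "tensor": [1.0, 0.0]},
                {"text": "hello 1", "tensor": [0.0, 1.0]},
                {"text": "hello 2", "tensor": [1.0, 1.0]},
            ],
        }
    ]
)
roots, nested, scores = index.find_subindex(
    [1.0, 0.0], search_field="docs__tensor", limit=2
)
```

In this toy version each nested match carries a `parent_id`, which is how the sketch maps a hit in the subindex back to its root document. How the real backends store that link is not covered here.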

OpenAPI and FastAPI tensor shapes (#1510)

We have enabled shaped tensors to be properly represented in OpenAPI/SwaggerUI, both in examples and the schema.

This means that you can now build web APIs using FastAPI where the SwaggerUI properly communicates tensor shapes to your users:

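from docarray import BaseDoc
from docarray.base_doc import DocArrayResponse
from docarray.typing import TorchTensor
from fastapi import FastAPI


class Doc(BaseDoc):
    embedding_torch: TorchTensor[3, 4]


app = FastAPI()


@app.post("/foo", response_model=Doc, response_class=DocArrayResponse)
async def foo(doc: Doc) -> Doc:
    # echo the shaped tensor back to the caller
    return Doc(embedding_torch=doc.embedding_torch)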

[Figure: generated Swagger UI]

Save and load in-memory index (#1534)

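We added a `persist` method to the `InMemoryExactNNIndex` class that saves the index to disk as a binary file. Passing that file as `index_file_path` when constructing a new index loads it again.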

# Save your existing index as a binary file
doc_index.persist('docs.bin')
# Initialize a new document index using the saved binary file
new_doc_index = InMemoryExactNNIndex[MyDoc](index_file_path='docs.bin')
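
The save-and-reload round trip can be pictured with a small stand-alone sketch. This is not how DocArray serializes its index: `ToyInMemoryIndex` is a hypothetical class, `pickle` is swapped in for the library's own binary format, and falling back to an empty index when the file is missing is an assumption made for this example.

```python
import os
import pickle
import tempfile


class ToyInMemoryIndex:
    """Hypothetical in-memory index; pickle stands in for the real binary format."""

    def __init__(self, index_file_path=None):
        self.docs = []
        # Assumption: only load when a path is given and the file exists,
        # otherwise start with an empty index.
        if index_file_path is not None and os.path.exists(index_file_path):
            with open(index_file_path, "rb") as f:
                self.docs = pickle.load(f)

    def index(self, docs):
        self.docs.extend(docs)

    def persist(self, file_path):
        # Write the whole document list to disk in one binary file.
        with open(file_path, "wb") as f:
            pickle.dump(self.docs, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docs.bin")

    doc_index = ToyInMemoryIndex()
    doc_index.index([{"text": "hello 0"}, {"text": "hello 1"}])
    doc_index.persist(path)

    # A new index built from the saved file holds the same documents.
    restored = ToyInMemoryIndex(index_file_path=path)
    # A path that does not exist yields an empty index in this sketch.
    missing = ToyInMemoryIndex(index_file_path=os.path.join(tmp, "missing.bin"))
```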

🐞 Bug Fixes

`search_field` should be optional in hybrid text search (#1516)

The `search_field` argument of `text_search()` is now optional and has a sensible default.

Check if file path exists for in-memory index (#1537)

`InMemoryExactNNIndex` now checks whether the `index_file_path` it is given actually exists.

Add empty judgement to index search (#1533)

Calling `find` on an empty index no longer fails.
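
As a rough illustration of this kind of guard, the sketch below returns empty results before any similarity computation is attempted. The `find` function, its list-of-vectors index, and its `(positions, scores)` return value are hypothetical and are not DocArray's signature.

```python
import math


def find(index_vectors, query, limit=5):
    """Return (positions, scores) of the vectors most similar to `query`.

    Hypothetical helper for illustration; not DocArray's `find`.
    """
    # Guard: an empty index has nothing to rank, so return empty results
    # instead of running the similarity computation on no data.
    if not index_vectors:
        return [], []

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    scored = [(cosine(query, vec), pos) for pos, vec in enumerate(index_vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:limit]
    return [pos for _, pos in top], [score for score, _ in top]


empty_result = find([], [1.0, 0.0])
top_result = find([[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0], limit=1)
```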

Detach torch tensors (#1526)

Torch tensors that carry gradients are now detached before serialization, so serializing them no longer fails.

`DocVec` display fixes (#1522)

We have resolved `DocVec` display issues.

📗 Documentation Improvements

Remove erroneous info (#1531)
Fix link to documentation in readme (#1525)
Flatten structure (#1520)
Fix links (#1518)

🤘 Contributors

We would like to thank all contributors to this release:

Mohammad Kalim Akram (@makram93)
Johannes Messner (@JohannesMessner)
Anne Yang (@AnneYang720)
Zhaofeng Miao (@mapleeit)
Joan Fontanals (@JoanFM)
Kacper Łukawski (@kacperlukawski)
IyadhKhalfallah (@IyadhKhalfallah)
Saba Sturua (@jupyterjazz)
