DocArray 0.20.1 Update



DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multi-modal data with a Pythonic API.

💡 DocArray was released under the open-source Apache License 2.0 in January 2022. It is currently a sandbox project under the LF AI & Data Foundation.

DocArray is the common data layer used in all Jina AI products.
Release 💫 Patch v0.20.1 · docarray/docarray
Release time: 2022-12-12 09:32:37

Release Note (0.20.1)

This release contains 2 bug fixes and 1 documentation improvement.

๐Ÿž Bug Fixes

Make Milvus DocumentArray thread safe and suitable for pytest (#904)

This bug was causing connectivity issues when using multiple DocumentArrays in different threads to connect to the same Milvus instance, e.g. in pytest.

This would produce an error like the following:

E1207 14:59:51.357528591    2279 fork_posix.cc:76]           Other threads are currently calling into gRPC, skipping fork() handlers
E1207 14:59:51.367985469    2279 fork_posix.cc:76]           Other threads are currently calling into gRPC, skipping fork() handlers
E1207 14:59:51.457061884    3934 ev_epoll1_linux.cc:824]     assertion failed: gpr_atm_no_barrier_load(&g_active_poller) != (gpr_atm)worker
Fatal Python error: Aborted

This fix creates a separate gRPC connection for each MilvusDocumentArray instance, circumventing the issue.
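
The idea of giving every instance its own connection, rather than sharing one across threads, can be sketched in plain Python. This is an illustration only, not the DocArray or Milvus code: `FakeConnection`, `PerInstanceClient`, and the `conn-<uuid>` alias scheme are hypothetical names chosen for the example.

```python
import threading
import uuid


class FakeConnection:
    """Hypothetical stand-in for a gRPC channel (not a Milvus or DocArray class)."""

    def __init__(self, alias):
        self.alias = alias

    def ping(self):
        return self.alias


class PerInstanceClient:
    """Each instance opens its own connection under a unique alias (assumed scheme)."""

    def __init__(self):
        self.alias = f'conn-{uuid.uuid4().hex}'
        self._conn = FakeConnection(self.alias)

    def ping(self):
        return self._conn.ping()


def worker(results, index):
    # Every thread builds its own client, so no connection is shared between threads.
    client = PerInstanceClient()
    results[index] = client.ping()


results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the threads finish, `results` holds four distinct aliases, one per client, which is the property the fix relies on.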

Restore backwards compatibility for (de)serialization (#903)

DocArray v0.20.0 broke (de)serialization backwards compatibility with earlier versions of the library, making it impossible to load DocumentArrays that had been saved to disk with v0.19.1 or earlier:

# DocArray <= 0.19.1: save a DocumentArray to disk
from docarray import Document, DocumentArray
da = DocumentArray([Document() for _ in range(10)])
da.save_binary('old-da.docarray')
# DocArray == 0.20.0: load the old file and extend it
da = DocumentArray.load_binary('old-da.docarray')
da.extend([Document()])
print(da)

AttributeError: 'DocumentArrayInMemory' object has no attribute '_is_subindex'

This fix restores backwards compatibility by not relying on newly introduced private attributes:

# DocArray <= 0.19.1: save a DocumentArray to disk
from docarray import Document, DocumentArray
da = DocumentArray([Document() for _ in range(10)])
da.save_binary('old-da.docarray')
# DocArray == 0.20.1: load the old file and extend it
da = DocumentArray.load_binary('old-da.docarray')
da.extend([Document()])
print(da)

<DocumentArray (length=11) at 140683902276416>

Process finished with exit code 0
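
Why a newly introduced private attribute breaks old files, and how a defensive lookup avoids it, can be shown with a small self-contained sketch. It is not the actual DocArray patch: `FragileArray`, `TolerantArray`, and `restore` are hypothetical names, and `restore` only mimics the way default unpickling rebuilds an object from its saved `__dict__` without calling `__init__`.

```python
class FragileArray:
    """Hypothetical container whose newer version added a private flag."""

    def __init__(self, items=None):
        self._data = list(items or [])
        self._is_subindex = False  # attribute that older saved objects lack

    def extend(self, items):
        # Direct attribute access: raises AttributeError on restored old objects.
        if self._is_subindex:
            raise RuntimeError('subindex arrays are read-only in this sketch')
        self._data.extend(items)

    def __len__(self):
        return len(self._data)


class TolerantArray(FragileArray):
    def extend(self, items):
        # Fall back to a default when the attribute is missing.
        if getattr(self, '_is_subindex', False):
            raise RuntimeError('subindex arrays are read-only in this sketch')
        self._data.extend(items)


def restore(cls, state):
    """Rebuild an object from saved state without running __init__."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


def old_state():
    # State as an older version would have written it: no '_is_subindex' key.
    return {'_data': list(range(10))}


fragile = restore(FragileArray, old_state())
try:
    fragile.extend([10])
    fragile_failed = False
except AttributeError:
    fragile_failed = True

tolerant = restore(TolerantArray, old_state())
tolerant.extend([10])
```

Here `fragile_failed` ends up `True`, while `tolerant` grows to 11 items, mirroring the before and after behaviour shown in the two examples above.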

📗 Documentation Improvement

🤘 Contributors

We would like to thank all contributors to this release: