[FEA]: Kinetica vector DB service #2058

Open
2 tasks done
am-kinetica opened this issue Nov 19, 2024 · 14 comments · May be fixed by #2098
Labels
external This issue was filed by someone outside of the Morpheus team feature request New feature or request

Comments

@am-kinetica

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

High

Please provide a clear description of problem this feature solves

We at Kinetica would like to provide an implementation of VectorDBService that works with the Kinetica database. The idea is to enable the write-to-vector-DB stage of a pipeline to output data to Kinetica, as it does to Milvus today.

Describe your ideal solution

A new module similar to milvus_vector_db_service.py.

Additional context

This would enable Kinetica to use the nv_ingest microservice, configured to work with the Kinetica database.

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@am-kinetica am-kinetica added the feature request New feature or request label Nov 19, 2024
@morpheus-bot-test morpheus-bot-test bot added Needs Triage Need team to review and classify external This issue was filed by someone outside of the Morpheus team labels Nov 19, 2024
@morpheus-bot-test

Hi @am-kinetica!

Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
In the meantime, feel free to add any relevant information to this issue.

@efajardo-nv
Contributor

Thanks @am-kinetica. This sounds great. Looking forward to the pull request.

@efajardo-nv efajardo-nv removed the Needs Triage Need team to review and classify label Nov 19, 2024
@am-kinetica
Author

am-kinetica commented Nov 21, 2024

@efajardo-nv

class VectorDBServiceFactory:

    @typing.overload
    @classmethod
    def create_instance(
            cls, service_name: typing.Literal["milvus"], *args: typing.Any,
            **kwargs: dict[str,
                           typing.Any]) -> "morpheus_llm.service.vdb.milvus_vector_db_service.MilvusVectorDBService":
        pass

    @classmethod
    @handle_service_exceptions
    def create_instance(cls, service_name: str, *args: typing.Any, **kwargs: dict[str, typing.Any]):
        """
        Factory for creating instances of vector database service classes. This factory allows dynamically
        creating instances of vector database service classes based on the provided service name.
        Each service name corresponds to a specific implementation class.

        Parameters
        ----------
        service_name : str
            The name of the vector database service to create.
        *args : typing.Any
            Variable-length argument list to pass to the service constructor.
        **kwargs : dict[str, typing.Any]
            Arbitrary keyword arguments to pass to the service constructor.

        Returns
        -------
            An instance of the specified vector database service class.

        Raises
        ------
        ValueError
            If the specified service name is not found or does not correspond to a valid service class.
        """
        module_name = f"morpheus_llm.service.vdb.{service_name}_vector_db_service"
        module = importlib.import_module(module_name)
        class_name = f"{service_name.capitalize()}VectorDBService"
        class_ = getattr(module, class_name)
        instance = class_(*args, **kwargs)
        return instance

Why does the create_instance method return an instance of MilvusVectorDBService instead of VectorDBService? Is there a particular reason?
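
For reference, a minimal sketch of how the factory body above derives the module and class names from service_name. It only mirrors the two f-strings in the pasted code; the helper name resolve_service_names is hypothetical and is not part of Morpheus.

```python
def resolve_service_names(service_name: str) -> tuple[str, str]:
    """Mirror the name derivation used in the create_instance body quoted above.

    Hypothetical helper for illustration; it does not import anything.
    """
    module_name = f"morpheus_llm.service.vdb.{service_name}_vector_db_service"
    # str.capitalize() upper-cases the first character and lower-cases the rest.
    class_name = f"{service_name.capitalize()}VectorDBService"
    return module_name, class_name


for name in ("milvus", "kinetica"):
    module_name, class_name = resolve_service_names(name)
    print(f"{name}: {module_name}.{class_name}")
```

Under this naming scheme, a service name of "kinetica" would resolve to a class called KineticaVectorDBService in a module called kinetica_vector_db_service, so a new service would need to follow that convention to be found by the factory.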

@efajardo-nv
Contributor

@am-kinetica Thanks for catching that. That's correct. Should be VectorDBService but shouldn't affect anything since it's overloaded. We'll get that updated.

@am-kinetica
Author

> @am-kinetica Thanks for catching that. That's correct. Should be VectorDBService but shouldn't affect anything since it's overloaded. We'll get that updated.

Thanks, doesn't affect anything, just looks restrictive.

@efajardo-nv
Contributor

efajardo-nv commented Nov 21, 2024

@am-kinetica I was mistaken. The use of the typing.overload decorator here is actually to allow for more precise type checking when milvus is passed to create_instance (i.e. expected return type would be MilvusVectorDBService). Looks like we need one for FaissVectorDBService as well.

@am-kinetica
Author

> @am-kinetica I was mistaken. The use of the typing.overload decorator here is actually to allow for more precise type checking when milvus is passed to create_instance (i.e. expected return type would be MilvusVectorDBService). Looks like we need one for FaissVectorDBService as well.

Alright. That makes sense, I am going to put one in for KineticaVectorDBService as well.
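
A minimal sketch of what per-service overloads could look like, assuming stand-in classes rather than the real Morpheus ones. The stub classes, the _SERVICES registry, and the simplified lookup are all hypothetical; the real factory imports the module dynamically, as shown in the code pasted earlier.

```python
import typing


class VectorDBService:
    """Stand-in base class (hypothetical, for illustration only)."""


class MilvusVectorDBService(VectorDBService):
    """Stand-in for the Milvus implementation."""


class KineticaVectorDBService(VectorDBService):
    """Stand-in for the proposed Kinetica implementation."""


# Hypothetical registry replacing the dynamic import, to keep the sketch self-contained.
_SERVICES: dict[str, type[VectorDBService]] = {
    "milvus": MilvusVectorDBService,
    "kinetica": KineticaVectorDBService,
}


class VectorDBServiceFactory:

    # Each overload tells a type checker which class a literal name maps to.
    @typing.overload
    @classmethod
    def create_instance(cls, service_name: typing.Literal["milvus"], *args: typing.Any,
                        **kwargs: typing.Any) -> MilvusVectorDBService:
        ...

    @typing.overload
    @classmethod
    def create_instance(cls, service_name: typing.Literal["kinetica"], *args: typing.Any,
                        **kwargs: typing.Any) -> KineticaVectorDBService:
        ...

    # Fallback overload for names that are not known literals.
    @typing.overload
    @classmethod
    def create_instance(cls, service_name: str, *args: typing.Any, **kwargs: typing.Any) -> VectorDBService:
        ...

    # The single runtime implementation; the overloads above exist only for type checking.
    @classmethod
    def create_instance(cls, service_name: str, *args: typing.Any, **kwargs: typing.Any) -> VectorDBService:
        try:
            class_ = _SERVICES[service_name]
        except KeyError as exc:
            raise ValueError(f"Unknown vector DB service: {service_name!r}") from exc
        return class_(*args, **kwargs)


service = VectorDBServiceFactory.create_instance("kinetica")
print(type(service).__name__)
```

With overloads like these, a type checker can infer KineticaVectorDBService for the literal "kinetica" and fall back to VectorDBService for an arbitrary string, while runtime behavior is unchanged.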

@am-kinetica
Author

am-kinetica commented Nov 26, 2024

@efajardo-nv
@bsuryadevara

Could you please provide a sample JSON file that would work as input to the milvus_vector_db_service via the write_to_vector_db stage? I can't find any such example, and setting up the right input for the pipeline I am trying to build with Milvus is becoming a challenge.

@am-kinetica
Author

am-kinetica commented Nov 26, 2024

@efajardo-nv
@bsuryadevara

I have tried creating a sample input from the zilliz interface that looks like:

(morpheus) root@300-303-u28-vm04-v100:/workspace/examples/sample_milvus_pipeline# cat test.json
{
  "collectionName": "test_collection",
  "data": [
    {
      "id": 81,
      "metadata": "vozltxssn7l",
      "vector": [
        0.2659727795719654,
        0.8355436908247349,
        0.18610434690032607
      ]
    },
    {
      "id": 82,
      "metadata": "vozltxssn7l",
      "vector": [
        0.2659727795719654,
        0.8355436908247349,
        0.18610434690032607
      ]
    }
  ]
}

This works perfectly on Zilliz, but in the pipeline it throws an error: Unable to upload dataframe entries to vector database: 'id'.

Any help would be much appreciated.
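
One possible reading of the error (an assumption, not confirmed anywhere in this thread): the file wraps the rows under a top-level "data" key, so anything that builds a dataframe directly from the file would see the columns collectionName and data rather than id, metadata, and vector, and a lookup of 'id' would then fail. Below is a minimal standard-library sketch that flattens such a payload into one JSON object per row. The function names are hypothetical, and whether the pipeline's source stage expects JSON Lines should be verified against a working pipeline example.

```python
import json


def flatten_payload(payload: dict) -> list[dict]:
    """Return copies of the row dictionaries stored under the top-level "data" key."""
    return [dict(row) for row in payload["data"]]


def to_json_lines(rows: list[dict]) -> str:
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


# Same structure as the test.json shown above.
payload = {
    "collectionName": "test_collection",
    "data": [
        {
            "id": 81,
            "metadata": "vozltxssn7l",
            "vector": [0.2659727795719654, 0.8355436908247349, 0.18610434690032607],
        },
        {
            "id": 82,
            "metadata": "vozltxssn7l",
            "vector": [0.2659727795719654, 0.8355436908247349, 0.18610434690032607],
        },
    ],
}

rows = flatten_payload(payload)
# Each row now exposes id, metadata, and vector as top-level keys.
print(to_json_lines(rows))
```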

@efajardo-nv
Contributor

efajardo-nv commented Dec 6, 2024

> Could you please provide a sample JSON file that would work as input to the milvus_vector_db_service via the write_to_vector_db stage? I can't find any such example, and setting up the right input for the pipeline I am trying to build with Milvus is becoming a challenge.

@am-kinetica This test is an example of a pipeline using WriteToVectorDBStage and MilvusVectorDBService:
https://github.com/nv-morpheus/Morpheus/blob/branch-25.02/tests/morpheus_llm/stages/test_milvus_write_to_vector_db_stage_pipe.py#L53-L130

@am-kinetica
Author

> > Could you please provide a sample JSON file that would work as input to the milvus_vector_db_service via the write_to_vector_db stage? I can't find any such example, and setting up the right input for the pipeline I am trying to build with Milvus is becoming a challenge.
>
> @am-kinetica This test is an example of a pipeline using WriteToVectorDBStage and MilvusVectorDBService: https://github.com/nv-morpheus/Morpheus/blob/branch-25.02/tests/morpheus_llm/stages/test_milvus_write_to_vector_db_stage_pipe.py#L53-L130

Thanks

@am-kinetica
Author

@efajardo-nv

I am getting the following error while building the Morpheus release container.

#29 51.84 + cmake -S . -B build -GNinja -DCMAKE_MESSAGE_CONTEXT_SHOW=ON -DMORPHEUS_USE_CLANG_TIDY=OFF -DMORPHEUS_PYTHON_INPLACE_BUILD=ON -DMORPHEUS_PYTHON_PERFORM_INSTALL=ON -DMORPHEUS_USE_CCACHE=ON -DMORPHEUS_USE_CONDA=ON -DMORPHEUS_SUPPORT_DOCA=OFF -DMORPHEUS_BUILD_MORPHEUS_CORE=ON -DMORPHEUS_BUILD_MORPHEUS_LLM=ON -DMORPHEUS_BUILD_MORPHEUS_DFP=ON -DCMAKE_AR=/opt/conda/envs/morpheus/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/opt/conda/envs/morpheus/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/opt/conda/envs/morpheus/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/opt/conda/envs/morpheus/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/opt/conda/envs/morpheus/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/opt/conda/envs/morpheus/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/opt/conda/envs/morpheus/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/opt/conda/envs/morpheus/bin/x86_64-conda-linux-gnu-strip -DMORPHEUS_BUILD_DOCS=ON -DMORPHEUS_PYTHON_BUILD_STUBS=OFF -DMORPHEUS_CUDA_ARCHITECTURES=RAPIDS
#29 51.87 CMake Error at CMakeLists.txt:76 (include):
#29 51.87   include could not find requested file:
#29 51.87 
#29 51.87     morpheus_utils/load
#29 51.87 
#29 51.87 
#29 51.87 CMake Error at CMakeLists.txt:78 (morpheus_utils_initialize_package_manager):
#29 51.87   Unknown CMake command "morpheus_utils_initialize_package_manager".
#29 51.87 
#29 51.87 
#29 51.87 -- Configuring incomplete, errors occurred!
#29 ERROR: process "/bin/bash -c source activate morpheus &&    CONDA_ALWAYS_YES=true /opt/conda/bin/mamba install -n morpheus         -c local         -c conda-forge         -c huggingface         -c rapidsai         -c rapidsai-nightly         -c nvidia         -c nvidia/label/dev         -c pytorch         -c defaults         morpheus &&     cd ${MORPHEUS_ROOT_HOST} &&    CMAKE_CONFIGURE_EXTRA_ARGS=\"-DMORPHEUS_BUILD_DOCS=ON -DMORPHEUS_PYTHON_BUILD_STUBS=OFF -DMORPHEUS_CUDA_ARCHITECTURES=RAPIDS\"         ./scripts/compile.sh --target morpheus_docs" did not complete successfully: exit code: 1

#28 [runtime_conda_create 2/2] RUN --mount=type=bind,from=conda_bld_morpheus,source=/opt/conda/conda-bld,target=/opt/conda/conda-bld     --mount=type=cache,id=conda_pkgs,target=/opt/conda/pkgs,sharing=locked     python -m pip uninstall -y pip &&     source activate morpheus &&    CONDA_ALWAYS_YES=true /opt/conda/bin/mamba install -n morpheus         -c local         -c conda-forge         -c huggingface         -c rapidsai         -c rapidsai-nightly         -c nvidia         -c nvidia/label/dev         -c pytorch         -c defaults         morpheus &&     /opt/conda/bin/conda env update --solver=libmamba -n morpheus --file         conda/environments/runtime_cuda-125_arch-x86_64.yaml
#28 52.36 runc run failed: container process is already dead
#28 CANCELED
------
 > [build_docs 2/2] RUN --mount=type=cache,id=workspace_cache,target=/workspace/.cache,sharing=locked     --mount=type=bind,from=conda_bld_morpheus,source=/opt/conda/conda-bld,target=/opt/conda/conda-bld     --mount=type=cache,id=conda_pkgs,target=/opt/conda/pkgs,sharing=locked     source activate morpheus &&    CONDA_ALWAYS_YES=true /opt/conda/bin/mamba install -n morpheus         -c local         -c conda-forge         -c huggingface         -c rapidsai         -c rapidsai-nightly         -c nvidia         -c nvidia/label/dev         -c pytorch         -c defaults         morpheus &&     cd . &&    CMAKE_CONFIGURE_EXTRA_ARGS="-DMORPHEUS_BUILD_DOCS=ON -DMORPHEUS_PYTHON_BUILD_STUBS=OFF -DMORPHEUS_CUDA_ARCHITECTURES=RAPIDS"         ./scripts/compile.sh --target morpheus_docs:
51.87   include could not find requested file:
51.87 
51.87     morpheus_utils/load
51.87 
51.87 
51.87 CMake Error at CMakeLists.txt:78 (morpheus_utils_initialize_package_manager):
51.87   Unknown CMake command "morpheus_utils_initialize_package_manager".
51.87 
51.87 
51.87 -- Configuring incomplete, errors occurred!
------
Dockerfile:285
--------------------
 284 |     
 285 | >>> RUN --mount=type=cache,id=workspace_cache,target=/workspace/.cache,sharing=locked \
 286 | >>>     --mount=type=bind,from=conda_bld_morpheus,source=/opt/conda/conda-bld,target=/opt/conda/conda-bld \
 287 | >>>     --mount=type=cache,id=conda_pkgs,target=/opt/conda/pkgs,sharing=locked \
 288 | >>>     source activate morpheus &&\
 289 | >>>     CONDA_ALWAYS_YES=true /opt/conda/bin/mamba install -n morpheus \
 290 | >>>         -c local \
 291 | >>>         -c conda-forge \
 292 | >>>         -c huggingface \
 293 | >>>         -c rapidsai \
 294 | >>>         -c rapidsai-nightly \
 295 | >>>         -c nvidia \
 296 | >>>         -c nvidia/label/dev \
 297 | >>>         -c pytorch \
 298 | >>>         -c defaults \
 299 | >>>         morpheus && \
 300 | >>>     # Change to the morpheus directory and build the docs
 301 | >>>     cd ${MORPHEUS_ROOT_HOST} &&\
 302 | >>>     CMAKE_CONFIGURE_EXTRA_ARGS="-DMORPHEUS_BUILD_DOCS=ON -DMORPHEUS_PYTHON_BUILD_STUBS=OFF -DMORPHEUS_CUDA_ARCHITECTURES=RAPIDS"\
 303 | >>>          ./scripts/compile.sh --target morpheus_docs
 304 |     
--------------------
ERROR: failed to solve: process "/bin/bash -c source activate morpheus &&    CONDA_ALWAYS_YES=true /opt/conda/bin/mamba install -n morpheus         -c local         -c conda-forge         -c huggingface         -c rapidsai         -c rapidsai-nightly         -c nvidia         -c nvidia/label/dev         -c pytorch         -c defaults         morpheus &&     cd ${MORPHEUS_ROOT_HOST} &&    CMAKE_CONFIGURE_EXTRA_ARGS=\"-DMORPHEUS_BUILD_DOCS=ON -DMORPHEUS_PYTHON_BUILD_STUBS=OFF -DMORPHEUS_CUDA_ARCHITECTURES=RAPIDS\"         ./scripts/compile.sh --target morpheus_docs" did not complete successfully: exit code: 1

Any suggestions on what to do?

@efajardo-nv
Contributor

@am-kinetica You can try running the following and then building again:

git submodule update --init --recursive

Alternatively, our latest pre-built release container was just published last week. You can pull it from here:
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/morpheus/containers/morpheus/tags

If you're developing new features for 25.02, you might want to use a development container or conda environment instead. You can find details here:
https://github.com/nv-morpheus/Morpheus/blob/branch-25.02/docs/source/developer_guide/contributing.md

@am-kinetica
Author

@efajardo-nv

I am confused by this code in write_to_vector_db.py.

def preprocess_vdb_resources(service, recreate: bool, resource_schemas: dict):
    for resource_name, resource_schema_config in resource_schemas.items():
        has_object = service.has_store_object(name=resource_name)

        if (recreate and has_object):
            # Delete the existing resource
            service.drop(name=resource_name)
            has_object = False

        # Ensure that the resource exists
        if (not has_object):
            # TODO(Devin)
            import pymilvus
            schema_fields = []
            for field_data in resource_schema_config["schema_conf"]["schema_fields"]:
                if "dtype" in field_data:
                    field_data["dtype"] = DATA_TYPE_MAP.get(field_data["dtype"])
                    field_schema = pymilvus.FieldSchema(**field_data)
                    schema_fields.append(field_schema.to_dict())
                else:
                    schema_fields.append(field_data)

            resource_schema_config["schema_conf"]["schema_fields"] = schema_fields
            # function that we need to call first to turn resource_kwargs into a milvus config spec.

            service.create(name=resource_name, **resource_schema_config)

Why is resource creation tied specifically to pymilvus? Does this mean the stage was designed to support only Milvus and nothing else?
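
To make the question concrete, here is a sketch of one backend-neutral shape this function could take. It assumes a hypothetical convert_schema_fields hook on the service, which is not an existing Morpheus API, and uses an in-memory stand-in instead of a real VectorDBService. The idea is that each service would translate the generic field dictionaries into its own schema format.

```python
import typing


class InMemoryService:
    """Hypothetical stand-in exposing the three service calls used in the code above."""

    def __init__(self) -> None:
        self.resources: dict[str, dict] = {}

    def has_store_object(self, name: str) -> bool:
        return name in self.resources

    def drop(self, name: str) -> None:
        self.resources.pop(name, None)

    def create(self, name: str, **kwargs: typing.Any) -> None:
        self.resources[name] = kwargs

    def convert_schema_fields(self, fields: list[dict]) -> list[dict]:
        # Hypothetical hook: a Milvus service could build pymilvus field schemas here,
        # and a Kinetica service could build its own column definitions.
        return [dict(field) for field in fields]


def preprocess_vdb_resources(service, recreate: bool, resource_schemas: dict) -> None:
    for resource_name, resource_schema_config in resource_schemas.items():
        has_object = service.has_store_object(name=resource_name)

        if recreate and has_object:
            # Delete the existing resource so it can be created again.
            service.drop(name=resource_name)
            has_object = False

        if not has_object:
            schema_conf = resource_schema_config["schema_conf"]
            # Delegate backend-specific schema handling to the service itself.
            schema_conf["schema_fields"] = service.convert_schema_fields(schema_conf["schema_fields"])
            service.create(name=resource_name, **resource_schema_config)


service = InMemoryService()
schemas = {"docs": {"schema_conf": {"schema_fields": [{"name": "id", "dtype": "INT64"}]}}}
preprocess_vdb_resources(service, recreate=False, resource_schemas=schemas)
print(service.has_store_object("docs"))
```

With this shape, the stage would no longer need to import pymilvus, since the Milvus-specific conversion would live in the Milvus service.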

Projects
Status: Todo

2 participants