feat: support i18n
zhanshuyou committed Aug 26, 2024
1 parent b1d3982 commit 46499c3
Showing 19 changed files with 156 additions and 99 deletions.
57 changes: 57 additions & 0 deletions .github/workflows/ci-i18n.yml
@@ -0,0 +1,57 @@
name: ci-i18n

on:
  push:
    branches:
      - feat/v2.4.x-i18n
    # paths:
    #   - "site/zh/**"
  release:
    types: [released]
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      - name: Check out Git repository
        uses: actions/checkout@v1
        with:
          ref: feat/v2.4.x-i18n

      - name: Extract branch name
        shell: bash
        run: echo "##[set-output name=branch;]$(echo ${GITHUB_REF#refs/heads/})"
        id: extract_branch

      - name: Md2md
        run: |
          cp site/en/Variables.json ./
          mv site doc_from
          sudo npm install @zilliz/mdtomd -g
          goover
          rm -rf doc_from/*
          rm check-link.js
          mv doc_to site
      - name: Delete And Push
        run: |
          sudo apt-get update
          sudo apt-get install jq
          cd ../
          git clone -b feat/localization https://.:${{ secrets.P_GITHUB_TOKEN }}@github.com/milvus-io/web-content.git target
          git config --global user.email "[email protected]"
          git config --global user.name "Milvus-doc-bot"
          cp ./milvus-docs/version.json ./target
          cd target
          rm -rf `cat version.json | jq -r .version`
          mkdir `cat version.json | jq -r .version`
          cp -avr ../milvus-docs/** ./`cat version.json | jq -r .version`
          git add .
          git commit -m "Release new docs"
          git push -f origin feat/localization
2 changes: 1 addition & 1 deletion site/en/about/comparison.md
@@ -52,7 +52,7 @@ Although both serve similar functions as vector databases, the domain-specific t
| Deployment Modes | SaaS-only | Milvus Lite, On-prem Standalone & Cluster, Zilliz Cloud Saas & BYOC |
| Embedding Functions | Not available | Support with <a href="https://github.com/milvus-io/milvus-model">pymilvus[model]</a> |
| Data Types | String, Number, Bool, List of String | String, VarChar, Number (Int, Float, Double), Bool, Array, JSON, Float Vector, Binary Vector, BFloat16, Float16, Sparse Vector |
| Metric and Index Types | Cos, Dot, Euclidean<br>P-family, S-family | Cosine, IP (Dot), L2 (Euclidean), Hamming, Jaccard<br>FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, HNSW, SCANN, GPU Indexes |
| Metric and Index Types | Cos, Dot, Euclidean<br/>P-family, S-family | Cosine, IP (Dot), L2 (Euclidean), Hamming, Jaccard<br/>FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, HNSW, SCANN, GPU Indexes |
| Schema Design | Flexible mode | Flexible mode, Strict mode |
| Multiple Vector Fields | N/A | Multi-vector and hybrid search |
| Tools | Datasets, text utilities, spark connector | Attu, Birdwatcher, Backup, CLI, CDC, Spark and Kafka connectors |
32 changes: 16 additions & 16 deletions site/en/about/roadmap.md
@@ -22,28 +22,28 @@ Welcome to the Milvus Roadmap! Join us on our continuous journey to enhance and
</thead>
<tbody>
<tr>
<td><strong>AI-developer Friendly</strong><br><i>A developer-friendly technology stack, enhanced with the latest AI innovations</i></td>
<td><strong>Multi-Vectors & Hybrid Search</strong><br><i>Framework for multiplex recall and fusion</i><br><br><strong>GPU Index Acceleration</strong><br><i>Support for higher QPS and faster index creation</i><br><br><strong>Model Library in PyMilvus</strong><br><i>Integrated embedding models for Milvus</i></td>
<td><strong>Sparse Vector (GA)</strong><br><i>Local feature extraction and keyword search</i><br><br><strong>Milvus Lite (GA)</strong><br><i>A lightweight, in-memory version of Milvus</i><br><br><strong>Embedding Models Gallery</strong><br><i>Support for image and multi-modal embeddings and reranker models in model libraries</i></td>
<td><strong>Original Data-In and Data-Out</strong><br><i>Support for Blob data types</i><br><br><strong>Data Clustering</strong><br><i>Data co-locality</i><br><br><strong>Scenario-oriented Vector Search</strong><br><i>e.g. Multi-target search & NN filtering</i><br><br><strong>Support Embedding & Reranker Endpoint</strong></td>
<td><strong>AI-developer Friendly</strong><br/><i>A developer-friendly technology stack, enhanced with the latest AI innovations</i></td>
<td><strong>Multi-Vectors & Hybrid Search</strong><br/><i>Framework for multiplex recall and fusion</i><br/><br/><strong>GPU Index Acceleration</strong><br/><i>Support for higher QPS and faster index creation</i><br/><br/><strong>Model Library in PyMilvus</strong><br/><i>Integrated embedding models for Milvus</i></td>
<td><strong>Sparse Vector (GA)</strong><br/><i>Local feature extraction and keyword search</i><br/><br/><strong>Milvus Lite (GA)</strong><br/><i>A lightweight, in-memory version of Milvus</i><br/><br/><strong>Embedding Models Gallery</strong><br/><i>Support for image and multi-modal embeddings and reranker models in model libraries</i></td>
<td><strong>Original Data-In and Data-Out</strong><br/><i>Support for Blob data types</i><br/><br/><strong>Data Clustering</strong><br/><i>Data co-locality</i><br/><br/><strong>Scenario-oriented Vector Search</strong><br/><i>e.g. Multi-target search & NN filtering</i><br/><br/><strong>Support Embedding & Reranker Endpoint</strong></td>
</tr>
<tr>
<td><strong>Rich Functionality</strong><br><i>Enhanced retrieval and data management features</i></td>
<td><strong>Support for FP16, BF16 Datatypes</strong><br><i>These ML datatypes can help reduce memory usage</i><br><br><strong>Grouping Search</strong><br><i>Aggregate split embeddings</i><br><br><strong>Fuzzy Match and Inverted Index</strong><br><i>Support for fuzzy matching and inverted indexing for scalar types like varchar and int</i></td>
<td><strong>Inverted Index for Array & JSON</strong><br><i>Indexing for array and partial support JSON</i><br><br><strong>Bitset Index</strong><br><i>Improved execution speed and future data aggregation</i><br><br><strong>Truncate Collection</strong><br><i>Allows data clearance while preserving metadata</i><br><br><strong>Support for NULL and Default Values</strong></td>
<td><strong>Support for More Datatypes</strong><br><i>e.g. Datetime, GIS</i><br><br><strong>Advanced Text Filtering</strong><br><i>e.g. Match Phrase</i><br><br><strong>Primary Key Deduplication</strong></td>
<td><strong>Rich Functionality</strong><br/><i>Enhanced retrieval and data management features</i></td>
<td><strong>Support for FP16, BF16 Datatypes</strong><br/><i>These ML datatypes can help reduce memory usage</i><br/><br/><strong>Grouping Search</strong><br/><i>Aggregate split embeddings</i><br/><br/><strong>Fuzzy Match and Inverted Index</strong><br/><i>Support for fuzzy matching and inverted indexing for scalar types like varchar and int</i></td>
<td><strong>Inverted Index for Array & JSON</strong><br/><i>Indexing for array and partial support JSON</i><br/><br/><strong>Bitset Index</strong><br/><i>Improved execution speed and future data aggregation</i><br/><br/><strong>Truncate Collection</strong><br/><i>Allows data clearance while preserving metadata</i><br/><br/><strong>Support for NULL and Default Values</strong></td>
<td><strong>Support for More Datatypes</strong><br/><i>e.g. Datetime, GIS</i><br/><br/><strong>Advanced Text Filtering</strong><br/><i>e.g. Match Phrase</i><br/><br/><strong>Primary Key Deduplication</strong></td>
</tr>
<tr>
<td><strong>Cost Efficiency & Architecture</strong><br><i>Advanced systems emphasizing stability, cost efficiency, scalability, and performance</i></td>
<td><strong>Support for More Collections/Partitions</strong><br><i>Handles over 10,000 collections in smaller clusters</i><br><br><strong>Mmap Optimization</strong><br><i>Balances reduced memory consumption with latency</i><br><br><strong>Bulk Insert Optimization</strong><br><i>Simplifies importing large datasets</i></td>
<td><strong>Lazy Load</strong><br><i>Data is loaded on-demand through read operations</i><br><br><strong>Major Compaction</strong><br><i>Re-distributes data based on configuration to enhance read performance</i><br><br><strong>Mmap for Growing Data</strong><br><i>Mmap files for expanding data segments</i></td>
<td><strong>Memory Control</strong><br><i>Reduces out-of-memory issues and provides global memory management</i><br><br><strong>LogNode Introduction</strong><br><i>Ensures global consistency and addresses the single-point bottleneck in root coordination</i><br><br><strong>Storage Format V2</strong><br><i>Universal format design lays the groundwork for disk-based data access</i></td>
<td><strong>Cost Efficiency & Architecture</strong><br/><i>Advanced systems emphasizing stability, cost efficiency, scalability, and performance</i></td>
<td><strong>Support for More Collections/Partitions</strong><br/><i>Handles over 10,000 collections in smaller clusters</i><br/><br/><strong>Mmap Optimization</strong><br/><i>Balances reduced memory consumption with latency</i><br/><br/><strong>Bulk Insert Optimization</strong><br/><i>Simplifies importing large datasets</i></td>
<td><strong>Lazy Load</strong><br/><i>Data is loaded on-demand through read operations</i><br/><br/><strong>Major Compaction</strong><br/><i>Re-distributes data based on configuration to enhance read performance</i><br/><br/><strong>Mmap for Growing Data</strong><br/><i>Mmap files for expanding data segments</i></td>
<td><strong>Memory Control</strong><br/><i>Reduces out-of-memory issues and provides global memory management</i><br/><br/><strong>LogNode Introduction</strong><br/><i>Ensures global consistency and addresses the single-point bottleneck in root coordination</i><br/><br/><strong>Storage Format V2</strong><br/><i>Universal format design lays the groundwork for disk-based data access</i></td>
</tr>
<tr>
<td><strong>Enterprise Ready</strong><br><i>Designed to meet the needs of enterprise production environments</i></td>
<td><strong>Milvus CDC</strong><br><i>Capability for data replication</i><br><br><strong>Accesslog Enhancement</strong><br><i>Detailed recording for audit and tracing</i></td>
<td><strong>New Resource Group</strong><br><i>Enhanced resource management</i><br><br><strong>Storage Hook</strong><br><i>Support for Bring Your Own Key (BYOK) encryption</i></td>
<td><strong>Dynamic Replica Number Adjustment</strong><br><i>Facilitates dynamic changes to the number of replicas</i><br><br><strong>Dynamic Schema Modification</strong><br><i>e.g., Add/delete fields, modify varchar lengths</i><br><br><strong>Rust and C# SDKs</strong></td>
<td><strong>Enterprise Ready</strong><br/><i>Designed to meet the needs of enterprise production environments</i></td>
<td><strong>Milvus CDC</strong><br/><i>Capability for data replication</i><br/><br/><strong>Accesslog Enhancement</strong><br/><i>Detailed recording for audit and tracing</i></td>
<td><strong>New Resource Group</strong><br/><i>Enhanced resource management</i><br/><br/><strong>Storage Hook</strong><br/><i>Support for Bring Your Own Key (BYOK) encryption</i></td>
<td><strong>Dynamic Replica Number Adjustment</strong><br/><i>Facilitates dynamic changes to the number of replicas</i><br/><br/><strong>Dynamic Schema Modification</strong><br/><i>e.g., Add/delete fields, modify varchar lengths</i><br/><br/><strong>Rust and C# SDKs</strong></td>
</tr>
</tbody>
</table>
2 changes: 1 addition & 1 deletion site/en/adminGuide/deploy_etcd.md
@@ -50,7 +50,7 @@ Run the following command to start Milvus that uses the etcd configurations.
docker compose up
```

<div class="alert note">Configurations only take effect after Milvus starts. See <a href=https://milvus.io/docs/install_standalone-docker.md#Start-Milvus>Start Milvus</a> for more information.</div>
<div class="alert note">Configurations only take effect after Milvus starts. See <a href="https://milvus.io/docs/install_standalone-docker.md#Start-Milvus">Start Milvus</a> for more information.</div>

## Configure etcd on K8s

2 changes: 1 addition & 1 deletion site/en/adminGuide/deploy_pulsar.md
@@ -34,7 +34,7 @@ Run the following command to start Milvus that uses the Pulsar configurations.
docker compose up
```

<div class="alert note">Configurations only take effect after Milvus starts. See <a href=https://milvus.io/docs/install_standalone-docker.md#Start-Milvus>Start Milvus</a> for more information.</div>
<div class="alert note">Configurations only take effect after Milvus starts. See <a href="https://milvus.io/docs/install_standalone-docker.md#Start-Milvus">Start Milvus</a> for more information.</div>


## Configure Pulsar with Helm
2 changes: 1 addition & 1 deletion site/en/adminGuide/deploy_s3.md
@@ -35,7 +35,7 @@ Run the following command to start Milvus that uses the S3 configurations.
```shell
docker compose up
```
<div class="alert note">Configurations only take effect after Milvus starts. See <a href=https://milvus.io/docs/install_standalone-docker.md#Start-Milvus>Start Milvus</a> for more information.</div>
<div class="alert note">Configurations only take effect after Milvus starts. See <a href="https://milvus.io/docs/install_standalone-docker.md#Start-Milvus">Start Milvus</a> for more information.</div>

## Configure S3 on K8s

12 changes: 6 additions & 6 deletions site/en/adminGuide/rbac.md
@@ -58,7 +58,7 @@ client.update_password(
```python
client.list_users()

# output:
# output
# ['root', 'user_1']
```

@@ -67,7 +67,7 @@ client.list_users()
```python
client.describe_user(user_name='user_1')

# output:
# output
# {'user_name': 'user_1', 'roles': ()}
```

@@ -88,7 +88,7 @@ After creating a role, you can:
```python
client.list_roles()

# output:
# output
# ['admin', 'public', 'roleA']
```
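
The grant step that produces the privilege entry shown in the next hunk is collapsed in this diff. A minimal sketch of what it typically looks like with the pymilvus 2.4 `MilvusClient` (the privilege name `SelectUser` and the object values are illustrative):

```python
# `client` is the MilvusClient connected earlier as an admin user (e.g. root).
client.grant_privilege(
    role_name="roleA",
    object_type="User",      # type of object the privilege applies to
    object_name="user_1",    # the specific object instance
    privilege="SelectUser",  # illustrative privilege name
)
```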

@@ -120,7 +120,7 @@ client.describe_role(
role_name='roleA'
)

# output:
# output
# {'role': 'roleA',
# 'privileges': [{'object_type': 'User',
# 'object_name': 'user_1',
@@ -150,8 +150,8 @@ client.describe_user(
user_name='user_1'
)

# output:
# {'user_name': 'user_1', 'roles': ('roleA',)}
# output
# {'user_name': 'user_1', 'roles': ('roleA')}
```

## 6. Revoke privileges
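
The body of this step is collapsed in the diff. A minimal sketch of the typical teardown with the pymilvus 2.4 `MilvusClient` (connection parameters and the privilege values are illustrative and mirror the grant above):

```python
from pymilvus import MilvusClient

# Connect as an admin user; the URI and token below are the local defaults.
client = MilvusClient(uri="http://localhost:19530", token="root:Milvus")

# Revoke any privileges previously granted to the role.
client.revoke_privilege(
    role_name="roleA",
    object_type="User",
    object_name="user_1",
    privilege="SelectUser",
)

# Detach the role from the user, then remove the role and the user.
client.revoke_role(user_name="user_1", role_name="roleA")
client.drop_role(role_name="roleA")
client.drop_user(user_name="user_1")
```
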
44 changes: 22 additions & 22 deletions site/en/getstarted/quickstart.md
@@ -101,11 +101,11 @@ data = [
print("Data has", len(data), "entities, each with fields: ", data[0].keys())
print("Vector dim:", len(data[0]["vector"]))
```

Dim: 768 (768,)
Data has 3 entities, each with fields: dict_keys(['id', 'vector', 'text', 'subject'])
Vector dim: 768

```
Dim: 768 (768,)
Data has 3 entities, each with fields: dict_keys(['id', 'vector', 'text', 'subject'])
Vector dim: 768
```

## [Alternatively] Use fake representation with random vectors
If you cannot download the model due to network issues, you can, as a workaround, use random vectors to represent the text and still finish the example. Just note that the search results won't reflect semantic similarity, as the vectors are fake ones.
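
A minimal sketch of that fallback (the full block is collapsed in this diff; the sample sentences are the ones used elsewhere in this guide):

```python
import random

docs = [
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]
# Random values in [-1, 1] stand in for real 768-dimensional embeddings;
# similarity scores computed from them are meaningless.
vectors = [[random.uniform(-1, 1) for _ in range(768)] for _ in docs]
```
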
@@ -130,10 +130,10 @@ data = [
print("Data has", len(data), "entities, each with fields: ", data[0].keys())
print("Vector dim:", len(data[0]["vector"]))
```

Data has 3 entities, each with fields: dict_keys(['id', 'vector', 'text', 'subject'])
Vector dim: 768

```
Data has 3 entities, each with fields: dict_keys(['id', 'vector', 'text', 'subject'])
Vector dim: 768
```

## Insert Data
Let's insert the data into the collection:
@@ -144,9 +144,9 @@ res = client.insert(collection_name="demo_collection", data=data)

print(res)
```

{'insert_count': 3, 'ids': [0, 1, 2], 'cost': 0}

```
{'insert_count': 3, 'ids': [0, 1, 2], 'cost': 0}
```

## Semantic Search
Now we can do semantic search by representing the query text as a vector and conducting a vector similarity search on Milvus.
@@ -169,9 +169,9 @@ res = client.search(

print(res)
```

data: ["[{'id': 2, 'distance': 0.5859944820404053, 'entity': {'text': 'Born in Maida Vale, London, Turing was raised in southern England.', 'subject': 'history'}}, {'id': 1, 'distance': 0.5118255615234375, 'entity': {'text': 'Alan Turing was the first person to conduct substantial research in AI.', 'subject': 'history'}}]"] , extra_info: {'cost': 0}

```
data: ["[{'id': 2, 'distance': 0.5859944820404053, 'entity': {'text': 'Born in Maida Vale, London, Turing was raised in southern England.', 'subject': 'history'}}, {'id': 1, 'distance': 0.5118255615234375, 'entity': {'text': 'Alan Turing was the first person to conduct substantial research in AI.', 'subject': 'history'}}]"] , extra_info: {'cost': 0}
```

The output is a list of results, one per vector search query. Each of these contains a list of hits, where each hit includes the entity's primary key, its distance to the query vector, and the entity details for the specified `output_fields`.
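
For instance, a short loop over `res` (whose structure matches the output shown above) pulls out those fields:

```python
# res holds one list of hits per query vector; each hit is a plain dict.
for query_hits in res:
    for hit in query_hits:
        print(hit["id"], hit["distance"], hit["entity"]["text"])
```
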

Expand Down Expand Up @@ -205,9 +205,9 @@ res = client.search(

print(res)
```

data: ["[{'id': 4, 'distance': 0.27030569314956665, 'entity': {'text': 'Computational synthesis with AI algorithms predicts molecular properties.', 'subject': 'biology'}}, {'id': 3, 'distance': 0.16425910592079163, 'entity': {'text': 'Machine learning has been used for drug design.', 'subject': 'biology'}}]"] , extra_info: {'cost': 0}

```
data: ["[{'id': 4, 'distance': 0.27030569314956665, 'entity': {'text': 'Computational synthesis with AI algorithms predicts molecular properties.', 'subject': 'biology'}}, {'id': 3, 'distance': 0.16425910592079163, 'entity': {'text': 'Machine learning has been used for drug design.', 'subject': 'biology'}}]"] , extra_info: {'cost': 0}
```

By default, scalar fields are not indexed. If you need to perform metadata-filtered search over a large dataset, consider using a fixed schema and turning on the [index](https://milvus.io/docs/scalar_index.md) to improve search performance.
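
A minimal sketch of such a fixed-schema setup with a scalar index on `subject` (method names follow pymilvus 2.4's `MilvusClient`; the collection name and field sizes are illustrative):

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient("milvus_demo.db")

# Fixed schema: declare the scalar field explicitly instead of relying on dynamic fields.
schema = client.create_schema(auto_id=False, enable_dynamic_field=False)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=768)
schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="subject", datatype=DataType.VARCHAR, max_length=64)

# Index the vector field as usual and add an inverted index on the scalar field.
index_params = client.prepare_index_params()
index_params.add_index(field_name="vector", index_type="AUTOINDEX", metric_type="COSINE")
index_params.add_index(field_name="subject", index_type="INVERTED")

client.create_collection(
    collection_name="demo_indexed_collection",
    schema=schema,
    index_params=index_params,
)
```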

@@ -256,10 +256,10 @@ res = client.delete(

print(res)
```

[0, 2]
[3, 4, 5]

```
[0, 2]
[3, 4, 5]
```
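
The two ID lists above presumably come from two delete calls in the collapsed block, one by primary key and one by filter expression. A sketch under that assumption (parameter names follow pymilvus 2.4's `MilvusClient`; the filter value mirrors the earlier filtered-search example):

```python
# Delete by primary keys, then by a filter expression on a scalar field.
res = client.delete(collection_name="demo_collection", ids=[0, 2])
print(res)

res = client.delete(collection_name="demo_collection", filter="subject == 'biology'")
print(res)
```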

## Load Existing Data
Since Milvus Lite stores all data in a local file, you can load it back even after the program terminates by creating a `MilvusClient` with the existing file. For example, this will recover the collections from the "milvus_demo.db" file and continue to write data into it.
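
A minimal sketch of that recovery step (the actual block is collapsed in this diff):

```python
from pymilvus import MilvusClient

# Re-opening the same local file restores the collections written earlier.
client = MilvusClient("milvus_demo.db")
print(client.list_collections())
```
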
@@ -174,7 +174,7 @@ In addition to a single GPU device, you can also assign multiple GPU devices to
<ul>
<li>The release name should only contain letters, numbers and dashes. Dots are not allowed in the release name.</li>
<li>The default command line installs the cluster version of Milvus when installing Milvus with Helm. Further settings are needed to install Milvus standalone.</li>
<li>According to the <a href="https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-25">deprecated API migration guide of Kubernetes</a>, the <b>policy/v1beta1</b> API version of PodDisruptionBudget is no longer served as of v1.25. You are advised to migrate manifests and API clients to use the <b>policy/v1</b> API version instead. <br>As a workaround for users who still use the <b>policy/v1beta1</b> API version of PodDisruptionBudget on Kubernetes v1.25 and later, you can instead run the following command to install Milvus:<br>
<li>According to the <a href="https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-25">deprecated API migration guide of Kubernetes</a>, the <b>policy/v1beta1</b> API version of PodDisruptionBudget is no longer served as of v1.25. You are advised to migrate manifests and API clients to use the <b>policy/v1</b> API version instead. <br/>As a workaround for users who still use the <b>policy/v1beta1</b> API version of PodDisruptionBudget on Kubernetes v1.25 and later, you can instead run the following command to install Milvus:<br/>
<code>helm install my-release milvus/milvus --set pulsar.bookkeeper.pdb.usePolicy=false,pulsar.broker.pdb.usePolicy=false,pulsar.proxy.pdb.usePolicy=false,pulsar.zookeeper.pdb.usePolicy=false</code></li>
<li>See <a href="https://artifacthub.io/packages/helm/milvus/milvus">Milvus Helm Chart</a> and <a href="https://helm.sh/docs/">Helm</a> for more information.</li>
</ul>
