Move docs from the operator repo #8683

docs/bucket-replication.md (new file, +90 lines)

[NooBaa Core](../README.md) /

# Bucket Replication
Bucket replication is a NooBaa feature that allows a user to set a replication policy for all or some objects. The goal of replication policies is simple - to define a target bucket for objects to be copied to.

To utilize bucket replication, first decide on a source bucket and a target bucket. The source bucket contains the objects that should be replicated, and the target bucket will contain the replicated objects. The replication policy is set on the source bucket, and it defines the target bucket(s) and the rules for replication.

In general, a replication policy is a JSON-compliant string that defines an array containing at least one rule -
- Each rule is an object containing a `rule_id`, a `destination_bucket`, and an optional `filter` key that contains a `prefix` field.
- When a filter with a prefix is provided, only object keys that match the prefix will be replicated (see the example below).
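
For instance, a minimal policy with two rules might look like the following, written in the `{"rules": [...]}` form used by the JSON-file examples later in this document (the bucket names here are illustrative):
```json
{
  "rules": [
    { "rule_id": "rule-a", "destination_bucket": "target-bucket-one" },
    { "rule_id": "rule-b", "destination_bucket": "target-bucket-two", "filter": { "prefix": "images/" } }
  ]
}
```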

Behind the scenes, bucket replication essentially works by comparing object lists. NooBaa lists the objects in the source and target buckets, checks which objects are missing from the target bucket, and then copies the missing objects from the source to the target (while adhering to any provided rules).
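
The following is a conceptual sketch of that comparison loop, not NooBaa's actual implementation - the `listKeys` and `copyObject` helpers are hypothetical stand-ins for the underlying S3 list and copy calls:

```typescript
// Conceptual sketch of diff-based replication; the helpers below are hypothetical.
type ListKeys = (bucket: string) => Promise<Set<string>>;
type CopyObject = (src: string, dst: string, key: string) => Promise<void>;

async function replicateRule(
  source: string,
  rule: { destination_bucket: string; filter?: { prefix: string } },
  listKeys: ListKeys,
  copyObject: CopyObject,
): Promise<void> {
  const sourceKeys = await listKeys(source);
  const targetKeys = await listKeys(rule.destination_bucket);

  for (const key of sourceKeys) {
    // Apply the rule's optional prefix filter.
    if (rule.filter && !key.startsWith(rule.filter.prefix)) continue;
    // Copy only objects that are missing from the target bucket.
    if (!targetKeys.has(key)) {
      await copyObject(source, rule.destination_bucket, key);
    }
  }
}
```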

It is possible to accelerate replication by utilizing logs - at the time of writing this document, AWS S3 server access logging or Azure Monitor. This mechanism allows NooBaa to copy only objects that have been created or modified since the feature was turned on, while the rest replicate in the background. This allows users to get up to speed with recent objects, while the classic replication mechanism catches up with the rest.

## Bucket Class Replication
Bucket replication policies can also be applied to bucketclasses. In those cases, the policy will automatically be 'inherited' by all bucket claims that utilize the bucketclass in the future.

## Replication Policy Parameters
As stated above, a replication policy is a JSON-compliant array of rules (examples are provided at the bottom of this section)
- Each rule is an object that contains the following keys:
  - `rule_id` - a unique ID which is used to identify the rule. The ID should consist of alphanumeric characters (a-zA-Z0-9) and is chosen by the user. Note that it is not possible to create several rules with the same ID.
  - `destination_bucket` - dictates the target NooBaa bucket that the objects will be copied to
  - (Optional) `{"filter": {"prefix": <>}}` - if the user wishes to filter the objects that are replicated, the value of this field can be set to a prefix string
  - (Optional, AWS-only, log-based optimization) `sync_deletions` - can be set to a boolean value to indicate whether deletions should be replicated (i.e. objects that were deleted on the source bucket will also be deleted on the target bucket)
  - (Optional, AWS-only, log-based optimization) `sync_versions` - can be set to a boolean value to indicate whether object versions should be replicated (i.e. if the source bucket has versioning enabled, the target bucket will also have versioning enabled, and all object versions will be synced)

In addition, when the bucketclass is backed by namespacestores, each policy can be set to optimize replication by utilizing logs (configured and supplied by the user, currently only supports AWS S3 and Azure Blob):
- (Optional, only supported on namespace buckets) `log_replication_info` - an object that contains data related to log-based replication optimization -
  - (Necessary on Azure) `endpoint_type` - this field can be set to an appropriate endpoint type (currently, only AZURE is supported)
  - (Necessary on AWS) `{"logs_location": {"logs_bucket": <>}}` - this field should be set to the location of the AWS S3 server access logs

## Examples
There are two ways to apply a bucket/bucketclass replication policy:

The first is with the NooBaa CLI (this requires the policy to be saved as a separate JSON file whose path is passed to the CLI) - for example:
#### Namespace bucketclass creation via the NooBaa CLI with replication to first.bucket:
```shell
noobaa -n app-namespace bucketclass create namespace-bucketclass single bc --resource azure-blob-ns --replication-policy=/path/to/json-file.json
```
> `/path/to/json-file.json` is the path to a JSON file which defines the replication policy, e.g. -
```json
{"rules":[{ "rule_id": "rule-1", "destination_bucket": "first.bucket", "filter": {"prefix": "d"}} ]}
```

The second is by applying a YAML file containing the policy.
It's also possible to apply a replication policy to OBCs even after their creation (although the same thing is not possible with bucketclasses).
For OBCs, the policy needs to be provided under the `spec.additionalConfig.replicationPolicy` property. For example:

#### ObjectBucketClaim creation via YAML file with replication to first.bucket:
```yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-bucket-claim
  namespace: appnamespace
spec:
  generateBucketName: my-bucket
  storageClassName: noobaa.noobaa.io
  additionalConfig:
    replicationPolicy: '[{ "rule_id": "rule-2", "destination_bucket": "first.bucket", "filter": {"prefix": "bc"}}]'
```
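
Assuming the manifest above is saved as `obc.yaml` (an illustrative filename), it can be applied with the standard kubectl command:
```shell
kubectl apply -f obc.yaml
```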

For bucketclasses, the policy should be provided under `spec.replicationPolicy` - for example:
#### Bucketclass creation via YAML file with replication to first.bucket:
```yaml
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: bc
  namespace: app-namespace
spec:
  namespacePolicy:
    type: Single
    single:
      resource: azure-blob-ns
  replicationPolicy: '[{ "rule_id": "rule-1", "destination_bucket": "first.bucket", "filter": {"prefix": "ba"}}]'
```
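
Similarly, assuming the manifest is saved as `bucketclass.yaml`, it can be applied and then inspected to confirm that the policy was picked up:
```shell
kubectl apply -f bucketclass.yaml
kubectl -n app-namespace get bucketclass bc -o yaml
```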

A few more example policies:

#### AWS replication policy with a prefix filter:
`'{"rules":[{"rule_id":"aws-rule-1", "destination_bucket":"first.bucket", "filter": {"prefix": "a."}}]}'`

#### AWS replication policy with log optimization, deletion and version sync:
`'{"rules":[{"rule_id":"aws-rule-1", "sync_deletions": true, "sync_versions": true, "destination_bucket":"first.bucket"}], "log_replication_info": {"logs_location": {"logs_bucket": "logsarehere"}}}'`


#### Azure replication policy with log optimization:
`'{"rules":[{"rule_id":"azure-rule-1", "destination_bucket":"first.bucket"}], "log_replication_info": {"endpoint_type": "AZURE"}}'`

docs/bucket-types.md (new file, +45 lines)

[NooBaa Core](../README.md) /

# Bucket Types in NooBaa
NooBaa supports two types of buckets - most commonly referred to as `namespace` and `data`. While all NooBaa buckets are accessible via the S3 API and appear the same to the user, they have different uses and implementations under the hood.
This document will explain what each type means, as well as the differences between the two.

## Data Buckets
Data buckets are the 'classic' type of bucket in NooBaa. As part of its initialization, a classic NooBaa deployment creates several resources by default - a backingstore (`noobaa-default-backing-store`), a bucketclass (`noobaa-default-bucket-class`) and a bucket (`first.bucket`). When data is written to data buckets, it is first processed by NooBaa - the data is compressed, deduplicated, encrypted, and split into chunks. These chunks are then stored in the backingstore that the bucket is connected to.
The only way to access the objects on data buckets is through NooBaa - the chunks cannot be used or deciphered without the system.

## Namespace Buckets
Namespace buckets work differently than data buckets, since they try¹ not to apply any processing to the objects that are uploaded to them, and act more as a 'passthrough'. In cases where a non-S3 storage provider API is used, NooBaa takes care of bridging potential gaps between S3 and the provider's API - no user action is required. This also means that as long as NooBaa supports a certain provider's API, users can use the S3 API to manage it (while NooBaa takes care of the 'translation').

The objects can be accessed both from within NooBaa and from whichever storage provider hosts them.
Namespace buckets also support having multiple 'read' sources - for example, a single namespace bucket can show the objects from several Azure Blob containers, AWS S3 buckets, and Google Cloud Platform buckets, all at the same time.
Another feature of namespace buckets is caching, which can be configured to keep frequently accessed objects in the system's memory, effectively reducing access time as well as egress and access costs.

_1. In some cases, object metadata (such as tags) might have to be modified in order to comply with a cloud provider's limits_

## Supported Storage Providers by Bucket Type
| Service | Data Buckets | Namespace Buckets |
|---------------------------------------------------|:---:|:---:|
| Amazon Web Services S3 (AWS) | ✅ | ✅ |
| S3 Compatible Services | ✅ | ✅ |
| IBM Cloud Object Storage (COS) | ✅ | ✅ |
| Azure Blob Storage | ✅ | ✅ |
| Google Cloud Platform Object Storage (GCP) | ✅ | ❌ |
| Filesystem (NSFS) | ❌ | ✅ |
| Persistent Volume Pool (PV) | ✅ | ❌ |

For specific API calls (including non-S3 calls, such as IAM), you can check [AWS API Compatibility](design/AWS_API_Compatibility.md)

_Please note that the table above might be out of date, and newer versions might add support that is not reflected here - the best way to check whether a storage provider is supported is to run the NooBaa CLI tool (matching the NooBaa version installed on the cluster) with a command such as `noobaa namespacestore create` or `noobaa backingstore create` and see which providers are listed in the help message._
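
For example, the help output of the following commands lists the provider types supported by the installed version:
```shell
noobaa namespacestore create --help
noobaa backingstore create --help
```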

## Buckets in Relation to Bucketclasses and Stores
In order to connect NooBaa to a storage provider (regardless of whether it's a cloud provider or an on-premises storage system), a store must be created - either a [backingstore](https://github.com/noobaa/noobaa-operator/blob/master/doc/backing-store-crd.md), or a [namespacestore](https://github.com/noobaa/noobaa-operator/blob/master/doc/namespace-store-crd.md). A store signifies a connection to a storage provider - it requires credentials, and sometimes additional configuration such as a custom endpoint (when utilizing S3 compatible services that use the S3 API but aren't AWS - for example, Google's S3-compatible Cloud Storage URI is https://storage.googleapis.com).
The type of store chosen determines what type of buckets will be created on top of it -
- Namespacestores are used to create namespace buckets
- Backingstores are used to create data buckets

Once a store is created, it's necessary to create a [bucketclass](https://github.com/noobaa/noobaa-operator/blob/master/doc/bucket-class-crd.md) resource in order to utilize it. Bucketclasses allow users to define bucket policies relating to data placement and replication, and are used as a middle layer between buckets and stores.

After the bucketclass has been created, it's finally possible to create an [object bucket claim](https://github.com/noobaa/noobaa-operator/blob/master/doc/obc-provisioner.md), which is then reconciled by NooBaa and results in a bucket in the system. The created bucket can then be interacted with via the S3 API.
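
As a rough sketch of that flow using the NooBaa CLI (resource names and credentials are placeholders, and exact flags may differ between versions - consult `noobaa --help` for the authoritative syntax):
```shell
# Create a backingstore that points at an existing AWS S3 bucket (placeholder credentials).
noobaa backingstore create aws-s3 my-store --target-bucket my-aws-bucket --access-key <KEY> --secret-key <SECRET>

# Create a bucketclass that places data on that backingstore.
noobaa bucketclass create placement-bucketclass my-bucketclass --backingstores my-store

# Create an object bucket claim that uses the bucketclass; NooBaa reconciles it into a bucket.
noobaa obc create my-obc --bucketclass my-bucketclass
```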

For further reading on S3 API compatibility, see the [S3 Compatibility](s3-compatibility.md) document.

docs/s3-compatibility.md (new file, +12 lines)

[NooBaa Core](../README.md) /

# S3 Compatibility in NooBaa
S3 (also known as Simple Storage Service) is an object storage service provided by Amazon. However, S3 is often colloquially used to refer to the S3 API - the RESTful interface for interaction with AWS S3. Over time, the S3 API has reached a point where many consider it the de facto standard API for object storage, and is supported by many cloud providers and storage vendors - even ones like Microsoft Azure and Google Cloud Platform, which also offer their own APIs alongside S3 compatibility.

## API Compatibility
Due to the wide adoption of the S3 API, NooBaa has been designed to be S3 compatible and adherent. NooBaa buckets and objects can be managed with most S3 clients without a need for proprietary tools or workarounds. All a user needs in order to interact with NooBaa through an S3 client is the S3 endpoint of the NooBaa system, and a set of fitting credentials.
The endpoint can be found by checking the `routes` and `services` on a cluster, and the default admin credentials can be found in the same namespace that NooBaa was installed in, inside the `noobaa-admin` secret.
For further reference of supported API calls in NooBaa, you can check out [AWS API Compatibility](design/AWS_API_Compatibility.md)
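
For example, assuming NooBaa is installed in the `noobaa` namespace and the S3 endpoint is exposed at an illustrative URL, the admin credentials can be extracted and used with any standard S3 client:
```shell
# Read the default admin S3 credentials from the noobaa-admin secret.
AWS_ACCESS_KEY_ID=$(kubectl -n noobaa get secret noobaa-admin -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
AWS_SECRET_ACCESS_KEY=$(kubectl -n noobaa get secret noobaa-admin -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY

# Point an S3 client at the NooBaa S3 endpoint (taken from the s3 route/service).
aws s3 ls --endpoint-url https://s3-noobaa.apps.example.com
```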

## Utilization of S3 Compatible Storage Services
NooBaa can also be used as a gateway to other storage services that are S3 compatible. This means that NooBaa can be used to store and manage data in certain storage services (as long as they provide an S3 compatible API), even if they are not natively supported by the product. This is done by creating a [backingstore](https://github.com/noobaa/noobaa-operator/blob/master/doc/backing-store-crd.md) or [namespacestore](https://github.com/noobaa/noobaa-operator/blob/master/doc/namespace-store-crd.md) of type `s3-compatible` and providing the appropriate endpoint.
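
A minimal sketch using the NooBaa CLI (the endpoint, bucket name, and credentials are placeholders, and exact flags may vary between versions):
```shell
noobaa namespacestore create s3-compatible my-s3-compatible-store \
  --endpoint https://s3.example-provider.com \
  --target-bucket existing-bucket \
  --access-key <KEY> --secret-key <SECRET>
```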