IBM · touma-I · Nov 27, 2024 · Nov 11, 2024 · Nov 13, 2024 · Nov 13, 2024
diff --git a/transforms/language/doc_quality/python/README.md b/transforms/language/doc_quality/python/README.md
@@ -1,13 +1,25 @@
 # Document Quality Transform 
+
 Please see the set of
 [transform project conventions](../../../README.md#transform-project-conventions)
 for details on general project conventions, transform configuration,
 testing and IDE set up.
 
-## Summary 
-This transform will calculate and annotate several metrics related to document, which are usuful to see the quality of document. 
+## Contributors
+
+- Daiki Tsuzuku ([email protected])
+
+## Description 
+This transform calculates and annotates several metrics that are useful for assessing the quality of a document.
+The document quality transform operates on text documents only.
+
+### Input 
 
-In this transform, following metrics will be included:
+| input column name | data type | description |
+|-|-|-|
+| the one specified in _doc_content_column_ configuration | string | text whose quality will be calculated by this transform |
+
+### Output columns annotated by this transform
 
 | output column name | data type | description | supported language |
 |-|-|-|-|
@@ -27,7 +39,7 @@ In this transform, following metrics will be included:
 
 You can see more detailed backgrounds of some columns in [Deepmind's Gopher paper](https://arxiv.org/pdf/2112.11446.pdf)
 
-## Configuration and command line Options
+## Configuration
 
 The set of dictionary keys holding [DocQualityTransform](src/doc_quality_transform.py) 
 configuration for values are as follows:
@@ -36,13 +48,19 @@ configuration for values are as follows:
 * _doc_content_column_ - specifies column name that contains document text. By default, "contents" is used.
 * _bad_word_filepath_ - specifies a path to bad word file: local folder (file or directory) that points to bad word file. You don't have to set this parameter if you don't need to set bad words.
 
-## Running
+Example:
+```python
+{
+    text_lang_key: "en",
+    doc_content_column_key: "contents",
+    bad_word_filepath_key: os.path.join(basedir, "ldnoobw", "en"),
+}
+```
+
+## Usage
 
 ### Launched Command Line Options 
-When running the transform with the Ray launcher (i.e. TransformLauncher),
-the following command line arguments are available in addition to 
-the options provided by 
-the [python launcher](../../../../data-processing-lib/doc/python-launcher-options.md).
+The following command line arguments are available:
 ```
   --docq_text_lang DOCQ_TEXT_LANG   language used in the text content. By default, "en" is used.
   --docq_doc_content_column DOCQ_DOC_CONTENT_COLUMN   column name that contain document text. By default, "contents" is used.
@@ -70,14 +88,37 @@ ls output
 ```
 To see results of the transform.
 
+### Code example
+
+TBD (link to the notebook will be provided)
 
 ### Transforming data using the transform image
 
 To use the transform image to transform your data, please refer to the 
 [running images quickstart](../../../../doc/quick-start/run-transform-image.md),
 substituting the name of this transform image and runtime as appropriate.
 
+## Testing
+
+Testing follows [the testing strategy of data-processing-lib](../../../../data-processing-lib/doc/transform-testing.md).
+
+Currently we have:
+- [Unit test](test/test_doc_quality_python.py)
+- [Integration test](test/test_doc_quality.py)
+
+
+## Further Resources
+
+- For those who want to learn about the C4 heuristic rules
+  - https://arxiv.org/pdf/1910.10683.pdf
+- For those who want to learn about the Gopher statistics
+  - https://arxiv.org/pdf/2112.11446.pdf
+- For those who want to see the source of the bad words used by default
+  - https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
+
+
+## Considerations
 
-## Troubleshooting guide
+### Troubleshooting guide
 
 For M1 Mac user, if you see following error during make command, `error: command '/usr/bin/clang' failed with exit code 1`, you may better follow [this step](https://freeman.vc/notes/installing-fasttext-on-an-m1-mac)
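
The configuration example added under `## Configuration` uses `os`, `basedir`, and three key constants without defining them. A minimal self-contained sketch of the same dictionary might look like the following. The string values assigned to the key constants are assumptions made for illustration (the real constants live in `src/doc_quality_transform.py` and may differ), and `build_docq_config` is a hypothetical helper, not part of the transform.

```python
import os

# Assumed stand-ins for the key constants used in the README example.
# The real values are defined in src/doc_quality_transform.py and may differ.
text_lang_key = "text_lang"
doc_content_column_key = "doc_content_column"
bad_word_filepath_key = "bad_word_filepath"


def build_docq_config(basedir: str) -> dict:
    """Hypothetical helper: return a dictionary shaped like the README example."""
    return {
        # Language of the text content; "en" is the documented default.
        text_lang_key: "en",
        # Column holding the document text; "contents" is the documented default.
        doc_content_column_key: "contents",
        # Path to the bad word file, laid out as in the README example.
        bad_word_filepath_key: os.path.join(basedir, "ldnoobw", "en"),
    }


# basedir is a placeholder here; point it at wherever the bad word lists live.
config = build_docq_config(basedir=os.getcwd())
```

Because the README says `bad_word_filepath` is optional, a caller that does not need bad word detection could presumably drop that key from the dictionary.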