Commit

Merge branch 'scribe-org:main' into docker
mhmohona authored Nov 1, 2024
2 parents ea15720 + 0068350 commit dabe56a
Showing 491 changed files with 13,636 additions and 7,448 deletions.
32 changes: 32 additions & 0 deletions .github/ISSUE_TEMPLATE/documentation.yml
@@ -0,0 +1,32 @@
name: 📝 Documentation
description: Suggest improvements or updates to the documentation of Scribe-Data.
labels: ["documentation"]
projects: ["scribe-org/1"]
body:
  - type: checkboxes
    id: doc-enhancement
    attributes:
      label: Terms
      options:
        - label: I have searched all [open documentation issues](https://github.com/scribe-org/Scribe-Data/issues?q=is%3Aopen+is%3Aissue+label%3Adocumentation)
          required: true
        - label: I agree to follow Scribe-Data's [Code of Conduct](https://github.com/scribe-org/Scribe-Data/blob/main/.github/CODE_OF_CONDUCT.md)
          required: true
  - type: textarea
    attributes:
      label: Current Documentation
      placeholder: |
        Provide a brief description or link to the current documentation you want to enhance.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Suggested Enhancement
      placeholder: |
        Describe the improvements or changes you'd like to see in the documentation.
    validations:
      required: true
  - type: markdown
    attributes:
      value: |
        Thanks for helping improve our documentation!
1 change: 1 addition & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -7,6 +7,7 @@ Thank you for your pull request! 🚀
<!-- Please replace the empty checkboxes [] below with checked ones [x] accordingly. -->

- [] This pull request is on a [separate branch](https://docs.github.com/en/get-started/quickstart/github-flow) and not the main branch
- [] I have tested my code with the `pytest` command as directed in the [testing section of the contributing guide](https://github.com/scribe-org/Scribe-Data/blob/main/CONTRIBUTING.md#testing)

---

44 changes: 44 additions & 0 deletions .github/workflows/check_project_metadata.yaml
@@ -0,0 +1,44 @@
name: Check Project Metadata
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
    types: [opened, reopened, synchronize]

jobs:
  structure-check:
    strategy:
      fail-fast: false
      matrix:
        os:
          - ubuntu-latest
        python-version:
          - "3.9"

    runs-on: ${{ matrix.os }}

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Add project root to PYTHONPATH
        run: echo "PYTHONPATH=$(pwd)/src" >> $GITHUB_ENV

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run check_project_metadata.py
        working-directory: ./src/scribe_data/check
        run: python check_project_metadata.py

      - name: Post-run status
        if: failure()
        run: echo "Project metadata check failed. Please fix the reported errors."
44 changes: 44 additions & 0 deletions .github/workflows/check_project_structure.yaml
@@ -0,0 +1,44 @@
name: Check Project Structure
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
    types: [opened, reopened, synchronize]

jobs:
  structure-check:
    strategy:
      fail-fast: false
      matrix:
        os:
          - ubuntu-latest
        python-version:
          - "3.9"

    runs-on: ${{ matrix.os }}

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Add project root to PYTHONPATH
        run: echo "PYTHONPATH=$(pwd)/src" >> $GITHUB_ENV

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run check_project_structure.py
        working-directory: ./src/scribe_data/check
        run: python check_project_structure.py

      - name: Post-run status
        if: failure()
        run: echo "Project structure check failed. Please fix the reported errors."
46 changes: 46 additions & 0 deletions .github/workflows/check_query_forms.yaml
@@ -0,0 +1,46 @@
name: Check Query Forms
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
    types: [opened, reopened, synchronize]

jobs:
  format_check:
    strategy:
      fail-fast: false
      matrix:
        os:
          - ubuntu-latest
        python-version:
          - "3.9"

    runs-on: ${{ matrix.os }}

    name: Run Check Query Forms

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Add project root to PYTHONPATH
        run: echo "PYTHONPATH=$(pwd)/src" >> $GITHUB_ENV

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run check_query_forms.py
        working-directory: ./src/scribe_data/check
        run: python check_query_forms.py

      - name: Post-run status
        if: failure()
        run: echo "Project SPARQL query forms check failed. Please fix the reported errors."
46 changes: 46 additions & 0 deletions .github/workflows/check_query_identifiers.yaml
@@ -0,0 +1,46 @@
name: Check Query Identifiers
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
    types: [opened, reopened, synchronize]

jobs:
  format_check:
    strategy:
      fail-fast: false
      matrix:
        os:
          - ubuntu-latest
        python-version:
          - "3.9"

    runs-on: ${{ matrix.os }}

    name: Run Check Query Identifiers

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Add project root to PYTHONPATH
        run: echo "PYTHONPATH=$(pwd)/src" >> $GITHUB_ENV

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run check_query_identifiers.py
        working-directory: ./src/scribe_data/check
        run: python check_query_identifiers.py

      - name: Post-run status
        if: failure()
        run: echo "Project SPARQL queries check failed. Please fix the reported errors."
2 changes: 1 addition & 1 deletion .github/workflows/pr_ci.yaml
@@ -1,4 +1,4 @@
-name: pr_ci
+name: CI
on:
push:
branches: [main]
2 changes: 1 addition & 1 deletion .github/workflows/pr_maintainer_checklist.yaml
@@ -1,4 +1,4 @@
-name: pr_maintainer_checklist
+name: PR Maintainer Checklist
on:
pull_request_target:
branches:
6 changes: 3 additions & 3 deletions CHANGELOG.md
@@ -16,9 +16,9 @@ Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/).

- Scribe-Data is now a fully functional CLI.
- Querying Wikidata lexicographical data can be done via the `--query` command ([#159](https://github.com/scribe-org/Scribe-Data/issues/159)).
-- The output type of queries can be in JSON, CSV, TSV and SQLite, with conversions output types also being possible ([#145](https://github.com/scribe-org/Scribe-Data/issues/145), [#146](https://github.com/scribe-org/Scribe-Data/issues/146))
-- Output paths can be set for query results ([#144](https://github.com/scribe-org/Scribe-Data/issues/144)).
-- The version of the CLI can be printed to the command line and the CLI can further be used to upgrade itself ([#186](https://github.com/scribe-org/Scribe-Data/issues/186), [#157 ](https://github.com/scribe-org/Scribe-Data/issues/157)).
+- The output type of queries can be in JSON, CSV, TSV and SQLite, with conversions output types also being possible ([#145](https://github.com/scribe-org/Scribe-Data/issues/145), [#146](https://github.com/scribe-org/Scribe-Data/issues/146))
+- Output paths can be set for query results ([#144](https://github.com/scribe-org/Scribe-Data/issues/144)).
+- The version of the CLI can be printed to the command line and the CLI can further be used to upgrade itself ([#186](https://github.com/scribe-org/Scribe-Data/issues/186), [#157 ](https://github.com/scribe-org/Scribe-Data/issues/157)).
- Total Wikidata lexemes for languages and data types can be derived with the `--total` command ([#147](https://github.com/scribe-org/Scribe-Data/issues/147)).
- Commands can be used via an interactive mode with the `--interactive` command ([#158](https://github.com/scribe-org/Scribe-Data/issues/158)).
- Articles are removed from machine translations so they're more directly useful in Scribe applications ([#96](https://github.com/scribe-org/Scribe-Data/issues/96)).
17 changes: 17 additions & 0 deletions CONTRIBUTING.md
@@ -15,6 +15,7 @@ If you have questions or would like to communicate with the team, please [join u
- [First steps as a contributor](#first-steps)
- [Learning the tech stack](#learning-the-tech)
- [Development environment](#dev-env)
- [Testing](#testing)
- [Issues and projects](#issues-projects)
- [Bug reports](#bug-reports)
- [Feature requests](#feature-requests)
@@ -162,9 +163,25 @@ pip install . # or pip install scribe-data
python setup.py egg_info
```

Note that you may need to run this command every time you make any change to the code to have them be reflected in the development Scribe-Data:

```bash
pip install -e .
```

> [!NOTE]
> Feel free to contact the team in the [Data room on Matrix](https://matrix.to/#/#ScribeData:matrix.org) if you're having problems getting your environment setup!
<a id="testing"></a>

## Testing [``](#contents)

In addition to the [pre-commit](https://pre-commit.com/) hooks that are set up during the [development environment section](#dev-env), Scribe-Data also includes a testing suite that should be ran before all pull requests and subsequent commits. Please run the following in the project root:

```bash
pytest
```
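
As a rough illustration of what `pytest` collects when run from the project root, here is a minimal sketch. The file name and the `normalize_language_name` helper are hypothetical and are not part of the Scribe-Data test suite; they only show the `test_` naming convention that pytest discovers by default.

```python
# test_example.py (hypothetical file name; pytest collects files named test_*.py by default)


def normalize_language_name(name: str) -> str:
    """Hypothetical helper, not part of Scribe-Data: trim whitespace and lowercase a name."""
    return name.strip().lower()


def test_normalize_language_name():
    # pytest treats any function whose name starts with test_ as a test case.
    assert normalize_language_name("  German ") == "german"
```

Running `pytest` in the directory containing this file would pick up and run `test_normalize_language_name`.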

<a id="issues-projects"></a>

## Issues and projects [``](#contents)
4 changes: 2 additions & 2 deletions README.md
@@ -41,7 +41,7 @@ Check out Scribe's [architecture diagrams](https://github.com/scribe-org/Organiz

The CLI commands defined within [scribe_data/cli](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/cli) and the notebooks within the various [scribe_data](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data) directories are used to update all data for [Scribe-iOS](https://github.com/scribe-org/Scribe-iOS), with this functionality later being expanded to update [Scribe-Android](https://github.com/scribe-org/Scribe-Android) and [Scribe-Desktop](https://github.com/scribe-org/Scribe-Desktop) once they're active.

-The main data update process in triggers [language based SPARQL queries](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/language_data_extraction) to query language data from [Wikidata](https://www.wikidata.org/) using [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper) as a URI. The autosuggestion process derives popular words from [Wikipedia](https://www.wikipedia.org/) as well as those words that normally follow them for an effective baseline feature until natural language processing methods are employed. Functions to generate autosuggestions are ran in [gen_autosuggestions.ipynb](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/wikipedia/gen_autosuggestions.ipynb). Emojis are further sourced from [Unicode CLDR](https://github.com/unicode-org/cldr), with this process being ran via the `scribe-data get -lang LANGUAGE -dt emoji-keywords` command.
+The main data update process in triggers [language based SPARQL queries](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikidata/language_data_extraction) to query language data from [Wikidata](https://www.wikidata.org/) using [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper) as a URI. The autosuggestion process derives popular words from [Wikipedia](https://www.wikipedia.org/) as well as those words that normally follow them for an effective baseline feature until natural language processing methods are employed. Functions to generate autosuggestions are ran in [gen_autosuggestions.ipynb](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/wikipedia/gen_autosuggestions.ipynb). Emojis are further sourced from [Unicode CLDR](https://github.com/unicode-org/cldr), with this process being ran via the `scribe-data get -lang LANGUAGE -dt emoji-keywords` command.

<a id="cli-usage"></a>

@@ -197,7 +197,7 @@ See the [contribution guidelines](https://github.com/scribe-org/Scribe-Data/blob

# Supported Languages [``](#contents)

-Scribe's goal is functional, feature-rich keyboards and interfaces for all languages. Check the [language_data_extraction](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/language_data_extraction) directory for queries for currently supported languages and those that have substantial data on [Wikidata](https://www.wikidata.org/).
+Scribe's goal is functional, feature-rich keyboards and interfaces for all languages. Check the [language_data_extraction](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikidata/language_data_extraction) directory for queries for currently supported languages and those that have substantial data on [Wikidata](https://www.wikidata.org/).

The following table shows the supported languages and the amount of data available for each on [Wikidata](https://www.wikidata.org/) and via [Unicode CLDR](https://github.com/unicode-org/cldr) for emojis:

2 changes: 1 addition & 1 deletion docs/source/_static/CONTRIBUTING.rst
@@ -16,7 +16,7 @@ Contents
- `First steps as a contributor <#first-steps-as-a-contributor>`__
- `Learning the tech stack <#learning-the-tech-stack>`__
- `Development environment <#development-environment>`__
-- `Issues and projects <#issues-projects>`__
+- `Issues and projects <#issues-and-projects>`__
- `Bug reports <#bug-reports>`__
- `Feature requests <#feature-requests>`__
- `Pull requests <#pull-requests>`__
11 changes: 6 additions & 5 deletions docs/source/conf.py
@@ -15,8 +15,7 @@

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
-#
-import sphinx_rtd_theme


sys.path.insert(0, os.path.abspath("../../src"))

@@ -36,7 +35,7 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
-"m2r2",
+# "m2r2",
"sphinx.ext.autodoc",
"numpydoc",
"sphinx.ext.viewcode",
@@ -63,7 +62,7 @@
"pytest-cov",
"ruff",
"SPARQLWrapper",
-"tqdm"
+"tqdm",
]

# Add any paths that contain templates here, relative to this directory.
@@ -80,6 +79,7 @@
# source_suffix = ['.rst', '.md']
source_suffix = ".rst"


# The master toctree document.
master_doc = "index"

@@ -91,7 +91,8 @@

html_theme = "sphinx_rtd_theme"

-html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
+# html_theme_path = [sphinx_rtd_theme]
+# html_theme_path = []

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
4 changes: 2 additions & 2 deletions docs/source/notes.rst
@@ -1,9 +1,9 @@
-.. mdinclude:: _static/CONTRIBUTING.rst
+.. include:: _static/CONTRIBUTING.rst

License
=======

.. literalinclude:: ../../LICENSE.txt
:language: text

-.. mdinclude:: ../../CHANGELOG.md
+.. include:: ../../CHANGELOG.md