Add script to generate roadmap from GitHub issues (#780)

Add a Python script to generate a roadmap document from GitHub project board issues.

* Add `scripts/generate_roadmap.py` to fetch issues from the GitHub project board, extract the required information, and generate `content/roadmap.md`.
* Update `README.md` with instructions on how to run the script.
* Regenerate the roadmap.

Resolves #779

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/jaegertracing/documentation/pull/780?shareId=1ee122f8-54db-4ea2-9edd-5c0af96dc246).

Signed-off-by: Yuri Shkuro <[email protected]>
jaegertracing · Nov 15, 2024 · 1c9fbb9
1 parent 0fc07b0

Showing 3 changed files with 177 additions and 43 deletions.
diff --git a/README.md b/README.md
@@ -77,6 +77,16 @@ You can check internal links by running `make check-internal-links` and all link
 
 When new pages are added to the documentation, please add a corresponding entry to [themes/jaeger-docs/layouts/index.redirects](./themes/jaeger-docs/layouts/index.redirects).
 
+## Generating Roadmap page
+
+To regenerate the `content/roadmap.md` document, run the following script from the repository root (it writes to a path relative to the current directory):
+
+```bash
+python3 scripts/generate_roadmap.py
+```
+
+This script fetches issues from the [GitHub project board](https://github.com/orgs/jaegertracing/projects/4/views/1?layout=table), extracts the required information, and generates the roadmap document.
+
+The script needs a GitHub API token. Set the `GITHUB_TOKEN` environment variable before running it, or save the token in the `~/.github_token` file; the file is consulted only if the variable is unset or empty. Protect the file so that only you can read it: `chmod 0600 ~/.github_token`. Personal tokens can be created at https://github.com/settings/tokens/; a classic token needs at least the `read:project` scope to read the project board.
+
 ## License
 
 [Apache 2.0 License](./LICENSE).
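The README describes a two-step token lookup: the `GITHUB_TOKEN` environment variable first, then `~/.github_token`. The function below is a minimal sketch of that lookup as a standalone helper. `load_github_token` is a hypothetical name, and the permission check (rejecting a token file that group or other users can access, in line with the `chmod 0600` advice) is an added assumption that the committed script does not perform.

```python
import os
import stat
from pathlib import Path


def load_github_token(token_path="~/.github_token"):
    """Return a GitHub token from $GITHUB_TOKEN, falling back to a file.

    Hypothetical helper: same lookup order as the README describes.
    The permission check below is an extra safeguard, not part of
    scripts/generate_roadmap.py.
    """
    # The environment variable wins if it is set and non-empty.
    token = os.environ.get("GITHUB_TOKEN", "").strip()
    if token:
        return token

    path = Path(token_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"GITHUB_TOKEN is not set and {path} does not exist")

    # Refuse a token file with any group/other permission bits set.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} has mode {oct(mode)}; run 'chmod 0600 {path}' first")

    token = path.read_text().strip()
    if not token:
        raise ValueError(f"{path} is empty")
    return token
```

Raising exceptions instead of calling `exit(1)` directly leaves it to the caller to decide how to report a missing or unsafe token.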

diff --git a/content/roadmap.md b/content/roadmap.md
@@ -2,67 +2,63 @@
 title: Roadmap
 ---
 
-The following is only a selection of some of the major features we plan to implement, some of which are near term and some are longer term. We have tried to put these in rough priority as well as having a wishlist at the end. To get a more complete overview of planned features and current work, see the issue trackers for the various repositories, for example, the [Jaeger backend](https://github.com/jaegertracing/jaeger/issues/).
+The following is a summary of the major features we plan to implement.
+For more details, see the [Roadmap on GitHub](https://github.com/orgs/jaegertracing/projects/4/views/1?layout=table).
 
-## Support for ClickHouse as a native datasource
+## Implement Storage API v2 that operates on OTLP batches
 
-Backend storage support for [Clickhouse](https://github.com/ClickHouse/ClickHouse) which is an open-source column-oriented database for OLAP use cases. It is highly efficient and performant for high volumes of ingestion and search making it a good database for tracing and logging data specifically. It can also do aggregates very quickly which will come in handy for several features in Jaeger. [[[Feature]]: ClickHouse as a core storage backend](https://github.com/jaegertracing/jaeger/issues/4196)
+In the OTEL-based Jaeger-v2, we don't want to force OTLP trace data through an unnecessary transformation into Jaeger's current data model. Instead, we want to define a v2 storage API so that OTLP data can pass through the receiver-exporter pipeline without additional conversions, which improves efficiency.
 
-## Integration with OpenTelemetry collector
+For more information see the [issue description](https://github.com/jaegertracing/jaeger/issues/5079).
 
-[OpenTelemetry collector](https://opentelemetry.io/docs/collector/getting-started/) is a vendor-agnostic service for receiving, processing and exporting telemetry data. We have decided to rebuild the Jaeger backed components (agent, collector, ingester, all-in-one) on top of OpenTelemetry collector which has several benefits:
+## Helm Chart to support Jaeger-v2 
 
-* automatic compatibility with OpenTelemetry SDKs
-* forward compatibility with OpenTelemetry native data model
-* tail-based sampling
-* attribute processors
-* leverage a larger community
+Develop a comprehensive Helm chart for Jaeger v2 that allows for easy deployment and management of Jaeger v2 components in Kubernetes environments. This chart should provide flexibility in configuration, support various deployment scenarios, and integrate well with the new architecture of Jaeger v2.
 
-More can be found in the blog post [Jaeger embraces OpenTelemetry collector](https://medium.com/jaegertracing/jaeger-embraces-opentelemetry-collector-90a545cbc24), and the earlier post [Jaeger and OpenTelemetry](https://medium.com/jaegertracing/jaeger-and-opentelemetry-1846f701d9f2) that laid out the project strategy. This work will occur after the Collector and associated APIs are more stable, towards the end of 2021.
+For more information see the [issue description](https://github.com/jaegertracing/helm-charts/issues/610).
 
-The current progress can be tracked via [issues tagged as `area/otel`](https://github.com/jaegertracing/jaeger/issues?q=is%3Aissue+is%3Aopen+label%3Aarea%2Fotel).
+## Kubernetes Operator to support Jaeger-v2 
 
-# Wish List or Longer Term Goals
-## Data Pipeline
+Develop a new operator for [Jaeger-v2](https://github.com/jaegertracing/jaeger/issues/4843) that achieves feature parity with the v1 operator while introducing improvements and new capabilities. This new operator will leverage the [OpenTelemetry operator](https://github.com/open-telemetry/opentelemetry-operator) for Jaeger-v2 deployment while maintaining and enhancing the storage management features from the v1 operator.
 
-Post-collection data pipeline for trace aggregation and data mining based on Apache Flink. Some of this work has been done and can be found in [jaeger-analytics-flink/](https://github.com/jaegertracing/jaeger-analytics-flink)
+For more information see the [issue description](https://github.com/jaegertracing/jaeger-operator/issues/2717).
 
-## AI/ML platform for Jaeger
+## ClickHouse as a core storage backend
+
+Build first-class support for [ClickHouse](https://github.com/ClickHouse/ClickHouse) as an official Jaeger backend. ClickHouse is an open-source column-oriented database for OLAP use cases. It is highly efficient for high-volume ingestion and search, which makes it a good fit for tracing and logging data. It can also compute aggregates very quickly, which will be useful for several features in Jaeger.
+
+Benefits to users:
+
+* Efficient backend
+* Powerful search
+* Analytics capabilities, e.g., the possibility of serving the APM functionality (the Monitoring tab in Jaeger) directly from ClickHouse
 
-* Community/SIG for doing ML/AI with tracing/telemetry data.
-* ML/AI integration with Jaeger to make it easy for data scientists write and evaluate models
-  (e.g Jupyter notebooks).
-* A registry of models/post-processing pipelines which derive useful information out of tracing data.
+For more information see the [issue description](https://github.com/jaegertracing/jaeger/issues/4196).
 
-See issue tracker for more info: [jaeger/issues/1639](https://github.com/jaegertracing/jaeger/issues/1639).
+## Renovate Streaming Support
 
-## Trace Quality Metrics
+Bring streaming analytics support directly into the Jaeger backend instead of requiring separate Spark/Flink data pipelines.
+
+For more information see the [issue description](https://github.com/jaegertracing/jaeger/issues/5910).
+
+## AI/ML platform for Jaeger
 
-When deploying a distributed tracing solution like Jaeger in large organizations
-that utilize many different technologies and programming languages,
-there are always questions about how much of the architecture is integrated
-with tracing, what is the quality of the instrumentation, are there microservices
-that are using stale versions of instrumentation libraries, etc.
+At the moment, doing ML/AI analysis with Jaeger is hard. There is no direct integration with ML/AI platforms, and we do not know much about which models we could build.
 
-Trace Quality engine ([jaeger/issues/367](https://github.com/jaegertracing/jaeger/issues/367))
-runs analysis on all traces collected in the backend, inspects them for known completeness
-and quality problems, and provides summary reports to service owners with suggestions on
-improving the quality metrics and links to sample traces that exhibit the issues.
+* Create a community/SIG for doing ML/AI with tracing/telemetry data.
+* Build ML/AI integration with Jaeger to make it easy for data scientists to write and evaluate models (e.g., Jupyter notebooks).
+* Create a registry of models/post-processing pipelines that derive useful information from tracing data.
 
-## Dynamic Configuration
+For more information see the [issue description](https://github.com/jaegertracing/jaeger/issues/1639).
 
-We need a dynamic configuration solution ([jaeger/issues/355](https://github.com/jaegertracing/jaeger/issues/355))
-that comes in handy in various scenarios:
+## Dynamic configuration support
 
-  * Blacklisting services,
-  * Overriding sampling probabilities,
-  * Controlling server-side downsampling rate,
-  * Black/whitelisting services for adaptive sampling,
+We need a dynamic configuration solution that comes in handy in various scenarios:
+  * blacklisting services
+  * overriding sampling probabilities
+  * controlling server-side downsampling rate
+  * black/whitelisting services for adaptive sampling
   * etc.
 
-## Ideation
+For more information see the [issue description](https://github.com/jaegertracing/jaeger/issues/355).
 
-* Multi-Tenancy ([mailgroup thread](https://groups.google.com/forum/#!topic/jaeger-tracing/PcxftflO4_o))
-* Cloud and Multi-DC strategy
-* Flagging of anomalous traces
-* Alerting capabilities to complement operational use cases
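Each `##` section in the regenerated roadmap above comes from one project issue: the script uses the text under the issue's `## Summary` heading, or the whole issue body if there is no such heading. The committed `extract_summary` ends the summary at the next `##` found anywhere after the heading, so a `### ...` subheading (which contains `##`) cuts the summary short. A hypothetical line-based variant that stops only at level-1 or level-2 headings might look like this; it is a sketch, not the code in this commit.

```python
import re

# A level-1 or level-2 heading such as "# Title" or "## Details".
# "### Sub" does not match, so subsections stay inside the summary.
_TOP_HEADING = re.compile(r"^#{1,2}\s")


def extract_summary(body):
    """Return the text under a '## Summary' heading, or None if absent/empty."""
    if not body:
        return None
    lines = body.splitlines()
    # Find the Summary heading; headings only count at the start of a line.
    for start, line in enumerate(lines):
        if line.strip() == "## Summary":
            break
    else:
        return None  # no Summary section

    collected = []
    for line in lines[start + 1:]:
        if _TOP_HEADING.match(line):
            break  # next top-level section begins
        collected.append(line)
    summary = "\n".join(collected).strip()
    return summary or None
```

For the body `"## Summary\nFirst line.\n### Detail\nMore.\n## Next"`, the committed function returns only `First line.`, while this variant keeps the `### Detail` subsection.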
diff --git a/scripts/generate_roadmap.py b/scripts/generate_roadmap.py
@@ -0,0 +1,128 @@
+#!/usr/bin/env python3
+
+# Copyright (c) 2024 The Jaeger Authors.
+# SPDX-License-Identifier: Apache-2.0
+
+# This script generates the roadmap.md file from the issues in the "Roadmap" GitHub Project.
+
+import json
+import logging
+import os
+import urllib.request
+
+# Set up logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+# GitHub API token
+GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
+if not GITHUB_TOKEN:
+    try:
+        with open(os.path.expanduser("~/.github_token"), "r") as token_file:
+            GITHUB_TOKEN = token_file.read().strip()
+    except FileNotFoundError:
+        logger.error(
+            "GITHUB_TOKEN environment variable not set and ~/.github_token file not found"
+        )
+        exit(1)
+
+QUERY = """
+{
+  organization(login: "jaegertracing") {
+    projectV2(number: 4) {
+      id
+      title
+      items(first: 100) {
+        nodes {
+          id
+          type
+          content {
+            ... on Issue {
+              title
+              state
+              url
+              body
+            }
+          }
+        }
+      }
+    }
+  }
+}
+"""
+
+
+def fetch_issues():
+    url = "https://api.github.com/graphql"
+    headers = {
+        "Authorization": f"Bearer {GITHUB_TOKEN}",
+        "Content-Type": "application/json",
+    }
+    data = json.dumps({"query": QUERY}).encode("utf-8")
+    req = urllib.request.Request(url, data=data, headers=headers)
+    with urllib.request.urlopen(req) as response:
+        result = json.loads(response.read().decode("utf-8"))
+        issues = result["data"]["organization"]["projectV2"]["items"]["nodes"]
+        return [
+            {
+                "title": issue["content"]["title"],
+                "state": issue["content"]["state"],
+                "url": issue["content"]["url"],
+                "body": issue["content"]["body"],
+            }
+            for issue in issues
+            if issue["type"] == "ISSUE"
+        ]
+
+
+def extract_summary(body):
+    summary_index = body.find("## Summary")
+    if summary_index == -1:
+        logger.info("summary not found")
+        return None
+    summary_start = summary_index + len("## Summary")
+    next_section_index = body.find("##", summary_start)
+    if next_section_index == -1:
+        return body[summary_start:].strip()
+    return body[summary_start:next_section_index].strip()
+
+
+def generate_roadmap(issues):
+    roadmap_content = "---\n"
+    roadmap_content += "title: Roadmap\n"
+    roadmap_content += "---\n\n"
+    roadmap_content += (
+        "The following is a summary of the major features we plan to implement.\n"
+    )
+    roadmap_content += "For more details, see the [Roadmap on GitHub](https://github.com/orgs/jaegertracing/projects/4/views/1?layout=table).\n\n"
+    for issue in issues:
+        logger.info(issue["title"])
+        roadmap_content += f"## {issue['title']}\n\n"
+        summary = extract_summary(issue["body"])
+        if summary:
+            roadmap_content += f"{summary}\n\n"
+        else:
+            roadmap_content += f"{issue['body']}\n\n"
+        roadmap_content += (
+            f"For more information see the [issue description]({issue['url']}).\n\n"
+        )
+    return roadmap_content
+
+
+def save_roadmap(content):
+    with open("content/roadmap.md", "w") as f:
+        f.write(content)
+
+
+def main():
+    try:
+        issues = fetch_issues()
+        roadmap_content = generate_roadmap(issues)
+        save_roadmap(roadmap_content)
+        logger.info("Roadmap generated successfully")
+    except Exception as e:
+        logger.error(f"An error occurred: {e}")
+
+
+if __name__ == "__main__":
+    main()
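`fetch_issues` in the committed script requests `items(first: 100)` once, so a project with more than 100 items would be truncated without warning. It also indexes `result["data"]` directly. GitHub's GraphQL API can report failures (for example, a missing token scope) as an `errors` list in an HTTP 200 response, in which case that lookup fails with a `KeyError` or `TypeError` that `main()` logs before returning normally, so the process exits with status 0. The sketch below is one possible variant, not part of this commit: it pages through the connection with the GraphQL `pageInfo { hasNextPage endCursor }` / `after:` cursor pattern and raises on an `errors` list. The names `PAGED_QUERY`, `post_graphql`, and `fetch_all_issues` are hypothetical; the query fields mirror the committed `QUERY`.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

# Same fields as QUERY in generate_roadmap.py, plus a cursor variable and
# pageInfo so that projects with more than 100 items are fetched completely.
PAGED_QUERY = """
query($cursor: String) {
  organization(login: "jaegertracing") {
    projectV2(number: 4) {
      items(first: 100, after: $cursor) {
        pageInfo { hasNextPage endCursor }
        nodes {
          type
          content { ... on Issue { title state url body } }
        }
      }
    }
  }
}
"""


def post_graphql(token, query, variables):
    """POST one GraphQL request and return its 'data', raising on API errors."""
    payload = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    req = urllib.request.Request(
        GRAPHQL_URL,
        data=payload,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as response:
        result = json.loads(response.read().decode("utf-8"))
    # GraphQL errors may arrive with HTTP 200, so check the body explicitly.
    if result.get("errors"):
        raise RuntimeError(f"GraphQL errors: {result['errors']}")
    return result["data"]


def fetch_all_issues(token, post=post_graphql):
    """Collect issue items across all pages of the project board."""
    issues, cursor = [], None
    while True:
        data = post(token, PAGED_QUERY, {"cursor": cursor})
        items = data["organization"]["projectV2"]["items"]
        for node in items["nodes"]:
            content = node.get("content")
            # Skip draft issues, pull requests, and items with no visible content.
            if node["type"] != "ISSUE" or not content:
                continue
            issues.append({k: content[k] for k in ("title", "state", "url", "body")})
        if not items["pageInfo"]["hasNextPage"]:
            return issues
        cursor = items["pageInfo"]["endCursor"]
```

The `post` parameter exists only so the paging loop can be exercised with a fake transport. A `main()` built on this could call `sys.exit(1)` in its `except` branch so that CI notices a failed regeneration.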