Add Job Similarity Visualization Feature with New API Routes and Page #167
Warning: There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.
🔧 eslint
apps/registry/app/api/job-similarity/route.js
Oops! Something went wrong! :(
ESLint: 8.55.0
ESLint couldn't find the config "next" to extend from. Please check that the name of the config is correct.
The config "next" was referenced from the config file in "/packages/eslint-config-custom/index.js".
If you still have problems, please stop by https://eslint.org/chat/help to chat with the team.
Walkthrough
This pull request introduces a comprehensive job similarity visualization feature in the registry application. It adds new API routes for fetching job and resume similarity data, creates a new
Changes
Sequence Diagram
sequenceDiagram
participant Client
participant JobSimilarityAPI
participant Supabase
Client->>JobSimilarityAPI: Request job similarity data
JobSimilarityAPI->>Supabase: Query job records
Supabase-->>JobSimilarityAPI: Return job data
JobSimilarityAPI->>JobSimilarityAPI: Process and parse job embeddings
JobSimilarityAPI-->>Client: Return processed job similarity data
Actionable comments posted: 3
🧹 Nitpick comments (6)
apps/registry/app/api/similarity/route.js (2)
4-5: Consider placing the Supabase URL in an environment variable.
Hardcoding the full Supabase URL may be acceptable, but it ties configuration to the code. Moving it to an environment variable would improve maintainability and keep configuration clearly separate from code.
41-64: Potential performance concern with parsing large embeddings.
When users request a large 'limit' of resumes, the array of embeddings could become quite large. Parsing and returning every embedding for each record might cause high memory usage. Consider sending minimal data or adding pagination to further limit memory overhead.
apps/registry/app/api/job-similarity/route.js (2)
4-5: Keep Supabase URL configurable.
Same as in similarity/route.js, managing URLs in environment variables could improve maintainability and security posture.
41-66: Assess the scaling implications of returning high-volume embeddings.
As with the resume similarity endpoint, consider returning partial or paginated data to avoid loading large embeddings into memory on a single request.
apps/registry/app/job-similarity/page.js (1)
488-509: Consolidate color management for consistent theming.
You have a large static array of colors. Consider using a coloring library or consistently applying a site-wide color palette for better maintainability across the app.
apps/registry/app/components/Menu.js (1)
47-56: LGTM! Consider adding path check for nested routes.
The new "Similarity" menu item follows the established patterns for styling and active states. However, for consistency with the Jobs link, which checks for nested routes (pathname.startsWith('/jobs/')), consider extending the active state check to include potential nested routes under /job-similarity/.
- isActive('/job-similarity')
+ isActive('/job-similarity') || pathname.startsWith('/job-similarity/')
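The suggestion above can be folded into one helper. This is a minimal sketch: `isActiveRoute` is a hypothetical name, and the actual `isActive` in Menu.js may take different arguments.

```javascript
// Hypothetical helper (not from the PR): true when `pathname` is the base
// route itself or any route nested below it.
function isActiveRoute(pathname, base) {
  // Appending '/' prevents '/job-similarity-old' from matching '/job-similarity'.
  return pathname === base || pathname.startsWith(base + '/');
}
```

A call such as `isActiveRoute(pathname, '/job-similarity')` would then cover both the page itself and anything nested under it, without matching unrelated routes that merely share the prefix.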
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (6)
apps/registry/app/api/job-similarity/route.js (1 hunks)
apps/registry/app/api/resumes/route.js (1 hunks)
apps/registry/app/api/similarity/route.js (1 hunks)
apps/registry/app/components/Menu.js (1 hunks)
apps/registry/app/job-similarity/page.js (1 hunks)
apps/registry/package.json (3 hunks)
🔇 Additional comments (5)
apps/registry/app/api/similarity/route.js (1)
9-16: Prevent possible undefined behavior for supabaseUrl.
Although the code checks for SUPABASE_KEY, it doesn't verify that supabaseUrl is valid or non-empty, which may result in unexpected behavior if the constant is changed or if the environment changes.
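One way to combine this check with the earlier suggestion to move the URL into configuration is sketched below. It assumes an environment variable named `SUPABASE_URL` (a hypothetical name; only `SUPABASE_KEY` is mentioned in the review), and `resolveSupabaseConfig` is an illustrative helper, not code from the PR.

```javascript
// Sketch under assumptions: SUPABASE_URL is a hypothetical variable name.
// Returns { ok: true, url, key } or { ok: false, error } without throwing.
function resolveSupabaseConfig(env) {
  const url = env.SUPABASE_URL;
  const key = env.SUPABASE_KEY;

  if (!key) return { ok: false, error: 'Missing SUPABASE_KEY' };
  if (!url) return { ok: false, error: 'Missing SUPABASE_URL' };

  try {
    // The URL constructor throws a TypeError on malformed input.
    const parsed = new URL(url);
    if (parsed.protocol !== 'https:') {
      return { ok: false, error: 'SUPABASE_URL must use https' };
    }
  } catch {
    return { ok: false, error: 'SUPABASE_URL is not a valid URL' };
  }

  return { ok: true, url, key };
}
```

A route handler could call `resolveSupabaseConfig(process.env)` first and return an error response when `ok` is false, rather than failing later inside the client.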
apps/registry/app/api/job-similarity/route.js (1)
9-16: Validate supabaseUrl.
Similar to the resume similarity endpoint, ensure supabaseUrl is non-empty to avoid potential runtime errors if the URL or environment changes.
apps/registry/app/job-similarity/page.js (1)
611-619: Ensure valid navigation links for node clicks.
When opening job or resume links in a new tab, the code relies on the first UUID or username. If multiple records are associated with that node, only the first is used. Confirm whether you want to show multiple links or handle them differently.
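If showing every associated record is the desired behavior, the click handler could build all links instead of only the first. The sketch below is illustrative only: the node shape (`uuids`, `usernames`) and the resume path pattern are assumptions, not taken from page.js.

```javascript
// Hypothetical node shape: { uuids?: string[], usernames?: string[] }.
// The '/jobs/<uuid>' and '/<username>' paths are assumed for illustration.
function linksForNode(node) {
  const jobLinks = (node.uuids || []).map(
    (uuid) => `/jobs/${encodeURIComponent(uuid)}`
  );
  const resumeLinks = (node.usernames || []).map(
    (username) => `/${encodeURIComponent(username)}`
  );
  // One entry per associated record, jobs first.
  return [...jobLinks, ...resumeLinks];
}
```

The handler could open the single link directly when the array has one entry, and otherwise render a small list for the user to choose from.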
apps/registry/app/api/resumes/route.js (1)
22-22: Confirm the new limit.
Lowering the default limit from 3000 to 2000 may reduce memory usage and load times but might also impact client-facing queries that expect more data. Ensure that this aligns with your current performance and data retrieval goals.
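Clients that need more or fewer records can be served by an explicit, bounded `limit` parameter. The following is a minimal sketch: the default of 2000 comes from the comment above, while `MAX_LIMIT` and the helper name `parseLimit` are assumptions.

```javascript
const DEFAULT_LIMIT = 2000; // new default mentioned in the review comment
const MAX_LIMIT = 3000; // hypothetical cap, set here to the previous default

// Reads `limit` from a URLSearchParams instance, falling back to the default
// for missing or invalid values and clamping to the cap.
function parseLimit(searchParams) {
  const raw = searchParams.get('limit'); // null when the parameter is absent
  const parsed = Number.parseInt(raw ?? '', 10);
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_LIMIT;
  return Math.min(parsed, MAX_LIMIT);
}
```

Bounding the value server-side keeps a single request from pulling an arbitrarily large set of embeddings into memory.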
apps/registry/package.json (1)
34-34: Consider optimizing visualization package imports.
The addition of multiple visualization libraries (d3-force, plotly.js, react-force-graph) could significantly increase the bundle size. Consider:
- Using dynamic imports to load these packages only when needed
- Evaluating if all visualization packages are necessary or if some functionality could be consolidated
- Using the core plotly.js package instead of plotly.js-dist if you're only using specific plot types
Let's analyze the usage of these packages:
Also applies to: 97-97, 105-106, 108-108
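The first bullet (loading packages only when needed) can be sketched with the standard dynamic `import()` expression wrapped in a small memoizing helper. `lazyModule` is a hypothetical name, and the usage line in the comment is an assumption about how the page might load its plotting library.

```javascript
// Wraps a loader so the underlying import runs at most once; every caller
// receives the same promise.
function lazyModule(loader) {
  let cached = null;
  return function load() {
    if (cached === null) {
      cached = loader(); // the first call starts the load
    }
    return cached;
  };
}

// Hypothetical usage (not executed here):
//   const loadPlotly = lazyModule(() => import('plotly.js-dist'));
//   loadPlotly().then((Plotly) => { /* draw the chart */ });
```

With this pattern the heavy bundle is requested only when the visualization is actually rendered, and repeated renders reuse the same in-flight or resolved promise.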
const algorithms = {
  mst: {
    name: 'Minimum Spanning Tree',
    compute: (nodes, minSimilarity = 0.3) => {
      // Kruskal's algorithm for MST
      const links = [];
      const parent = new Array(nodes.length).fill(0).map((_, i) => i);

      function find(x) {
        if (parent[x] !== x) parent[x] = find(parent[x]);
        return parent[x];
      }

      function union(x, y) {
        parent[find(x)] = find(y);
      }

      // Create all possible edges with weights
      const edges = [];
      for (let i = 0; i < nodes.length; i++) {
        for (let j = i + 1; j < nodes.length; j++) {
          const similarity = cosineSimilarity(
            nodes[i].avgEmbedding,
            nodes[j].avgEmbedding
          );
          if (similarity > minSimilarity) {
            edges.push({ i, j, similarity });
          }
        }
      }

      // Sort edges by similarity (descending)
      edges.sort((a, b) => b.similarity - a.similarity);

      // Build MST
      edges.forEach(({ i, j, similarity }) => {
        if (find(i) !== find(j)) {
          union(i, j);
          links.push({
            source: nodes[i].id,
            target: nodes[j].id,
            value: similarity,
          });
        }
      });

      return links;
    },
  },
🛠️ Refactor suggestion
Refactor repeated similarity checks across algorithms.
Many algorithms re-compute cosine similarities. You could avoid repeated calculations by preprocessing a distance matrix or similarity matrix once. This can significantly reduce repeated work for large node sets.
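A minimal sketch of that preprocessing step follows. It assumes each node carries an `avgEmbedding` array, as in the MST snippet; `buildSimilarityMatrix` is a hypothetical helper name, and the local `cosineSimilarity` is a zero-safe variant written for this example.

```javascript
// Zero-safe cosine similarity, written for this sketch.
function cosineSimilarity(a, b) {
  if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return 0;
  let dot = 0;
  let sumSqA = 0;
  let sumSqB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    sumSqA += a[i] * a[i];
    sumSqB += b[i] * b[i];
  }
  if (sumSqA === 0 || sumSqB === 0) return 0;
  return dot / Math.sqrt(sumSqA * sumSqB);
}

// Computes every pairwise similarity once; matrix[i][j] === matrix[j][i].
function buildSimilarityMatrix(nodes) {
  const n = nodes.length;
  const matrix = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let i = 0; i < n; i++) {
    matrix[i][i] = 1; // self-similarity set to 1 by convention
    for (let j = i + 1; j < n; j++) {
      const s = cosineSimilarity(nodes[i].avgEmbedding, nodes[j].avgEmbedding);
      matrix[i][j] = s;
      matrix[j][i] = s;
    }
  }
  return matrix;
}
```

Each algorithm could then read `matrix[i][j]` instead of calling `cosineSimilarity` again, so the quadratic pass over embeddings happens once per data load rather than once per algorithm.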
// Helper function to compute cosine similarity
const cosineSimilarity = (a, b) => {
  if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return 0;

  const dotProduct = a.reduce((sum, _, i) => sum + a[i] * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));

  return dotProduct / (magnitudeA * magnitudeB);
};
Ensure divisor non-zero guard in cosineSimilarity.
Though the function checks array format, consider handling the case if one (or both) of the magnitude calculations is zero, which can lead to NaN or Infinity.
const cosineSimilarity = (a, b) => {
if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return 0;
const dotProduct = a.reduce((sum, _, i) => sum + a[i] * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
+ if (magnitudeA === 0 || magnitudeB === 0) {
+ return 0;
+ }
return dotProduct / (magnitudeA * magnitudeB);
};
@@ -14,10 +14,10 @@
  "dependencies": {
    "@ai-sdk/openai": "^0.0.14",
    "@faker-js/faker": "^8.0.2",
    "@jsonresume/jsonresume-theme-professional": "workspace:*",
Remove duplicate theme package entry.
The package @jsonresume/jsonresume-theme-professional appears to be listed twice in the dependencies. According to the AI summary, one entry was supposed to be removed.