Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Term Cleanup Feature #815

Merged
merged 32 commits into from
Dec 12, 2024
Merged

Add a Term Cleanup Feature #815

merged 32 commits into from
Dec 12, 2024

Conversation

dkotter
Copy link
Collaborator

@dkotter dkotter commented Oct 9, 2024

Description of the Change

This PR adds in a new Feature, Term Cleanup, that uses the power of embeddings, to compare all terms against each other and then recommending similar terms that could be merged together if desired. This comparison process can be run entirely in WordPress but once you reach around 500 terms, performance really starts to slow down. Similar to the Smart 404 Feature, support for Elasticsearch (using ElasticPress) has been added to significantly speed things up.

This Feature requires two services in order to function:

  1. A Provider that can generate embeddings (right now we support OpenAI and Azure OpenAI)
  2. A data storage engine that can also run vector queries (currently supports doing this in WordPress only or using Elasticsearch powered by ElasticPress)

Once the Feature is setup, you can start the comparison process which will send all terms off to the embeddings Provider to generate vector data. If configured, this data is then sent to Elasticsearch and vector queries are run there. If not configured, this data remains in WordPress only and comparisons are done there using that data.

Once the process is complete, you'll be given the opportunity to see all the results and decide which terms you want to merge together (or which terms you want to skip). This process can be re-run whenever needed (i.e. as new terms are added).

Closes #795

How to test the Change

  1. Enable the Feature with at least one taxonomy turned on
  2. Go to the Term Cleanup page and start the process for whatever taxonomy you want
  3. Ensure the process completes and results show
  4. Ensure you can merge and skip terms
  5. If desired, set up Elasticsearch/ElasticPress and test using that

Changelog Entry

Added - New Feature, Term Cleanup, which provides the ability to compare all terms from specific taxonomies and showing recommendations for similar terms that can be merged together

Credits

Props @dkotter, @iamdharmesh, @berkod

Checklist:

@dkotter dkotter added this to the 3.2.0 milestone Oct 9, 2024
@dkotter dkotter self-assigned this Oct 9, 2024
@dkotter
Copy link
Collaborator Author

dkotter commented Oct 9, 2024

@iamdharmesh I believe I have this feature fully moved over to ClassifAI but would appreciate if you could take a look since you're most familiar with how this functions. I moved the background processing to use Action Scheduler since we are already using that in ClassifAI but I believe everything else has been left the same.

Note @jeffpaul mentioned there are some UI changes we're hoping to make so we can either leave this PR until those are ready or can merge this in (once it looks good) and open a new PR for those changes, but leaving this in draft for now

iamdharmesh
iamdharmesh previously approved these changes Oct 10, 2024
Copy link
Member

@iamdharmesh iamdharmesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkotter Thanks for working on this. The PR looks great. I’ve fixed a few issues related to the in-progress job status, canceling the running process, and similarity score for database-based comparison. Please take a quick look at those additional changes when you have time. Otherwise, everything looks good.

@jeffpaul
Copy link
Member

I would prefer to get a quick UX review to see if there are some quick iterations to improve here, but if we're otherwise near ready to ship a major release then I'm fine merging this in to ride along there

@iamdharmesh
Copy link
Member

@jeffpaul @dkotter I worked on UX items from https://github.com/10up/classifai-cleanup/issues/7, https://github.com/10up/classifai-cleanup/issues/20, https://github.com/10up/classifai-cleanup/issues/19 and https://github.com/10up/classifai-cleanup/issues/18 and listed status on each below. Some of these need discussion.

image

  • Right now we show a Count field, which represents how many times this term has been used. I know this is common terminology in WordPress so maybe is fine but I think there could easily be some confusion here thinking this is the count of how many matching terms were found. Could maybe update this to Used: 621 times to be more clear?
  • Often terms don't have a description associated with them. I think we should remove this field if it's empty, rather than showing the Description: label and never having a description shown.
  • I'm assuming when we display results we have whatever score was calculated to determine the match. Could we show that score in these results? That way you know how high of a match it actually was? Something like Score: 79% or Match: 80%
  • Since we only have a single action (Skip) we could probably change the name of that column to Action or just remove that label all together
  • I have some confusion in regards to the merge buttons. At a glance, hard to know what each button does; which term is getting merged? What happens to the other? I think we can make this more clear by either updating the button text or adding some helper text (Need help with finalizing a better wording copy here, if we want to update this.)
  • More personal opinion but I think we could benefit from a slight bump across the board with font sizes. Things seem fairly small to me at the moment
  • In addition to the above, could also maybe use a slight bump in padding between things
  • UX - Adding a link to the posts-by-tag
  • UX - An option to sort the terms alphabetically (This is a quick enhancement but as we are showing all the information together in a column, we need to figure out how we can switch between sort by alphabetically and sort by count.)
  • Add ability to search for specific term results

Thanks

@dkotter
Copy link
Collaborator Author

dkotter commented Oct 11, 2024

I would prefer to get a quick UX review to see if there are some quick iterations to improve here, but if we're otherwise near ready to ship a major release then I'm fine merging this in to ride along there

Things look cleaner to me from the last updates made by @iamdharmesh but I agree could use a more formal review, so I'm fine holding this until then

@dkotter dkotter marked this pull request as ready for review October 11, 2024 18:55
@dkotter dkotter requested review from jeffpaul and a team as code owners October 11, 2024 18:55
@github-actions github-actions bot added the needs:code-review This requires code review. label Oct 11, 2024
…ng third parties to hook in and do things like logging
@github-actions github-actions bot added the needs:refresh This requires a refreshed PR to resolve. label Nov 19, 2024
@dkotter dkotter requested review from Sidsector9 and iamdharmesh and removed request for a team and jeffpaul November 20, 2024 16:48
@dkotter
Copy link
Collaborator Author

dkotter commented Nov 20, 2024

@Sidsector9 @iamdharmesh I've updated this PR to add support for the new React settings page for this new Feature. Would appreciate a review to ensure I did this properly. I also noticed some flaky tests around the Azure Image Processing that I think I've gotten to be more reliable here.

One question I did have when working on this is do we have a way to support filters in this new React based approach? As an example, this Feature (and other Features have similar things) has a method to get the allowed taxonomies and it passes that through a PHP filter, allowing others to modify the list of taxonomies we output on the settings page. That no longer works since we're outputting this via JS so wondering if there's something I missed to still support that?

@dkotter
Copy link
Collaborator Author

dkotter commented Dec 6, 2024

One question I did have when working on this is do we have a way to support filters in this new React based approach? As an example, this Feature (and other Features have similar things) has a method to get the allowed taxonomies and it passes that through a PHP filter, allowing others to modify the list of taxonomies we output on the settings page. That no longer works since we're outputting this via JS so wondering if there's something I missed to still support that?

Bumping this question up again @Sidsector9 @iamdharmesh. Similar question for the Classification Feature. Previously we were using the get_supported_taxonomies method which would filter all taxonomies through is_taxonomy_viewable and then through our custom filter. Seems in the new approach, we not only lose the custom filter but we lose the is_taxonomy_viewable check as well.

@Sidsector9
Copy link
Member

I'll pick this up for review and respond to the conversation first thing tomorrow.

@Sidsector9
Copy link
Member

Sidsector9 commented Dec 10, 2024

@dkotter I've discussed this with @iamdharmesh today, and for the first iteration, we don't have a way to filter taxonomies.

We can implement this in a JS way:

// This will allow devs to filter taxonomies.
const options = wp.hooks.applyFilters( 'classifai-filter-taxonomies', options );

Filtering values:

wp.hooks.addFilter( 'classifai-filter-taxonomies', 'Classifai', function( options ) {
    // remove taxonomies here.
    return options;
} );

@iamdharmesh what do you think about using this approach?

@Sidsector9
Copy link
Member

@iamdharmesh there is a way to still use PHP filters, but this will require some changes to the current implementation.

  • Fetch array of post types (Make this filterable via PHP).
  • Each post type fetched should contain all taxonomies associated to it (Make this filterable via PHP).
  • On the JS side, using the taxonomy data inside the post types, render the fields. (Note: A taxonomy can be registered for multiple post types, so while rendering make sure only unique taxonomy items are rendered.)

iamdharmesh
iamdharmesh previously approved these changes Dec 12, 2024
Copy link
Member

@iamdharmesh iamdharmesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @dkotter.

I have updated the PR to retrieve taxonomies from window.classifAISettings and made some minor updates.

@iamdharmesh iamdharmesh merged commit 930692f into develop Dec 12, 2024
18 checks passed
@iamdharmesh iamdharmesh deleted the feature/795 branch December 12, 2024 09:27
@jeffpaul
Copy link
Member

@iamdharmesh I note that there are still a couple checkboxes open in your comment farther above here... if those are still todo would you please get those into a new issue so we can track those separately?

@iamdharmesh
Copy link
Member

Sure thing @jeffpaul. Opened #833

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs:code-review This requires code review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to do a "Term Cleanup"
4 participants