-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(docs) Explain the 1st party vs 3rd party situation a little better #2823
chore(docs) Explain the 1st party vs 3rd party situation a little better #2823
Conversation
…pectations --- Signed-off-by: Brad Chamberlain <[email protected]>
We do try to explain this is several different places. Something like this could be merged, but I'll edit it to remove a bit of the negativity. Also, turning a 1st party grammar you mistakenly added to the main repo into a 3rd party grammar really just requires moving the files... so the biggest problem here is misalignment of expectations.
I'll look carefully but I don't see why this needs to be said at all... making a language is literally the exact same process 1st party or 3rd party (designed to make it equally easy), it's just a matter of which subdirectory the files go into... |
@allejo @egor-rogov Thoughts? |
Could you elaborate more on this - and perhaps how the PR template could be improved? We merge new PRs all the time, just not new language grammar PRs. Although at the point someone is making a PR it's kind of a little bit too late to tell them.... It's no secret that we don't merge new language grammars but I also think there is a problem with hitting users over the head with that fact every 5 minutes. A lot of people have still found lots of amazing ways to contribute - 3rd party languages, plugins, first party bug fixes, etc. More than a few people originally "burned" by the fact that we don't merge new language PRs have then gone on to contribute by publishing their grammar as a 3rd party language modules (with very little effort)... it's not as terrible as it seems at first thought. :-) So I think how this issue is communicated is pretty important. That said I'm sure we can currently do a better job of setting expectation. I just worry about placing a warning everywhere we can possibly think of - because it sets the wrong tone. |
We could probably also do more to explain to people why getting their language merged often isn't what they think it will be. Some of the most upset people (early on) mostly wanted their language to magically work on StackOverflow/Discourse/[insert big site]... and since you can't talk to those places (so easily) they just thought if they could only get their language merged upstream... boom magic... But that's not how it works at all. These large sites typically (despite using Highlight.js) have a very tight list of curated languages that they support, bandwidth and CDN costs to keep in mind, etc. They only support a small fraction of our 1st party languages and are often quite hesitant to add new ones, and they have many reasons for doing so. That said Discourse now allows injection of custom JS/HTML though, so it's now very easy to add a 3rd party grammar (or 1st) that Discourse doesn't support out of the box. :-) So things are slowly improving. StackExchange will be a harder nut to crack. |
That's completely fine of course. It wasn't my intention to have it read negatively so much as directly/bluntly/unambiguously. After I left work yesterday, I realized that a better title for step 1 might be something like "manage your expectations".
I think making that clearer would be beneficial. I.e., saying something like "for core developers, the paths that follow are relative to
My first question when I found Now that I understand the state of things better, I can see that a literal reading of the Language Contributor Checklist says to create a PR only to add your language to the list of languages, not to add the rest of the code developed in the previous steps, but my reading of it was "of course you'd commit all the code developed in the previous steps in the PR as well" and I viewed adding the language to the list as simply being the icing on the cake. Specifically, nothing seemed to preclude that one could/would open a PR to introduce a new language definition. The PR template's text around adding tests being a pre-requisite for merging also suggested to me that it was possible to add a new language so long as one added tests for it as well. Now I understand that text to be for changes to existing grammars which need tests to lock in those changes and make sure they don't regress, but at the time it didn't seem to rule out checking in code defining a new language and code to do it. One way to finesse this might be to take advantage of GitHub's support for multiple PR templates to have a distinct template for "add a new language" that only talked about adding it to the list and CHANGES.md and a second for "general code improvements" that talked about adding tests, CHANGES entries, etc. Overall, I'd be inclined to combine the "Language Contributor Checklist" and 3rd party README into a single document, because a lot of my confusion was around wondering why both of these documents existed when they seemed to describe a lot of the same steps. My (incorrect) conclusion was that the former was for people who wanted to submit their code back to the project as a PR and the latter was for those who didn't or couldn't. This current PR was my quick attempt at adding text that would've put me onto a better track to begin with (or encourage someone else to do similarly). I thought about a PR to try merging the two docs into one, but that felt more significant than I had the time/interest for (and also isn't something I'd start into without getting your team's go-ahead, obviously).
Where do you say it now? (Because again, I don't feel like I saw that message until you responded to my issue). I suppose the "On requesting new languages" page might be considered such a place, but I didn't read it this way to me. I read it as saying "we're not going to do the language for you; you're welcome to do it yourself; you can do it as a 3rd party package if you want" but again, nothing in it suggested that if someone did show up with a new language implementation it couldn't / wouldn't be considered for inclusion in the project proper. I.e., it read to me as "if you're someone with gumption and can put such a highlighter together, that's great" but nothing said "don't expect it to be merged to the main repo, though." [This "On requesting new languages" document also feels as though it could be integrated into, or at least linked from, the language contributor checklist, perhaps in the "managing expectations" bullet).
Yeah, that'd be another good message to try and get across in the "managing expectations" bullet. It's another one that I didn't fully understand until our exchange on my issue (though I was pleased that you turned out to be wrong about our Discourse site :) , which is the main thing that finally got me off my butt and into this effort; StackOverflow having been a previous one, but one that I knew would be a heavy lift in any case). Speaking of which, if you have the interest and I have the time, I could draft some "How to get your language enabled on a Discourse site that you administrate" instructions for consideration if you have suggestions for where to place that in a PR. Speaking of StackOverflow, since highlight.js is still "new" over there, I posted a question about third-party highlight.js languages yesterday to see whether it would get any traction, and was surprised that the main concern (though only in the comments, granted) was not one of the performance impact of supporting additional languages (which had been my guess) so much as the security concerns. I realize that it's a longshot that they'll do anything here in any case, but didn't see any harm in asking. Anyway that made me curious whether highlight.js does anything to try and ensure that grammars don't contain malicious code (where I could imagine many third-party grammars wouldn't need to support any executable code at all, so much as lists of keywords and regexps). Is it OK if I were to open an issue here asking about this topic? Sorry this turned out to be such a long response. Hopefully I addressed the questions you asked. |
Probably not a bad idea. The second document only exists as a stop-gap to get people started on 3rd party development. It was always intended that over time it could be much improved.
I'm not sure why this distinction matters though... "core developers" already know this, and I think it just muddles the water - creating confusion where none needs to exist... if 3rd party grammars is the happy path, the the docs should focus on that path and just explaining it. What "core" does shouldn't really matter.
As you've said (and noticed) it's several places, perhaps it still is just not explicit enough.
That would be awesome, I just dunno where we'd put it either... I really think we need an "Awesome Highlight.js" type site/docs and I think that might be the landing place for a lot of that kind of thing.
Security concerns are a real concern - putting Highlight.js (and language grammars) on your website is injecting code into a site... you need to trust us and any 3rd party grammars you're using... a malicious grammar could potentially do anything a user could do on a website...
The core team reviews all PRs that go into core, of course. I'm not sure how we could do anything about 3rd party grammars since (by design) grammars are executable code. Any site with serious security concerns using Highlight.js should run it in a background worker thread to isolate it from their primary website code.
What would the question be exactly? I think I've covered a bit above and some of this is also covered in the long, long discussion on why we no longer accept language PRs (you can find it in the long-lived discussion issue). |
That would be fine / clearer to me as well. The challenge I had was that by not saying which root they were relative to (and, again, being under the impression up until our interaction that I could contribute a new language back to the repo), the vagueness had me continue going back and forth between "wait, should it be in
Yeah, it definitely was not clear enough for me, more like hinted at from a few angles. And granted, I'm not always the brightest or most patient of readers, but I think if the first bullet were to have addressed it, it would've saved some time, churn, and eventual disappointment.
If I should read this as "yes, please write this up and we'll figure out where to put it", I will attempt to do so. If I should read it as "it'd be great if you wrote it up, but then we'll probably drop it on the floor because we honestly don't know where we'd put it" then I probably won't. (If it were me, I'd make the first bullet about managing expectations, explaining that you'll be developing a third-party package, and talking about the uphill battle that you might face getting a StackExchange-like site to start using their package; but then to say that integrating into a Discourse site that you admin can easily be done and linking off to wherever else in the docs hierarchy this were to land).
Understood. My thinking was (as I think you implied on the SO question) that it seems like a human can scan most grammars and clearly determine that they are "inert" without much effort (e.g., it defines some key-value pairs that define some keywords and regexes). This made me wonder whether Highlight.js did (or could) do something similar (if javascript's code reflection capabilities were rich enough?) to distinguish between a more 'inert/static' vs. 'active/dynamic' grammar, if you take my meaning. I understand now that it doesn't, and am not particularly surprised / disappointed, but that's what I was going to ask about.
I think you've addressed my first-level curiosities here and on the SO question, thanks. |
@allejo @egor-rogov @binyamin Any idea where something like that should perhaps live? |
Yes.
You could of course easily (at run-time) determine if a grammar TELLS you it's using callbacks or some of the more dynamic features... but it could still contain a lot of code doing who knows what and then still return the simplest actual grammar definition. I think protecting ourself from malicious 3rd party grammars is out of scope... to do so we'd really need static-code analysis (to read the JS WITHOUT running it)... or we'd need to make the grammars entirely static... but even then if we aren't in charge of the WHOLE loading process, users can still shoot themselves in the foot. We'd have to go out of our way to |
Yeah, I was imagining that such cases would be flagged as potentially dangerous by any sort of automated security scanner. So sufficiently simple third-party grammars would pass and anything more complicated would not even if it could be proven to benign.
That's definitely reasonable. If you'd already done something like this and it gave external sites more confidence in using third-party grammars, that'd be cool, but since not, I agree that this is a big project to take on. |
How about creating a new section in RTD along the lines of "Highlight.js Integrations"? It can have two sections "Services Using Highlight.js" and "Self-Hosted Applications" "Services Using Highlight.js" Here we can briefly summarize that SO + Discord use highlight.js but they don't use all of the languages. Here we can point them to resources (if we have any) on how to request them to add support for a language. "Self-Hosted Applications" This is where the Discourse integration can be discussed 😄 |
You mean the readme? I think it's already a bit long , but perhaps if we add a TOC... or a heading that only says "Hey go read out docs on Integrations and then a matching
Why, just for marketing? I'm not sure this is our support burden to handle...
I don't think there is a way (for Discourse), or no useful way... for SO there is a process, and it's documented on SO... and true we could point them there... but I think that'd be a pretty round-about route for someone to take... it's far more likely they just ask on SO and never find our README or docs at all. So I guess the first use case (how do I use custom languages with Discourse) sounds a lot more helpful to me personally... though I don't know that I'm opposed to the latter, just not sure I see the usefulness yet. |
@allejo Do you find my changes an improvement to this file though? If so we should merge it until we have something better. |
Personally, I think the "how to use custom languages with Discourse" should live in the Discourse documentation. Same with SO's and Discord's integration. But since the question was brought up on where it'd live in highlight.js' docs, I'd think it belongs in RTD and not the README. Like you said, the README is long enough and it's not something the average user who wants to highlight their code would care about.
I think this PR as-is is good and clears up enough 👍 |
Ah, Read the Docs, that just clicked with me, LOL. @bradcray Does Discourse have any type of place to perhaps naturally put information like that, wiki, etc? |
I don't really know Discourse well enough to know for sure. The only things I've found are their discussion forums on meta.discourse.org (which is where I figured out it was possible for an admin to enable to new highlight.js languages in a given discourse instance, though the instructions were not particularly clear, particularly for discourse n00bs) and their API documentation (which is a lot lower level from what I've seen). To be clear, I'm fine with dropping the knowledge on the floor as well. I've got it in my README, so I'm good and was only offering it up in case you wanted to make it available to others through your docs. |
Where is your README? |
Here, though note that it's written up as notes to myself / my team rather than anything intended for consumption by the general public: |
Changes
As someone interested in contributing a new language to Highlight.js, I ended up very confused by the distinct messages in
language-contribution.rst
and the GitHub PR template (each of which seemed to be saying "sure you can contribute new languages, no problem, come on in!") versusextra/3RD_PARTY_QUICK_START.md
which seemed to be all about "here's how you can add your own language without contributing it back to us" (which led me to wonder "Why would anyone prefer to do that?" Only after some interaction with the team, did I learn that this is the only option that's really available.To that end, I drafted some text updating item 1 in
language-contribution.rst
to try and clarify the situation for future developers from the outset rather than having people be confused and requiring clarification from the team to settle things out. I'm sure it could be improved further, but I think this would have set my expectations and understanding better from the very start.