Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support globbing in imports #2376

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Conversation

neunenak
Copy link
Contributor

@neunenak neunenak commented Sep 19, 2024

Implements shell-style globs in import statements using the Rust glob crate. Addresses #1641

@neunenak neunenak force-pushed the import-glob branch 2 times, most recently from 55a246a to 94fd369 Compare September 19, 2024 09:02
@neunenak neunenak marked this pull request as ready for review September 19, 2024 09:02
Copy link
Owner

@casey casey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments. The main one is that we have to handle non-unicode paths, so we should look at a different glob crate.

src/compiler.rs Outdated
@@ -75,14 +75,39 @@ impl Compiler {
.join(Self::expand_tilde(&relative.cooked)?)
.lexiclean();

if import.is_file() {
let import_base_str = import.to_string_lossy();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just is expected to work with non-unicode paths, so we can't do a lossy conversation here. I'd suggest checking out the globset crate, which I believe handles non-unicode paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

globset unfortunately doesn't have a method to iterate over a file system entry and select only paths that match the glob, like the iterator that glob_with returns with the glob crate. Looking at how glob_with is implemented in glob, it seems like this code is a bit tricky to write and I'd rather avoid it (or if I did write that code, I think it might work out to just implementing path-globbing functionality from scratch, which is also undesirable).

There's also this crate: https://crates.io/crates/wax , which is utf-8 only but claims to reasonably handle sane non-utf-8 text internally, and also seems to have the file system iterator I want. It also hasn't been updated in a bit and claims to be experimental, so I'm kind of iffy about this crate.

Maybe I'm missing something obvious that would make this easier to do.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that's lame. A concern about wax, any syntax that wax supports becomes part of the public-facing interface of just, and so can't be added and removed without a backwards-incompatible change. wax supports some pretty exotic syntax, like repetitions <a*/:0,> and alternatives {a?c,x?z,foo}, which I doubt would be incredibly useful for use in import paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it looks like even globset expects to be given the glob as a string: https://docs.rs/globset/latest/globset/struct.GlobBuilder.html#method.new . Also see this 2019 comment from the maintainer of globset on glob: rust-lang/glob#78 (comment) and this globset issue: BurntSushi/ripgrep#1250 .

I'm starting to think that there's no principled way to support globbing on non-utf8 filesystems without basically submitting a patch to some upstream glob project to do it, but maybe few enough people are using these tools with non-utf8 globs that no one has previously run into it?

The to_string_lossy function turns any non-utf8 sequence into the unicode replacement character �. I wonder if there is some other function that could be called to try to convert a PathBuf into a meaningful Rust utf-8 string (maybe doing something smart with guessing the original encoding). The quick solution would be to move this conversion into the has_glob block, which would mean that people using glob imports and file paths with non-utf8 names would run into issues but at least no existing just behavior would break.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit lame, but we could just throw an error if you try to glob with a path which can't be converted to unicode. Non-unicode paths are very rare and basically just crazy, so not supporting them is probably fine, especially if they're only unsupported if you use a pretty niche feature.

I'm tempted to land this feature as unstable initially, both because of the unicode path issue, but also because there's some choice about which glob syntaxes we support, and it might be good to make compare them all and make a principled choice about what to support.

for import in import_paths {
let Ok(import) = import else { continue };

if import.is_file() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a glob matching a directory should be an error. Just because we can backwards-compatibly make it not an error, but not backwards-compatibly turn it into an error.

@neunenak neunenak requested a review from casey October 2, 2024 08:39
@neunenak
Copy link
Contributor Author

neunenak commented Oct 2, 2024

Alright, I gated glob imports behind unstable, went back to using the glob crate, and explicitly error out if conversion to a utf-8 native rust string doesn't work.

Copy link
Owner

@casey casey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some initial review, see comments.

src/analyzer.rs Outdated
if let Some(absolute) = absolute {
stack.push(asts.get(absolute).unwrap());
Item::Import { absolute_paths, .. } => {
for p in absolute_paths {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for p in absolute_paths {
for path in absolute_paths {

@@ -3,6 +3,7 @@ use super::*;
#[derive(Copy, Clone, Debug, PartialEq, Ord, Eq, PartialOrd)]
pub(crate) enum UnstableFeature {
FormatSubcommand,
GlobImport,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GlobImport,
ImportPathGlob,

@@ -11,6 +12,7 @@ impl Display for UnstableFeature {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
match self {
Self::FormatSubcommand => write!(f, "The `--fmt` command is currently unstable."),
Self::GlobImport => write!(f, "Glob imports are currently unstable"),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Self::GlobImport => write!(f, "Glob imports are currently unstable"),
Self::ImportPathGlob => write!(f, "Globs in import paths are currently unstable."),

src/parser.rs Outdated
@@ -365,7 +365,7 @@ impl<'run, 'src> Parser<'run, 'src> {
let optional = self.accepted(QuestionMark)?;
let (path, relative) = self.parse_string_literal_token()?;
items.push(Item::Import {
absolute: None,
absolute_paths: vec![],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
absolute_paths: vec![],
absolute_paths: Vec::new(),

(No real reason, I just like avoiding macros.)

@@ -7,7 +7,7 @@ pub(crate) enum Item<'src> {
Assignment(Assignment<'src>),
Comment(&'src str),
Import {
absolute: Option<PathBuf>,
absolute_paths: Vec<PathBuf>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually just keep this called absolute. It's a vec of paths, so we know that they're paths and there's potentially more than one of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My argument for changing it is that "absolute" is a pretty general word out of context, so it's hard to guess what it might mean when you see it as a random property access somewhere in code. I don't feel super strongly about it however.

@@ -75,14 +86,65 @@ impl Compiler {
.join(Self::expand_tilde(&relative.cooked)?)
.lexiclean();

if import.is_file() {
let has_glob = import.as_os_str().as_encoded_bytes().contains(&b'*');
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky because glob supports more meta-characters than *. It supports ?, *, [, !, and ]. I suppose we should check for all of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume that none of those characters will show up in a legitimate justfile? I think that's safe to assume for * but maybe not for characters like [ and ].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there was a slightly different version of the import keyword that signaled that shell globs might be used, or an argument to it? import glob "my/path/*.just" or something like that?

src/compiler.rs Outdated
})?;
let glob_options = glob::MatchOptions {
case_sensitive: true,
require_literal_separator: false,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be true. If false, then / can be matched by *, which is the bash default and zsh default. ** can be used to match an arbitrary directory path, including *.

src/compiler.rs Outdated
let import_base_str =
import
.as_os_str()
.to_str()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do to_str on the path without first converting to an OsStr.

src/compiler.rs Outdated
}
})?;
for import in import_paths {
let Ok(import) = import else { continue };
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This swallows I/O errors. If we encounter an I/O error we should fail.

});
}
}
} else if import.is_file() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this circular import test duplicated with the one above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but I think that trying to de-duplicate those lines of code would add more complexity than it's worth, since the logic here is just constructing the CircularImport error if necessary and then manipulating some local variables.

@neunenak neunenak requested a review from casey October 4, 2024 21:39
@casey
Copy link
Owner

casey commented Dec 22, 2024

I keep coming back to this. I have these worries:

  • I worry about delegating glob parsing to a dependency we don't control, and thus may not understand the syntax.

  • I worry about the backwards compatibility impact on existing justfiles. Since although it is crazy to use */?/** in a path name, it does happen, and maybe especially on windows or GUI systems. This might also be extra confusing because the globbing character wouldn't appear in the path literal in the justfile.

  • I worry about the restriction that it doesn't work with non-unicode paths.

I wonder if maybe it would be better if there was some syntax to enable the glob at the import, and it was that syntax which was experimental, instead of a setting which turns it on for all imports

Something like:

glob import '*/mod.just'

@woutervh
Copy link

looking forward to see something like this merged.

@casey
Copy link
Owner

casey commented Jan 10, 2025

@neunenak What are you thinking about this? I think we should do this, and I think the combination of both making it unstable, and opting into explicitly for each import (syntax tbd, but glob import "foo" or import glob "foo" both seem fine), make me much more comfortable about dealing with the shortcomings later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants