More named characters and (possibly) unicode characters #1

Open
rljacobson opened this issue Aug 5, 2020 · 4 comments

Comments

@rljacobson
Collaborator

My deep dive into Mathematica's operators and characters occurred after the last commit to the lexer. I would not trust the Letter and Letterlike lists to be 100% complete, nor do I think I included all named characters. I might be missing a small number of obscure operators, too.

More information that you probably want: What is really amazing is, determining membership in the set of Mathematica LetterLike characters is an open problem. Or more accurately, which of the valid Unicode characters Mathematica accepts as a valid variable name is an open question. There are a few reasons for this:

  1. The notebook frontend insists on re-interpreting input as the user writes it and can reinterpret characters that the kernel would otherwise treat as a letter/identifier as implicit multiplication.
  2. There are various built-in functions that should tell you which characters can be used as identifiers, but it appears there are counterexamples to all of these methods.

One strategy is to use the undocumented Mathematica system file UnicodeCharacters.tr, which contains data about most characters. Another strategy is to use the output of one of the built-in functions.

An example of inconsistent behavior:

In[1]:= isLetter[char_]:= StringMatchQ[char, LetterCharacter]

In[2]:= isLetter["\[RawEscape]"]
Out[2]= True

In[3]:= LetterLikeQ["\[RawEscape]"]
Out[3]= False

In[4]:= isLetter["\[BlackKing]"]
Out[4]= False

In[5]:= LetterLikeQ["\[BlackKing]"]
Out[5]= LetterLikeQ[♚]
(*  ?!? Remains unevaluated. *)

In[8]:= \[BlackKing] = 7
Out[8]= 7

In[9]:= \[BlackKing]^2
Out[9]= 49

So LetterLike and "can be used as a variable" are distinct concepts. The output of these functions might change from version to version as well.
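Unicode's own classification does not settle the question either. As a small illustrative sketch (Python is used here only because it is the translation target mentioned in this thread; the helper name `describe` is made up for the example), the two characters from the session above can be checked against their Unicode general category and against Python's identifier rules:

```python
import unicodedata
from typing import Tuple

# The two characters from the Mathematica session above, by code point.
RAW_ESCAPE = "\u001b"  # \[RawEscape]
BLACK_KING = "\u265a"  # \[BlackKing]


def describe(ch: str) -> Tuple[str, bool]:
    """Return (Unicode general category, usable as a Python identifier)."""
    return unicodedata.category(ch), ch.isidentifier()


for name, ch in [("RawEscape", RAW_ESCAPE), ("BlackKing", BLACK_KING)]:
    category, is_identifier = describe(ch)
    print(f"{name}: U+{ord(ch):04X} category={category} "
          f"python_identifier={is_identifier}")
```

Neither character is a letter by Unicode's rules (a control character and a symbol, respectively), so a table derived from Unicode categories alone would not reproduce either of the Mathematica answers shown above.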

@rocky
Member

rocky commented Aug 6, 2020

Thanks for the detailed information.

There is no such thing as too much information - only too little information. ;-)

And what makes FoxySheep awesome in my opinion is that you have both thought about these issues and have made an attempt to identify them if not resolve them.

My cleverness is noticing this fact, or noticing when someone else does good work, and then making use of it and promoting it, especially when no one else seems to have done so yet. (Psst... I think Stephen Wolfram has this trait too; the difference is that I am more transparent about it and openly acknowledge that this is what I've done and try to be sure to give enough thanks where thanks are due.)

Some other little things which I hope will explain why Greek and special letters were even noticed, and which things are likely to be noticed...

While the FoxySheep project has been focused on perfection, FoxySheep2 has not. I am letting bug reports and need drive what gets improved. (And to date there have been few bug reports, so that's great, since I am not in a position to fix any for a while!)

So until such time as someone cares about \[BlackKing], neither does that codebase. I have simply been going through tutorials, running through the examples, and seeing how those get converted to Python/NumPy/SciPy. And even here, there are many examples that I know don't work. But that's okay.

For some of these, I have a story in my mind for how I'd fix them. But fixing isn't going to happen for a while unless someone else does it.

And even for the Greek letters where the parsing has now been corrected, I don't have a complete translation to Python done. This is not a conceptual problem so much as I just haven't coded it. In concept, the Greek letters get converted to Unicode, and Python variables are allowed to have Unicode names. But right now this only works for single-letter Greek letters and symbols, partly because that's the use case I came across. The other part is that I previously had a mechanism for translating one full name to another. For example, Cos is translated to math.cos or sympy.cos or numpy.cos, depending on the output selection.
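A minimal sketch of what such a name translation could look like, assuming two lookup tables. The table names, the `translate_name` function, and the target keys are hypothetical and are not taken from the FoxySheep2 code; only the Cos mapping comes from the comment above.

```python
# Hypothetical tables; the real FoxySheep2 mechanism may look quite different.
FUNCTION_NAMES = {
    "Cos": {"python": "math.cos", "sympy": "sympy.cos", "numpy": "numpy.cos"},
    "Sin": {"python": "math.sin", "sympy": "sympy.sin", "numpy": "numpy.sin"},
}

# A tiny subset of Mathematica long names mapped to Unicode characters.
GREEK_LETTERS = {"Alpha": "\u03b1", "Beta": "\u03b2", "Pi": "\u03c0"}


def translate_name(name: str, target: str) -> str:
    """Translate a Mathematica name to a name usable in the chosen output."""
    if name in FUNCTION_NAMES:
        return FUNCTION_NAMES[name][target]
    if name in GREEK_LETTERS:
        char = GREEK_LETTERS[name]
        # Python accepts many, but not all, Unicode characters in identifiers,
        # so fall back to an ASCII spelling when the character is rejected.
        return char if char.isidentifier() else name.lower()
    return name


print(translate_name("Cos", "numpy"))
print(translate_name("Alpha", "python"))
```

The `isidentifier()` check matters for characters such as \[BlackKing], which Mathematica accepts as a variable but Python does not accept in an identifier.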

The last thing I should mention is that I view handling different releases of Mathematica as a problem that should be dealt with head on. Just as there is a way to specify which kind of output you want to produce, I envision the same kind of way to specify which kind of input is allowed, and that includes things like the Mathematica release, and could include notebook frontend versus non-notebook frontend. Both FoxySheep2 and FoxySheep already provide InputForm and FullForm input parsing. This is just adding more possibilities like that.
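One way to picture such an input selection is a small settings object that the lexer and parser consult. This is only a sketch under assumptions: the class names, the fields, the default release, and the placeholder table below are all invented for illustration and describe no existing FoxySheep or FoxySheep2 API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class InputForm(Enum):
    INPUT_FORM = "InputForm"
    FULL_FORM = "FullForm"


@dataclass(frozen=True)
class InputDialect:
    """Hypothetical description of which kind of input is allowed."""
    form: InputForm = InputForm.INPUT_FORM
    mathematica_version: Tuple[int, int] = (11, 2)  # assumed default release
    notebook_frontend: bool = False


def is_known_character(name: str, dialect: InputDialect,
                       introduced_in: Dict[str, Tuple[int, int]]) -> bool:
    """True if `name` exists in the release selected by `dialect`.

    `introduced_in` maps a character's long name to the first release that
    knows it; curating that data is the hard part discussed in this thread.
    """
    first_release = introduced_in.get(name)
    return first_release is not None and dialect.mathematica_version >= first_release


# Placeholder data only; these are not real release numbers.
EXAMPLE_TABLE = {"ExampleOldCharacter": (1, 0), "ExampleNewCharacter": (12, 0)}

dialect = InputDialect()
print(is_known_character("ExampleOldCharacter", dialect, EXAMPLE_TABLE))
print(is_known_character("ExampleNewCharacter", dialect, EXAMPLE_TABLE))
```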

However, the first step in providing something like this is identifying these differences. So if that is done in this code base or in FoxySheep2, we will respect that.

But in sum, again my thanks for the great work, care and attention that you've already put into this project!

@rljacobson
Collaborator Author

What's funny is that I made a typo! I meant to say, "More information THAN you probably want." :)

It is really nice to finally have someone notice my hard work. Your warm words mean a lot, especially coming from you.

> I am letting bug reports and need drive what gets improved. ...many examples ...don't work. But that's okay.

Yes, I think that's exactly what you should do. It's far more important to get a solid framework that will be usable. And you have the ability to know when and how to not box yourself in design-wise, which is part of why you are so good at what you do.

> Just as there is a way to specify which kind of output you want to produce, I envision the same kind of way to specify which kind of input is allowed and that includes things like Mathematica release, and could include notebook frontend versus non-notebook front end.

YES! That was my vision, too. I have spent a lot of time thinking about how to accurately curate that data. Having a publicly available database of this information would be valuable to other people working on Mathematica tools, also. Given that one has a database, the implementation of this kind of feature should be pretty easy. (Famous last words.)

Back to the named characters, I found my data set mapping long names to Unicode code points. It is likely to be complete for at least Mathematica v11.2. Would you like me to send it to you?

A general answer to handling named characters is to have the lexer recognize the lexical form \[...] and just hand the parser whatever the character represents (Unicode characters, operators, whatever), presumably by consulting a lookup table. It's probably pretty easy to do. (Again, famous last words.) Note that Mathematica makes use of Unicode code points marked for "private use."
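The lookup-table idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the FoxySheep lexer: the table holds only a handful of entries, the function name is made up, and a real lexer would emit tokens (including operator tokens) rather than substitute text, and would also need to handle escaped backslashes and string literals.

```python
import re

# A tiny, hand-picked subset; a real table would come from a curated data set.
NAMED_CHARACTERS = {
    "Alpha": "\u03b1",
    "Pi": "\u03c0",
    "BlackKing": "\u265a",
    "RawEscape": "\u001b",
}

# Matches the lexical form \[Name].
NAMED_CHARACTER_RE = re.compile(r"\\\[([A-Za-z][A-Za-z0-9]*)\]")


def expand_named_characters(source: str) -> str:
    """Replace every \\[Name] in `source` with the character it names."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        try:
            return NAMED_CHARACTERS[name]
        except KeyError:
            raise ValueError(f"unknown named character: \\[{name}]") from None

    return NAMED_CHARACTER_RE.sub(replace, source)


print(expand_named_characters(r"\[Alpha] + \[BlackKing]^2"))
```

Raising on an unknown name, rather than passing it through, makes gaps in the table visible early, which seems useful while the data set is still being curated.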

@rljacobson
Collaborator Author

> Given that one has a database, the implementation of this kind of feature should be pretty easy.

I should probably qualify this. If one starts with the right abstractions from the beginning, it should be pretty easy. That's why I make a big deal about keeping operator precedence easy to change, for example.

@rocky
Member

rocky commented Aug 6, 2020

> I found my data set mapping long names to Unicode code points. It is likely to be complete for at least Mathematica v11.2. Would you like me to send it to you?

Yes, please do. Just add it as a file of some suitable type in this repository or in FoxySheep classic.
