Ruby 2.0.0: invalid multibyte escape: /^\xFF\xFE/ (SyntaxError) #5

Open
pjg opened this issue Apr 26, 2013 · 14 comments · May be fixed by fairtilizer/vpim#1

Comments

@pjg

pjg commented Apr 26, 2013

In .bundle/gems/vpim-0.695/lib/vpim/vcard.rb:678.

And, well, "Application could not be started".
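For context, a minimal sketch of why this literal fails and one way around it. It assumes the source file is read as UTF-8 (the Ruby 2.0 default) and uses the `n` regexp flag to make the pattern binary. The constant and method names are hypothetical, and `\A` is used here in place of the original `^`.

```ruby
# In a UTF-8 source file the bytes FF FE are not valid UTF-8, so the literal
#   /^\xFF\xFE/
# is rejected at parse time with "invalid multibyte escape".

# The /n flag marks the regexp as binary (ASCII-8BIT), so raw byte escapes are allowed.
UTF16_LE_BOM_RE = /\A\xFF\xFE/n

# Hypothetical helper: true if the input starts with the bytes FF FE.
def le_bom?(raw)
  # Match against a binary copy so the input's own encoding label cannot
  # cause an Encoding::CompatibilityError.
  binary = raw.dup.force_encoding(Encoding::BINARY)
  !(binary =~ UTF16_LE_BOM_RE).nil?
end

puts le_bom?("\xFF\xFEB\x00")
puts le_bom?("BEGIN:VCARD")
```

This only gets the file to parse and the check to run; it says nothing about what should happen to the string afterwards.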

@henrycatalinismith
Contributor

Hey @pjg

There are a few forks of vpim here and there which claim to add Ruby 2.0.0 compatibility. For example, a recent pull request by @gdjamerson went into this.

Until @sam-github gets around to handling this in the core vpim gem, I'd recommend trying out some of the forked versions in the meantime.

@pjg
Author

pjg commented Apr 30, 2013

Yeah, thanks! I've already found the vcard gem, which does exactly what I need.

@dylanjha

Any update on this? I'm currently experiencing this issue.

@henrycatalinismith
Contributor

There are a bunch of forks of vpim that attempt to fix this problem in a myriad of ways, and I've written a summary of them. A lot of people, such as @pjg, are doing fine with the vcard gem, which is a vpim fork that removes all the non-vCard functionality and simply removes the code causing the error. I've also rolled my own fork which replaces the erroring code with `string = string.force_encoding("utf-8")`, and it's been working just fine.

The downside is that folks generally cut out all the non-vCard functionality, such as the iCal stuff, in their forks. And some of them abandon support for older versions of Ruby. So watch out if either of those things is a relevant constraint in your case. You're probably going to need to shop around for a fork, though, because the guy who built this seems to have some other stuff going on at the moment and there's not really much movement on this issue. I have projects that I prioritise lower than others too so I know the feel.

On that topic: hey @sam-github, if you can describe your ideal pull request for this issue - in terms of solution approach, minimum Ruby version support, etc - I will be happy to put in the time to make it happen. Sound good?

@sam-github
Owner

I think it should work back to ruby1.8, which is still standard in many
environments, would you agree?

And I know the tests don't all pass anymore, but at least the ones that
test this aspect should pass.

Shockingly, I've never received a pull request to fix this...

I shopped around through the forks for an hour once, thinking surely
someone must have fixed it... but no, just awful hacks, even though what
the code does is described in a great big comment block just above the
regex. That comment seemed to be completely ignored on all the forks I saw.

Basically, it's common to get strings loaded from a file where you don't know
what the encoding is. It's a huge pain. It doesn't usually show up when you
get vcards over the wire, because http and smtp carry the charset in the
wrapper, so you know what you have. But Apple in particular writes vcards
to disk as 2-byte unicode, with no BOM at the beginning, and the endian-ness
depends on the machine (ppc and intel are opposite). Luckily, it's pretty
easy to tell what the encoding is:

starts with 1 byte 'B': it's ascii or utf-8
starts with 2 bytes, a zero byte and 'B': it's ucs-2/utf-16, so convert to
utf-8 as LE or BE, depending on the order of those two bytes

What's right in ruby1.9? As I understand 1.9, in theory the string coming in
already has the right encoding... so converting it to utf-8 would be the
right thing as a first step, a no-op if it's already utf-8 or ascii. But then,
you need to check, was it really utf-8? And if it wasn't, but was ucs-2, set
the encoding and convert it correctly to utf-8.
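A rough sketch of the detection described above, assuming Ruby 1.9.3 or later. This is not vpim's code: `vcard_to_utf8` is a hypothetical helper, and it generalises "starts with 'B'" to "first or second byte is zero", which is an assumption about what the comment block intends.

```ruby
# Hypothetical helper: NOT vpim's actual implementation.
# Takes bytes of unknown encoding and returns a string labelled UTF-8.
def vcard_to_utf8(raw)
  data   = raw.dup.force_encoding(Encoding::BINARY)
  first  = data.getbyte(0)
  second = data.getbyte(1)

  utf16 =
    if first == 0xFE && second == 0xFF
      data = data.byteslice(2..-1)   # drop the big-endian BOM
      Encoding::UTF_16BE
    elsif first == 0xFF && second == 0xFE
      data = data.byteslice(2..-1)   # drop the little-endian BOM
      Encoding::UTF_16LE
    elsif first == 0x00 && second
      Encoding::UTF_16BE             # zero byte, then a character such as 'B'
    elsif first && second == 0x00
      Encoding::UTF_16LE             # a character such as 'B', then a zero byte
    end

  if utf16
    # Relabel the bytes with their real encoding, then transcode.
    data.force_encoding(utf16).encode(Encoding::UTF_8)
  else
    # Assumed to be ASCII or UTF-8 already, so only the label changes.
    data.force_encoding(Encoding::UTF_8)
  end
end

le = "BEGIN:VCARD".encode("UTF-16LE").force_encoding("BINARY")
puts vcard_to_utf8(le)
```

The sketch does not validate the result, so the "was it really utf-8?" check would still need something like `valid_encoding?` on the returned string, and truncated UTF-16 input will raise during `encode`.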

mikezter added a commit to mikezter/vpim that referenced this issue Jul 26, 2013
This fixes sam-github#5. Ruby 2.0 assumes UTF-8 sourcefile encoding.
@grosser

grosser commented Sep 13, 2013

FYI, I think all new apps are on ruby 2 and most older apps are on ruby 1.9 by now, so dropping 1.8 support is not a big deal, and 1.8 users can still use the old versions of the gem. So please merge those fixes + drop 1.8 & release :)

@statianzo

Also, 1.8.7 was retired in June.

@sam-github
Owner

OK, I'm cool with only 1.9/2.0 support.

@henrycatalinismith
Contributor

I think this is a good call. I spent a bit of time trying to make that BOM removal code play nice with 1.8, 1.9 and 2.0, and it wasn't very fun.

@grosser

grosser commented Sep 25, 2013

So how about merging the 1.9/2.0 compatibility pulls?

On Wed, Sep 25, 2013 at 1:57 AM, Henry Smith wrote:

I think this is a good call. I spent a bit of time trying to make that BOM
removal code play nice with 1.8, 1.9 and 2.0, and it wasn't very fun.



@sam-github
Owner

None of them work :-( In particular, they ignore the really nice comment block at the top explaining how to detect the various encoding forms. I didn't add that because I was bored... address books I interact with write vcards as ucs-2, with no BOM (OS X, looking at you). Just forcing the input encoding to utf-8 doesn't cut it if it isn't utf-8. There must be a better way to do this in ruby 2, but I've never seen a PR that tried. h2s almost got this fixed, though, so I'll try to finish it.
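To illustrate the point about forcing the encoding, here is a small sketch that assumes a little-endian UCS-2/UTF-16 vCard read from disk as raw bytes. The variable names are made up for the example. `force_encoding` only changes the label on the bytes, while `encode` actually transcodes them.

```ruby
# Simulate raw bytes from a little-endian UTF-16 file with no BOM.
raw = "BEGIN:VCARD".encode(Encoding::UTF_16LE).force_encoding(Encoding::BINARY)

# Relabelling: the bytes are untouched, so the zero bytes stay in the string.
relabelled = raw.dup.force_encoding(Encoding::UTF_8)

# Transcoding: label the bytes with their real encoding, then convert.
transcoded = raw.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)

puts relabelled.valid_encoding?       # true, zero bytes are legal UTF-8
puts relabelled == "BEGIN:VCARD"      # false
puts relabelled.bytesize              # 22
puts transcoded == "BEGIN:VCARD"      # true
```

Note that `valid_encoding?` does not catch this case, because a UTF-16 encoding of ASCII text happens to be valid UTF-8 with embedded zero bytes.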

@mikezter

IMHO Vpim should not be concerned with encodings in any way.
However, it might help to move those detection sequences to a byte Array,
instead of using a String.
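One way to read the byte Array suggestion, as a sketch only: keep the detection sequences as arrays of integers and compare them with the leading bytes of the input, so no regexp or string encoding is involved. The constant and method names here are hypothetical and not taken from vpim.

```ruby
# Hypothetical constants: detection sequences kept as plain byte arrays.
UTF16_LE_BOM = [0xFF, 0xFE].freeze
UTF16_BE_BOM = [0xFE, 0xFF].freeze

# Returns :utf16le, :utf16be, or nil when no BOM is present.
def leading_bom(raw)
  # byteslice works on bytes regardless of the string's encoding label.
  head = raw.byteslice(0, 2).bytes.to_a
  case head
  when UTF16_LE_BOM then :utf16le
  when UTF16_BE_BOM then :utf16be
  end
end

p leading_bom("\xFF\xFEB\x00")
p leading_bom("BEGIN:VCARD")
```

Because the comparison is between integer arrays, the source file's encoding no longer matters for the detection step.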

On Thu, Sep 26, 2013 at 4:42 AM, Sam Roberts wrote:

None of them work :-( In particular, they ignore the really nice comment
block on the top explaining how to detect various encoding forms. I didn't
add that because I was bored... address books I interact with write vcards
as ucs-2, with no BOM (OS X, looking at you). just forcing the input
encoding to utf-8 doesn't cut it, if it isn't utf-8. There must be a better
way to do this in ruby 2, but I've never seen a PR that tried. h2s almost
got this fixed, though, I'll try to finish it.



@henrycatalinismith
Contributor

I see there's been some movement on this in the commit log. I'm gonna spend some time testing it over the weekend, and hopefully start using vanilla vpim once again! Nice one Sam :)

@graudeejs

Fast forward to 2015, and the issue still persists. Sad.
