Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timestamp parse error makes entire id3 tag unreadable #462

Closed
milesegan opened this issue Sep 15, 2024 · 10 comments · Fixed by #463
Closed

Timestamp parse error makes entire id3 tag unreadable #462

milesegan opened this issue Sep 15, 2024 · 10 comments · Fixed by #463
Labels
bug Something isn't working
Milestone

Comments

@milesegan
Copy link

milesegan commented Sep 15, 2024

Reproducer

use std::path::Path;

use lofty::config::{ParseOptions, ParsingMode};
use lofty::file::TaggedFileExt;
use lofty::probe::Probe;
use lofty::tag::Accessor;

fn main() {
    // the error seems to occur with strict or relaxed parsing mode
    let parsing_options = ParseOptions::new().parsing_mode(ParsingMode::Relaxed);
    let path = Path::new("sample.mp3");
    let file = Probe::open(&path)
        .expect("can't probe")
        .options(parsing_options)
        .read()
        .expect("can't read");
    let tag = file.primary_tag().expect("no tag");
    println!("{:?}", tag.artist());
}

Summary

Trying to parse the attached file gives this error:

can't read: BadTimestamp("Expected a separator")

Expected behavior

If the timestamp is invalid could it be skipped in relaxed parsing mode?

ffprobe output:

Input #0, mp3, from '/Users/miles/tmp/sample.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2015-07-08 16:53:04
    encoder         : Lavf54.20.4
    title           : Freedom
    artist          : Pharrell Williams
    album           : Freedom - Single 
    title-sort      : Freedom
    album-sort      : Freedom - Single 
    artist-sort     : Pharrell Williams
  Duration: 00:02:44.49, start: 0.000000, bitrate: 130 kb/s
  Stream #0:0: Audio: mp3 (mp3float), 44100 Hz, stereo, fltp, 128 kb/s
  Stream #0:1: Video: mjpeg (Baseline), yuvj444p(pc, bt470bg/unknown/unknown), 600x600 [SAR 72:72 DAR 1:1], 90k tbr, 90k tbn (attached pic)
      Metadata:
        comment         : Other

Assets

sample.mp3.zip

@milesegan milesegan added the bug Something isn't working label Sep 15, 2024
@Serial-ATA
Copy link
Owner

If the timestamp is invalid could it be skipped in relaxed parsing mode?

That was what I originally intended to happen, and completely forgot to implement it. 😄

2015-07-08 16:53:04

I've never seen a timestamp like this before, though (should be "2015-07-08T16:53:04"). Do you know where the file came from? The parser can already handle multiple common spec violations, and this may be one that should be supported if it comes from some popular software/platform.

Besides that though, yeah I do need to fix errors for BestAttempt/Relaxed parsing.

@Serial-ATA Serial-ATA added this to the 0.22.0 milestone Sep 15, 2024
@milesegan
Copy link
Author

I’ll try to find out the origin. A user sent it to me.

@milesegan
Copy link
Author

Unfortunately I don’t know the origin of this file. I’m not sure if it’s likely to be a common occurrence.

@Serial-ATA
Copy link
Owner

In that case, I don't know if I should include an exception for that mistake. I'll have it just return what it can parse. So 2015-07-08 16:53:04 will just return a timestamp with year, month, day and discard the rest (when not using ParsingMode::Strict).

@milesegan
Copy link
Author

That sounds like a perfectly reasonable best effort result. Thanks for taking a look at it!

@hasezoey
Copy link

i also just ran into this (BadTimestamp("Expected a separator")), using the default BestAttempt mode and also trying Relaxed, the value in question (to my knowledge, the only one in the file ffmpeg shows):
TYER : 20180923
this value comes from a old version of yt-dl (like 2021 or before)
(mp3 file, only concerning the id3v2 frame, not id3v1)

the git version (as of 4a2bcf5) does not error out, but also does not parse the timestamp

@Serial-ATA
Copy link
Owner

Hello @hasezoey!

TYER : 20180923

That's not a valid TYER frame, it should contain only the year. In ID3v2.3, you should have a combination of TYER (year) + TDAT (day/month).

does not parse the timestamp

let Frame::Timestamp(year_frame) = year_frame else {
log::warn!("TYER frame is not a timestamp frame, retaining.");
tag.insert(year_frame);
return;
};
// This is not a TYER frame
if year_frame.timestamp.month.is_some() {
return;
}

Right now, when upgrading ID3v2.3 to ID3v2.4 I have a check to discard the frame if it isn't actually a valid TYER. I guess I could ignore it for BestAttempt/Relaxed, never seen this before.

@hasezoey
Copy link

I guess I could ignore it for BestAttempt/Relaxed, never seen this before.

on newer versions of yt-dl(p) it is replaced with the TDRC tag, a reproduction can be done with:

# yt-dlp version 2024.08.06
yt-dlp -x --audio-format mp3 --embed-metadata -o "out.mp3" https://www.youtube.com/watch\?v\=XCZcbUcaxoM

ffprobe out.mp3

hexdump -C out.mp3 | less

00000040  20 55 6e 69 74 79 00 54  44 52 43 00 00 00 0a 00  | Unity.TDRC.....|
00000050  00 03 32 30 31 38 30 37  32 36 00 54 58 58 58 00  |..20180726.TXXX.|

(PS: one of my uploaded videos)

this also currently triggers the lofty error.

i dont actually care that much about that tag, only really came here because of the whole probe failing, but wanted to provide more information.

@Serial-ATA
Copy link
Owner

20180726

Dates without separators should be supported since I fixed #452 (not published yet).

I haven't had much time to work on Lofty lately, I'd like to get #218 in before getting 0.22.0 released, since it'll probably be my last major update for a bit.

Related to that, if you could help test that PR at all it'd be much appreciated. See #218 (comment)

@hasezoey
Copy link

Dates without separators should be supported since I fixed #452 (not published yet).

sorry, didnt re-test with the git version; can confirm that the TDRC is correctly parsed there, though the TYER is still not present (i guess to be expected as its invalid - at least in the id3v2 section).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants