Request: Add a config to avoid rendering images as base 64 #397

Open
alkampfergit opened this issue Dec 18, 2024 · 3 comments

Comments

@alkampfergit

I have HTML files that contain inline image data. When I convert them to Markdown, the images are embedded as Base64, which makes the resulting Markdown very large.

An option to prevent this would be nice.

@mysticmind
Owner

Acknowledged, I will see what we can best do here. I am currently traveling, so I will only be able to get back to you on this in a couple of days.

@mysticmind
Owner

mysticmind commented Dec 18, 2024

In the interim, you can pre-process your HTML content with HtmlAgilityPack, as below, to remove the img elements that carry inline image data:

using System;
using HtmlAgilityPack;

string html = @"
    <html>
        <body>
            <img src='data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA' />
            <img src='https://example.com/image.jpg' />
        </body>
    </html>";

// Load HTML document
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);

// Select all <img> tags
var imageNodes = document.DocumentNode.SelectNodes("//img");

if (imageNodes != null)
{
    foreach (var img in imageNodes)
    {
        string src = img.GetAttributeValue("src", string.Empty);

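        // Ordinal, case-insensitive match so the check does not depend on culture or on the casing of the scheme
        if (src.TrimStart().StartsWith("data:image/", StringComparison.OrdinalIgnoreCase))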
        {
            // Remove the <img> node from the HTML
            img.Remove();
        }
    }
}

// Save or display the cleaned HTML
string cleanedHtml = document.DocumentNode.OuterHtml;
Console.WriteLine(cleanedHtml);
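
The same pre-processing can be wrapped in a small helper and chained with the Markdown conversion. This is only a minimal sketch, assuming the HtmlAgilityPack and ReverseMarkdown NuGet packages are referenced; `InlineImageStripper` and `StripInlineImages` are hypothetical names chosen for illustration and are not part of either library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (not part of HtmlAgilityPack or ReverseMarkdown).
public static class InlineImageStripper
{
    // Returns the HTML with every <img> whose src is a data:image/ URI removed.
    public static string StripInlineImages(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null when no node matches the XPath.
        var imageNodes = document.DocumentNode.SelectNodes("//img");
        if (imageNodes == null)
        {
            return html;
        }

        // Iterate over a copy so removals cannot disturb the enumeration.
        foreach (var img in imageNodes.ToList())
        {
            string src = img.GetAttributeValue("src", string.Empty).TrimStart();
            if (src.StartsWith("data:image/", StringComparison.OrdinalIgnoreCase))
            {
                img.Remove();
            }
        }

        return document.DocumentNode.OuterHtml;
    }
}

public static class Program
{
    public static void Main()
    {
        string html =
            "<p>Logo: <img src='data:image/png;base64,iVBORw0KGgo=' /> " +
            "<img src='https://example.com/image.jpg' /></p>";

        // Strip the inline images first, then convert what is left.
        string cleaned = InlineImageStripper.StripInlineImages(html);
        string markdown = new ReverseMarkdown.Converter().Convert(cleaned);

        Console.WriteLine(markdown);
    }
}
```

Only the image with the `data:` URI is dropped; the image that points at a regular URL is left in place for the converter to handle as usual.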

@alkampfergit
Author

That is actually my current solution :) I pre-process with HtmlAgilityPack and then convert to Markdown.

Thanks.
