blip2_qformer train question #779

Codingfarmer-hkl · 2025-01-02T08:04:36Z

    ##================= Image Captioning ========================##
    decoder_input_ids = text_tokens.input_ids.clone()
    decoder_input_ids[:, 0] = self.tokenizer.bos_token_id
    labels = decoder_input_ids.masked_fill(
        decoder_input_ids == self.tokenizer.pad_token_id, -100
    )

    query_atts = torch.ones(query_tokens.size()[:-1], dtype=torch.long).to(
        image.device
    )
    attention_mask = torch.cat([query_atts, text_tokens.attention_mask], dim=1)
    lm_output = self.Qformer(
        decoder_input_ids,
        attention_mask=attention_mask,
        past_key_values=query_output.past_key_values,
        return_dict=True,
        labels=labels,
    )

    loss_lm = lm_output.loss

    return BlipOutput(
        loss=loss_itc + loss_itm + loss_lm,
        loss_itc=loss_itc,
        loss_itm=loss_itm,
        loss_lm=loss_lm,
    )
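
My question is about this line:

    attention_mask = torch.cat([query_atts, text_tokens.attention_mask], dim=1)

The text part of `attention_mask` comes straight from `text_tokens.attention_mask`. For the captioning loss, shouldn't the text attention mask be causal rather than bidirectional?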
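
To make the distinction concrete, here is a minimal sketch of the two kinds of mask. The concatenated 2D mask of shape (batch, num_query + text_len) only says which positions are real tokens; a causal constraint needs a per-position mask of shape (batch, text_len, num_query + text_len). The helper `build_prefix_causal_mask` is hypothetical and not part of LAVIS. It assumes the query prefix should be visible to every text token and that text tokens may only see earlier text tokens. Whether `self.Qformer` builds something like this internally from the 2D mask is not shown in the snippet above.

```python
import torch


def build_prefix_causal_mask(padding_mask: torch.Tensor, num_query: int) -> torch.Tensor:
    """Hypothetical helper: expand a 2D padding mask into a prefix + causal mask.

    padding_mask: (batch, num_query + text_len), 1 = real token, 0 = padding.
    Returns: (batch, text_len, num_query + text_len), where entry [b, i, j] is 1
    if text position i may attend to key position j.
    """
    batch, total_len = padding_mask.shape
    text_len = total_len - num_query

    # Every text position may attend to all query (prefix) positions.
    prefix = torch.ones(
        text_len, num_query, dtype=padding_mask.dtype, device=padding_mask.device
    )
    # Lower-triangular part: text position i sees text positions 0..i only.
    causal = torch.tril(
        torch.ones(text_len, text_len, dtype=padding_mask.dtype, device=padding_mask.device)
    )
    allowed = torch.cat([prefix, causal], dim=1)  # (text_len, total_len)

    # Combine with the padding mask so padded keys stay hidden from every position.
    return allowed.unsqueeze(0) * padding_mask.unsqueeze(1)


# Toy example: 2 query tokens, 4 text tokens, the last text token is padding.
query_atts = torch.ones(1, 2, dtype=torch.long)
text_attention_mask = torch.tensor([[1, 1, 1, 0]])
padding_mask = torch.cat([query_atts, text_attention_mask], dim=1)  # same pattern as in the snippet
mask = build_prefix_causal_mask(padding_mask, num_query=2)
print(mask[0])
```

Each row of the printed mask corresponds to one text position. If the mask were bidirectional, every row would simply equal the padding mask.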
