Marco Mistretta

About

I'm a PhD student in Artificial Intelligence at MICC, University of Florence, advised by Prof. Andrew D. Bagdanov and Prof. Marco Bertini.

I recently completed an Applied Scientist Internship at Amazon (RufusX Team, London), developing large-scale multimodal generative models for the Amazon Rufus initiative, impacting millions of users worldwide.

Research Interests

My research focuses on Multimodal Vision-Language Models (VLMs) like CLIP and their practical applications. I work on fundamental problems in:

Multimodal Representation Learning: Understanding and bridging modality gaps in vision-language models
Prompt Learning & Knowledge Distillation: Improving zero-shot generalization without labeled data
Continual Learning: Enabling models to learn incrementally while preventing catastrophic forgetting
Few-Shot & Test-Time Adaptation: Efficient model adaptation with minimal supervision

Key Contributions

I have published 3 first-author papers at top-tier AI conferences:

ICLR 2025 (main conference) — Exposing intra-modal misalignment in CLIP via modality inversion
ECCV 2024 (main conference) — Unsupervised prompt learning via knowledge distillation
NeurIPS 2023 (workshop) — Incremental fine-tuning for biomedical vision-language models

My work advances the state of the art in vision-language understanding and has practical applications in medical imaging, open-vocabulary recognition, and continual learning systems.

News

Jul 2025 – Dec 2025
Internship at the Amazon RufusX team in London.

Jan 2025
A paper on multimodal VLM representations was accepted at ICLR 2025.

Sep 2024
Presented a paper on prompt learning at ECCV 2024.

Aug 2024
KDPL source code is finally available!

Dec 2023
Presented a paper on continual learning at NeurIPS 2023 (workshop).

Work Experience

Applied Scientist Intern – Amazon (RufusX Team, London)
July 2025 — December 2025
Worked on Generative AI and Multimodal Large Language Models (MLLMs) as part of the Amazon Rufus initiative. Fine-tuned, evaluated, and deployed large-scale multimodal models impacting millions of customers, advancing multimodal reasoning and generation at scale.

Education

PhD student in Artificial Intelligence
Nov 2023 – Present
University of Florence, Florence, Italy
Topic: Multimodal Vision-Language Models, Incremental Learning, Prompt Learning.

M.S. in Artificial Intelligence
Sep 2021 – Jul 2023
University of Florence, Florence, Italy
Thesis: "RE-Tune - Incremental Fine-Tuning of Biomedical Vision-Language Models"

B.S. in Computer Science and Engineering
Sep 2018 – Sep 2021
University of Florence, Florence, Italy
Thesis: "Scarlatti-Gen - AI-Driven Sonata Generation Using Weighted Graphs and CNNs"

Teaching and Mentoring

Teaching Assistant, University of Florence
Jan 2024, Jan 2025
Delivering interactive lessons on C/C++ and Python to over 200 bachelor's students.

Thesis Co-Supervisor, University of Florence
Apr 2024, Sep 2024
"Mitigating Catastrophic Zero-shot Forgetting in CLIP via Distillation of Low-Rank Adapters from Learned Prompts". Proposed a novel method for efficient few-shot fine-tuning of CLIP models that mitigates catastrophic forgetting and preserves zero-shot capabilities by distilling learned prompts into LoRA adapters.

Student Ambassador, University of Florence
Jan 2020, Dec 2020
Mentoring students on exams, projects, internships, and career development.

Curriculum Vitae

Browse my curriculum vitae below or download the PDF version.

Contact

Institutional Email
marco.mistretta@unifi.it
Personal Email
marcomistretta99@gmail.com

LinkedIn
Marco Mistretta
GitHub
marcomistretta
Google Scholar
Marco Mistretta
Instagram
marcomistre99
Twitter
mistretta_marco

Recent Publications

Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion

KDPL: Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

RE-tune: Incremental Fine Tuning of Biomedical Vision-Language Models for Multi-label Chest X-ray Classification
