Microsoft published a research paper this week highlighting a new AI model called VASA-1 that can transform a single picture and audio clip of a person into a realistic video of them lip-syncing — with facial expressions, head movements, and all.

The researchers demonstrated the model on AI-generated images from generators like DALL·E 3, which they paired with audio clips. The results are still images turned into videos of talking faces.

The researchers built on technology from competitors such as Runway and Nvidia, but state in the paper that their approach produces higher-quality, more realistic results and “significantly outperforms” existing methods.

Related: Adobe’s Firefly Image Generator Was Partially Trained on AI Images From Midjourney

The researchers said the model can take audio of any length and generate a talking-face video synchronized to the clip.

The only image that wasn’t AI-generated that the researchers experimented with was the Mona Lisa. They made the iconic image lip-sync to Anne Hathaway’s “Paparazzi,” which starts with the lines “Yo I’m a paparazzi, I don’t play no yahtzee.”
[Image: A screenshot of the video mid-frame. Credit: Entrepreneur]

The Mona Lisa was one example of a photo input the AI model was not trained on but could animate anyway. The model could also handle artistic photos, singing audio, and speech in languages other than English.

The researchers emphasized that the model can run in real time, sharing a demo video that showed it instantly animating images with head movements and facial expressions.

Deepfakes, digitally altered media that can spread misinformation or use a person’s likeness without permission, are a risk posed by advanced AI that can generate convincing media from relatively few reference images.

Related: Tennessee Passes Law Protecting Musicians From AI Deepfakes

Microsoft addressed that concern generally in the paper, with the researchers stating, “We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection.”

The researchers noted that the technique also has potentially positive applications, such as improving accessibility and enhancing education.

Google demoed a similar research project last month, showcasing an AI that can take a photo and create a video from it driven by the user’s voice. The AI was able to add head movements, blinks, and hand gestures.


