How “real” is artificial intelligence (AI) in the creative world? That seems to be the question everyone is trying to answer. At the recent IBC show, Avid highlighted a range of possibilities, including new work combining Avid MediaCentral and AI. The demonstrations covered both existing capabilities and new areas of exploration designed to elevate the creative experience by reducing the time spent on repetitive tasks.
In the spirit of openness and collaboration, we have published papers, including one in the SMPTE Journal on “Machine Learning Applied to Media Libraries for Insights, Search, and Segmentation,” and we are participating in industry conferences, including the SMPTE Media Technology Summit in Hollywood in October. At the event, Avid staff will present a variety of papers on ongoing developments and demonstrate some of the latest AI features in products such as Media Composer and MediaCentral.
Avid customers have used AI within the Avid portfolio for several years, through integrations with cognitive services providers that deliver metadata enrichment for:
- Facial recognition
- Scene detection
- Speech-to-text transcription
- Optical Character Recognition (OCR)
- Content moderation
- Audio effects
- Frame patterns
This metadata enrichment can be triggered manually or performed automatically, depending on system configurations.
In addition, capabilities such as ScriptSync and PhraseFind in Media Composer, and phonetic search in MediaCentral, are also AI-powered. But all of this has been around for years. So far, so what?
Here’s what: Avid engineers have established a framework of research and experimentation on which to build AI-enhanced capabilities across a range of our video, audio, and asset management solutions. Our Research and Advanced Development Lab (RADLab) is bringing potential benefits to light, not just for us at Avid but also for our customers. Through this effort, a wave of new possibilities has opened up, as showcased at IBC 2023 in Amsterdam in September.
First, we unveiled the following web-based MediaCentral | Cloud UX proofs of concept:
- A recommendation engine that brings semantic content discovery, where the AI engine automatically suggests footage and B-roll to use with stories.
- A summary and transcription engine, which provides an overview of the spoken content of interviews, edited sequences, or even whole shows. It can also generate full transcripts of interviews, including automatic translation, and store the transcript, with timecode information, as markers on the clip.
- An integrated chatbot that responds to user queries with suggestions on how to complete tasks, along with links to product help and other documentation.
Let’s look at each example in more detail, starting with the recommendation engine.
As the user types text into a note, the system automatically offers related video suggestions.
It is important to understand what is happening where and when.
As we type text in the note, a local semantic AI engine reads and interprets it. It then compares the text with footage on our system that has already been analyzed and indexed by our own local AI image semantic indexer. Images are therefore suggested based not only on the literal text, but also on what the text means in context.
A-roll suggestions come from AI analysis of the text using semantic content discovery technologies: rather than just looking for literal words, the engine uses the contextual meaning of the text to find related media.
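Avid has not published how its semantic engine works internally. As a rough illustration of the general idea, the sketch below ranks clips that were indexed ahead of time by cosine similarity between a vector embedding of the note text and an embedding of each clip. The hand-written three-dimensional word vectors, the clip IDs, and the `embed` and `suggest` helpers are all hypothetical stand-ins for a learned text and image embedding model.

```python
import math

# Hypothetical toy embeddings over three made-up "concept" dimensions
# (water, fire, crowd). A real system would use a learned embedding model.
WORD_VECTORS = {
    "flood": (1.0, 0.0, 0.0),
    "river": (0.9, 0.0, 0.1),
    "water": (1.0, 0.0, 0.0),
    "rain": (0.8, 0.0, 0.0),
    "wildfire": (0.0, 1.0, 0.0),
    "smoke": (0.0, 0.9, 0.1),
    "protest": (0.0, 0.0, 1.0),
    "crowd": (0.1, 0.0, 0.9),
}

def embed(text):
    """Average the vectors of known words; unknown words are ignored."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def suggest(note, clip_index, top_k=2, min_score=0.5):
    """Return up to top_k clip IDs whose similarity to the note is >= min_score."""
    query = embed(note)
    scored = [(cosine(query, vec), clip_id) for clip_id, vec in clip_index.items()]
    scored.sort(reverse=True)
    return [clip_id for score, clip_id in scored[:top_k] if score >= min_score]

# Clips are indexed once, in advance (here from hypothetical descriptions).
clip_index = {
    "clip_river": embed("river water"),
    "clip_fire": embed("wildfire smoke"),
    "clip_march": embed("protest crowd"),
}

print(suggest("Flood warnings after heavy rain", clip_index))  # ['clip_river']
```

The clip described as "river water" never mentions "flood", yet it ranks first because both texts land close together in the embedding space. That is the difference between semantic matching and literal keyword search.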
For B-roll, the process is slightly different. Our own AI model generates alternative sentences based on the script, and these are used to widen the search and offer more relevant footage.

Second, for the summary and transcription engine proof of concept, we create an audio proxy, which we then send to our own on-premises transcription service.
We then upload that transcript to a Microsoft cloud service that uses OpenAI technology to create the summary.
The summary engine can produce short, medium, and comprehensive summaries.
The local transcription service generates the transcript with timecode information, and the transcript can be automatically translated into other languages. Because the timecode information is stored, the transcript can easily be added as markers to the source clip itself. These markers are not only searchable but also available in Avid Media Composer through the MediaCentral | Cloud UX panel.
The transcript can be added as markers to the clip.
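Storing timecode with each transcript segment is what makes the marker step straightforward. A minimal sketch, assuming the transcription service returns hypothetical `(start_seconds, text)` segments and the clip uses an integer frame rate: each segment becomes a marker with a non-drop-frame `HH:MM:SS:FF` timecode, and the markers can then be searched by text. The `Marker` type and the function names are illustrative only, not a MediaCentral API, and drop-frame rates such as 29.97 fps would need extra handling.

```python
from dataclasses import dataclass

@dataclass
class Marker:
    """Hypothetical marker: a non-drop-frame timecode plus transcript text."""
    timecode: str  # "HH:MM:SS:FF"
    text: str

def seconds_to_timecode(seconds, fps=25):
    """Convert an offset in seconds to non-drop-frame timecode at an integer fps."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"

def transcript_to_markers(segments, fps=25, start_offset=0.0):
    """Turn (start_seconds, text) segments into markers; start_offset is the
    clip's start timecode expressed in seconds (assumption)."""
    return [Marker(seconds_to_timecode(start + start_offset, fps), text.strip())
            for start, text in segments]

def search_markers(markers, term):
    """Case-insensitive substring search across marker text."""
    term = term.lower()
    return [m for m in markers if term in m.text.lower()]

segments = [
    (0.0, "Welcome to the show."),
    (4.52, "Tonight we look at the flood response."),
    (65.2, " The river peaked overnight. "),
]
markers = transcript_to_markers(segments)
for m in search_markers(markers, "flood"):
    print(m.timecode, m.text)  # 00:00:04:13 Tonight we look at the flood response.
```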
It is critically important to understand that this is done securely.
With the Azure OpenAI Service, your prompts (inputs) and completions (outputs), your embeddings, and your training data:
- are NOT available to other customers
- are NOT available to OpenAI
- are NOT used to improve OpenAI models
- are NOT used to improve any Microsoft or third-party products or services
- are NOT used to automatically improve Azure OpenAI models for your use in your resource (The models are stateless, unless you explicitly fine-tune models with your training data)
Your fine-tuned Azure OpenAI models are available exclusively for your use. The Azure OpenAI Service is fully controlled by Microsoft; Microsoft hosts the OpenAI models in Microsoft’s Azure environment and the service does NOT interact with any services operated by OpenAI (e.g. ChatGPT, or the OpenAI API).
Avid also demonstrated an AI-powered chatbot at the show, able to answer questions about how to carry out user tasks, drawing on information from the Avid Knowledgebase and from user documentation. The chatbot can suggest step-by-step solutions to problems, as well as guide users to the relevant chapter or paragraph within the documents.
The Avid Ada chatbot in MediaCentral | Cloud UX.
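Avid has not detailed how Ada finds its answers. One common pattern for documentation chatbots is to retrieve the most relevant passages first and then pass them to a language model, which also makes it possible to point users to the exact chapter or paragraph. The retrieval step could be sketched as follows, using simple term overlap in place of the embedding search a production system would more likely use. The section titles, stopword list, and `best_sections` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "how", "do", "i", "in", "of", "my", "and", "is", "for"}

def tokens(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def best_sections(question, sections, top_k=1):
    """Rank (location, passage) pairs by how often they contain question terms."""
    query_terms = set(tokens(question))
    scored = []
    for location, passage in sections:
        counts = Counter(tokens(passage))
        score = sum(counts[t] for t in query_terms)
        if score:
            scored.append((score, location))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [location for _, location in scored[:top_k]]

# Hypothetical documentation sections: (where to send the user, passage text).
docs = [
    ("User Guide > Markers > Adding markers",
     "To add a marker to a clip, open the clip and press the marker button."),
    ("User Guide > Search > Phonetic search",
     "Phonetic search finds spoken words in audio."),
    ("Knowledge Base > Login issues",
     "If you cannot log in, check your credentials."),
]

print(best_sections("How do I add a marker to my clip?", docs))
```

Returning the section location alongside the passage is what lets an assistant link the user to the relevant chapter rather than only paraphrasing it.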
In addition to these Avid MediaCentral and AI developments, Avid also showcased AI capabilities in Media Composer and in Pro Tools.
Artificial intelligence is undoubtedly an exciting, and intimidating, area for development in the creative space. Avid will be at the forefront of fostering creative solutions that enable the media production community to deliver the highest-quality work audiences demand.