Generate podcast video snippets using Node.js, AssemblyAI, and Editframe
With nearly 400 million podcast listeners around the globe, creating and distributing a podcast can be a great way to get your message in front of a wide audience and create awareness for your brand. Producing a podcast is no small feat, however, so if you do decide to take on this project, you’ll want to make sure you’re maximizing your return on investment.
One technique for effectively distributing podcasts is to trim each episode into segments based on the topics discussed, and to generate a title and synopsis for each segment. This strategy goes a long way toward improving indexing and discoverability in places like YouTube, lets you distribute your podcast more effectively on social networks, and helps your audience easily find relevant content.
As you might imagine, though, trimming multiple hours-long podcast episodes into individual segments and then writing a synopsis for each clip can be extremely tedious. But don’t go looking for a digital intern just yet––in this tutorial, we’ll show you how to use Node.js, AssemblyAI, and Editframe to do this programmatically.
Let’s get started!
Required tools
For this project, you’ll need:
- Node.js installed on your machine (v16+)
- An AssemblyAI account (create one here)
- An Editframe API Token (create one here)
- An Ngrok account (create one here)
Set up a Node.js project
- Create a new project folder:
mkdir podcast-audio-snippets-generator
cd podcast-audio-snippets-generator
- Initialize a new Node.js project:
yarn init -y
- Install Express.js to create a small web server that will handle a webhook response from the AssemblyAI API:
yarn add express
- Create a server.js file to house your Express server:
touch server.js
- Paste the code below inside server.js:
const express = require("express");
const app = express();
const port = 3000;

app.get("/", (req, res) => {
  res.send("Hello World!");
});

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`);
});
Add the AssemblyAI API
- Create an audio.js file to send AssemblyAI API requests:
touch audio.js
- Install the Axios and dotenv packages (Axios for sending API requests, dotenv for loading environment variables):
yarn add axios dotenv
- Create an assembly.js file inside a new lib directory:
mkdir lib
touch lib/assembly.js
- Paste the code below inside the lib/assembly.js file:
require("dotenv").config({});
const axios = require("axios");
const assembly = axios.create({
baseURL: "https://api.assemblyai.com/v2",
headers: {
authorization: "YOUR_ASSEMBLY_TOKEN",
"content-type": "application/json",
"transfer-encoding": "chunked",
},
});
module.exports = { assembly };
In the code above, we create a new Axios instance that holds our AssemblyAI API base URL and credentials so we can send requests without repeating that configuration.
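Since we installed dotenv, you could optionally keep the token out of source control by reading it from an environment variable instead of hard-coding it. Here is a minimal sketch of that variant, assuming you add a variable named ASSEMBLYAI_API_KEY to a .env file (the variable name is our choice, not something defined by the original project):

require("dotenv").config();
const axios = require("axios");

// Same Axios instance as above, but the token comes from the environment
const assembly = axios.create({
  baseURL: "https://api.assemblyai.com/v2",
  headers: {
    authorization: process.env.ASSEMBLYAI_API_KEY,
    "content-type": "application/json",
    "transfer-encoding": "chunked",
  },
});

module.exports = { assembly };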
- Paste the code below into audio.js:
const fs = require("fs");
const { assembly } = require("./lib/assembly");

const file = `${__dirname}/podcast-demo.mp3`;

fs.readFile(file, async (err, data) => {
  if (err) return console.error(err);

  // Upload the local audio file to AssemblyAI
  const { data: audioUpload } = await assembly.post("/upload", data);
  console.log(audioUpload);

  // Start a transcription job for the uploaded file
  const transcript = await assembly.post("/transcript", {
    audio_url: audioUpload.upload_url,
    webhook_url: "YOUR_NGROK_URL/webhook",
    iab_categories: true,
    auto_chapters: true,
    boost_param: "high",
    custom_spelling: [],
  });
});
In the code above, we read the podcast audio file and send it to the AssemblyAI API, which gives us an upload URL for the file. We then send this URL to the transcript endpoint on AssemblyAI, which starts the transcription job and registers a webhook URL. Once the job is complete, AssemblyAI will send a POST request to our webhook URL containing the transcript ID, which we can use to fetch the transcription data.
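For reference, the webhook request AssemblyAI sends is a small JSON payload, and our handler only reads its transcript_id field. Here is a rough sketch of the shape with made-up values (the status field is our assumption based on AssemblyAI's documentation):

// Illustrative webhook payload (values are made up; the handler below only uses transcript_id)
const examplePayload = {
  transcript_id: "abc123-example-transcript-id",
  status: "completed",
};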
- Update server.js with a webhook handler:
const express = require("express");
const app = express();
const port = 3000;
const { assembly } = require("./lib/assembly");

app.get("/", (req, res) => {
  res.send("Hello World!");
});

// Parse incoming JSON payloads (the webhook body)
app.use(express.json());

// AssemblyAI calls this endpoint when the transcription job is done
app.post("/webhook", (req, res) => {
  console.log(req.body.transcript_id);
  assembly
    .get(`/transcript/${req.body.transcript_id}`)
    .then(async (response) => {
      console.log(response.data);
    })
    .catch((err) => console.error(err));
  res.sendStatus(200);
});

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`);
});
Let’s break down what the code above is doing:
- First, we import the Axios instance that holds our AssemblyAI API credentials:
const { assembly } = require("./lib/assembly");
- Next, we add Express’s built-in JSON body parser to parse the webhook payload:
app.use(express.json());
- Finally, we add a POST endpoint to handle the webhook from the AssemblyAI API, whose payload contains a transcript job ID. We’ll use this ID later to fetch our transcription data:
app.post("/webhook", (req, res) => {
console.log(req.body.transcript_id);
assembly
.get(`/transcript/${req.body.transcript_id}`)
.then(async (res) => {
console.log(res.data);
})
.catch((err) => console.error(err));
res.sendStatus(200);
});
- Now, run the Express.js server:
node server
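Optionally, before exposing the server with ngrok in the next step, you can sanity-check the webhook route with a small throwaway script. This is a hypothetical helper, not part of the original project; the fake transcript ID will make the AssemblyAI lookup log an error, but a 200 response confirms that routing and JSON parsing work:

// test-webhook.js: a hypothetical helper for local testing only
const axios = require("axios");

axios
  .post("http://localhost:3000/webhook", { transcript_id: "fake-id-for-testing" })
  .then((res) => console.log("Webhook responded with status", res.status))
  .catch((err) => console.error(err.message));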
- Run ngrok to expose our local Express.js server (localhost:3000) at a public URL:
ngrok http 3000
- Update the ngrok URL in the audio.js file:
const transcript = await assembly.post("/transcript", {
  audio_url: audioUpload.upload_url,
  webhook_url: "https://77e8-102-48-82-243.ngrok.io/webhook",
  iab_categories: true,
  auto_chapters: true,
  boost_param: "high",
  custom_spelling: [],
});
- Send the upload and transcription API requests by running audio.js:
node audio.js
Add the Editframe API
Now we’re going to bring the Editframe API into our project to handle the creation of video segments.
- Install the @editframe/editframe-js SDK:
yarn add @editframe/editframe-js
- Create a generate_videos.js file inside the lib directory:
touch lib/generate_videos.js
- Paste the code below into generate_videos.js:
const { Editframe } = require("@editframe/editframe-js");

const editframe = new Editframe({
  clientId: "YOUR_EDITFRAME_CLIENT_ID",
  token: "YOUR_EDITFRAME_TOKEN",
  develop: true, // dev mode to get progress logs in the terminal and open each encoded video in a new tab
});

const generateVideos = async (chapters, wordsArr, categories) => {
  for (const chapter of chapters) {
    let composition = await editframe.videos.new(
      // options
      {
        dimensions: {
          // Height in pixels
          height: 1920,
          // Width in pixels
          width: 1080,
        },
        metadata: {
          headline: chapter.headline,
          gist: chapter.gist,
          summary: chapter.summary,
        },
        // Duration of final output video in seconds
        duration: chapter.end / 1000 - chapter.start / 1000,
      }
    );
    const video = await composition.encode();
    console.log(video);
  }
};

module.exports = { generateVideos };
Let’s dive into what the code above is doing:
- In these lines of code, we import the Editframe SDK and initialize a new Editframe instance:
const { Editframe } = require("@editframe/editframe-js");

const editframe = new Editframe({
  clientId: "YOUR_EDITFRAME_CLIENT_ID",
  token: "YOUR_EDITFRAME_TOKEN",
  develop: true, // dev mode to get progress logs in the terminal and open each encoded video in a new tab
});
- Here, we create a new function that takes in chapters, words, and categories from the AssemblyAI API as its arguments. For each chapter, we create a new video composition that carries metadata (summary, gist, and headline) and has the chapter’s duration. Finally, we encode the video (a sketch of what a chapter object looks like follows the snippet):
const generateVideos = async (chapters, wordsArr, categories) => {
  for (const chapter of chapters) {
    let composition = await editframe.videos.new(
      // options
      {
        dimensions: {
          // Height in pixels
          height: 1920,
          // Width in pixels
          width: 1080,
        },
        metadata: {
          headline: chapter.headline,
          gist: chapter.gist,
          summary: chapter.summary,
        },
        // Duration of final output video in seconds
        duration: chapter.end / 1000 - chapter.start / 1000,
      }
    );
    const video = await composition.encode();
    console.log(video);
  }
};
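The chapter objects consumed here come from AssemblyAI’s auto_chapters output. A rough sketch of the fields we rely on, with illustrative values (the timestamps are in milliseconds and are invented for the example):

// Illustrative auto_chapters entry (values are invented for the example)
const exampleChapter = {
  headline: "Why short clips help podcast discoverability",
  gist: "Clips and discoverability",
  summary: "The hosts discuss splitting episodes into topic-based segments and sharing them online.",
  start: 120000, // 2:00 into the episode, in milliseconds
  end: 245000, // 4:05 into the episode, in milliseconds
};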
- Now, let’s update the webhook API POST endpoint handler with a function to generate our videos:
// import the generateVideos function from lib/generate_videos.js
const { generateVideos } = require("./lib/generate_videos");

// update the POST API handler
app.post("/webhook", (req, res) => {
  assembly
    .get(`/transcript/${req.body.transcript_id}`)
    .then(async (response) => {
      generateVideos(
        response.data.chapters,
        response.data.words,
        response.data.iab_categories_result.results
      );
    })
    .catch((err) => console.error(err));
  res.sendStatus(200);
});
- Create an add_subtitles.js file inside the lib folder to add subtitles to the video:
touch lib/add_subtitles.js
- Paste the code below inside add_subtitles.js:
const addSubtitles = async (words, chapter, composition) => {
  let wordsConcatenated = [];

  for (const word of words) {
    wordsConcatenated.push(word);

    // Once we have a batch of eight words, add it as a text layer and reset the batch
    if (wordsConcatenated.length >= 8) {
      await composition.addText(
        // options
        {
          text: wordsConcatenated.map((el) => el.text).join(" "),
          color: "#ffffff",
          fontSize: 40,
          textAlign: "center",
          textPosition: {
            x: "center",
            y: "center",
          },
        },
        // layer config
        {
          position: {
            x: "center",
            y: "center",
          },
          size: {
            height: 1920,
            width: 1080,
          },
          timeline: {
            start: wordsConcatenated[0].start / 1000 - chapter.start / 1000,
          },
          trim: {
            end:
              Math.round(
                (wordsConcatenated[wordsConcatenated.length - 1].end / 1000 -
                  wordsConcatenated[0].start / 1000) *
                  100
              ) / 100,
          },
        }
      );
      wordsConcatenated = [];
    }
  }
  // Note: any leftover words (fewer than eight) at the end of the chapter are not added

  return new Promise((resolve, reject) => {
    resolve("done");
  });
};

module.exports = { addSubtitles };
Let’s explore what the code above is doing.
- In these lines, we create a new function called addSubtitles that takes in three arguments: words, chapter (the current video segment that has been split by AssemblyAI), and composition (the Editframe composition to which the text will be added). We also initialize a new wordsConcatenated array that will hold up to eight words, since the AssemblyAI API gives one word per array item and we’d like to add multiple words per text layer to prevent layout issues:
const addSubtitles = async (words, chapter, composition) => {
  let wordsConcatenated = [];
- Below, we loop through the words array, pushing each word into wordsConcatenated until it reaches the maximum of eight words per text layer; we then add that batch to the video composition and reset the array. We also calculate the start of the text with wordsConcatenated[0].start / 1000 - chapter.start / 1000, in which we take the first word’s start time, divide it by 1000 (which converts it to seconds), then subtract the chapter (current split video part) start time so the timestamp is relative to the beginning of the clip (a short worked example of this timing math follows the snippet):
  for (const word of words) {
    wordsConcatenated.push(word);

    // Once we have a batch of eight words, add it as a text layer and reset the batch
    if (wordsConcatenated.length >= 8) {
      await composition.addText(
        // options
        {
          text: wordsConcatenated.map((el) => el.text).join(" "),
          color: "#ffffff",
          fontSize: 40,
          textAlign: "center",
          textPosition: {
            x: "center",
            y: "center",
          },
        },
        // layer config
        {
          position: {
            x: "center",
            y: "center",
          },
          size: {
            height: 1920,
            width: 1080,
          },
          timeline: {
            start: wordsConcatenated[0].start / 1000 - chapter.start / 1000,
          },
          // Calculates the duration of the text layer by subtracting the start time of the
          // first item in the wordsConcatenated array from the end time of the last item
          trim: {
            end:
              Math.round(
                (wordsConcatenated[wordsConcatenated.length - 1].end / 1000 -
                  wordsConcatenated[0].start / 1000) *
                  100
              ) / 100,
          },
        }
      );
      wordsConcatenated = [];
    }
  }

  return new Promise((resolve, reject) => {
    resolve("done");
  });
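As a quick worked example of the timing math above: if the chapter starts 120 seconds into the episode and the first word of a batch starts at 125.5 seconds, the text layer should appear 5.5 seconds into the clip. All numbers below are invented for illustration:

// Worked example of the subtitle timing math (all numbers are invented)
const chapterStart = 120000; // chapter start, in milliseconds
const firstWordStart = 125500; // start of the first word in the batch, in milliseconds
const lastWordEnd = 128300; // end of the last word in the batch, in milliseconds

const layerStart = firstWordStart / 1000 - chapterStart / 1000; // 5.5 seconds into the clip
const layerDuration =
  Math.round((lastWordEnd / 1000 - firstWordStart / 1000) * 100) / 100; // 2.8 seconds

console.log(layerStart, layerDuration); // 5.5 2.8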
- Now, create an add_images.js file inside the lib folder to get images from the Unsplash API that match the subtitle topics:
touch lib/add_images.js
- Install the wordsninja package to split the AssemblyAI label strings (learn more here). For example, with wordsninja and a bit of JavaScript, we can convert Automotive>AutoBodyStyles>SUV to Automotive AutoBodyStyles SUV, then to Automotive Auto Body Styles SUV (a quick sketch of this conversion follows the install command).
yarn add wordsninja
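To see what this looks like in practice, here is a minimal standalone sketch of the conversion (the exact split depends on the package’s dictionary, so treat the output as approximate):

const WordsNinjaPack = require("wordsninja");
const WordsNinja = new WordsNinjaPack();

(async () => {
  // Load the dictionary once, then split the concatenated label into words
  await WordsNinja.loadDictionary();
  const label = "Automotive>AutoBodyStyles>SUV";
  const words = WordsNinja.splitSentence(label.split(">").join(""));
  console.log(words.join(" ")); // roughly "Automotive Auto Body Styles SUV"
})();

- Paste the code below into lib/add_images.js: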
const WordsNinjaPack = require("wordsninja");
const WordsNinja = new WordsNinjaPack();
const axios = require("axios");

const addImages = async (categories, composition, start) => {
  const promises = categories.map(async (category) => {
    if (category.timestamp.end > category.timestamp.start) {
      // Use the first label, which has the highest relevance
      const label = category.labels[0].label;
      await WordsNinja.loadDictionary();
      let string = label.split(">").join("");
      // Search Unsplash for photos matching the topic label
      const { data } = await axios.get("https://api.unsplash.com/search/photos", {
        params: {
          query: WordsNinja.splitSentence(string).join(" "),
          client_id: "YOUR_UNSPLASH_CLIENT_ID",
          orientation: "portrait",
          content_filter: "high",
        },
        headers: {
          "Content-Type": "application/json",
          Authorization: "Client-ID YOUR_UNSPLASH_CLIENT_TOKEN",
        },
      });
      // One image for every five seconds covered by this category
      const duration = Math.round(
        category.timestamp.end / 1000 - category.timestamp.start / 1000
      );
      let imagesArr = Array.from({ length: Math.ceil(duration / 5) });
      const imagesPromises = imagesArr.map(async (_el, index) => {
        if (data.results[index] && data.results[index].urls) {
          const imageUrl = data.results[index].urls.full.split("?")[0] + "?q=80";
          start += 5; // advance the timeline five seconds per image
          await composition.addImage(imageUrl, {
            position: {
              x: 0,
              y: 0,
            },
            size: {
              height: 1800,
              width: 1080,
              format: "fit",
            },
            timeline: {
              start,
            },
            trim: {
              end: 5,
            },
          });
          return new Promise((resolve) => resolve());
        }
      });
      await Promise.all(imagesPromises);
      console.log("Image Added", imagesPromises.length);
      start += duration;
      return new Promise((resolve) => resolve());
    }
  });
  await Promise.all(promises);
  return new Promise((resolve) => resolve());
};

module.exports = { addImages };
Let’s break down the code above.
- Here, we will import wordsninja to split label strings, and Axios to get images using the Unsplash API:
const WordsNinjaPack = require("wordsninja");
const WordsNinja = new WordsNinjaPack();
const axios = require("axios");
- In these lines, we create a new function called addImages that takes in categories (topic labels), the Editframe composition, and a start offset. We also load the wordsninja dictionary so we can split topic label strings, and use the first item of each category’s labels array, which has the highest relevance. We then use the Unsplash API to retrieve photos that match our topic label:
const WordsNinjaPack = require("wordsninja");
const WordsNinja = new WordsNinjaPack();
const axios = require("axios");

const addImages = async (categories, composition, start) => {
  const promises = categories.map(async (category) => {
    if (category.timestamp.end > category.timestamp.start) {
      const label = category.labels[0].label;
      await WordsNinja.loadDictionary();
      let string = label.split(">").join("");
      const { data } = await axios.get("https://api.unsplash.com/search/photos", {
        params: {
          query: WordsNinja.splitSentence(string).join(" "),
          client_id: "YOUR_UNSPLASH_CLIENT_ID",
          orientation: "portrait",
          content_filter: "high",
        },
        headers: {
          "Content-Type": "application/json",
          Authorization: "Client-ID YOUR_UNSPLASH_CLIENT_TOKEN",
        },
      });
- In the lines below, we calculate how long each category label spans, which tells us how many images we need (one image for every five seconds). We then map over that count, adding each image to the video composition. Finally, we wait for all promises to resolve and return a promise from the function (a quick sanity check of this math follows the snippet):
      const duration = Math.round(
        category.timestamp.end / 1000 - category.timestamp.start / 1000
      );
      let imagesArr = Array.from({ length: Math.ceil(duration / 5) });
      const imagesPromises = imagesArr.map(async (_el, index) => {
        if (data.results[index] && data.results[index].urls) {
          const imageUrl = data.results[index].urls.full.split("?")[0] + "?q=80";
          start += 5; // advance the timeline five seconds per image
          await composition.addImage(imageUrl, {
            position: {
              x: 0,
              y: 0,
            },
            size: {
              height: 1800,
              width: 1080,
              format: "fit",
            },
            timeline: {
              start,
            },
            trim: {
              end: 5,
            },
          });
          return new Promise((resolve) => resolve());
        }
      });
      await Promise.all(imagesPromises);
      console.log("Image Added", imagesPromises.length);
      start += duration;
      return new Promise((resolve) => resolve());
    }
  });
  await Promise.all(promises);
  return new Promise((resolve) => resolve());
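As a quick sanity check of the image-count math: a category spanning 23 seconds needs five images, each shown for up to five seconds. All numbers below are invented for illustration:

// Worked example of the image-count calculation (all numbers are invented)
const categoryStart = 30000; // category start, in milliseconds
const categoryEnd = 53000; // category end, in milliseconds

const duration = Math.round(categoryEnd / 1000 - categoryStart / 1000); // 23 seconds
const imageCount = Math.ceil(duration / 5); // 5 images, one per five seconds

console.log(duration, imageCount); // 23 5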
- Update lib/generate_videos.js:
const { Editframe } = require("@editframe/editframe-js");
const { addImages } = require("./add_images");
const { addSubtitles } = require("./add_subtitles");
const path = require("path");

const editframe = new Editframe({
  clientId: "YOUR_EDITFRAME_CLIENT_ID",
  token: "YOUR_EDITFRAME_TOKEN",
  develop: true, // dev mode to get progress logs in the terminal and open each encoded video in a new tab
});

const generateVideos = async (chapters, wordsArr, categories) => {
  for (const chapter of chapters) {
    let composition = await editframe.videos.new(
      // options
      {
        dimensions: {
          // Height in pixels
          height: 1920,
          // Width in pixels
          width: 1080,
        },
        metadata: {
          headline: chapter.headline,
          gist: chapter.gist,
          summary: chapter.summary,
        },
        // Duration of final output video in seconds
        duration: chapter.end / 1000 - chapter.start / 1000,
      }
    );
    // Filter the wordsArr array to keep only the words that fall within the current chapter
    const words = wordsArr.filter(
      (el) => el.end <= chapter.end && el.start >= chapter.start
    );
    const chapterCategories = categories.filter(
      (el) => el.timestamp.end <= chapter.end && el.timestamp.start >= chapter.start
    );
    let start = 0;
    // chapterCategories.slice(0, 12) keeps us under the Unsplash API rate limit (50 requests per hour)
    await addImages(chapterCategories.slice(0, 12), composition, start);
    await addSubtitles(words, chapter, composition);
    // Add the audio that matches the subtitles by trimming the file using the chapter timestamps
    await composition.addAudio(
      path.resolve("podcast-demo.mp3"),
      {
        volume: 1,
      },
      {
        trim: {
          start: chapter.start / 1000,
          end: chapter.end / 1000,
        },
      }
    );
    // Add an audio waveform that matches the subtitles by trimming the audio using the chapter timestamps
    await composition.addWaveform(
      // file
      path.resolve("podcast-demo.mp3"),
      // options
      { color: "#fff", style: "bars" },
      // config
      {
        position: {
          x: "center",
          y: "bottom",
        },
        size: {
          height: 100,
          width: 1080,
        },
        trim: {
          start: chapter.start / 1000,
          end: chapter.end / 1000,
        },
      }
    );
    const video = await composition.encode();
    console.log(video);
  }
};

module.exports = { generateVideos };
Conclusion
Et voilà! We have successfully automated the process of generating podcast video snippets, which we can now distribute on social networks to get as much mileage out of our podcast content as possible.
Here are some examples of videos generated using this project:
file1.mp4
file2.mp4
file3.mp4
file4.mp4
These examples were generated using the video version of this project, which you can find in the GitHub repo.