Algorithms Suggest Epic Moments in Live Streaming

How do you spot a memorable or "epic" moment in live video content? Meeyoung Cha and Kunwoo Park talk about their recently published work in EPJ Data Science, which uses a deep learning model to identify such epic moments.

Live streaming has become a popular part of internet culture. Platforms like TikTok and Twitch each have somewhere between 60 and 140 million monthly active users.

Virtually anyone can stream content on these platforms, which makes finding epic and funny moments challenging due to the sheer volume of seemingly mundane and lengthy videos.

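In a new study published in EPJ Data Science, we show how artificial intelligence (AI) can help human editors quickly find the interesting segments of live streaming content.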

The AI decides which segments to suggest by jointly considering audience reactions in chat messages, the structure of video frames, view counts, and streamer information. Among these, emojis and audience reactions act as critical components that guide the AI algorithm.

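Deep learning is used to learn the characteristics of epic moments from multimodal data and to suggest interesting video segments, which include winning, funny, awkward, and embarrassing moments.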

When tested via a user study, this AI suggestion is found to be comparable to expert suggestions in spotting epic moments.

Using recommended clips as guiding data

To train the algorithm, we need guiding data that represent “epicness.” On Twitch, there are manually constructed “clips,” or Twitch highlights: segments 5 to 60 seconds long, contributed by streamers and viewers.

Figure 1 shows an example of live-streamed content that lasted 11 minutes and 55 seconds. Two segments of this content had been highlighted as recommended “clips”, running 53 seconds and 30 seconds, respectively.

Figure 1. Interesting segments of live streaming are highlighted as two separate clips, which received 21 views and 170 thousand views, respectively. By collecting these clips, we can build an algorithm to automatically detect epic moments.
© The Author(s) (2021)

The second clip reached over 170,000 views, indicating more epicness. The figure also shows user reactions to those chosen video segments. Emojis and Twitch-specific emoticons (emotes) are commonly used in chat.

We collected two million user-recommended clips and the associated user conversations to understand the ingredients of epic moments. Our work defines an epic moment as an enjoyable, bite-sized summary of long video content.

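Epic moments are similar to video highlights in that both are short summaries of long videos, but the two also differ in function. Epic moments represent "enjoyable" moments, whereas highlights are "informative" by nature.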
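One way to picture how recommended clips can act as guiding data is to cut a stream into fixed-size windows and mark a window as positive whenever it overlaps a clip. The sketch below uses the stream length (11 min 55 s), clip durations (53 s and 30 s), and view counts from the Figure 1 example. The clip start offsets, the 10-second window size, and the names `Clip` and `label_windows` are assumptions made purely for illustration, and this is not necessarily how the published model prepares its training data.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    start: float     # seconds from stream start (hypothetical offset)
    duration: float  # clip length in seconds
    views: int       # view count of the clip


def label_windows(stream_len: float, clips: list[Clip], window: float = 10.0) -> list[int]:
    """Return one 0/1 label per window; 1 if the window overlaps any clip."""
    labels = []
    t = 0.0
    while t < stream_len:
        end = min(t + window, stream_len)
        # Two intervals overlap when each one starts before the other ends.
        hit = any(c.start < end and c.start + c.duration > t for c in clips)
        labels.append(1 if hit else 0)
        t += window
    return labels


stream_len = 11 * 60 + 55  # 715 seconds, as in the Figure 1 example
clips = [
    Clip(start=120.0, duration=53.0, views=21),       # start offset is assumed
    Clip(start=400.0, duration=30.0, views=170_000),  # start offset is assumed
]
labels = label_windows(stream_len, clips)
print(len(labels), sum(labels))
```

The view counts are carried along but unused here; they could serve as a weight on positive windows if one wanted popular clips to count more.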

Social signals as cues for epic moments

We discovered that emotes and user reactions play a critical role in finding epic moments.

Figure 2 shows clustering results on emotes that appear in user chats on the two-dimensional space identified by the t-distributed stochastic neighbor embedding (t-SNE).

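The color denotes the cluster category, and the figure presents the five example word tokens closest to each emote cluster. We can see the emotion-expressing function that similar emotes serve on Twitch.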

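Figure 2. Example emotes and related text for each cluster. Cluster representation of each emote embedding vector (top) and example emotes and related text tokens (bottom). The representation is plotted with t-SNE, and the related tokens are chosen by the distance between the emote cluster and the word vectors.
© The Author(s) (2021)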
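The idea of picking related word tokens by their distance to an emote cluster can be sketched in a few lines. The example below assumes that a cluster is summarized by the mean of its emote vectors and that closeness is measured by cosine similarity; the article states neither choice, so both are assumptions. The emote names, words, and three-dimensional vectors are toy values invented for illustration, not data from the study.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_tokens(cluster_vectors, word_vectors, k=5):
    """Return the k words whose vectors are most similar to the cluster centroid."""
    c = centroid(cluster_vectors)
    ranked = sorted(word_vectors, key=lambda w: cosine(c, word_vectors[w]), reverse=True)
    return ranked[:k]


# Toy embeddings (hypothetical values): one "laughing" emote cluster and four words.
emote_cluster = {"LUL": [0.9, 0.1, 0.0], "KEKW": [0.8, 0.2, 0.1]}
words = {
    "lol": [0.85, 0.15, 0.05],
    "haha": [0.7, 0.3, 0.0],
    "gg": [0.1, 0.9, 0.2],
    "rip": [0.0, 0.2, 0.9],
}
top = nearest_tokens(list(emote_cluster.values()), words, k=2)
print(top)
```

With real data, the emote and word vectors would come from embeddings trained on chat logs, and k would be 5 to match the five example tokens shown per cluster.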

These insights are used to build a deep learning model called Multimodal Detection with INTerpretability (MINT), which merges and analyzes key features like chat, video metadata, and view count.

The comprehensive features from these three domains capture different aspects of epic moments, and combining those cues leads to a better prediction.
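To make the idea of combining cues concrete, here is a deliberately simple sketch that fuses one score per feature domain into a single probability-like value and ranks segments by it. It is not the MINT architecture, which is a deep learning model. The function `fuse`, the domain names, the weights, and the per-segment scores are all hypothetical and chosen only to illustrate how several cues can be merged.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse(scores: dict[str, float], weights: dict[str, float], bias: float = 0.0) -> float:
    """Weighted sum of per-domain scores, squashed into the range (0, 1)."""
    z = bias + sum(weights[name] * scores[name] for name in weights)
    return sigmoid(z)


# Hypothetical per-domain scores for two candidate segments.
segments = {
    "seg_a": {"chat": 2.0, "video": 0.5, "views": 1.0},
    "seg_b": {"chat": -1.0, "video": 0.2, "views": 0.0},
}
# Hypothetical weights; a trained model would learn how much each cue matters.
weights = {"chat": 1.0, "video": 0.5, "views": 0.5}

# Rank segments from most to least likely to be an epic moment.
ranked = sorted(segments, key=lambda s: fuse(segments[s], weights), reverse=True)
print(ranked)
```

A segment with strong chat reactions but unremarkable video can still rank highly here, which mirrors the point that the domains capture different aspects of an epic moment.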

A user study also confirmed that the algorithmic suggestions were judged to be as interesting as human-recommended clips.

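In addition, the algorithmic suggestions covered a variety of contexts, such as failed game moments, funny dance moves, surprising in-game comebacks, and non-game moments, as shown in Figure 3.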

Figure 3. Example of the algorithmic suggestions on epic moments. The MINT model can discover (a) failing, (b) funny, (c) outplay, and (d) non-game moments.
© The Author(s) (2021)

By contrast, most human suggestions contained game-winning moments.

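As a growing number of people spend time watching live-streamed content on the internet, AI suggestions can help editors and viewers discover epic moments.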

Researchers interested in the code for the MINT algorithm and the clip dataset used for training can find more information on our GitHub page: https://github.com/dscig/twitch-highlight-detection
