What do movie characters’ relationships reveal about gender, and how has this changed over time?

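Despite steady progress, the gender gap in movies is still large. Using subtitles and IMDb information for more than 15,000 movies, we found that across all genres there were, on average, only 3.3 women among the ten most central characters.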

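Data science has the potential to address a wide range of social science questions. Here, we turn our attention to the portrayal of women in movies, an industry with a major influence on how society views life, including self-esteem and career choices.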

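Over the past few years, the gender gap in the film industry has attracted a great deal of attention. The problem is well known: women are still underpaid and underrepresented. This has to change, and we believe the best way to change something is to shed light on the problem.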

How can data science help?

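As data scientists, we decided to use networks and machine-learning algorithms to investigate the gender gap in the film industry by performing the largest analysis of its kind to date.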

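To do this, we fused data from the online movie database IMDb with a dataset of movie dialogue taken from closed-caption subtitles, creating the largest corpus of movie social networks to date (15,540 networks).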

Using this corpus, we investigated the role of on-screen women in the film industry over the past century. First, we combined the movie subtitle data with the IMDb dataset. Next, using named entity recognition (NER), we extracted the movie characters from the subtitles and linked them to their actors. Then, we built a social network of interactions among each movie's characters.

How to build a social network from subtitles

To better understand how our algorithm works, let’s look at three lines from the subtitles of “The Matrix” as an example. First, using NER, we detect where and when each character's name appears in the subtitles. In this case, we have a scene in which Morpheus talks with Neo. To identify the actor and to verify that a detected name belongs to a character, we match the names found in the subtitles against the character list from IMDb.
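The matching step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an NER model has already produced candidate names with timestamps (the hypothetical `Mention` records), and it uses a simple, assumed matching rule (case-insensitive full name, or a single name token that belongs to exactly one character). The timestamps are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A name found in the subtitles (e.g., by an NER model) and when it was spoken."""
    name: str
    seconds: float  # subtitle timestamp, in seconds from the start of the movie


def build_name_index(cast: dict[str, str]) -> dict[str, str]:
    """Map lowercase character names, and unambiguous single name tokens, to IMDb character names."""
    index: dict[str, str] = {character.lower(): character for character in cast}
    token_owners: dict[str, set[str]] = {}
    for character in cast:
        for token in character.lower().split():
            token_owners.setdefault(token, set()).add(character)
    for token, owners in token_owners.items():
        # Accept a partial name such as "Smith" only if exactly one character contains it.
        if len(owners) == 1 and token not in index:
            index[token] = next(iter(owners))
    return index


def match_mentions(mentions: list[Mention], cast: dict[str, str]) -> list[tuple[str, str, float]]:
    """Keep only mentions that match a character; return (character, actor, seconds) triples."""
    index = build_name_index(cast)
    matched = []
    for mention in mentions:
        character = index.get(mention.name.lower())
        if character is not None:  # names not in the cast list (places, objects) are dropped
            matched.append((character, cast[character], mention.seconds))
    return matched


# Illustrative input: a three-character cast list and made-up mention timestamps.
cast = {"Morpheus": "Laurence Fishburne", "Neo": "Keanu Reeves", "Agent Smith": "Hugo Weaving"}
mentions = [Mention("Morpheus", 120.0), Mention("Neo", 125.0), Mention("Zion", 130.0), Mention("Smith", 400.0)]
print(match_mentions(mentions, cast))
```

Here "Zion" is discarded because it is not in the character list, while "Smith" resolves to "Agent Smith" because no other listed character shares that token.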

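Finally, using the matched characters, we create a link between characters who appear in the movie within a time interval smaller than a predefined threshold. In our example, we know that Morpheus introduced himself to Neo, and that Morpheus and Neo spoke within a 5-second interval.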

If 5 seconds is smaller than the predefined threshold, we add an edge between Neo and Morpheus. We repeat this process for every line in the movie's subtitles, which results in a weighted social network in which each edge weight is the number of times the two nodes (characters) appeared together.
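The linking rule above can be sketched as a sliding time window over matched character appearances. This is an illustrative sketch under assumptions: the threshold value (`THRESHOLD_SECONDS = 10`) is hypothetical because the text does not state the one used, and every pair of different characters inside the window is counted, which is one possible reading of the rule.

```python
from collections import Counter


def build_weighted_network(appearances, threshold_seconds):
    """Build a weighted co-appearance network from (character, seconds) pairs.

    Two different characters get an edge each time they appear less than
    `threshold_seconds` apart; the edge weight counts those co-appearances.
    Edges are keyed by an alphabetically sorted (character, character) tuple.
    """
    ordered = sorted(appearances, key=lambda item: item[1])
    weights = Counter()
    for i, (char_a, t_a) in enumerate(ordered):
        for char_b, t_b in ordered[i + 1:]:
            if t_b - t_a >= threshold_seconds:
                break  # the list is time-sorted, so every later appearance is further away
            if char_a != char_b:
                weights[tuple(sorted((char_a, char_b)))] += 1
    return weights


THRESHOLD_SECONDS = 10  # hypothetical value; the text does not give the actual threshold

appearances = [("Morpheus", 120.0), ("Neo", 125.0), ("Trinity", 300.0), ("Neo", 302.0), ("Morpheus", 900.0)]
network = build_weighted_network(appearances, THRESHOLD_SECONDS)
print(network)
```

With a 10-second window, Morpheus and Neo (5 seconds apart) and Neo and Trinity (2 seconds apart) are linked once each; with a 5-second window the Morpheus and Neo pair would no longer qualify, since the rule requires a strictly smaller interval.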

The evolution of female representation in the Star Wars movie series (Copyright Dima Kagan, Thomas Chesney & Michael Fire CC BY 4.0).

Looking at centrality

Using these networks, we investigated the differences between genders in movies. In particular, we analyzed how many women appear among the top-10 roles, ranked by their centrality in the network.
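The text does not say which centrality measure was used, so the sketch below assumes weighted degree (the sum of a character's edge weights) purely for illustration. It ranks characters by that score and counts how many of the top k are labeled "F"; the edge weights and gender labels are made up.

```python
from collections import defaultdict


def weighted_degree(weights):
    """Weighted degree (strength) of each character: the sum of its edge weights."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


def women_in_top_k(weights, gender, k=10):
    """Count characters labeled "F" among the k characters with the highest weighted degree."""
    strength = weighted_degree(weights)
    # Highest centrality first; ties broken by name so the ranking is deterministic.
    ranked = sorted(strength, key=lambda c: (-strength[c], c))
    return sum(1 for c in ranked[:k] if gender.get(c) == "F")


# Made-up edge weights and gender labels, for illustration only.
weights = {("Morpheus", "Neo"): 12, ("Neo", "Trinity"): 9, ("Morpheus", "Trinity"): 4,
           ("Neo", "Smith"): 6, ("Neo", "Oracle"): 2}
gender = {"Neo": "M", "Morpheus": "M", "Trinity": "F", "Smith": "M", "Oracle": "F"}
print(women_in_top_k(weights, gender, k=3), women_in_top_k(weights, gender))
```

Swapping in another measure (betweenness or PageRank, for example) would only change how `strength` is computed; the top-k counting stays the same.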

Despite improvements over the last century, men still outnumber women two to one, on average, in leading movie roles.

The main roles are the most important in a film: they get most of the spotlight, so it matters that enough women appear in them. We found that, on average, women play fewer central roles than men, and the gap is very evident.

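Over the past century, this number has grown steadily. Even so, today men still outnumber women two to one, on average, among the top-10 characters in a movie. This result indicates that, on average, women play smaller roles.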

How do we measure fair representation?

Today, the best-known measure of how fairly women are represented in films is the Bechdel test. To pass the test, a film has to meet three criteria: (1) it has to have at least two women in it, (2) who talk to each other, (3) about something besides a man.
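The three criteria translate directly into a rule-based check. The sketch below assumes each conversation has already been annotated with its speakers and with a human-provided `about_a_man` flag; both the `Conversation` record and that flag are hypothetical, since deciding what a conversation is about is the hard part that such a check leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    speakers: frozenset[str]  # characters who take part in this exchange
    about_a_man: bool         # assumed to come from a human label; hard to detect automatically


def passes_bechdel(gender: dict[str, str], conversations: list[Conversation]) -> bool:
    """Apply the three Bechdel criteria to labeled conversations."""
    women = {character for character, g in gender.items() if g == "F"}
    # (1) The movie has at least two women in it.
    if len(women) < 2:
        return False
    # (2) Two of them talk to each other, (3) about something besides a man.
    return any(len(conv.speakers & women) >= 2 and not conv.about_a_man for conv in conversations)


gender = {"Neo": "M", "Trinity": "F", "Oracle": "F"}
print(passes_bechdel(gender, [Conversation(frozenset({"Trinity", "Oracle"}), about_a_man=True)]))   # False
print(passes_bechdel(gender, [Conversation(frozenset({"Trinity", "Oracle"}), about_a_man=False)]))  # True
```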

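The first thing that occurred to us was that checking by hand whether each movie passes the test is not easy, so why not automate it? We used the networks we built to extract network-based features and, with machine-learning algorithms, created an automated Bechdel test.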
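The features used for the automated test are not listed here, so the following is only a hypothetical example of network-based features such a classifier could use: how many women appear in the network, how many are among the top-k characters by weighted degree, how many edges connect two women, and the share of all edge weight on those woman-woman edges. The feature names and input data are assumptions for illustration.

```python
from collections import defaultdict


def bechdel_features(weights, gender, k=10):
    """Hypothetical network features that a Bechdel classifier could be trained on.

    `weights` maps a (character, character) pair to its edge weight and
    `gender` maps each character to "F" or "M".
    """
    strength = defaultdict(int)
    ff_edges = 0
    ff_weight = 0
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
        if gender.get(a) == "F" and gender.get(b) == "F":
            ff_edges += 1  # an edge between two women: they appear together on screen
            ff_weight += w
    # Top-k characters by weighted degree, ties broken by name.
    top_k = sorted(strength, key=lambda c: (-strength[c], c))[:k]
    total_weight = sum(weights.values())
    return {
        "women_in_network": sum(1 for c in strength if gender.get(c) == "F"),
        "women_in_top_k": sum(1 for c in top_k if gender.get(c) == "F"),
        "female_female_edges": ff_edges,
        "female_female_weight_share": ff_weight / total_weight if total_weight else 0.0,
    }


weights = {("Morpheus", "Neo"): 12, ("Neo", "Trinity"): 9, ("Oracle", "Trinity"): 3, ("Neo", "Oracle"): 2}
gender = {"Neo": "M", "Morpheus": "M", "Trinity": "F", "Oracle": "F"}
print(bechdel_features(weights, gender))
```

Feature vectors like these, computed for movies whose Bechdel status is already known, could then be used to train a standard classifier; which algorithm and which features the authors actually chose is not specified in this text.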

Using our automated Bechdel test, we found that some movies are currently misclassified in terms of their Bechdel test score. We also used it to classify movies that had not yet been rated, and discovered an increase in the number of movies passing the Bechdel test.

Marilyn Monroe was an American actress famous for being cast as comedic “blonde bombshell” characters. Image by skeeze, Pixabay CC-0.

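Although the Bechdel test is certainly a useful and important test, it has a clear limitation: if two women talk to each other for just a few seconds about something other than a man, the movie passes the traditional Bechdel test.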

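We strongly believe that today we need a test that gives a more accurate picture of how women are represented in movies. We propose a test that measures the gender gap using the total number of interactions of each gender.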
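The pass condition of the proposed test is not reproduced in this text, so the sketch below stops at the quantity it is based on: the total number of interactions per gender. It assumes a character's interactions are the summed weights of its edges in the co-appearance network, which is an interpretation, not a confirmed definition; the weights are made up.

```python
from collections import defaultdict


def interactions_by_gender(weights, gender):
    """Total interactions per gender, taken as the summed weighted degree of its characters.

    Each edge adds its weight once per endpoint: a woman-man edge counts toward
    both totals, and a woman-woman edge counts twice toward the female total.
    """
    totals = defaultdict(int)
    for (a, b), w in weights.items():
        totals[gender.get(a, "unknown")] += w
        totals[gender.get(b, "unknown")] += w
    return dict(totals)


weights = {("Morpheus", "Neo"): 12, ("Neo", "Trinity"): 9, ("Oracle", "Trinity"): 3}
gender = {"Neo": "M", "Morpheus": "M", "Trinity": "F", "Oracle": "F"}
totals = interactions_by_gender(weights, gender)
female_share = totals.get("F", 0) / sum(totals.values())
print(totals, female_share)
```

In this toy network women account for 15 of 48 interaction endpoints, a share of 0.3125; a threshold on a ratio like this is one way such a test could be phrased.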

We believe a good rule of thumb should be:

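Sadly, only 12% of all movies pass this test. That said, we found plenty of evidence that the representation of women in movies is trending toward improvement.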

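These results highlight the great potential of large datasets combined with advanced algorithms for advancing research on gender inequality. Future studies using similar methods could also analyze TV series and other types of media to uncover further gaps.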
