What Do Universities Mean By ‘Diversity?’

Aug 6, 2019 by David Rozado

Research Summary

Several political creeds over the past few decades have come to support the idea that diversity is valuable and desirable and that diverse societies may improve communication between people of different backgrounds and lifestyles, leading to greater understanding and peaceful coexistence. The usage of the term “diversity” has gained prominence as a result, at least in the Anglosphere.

Academic institutions have willingly embraced the concept of diversity and have put in place procedures to foster diverse faculty, administrative and student bodies by supporting the recruitment of individuals from historically excluded populations. The pro-diversity efforts have been justified in terms of how diverse academic communities provide educational benefit to students and on grounds of achieving social justice.

Given the well-documented benefits of viewpoint diversity (Duarte et al. 2014; Shi et al. 2019), especially for exploratory enterprises such as education and research, the degree of universities' commitment to viewpoint diversity becomes a metric of paramount importance. Yet little scientific work has studied universities' attitudes towards viewpoint diversity.

This work analyses how 50 elite universities in the United States use the terms "diversity" and "diverse" in their online institutional domains. In particular, the focus is on quantifying the extent to which universities concentrate on the demographic denotation of diversity over its intellectual denotation. The sample consists of a large corpus of textual data gathered from the institutional web domains of the studied universities. An automatic web crawler (spider) was used to scrape textual data (16GB) from the universities' websites by automatically following links within a university domain and collecting all detected textual content, excluding structural markup, code and CSS styling elements.
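The text-collection step can be illustrated with a minimal sketch. The summary does not describe the crawler's actual implementation, so everything here is an assumption: the class and function names (VisibleTextExtractor, extract_text_and_links) are hypothetical, the sample page is invented, and the network part of crawling (fetching pages and restricting links to one university domain) is left out. The sketch only shows one way to keep visible text while skipping script and style content, using Python's standard library.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects visible text and link targets, skipping <script> and <style>.

    Hypothetical helper for illustration; not the crawler used in the paper.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []       # visible text fragments, in document order
        self.links = []        # href values a crawler could follow next

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text_and_links(html):
    """Return (visible_text, links) for one HTML document."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), parser.links


# Invented sample page, used only to exercise the sketch.
sample_page = """
<html><head><style>p { color: red; }</style>
<script>var x = 1;</script></head>
<body><p>We value a diverse community.</p>
<a href="/about/diversity">Diversity office</a></body></html>
"""

text, links = extract_text_and_links(sample_page)
print(text)
print(links)
```

A real spider would additionally fetch each collected link, discard those pointing outside the university's domain, and keep track of pages already visited.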


In order to study the usage of the diversity concept by universities, distributional semantics theory is used, which postulates that linguistic items with similar distributions tend to have similar meanings (Firth 1957). That is, the meaning of a word can be approximated by the set of contexts in which it occurs.
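As a rough illustration of "the set of contexts in which it occurs", the sketch below counts the words that appear within a small window around each word. The function name context_counts, the window size and the two toy sentences are all assumptions made for this example; they are not taken from the study's corpus or method.

```python
from collections import Counter, defaultdict


def context_counts(sentences, window=2):
    """Map each word to a Counter of words seen within `window` tokens of it.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    contexts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:  # a word is not its own context
                    contexts[word][tokens[j]] += 1
    return contexts


# Invented toy sentences, for illustration only.
toy_corpus = [
    "we value diversity of race and gender",
    "we value diversity of ideas and viewpoints",
]

contexts = context_counts(toy_corpus, window=2)
print(dict(contexts["diversity"]))
```

Under the distributional hypothesis, two words whose context counters look alike are expected to have related meanings, even before any neural model is involved.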

Recent advances in machine learning, such as word embeddings for natural language processing (NLP), have given credence to the distributional hypothesis (Mikolov, Yih & Zweig 2013). A word embedding model derives, from a large corpus of text, a mapping of words to dense vector representations in a continuous high-dimensional space (see Figure 2). These vectors capture complex semantic and syntactic relations between words by leveraging the co-occurrence statistics of words and contexts in the corpus on which the model was trained.

A particular type of word embedding that has become very popular in the machine learning literature is the word2vec set of techniques (Mikolov et al. 2013). Word2vec uses a shallow neural network to learn a distributed representation of words based on the textual contexts in which they occur within a text corpus, thus leveraging the distributional hypothesis. After training word2vec on a text corpus, words that are used in similar contexts will end up with similar numerical vector representations. One of the most impressive capabilities of word2vec is its ability to draw together words that are used synonymously in similar contexts even if they never appear together in the training corpus. This feature is a key component of the ability of word2vec to generalize.

Figure 2 illustrates the mapping of words to vector representations carried out by word2vec. A key property of word embedding models is the clustering of terms with similar semantic roles (see top right of Figure 2) and the existence of structure in vector space, such as regular offsets between pairs of words with a particular semantic association that map to culturally meaningful relationships such as gender (see bottom right of Figure 2). Since word2vec brings words used in similar contexts, and thus semantically related according to the distributional hypothesis, to adjacent regions of the vector space, the context in which a word is used in a corpus of text can serve as a reliable proxy for the semantic denotation with which the word is used in that corpus. Connotations of words can be estimated by calculating the cosine similarity between the vector of a word of interest and the vector of any reference term.
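The cosine similarity mentioned above is the dot product of two vectors divided by the product of their lengths. The sketch below is a minimal version in plain Python. The three-dimensional vectors and the labels "target", "reference_a" and "reference_b" are placeholders invented for the example; they carry no empirical meaning and are not embeddings from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Placeholder vectors (assumed values, not real word embeddings).
toy_vectors = {
    "target": [1.0, 2.0, 0.0],
    "reference_a": [2.0, 4.0, 0.0],   # same direction as "target"
    "reference_b": [0.0, 0.0, 3.0],   # orthogonal to "target"
}

similarities = {
    name: cosine_similarity(toy_vectors["target"], vector)
    for name, vector in toy_vectors.items()
    if name != "target"
}

# Rank reference terms from most to least similar to the target.
ranking = sorted(similarities, key=similarities.get, reverse=True)
print(similarities)
print(ranking)
```

With real embeddings, the same comparison would be run between the vector for a word of interest and the vectors of a chosen set of reference terms, and the ranking would indicate which references the word sits closest to in the corpus.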

Full Paper:

David Rozado (2019). “Using Word Embeddings to Analyze how Universities Conceptualize ‘Diversity’ in their Online Institutional Presence.” Society. DOI: 10.1007/s12115-019-00362-9

David Rozado is a Senior Lecturer in the College of Enterprise and Development at Otago Polytechnic (New Zealand).

continue: What Do Universities Mean By ‘Diversity?’ – Heterodox Academy
