Project 3— Data Visualization

Nov 12, 2020

CMU Communication Design Studio Documentation

Week 1 — 11/10/2020

For this project, we were each assigned a dataset and an existing data visualization to explore. We were tasked with illuminating connections among data in ways that help viewers recognize, engage with, and think critically about important inherent relationships that may not be apparent or may be overlooked.

Analyzing the data visualization

The data visualization, which focuses on diversity in tech companies, can be found here.

What data is introduced?

The data visualization is titled “Diversity in Tech”. The maker wanted to showcase gender and race breakdowns of employees at key technology companies from 2014 to 2017. The visualization has two main column groups: gender and ethnicity. The gender columns are female and male. The ethnicity columns are White, Asian, Latino, Black, Multi, and Other. 23 tech companies are included. The maker also shows gender and race breakdowns for other entities for comparison, including the US population, Fortune 500 CEOs, employees at the top 50 US companies, and the US Congress.

Image source: Diversity in Tech data visualization

How would you characterize the steps in the story?

The interactive data visualization allows the user to filter and sort the data to reveal other key points. For example, the user can click the tab for each year to see the breakdowns in 2014, 2015, 2016, and 2017. The user can also toggle year-on-year change to see the percentage difference compared to the year before.

The user can click on the tab of a gender or a racial group to sort that column in ascending or descending order. This allows people to compare the breakdowns among companies.

Clicking a tab first sorts the column in descending order
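
The sorting and year-on-year toggle described above can be sketched in a few lines of Python. This is only an illustration, not how the original visualization is implemented: the company names and percentages are made up, the function names are hypothetical, and the sketch assumes the change is reported in percentage points.

```python
# Hypothetical rows: (company, year, percent of female employees).
# All names and values are invented for illustration only.
rows = [
    ("Company A", 2016, 30.0),
    ("Company A", 2017, 32.0),
    ("Company B", 2016, 45.0),
    ("Company B", 2017, 44.0),
    ("Company C", 2016, 25.0),
    ("Company C", 2017, 29.0),
]


def sort_by_share(rows, year, descending=True):
    """Return (company, percent) pairs for one year, sorted by percent."""
    selected = [(company, pct) for company, y, pct in rows if y == year]
    return sorted(selected, key=lambda pair: pair[1], reverse=descending)


def year_on_year_change(rows, year):
    """Return the percentage-point change from the previous year per company."""
    previous = {company: pct for company, y, pct in rows if y == year - 1}
    return {
        company: round(pct - previous[company], 1)
        for company, y, pct in rows
        if y == year and company in previous
    }


print(sort_by_share(rows, 2017))
print(year_on_year_change(rows, 2017))
```

Passing `descending=False` gives the ascending order that a second click on the same tab would presumably produce.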

What relationships emerge from the visualization?

The gender breakdown in each tech company does not appear to be related to the race breakdown. Generally, about half of all employees at key technology companies are white. Asian employees make up the second-largest share of all employees, even though the Asian population in the US is only 10% of the white population. The percentage of Asian employees ranges from 9% to 45% among tech companies. I was surprised to see that companies with close numbers of female and male employees, such as Indiegogo and Pandora, still have far more white employees than employees who belong to other racial groups.

One interesting finding was that 21% of Amazon’s employees are Black, while the percentage of Black employees is under 10% at every other tech company. My guess is that this is because Amazon employs many Black workers in its delivery service.

What do you believe the maker wants you to see?

The maker presents information about which tech companies employ the most people of color and which skew more toward white employees. The maker seems to place the most emphasis on the percentage of white male employees in key tech companies: those two columns are placed next to each other and use similar colors.

Why is their stance important?

The explanation below the graphic mentions that some companies have been expanding their gender and ethnic diversity since 2016. Within these companies, Asian staff accounted for the majority of the increase, while the ratios of employees from other racial groups increased only slightly. The implication that tech companies are starting to employ more people of color seems to be the driving factor behind the visualization.

Their stance is important because it also shows that Asian staff account for the majority of non-white employees.

Analyzing the dataset

What other relationships might be inherent in the data and of value to highlight?

I noticed that the comparison data on the breakdown of the US population, the top 50 US companies, etc. are only available for 2014 and 2017, and the maker didn’t show the change in gender ratio and ethnic diversity. What are the countrywide and global trends in employee breakdown from 2014 to 2017?

From 2014 to 2017, the white population in the US decreased by 3%, while the Asian and Latino populations each increased by 2% and the Black population increased by 1%. Does the change in ethnic diversity at leading tech firms reflect the nationwide population trend?

Week 1–11/12/2020

Richard Wurman Reading

Richard Wurman is an information architect. In his book Information Anxiety 2, he describes five ways of organizing information, known as the LATCH method:

Location: Could be a specific place, geographical location, or distance

Alphabet: Ascending or descending alphabetical order

Time: Best for organizing events that take place over fixed durations

Category: Grouping based on commonalities, which is helpful when making comparisons. Category is well reinforced by color.

Hierarchy: Used when people want to assign value or weight to the information.

Data Visualization Examples

Income Disparity in the States

I like the consistency of the form, and how it gradually uncovers many layers of information when you hover over each circle.

Wind Map

I like the animation in this viz; the winds look soft but very eye-catching. Grayscale is used in a very effective way.

Crayola color chart

I like the simplicity of the form and how the color chart evolves over time. This is relevant to my data set because I am also interested in seeing how employee diversity changes over time.

Coffee drinks illustrated

I have a “types of coffee” visualization very similar to this one in my iCloud. The viz is simple and really effective in showing the composition and proportion for each type of coffee.

What facets of your data are you considering using in your project and why?

I would like to explore how the gender ratio and ethnic diversity in technology companies evolve over time. Instead of focusing on the percentage change among different companies, I am interested in exploring the trend in the tech industry as a whole.

I am also curious to see the employee breakdown at different career levels, which is not reflected in the current data visualization. If white males account for the majority of the people at the decision-making levels of a company, structural inequity is still an issue, even if the gender ratio is close to that of the US population and the company is expanding its ethnic diversity by employing more people of color.

What design research question is guiding your project?

How does employment diversity at different career levels in key technology companies evolve over time?

What organization methods do you imagine leveraging in the data (LATCH)?

Location: In the data spreadsheet, people can review and compare employment diversity information from different tech companies.

Alphabet: Entity names are alphabetically organized in the existing visualization.

Time: The data set covers the period from 2014 to 2017. The viewer can click a year tab in the top-right corner of the visualization to see the breakdowns in 2014, 2015, 2016, and 2017.

Category: This data set utilizes categories in a few ways. For each year’s dataset, the overarching categories are total workforce and change from previous data. For companies, the maker categorizes them into social media sites and tech companies. For gender, there are two categories: female and male. For ethnicity, the maker identifies racial groups as White, Asian, Latino, Black, Multi, Other, and Undeclared.

Hierarchy: In the notes to the data set, the maker mentions that in most cases, gender data are global while ethnicity data are US only. However, this difference in scale isn’t reflected in the current data viz. When the viewer clicks on the tab of a gender or a racial group, the entities are organized by magnitude from large to small in that category.

What coordinate system(s) do you see emerging as logical and appropriate?

A Cartesian coordinate system will work well for comparing differences in gender ratio and ethnic diversity among tech companies. Each row represents the values for a company, and each column shows the percentage difference. Companies are sorted by the percentage of a category, in descending order, rather than alphabetically. Instead of giving the US population breakdown its own row, I am thinking it could be presented as a vertical line to provide a sense of low and high.

I am also interested in the year-to-year change in each category, so I might explore ways to shift the focus there.

What may serve as a logical sequence for people to move through the content (narrative/indexical/combo)?

I would like to use a combination of narrative and indexical sequences. An indexical sequence gives viewers the option to explore various categories (gender, ethnicity, job categories), while the narrative approach helps explain why the reader should care.

Week 2–11/17/2020

Visualization Components by Nathan Yau

The ingredients of visualization can be broken down into four components: visual cues, coordinate system, scale, and context.

Visual Cues: The key components are position, length, angle, direction, shapes, area, volume, color saturation and color hue.

Coordinate System: The coordinate systems can be categorized as Cartesian, polar, and geographical.

Scale: Scales can be numeric, categorical, or time-based.

Context: Information that leads to a better understanding of the who, what, when, where, and why of the data.

I found the table below, from Yau’s reading, to be a good way to organize all of the different visual cues and coordinate systems.

Yau’s reading talked about scales and how they can be on a spectrum from literal to abstract. Literal representations might make it hard for viewers to find patterns/relationships, but abstract representations might make it difficult to understand content.

In-class review of Visualization Components by Nathan Yau, notes by Stacie

Scales include:

· Linear (0, 1, 2, 3) (implies a sequence)

· Categorical (cold, warm, hot) (categories are on the same plane)

· Percentage (0%, 10%, 20%) (categories and parts to whole)

· Logarithmic (1, 10, 100, 1000) (rarely)

· Ordinal (good, bad, terrific) (hierarchy involved)

· Time (month, season)

· Location

· Alphabetical

Interactive Sensory Patterns by Stacie Rohrbach

Design strategies that facilitate concrete learning are:

· Pattern and Detection

· Representation using visual, temporal, and/or aural forms

· Interaction and Experience

What organization systems do you propose employing for each type of data and why?

The organization systems I’m thinking of utilizing are:

Time: Show employees’ demographic representation data from 2014 to 2018, focusing on leading tech companies including Apple, Amazon, Facebook, etc.

Category: Show the percentage of each gender and ethnic group in each company, and show gender and ethnic representation across different job categories.

Hierarchy: For each company, prioritize race and gender breakdowns among all employees; viewers can then explore employee diversity data across job categories.

What levels of scale and ranges do you plan to use and why?

I plan to use the following scales and ranges:

Categorical

Show different categories under gender and race. I feel the ranges for ethnicity in the current data viz are very clear: White, Asian, Latino, Black, Multiracial, and Other. The EEO-1 form has seven race ranges, including Native Hawaiian and American Indian. Considering the employee counts in these two groups, I will combine them into “Other”.

The EEO-1 form collects data on ten major job categories. I will combine them into three (leadership, technical, and non-technical) so that it is easier for viewers to compare ethnic representation across job categories.

Percent

I will use percent to show gender ratio, ethnic diversity, and demographic representation across different job categories.

Time

Show data from 2014 to 2018 (the 2019 EEO-1 survey has not yet opened). The existing dataset has data points from 2014 to 2017. I want to include employee breakdown data from more recent years since they are available, and there may be a trend, as I noticed that tech companies expanded their ethnic diversity in 2016.

Location

Show data about each company.

Week 2–11/19/2020

Research on Diversity in Tech

I did secondary research on diversity in tech and found some high-level conclusions:

In tech leadership positions, white employees are significantly over-represented relative to their overall percentage of the US population, while Black and Hispanic/Latino employees are significantly under-represented.
This article mentions that prominent tech companies including Apple, Facebook, and Microsoft began publishing diversity reports in 2014. However, the reports show little change over the last six years. Among leadership and technical roles like coders and engineers, the diversity numbers are even lower.

I was impressed with Facebook’s diversity report, which showed how gender and ethnic representation across job categories evolved over time.

Microsoft’s Global Diversity & Inclusion Report illustrated demographic representation in a detailed way.

I also found that all companies with 100 or more US employees have to file an EEO-1 with the government every year. The EEO-1 breaks out employee counts by gender, race, and job category (manager, professional, laborer, etc.). Through further exploration, I found that many tech companies share diversity data in their annual diversity reports. Since I am really interested in exploring employee diversity across different job categories, I decided to add new data to the current dataset.

Week 3–11/24/2020

Visual exploration of representation

I started exploring visual forms by assigning each type of data a unique variable/cue and testing different approaches to representation. I did 3 rounds of sketches.

Here are my takeaways after discussing with Carol and Stacie:

I would go in the direction of using color hue to represent race with six ranges, because identifying six colors has a lower cognitive load than differentiating six shapes or textures.
If I use shapes to represent gender, I could assign each gender icon a different color to show each racial group.
Thinking about layering the information, I wonder whether job category should be another layer of information added to the icon, or whether it should be an axis, so that the positions of the icons indicate the percentage. I will explore both directions.

Week 4–12/1/2020

What did you gain from the presentations today and how are you revising the plan for your prototype?

Based on the feedback I received from the presentation, I plan to work on the following aspects:

1. Pathway through my data

I feel the last step of the current pathway is still too broad. What do I want people to see from demographic representation across job categories? Which job categories should I focus on? I gained some insights after revisiting the dataset and doing secondary research. Here are my new findings:

In order to build inclusive work environments, offering leadership positions to diverse individuals is important so that underrepresented groups have the ability to make changes.
Asian employees make up a greater share of the professional workforce than other minority groups, but their representation decreases at the leadership level. By contrast, the stairway leading white men to the top widens as it rises.

I am thinking the last step of my data visualization could center on showing the percentage change across job categories for each gender and racial group.

2. Redefine ranges for the scale of job category

The new hierarchy could be:

Executives
Managers
Technical roles (professionals & technicians)
Support roles (sales workers, administrative support, service workers)
Craft & Labor roles (craft workers, operatives, laborers & helpers)
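
To make the regrouping concrete, here is a minimal sketch of how the ten EEO-1 job categories could be collapsed into the five proposed ranges and turned into percentages. The category labels are written as I understand them from the EEO-1 form, so treat the exact spelling as an assumption; the function name `regroup` and all the counts are hypothetical and are not real company data.

```python
from collections import defaultdict

# Assumed EEO-1 job category labels mapped to the five proposed ranges.
GROUPS = {
    "Executive/Senior Level Officials and Managers": "Executives",
    "First/Mid-Level Officials and Managers": "Managers",
    "Professionals": "Technical roles",
    "Technicians": "Technical roles",
    "Sales Workers": "Support roles",
    "Administrative Support Workers": "Support roles",
    "Service Workers": "Support roles",
    "Craft Workers": "Craft & Labor roles",
    "Operatives": "Craft & Labor roles",
    "Laborers and Helpers": "Craft & Labor roles",
}


def regroup(counts):
    """Sum head counts per (range, racial group) and convert to percent of range."""
    totals = defaultdict(lambda: defaultdict(int))
    for (category, race), n in counts.items():
        totals[GROUPS[category]][race] += n

    shares = {}
    for group, by_race in totals.items():
        group_total = sum(by_race.values())
        shares[group] = {
            race: round(100 * n / group_total, 1) for race, n in by_race.items()
        }
    return shares


# Invented head counts for illustration only.
counts = {
    ("Executive/Senior Level Officials and Managers", "White"): 8,
    ("Executive/Senior Level Officials and Managers", "Asian"): 2,
    ("Professionals", "White"): 60,
    ("Professionals", "Asian"): 30,
    ("Technicians", "White"): 20,
    ("Technicians", "Asian"): 10,
}

shares = regroup(counts)
print(shares["Executives"])
print(shares["Technical roles"])
```

Because percentages are computed within each range, the result shows how a group's share shifts from one job level to the next, which is the comparison the research question asks for.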

3. Visual cues & Variables

Consider:

whether using sound to present time is easy to discern; maybe use temporal aspects that slowly present information over time
using skin tones for racial groups

I plan to spend some time refining the connections between visual/aural/temporal cues and scales.

Week 5–12/8/2020

During today’s work session, I discussed my progress with Amrita, and here is the feedback I received:

There is a close cognitive connection between skin tone colors and racial groups. I would like to stick with using color to show race.
The shapes I used to show gender are a little hard to differentiate. I might consider using the same color background under the icons’ linework. The icons look playful, but they might reflect some stereotypes about gender identity.

Below are the updated and original visualizations regarding gender ratios. I removed the arrows on the axes and the dashed lines indicating 50%, as they confused the audience at first glance.

Left: updated visualization showing gender ratios in 2014; Right: original visualization showing gender ratios in 2018

Week 5–12/10/2020

What did you gain from the testing session today and how are you planning to fine-tune your final prototype?

During the testing session today, we were grouped into teams of four, and each of us presented our work in progress to three classmates who were not working on the same topic.

I shared still images of the data visualization I made in Figma, then walked them through the path that viewers might take.

Left: Global gender ratio in tech companies; Right: Demographic representation in Apple in 2018

Viewers will see details about each group’s representation when hovering over the icon.

People liked the interaction I plan to prototype: when hovering over an icon, all matching icons are highlighted, and viewers can see more information about the group of people the icon represents. I would like to keep this in my final prototype.
I used different coordinate systems (although both are Cartesian) in two types of frames. I was asked if I could combine them into one single visualization with interactions to reveal different types of data (gender ratio and demographic representation). I have done a couple of iterations on the coordinate systems. Since my big question is related to employee representation across job categories, Stacie suggested that it still makes sense to make “job category” one axis so that viewers can compare information, while the other axis could represent amount.
During the testing session with classmates, I got feedback to make the gender icons more mindful. The light-colored face icons slightly confused my audience, because I used the same color to represent white people. The face icons could also reinforce stereotypes that men have short hair and women have long hair. With this in mind, I plan to explore other icons for representing gender. I would like to pursue a more abstract form.
I highlighted the pattern I found: the path leading white men to leadership roles gets wider as it rises, which is the opposite of what happens for Asian employees. Alex wanted to know what equality would look like. Based on this feedback, I am considering adding “US population” as a range or reference line for people to compare against.

Design decisions I made for my final prototype:

Left: Iterations for coordinate systems; Right: Iterations for representing race and gender

Final prototype in Figma:

Week 6–12/17/2020

What are your key takeaways from the second project? How might you apply what you learned in the future?

I found myself often returning to Yau’s reading when playing with different visual cues for each type of data. That reading is really helpful for learning how to represent information in a clear and effective way! I did a lot of research on diversity in tech in the early phase of this project. Diversity in tech is a big issue, with topics related to disparities in education, unconscious bias in the hiring and promotion of people in different racial groups, etc. Through this project, I learned how to identify the kind of data that interests me from research and then visually represent it in a more comprehensible way. In the future, I will continue to consider using visual, aural, and temporal channels for communicating information.

I made my first Figma prototype for this project. I had used Sketch and Adobe XD for prototyping before. It was great to see that these screen-based prototyping tools share similar functions, and I found the learning curve was low. I was so impressed with the “smart animate” function in Figma. It’s powerful and even generates smooth transitions I didn’t expect to see! (When clicking a tech company’s logo from the entry page, the transition looks like the icons in the US population are distributed to different job categories.) And this is much faster than making a similar animation/transition in After Effects. This makes me think that the workflow could be more productive if we could effectively combine different tools to generate the visual effects we want for future projects.