What follows are the slides and transcript from my WikiSym talk for the WikiViz 2011 award. Parts of the talk are showing up around the web, and I wanted to have the full context somewhere.
A presentation is about the context of a particular moment, the people, pacing, and tone, and being available afterwards for comments, questions, criticism. So please understand that reading won’t be the same as being.
The talk seemed to mean something to some people; I want to give it the chance to mean something to more people. The good stuff starts in the middle.
I’ll be talking a little about my WikiViz project and also about my personal obsession with missing values and visualizing emptiness.
WikiViz 2011 was about visualizing the impact of Wikipedia. The goal of the competition was to improve our understanding of how Wikipedia is affecting the world beyond the scope of its own community.
The first question when creating a visualization is always: what data to use? I spent a lot of time looking for outward-facing data about Wikipedia. Then on Wikipedia itself, I found data about the global usage of Wikipedia, by country and language. When I finally found data about Wikipedia traffic by country, I knew I had the connections I needed between the world and the world of Wikipedia.
Huge thanks to Erik Zachte, the data curator on whose work this whole visualization rests.
I cleaned the data with R and visualized it with Processing, both open source tools. The top represents countries, colored by region and more broadly by global north (blue) and south (red). The bottom represents languages. Each connection represents over 100,000 page requests in the year from April 2010 to March 2011. The visualization is interactive: countries and regions can be highlighted, and sorted by population, pageviews, pageviews per person, and internet access. All data is transparently available on rollover.
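For readers who want to try the connection rule on their own data, here is a minimal Python sketch. It is not the R or Processing code behind the visualization; the `(country, language, count)` row layout, the function name, and the sample rows are all assumptions made up for illustration.

```python
from collections import defaultdict

# Threshold taken from the text: a connection is drawn only when a
# country/language pair has more than 100,000 page requests in the year.
THRESHOLD = 100_000


def connections(requests, threshold=THRESHOLD):
    """Sum request counts per (country, language) pair and keep the pairs
    whose yearly total is strictly above the threshold.

    `requests` is an iterable of (country, language, count) tuples;
    this layout is hypothetical, not the format of the original data.
    """
    totals = defaultdict(int)
    for country, language, count in requests:
        totals[(country, language)] += count
    return {pair: total for pair, total in totals.items() if total > threshold}


# Invented sample rows, for illustration only.
sample = [
    ("Country A", "lang-x", 60_000),
    ("Country A", "lang-x", 50_000),   # same pair, counts are summed
    ("Country A", "lang-y", 100_000),  # exactly at the threshold: not drawn
    ("Country B", "lang-x", 5),
]
print(connections(sample))
```

Pairs that fall below the threshold simply drop out of the result, which is what leaves some countries with no connection drawn at all.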
In this image we’re sorting by access and can immediately see the difference between global north and south in terms of access.
Europe has generally high access and lots of connections.
Africa has generally low access and few connections.
Oceania also has generally low access, and has no connections.
(None of which is particularly surprising.)
The call for participation asked to see surprises: specifically, where Wikipedia has less reach than expected. This depends on the population and internet access of a country, both of which are considered here. These are the 25 countries* with fewer than 2 pageviews per person per month and greater than 25% internet access. For comparison: Iceland has the highest pageviews per person per month, at 15.7.
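The "low views, high access" rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the project: the `Country` record, its field names, and the sample rows are hypothetical, and only the two thresholds (2 pageviews per person per month, 25% access) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Country:
    # Hypothetical record layout, chosen for this example only.
    name: str
    population: int            # number of people
    internet_access: float     # share of population online, 0.0 to 1.0
    monthly_pageviews: float   # Wikipedia page requests per month


def views_per_person(country):
    """Monthly pageviews divided by population."""
    return country.monthly_pageviews / country.population


def low_views_high_access(countries, max_views=2.0, min_access=0.25):
    """Keep countries with fewer than `max_views` pageviews per person per
    month and more than `min_access` internet access, sorted from greatest
    to least internet access."""
    hits = [
        c for c in countries
        if views_per_person(c) < max_views and c.internet_access > min_access
    ]
    return sorted(hits, key=lambda c: c.internet_access, reverse=True)


# Invented rows, for illustration only.
sample = [
    Country("A", 1_000_000, 0.50, 1_000_000),   # 1.0 views/person, kept
    Country("B", 1_000_000, 0.10, 500_000),     # access too low
    Country("C", 1_000_000, 0.90, 10_000_000),  # 10 views/person, too high
    Country("D", 2_000_000, 0.80, 3_000_000),   # 1.5 views/person, kept
]
print([c.name for c in low_views_high_access(sample)])
```

Both conditions are needed: low pageviews alone mostly picks out countries with little internet access, which is the unsurprising case.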
I think that visualization is amazing for its ability to force us to see what’s missing; to see the missing values in a collection of data. Anyone who has experience with data analysis, especially with analyzing other people’s data, knows the feeling of being totally preoccupied with missing values: how are they represented in the dataset? How should we deal with them – bootstrap to fill them in, or throw out the associated data completely?
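Those two questions (how missing values are encoded, and whether to fill them in or drop them) can be made concrete with a short Python sketch. Everything here is an assumption for illustration: the set of missing-value markers, the function names, and the sample column. The fill step resamples from the observed values, which is only a crude stand-in for bootstrap-style imputation.

```python
import random

# Assumed markers; real datasets encode missing values in many other ways.
MISSING_MARKERS = {"", "NA", "N/A", "-", "null"}


def parse_value(raw):
    """Return a float, or None when the raw cell is a missing-value marker."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)


def drop_missing(values):
    """Option 1: throw out the missing entries."""
    return [v for v in values if v is not None]


def fill_by_resampling(values, seed=0):
    """Option 2: replace each missing entry with a value drawn at random
    from the observed ones (seeded so the result is repeatable)."""
    observed = drop_missing(values)
    if not observed:
        return list(values)  # nothing to draw from, leave the gaps
    rng = random.Random(seed)
    return [v if v is not None else rng.choice(observed) for v in values]


# Invented column, for illustration only.
raw_column = ["12.5", "NA", "3", "", "-", "7.25"]
parsed = [parse_value(cell) for cell in raw_column]
print(sum(v is None for v in parsed), "missing of", len(parsed))
print(drop_missing(parsed))
print(fill_by_resampling(parsed))
```

Counting the gaps before choosing either option keeps the emptiness visible instead of silently removing it.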
I find that visualization trains my mind to notice what’s missing.
When I sort by region, I can force you to see the emptiness -
the missed connections in the global south.
The more I do visualization work, the more I notice who’s missing, not just globally, but personally. It’s going to get a little personal here….
First, I think everyone here should know about Data No Borders (update: Data No Borders is now DataKind). DataKind is the creation of Jake Porway and Drew Conway; it hosts “data dives” that connect data scientists with nonprofits and NGOs to collaborate and solve data problems. I love DataKind, and was up at 6am streaming their presentation at the Strata Conference a couple of weeks ago. Later that day, these tweets came through, and I was excited to see this spontaneous meetup, so I clicked through to the picture, and saw……..
…….the table. And I think this spontaneous meetup is so awesome, but right away I notice who’s missing. I always wonder if other people notice.
I want to harness the power of whitespace in real life – I imagine diving into that picture and putting empty chairs around the table for all the people who are missing from the conversation. I want to create physical empty spaces to visualize the missing values. To force us to see who is missing.
So I won this WikiViz contest, and I’m so excited; I’ve been checking out the WikiSym page, and see that there are talks about the lack of female Wikipedia editors, and I come across these tweets….
…the joke of course is that there are no female authors on these papers. But all I can think is how happy I am that someone else is noticing who is missing.
I’m going to be honest with you: I don’t think Wikipedia needs more female editors to improve its coverage of Sex and the City. I think Wikipedia needs a diversity of editors because of Feodor Vassilyev:
I met Feodor Vassilyev almost 3 years ago, when the octomom insanity prompted me to look into who had given birth to the most children, and Wikipedia had my answer. Feodor’s first wife gave birth to 69 children: 16 pairs of twins, 7 sets of triplets, and 4 sets of quadruplets. So I think we can all agree that it is Mrs. Feodor Vassilyev, and her extraordinary capacity for birth, who is the true subject of this page, and Feodor’s accomplishment was really just being lucky (or unlucky) enough to be married to such a prolific woman.
But Mrs. Feodor Vassilyev has no page on Wikipedia. She gave birth to 69 children, but Feodor gets the title.
I love this example of bias in Wikipedia, and I’ve saved it for three years, because it is so obvious and so subtle. There’s no way to identify this with text analysis. A particular human mind is required to notice the bias, and give the title to Mrs. Feodor.
Now I don’t think this example is “capital-I” Important, but I think there must be examples like this all over Wikipedia, and it makes me happy to think that every Feodor has its editor – they just need to be connected.
The full quote from the title of my vis:
I want to say thank you to “11 guys” for noticing who was missing, and for taking the first steps to investigate not just who was missing, but why it might matter.
Thank you to anyone who walks away from here and continues to notice who is missing from events and conversations, and seeks out and invites a diversity of voices to the party.
This is both a global and a personal endeavor. It doesn’t require a grand gesture – just noticing when someone, who wanted to be in a conversation, walks away without saying anything, and taking the time to stop and say: I noticed you were there. What was it you wanted to say?
The generosity of these kinds of micro-actions invites individuals to a conversation and helps to develop a field. A micro-moment of connection can change a person’s life.
Very much related to that: Thank you to the judges for taking the time to really see this work. I know it demands a lot of the viewer.
Thank you to WikiSym for inviting me.
From the personal back to the global: just as there are women like me who want to be sitting at the Data No Borders table, there are people in those empty spaces of my visualization who want to be Wikipedia editors, who want to contribute, but don’t know it exists, or don’t see a way in.
In the panel just before this, we discussed openness vs. accessibility. Heather Ford said, “Openness is easy – you just put a license on something and say it’s open.” Accessibility is hard – someone has to take responsibility, and commit sustained effort.
So – the goal is: we meet back in 10 years and see the circle FILLED. No more missing values, no more missed connections, no more empty spaces.
And because of the amount of Wikipedia data being collected, we will be able to see, rather than speculate on, exactly how a diversity of voices has changed patterns of edits, the content, and the connections of Wikipedia.
We will all have a Wikipedia for everyone, that reflects the collaborative contributions of everyone.
Thank you for listening (reading!). I’d love to continue the conversation with anyone who’s interested.
*Low Views | High Access countries, from greatest to least internet access: South Korea, UAE, St. Lucia, Jamaica, Bahrain, Iran, Tuvalu, Seychelles, Turkey, Brazil, Belarus, Morocco, Dominican Republic, Saudi Arabia, US Virgin Islands, Palau, Tunisia, Guyana, China, Venezuela, Peru, Philippines, Vietnam, Cook Islands, Thailand