Wednesday, November 4, 2009

A space for public debate: A digital ethnography of a Wikipedia article

Robert W. Gehl

The last chapter of my dissertation on Web 2.0 examines Wikipedia. I've come to the conclusion that Wikipedia is exactly what our mediascape needs: a space for public debate and purposeful knowledge production. I came to this conclusion based on a close, ethnographic reading of an article in Wikipedia. I hope to test this conclusion by looking at other articles; however, it is a long, involved process to closely read one article, let alone many.

Kristin already laid out a basic description of the digital ethnographic method, so I won't go into too much detail, but here's a breakdown of what I'm doing with Wikipedia:
  • Article selection: I wanted to focus on a topic which is controversial to see if Wikipedia editors can produce a quality article despite the potential for dispute. I selected the "Hurricane Katrina" article. Katrina obviously exposed a lot of contentious issues in American culture and politics, particularly race, class, the role of the media, and the role of the state in neoliberalism. It's also a topic that is outside my normal studies, so I came at this issue fresh, so to speak.
  • Search methods: In order to create a history of this article, I have to look at its evolution. With at least 10,000 edits - there's no way to tell - this article's history is complex. I used Wikiblame to search for terms like "race" or "class" and tracked where those terms appear in the article's history. However, this is a blunt instrument; most of my examination came from simply going through the diffs of the article during specific times (such as all of September 2005). This is an exhausting process, but reveals "edit wars" over controversial aspects of the hurricane's social impact. I reiterate what I said in my last post: there has to be a better way to search Wikipedia history.
  • Thick description: looking at the article history is not enough. I also closely examined the associated "Talk" page for trends and themes. I also looked at user pages to see where their interests lie. I wanted to understand the debates that went on behind the scenes, and I wanted to understand the expectations and desires of the participating editors.
  • Non-participation: While Kristin noted that ethnography typically relies upon participant observation, I did not participate in editing the Katrina article, nor did I engage in discussion on the "Talk" page. I did not want to interfere with Wikipedia editors as they debate. I am, however, a participant in other articles, so I know quite a bit about the cultural norms of Wikipedia editing, and I draw on that knowledge as I interpret the editors' actions. Moreover, I do not feel there is an ethical issue here (i.e., consent) since this material is public. If you feel otherwise, let me know in the comments.
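The term tracking described under "Search methods" could in principle be partly automated. What follows is only a minimal Python sketch, not the method used for this study: it assumes the article's revisions have already been downloaded into memory, oldest first, and the `Revision` record, the `term_transitions` helper, and the sample history are all hypothetical names and data invented for illustration. It reports each revision at which a term enters or leaves the article, which is roughly where an "edit war" would show up.

```python
import re
from dataclasses import dataclass


@dataclass
class Revision:
    """One saved version of an article (hypothetical record layout)."""
    timestamp: str  # ISO date string, e.g. "2005-09-12"
    editor: str
    text: str


def term_transitions(revisions, term):
    """Return (timestamp, editor, 'added' | 'removed') for each revision
    at which the whole word `term` enters or leaves the article text.
    `revisions` must be ordered oldest first."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    events = []
    present = False  # the term is assumed absent before the first revision
    for rev in revisions:
        now_present = bool(pattern.search(rev.text))
        if now_present != present:
            change = "added" if now_present else "removed"
            events.append((rev.timestamp, rev.editor, change))
        present = now_present
    return events


# Invented sample history, for illustration only.
history = [
    Revision("2005-09-01", "EditorA", "The storm made landfall in Louisiana."),
    Revision("2005-09-03", "EditorB", "Critics raised questions of race and class."),
    Revision("2005-09-04", "EditorC", "Critics raised questions of class."),
    Revision("2005-09-05", "EditorB", "Critics raised questions of race and class."),
]

for event in term_transitions(history, "race"):
    print(event)
```

A term that is added, removed, and added again within a few days is a candidate for closer reading of the diffs and of the "Talk" page for the same dates. The sketch only narrows down where to look; it does not replace the close reading.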
My findings are going to be elaborated in my dissertation, but I will give an excerpt here. Here's just one conversation that happened in the "Talk" page of the Katrina article. A little context - this conversation happened less than two weeks after Katrina made landfall in Louisiana. This is of course about 9 days after Kanye West said "George Bush doesn't care about black people" on national television, and of course this was also a period of time when the news media made many reports of looting and rape, reports which were later found to be exaggerated or wrong.

On 12 September 2005, editor Barneygumble started a thread titled "Silence on the 'Looting and Mayhem.'" He argued that
The only mention of the looting in the whole article is in respect to [Kanye] West's comments. The mayhem has been completely hushed up. Widespread looting; when darkness came in the Superdome, there was [sic] robberies, rapes, and ever [sic] people were murdered. Rescuers were setup by gangs and then robbed. About 10% of the police deserted. Why is there zero mention of the mayhem in this article? All the accusations of "racism" are to cover the murder, rape that went on in the city. Unlike New York on 9/11 or their 2003 blackout, where New Yorkers helped each other, New Orleans decended [sic] into a hellhole. However, not a lick of mention on wikipedia [sic].
Barneygumble's complaints echoed the sentiments of many Americans at that time: what made this disaster so unlike 9/11? Why did New Orleans seemingly descend into chaos, while New York drew closer together? He implicitly points to race as a factor; at the time, there were many people arguing that the urban blacks in New Orleans were prone to descend into violent chaos. Here, we can see that he argues that the violence was not being covered in the Katrina article because of PC concerns about racism. Barneygumble, like all Americans, was relying on news media coverage, much of which was at the time sensationalizing the violence and looting that occurred in New Orleans. He wanted to have a broader discussion than just on what Kanye West said.

He got it. The next day, RattBoy replied:
Search the article. The word, "rape," appears three times--in the Superdome and Convention Center sections. There's also the following: By September 8 there were reports that the claims of rape and murder at the Convention Center and the Superdome could be false. Wikipedia is a great resource for collecting verified info, and discarding rumors, regardless of how widely, or hysterically, they're reported.
And another reply, from Dystopos, on 13 September:
The general sense of various first-person accounts was that the "genral [sic] lawlessness" was greatly over-sensationalized by the mass media and that the "out of control" looting was limited to very few districts (chiefly Canal St and the uptown Wal-Mart) and did not last for very long. Sporadic violence persisted for a few days until the National Guard was deployed in force. Other stores were "looted" in a more orderly fashion, including pharmacies cleaned out by doctors and hospital staff with police escort. Order was generally maintained by NOPD in the CBD and French Quarter, though episodes of vandalism and attempted burglary were reported. There was no reporting on WIDE swaths of the city where rescuers went about their business and found mostly peace and gratitude. The actually verifiable events fall short of the general sense of violent anarchy portrayed in the media. It is neither factual nor NPOV to describe a city "disinteregrated (sic) into a 'third world country' ".
Of course, these editors are right. The reports of looting and violence that immediately followed the storm were later debunked - see Popular Mechanics' remarkable article on the subject for an exploration of the myths of Katrina. One major theme of the debate in the "Talk" page is a close examination of the proposed media sources and their reliability; this is a very valuable activity.

Arising from this debate - and many like it - is an article which deals with many tough questions: race, class, gender, the role of the state, and the role of the news media in reporting disasters. It's not perfect. Sometimes it is maddeningly passive; sometimes it is wrong. However, all in all, the editors used the "Talk" page to come to consensus on what to include in the article, relying on verifiable sources and a neutral point of view. They were exceptionally quick in reverting information which was debunked as time went on. Unlike a newspaper, which runs corrections after the fact and often away from the front pages, Wikipedia is corrected quickly and obviously by its editors.

And, in comparison with another article, Encyclopedia Britannica's "Hurricane Katrina" article, Wikipedia's version deals with the social impact of the storm in much greater detail. Wikipedia is able to handle both the meteorological facts of the storm - windspeed, landfall, pressure - and the more inchoate social facts.

Ultimately, I argue that this is markedly different from the discourse one might get in blog comments or the comments forum of a newspaper, because these debates have a productive purpose: the editors must, in the end, create an encyclopedia article which is neutral. We often talk about the need for public debate, but we don't as often give people ownership of that debate, outside of occasional elections. In the case of Wikipedia, however, debate leads to media/content/knowledge production. This is extremely valuable, because then the debate leads to an action, rather than debate for debate's sake. It of course is no replacement for journalism or academic production, but it is a welcome adjunct.

I'm drafting this chapter now. Feedback is very welcome - leave it in the comments!

Thursday, October 8, 2009

A call for better Wikipedia search

Robert W. Gehl

My main point in this post: someone somewhere needs to make a better Wikipedia search engine, one capable of searching the vast material below the articles on the surface.

To prove this, I'm going to tell you about a research project I'm currently working on for my dissertation. (This is a bit involved but my main point will be clear in the end.) A consistent argument in analyses of the Web is that there is a balkanization of political discourse happening. Probably the most notable argument is Nicholas Negroponte's notion of the "daily me": the ability of users to customize their online news consumption to the point that they purposely avoid news and opinion that do not affirm their views. For example, through Google News, I can expose myself only to left-leaning news and editorial content and completely avoid right-leaning material. Or, I can only read the blogs Daily Kos and Talking Points Memo. Meanwhile, my neighbor only reads the Washington Times and Michelle Malkin's blog. And never the twain shall meet, unless it's at one of the many protests and counterprotests which happen in DC.

I agree with this basic analysis; at least anecdotally, it seems as if political discourse is extremely rancorous in the US. But this analysis raises the question: are there any places where Web users are in fact exposed to differing points of view, whether they like it or not? Where balkanization is mitigated? I've concluded that Wikipedia is that space.

At first, this seems odd. Wikipedia's editorial policy is based on three rules: neutral point of view, no original research, and verifiability. That is, they want every article to be as objective as possible, to contain no original syntheses of secondary sources, and to be well-cited. This seems like an unlikely place for debate.

However, in order to produce articles, Wikipedia editors - who can be anyone in the world - must have a place to debate neutrality and verifiability. Every Wikipedia article has a corresponding "Talk" page. In those pages, editors debate the quality of sources and how to phrase critical issues in a neutral way.

For an article on an innocuous, bland American television show like The Hogan Family, the Talk page has little debate. However, consider an article on Hurricane Katrina. Katrina tore aside a veil in American culture, a veil over the importance of race and class and the role of the state in a neoliberalized society. In that article's Talk page, Wikipedia editors have debated sources and ideas which tackle the role of class and race in the disaster. They do so with exceptional logic and openness. While the argument can get heated, it is simply amazing how a bunch of "nobodies" were able to debate these controversies in a civil and in-depth manner.

I've been examining this debate, which is now four years in the growing and spans thousands of edits and possibly over 100,000 words on the talk page alone. I detail this debate in one of my dissertation chapters.

And this is where I need search help. Wikipedia's own search only returns current articles. If I want to find the word "race" in the Katrina article, I will get the current version only and none of the 10,000+ previous revisions. If I want to find out which editor wrote the word "race" in the article... I can't.

There are some tools out there. Wikipedia has an article history search called Wikiblame, but it is quite limited. I have to know the date I want to search, and if I know that, there's not much need to search, is there? Wikitrust is a brilliant tool, developed by PARC. However, it doesn't seem to work on semi-protected Wikipedia pages; "Hurricane Katrina" is one such page. History Flow, from IBM, looks great, but isn't available to the public.

And none of them search the "Talk" page, where all the debate happens.

In the case of the "Hurricane Katrina" article, there are 8 archived "Talk" pages. That's a hell of a lot of material. And it can't be searched.
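To make the request concrete, here is a rough Python sketch of the kind of "Talk" archive search being asked for. It is not an existing Wikipedia tool, and it rests on stated assumptions: the archived pages have already been saved as raw wikitext strings, each discussion thread begins with a level-2 heading of the form `== Title ==`, and the function names and sample archives are hypothetical and invented for illustration.

```python
import re

# A level-2 wikitext heading such as "== Thread title ==" on its own line.
# The (?!=) lookahead skips deeper headings such as "=== Subsection ===".
HEADING = re.compile(r"^==(?!=)[ \t]*(.+?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_threads(wikitext):
    """Split one Talk page into (title, body) pairs on level-2 headings.
    Any text before the first heading is ignored."""
    matches = list(HEADING.finditer(wikitext))
    threads = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        threads.append((match.group(1), wikitext[match.end():end].strip()))
    return threads


def search_archives(archives, term):
    """Return (archive_name, thread_title) for every thread whose title
    or body mentions the whole word `term`, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for name, wikitext in archives.items():
        for title, body in split_threads(wikitext):
            if pattern.search(title) or pattern.search(body):
                hits.append((name, title))
    return hits


# Invented sample archives, for illustration only.
archives = {
    "Archive 1": (
        "== Silence on the looting ==\n"
        "Why is there no mention of looting?\n"
        ":Search the article.\n"
        "== Storm track ==\n"
        "The track map needs updating.\n"
    ),
    "Archive 2": "== Sources ==\nReports of looting were later questioned.\n",
}

for hit in search_archives(archives, "looting"):
    print(hit)
```

Grouping matches by thread rather than by line keeps each hit tied to the debate it came from, which matters more for this kind of reading than a raw count of occurrences.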

There has got to be a better way! If you know of one, PLEASE LET ME KNOW.

Monday, October 5, 2009

MUVEs and Pedagogy - Pt 2

Kristin Scott

In my last blog post, I wrote about the challenges of using multi-user virtual environments, such as Second Life, in the classroom. Today, I want to focus on some of the potential pedagogical benefits and offer a few of my own successful moments.

As I stated in my last post, using Second Life (and other similar virtual environment tools) can be particularly challenging, if for no other reason than the practical issues involved. However, if a teacher has the time to develop the students’ virtual skills, as I recently did in a course I taught last spring for New Century College at George Mason University, entitled Cybercultures, then the use of MUVEs can be beneficial. Since my Cybercultures course was a six credit hour hybrid course, which met for three hours a week in a physical (computer) classroom and three hours a week in the virtual environment, my students had more time to develop the necessary navigational skills than most other courses would allow. Additionally, for those students whose personal computers were not adequate to run the software, I made available a computer lab with Second Life downloaded onto the school computers. I also recruited a number of other educators (from GMU and elsewhere) to act as mentors, should students need help in SL when I was not signed in; though I also had frequent avatar-to-avatar meetings in-world to facilitate the process. Nevertheless, even in this ideal environment, my assignments for weeks one, two, and three necessarily focused on basic virtual skills, so it wasn’t until week four that my students were able to move onto the digital ethnographic assignments I had created and thus concentrate more on the content of the course rather than practical skills.

Of course, because my class investigated specific themes such as the social and political movements that take place within cyberspace, political economy, the formation of virtual communities, and cyber-identities and bodies, taking them into Second Life to actually experience what we were studying made sense. Doing digital ethnography, though, was particularly challenging. Because our adventures were being recorded on a public wiki, I also had to introduce students to a much abbreviated version of the Protection of Human Subjects Procedures and have them each create an HSP handout (in the form of a virtual notecard) to give to avatars that they thought they would include in surveys or ethnographic description/work.

The assignments produced some really interesting results, though; and students seemed engaged in the discussions that integrated their virtual experiences with the readings. In order to explore the political economy of cyberspace (and Second Life more specifically), for instance, I had them each attempt to earn or garner Lindens (the official currency of SL) in-world. And in order to enrich our discussions of cyber-identities, gendered behavior in cyberspace, and virtual bodies, in weeks 5 & 6, I had students change their originally created avatars’ sex/gender and/or race. Since students, without exception, all had first created their avatars to closely resemble their real selves and worked with these avatars for several weeks prior to this assignment, they had become somewhat invested in their alter-virtual selves, so switching sex/gender and/or race for a period of two weeks led to some fascinating insights – but not just for the students.


Typically, in past classes, whenever discussions of sex/gender arose, for example, I would find myself primarily focused on the social, political, and economic constructions of sex and gender, but the conversations usually revolved around what it meant to be female or feminine. Since ~95% of my students in my Cybercultures course were, in real life, female, switching sex and/or gender meant becoming male or acquiring virtual masculine traits. And for the first time since I’ve been teaching, I found our discussions circulating primarily around the difficulties of being male, since most of my female-to-virtual-male participants reported feeling ignored or hyper-scrutinized, or being suspected of “hitting on” female avatars whenever they initiated conversations. The majority of the women in my class thus came out of the experiment feeling particularly aware of and sympathetic to male/masculine stereotypes and roles in society. At the bottom of the assignment page are links to individual responses to this virtual assignment. I also participated, along with the students, in the sex/gender change (see photo above) - so the assignment became much more collaborative, as well.

So for this particular class, I found the virtual environment to be a particularly stimulating and useful addendum to learning; but for most, the benefits are (and I think will continue to be) debatable. How valuable are the learning experiences? And are they worth the intensity of time and energy spent?

I do think one of the strongest potentials, though not utilized nearly enough, is that of foreign language education, especially since voice came to Second Life. One of the pitfalls of a virtual foreign language learning environment, of course, would be the same problem often experienced in one’s native language when communicating online: the proliferation of text-speak.

Though I have some serious questions about the overall benefits of virtual education, I have not lost hope for greater potential. MUVEs can certainly become a rather helpful and engaging supplement to learning, but I don’t believe they are able to significantly replace one-on-one class time. I will be interested, however, in seeing how virtual educators will feel after several years of teaching in these environments. I suspect that some will remain as fascinated by its potential as when they first entered the virtual education domain, while others will likely begin to more rigorously question its pedagogical effectiveness. In any case, I hope those reading will share their own experiences, questions, or concerns with the use of MUVEs in education.

Friday, October 2, 2009

Multi-User Virtual Environments and Pedagogy

Kristin Scott

Since I’m currently working on a talk that I will be giving about my use of Second Life in the classroom for the upcoming “Community of Distance Educators Academy,” a day-long “knowledge sharing” event between instructional designers, educational technologists and those within other related disciplines from George Mason University, University of Mary Washington and Northern Virginia Community College, I thought I’d share some of my thoughts here first.

I have become a strong proponent of the use of technology in education because I think it can be an invaluable pedagogical tool. From YouTube videos and blogs to multi-user virtual environments (MUVEs), wikis and digital portfolios, these tools can help address diverse learning styles, interactively engage students in the learning process, significantly expand the learning environment beyond the four walls in which we often sit, and better prepare them for the use of digital tools in their future professions. On my website, I include a variety of samples, including the use of such tools for courses, generally, and individual student projects, specifically.

I’ve now worked with Second Life as a pedagogical tool for about 3-4 years. I’ve also been to a variety of educational events in-world, have seen some simulating literary/historical virtual environments, and have engaged quite a bit with other academics in the field. And yet, I’m still rather divided about its potential in education. Because I have a lot to say about the use of this particular tool in the classroom, though, I’m going to post my thoughts in two separate blog posts. Today, I want to focus on the challenges this tool presents, and my next blog post will focus on the potential benefits.

Theoretically, at least, Second Life provides a virtual environment in which students can come together (whether from the same class or from all over the world) and collaboratively deepen or expand their learning experiences. Because Second Life is a 3-D environment built entirely by its users (virtually present in the form of avatars), the idea is that students are able to interact outside of the physical classroom in a way that enhances or conveys a sense of “presence” that is lacking in other digital environments. Because virtual environments are “interactive,” they allow students to not just see the 18th century nuances portrayed in vivid colors on a film screen, for instance, but allow them to (virtually) step inside the 18th century and interact with the 3-D environment and each other. I know that one can be a part (rather than just an attendee) of a performance of an in-world play. And I have teleported myself to the virtual Camp Darfur in Second Life and have had the “chance to walk through a ghosttown [sic] of a refugee camp, victimized by years of torture and genocide.” I’ve experienced a number of these types of educational environments, and I do, theoretically, see the potential it offers for additional learning.

Theoretically, a teacher of practically any subject could utilize this tool in a variety of ways, a few examples of which follow:
  • Architectural students could use the virtual platform to visually imagine and build their designs.
  • History and literature students can visit and interact within a simulated version of an old London community (for instance) and virtually experience all they read about.
  • Film students can demo their reels in a virtual theater and get early reactions (or even produce Machinima films).
  • Advertising students could learn about virtual world marketing and advertising.
  • Foreign language students can practice typing and speaking with people from other countries. 
The list of possible pedagogical uses seems only as limited as one’s imagination. And those who teach in distance learning programs might find multi-user virtual learning environments particularly useful as a more stimulating environment than bulletin boards, chat rooms, or discussion forums.

However, although this learning environment is a rich one, there are, I believe, multiple practical challenges that must first be addressed (or at least understood). The use of Second Life, in most instances, requires an inordinate amount of preparatory time. The majority of students have neither heard of Second Life nor engaged in this or similar virtual environments, so the amount of time an instructor must spend to even set the virtual stage for learning can be exorbitant. In order for the tool to be an effective one, the teacher must first spend several weeks helping students create an avatar, find clothes and other free items, navigate around, teleport, conduct searches, and learn the virtual lexicon, customs, and culture. And the learning curve is high, for both teacher and student.

Additionally, virtual learning environments can be riddled with a bevy of other problems: many students simply don’t have computers powerful enough to run the program without crashing; graphics can sometimes take too long to load and/or SIMS get sluggish; virtual environments often experience lag and various interruptions; and griefing from other avatars can also occur. Many teachers also complain about the lack of technical support from their IT departments for the downloading and maintenance of MUVEs on school computers. And then there are issues around virtual space; in order to conduct classes in-world or at least have meeting points, a virtual property must be acquired, and this can cost quite a bit of real money. For those teachers operating without institutionally funded virtual campuses (and there are many, myself included), finding a virtual location for student avatars to meet can be personally costly.

And then there are the questions of pedagogical value (once all these practical hurdles are overcome). Why go into a virtual post-Katrina New Orleans, one might argue, if you can pull together a bus load of students to go down there and see for themselves; better yet, to go assist those who need help and have a hands-on learning experience that they will never in their lives forget? On the other hand, a lot of people can’t travel to another destination or into the past, so virtual environments give students a different type of access - and often experiences they would otherwise not have at all. And for these reasons, I certainly see the potential for virtual environments as an addendum to learning, but not yet a major tool of learning. And I will admit that I hope to never see it as a replacement for in-class learning altogether.

I don’t think, however, that it would be particularly productive to discuss (at this point) the larger debates over pedagogical value, because the bottom line, in my estimation, is that the value derived from MUVEs is as variable as the teachers, students, educational goals, institutions, funding, practical tools available, and computer/digital access. The best I can offer are my own experiences, for better and worse, and hopefully provide some critical points to consider.

In my next blog post, I will discuss, in more detail, some of the potential benefits of multi-user virtual environments and share a few of my own more successful moments. In the meantime, Mark Pepper, a PhD student in Rhetoric and Composition from Purdue University, has created an annotated bibliography of educational online resources, which highlights a number of articles related to the use of Second Life in teaching that I think is worth perusing. And Laura Nicosia, professor of English and director of English Education at Montclair State University, has created a slideshow called “Second Life as a Pedagogical Tool” that provides a more concentrated vision of the potential benefits.

Wednesday, September 23, 2009

Digital Researcher Profile: Mark Leccese

Robert W. Gehl
Editor's note: This is the second of a series of profiles and interviews we at Digital Research Blog are conducting with people who are doing research on digital technologies and cultures. For this profile, we've interviewed Mark Leccese. This profile comes from an email conversation held between Mark and Robert.

Mark Leccese is an assistant professor in the Department of Journalism at Emerson College. He has worked as a correspondent for The Boston Globe, political reporter for the Fitchburg Sentinel, editor-in-chief of The Tab Newspapers, State House bureau chief for Community Newspaper Company, and was the founding editor of Beacon Hill: The Newspaper of Massachusetts Government. His freelance reporting and writing has appeared in publications including The Columbia Journalism Review, The Quill, The Boston Phoenix, Commonwealth, and State Legislatures, and he has been a contributing editor at Boston Magazine. Mr. Leccese has taught journalism and writing at Boston University, Northeastern University, and Boston College. His primary areas of interest are public affairs journalism and online journalism, and his research is focused on political blogs.

RWG: I had the pleasure of hearing you give a talk at Media in Transition 6. You discussed empirical research into blogs and political journalism, and you found that "Political blogs are like a newspaper comprised of only op-ed pages and opinion columnists." That is, that the top political blogs – like Daily Kos or Michelle Malkin's blog – hardly do any original reporting or primary research; instead, they focus on opinion and often write about mainstream news stories. You argue that this undermines the overall perception that bloggers are replacing (let alone besting) traditional news media. Has your research affected how you teach journalism at Emerson?

Mark Leccese: Oh, yeah, absolutely. I primarily teach skills courses -- I worked as a newspaper reporter and editor (primarily covering politics) and a magazine writer for 28 years. I taught here and there as an adjunct, after work, for 15 years before I got hired in Sept. '07 on the tenure track at Emerson - and was expected to produce research.

Friday, September 18, 2009

Zotero 2.0b7 Review

Robert W. Gehl

Readers of this blog know that I am a big fan of Zotero, a Firefox bibliography plug-in. I've enjoyed using it so much I've even switched to the beta 2.0 version and I've left the occasional bug report and feature request at Zotero's forums.

Using a beta means dealing with problems. 2.0b6, the last version, had its share. Zotero became extremely slow in that version. Searches caused my Linux box to go gray, adding tags took seconds (which is too long for tagging - a process that should be instantaneous), and adding new citations was also slow and tedious. In comparison, Endnote X is much faster. Still, despite these flaws, Zotero and the Open Office plug-in saved me a lot of time as I drafted a chunk of my dissertation on Web 2.0.

To explain how hyped I am about 2.0b7, let me give you an analogy: imagine you're a sprinter training in sand. Running in sand is hard. It slows you down. Then, you shift to the track, and being on solid ground makes you feel like you're flying.

The latest version, 2.0b7, makes 6 feel like sand. 7 is lightning-quick. Adding tags, adding citations, doing search - all much, much faster.

Plus, Zotero 2.0 now has a special URL: zotero://fullscreen. This allows Zotero to look more like a stand-alone program, rather than something on the edges of Firefox. It seems to me that they're beta-testing breaking Zotero out of Firefox, which would be a very welcome change.

Add this to the features already in 2.0: syncing libraries across multiple computers and groups. The sync feature I have yet to take full advantage of, partly because it's a slow process. But I have been using it more and more due to 7's speed. The groups feature, on the other hand, is a great alternative to online citation systems such as CiteULike. I have a Web 2.0 group with 20 or so members, and members are starting to contribute citations. It's a great way to collaborate with people interested in the same field, and I imagine teachers can use this with their students to share research. I do wish, however, that Zotero Groups could add features such as a better messaging system; since I'm the admin of the Web 2.0 group, I occasionally want to send all members an email, but there's no "email group" option. Plus, I'd love to see a comments field for citations in the group setting, allowing members to publicly discuss books or articles.

Getting 7 to work with Open Office in Linux (Ubuntu) took a bit of work - I had to switch from the native Linux IcedTea Java substitute in Firefox to full-on Java. Open source purists might have an issue with this. However, Sun produces Java and OO, perhaps explaining why the two need to work together. And, after I got them working, the system was much faster, allowing me to continue to work my research into my writing without having to worry about citation formatting.

As I head into the home stretch with my dissertation, thanks to this latest version of Zotero, I'm sprinting.

Sunday, September 13, 2009

Digital Researcher Profile: David Houghton

Robert W. Gehl
Editor's note: This is the first of a series of profiles and interviews we at Digital Research Blog are conducting with people who are doing research on digital technologies and cultures. For this first profile, we've interviewed David Houghton.

David Houghton finished his Psychology (Hons) BSc in 2008 at the University of Westminster, London, and in October of 2008 he started work as a PhD student at the University of Bath in the Information, Decisions, and Operations program, working on the project PVNets, a three-year online privacy research project. At Bath, he works with his colleague, Dr. Asimina Vasalou, and his supervisor, Dr. Adam Joinson.

Currently, he is researching Twitter. In this interview, he talks about the attraction of Twitter as a research object, and some of the challenges he's faced examining it. Part of his research involves surveys; if you are a Twitter user, help David out by taking a few minutes to fill out his survey at https://www.survey.bath.ac.uk/closedtweets.

What drew you to do research on Twitter?
Twitter presents a useful API for research into information disclosure. Not only is it part of the [social networking] buzz that's attracted users from all over, it relies almost solely on text-based communication. The basic API, which can be argued to be detrimental to the privacy of Twitter users, presented the ideal medium to collect the freely available data they yield. With [social network] users sharing so much information, even the 140-character 'tweets' are data rich.

And how are you approaching this object?
In terms of methodology, there was a bit of invention, but this resulted in ignoring the API as a tool and instead manually searching for tweets using *a* in Twitter's search tool, then cutting and pasting the data. The logical next step, to gather data on closed accounts, was to conduct an online survey with informed consent from participants, so that such data, and open account data, were gained ethically. Just because data are freely available on Twitter, does it mean I should use them without permission? (and other such cans of worms!)

Key to my research is the use of the Twitter API in www.secrettweet.com. My ultimate research goal, currently, is to understand and maximise privacy for multiple stakeholders across a network, which first requires identifying what's meant by sensitive. By comparing tweets from Twitter and SecretTweet, it's hoped that clear linguistic markers become apparent to enable this recognition to occur.

So it's a comparison between the public Twitter and the purposely private SecretTweet. How do you go about comparing them?
The research has been split into two studies. We found significant differences between tweets and secret tweets, although tentatively for the time being! The Linguistic Inquiry and Word Count software was used to run the linguistic analysis, and the results were put into SPSS. There were 13 linguistic markers that distinguished between secret tweets and normal tweets with an overall prediction accuracy of 91.6%, which is rather suggestive in itself.
The word categories were:
  • Word Count (secret tweets contained more words),
  • Personal Pronouns (secret tweets contained more personal pronouns),
  • She/He (secret tweets contained more she/he words),
  • Articles (normal tweets contained more),
  • Past Tense (secret tweets contained more),
  • Swear Words (normal tweets contained more),
  • Family Words, Human Words, Sexual Words (secret tweets contained more of these),
  • Question Marks, Exclamation Marks and All Punctuation (normal tweets contained more of these).
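
Editor's note: to give a rough sense of what counting markers like these involves, here is a minimal Python sketch. It is not the LIWC software or David's procedure; the word lists are tiny, hypothetical stand-ins for LIWC's much larger dictionaries, and the function name marker_counts is made up for illustration. It only tallies a few of the surface categories listed above for a single piece of text.

```python
import re
import string

# Hypothetical, deliberately tiny word lists for illustration only.
# The real LIWC dictionaries are far larger and are not reproduced here.
PERSONAL_PRONOUNS = {
    "i", "me", "my", "we", "us", "you",
    "he", "she", "him", "her", "his", "hers", "they", "them",
}
SHE_HE = {"he", "she", "him", "her", "his", "hers"}
ARTICLES = {"a", "an", "the"}


def marker_counts(text):
    """Count a handful of surface markers in one tweet-sized string."""
    # Lowercased words, allowing one internal apostrophe (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return {
        "word_count": len(words),
        "personal_pronouns": sum(w in PERSONAL_PRONOUNS for w in words),
        "she_he": sum(w in SHE_HE for w in words),
        "articles": sum(w in ARTICLES for w in words),
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
        "all_punctuation": sum(ch in string.punctuation for ch in text),
    }


if __name__ == "__main__":
    print(marker_counts("She told me the truth!"))
```

In a study like the one described, counts of this kind would be computed for every tweet and secret tweet and then handed to a statistics package; the sketch stops at the counting step and makes no claim about how the 91.6% figure was obtained.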

What's the payoff? What are you hoping to find?

These results may be useful in determining which chunks of text, or communications between users, may be potentially sensitive, if they contain the markers observed for secret tweets. They may be useful to site owners in determining which communications should have a privacy setting that controls whether they can be observed only by the receiver or are open to the public, as Facebook has recently adopted. However, it may be more useful to inform users of social network sites, in a bid to protect the privacy of their 'friends', or to help them realise what is potentially harmful to post about themselves or others. The main point being: it is not just credit card details, addresses, dates of birth and the usual categories that users should be cautious of divulging, but perhaps anything with the above markers. That is not to say that tweets or expressions with higher word counts or She/He words are necessarily sensitive, but that it should be taken into account if several of the above markers appear in combination.

How does the survey at https://www.survey.bath.ac.uk/closedtweets fit in with this?
The survey is part of study 2, which aims to determine if there are differences (based on the 13 linguistic markers) in tweets depending on audience size and account type. Although analysis is still under way for this, preliminary data show that, perhaps because Twitter is open to anybody on the Internet, the content of the tweets may be mundane, so by providing better privacy controls, the content will become richer and less benign. So the current survey is to clarify whether closed Twitter account users tweet less benign information than open account users, and to compare this against audience size (i.e., 'followers').