[TOOLS] How to do some really cool data analysis on the cheap


Network graph from Gephi

Historian Michelle Moravec recently posted a fantastic article on the Chronicle of Higher Education’s ProfHacker blog where she dishes up some great advice for scholars looking for new digital tools to help them in their work. She recounts the lessons she learned while working on her Visualizing Schneemann project, which uses several digital network analysis methods to analyze the correspondence of feminist artist Carolee Schneemann in order to get a better picture of Schneemann’s social networks, influences, travels, etc. The end result is an amazing collection of graphs that really help viewers better understand the artist.

You can (and should) read the article for yourself, but I wanted to highlight a few of the digital tools Moravec used in her research. Not only are these free, but they could be a huge help to anyone interested in data visualization, digital textual analysis, or even just folks who want to actually “see” their own social networks in a new way. Happy hacking!

Gephi: A free, open source, high-powered tool for working with network graphs. If you don’t know anything about network theory, don’t panic: there are a number of tutorials that walk you through the basics of creating your own graphs. It even has a built-in tool that allows you to analyze your email account so that you see a visual representation of who you correspond with and their relationships to each other.
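If you want a feel for what goes into a network graph, here is a minimal Python sketch of turning a list of letters into an edge list. Everything in it is an assumption for illustration: the names are made up, and the `Source`, `Target`, and `Weight` column headers are what I understand Gephi's spreadsheet import to look for, so check the tutorials before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: one (sender, recipient) pair per letter.
letters = [
    ("Ann", "Beth"),
    ("Beth", "Ann"),
    ("Ann", "Cara"),
    ("Ann", "Beth"),
]


def build_edge_rows(pairs):
    """Count letters per undirected pair and return (source, target, weight) rows."""
    # Sorting each pair means Ann->Beth and Beth->Ann count as the same tie.
    counts = Counter(tuple(sorted(pair)) for pair in pairs)
    return [(a, b, n) for (a, b), n in sorted(counts.items())]


def write_edge_csv(rows, handle):
    """Write the rows as CSV with a header line (column names are assumed)."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(rows)


edge_rows = build_edge_rows(letters)
buffer = io.StringIO()
write_edge_csv(edge_rows, buffer)
print(buffer.getvalue())
```

The weight is simply the number of letters exchanged between two people, so heavier correspondents end up with thicker lines once the file is loaded into a graph tool.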

Raw: Raw is a much more general-purpose data visualization tool than Gephi and a heck of a lot easier to use for novices. All you need to do is cut and paste your data into the Raw web site from a source like Excel or even a flat-file database, set a few parameters, and poof! you’ve got a really snazzy looking graph in vector format (meaning you can easily resize it without getting all bit-mappy) that’ll impress anyone who sees it next time you have to do a presentation or turn in a paper.

StanfordNER: This one’s a little bit more obscure and requires a bit more technical knowledge to set up, but if you’ve ever wished that you had a tool that was able to automatically identify and extract “entities” such as names, places, organization names, etc. from big chunks of text, the Stanford Named Entity Recognizer uses some pretty fancy natural language processing techniques to do just that.
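To give a rough idea of what you do with the results, here is a small Python sketch that collects entities from tagged text. It assumes the recognizer has already been run and produced "slash tag" style output, where every word is followed by a label and `O` means "not an entity". The sample sentence and the `extract_entities` helper are made up for illustration and are not part of the Stanford tool itself.

```python
def extract_entities(tagged_text):
    """Group consecutive tokens sharing a non-O label into (entity, label) pairs."""
    entities = []
    current_words, current_label = [], None
    for token in tagged_text.split():
        # Split on the last slash so words containing "/" keep their text.
        word, _, label = token.rpartition("/")
        if label == current_label and label != "O":
            current_words.append(word)
            continue
        # The label changed, so close off any entity we were building.
        if current_words:
            entities.append((" ".join(current_words), current_label))
        if label != "O":
            current_words, current_label = [word], label
        else:
            current_words, current_label = [], None
    if current_words:
        entities.append((" ".join(current_words), current_label))
    return entities


# Hypothetical tagged sentence in the assumed word/LABEL format.
sample = "Jane/PERSON Doe/PERSON wrote/O from/O New/LOCATION York/LOCATION ./O"
found = extract_entities(sample)
print(found)
```

One limitation of this simple grouping: two different people mentioned back to back with no word in between would be merged into one entity.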

TimeMapper: Probably the coolest tool on the list (though I have to admit that I kinda have a soft spot for timelines), TimeMapper allows you to create really cool map/timeline mashups for free. All you have to do is fill out a Google spreadsheet (provided) with information such as the name of your timeline entry, dates, description, and place, and TimeMapper takes your info and creates a custom timeline tied to a custom Google map so that you can visualize your information not only in time (through the timeline) but in space as well (on the Google map). The Google spreadsheet at the heart of TimeMapper even automatically converts location names to latitude and longitude coordinates! You can even spiff up your timelines with images and links for additional information.
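If your timeline entries already live somewhere else, a short Python sketch like the one below could check them and produce rows to paste into the spreadsheet. The column names (`Title`, `Start`, `End`, `Description`, `Place`), the YYYY-MM-DD date format, and the sample entries are all assumptions for illustration, so match them against the template TimeMapper actually provides.

```python
import csv
import io
from datetime import date

# Assumed column names; compare with the provided spreadsheet template.
COLUMNS = ["Title", "Start", "End", "Description", "Place"]

# Hypothetical entries, not real data.
entries = [
    {"Title": "Exhibition opens", "Start": "1975-03-01", "End": "1975-04-15",
     "Description": "Made-up example entry.", "Place": "New York, NY"},
    {"Title": "Lecture", "Start": "1976-10-20", "End": "",
     "Description": "Another made-up entry.", "Place": "London, UK"},
]


def validate(entry):
    """Return a list of problems found in one entry (empty list means OK)."""
    problems = []
    if not entry.get("Title"):
        problems.append("missing Title")
    for field in ("Start", "End"):
        value = entry.get(field, "")
        if field == "End" and not value:
            continue  # an empty End is treated as a single-day event
        try:
            date.fromisoformat(value)
        except ValueError:
            problems.append(f"{field} is not a YYYY-MM-DD date: {value!r}")
    if not problems and entry.get("End"):
        if date.fromisoformat(entry["End"]) < date.fromisoformat(entry["Start"]):
            problems.append("End is before Start")
    return problems


# Keep only entries that pass validation, then write them out as CSV.
good_entries = [e for e in entries if not validate(e)]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(good_entries)
print(buffer.getvalue())
```

Catching a bad date before it goes into the spreadsheet is the main point here, since a mistyped date is the kind of thing that tends to surface only once the timeline is drawn.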


Polarized Twitter network.

Edit: cool new study…and a bonus tool! Not long after I posted this I happened to run across this article on mapping Twitter Topic Networks on the Pew Internet and American Life site. It’s a really fascinating piece that identifies six distinct patterns of behavior on Twitter: Polarized Crowds (see image), where groups on either side of an issue talk among themselves but interact little with their opponents; Tight Crowds of “highly interconnected people” with a few others from outside the main group; Brand Clusters, where well-known products, services, or celebrities form a hub that attracts large numbers of people from all over the Twittersphere; Community Clusters that form a number of smaller groups with their own individual characteristics; Broadcast Networks, mainly consisting of people retweeting news from major news sources; and Support Networks that form around major brands that use Twitter for customer support. The article also includes a fantastic Method section that describes in detail how the researchers used the free NodeXL Excel Template to collect and visualize the data in their study. NodeXL can do some pretty cool stuff: check out the NodeXL Graph Gallery for tons of examples.

 
