Client Projects

The Virtual Goods Trade in World of Warcraft

Client Name: Isaac Knowles

Project Description (goal/scientific or practical value):
The goal of this project is three-fold. The first goal is to understand the relationship between real-life geography and the trade of virtual goods in a large virtual economy. The second is to use this knowledge to search for interesting relationships between flows of trade in the virtual economy and flows in the real economy. The third is to understand the virtual economy as a robust center of economic activity in its own right.

Each of these goals has both scientific and practical value. Economists have long believed that there could be important relationships between the real and virtual economy, which are hidden within the millions of virtual economic transactions that take place each day. Furthermore, it remains a major goal of economic and other social scientific research to use virtual worlds and virtual economies as test-beds for theories and policies. Analysis of this data constitutes an important building block in this larger research agenda.

The results of this work will also be of value to the game industry, for which the virtual economy can often represent the sole source of revenue (through a form of taxation).

Information on dataset(s) to be used:
World of Warcraft (WoW) is the world's most-populated pay-to-play massively multiplayer game. Of its many features, one of the most interesting is the robust in-game economy. Trades are conducted through an auction house similar to those found on websites like eBay and Alibaba. Blizzard, the publisher of World of Warcraft, exposes and updates the listings on this auction house. The data primarily consist of the "supply schedule" for every auction house on every server based in North and South America, Australia, and Europe. Secondary data sets provide information about the servers themselves, as well as extended information on the items that are listed for sale.

Web-link to dataset(s):

World of Warcraft Auction data sets

Relevant publications, websites, etc.:
Documentation for the WoW community API
Summary Article of the WoW auction houses
Article with robust discussion of a similar virtual economy

Publication Notes:

The client will be actively involved with the project, and available for any questions. If any work resulting from this partnership is suitable for publication, it will be pursued as such. Students who were/remain actively involved with the project will be included as co-authors.

 

 

AIDS as a Global Media Event

Client Name: Vladimir Cajkovac

Project Description (goal/scientific or practical value):
AIDS has radically transformed the world and become the focus of interdisciplinary study and research from a medical, cultural, and media-historical perspective. Over the past 30 years, the German Hygiene Museum in Dresden has collected numerous items, predominantly posters, which have been used in the media campaign to combat the epidemic. It is the world’s largest collection of AIDS posters, with over 9,000 specimens from 147 countries.

The goal of the project is to visualize the distribution of symbols, gestures, and topics addressed in the posters through space and time so that other researchers and members of the public can understand the development of the cultural response to the AIDS epidemic.

Web-Link to dataset(s):
https://iu.box.com/s/quhkme0e7y31x8fpdv1l8it0f6xmkm23

Information on dataset(s) to be used:
In the course of the fellowship project "AIDS as a global media event", 2715 posters from the German Hygiene Museum’s collection were classified and codified using the ICONCLASS system. Each record captures the date, language, ICONCLASS classification number, keywords, and geographic data for a given media object.

ICONCLASS is an iconographic classification system which assigns codes (combinations of numbers and letters) to common subjects in Western art. ICONCLASS is a hierarchical classification that uses alphanumeric codes divided among 10 major classes.
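Because the codes are hierarchical, shortening a code yields its broader parent categories, which is useful for grouping posters at different levels of detail. The following is a minimal sketch, assuming codes are stored as plain strings whose first character gives the major class. The sample codes and the function names are illustrative and not taken from the dataset, and bracketed text or other notation extensions are simply cut off rather than interpreted.

```python
from collections import Counter

def major_class(code):
    """Return the top-level division: the first character of the code."""
    return code.strip()[0]

def ancestors(code):
    """Return successively longer prefixes of the code, broadest first."""
    base = code.strip().split("(")[0]  # drop any bracketed text, e.g. "(LION)"
    return [base[:i] for i in range(1, len(base) + 1)]

# Hypothetical sample of classification numbers, one per poster record.
codes = ["25F23(LION)", "31A", "25G", "49"]

# Number of sample records falling into each major class.
per_class = Counter(major_class(c) for c in codes)
```

Counting by a longer prefix instead of the first character gives the same kind of tally at a finer level of the hierarchy.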

Relevant Publications, websites, etc.:
German Hygiene Museum
Kulturstiftung Des Bundes- AIDS as a Global Media Event
ICONCLASS Classification Scheme

Publication Notes:
When you re-use the dataset, please give credit by citing the dataset (German Hygiene Museum. (2015). Posters from the German Hygiene Museum Collection codified using the ICONCLASS system. Retrieved from https://iu.box.com/s/quhkme0e7y31x8fpdv1l8it0f6xmkm23) and the original project related to it.

 

 

Field Museum Collections Database Visualization

Client Name: Field Museum of Natural History Technology Department (FMNH)

Project Description (goal/scientific or practical value):

The Field Museum's collections database houses nearly 3 million specimen records from a collection of 25 million specimens and artifacts. Our project asks students to develop an interactive visualization of the Field Museum’s catalog and auditing practices. We’d like to be able to show what kinds of records are in the collections database, how different departments use the database fields and catalog collection objects, and eventually how different institutions vary in their patterns of database activity.

If there is a programmer in the group, developing a Drupal-compatible interactive visualization to display online would be of interest. The museum collections community would also benefit from students developing and documenting an open and repeatable workflow that can be shared among institutions. Students working on this project have the opportunity to visualize a portion of this data, and their visualization workflows will have utility at natural history and cultural institutions worldwide. For further information watch this video:

Information on dataset(s) to be used:

Students will work with a test dataset related to the Field Museum's China Hall renovation project (currently in progress) to identify “database user traffic” and help answer questions like: which records are most frequently edited, by whom/which department, and in what way? Can students measure activity bursts, clusters of more closely-related records, or other patterns?
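As a starting point for the "database user traffic" questions above, edit counts can be tallied per record, per department, and per day. This is only a sketch: the row layout, record identifiers, user names, and department names below are hypothetical and do not reflect the actual EMu export format.

```python
from collections import Counter

# Hypothetical audit rows: (record_id, editor, department, ISO date of edit)
audit_rows = [
    ("rec-001", "user_a", "Anthropology", "2014-03-02"),
    ("rec-001", "user_b", "Anthropology", "2014-03-02"),
    ("rec-002", "user_a", "Anthropology", "2014-04-11"),
    ("rec-001", "user_c", "Conservation", "2014-05-20"),
]

# Which records are edited most often, and by which department?
edits_per_record = Counter(row[0] for row in audit_rows)
edits_per_department = Counter(row[2] for row in audit_rows)

# A crude view of activity bursts: number of edits per calendar day.
edits_per_day = Counter(row[3] for row in audit_rows)

most_edited = edits_per_record.most_common(1)[0]
busiest_day = edits_per_day.most_common(1)[0]
```

The same tallies can feed a timeline (edits per day) or a network view (editors linked to the records they touch).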

The China Hall test dataset includes:

These data were exported from FMNH’s ElectronicMuseum (EMu) catalog in September 2014. The dataset also includes information about Field Museum objects and users, allowing for a variety of temporal, spatial, or network visualizations if students are interested in following up with other questions.

Web-Link to dataset(s):

Field Museum China Hall Collection Records Dataset

Relevant Publications, websites, etc.:

Cyrus Tang Hall of China
KE Software EMu CMS

Publication Notes:

Before publicly sharing any visualization or the data itself, drafts need to be shared with Field Museum collections staff for publication approval. After approval, students would be free to publish results with Field Museum listed as a co-author, and add project results to their resumes.

 

 

C. Elegans Development Visualization

Client Name: WormGUIDES

Project Description (goal/scientific or practical value):
Microscopy allows the position of each cell over time to be tracked in developing embryos. In C. elegans this has enabled the automation of single cell level assessment of developmental phenotypes. Interpreting this mass of data remains an open challenge.

The WormGUIDES project is one attempt to visualize such data using a phone app that allows navigation and customized visualizations of this data. This project provides the typical embryo data underlying the app as well as a second embryo showing a relatively subtle fatal developmental phenotype. Students should explore alternative navigation/visualization styles for the individual embryos as well as the problem of difference visualization, summarizing and allowing intuitive navigation of the divergence between normal and perturbed development. This can be implemented de novo or within the open source code for the existing Android/iOS apps; see http://www.wormguides.org/open-source-software.

Information on dataset(s) to be used:
The dataset includes a record of the location and name of every cell in a developing C. elegans embryo over several hours of development. Also included is a record of cell positions in a second, treated embryo where the functioning of gene had-1 is knocked out via RNAi. Links are also included to the C. elegans parts list, which provides the final fate of each terminal cell in the embryo, and to WormBase, a public database of C. elegans knowledge.

Web-Link to dataset(s):
https://sites.google.com/site/wormguides/dispim/dispim-downloads/Embryo_data_MOOC.zip?attredirects=0&d=1

Relevant Publications, websites, etc.:
WormATLAS Database of C. elegans anatomy
WormBASE - Aggregates C. elegans worm information
WormGUIDES Project Website

Publication Notes:
We would like to approve public communications which should reference the WormGUIDES project.

 

 

Visualizing the Strategic National Arts Alumni Project

Client Name: Sally Gaskill

Project Description (goal/scientific or practical value):

SNAAP is a national arts research project based at the IU School of Education's Center for Postsecondary Research. Over the last three years, we have collected data from 100,000 arts graduates from 250 institutions in the US and Canada. We ask questions about their educational experiences and subsequent careers, both in the arts and other occupations. We provide confidential reports to each participating institution so that they can use their data for institutional improvement. Our reports to date have included only basic figures and tables; we are interested in developing new data visualizations that our institutions will be able to cut and paste for their own use.

The goal of this project is to provide colleges and universities with information visualizations that help enable academic administrators to use data to better understand their arts alumni and to improve their institutions. For further information watch this video:

Information on dataset(s) to be used:
The data provided to participants comes from the SNAAP survey of arts graduates and institutions. The data has been sanitized to remove any information that could identify a participant in the survey. The dataset comes natively in an SPSS statistical survey format, but will be converted to CSV data files for the project.

Web-link to dataset(s):
SNAAP13 Sample Dataset
SNAAP13 Sample Dataset Variables

Relevant publications, websites, etc.:
SNAAP Website
SNAAPShot Report on Initial Findings from SNAAP Survey

Publication Notes:
Students are welcome to publish results; we would only ask to be informed of publications.

 

 

Interactive Ecosystem Explorer

Client Name: Jorrit Poelen

Project Description (goal/scientific or practical value):
Global Biotic Interactions (GloBI, http://globalbioticinteractions.org) is one of the largest (if not the largest!) openly accessible linked (as in http://linkeddata.org) datasets describing how, when, and where organisms interact. Various projects (e.g. http://gomexsi.tamucc.edu; http://eol.org) currently use the data to their benefit.

Now that GloBI provides access to information specialists (like you!), the challenge is to build an educational tool that allows middle school and high school students to interactively explore the food web of Tuna (or any other species) in an engaging way. The practical value of this educational tool would be to help students understand complex food webs by way of a responsive, media-rich educational app that can be used in classrooms around the world.

Last year (IV MOOC class of 2014), Sergey Slyusarev and others created an innovative visualization of the entire GloBI dataset (http://tinyurl.com/pu24r2w). This year, I am hoping to work with you to create an engaging experience for students to explore the food webs in and outside of the classroom. For further information watch this video:

Information on dataset(s) to be used:
The openly accessible GloBI dataset describes how taxa (organisms or groups of organisms) interact with each other around the globe. At time of writing (Nov 2014) it consists of about 780k recorded interactions between over 40k unique taxa, sourced from 23 different ecological datasets. New data is being added (almost) daily. The dataset can be accessed in various ways:

Web-link to dataset(s):
GloBI Biological Species Interactions Datasets

Relevant publications, websites, etc.:
Global Biotic Interactions Project
Global Biotic Interactions Project Blog
GloBI Github Repository
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. http://dx.doi.org/10.1016/j.ecoinf.2014.08.005

Publication Notes:
GloBI Data is licensed under http://creativecommons.org/licenses/by-nc/4.0/

 

 

Visualizing Open Access to MIT's Scholarship - A Global Perspective

Client Name: Sean Thomas

Project Description (goal/scientific or practical value):

In March of 2009, the faculty of the Massachusetts Institute of Technology (MIT) voted to make their scholarly articles available to the public for free and open access on the web in order to make the results of their research and scholarship as widely disseminated as possible. Since then, the MIT Libraries have been collecting and sharing this research openly through their established institutional repository, DSpace@MIT. While the impact of the MIT Open Access Articles collection has been significant, with nearly 3 million end-user downloads of the articles since the collection was established, only recently has more data been available to assess this impact at a more granular level.

The goal of this project is to provide a deeper analysis of the impact of the MIT Open Access Articles program to MIT's research from a global perspective, rather than an MIT-centric one, and to provide visually informative ways to convey new findings of interest. Such information could prove helpful in determining topics of comparatively higher or lower interest to particular countries or regions, could be used to engage MIT researchers more deeply about the value of contributing their scholarship openly, and could be useful in providing visual evidence to peer institutions about the benefits of supporting open access to scholarship.

Information on dataset(s) to be used:

In October of 2014, the MIT Libraries launched the MIT Open Access Articles Statistics service which allows the public to view aggregated download data by MIT departments, labs, or centers and provides access to geographical data regarding the downloading countries of origin.

DSpace@MIT (http://dspace.mit.edu) contains bibliographic metadata and links to full text for the MIT Open Access Articles collection (http://dspace.mit.edu/handle/1721.1/49433). Each article contains metadata such as Title, Author(s), Department(s), Abstract, Journal, Publisher, Publication Date, Citable URI (for open access version), and DOI (for final published version). A full dataset will be provided and/or methodologies for extracting data of interest through APIs and common protocols.

MIT Open Access Article Statistics (http://oastats.mit.edu) contains summary information regarding end-user downloads of articles from the MIT Open Access Articles collection in DSpace@MIT. Data is available for the collection in its entirety and sub-organized by individual departments, labs and centers where data can be recombined. Table, timeline, and map views are available in the UI and delimited 'raw' data is available for download.

Web-link to dataset(s):

MIT Open Access Statistics
DSpace MIT Dataset forthcoming

Relevant publications, websites, etc.:
DSPACE at MIT
MIT Open Access Articles
MIT Open Access Statistics
MIT Open Access Policy

Publication Notes:
There are no restrictions to publication of work created for this project.

 

 

Globalization of the United States, 1789-1861

Client Name: Konstantin Dierks

Project Description (goal/scientific or practical value):

This project seeks to use georectified historical maps from the early 19th century as basemaps for the presentation of data about American diplomatic, military, and other activities in the wider world between the American Revolution and the American Civil War.

Historical data is often presented on modern basemaps, but this is inappropriate for understanding how people in the past grappled with and forayed into their own world, as they conceived it. The key here is to lay historical data onto historical maps, interactively with a time slider enabling viewers to trace change over time across multiple data variables.

Being able to present historical data via historical maps would represent a great leap forward with innumerable applications for research historians.

Information on dataset(s) to be used:

This dataset provides historical data about American diplomatic and military relations and activities with the world between the American Revolution and the American Civil War. There are four variables, including the type of foreign policy action taken by the American government, temporal event data, and historical and present-day geolocations. Some of the latitude-longitude work has been done with methodological consistency.

Web-link to dataset(s):
http://globalization1789-1861.indiana.edu/symposium/metadata-for-MOOC-Dec-2014.xlsx

Relevant publications, websites, etc.:
VisualizingNYC
Historical Atlas of the United States
Slave Revolt in Jamaica Mapping Project
Mapping Pacific Voyages

Publication Notes:
Since this project is still in development, and it is an idea with potentially wide-ranging applications, and I am preparing an NEH grant application for fall 2015, publication of results can come only after consultation with me.

 

 

Visualizing Commuting and Knowledge Flows between U.S. Counties

Client Name: Indiana Business Research Center (IBRC)

Project Description (goal/scientific or practical value):

IBRC is in the process of updating its U.S.-wide innovation index that helps practitioners and academics understand every region’s strengths and weaknesses.

In this project, students will analyze and visualize the relationship between commuting flows and knowledge flows across the U.S. using three different datasets: (1) patent applications, (2) university research and development spending, and (3) the total number of county-to-county commuters. Specifically, we are interested in understanding how flows of knowledge, measured by R&D spending and patent applications, from an urban or university center might spread to neighboring regions while workers of the same regions commute to these centers. Visualizing these flows will enable IBRC to compare U.S. regions and help improve current IBRC reports. The ultimate goal of the project is to understand which regions have greater labor and knowledge flows and whether this leads to subsequent innovation.

This project will be useful to students interested in economic dynamics and regional innovation and will help develop research skills applicable to businesses and governmental organizations. Students will apply a new operationalization of R&D spillover and patent technology diffusion, look at the statistical relationships between commuting and knowledge production, and come up with a new way to map these flows at different geographical levels. For further information watch this video:

Information on dataset(s) to be used:
The IBRC will provide the students the relevant data in a special tabulation/data-pull available on a non-public link on the IBRC website. The three datasets are drawn from these sources:

  1. Patent applications per county 2006-2010 from USPTO (http://www.uspto.gov/products/catalog/patent_grants.jsp#heading-1). Lookup of inventor county and associated patent; the first inventor’s address is used as a proxy for the county of the patent.
  2. Research and development spending from NSF 2006-2010. To access this data, visit WebCASPAR site, select “NSF Survey of Research and Development Expenditures at Universities and Colleges/Higher Education Research and Development Survey” then select “Total R&D Expenditures in All Fields”. Then, click on the “Modify classification variables” tab and select the years 2006 through 2010 for years. Then, select “Academic Institution, Campus level (survey specific)” at the bottom and click on state and zip code. This dataset is also available directly from us with the zip codes linked to FIPS county values.
  3. Commuting between counties 2006-2010: American Community Survey (Census data)
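The first-inventor rule described in item 1 can be sketched as follows. This is an illustration under stated assumptions only: the record layout, patent identifiers, inventor names, and the county FIPS codes used as sample values are hypothetical and are not drawn from the IBRC tabulation.

```python
from collections import Counter

# Hypothetical patent records; inventors are listed in order,
# each paired with the FIPS code of their home county.
patents = [
    {"id": "P1", "inventors": [("Ann", "18105"), ("Bo", "18097")]},
    {"id": "P2", "inventors": [("Cy", "18097")]},
    {"id": "P3", "inventors": [("Di", "18105"), ("Ed", "17031")]},
]

def patent_county(patent):
    """Assign the patent to the county of its first listed inventor."""
    return patent["inventors"][0][1]

# Patent applications per county under the first-inventor proxy.
patents_per_county = Counter(patent_county(p) for p in patents)
```

Note that counties appearing only as the home of a second or later inventor receive no credit under this proxy, which is worth keeping in mind when comparing regions.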

Web-link to dataset(s):
U.S. Patent Application Bibliographic Datasets
NSF WebCASPAR database
U.S. Census County Level FIPS Codes Spreadsheet
IBRC County Flow Dataset

Relevant publications, websites, etc.:
Our first version of the innovation index and the reports relating to it
StatsAmerica Innovation Maps
StatsAmerica Innovation Report on Cross Regional Frontiers

StatsAmerica Innovation Report Unlocking Rural Competitiveness
In context: Annual Commuting Trends for Indiana
Stats Indiana Commuting Patterns Map

Publication Notes:
IBRC clients request co-authorship on publications, and approval of results by client before publication and communication during the life of the project.

 

 

Map the Academic Landscape of “Numerical Cognition”

Client Name: David Braithwaite

Project Description (goal/scientific or practical value):

I recently started to conduct research in a new topic area within cognitive psychology (specifically, numerical cognition). In order to guide and focus my reading of the very extensive background literature, it would be very useful to have a "map" displaying the main bodies of work within the topic and the connections among them, along with papers/authors that are central to each body of work. Such a map would allow me to identify sub-topics of greatest relevance and the authors/papers most central to these sub-topics, and also hopefully would offer a quick snapshot of the main theoretical camps.

The analysis and visualization workflows should be well documented so that maps like this can be generated for new topic areas in the future. For further information watch this video:

Information on dataset(s) to be used:
Typically, I rely on publicly available data via Google Scholar, where my usual approach is to start with a few important papers as "seeds", and then iteratively build up a network of papers by (1) searching forward for papers that cite papers already in the network, and (2) searching backward for papers cited by papers already in the network. The importance of papers, and relations among them, can be identified by the strength of connections in the network (e.g. number of times cited, number of times citing other important papers, etc.).
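The seed-and-expand procedure described above can be sketched in a few lines. This is a minimal illustration on toy data, not the client's actual workflow: the function name `snowball`, the `rounds` parameter, and the paper labels are all hypothetical, and the citation data is assumed to be available as a mapping from each paper to the set of papers it cites.

```python
def snowball(seeds, cites, rounds=2):
    """Grow a paper set from seeds by following citations in both directions.

    cites maps a paper to the set of papers it cites (hypothetical toy data).
    """
    # Invert the mapping once so forward search (who cites X?) is a lookup.
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    network = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        found = set()
        for paper in frontier:
            found |= cited_by.get(paper, set())  # forward: papers citing it
            found |= cites.get(paper, set())     # backward: papers it cites
        frontier = found - network               # keep only new papers
        if not frontier:
            break
        network |= frontier
    return network

# Toy citation data: B cites A, C cites A and B, D cites C, E cites F.
cites = {"B": {"A"}, "C": {"A", "B"}, "D": {"C"}, "E": {"F"}}
one_round = snowball({"B"}, cites, rounds=1)
two_rounds = snowball({"B"}, cites, rounds=2)
```

Starting from seed B, one round reaches A and C, a second round adds D, and the unconnected papers E and F are never pulled in. Limiting the number of rounds keeps the network focused on the seed topic.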

The data set for this project was compiled from the Web of Science database of bibliographic citations, using a query of terms related to numerical cognition and seed papers. It consists of 5,389 records dated between 1202 and 2015.

Web-link to dataset(s):
https://iu.box.com/s/spbaplko1ogfuyi14b4rczhgz1hstktr
https://iu.box.com/s/v0eeezdylcb1zsvn39rw99svu8jbwaks

Relevant publications, websites, etc.:
Booth, J. L., & Siegler, R. S. (2008). Numerical magnitude representations influence arithmetic learning. Child Development, 79(4), 1016–1031. Retrieved from http://onlinelibrary.wiley.com/doi/10.1111/j.1467-8624.2008.01173.x/full
Dehaene, S., Piazza, M., Pinel, P., & Cohen, L. (2003). Three Parietal Circuits for Number Processing. Cognitive Neuropsychology, 20(3-6), 487–506. doi:10.1080/02643290244000239
Wynn, K. (1990). Children’s understanding of counting. Cognition, 36(2), 155–193. Retrieved from http://www.sciencedirect.com/science/article/pii/0010027790900033

Publication Notes:
I'd like to be mentioned if my participation is helpful. I don't anticipate involvement that would require my approval of results or my inclusion as a co-author.

 

 

Collaborative Case Studies

Client Name: Caroline Wagner

Project Description (goal/scientific or practical value):

Data is available on international collaboration in science at the disciplinary level for the years 2008 and 2013. For these data, it would be useful to map these collaborations onto a map of science to get a sense of the patterns of international collaborations within and between scientific fields.

The following visualizations would be awesome:

Information on dataset(s) to be used:
The data is broken down into collaborations by scientific fields. Network visualizations can be created showing connections within fields or between two or three specific fields. For further information on the datasets and research questions view this document: Dataset information

Web-link to dataset(s):
Collaborative Case Studies Datasets

Relevant publications, websites, etc.:

http://glennschool.osu.edu/faculty/wagner/index.html
http://scholar.google.com/citations?user=OBu0OHEAAAAJ&hl=en
http://www.sciencedirect.com/science/article/pii/S1751157708000448
http://download.springer.com/static/pdf/548/art%253A10.1007%252Fs11192-005-0001-0.pdf?auth66=1415904155_dee620a31239b23f73c3310562350861&ext=.pdf

Publication Notes:

I would be happy to co-author papers with students who create great outcomes.

 

 

Visualizing a Research Library: An Exploratory Analysis of Indiana University's Library Collections

Client Name: Dr. Andrew Asher

Project Description (goal/scientific or practical value):

The Indiana University Bloomington (IUB) Libraries represent one of the largest collections of research materials in the United States, with over 10 million volumes and an annual collections budget of approximately $20 million. This project will help library collection managers better serve the Libraries' diverse constituencies by providing analysis of how IUB's collections are used and recommending areas in which to focus future resources. This project will specifically seek to answer the following questions: What are the current strengths and weaknesses of the IUB Libraries' collections? What are the disciplinary differences in usage patterns? Are there meaningful clusters of usage that suggest links between disciplines or subject areas? Are there patterns in past usage that can be used to predict future usage, and by extension plan purchasing decisions? As this is an exploratory analysis, students may develop additional research questions during the course of the project.

Information on dataset(s) to be used:
The dataset represents the holdings of the Indiana University Bloomington libraries. Each line represents one item and its associated metadata (e.g. subject area, format, language, last circulation date, etc.), with about 10 million records total.

Web-link to dataset(s):
https://iu.box.com/s/eruygquastuzfhzm1u6hcbr13dg7dgg5

Relevant publications, websites, etc.:

Some similar work has been done at Columbia University: http://www.visualisingdata.com/index.php/2014/11/library-project-visualising-columbia-universitys-collection/
and by IU's Ted Polly and Brianna Marshall at Wabash College https://docs.google.com/file/d/0B3mdMrcNVAl4a3VSQkNrUG1FMkU/edit

Publication Notes:

I would like to approve the project results before they are presented publicly; however, I would expect an outcome of this project to be a project website that students could link to on their resumes. After the project is complete, I would like to work with students to present the results as a co-authored conference presentation or journal article.

 

 

Visualization of US Trade Remedies Dataset

Client Name: Dr. Badri Narayanan Gopalakrishnan

Project Description (goal/scientific or practical value):

We have long time-series data on trade remedy orders. These orders are imposed against companies that export to the US in unfair ways, with unduly high volumes and prices lower than their production costs.

The objective is to develop a visual tool to show the prevalence of such orders in different commodities and exporting countries. The tool should offer multiple views of the data: by commodity, by exporting country, and by year. For example, one should be able to see the relative extent of these orders in different commodities being exported from a particular country, in different exporting countries for a particular commodity, as well as movements over the years. We would also need to explore options for visualizing projections of future cases and industries based upon past experience. For further information watch this video:

Information on dataset(s) to be used:
The dataset will be shared with the students with password protection, at the link shown below. It has information on orders of anti-dumping and countervailing duties issued by the US Government on several exporters for several commodities over the years. Data is available at a very detailed commodity level and by partner country.

Web-link to dataset(s):
US Trade Remedies Dataset

Relevant publications, websites, etc.:
http://www.intl-tradelaw.com/category/trade-remedy-overview/

Publication Notes:
Students can publish the results, but two of us need to be coauthors: Badri Narayanan and Nithya Nagarajan. Of course, we would like to have a look at the results and also contribute to the analysis. We will be employing the tool on our website, with due credit to the students.

 

 

Global Trade Visualization Tool

Client Name: Purdue University Global Trade Analysis Project

Project Description (goal/scientific or practical value):

Global Trade Analysis Project (GTAP) is a widely employed framework for policy modeling across the world. This includes a set of databases and models used for a broad set of policy analyses. We, as the developers based at Purdue University, have been wondering how much visualization can be done for this dataset. I have written a paper in this regard with a colleague (Chandramouli et al.).

Students may want to use their innovations as well as the ideas developed in the paper above to come up with a visualization tool for the GTAP trade dataset. For further information watch this video:

Information on dataset(s) to be used:
The GTAP Data Base is a publicly available dataset which contains, among other things, international trade data comprising exports and imports of several products from one country to another.

Web-link to dataset(s):
https://iu.box.com/s/43bz2uy3aaj1v270ysvy10prjq1yjhxm
https://iu.box.com/s/yly6idsfwcb15jpgnfvulcggu95c6hn5

Relevant publications, websites, etc.:
Please have a look at the paper below for ideas in this regard:
Chandramouli, M, G Badri Narayanan and Gary Bertoline. (2013). A Graphics Design Framework to Visualize Multi-dimensional Economic Datasets. Engineering Design Graphics Journal. 77(3), 1-14.

Information on sectors and regions in GTAP Data Base:
https://www.gtap.agecon.purdue.edu/databases/regions.asp?Version=8.211
https://www.gtap.agecon.purdue.edu/databases/v8/v8_sectors.asp

Publication Notes:
I'd definitely encourage writing a paper or two based on this application and I'd like to be a coauthor and will provide all inputs needed in due course. Of course, my approval of the results is needed for publication.

 

 

Social Network Analysis of “Tutor/Mentor Networking” Conferences

Client Name: Daniel F. Bassill

Project Description (goal/scientific or practical value):

Since 1993, I've used GIS maps and a variety of visualizations to communicate ideas and strategies that leaders throughout the Chicago region could use to build and sustain non-school tutor/mentor programs that connect inner city youth and adult volunteers and help those youth move through school into jobs and careers. Since 2006, interns from a variety of universities, including IU, have created new interpretations of my work, applying their own talent.

I use these in blogs, websites, and social media to influence how others use their own time, talent, and dollars to help kids in Chicago and other cities. In doing so, I demonstrate how a small group of people can influence the actions of others throughout the world through the way they communicate their ideas on the Internet.

This project asks students to create an information visualization that shows participation in the Tutor/Mentor Networking Conferences held in Chicago since 1994. The goal is to show growth in participation in the Tutor/Mentor Conferences over time and to examine how organizations and job titles function within the network. The visualizations and analysis will help participants and organizers find ways to connect with each other after our conferences. Work done will be shared with others who organize conferences and events. For further information watch this video:

Information on dataset(s) to be used:
The data to be used for this project was created by a volunteer in 2010. Their description of the initial project and analysis may be found at their blog. The data is composed of conference attendance lists for most of the Tutor/Mentor Conferences held in Chicago from May 1994 through Nov. 2014. The data comes in 42 Google Sheets files, which are easily converted into CSV documents. Students will look at the yearly conferences and create a tool or procedure that automates the transfer of participant data from tabular data sheets into a form suitable for social network analysis.

For extracting social networks of past conferences, I have provided Google spreadsheets showing participant information. Free software should be used.
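As a starting point, the transfer from attendance sheets to a network could be sketched as below. This is only an illustration under stated assumptions: the column name `Organization`, the conference labels, and the sample rows are all hypothetical, and the real sheets may be laid out differently. It links two organizations whenever both appear on the same conference's attendance list, using only the Python standard library.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def read_attendance(csv_text, conference_id):
    """Return (conference_id, organization) pairs from one attendance sheet.

    Assumes a header row with an 'Organization' column (hypothetical name).
    """
    pairs = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        org = (row.get("Organization") or "").strip()
        if org:  # skip rows with no organization listed
            pairs.add((conference_id, org))
    return pairs


def co_attendance_edges(pairs):
    """Count, for each pair of organizations, the conferences both attended."""
    by_conference = {}
    for conference_id, org in pairs:
        by_conference.setdefault(conference_id, set()).add(org)
    edges = Counter()
    for orgs in by_conference.values():
        for a, b in combinations(sorted(orgs), 2):
            edges[(a, b)] += 1
    return edges


# Two tiny made-up sheets standing in for exported CSV files.
sheet_1994 = "Name,Organization\nA. Smith,Org A\nB. Jones,Org B\n"
sheet_1995 = "Name,Organization\nC. Lee,Org A\nD. Kim,Org B\nE. Roy,Org C\n"

pairs = read_attendance(sheet_1994, "1994-05") | read_attendance(sheet_1995, "1995-05")
edges = co_attendance_edges(pairs)
```

The resulting weighted edge list can be written back out as CSV and loaded into any free network analysis tool. The same approach works with job titles or individual participants in place of organizations.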

Web-link to dataset(s):
Will upload spreadsheets to Google Docs where students can access them.
http://www.tutormentorconnection.org
Dataset on Google Drive

Relevant publications, websites, etc.:
At http://tutormentorconnection.ning.com/group/cktmc you can see where I coach interns who have worked with me on past visualization projects.
At http://www.tutormentorexchange.net/definition-of-issues/ideasanimation you can see a library of visualizations done by past interns. This illustrates how work done by IU students would become part of an active on-going effort, not just a classroom exercise.

Publication Notes:
I would want to be a co-author and co-owner of any work created and be offered review privileges prior to publication, just to assure accuracy of how my own work and organization are presented.

 

 

CoBRA: Comic Book Readership Archive

Client Name: Dr. John Walsh

Project Description (goal/scientific or practical value):

The Comic Book Readership Archive project, or CoBRA, proposes to build a digital archive—of primary source material and related data sets—to document American comic book readership and fandom. The archive will include content from such sources as: fan mail, fan club publications and membership rolls, contests sponsored by publishers and fan clubs, fanzines, and programs and attendee records from comic book conventions and similar events.

Comics scholarship is an established area of academic research and the subject of thousands of dissertations, journal articles, book chapters, monographs, and digital projects. Comics readership has been a specific target of scholarly attention. However, previous studies have not fully considered the vast documentary record of comic book readership that will be compiled and analyzed in the CoBRA project.

In the “Bibliographic Essay” concluding his study, Of Comics and Men: A Cultural History of American Comic Books, Jean-Paul Gabilliet writes: “fan mail constitutes a largely unexplored source of information about the reception of characters, stories and creators.”[1]

The CoBRA project will address this gap in comics scholarship by providing access to a large and growing archive for the study of comic book readership, including fan mail. Our archive will allow new research questions to be asked (such as: Can we identify trends among prolific letter writers? Are they following particular characters, writers, artists, publishers? Through content analysis, can we identify profiles of people who self-identify as a particular race, ethnicity, gender, or occupation? Etc…) and will enable new forms of research, such as interactive maps, timelines, other information visualizations, and computationally-assisted content and data analysis.

Information on dataset(s) to be used:

To date we have generated 3321 data records from two comic book publications: The Fantastic Four (1961-73) and The Avengers (1963-73). The records include names, addresses and other details about letter writers and fan club members. This small data sample from only two of dozens of serials is revealing. Many famous individuals associated with the comic book industry appear as authors of fan letters. Other letter writers include George R. R. Martin, author of A Game of Thrones, and film producer Michael Uslan. Authors of fan mail identify themselves as college and university students, fraternity members, law students, and university faculty. Others identify as military personnel, some stationed in Vietnam during the war. A fan from the Bronx identifies himself as a “young Negro,” and—in the context of the 1960s civil rights movement—thanks Marvel, the publisher, for including people of color in their stories.

Our current data set of over 3000 records includes:

Web-link to dataset(s):
https://dl.dropboxusercontent.com/u/1175263/cobra_data_2014-12-13.xlsx

Relevant publications, websites, etc.:
Conference presentation: http://dcl.ils.indiana.edu/cobra/berlin_comics_workshop_with_notes.pdf
Related project: http://cbml.org and http://digitalhumanities.org:8081/dhq/vol/6/1/000117/000117.html
Related publication: https://www.academia.edu/1564874/Seducing_the_Innocent_Fredric_Wertham_and_the_Falsifications_that_Helped_Condemn_Comics

Publication Notes:
Students are free to publish the results of their work, preferably in an open access publication, and to include their work on their résumés. I would like to be included as a co-author and have a role in reviewing the results. Published results should acknowledge the CoBRA project and key participants: John A. Walsh (Indiana University), Carol Tilley (University of Illinois), and Kathryn La Barre (University of Illinois).

 

 

Visualizing DHQ bibliography

Client Name: Digital Humanities Quarterly

Project Description (goal/scientific or practical value):

Under a recent grant from the NEH, DHQ has developed a centralized bibliography which supports the bibliographic referencing for the journal. We are thus able to generate metadata records for each DHQ article that include a full inventory of citations, permitting detailed analysis of citation patterns. DHQ’s data includes approximately 200 articles dating from 2007 to the present, containing approximately 6500 citations in all. Our DHQ article metadata includes author, title, date of publication, abstract, topic keywords, and authors’ institutional affiliations. The bibliographic citation data includes basic publication data and also the genre of publication (journal article, book chapter, conference paper, etc.). All DHQ articles are published under an open-access license and are available as XML.

For this Information Visualization MOOC, DHQ is seeking innovative visualizations that can help its readers explore and understand the networks of citation that operate within the journal. We are interested in visualizations that can help us learn more about how citations reflect differences in academic culture at the institutional and geographic level, and also changes to that culture over time. We are also interested in visualizations that can help illuminate correlations between article topics (reflected in keywords) and citation patterns.
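One possible first step toward relating topics to citation patterns is a simple cross-tabulation of keywords against the genres of the works cited. The sketch below uses made-up records, and the field names (`keywords`, `citations`) are assumptions for illustration, not the structure of the DHQ XML. The real data would first need to be parsed into a comparable form.

```python
from collections import Counter

# Made-up records; the field names are assumptions, not DHQ's actual schema.
articles = [
    {"id": "a1", "keywords": ["text encoding"],
     "citations": ["book", "journal article"]},
    {"id": "a2", "keywords": ["text encoding", "visualization"],
     "citations": ["journal article"]},
]


def genre_counts_by_keyword(articles):
    """Count cited genres for each topic keyword across all articles."""
    counts = {}
    for article in articles:
        for keyword in article["keywords"]:
            # Each citation genre in the article is credited to every keyword.
            counts.setdefault(keyword, Counter()).update(article["citations"])
    return counts


table = genre_counts_by_keyword(articles)
```

A table like this can feed a heat map or a bipartite keyword-genre graph. Grouping by institutional affiliation or publication year instead of keyword would follow the same pattern.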

Information on dataset(s) to be used:

Founded in 2005, DHQ is an open-access online journal published by the Alliance of Digital Humanities Organizations (ADHO), which is the premier international professional association for digital humanities research. The journal is hosted at Northeastern University and Indiana University. The journal began publication in 2007 and has published eight issues covering a wide range of material from across the digital humanities. DHQ serves as a crucial point of encounter between digital humanities research and the wider humanities community.

Web-link to dataset(s):
http://digitalhumanities.org/dhq/data/dhq_ivmooc_data_2015-01-09.zip

Relevant publications, websites, etc.:
http://www.digitalhumanities.org/dhq

Publication Notes:
Students are free to publish and share the results of their work under a Creative Commons BY-NC-ND license.

 

 

Infectious Disease Publications Analysis: Ebola, Avian Influenza and Swine Influenza

Client Name: Thomson Reuters

Project Description (goal/scientific or practical value):

Infectious disease has been an interesting topic in both academic research and everyday life. This project focuses on three infectious diseases: Ebola, which was first discovered in 1976 in the Democratic Republic of the Congo; avian influenza (H5N1), which was first detected in humans in 1997 in Hong Kong; and swine influenza (H1N1), which was first diagnosed in humans in 2009 in the United States.

This project aims to analyze publication output on the three infectious diseases for the period 1996 to 2014. The students will conduct research on scholarly publications and provide quantitative analysis and data visualization on the following topics:

  1. Identify authors and organizations conducting research on the three diseases. Identify the percentage of research on the disease itself (for example, its pathology or medical treatment) vs. on topics related to the disease (such as policymaking). Explain the findings.
  2. Identify collaborations between researchers within and across organizations on the three topics: Ebola, avian flu and swine flu. What percentage of these research activities are multidisciplinary? Which disciplines collaborate most? Which researchers/organizations collaborate most on topics relevant to the three diseases? Finally, recommend researchers/organizations to new researchers based on their research interests.
  3. Analyze trends in disease outbreaks and research output. Investigate correlations between disease outbreaks and publication output. Furthermore, identify differences in trends among the three diseases. For example, swine flu is the most recently identified pandemic disease of the three; what is unique about its research output compared with Ebola and avian flu?

Scientific or practical value:
This project is designed for students to develop basic data mining and information visualization techniques. Students with a background in machine learning and programming can focus on developing a system that recommends collaborators to researchers. Students can also perform time series analysis on both the spread of the diseases and the publication output on the diseases.
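For the trend analysis, a very simple first look at the relationship between outbreaks and publication output might count publications per year and correlate that series with a yearly outbreak series. The sketch below is an illustration only: the `year` field name is an assumption about how article records might be loaded, and every number in it is a placeholder rather than real publication or case data.

```python
from collections import Counter
from math import sqrt


def yearly_counts(records, years):
    """Count publications per year, returning zero for years with none."""
    counts = Counter(r["year"] for r in records)
    return [counts.get(y, 0) for y in years]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder values only: not real publication or outbreak figures.
years = [2008, 2009, 2010, 2011]
records = [{"year": 2009}, {"year": 2009}, {"year": 2010},
           {"year": 2010}, {"year": 2010}, {"year": 2011}]
cases = [0, 4, 6, 2]

pubs = yearly_counts(records, years)  # publications per year, in `years` order
r = pearson(pubs, cases)
```

Because research output usually trails an outbreak, it may be worth repeating the calculation with the publication series shifted by one or more years and comparing the results.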

Information on dataset(s) to be used:

https://iu.box.com/s/8l5ayrqqvqehel9sab8zaly88cvpaj8g

Web-link to dataset(s):
ADDRESSES.csv
ARTICLES.csv
Author_ADDRESSES_link_table.csv
AUTHORS.csv
CATEGORIES.csv
KEYWORDS.csv
KEYWORDS_PLUS.csv

Relevant publications, websites, etc.:
Information on the dataset to be used:
World Health Organization
http://www.flu.gov
Thomson Reuters Web of Science

Web-link to the dataset:
http://www.who.int/en/
http://www.flu.gov

Information or links to relevant publications, online sites:
http://www.who.int/tdr/publications/en/

Publication Notes:
Review and verification of results; acknowledgments for data provided if applicable.

 

 

Visualizing Shopper Behavior

Client Name: Raymond Burke, Customer Interface Lab

Project Description (goal/scientific or practical value):
The project aims to gain insights into how shoppers behave and respond to the in-store environment by analyzing foot traffic, product interaction, promotional activities, and customer characteristics in grocery and mass retail stores.

Through the visualization project, we hope to address one or more of the following questions: (a) How does behavior change as shoppers progress through the store, perhaps due to time, budget, or contextual constraints? (b) What is the best way to visualize multiple customer segments, or "outliers" behaving atypically, on a 2D map? (c) How do initial purchases/interactions influence the path through the store? (This will require purchase category data, which we can provide.) (d) What is the impact of in-store promotions on shopper behavior?

Information on dataset(s) to be used:
The traffic patterns are recorded by a head-mounted eye-tracker and subsequently superimposed on a 2D store floor plan. Each customer's track consists of a number of time-location "nodes". The data will be provided as a CSV file for each store (ID, time, x, y coordinates, duration). Additional data for each node, such as gender, age, promotions, and purchase/product interactions, is available.
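To give a sense of how the node data might be handled, the sketch below groups nodes by shopper and computes each track's path length and total dwell time. It assumes header names `id`, `time`, `x`, `y`, and `duration`, which are hypothetical. The actual files may label or order the columns differently, and the units of the coordinates and times are not specified here.

```python
import csv
import io
from math import hypot


def load_tracks(csv_text):
    """Group nodes by shopper ID, ordered by time.

    Assumes columns named id, time, x, y, duration (hypothetical headers).
    """
    tracks = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        node = (float(row["time"]), float(row["x"]),
                float(row["y"]), float(row["duration"]))
        tracks.setdefault(row["id"], []).append(node)
    for nodes in tracks.values():
        nodes.sort()  # tuples sort by time first
    return tracks


def summarize(nodes):
    """Return (path length, total dwell time) for one shopper's ordered nodes."""
    length = sum(hypot(x2 - x1, y2 - y1)
                 for (_, x1, y1, _), (_, x2, y2, _) in zip(nodes, nodes[1:]))
    dwell = sum(d for _, _, _, d in nodes)
    return length, dwell


# Made-up sample rows in the assumed layout.
sample = ("id,time,x,y,duration\n"
          "s1,0,0,0,2\ns1,10,3,4,5\ns1,20,3,10,1\n"
          "s2,0,1,1,4\n")
tracks = load_tracks(sample)
summary = {sid: summarize(nodes) for sid, nodes in tracks.items()}
```

Per-shopper summaries like these can be used to define customer segments or flag atypical tracks before the tracks are drawn on the store map.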

Web-link to dataset(s):
Sample of Customer Shopping Behavior Data and Store Map

Each student working on this project must provide a signed NDA to the client before full access to this data set is granted. The NDA can be found at the link provided:
https://iu.box.com/s/37so4qxxm2yto6dqy1qzz7c1h9egj979

Relevant publications, websites, etc.:
https://www.youtube.com/watch?v=jeQ7C4JLpug
https://iu.box.com/s/b8oiku3qopngye4hyxqhiqmmym0zf682
https://iu.box.com/s/dwct5otb16vexkumecbf17gg0y3kfopw
https://iu.box.com/s/1lijka5yj0y9lkddm0ccaxyauzqnwsu9

Publication Notes:
Due to the confidential nature of the data, students will be asked to sign a non-disclosure agreement. We can discuss the feasibility of reporting aggregate results. Please note that this project is restricted to residential IU students (29374 section of Z637).