Study on Visualizing Sets and Networks

The results of this work appear in the following paper:
A Task-Based Evaluation of Combined Set and Network Visualization. Peter Rodgers, Gem Stapleton, Bilal Alsallakh, Luana Micallef, Rob Baker, Simon Thompson. Information Sciences, Elsevier. 367-368:58-79.

Take a look at our video outlining the research.

We tested 5 different software systems that allow sets and networks to be visualized on the same diagram for multiple sets. The results were obtained using Amazon Mechanical Turk with a between-subjects methodology. The 5 links below take you to the study as it was completed by the participants. Each participant took exactly one of these 5 tests. Participants who started a test and left it unfinished were not allowed to restart. The questions are the same for each diagram type, except for some variation in the text of the training questions.

Here the questions are presented in the same order each time to allow for comparison between the diagram types and with the correct answers (see the table below for the order). In the actual study, the questions were presented in a random order (except the 4 training questions), as explained below. The pages email the results at the end; in the study these answers were sent to Mechanical Turk in a URL query string.

The first 4 pages were training, and were always presented in the same order. There were 2 inattentive participant test questions. The text of these requires the participant to click on the image rather than use the radio buttons and the "Next page" button. The intention is to ensure that the participant reads the questions rather than clicking through them as quickly as possible. The order of these inattentive participant test questions was randomised, but they were always on pages 6 and 13. The 12 data-generating questions were presented in a random order, with the proviso that a question and the question on its rotated diagram had to have a gap of three other questions between them. The diagrams for questions 7, 8, 9, 10, 11 and 12 were rotated (and relabelled) versions of the diagrams for questions 1, 2, 3, 4, 5 and 6 respectively.
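The ordering rules above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names are hypothetical, the gap is assumed to mean at least three other data-generating questions between a question and its rotated counterpart, and the gap is assumed to be counted over the data-generating questions only, ignoring the inattentive test page.

```python
import random

def data_question_order(rng, min_gap=3, n_pairs=6):
    """Shuffle questions 1..12 until every question q and its rotated
    counterpart q + 6 have at least min_gap other questions between them.
    (Rejection sampling; an assumed strategy, not the study's own code.)"""
    questions = list(range(1, 2 * n_pairs + 1))
    while True:
        rng.shuffle(questions)
        pos = {q: i for i, q in enumerate(questions)}
        if all(abs(pos[q] - pos[q + n_pairs]) > min_gap
               for q in range(1, n_pairs + 1)):
            return list(questions)

def page_sequence(rng):
    """Return the labels for pages 2..19 of one participant's session."""
    order = data_question_order(rng)
    inattentive = ["Inattentive 1", "Inattentive 2"]
    rng.shuffle(inattentive)                          # which one comes first is random
    pages = [f"Training {i}" for i in range(1, 5)]    # pages 2-5, fixed order
    pages.append(inattentive[0])                      # page 6
    pages += [f"Question {q}" for q in order[:6]]     # pages 7-12
    pages.append(inattentive[1])                      # page 13
    pages += [f"Question {q}" for q in order[6:]]     # pages 14-19
    return pages

if __name__ == "__main__":
    for number, label in enumerate(page_sequence(random.Random()), start=2):
        print(f"Page {number}: {label}")
```

Rejection sampling keeps the sketch short; valid orders are common enough that the loop finishes quickly.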

The four types of task used in the study were:

Task Type | Class | Text
T1 | group-link | How many people with interests in X have exactly Y connections to other people?
T2 | group-network | What are the interests of the person who, if removed, leaves exactly X people disconnected from all the other people?
T3 | group-link | How many direct connections are there between people interested in X and people interested in Y?
T4 | group-network | What is the fewest number of people you need to pass through to get from people interested in X to people interested in Y? (Do not include the people at the start and end of the path)
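To make the task definitions concrete, the sketch below computes answers to T1, T3 and T4 on a small invented network. It is an illustration only: the data, the function names and the interpretation of edge cases (for example, a person who holds both interests X and Y) are assumptions, not taken from the study.

```python
from collections import deque

# Hypothetical toy data: each person's interests and the undirected connections.
interests = {
    "a": {"Food"},
    "b": {"Food", "News"},
    "c": {"News"},
    "d": {"Travel"},
    "e": {"Travel"},
}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

def neighbours(edges):
    """Build an adjacency map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def t1_count(interests, edges, x, y):
    """T1: number of people interested in x with exactly y connections."""
    adj = neighbours(edges)
    return sum(1 for person, held in interests.items()
               if x in held and len(adj.get(person, ())) == y)

def t3_count(interests, edges, x, y):
    """T3: number of connections joining someone interested in x
    to someone interested in y (in either direction)."""
    return sum(1 for u, v in edges
               if (x in interests[u] and y in interests[v])
               or (y in interests[u] and x in interests[v]))

def t4_fewest_between(interests, edges, x, y):
    """T4: fewest intermediate people on any path from a person interested
    in x to a person interested in y; None if no such path exists."""
    adj = neighbours(edges)
    targets = {p for p, held in interests.items() if y in held}
    starts = [p for p, held in interests.items() if x in held]
    dist = {p: 0 for p in starts}
    queue = deque(starts)              # breadth-first search from all x people
    while queue:
        person = queue.popleft()
        if person in targets:
            return max(dist[person] - 1, 0)   # exclude both endpoints
        for other in adj.get(person, ()):
            if other not in dist:
                dist[other] = dist[person] + 1
                queue.append(other)
    return None

if __name__ == "__main__":
    print(t1_count(interests, edges, "Food", 1))
    print(t3_count(interests, edges, "Food", "News"))
    print(t4_fewest_between(interests, edges, "Food", "Travel"))
```

T2 is omitted because it needs a removal test for every person, which would make the sketch considerably longer.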

The types of the questions and correct answers are:

Page | Question | Type | Answer
Page 2 | Training 1 | T1 | One
Page 3 | Training 2 | T3 | Seven
Page 4 | Training 3 | T2 | Food and News
Page 5 | Training 4 | T4 | One
Page 6 | Inattentive 1 | - | click on region
Page 7 | Question 1 | T3 | One
Page 8 | Question 2 | T3 | Three
Page 9 | Question 3 | T2 | Travel
Page 10 | Question 4 | T2 | Internet and News
Page 11 | Question 5 | T1 | Three
Page 12 | Question 6 | T4 | Zero
Page 13 | Inattentive 2 | - | click on region
Page 14 | Question 7 | T1 | One
Page 15 | Question 8 | T2 | Android
Page 16 | Question 9 | T3 | Zero
Page 17 | Question 10 | T4 | Zero
Page 18 | Question 11 | T4 | One
Page 19 | Question 12 | T1 | Zero

The visualizations were generated from source data derived from SNAP Twitter ego networks:

Question | Set | Network
Training 1 & 2 & Inattentive 1 | 61781462.circles | 61781462.edges
Training 3 & 4 & Inattentive 2 | 21222922.circles | 21222922.edges
Question 1 & 7 | 17636894.circles | 17636894.edges
Question 2 & 8 | 48730516.circles | 48730516.edges
Question 3 & 9 | 65360846.circles | 65360846.edges
Question 4 & 10 | 84114921.circles | 84114921.edges
Question 5 & 11 | 88931752.circles | 88931752.edges
Question 6 & 12 | 198370650.circles | 198370650.edges
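For readers who want to load these source files, here is a minimal parsing sketch. It assumes the layout described for the SNAP ego-network datasets: each line of a .circles file holds a circle name followed by whitespace-separated node ids, and each line of an .edges file holds a pair of node ids. The function names and the sample lines are hypothetical, and the sketch does not show how the study turned circles into the labelled interest sets seen in the diagrams.

```python
def parse_circles(lines):
    """Map each circle name to the set of node ids it contains.
    Assumes: first field is the circle name, remaining fields are node ids."""
    circles = {}
    for line in lines:
        fields = line.split()
        if fields:                      # skip blank lines
            circles[fields[0]] = set(fields[1:])
    return circles

def parse_edges(lines):
    """Return a list of (source, target) node id pairs, one per line."""
    edges = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2:            # ignore blank or malformed lines
            edges.append((fields[0], fields[1]))
    return edges

if __name__ == "__main__":
    # Invented sample lines; with the real data one would instead write, e.g.:
    #   with open("17636894.circles") as f: circles = parse_circles(f)
    sample_circles = ["circle0\t101\t102", "circle1\t102\t103"]
    sample_edges = ["101 102", "102 103"]
    print(parse_circles(sample_circles))
    print(parse_edges(sample_edges))
```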

The anonymised study results are available in CSV format. The time column records the time spent on the question page in seconds. The error column gives a 1 if the answer was incorrect and 0 if it was correct. 500 participants took the test. Of these, 32 were considered to be inattentive participants, based on failing to click on the image (anywhere on the image) for both of the inattentive participant test questions, and were not rewarded through Mechanical Turk. Corrupt data was found to have been returned for 1 other participant. The data here comes from the 467 participants who were attentive and whose results did not contain corrupt data.
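As a starting point for working with the results file, the following sketch computes the error rate and mean time per group. Only the time and error columns are described above; the grouping column name "type" and the sample rows are assumptions made for illustration, so the column names should be checked against the actual CSV header.

```python
import csv
import io
from collections import defaultdict

def summarise(rows, group_col="type"):
    """Return per-group count, error rate and mean time in seconds.
    group_col is a hypothetical column name; 'time' and 'error' are
    the columns described in the text."""
    totals = defaultdict(lambda: [0, 0, 0.0])   # [rows, errors, total time]
    for row in rows:
        entry = totals[row[group_col]]
        entry[0] += 1
        entry[1] += int(row["error"])           # 1 = incorrect, 0 = correct
        entry[2] += float(row["time"])
    return {
        group: {"n": n, "error_rate": errors / n, "mean_time": total / n}
        for group, (n, errors, total) in totals.items()
    }

# Invented sample rows standing in for the real file.
sample = io.StringIO("type,time,error\nA,10.0,0\nA,20.0,1\nB,30.0,0\n")
summary = summarise(csv.DictReader(sample))

if __name__ == "__main__":
    for group, stats in summary.items():
        print(group, stats)
```

With the real file, `csv.DictReader(open(path, newline=""))` can be passed to `summarise` in place of the in-memory sample.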

The software for the SetNet system is available here: SetNet download.

Peter Rodgers, University of Kent
Gem Stapleton, University of Brighton
Bilal Alsallakh, University of Vienna
Luana Micallef, University of Kent
Robert Baker, University of Kent
Simon Thompson, University of Kent

The code used in this study is modified from that of INRIA, Paris. The paper describing their study is:
Micallef, L.; Dragicevic, P.; Fekete, J. Assessing the Effect of Visualizations on Bayesian Reasoning through Crowdsourcing. Visualization and Computer Graphics, IEEE Transactions on, vol. 18, no. 12, pp. 2536-2545, Dec. 2012.