This project focuses on the human evaluation of methods and algorithms for trustworthy collaborative editing. It combines the SCORE team's expertise in distributed collaborative systems with the expertise in user studies of the Department of Psychology at Wright State University.
The project focuses on two substantive areas, collaborative editing and trust-based collaboration, with one overarching methodological contribution.
Collaborative editing: we aim at an insightful understanding of the real-time requirements for collaborative editing, grounded in a theory of the effect of real-time constraints on collaborative work. Current related work is fundamentally flawed: it relies on tasks with varied time constants, idiosyncratic task coupling and uncontrolled compensatory strategies. The project also focuses on non-real-time (asynchronous) collaborative editing and on awareness management for coordinating work in the presence of conflict and disruption.
Trust-based collaboration: users master and control their data by deciding with whom they share it, without relying on a central authority. We investigate new trust-based access control mechanisms in which access is granted based on user trust values that are dynamic and vary during a collaboration. These mechanisms are scalable and usable. The main questions we address are how to compute trust values that correctly reflect the collaboration experiences between users, and how to make these values accepted by users.
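To make the idea of dynamic, trust-based access concrete, here is a minimal Python sketch in which a document owner grants read access to a collaborator only while the owner's current trust in that collaborator reaches a threshold. The class name `TrustBasedSharing`, the [0, 1] trust scale and the single threshold are illustrative assumptions, not SCORE's actual mechanism.

```python
class TrustBasedSharing:
    """Hypothetical owner-side sharing policy driven by dynamic trust values.

    Assumptions (not from the SCORE model): trust lies in [0, 1] and a single
    threshold decides read access.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.trust = {}  # collaborator id -> owner's current trust value

    def update_trust(self, collaborator, value):
        # Trust is dynamic: each new collaboration experience overwrites it.
        self.trust[collaborator] = value

    def can_read(self, collaborator):
        # Unknown collaborators default to zero trust.
        return self.trust.get(collaborator, 0.0) >= self.threshold


policy = TrustBasedSharing(threshold=0.6)
policy.update_trust("bob", 0.8)
print(policy.can_read("bob"))  # True
policy.update_trust("bob", 0.4)
print(policy.can_read("bob"))  # False: access follows the lowered trust
```

Because access is re-evaluated from the current trust value on every check, lowering a collaborator's trust is enough to revoke access, with the decision taken locally by the owner.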
Methodologically, validation requires the expertise both of the computer scientists who designed the systems and of social scientists, who conceptualize and measure human behaviour in collaborative work. We are developing new methods for the cost-effective evaluation of collaborative work that compensate for otherwise unrealistic sample sizes and costly engineering, using game theory to inspire task analogues and combining simulated users with human users.
Progress and ongoing work
Real-time collaborative editing
We have studied real-time requirements in collaborative editing. We examined the behavior of 80 participants, organized into teams of four, who performed three editing tasks under levels of delay ranging from 0 to 10 seconds in the propagation of document changes between users:
- A proofreading task, in which participants corrected a short text containing several grammatical and spelling errors;
- A sorting task, in which participants looked up the release dates of the movies in an alphabetized list and sorted the movies accordingly; and
- A note-taking task, in which participants listened to a 10-minute interview on cloud computing and produced an integrated set of notes on the interview.
All participants completed a follow-up questionnaire after finishing the three tasks.
For the sorting task, we used the following measures: sorting accuracy (based on the insertion sort algorithm), average time per entry, chat behavior, collisions between users and, crucially, task strategy (tightly or loosely coupled decomposition of the task). We found that delay slows participants down, which decreases sorting accuracy. A tightly coupled task decomposition improves the outcome at minimal delay, but participants slow down as delay increases. A loosely coupled decomposition at the beginning of the task leaves a poorly coordinated, tightly coupled sorting phase at the end, which requires more coordination as delay increases. A paper describing these results was published at CDVE 2014.
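The text does not spell out how sorting accuracy is derived from insertion sort. One plausible reading, sketched below under that assumption, counts the element shifts insertion sort needs to put the group's final list in chronological order (equivalently, the number of out-of-order pairs) and normalizes by the worst case n(n-1)/2, so that 1.0 means a perfectly sorted list. The function names and the normalization are hypothetical.

```python
def insertion_sort_shifts(values):
    """Count the element shifts insertion sort performs to sort `values` ascending."""
    items = list(values)
    shifts = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        # Shift every larger element one slot to the right; equal values stay put.
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
            shifts += 1
        items[j + 1] = key
    return shifts


def sorting_accuracy(release_years):
    """Hypothetical normalized accuracy: 1.0 = sorted, 0.0 = fully reversed."""
    n = len(release_years)
    worst = n * (n - 1) // 2  # shifts needed for a reversed list
    if worst == 0:
        return 1.0
    return 1.0 - insertion_sort_shifts(release_years) / worst


print(sorting_accuracy([1999, 1994, 2001, 1997]))  # 0.5
```

For example, the release years [1999, 1994, 2001, 1997] need 3 shifts out of a worst case of 6, giving an accuracy of 0.5.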
Note-taking task
The precise research questions for this study were the following. First, how does delay influence the quality of the final document, in terms of the number of grammar errors, the amount of redundancy, and the number of keywords it contains with respect to the transcript of the audio? Second, do users adopt compensatory strategies to overcome delay by coordinating over chat, which we quantified by the use of accord language and definite determiners? Finally, how do delay, experience and compensatory collaboration effort interact to affect task performance?
As dependent measures we analysed:
- Number of words: the number of words in the text base.
- Keywords, a measure of document quality: the number of main keywords present in the final version of the document produced by each group. We examined the number of keywords divided by the number of words.
- Redundancy, another measure of document quality: the sum of the redundancies of the sections of the document. The redundancy of a section is the maximum number of occurrences in that section of any topic present in the audio; it was measured by analysing the recorded videos of the collaborative editing sessions.
- Error rate, another measure of document quality, computed with the Reverso tool, which checks the spelling and grammar of a text. We examined the number of errors divided by the number of words.
- Chat behavior, studied as a measure of coordination. We examined the number of words, accord language, and definite determiners.
- Survey responses. For instance, we examined users' experience with collaborative editing, which we used to divide the groups into high-experience and low-experience groups.
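The ratio-based quality measures above reduce to a few lines of arithmetic. The following Python sketch assumes a naive tokenizer (runs of word characters), case-insensitive whole-word keyword matching, and per-section topic counts that have already been coded by hand, as the redundancy analysis of the recorded videos implies. The error count is taken as input because it comes from an external checker such as Reverso. All function names are illustrative.

```python
import re


def word_count(text):
    # Naive tokenizer (assumption): each run of word characters is one word.
    return len(re.findall(r"\w+", text))


def keyword_ratio(text, keywords):
    """Distinct main keywords present in the text, divided by its word count."""
    lowered = text.lower()
    present = sum(
        1 for kw in keywords
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", lowered)
    )
    words = word_count(text)
    return present / words if words else 0.0


def redundancy(sections):
    """Sum over sections of the highest occurrence count of any single topic.

    `sections` is a list of dicts mapping an audio topic to its (hand-coded)
    number of occurrences in that section.
    """
    return sum(max(counts.values(), default=0) for counts in sections)


def error_rate(num_errors, text):
    """Errors reported by an external checker, divided by the word count."""
    words = word_count(text)
    return num_errors / words if words else 0.0


notes = "Cloud computing offers elastic storage. Cloud storage is cheap."
print(keyword_ratio(notes, ["cloud computing", "elastic", "virtualization"]))
print(redundancy([{"storage": 2, "pricing": 1}, {"security": 1}, {}]))  # 3
```

Normalizing by the word count keeps long and short documents comparable, which is why both the keyword measure and the error rate are expressed per word.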
We found that the error rate was higher for groups that experienced higher levels of delay, and so was redundancy. We also found that the number of keywords included by users decreased as delay increased. Using the questionnaire data, we separated the groups into high-experience and low-experience groups. For high-experience groups, redundancy increased with delay, but we did not observe the same tendency for low-experience groups. We also measured chat behavior through the number of accord words and definite determiners, which together indicate common ground and which we considered a measure of coordination. Low-experience groups used more coordination to manage redundancy, whereas high-experience groups did not adjust their collaboration effort to manage redundancy. The results of this task were published at ECSCW 2015.
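The coordination measure can be approximated by counting accord words and definite determiners in a group's chat log. The word lists below are illustrative assumptions rather than the study's coding scheme, and this naive count does not distinguish, for example, the determiner "that" from the conjunction "that"; a faithful analysis would need manual or part-of-speech-based coding.

```python
import re

# Illustrative word lists (assumptions, not the study's coding scheme).
ACCORD_WORDS = {"ok", "okay", "yes", "yeah", "agree", "agreed", "sure", "right"}
DEFINITE_DETERMINERS = {"the", "this", "that", "these", "those"}


def coordination_score(chat_messages):
    """Count accord words and definite determiners over a group's chat log."""
    accord = determiners = 0
    for message in chat_messages:
        # Lowercase and keep only letter/apostrophe runs as tokens.
        for token in re.findall(r"[a-z']+", message.lower()):
            if token in ACCORD_WORDS:
                accord += 1
            elif token in DEFINITE_DETERMINERS:
                determiners += 1
    return {"accord": accord, "definite": determiners, "total": accord + determiners}


chat = ["OK, I take the first section", "Yes, agreed. That part is done"]
print(coordination_score(chat))  # {'accord': 3, 'definite': 2, 'total': 5}
```

Comparing such totals across delay conditions and experience levels is one way to operationalize "collaboration effort" as used in the results above.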
Measurement of delays in real-time collaborative editing systems
Using simulations, we measured delays in popular real-time collaborative editing systems such as GoogleDocs and Etherpad as a function of the number of users editing a shared document and of their typing frequency. We simulated a variable number of users contributing at a variable frequency to a shared GoogleDocs document, varying the number of collaborating users from 1 to 50 and the typing speed from 1 to 10 characters/s. We measured the delay between the time a simulated user makes a modification and the time this modification becomes visible to the other simulated users. The delay increases with the number of users and with the typing frequency. GoogleDocs rarely supports more than 30 clients typing at an average speed. For 1 to 30 clients and speeds of 1 to 10 characters/s, delays range between 0 and 25 s. In Etherpad, users are disconnected when the number of concurrent users exceeds 10. The results of this study were published at the Internet of People Workshop 2016, organised in conjunction with the Networking conference.
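The delay measured here is the interval between a write by one simulated user and its visibility to another. The Python sketch below shows that measurement loop under stated assumptions: `write` and `read` are hypothetical callables standing in for two editor clients (driving GoogleDocs or Etherpad would require browser automation, which is not shown), and `SimulatedDocument` is a toy stand-in with a fixed propagation delay, driven by a fake clock so the example runs instantly and deterministically. It is not the harness used in the study.

```python
import time


class FakeClock:
    """Deterministic clock for the example: sleeping just advances time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


class SimulatedDocument:
    """Toy shared document (hypothetical): each edit becomes visible to
    readers only after a fixed propagation delay."""

    def __init__(self, delay, clock):
        self.delay = delay
        self.clock = clock
        self.edits = []  # (time at which the edit becomes visible, text)

    def write(self, text):
        self.edits.append((self.clock() + self.delay, text))

    def read(self):
        now = self.clock()
        return "".join(text for visible_at, text in self.edits if visible_at <= now)


def measure_propagation_delay(write, read, marker, clock=time.monotonic,
                              sleep=time.sleep, poll_interval=0.05, timeout=30.0):
    """Write a unique marker through one client, poll another client until the
    marker appears, and return the elapsed seconds (None on timeout)."""
    start = clock()
    write(marker)
    while clock() - start < timeout:
        if marker in read():
            return clock() - start
        sleep(poll_interval)
    return None


clock = FakeClock()
doc = SimulatedDocument(delay=0.5, clock=clock)
delay = measure_propagation_delay(doc.write, doc.read, "#edit-1#",
                                  clock=clock, sleep=clock.sleep, poll_interval=0.25)
print(delay)  # 0.5
```

The polling interval bounds the measurement resolution, so it should be small compared with the delays of interest. Reproducing the study's sweep would mean repeating such measurements while varying the number of concurrent writers and their typing frequency.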
Trust-based collaboration model
One of the important goals of this associate team is to develop an experimental design for testing SCORE's trust-based collaboration model with a large community of users. To ground our findings in existing social science theory, we examined the game theory literature, which spans cognitive science, psychology and economics. The questions we worked on included: Is there a game theory model that could reflect document sharing in collaborative editing? Is there a game theory model that deals with user reputation? What task scenario can be proposed for experimental studies? Our paradigm builds on game theory methods in several ways. First, we adapted a well-established trust game to a repeated setting, which best suits trust-based collaboration. Second, we added user trust attributes, as required by the SCORE model, to determine their effect on the decision to collaborate. Trust values are updated according to the satisfaction level of the exchanges during the game.
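A single round of an investment-style trust game, the kind commonly used in this literature, can be written as a payoff function. The sketch below assumes the usual setup in which an investor sends part of an endowment, the amount is multiplied (by 3 here, an assumption) and the trustee decides how much to return; the text does not give the parameters of the adapted game or the user attributes, so those are omitted, and the function names are hypothetical.

```python
def play_round(endowment, sent, returned, multiplier=3):
    """Payoffs (investor, trustee) for one round of an investment-style trust game.

    Assumptions: the multiplier is 3 and the trustee has no endowment of its own.
    """
    if not 0 <= sent <= endowment:
        raise ValueError("the investor can only send part of the endowment")
    received = sent * multiplier  # amount the trustee gets
    if not 0 <= returned <= received:
        raise ValueError("the trustee can only return part of what it received")
    return endowment - sent + returned, received - returned


def play_repeated(endowment, rounds, multiplier=3):
    """Cumulative payoffs when the same pair plays several rounds.

    `rounds` is a list of (sent, returned) decisions, one per round.
    """
    investor_total = trustee_total = 0
    for sent, returned in rounds:
        investor, trustee = play_round(endowment, sent, returned, multiplier)
        investor_total += investor
        trustee_total += trustee
    return investor_total, trustee_total


print(play_round(10, 5, 9))                    # (14, 6)
print(play_repeated(10, [(5, 9), (10, 0)]))    # (14, 36)
```

With an endowment of 10, sending 5 and getting 9 back yields payoffs (14, 6); if the investor then sends all 10 and the trustee keeps everything, the repeated totals become (14, 36). The repeated setting is what lets each player react to the partner's past behaviour.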
We designed a trust metric that reflects user behaviour during the interactions in the trust game. A current trust value is computed from the current iteration of the game and then aggregated with the trust values computed in previous iterations involving the same partners. Aggregating trust values over iterations takes variations in user behaviour into account, which makes the metric robust against fluctuating behaviour. We validated the metric empirically against data sets collected from several trust game experiments. We showed that our model is consistent with users' rating opinions and predicts users' behaviour more accurately than other naive models. This result was published at TrustCom 2016.
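The two-step structure of the metric (a current value per iteration, aggregated with earlier iterations for the same pair of partners) can be illustrated as follows. This is a generic sketch, not the metric published at TrustCom 2016: it assumes an investment-style game with a multiplier of 3, takes the current value to be the fraction of the multiplied amount the trustee returned, and aggregates with an exponentially weighted moving average whose weight `alpha` and starting value `prior` are hypothetical parameters.

```python
def current_trust(sent, returned, multiplier=3):
    """Trust value for one iteration (assumption): the fraction of the
    multiplied amount that the trustee gave back, in [0, 1]. Returns None
    when nothing was sent, since such a round says nothing about the trustee."""
    received = sent * multiplier
    if received == 0:
        return None
    return returned / received


def aggregate_trust(history, alpha=0.3, prior=0.5):
    """Aggregate per-iteration values for one (investor, trustee) pair with an
    exponentially weighted moving average; `alpha` and `prior` are hypothetical."""
    trust = prior
    for sent, returned in history:
        value = current_trust(sent, returned)
        if value is not None:
            trust = (1 - alpha) * trust + alpha * value
    return trust


# Five fully cooperative rounds, then one defection by the trustee.
print(aggregate_trust([(5, 15)] * 5 + [(5, 0)]))
```

Because each new round moves the aggregate by only a fraction `alpha`, a single defection after five cooperative rounds lowers trust to roughly 0.64 with the default parameters instead of resetting it to zero, which illustrates the kind of robustness to fluctuating behaviour the metric aims for.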