Paper (2019-03-05)

Tagged as: blog
Group: I Entry 6, Paper

Writing the Paper

Since the last entry in this blog, we have been busy writing a paper about our research. We described and analyzed both the pre-study and the main study, which were presented in earlier blog entries. Here is the abstract of our paper:

In this paper, we investigated differences in user task performance between mobile and desktop devices. As preparation, we identified common tasks for both system types in focus groups. From these results, we constructed three tasks for our main study, which we conducted with 34 participants. We conclude that, for our subset of possible comparable tasks, neither system performed better overall across all three tasks. Only for one task did the desktop system perform significantly better in terms of task completion time and error rate. NASA TLX scores raised for the respective tasks also showed no significant difference between mobile and desktop devices with regard to perceived mental load. We cannot generalize to all possible tasks on mobile and desktop systems and therefore cannot conclude that one system performs better overall. Moreover, we compared only one possible mobile system and one possible desktop system out of the entirety of all possible system variations.

Until now, we have not written about the analysis of our main study in this blog, because we inserted our results directly into the paper. To compensate for this shortcoming, our findings are also presented in the following sections:

Regarding the NASA TLX, we found no significant differences in perceived mental load between the two system types for any of our three tasks. A Shapiro-Wilk test confirmed normal distribution, so it was possible to use statistical tests that rely on it. This enabled us to use a t-test to check for significance. However, there was no significant difference in perceived workload between smartphone and desktop devices for any of the tasks. This result suggests that users do not perceive completing tasks on one of the system types as more exhausting in terms of mental load. Therefore, it is less probable that factors like frustration or fatigue impact user performance to a greater extent.
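The workflow described above (normality check, then t-test) can be sketched in a few lines. We did not document our actual tooling in this entry; the sketch below assumes SciPy, and the score arrays are placeholders, not the recorded study data.

```python
# Sketch of the analysis described above: Shapiro-Wilk normality check on each
# group, followed by an independent-samples t-test. All values are hypothetical.
from scipy import stats

tlx_mobile = [55, 60, 48, 62, 51, 58, 45, 66, 53, 59]    # placeholder NASA TLX scores
tlx_desktop = [50, 57, 49, 61, 47, 55, 52, 60, 46, 54]

# Shapiro-Wilk: a p-value above 0.05 means normality cannot be rejected,
# so a t-test is admissible.
for label, scores in [("mobile", tlx_mobile), ("desktop", tlx_desktop)]:
    w, p_norm = stats.shapiro(scores)
    print(f"{label}: W = {w:.3f}, p = {p_norm:.3f}")

# Independent-samples t-test between the two groups; a p-value of 0.05 or
# higher would indicate no significant difference in perceived workload.
t, p = stats.ttest_ind(tlx_mobile, tlx_desktop)
print(f"t = {t:.3f}, p = {p:.3f}")
```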

Our goal was to examine task completion time and error rate for each of our three tasks individually. As it turned out, error rate was difficult to measure for two of the tasks, so we had to rely more on task completion time. We did not have multiple dependent variables for most of the tasks and therefore had to use an alternative to ANOVA. Task completion times were normally distributed according to Shapiro-Wilk. Again, we used a t-test to check the significance of our data. The navigation task and the online shopping task showed no significant difference between mobile and desktop devices in terms of task completion time. We did, however, find a significant difference between writing an email on a smartphone and writing it on a desktop computer (p = 0.0007 < 0.05).

Since we also recorded task success, we ran the same statistical calculations with only the successfully completed tasks. Task one (navigation) had 14 completions with smartphone and 13 with desktop interaction, task two (online shopping) was completed 11 times on smartphone and 15 times on desktop, and task three (email) was completed 16 times on smartphone and 15 times on desktop. Every task had 17 iterations on each system type. When the navigation task failed, it was usually because participants could not find the landmark of the ruined castle quickly enough and either gave up or our time limit of ten minutes per task expired. In online shopping, chosen products that did not meet our set criteria (case for a Samsung Galaxy S8, animal motif, Prime shippable, low price) were counted as failed tasks. When writing emails, incomplete tasks occurred when a participant misspelled the given mail address or when the ten-minute limit could not be met. To apply the above-named statistical tests, we had to bring both score sets to an equal size, which already limits the accuracy of the following results. However, we used all possible combinations of equally sized subsets and additionally rounded values in the larger sets to align the samples, and could not find further significance. As with the complete sets mentioned above, only task three showed a significant difference between the task completion times of mobile and desktop devices.
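The equal-size comparison described above can be sketched as follows: when one group has more successful completions than the other, the test is repeated over every same-size subset of the larger group. This is only an illustration under assumed tooling (SciPy, itertools); the completion times are placeholders, not our measurements.

```python
# Sketch of the all-combinations comparison described above. One group has 11
# successful completions, the other 15; the t-test is run against every
# 11-element subset of the larger group. All times are hypothetical (seconds).
from itertools import combinations
from scipy import stats

times_mobile = [212, 198, 240, 187, 225, 201, 194, 230, 218, 206, 199]            # 11 successes
times_desktop = [180, 175, 190, 168, 185, 172, 195, 178, 183,
                 169, 188, 176, 181, 170, 186]                                    # 15 successes

k = min(len(times_mobile), len(times_desktop))  # common sample size (11)
p_values = [stats.ttest_ind(times_mobile, list(subset)).pvalue
            for subset in combinations(times_desktop, k)]

# The range of p-values over all subsets shows how sensitive the result is
# to which completions happen to be dropped.
print(f"{len(p_values)} combinations, p from {min(p_values):.4f} to {max(p_values):.4f}")
```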

When conducting our study, we also measured error rate, but in tasks one (navigation) and two (online shopping), error rate was identical to task success, because one error usually meant that the task could no longer be completed correctly. With the emails written in task three, we could count spelling errors, so we at least have results concerning error rate for task three. Testing for normal distribution with Shapiro-Wilk yielded contradictory results: the value set for desktop was normally distributed, but the set for mobile devices was not. Because of this, we were not able to apply a t-test and instead used the Mann-Whitney U test. The results show a significant difference between the error rates of writing emails on smartphones and on desktop devices.
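The test-selection logic described above (fall back from the t-test to Mann-Whitney U when one group is not normally distributed) can be sketched like this. Again, SciPy is an assumed tool and the error counts are placeholders, not the recorded data.

```python
# Sketch of the decision described above: check both groups with Shapiro-Wilk;
# use a t-test only if neither rejects normality, otherwise Mann-Whitney U.
# All error counts are hypothetical (spelling errors per written email).
from scipy import stats

errors_mobile = [5, 8, 2, 12, 1, 9, 3, 15, 4, 7, 2, 11, 6, 3, 10, 1]   # 16 emails
errors_desktop = [2, 3, 1, 4, 2, 3, 2, 5, 1, 3, 4, 2, 3, 1, 2]         # 15 emails

def is_normal(scores, alpha=0.05):
    """True if Shapiro-Wilk cannot reject normality at the given alpha."""
    return stats.shapiro(scores).pvalue > alpha

if is_normal(errors_mobile) and is_normal(errors_desktop):
    stat, p = stats.ttest_ind(errors_mobile, errors_desktop)
    test_name = "t-test"
else:
    # Mann-Whitney U makes no normality assumption.
    stat, p = stats.mannwhitneyu(errors_mobile, errors_desktop,
                                 alternative="two-sided")
    test_name = "Mann-Whitney U"

print(f"{test_name}: statistic = {stat:.2f}, p = {p:.4f}")
```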

We also took notes on how participants carried out the tasks. In the navigation task, we found that a single navigation error is often very demotivating, to the degree that the task could not be fulfilled at all. Only three of the ten participants who made a navigation error recovered and went on to succeed in the task (two on smartphone, one on desktop). Task two (online shopping) specified several requirements the desired smartphone case had to meet. To meet all of the requirements, the first thing needed is an appropriate search query, which did not pose a problem for the participants. The quickest way to meet all criteria in the search results is then to sort the products according to the requirements. However, not all of the participants were aware of this function, and several searched the default result pages manually. Task three (email) was mostly about writing text. With the desktop and its connected keyboard, writing text is fastest when touch typing with the ten-finger system. Of course, this is not a skill everybody has mastered. In our study, seven of the seventeen participants who had to write the email on desktop were able to use the ten-finger system; the remaining ten were not. For digital keyboards, particularly on touchscreens, auto-complete is a key function that can enhance writing speed. In our study, two of the seventeen participants with smartphones used auto-complete to a greater extent. Additionally, desktop writers often went through the text after typing to check for spelling mistakes and correct them. This practice did not occur with smartphone writers, perhaps because they trusted the auto-correct function of smart keyboards.

Our paper has now reached the peer review phase, in which hopefully most of the weaknesses of its current state will be identified. After we have corrected the weaknesses identified by the reviewers, we will submit the paper. We will present all of our work on these studies and the results on March 29, 2019 at the University of Regensburg.