Evaluation of the study (2019-03-08)

Tagged as: evaluation, results
Group: F

In this blog entry we present the results that we use in our submission. Not every evaluation is included; only the combinations of variables that we deemed important for our research are covered. Different statistical tests were used.

Evaluation of the study

We carried out the post-study evaluation by watching the videos we recorded while the subjects were doing the study. We measured each task time from the moment the subject clicked on the task folder (first click) until they finished the task (last click). Additionally, we examined the error rate at the end of our evaluation. We had 30 participants in total (10 Linux, 10 Windows and 10 Mac) and therefore 30 corresponding videos. For our evaluation, we transcribed the whole study into tables we could work with. To test our hypotheses (several H0/H1 pairs), we used different statistical tests: t-tests and linear regressions for testing significance and correlation.
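As an illustration of the timing rule (first click to last click), here is a minimal Python sketch. The timestamp format, the task names and the values are hypothetical; it only shows how a task time could be derived from two video timestamps, not the exact procedure used for the tables.

```python
from datetime import timedelta


def parse_timestamp(stamp: str) -> timedelta:
    """Convert a video timestamp 'MM:SS' or 'HH:MM:SS' into a timedelta."""
    parts = [int(p) for p in stamp.split(":")]
    while len(parts) < 3:          # pad missing hours (and minutes) with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def task_time_seconds(first_click: str, last_click: str) -> float:
    """Task time = last click minus first click, in seconds."""
    duration = parse_timestamp(last_click) - parse_timestamp(first_click)
    if duration.total_seconds() < 0:
        raise ValueError("last click precedes first click")
    return duration.total_seconds()


# Hypothetical annotations for one participant: task -> (first click, last click)
annotations = {
    "task_1": ("02:15", "03:40"),
    "task_2": ("03:55", "06:05"),
}

task_times = {
    task: task_time_seconds(*clicks) for task, clicks in annotations.items()
}
print(task_times)
```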


Why did we choose to use the ATI score as a variable?

As we want to give people a recommendation on which file manager they should use in which case, it’s important for us to assess a user’s preference for interacting with technical systems. If someone doesn’t like to use technical systems, it might be better to recommend something easier to use, so they don’t need to spend a lot of effort on it, while experienced people who are keen to try new systems could use more difficult ones like the terminal. Therefore we want to test our hypothesis that the higher a person’s ATI score is, the better they can deal with an unknown system. Task completion time and error rate go hand in hand. We used the task completion time as our main variable, since it is the variable that shows whether there actually are differences in performance when using different file managers for different tasks.

Why did we not pick certain combinations?

As we proceeded with our evaluation, we noticed that some combinations of variables were simply not significant, so we did not examine them further or present them in our study. In particular, the comparison of the dual-pane and the default file manager showed hardly any significant differences, in contrast to the terminal times. Consequently, we decided to examine SEQ – Terminal and Terminal – Time, since they differ significantly from the dual-pane and default file manager (to be shown).

H0: SEQ – Terminal has more impact on our completion time than the ATI score.
H1: SEQ – Terminal has less impact on our completion time than the ATI score.

We conducted a multiple linear regression to examine the relationship between task completion time as the dependent variable and the ATI score and SEQ – Terminal as independent variables. Figure x shows how the different variables influence our dependent variable, task completion time, and which of them has the most impact on it. The result shows that the ATI score has a higher impact on the task completion time than SEQ – Terminal. An increase in the ATI score would decrease our task completion time by 370.34 seconds, and an increase in SEQ – Terminal would decrease it by 222.46 seconds, which is consistent with the moderate negative correlation we found (compare bla). Accordingly, a theoretical increase of the ATI score has more impact on our time than an increase in SEQ – Terminal. Consequently, we reject our H0 and accept the H1.
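To make the regression step more concrete, here is a minimal sketch of an ordinary least squares fit with two predictors, written in plain Python. The data points are hypothetical and noise-free (they are constructed from made-up coefficients), and the function names are illustrative; this is not the original analysis script.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_multiple_regression(predictors, outcome):
    """Ordinary least squares; returns [intercept, coef_1, coef_2, ...]."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    width = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(width)]
           for i in range(width)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(width)]
    return solve_linear_system(xtx, xty)


# Hypothetical (ATI score, SEQ - Terminal) pairs and completion times in seconds
predictors = [(2.0, 1), (3.0, 4), (3.5, 2), (4.0, 6), (4.5, 3), (5.0, 7)]
completion_times = [2200, 1300, 1550, 600, 1050, 100]

intercept, ati_coef, seq_coef = fit_multiple_regression(predictors, completion_times)
print(f"intercept={intercept:.2f}, ATI={ati_coef:.2f}, SEQ={seq_coef:.2f}")
```

Each coefficient is expressed per unit of its own predictor, so comparing the two raw values depends on the scales of the two scores.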

ATI – Task Completion Time

We compare the ATI score with the overall completion time of the different tasks the participants performed. To test whether a relationship between the two variables exists, we performed a correlation analysis using the Spearman rank correlation. Our hypotheses before the test were:

H0: Participants with a higher ATI will be slower than participants with a lower ATI.
H1: Participants with a higher ATI will be faster than participants with a lower ATI.

As Figure x shows, the Spearman rank correlation yielded a coefficient of -0.56, which indicates a moderate negative (downhill) relationship between the two variables. The p-value of this test was 0.00014, so the result is significant. This shows a tendency towards a negative correlation between a user’s ATI score and the completion time of the tasks they performed. We can therefore reject our H0 and accept the H1, since there is a tendency that people with a higher ATI completed the tasks faster than people with a lower ATI.
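For readers unfamiliar with the Spearman rank correlation, here is a minimal plain-Python sketch: both variables are converted to ranks (ties share the mean rank) and the Pearson correlation of the ranks is computed. The sample values are hypothetical. In practice a library routine such as `scipy.stats.spearmanr` would also return the p-value, which this sketch does not compute.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical ATI scores and overall completion times (seconds)
ati_scores = [2.1, 3.0, 3.4, 4.2, 4.8, 5.5]
completion_times = [1900, 1700, 1750, 1200, 1300, 900]

rho = spearman_rho(ati_scores, completion_times)
print(f"rho = {rho:.3f}")  # negative: higher ATI goes with shorter times
```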

Completion time by operating system

H0: Operating systems will influence the completion time.
H1: Operating systems will not influence the completion time.

One assumption we have had since our preliminary study is that there is no significant difference in completion time depending on which OS is used. The preliminary study already hinted at this, but as it was conducted with only 6 people, no significant statement could be made. In the main evaluation, we performed pairwise t-tests between the operating systems. The resulting p-values were: Windows – Mac: 0.971, Windows – Linux: 0.303, Mac – Linux: 0.226. Therefore, none of the differences between the operating systems are significant, which means we can reject the H0 and accept the H1. Since there is no significant difference between them, we decided to group all operating systems (Windows, Linux and Mac) together.
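The pairwise comparison can be sketched as follows. This is a minimal example under stated assumptions: the completion times are hypothetical, the groups are treated as independent samples, and Welch's variant of the t-test is used, which may differ from the exact test variant behind the p-values above. The sketch only computes the t statistic and degrees of freedom; a library call such as `scipy.stats.ttest_ind(a, b, equal_var=False)` would additionally give the p-value.

```python
from itertools import combinations
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    se_a = variance(sample_a) / len(sample_a)  # squared standard error of the mean
    se_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / (se_a + se_b) ** 0.5
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical overall completion times (seconds) per operating system
times_by_os = {
    "Windows": [1500, 1620, 1480, 1710, 1555],
    "Mac": [1530, 1600, 1490, 1690, 1570],
    "Linux": [1450, 1580, 1400, 1650, 1500],
}

# One test per unordered pair of operating systems
pairwise = {
    (a, b): welch_t(times_by_os[a], times_by_os[b])
    for a, b in combinations(times_by_os, 2)
}

for (a, b), (t, df) in pairwise.items():
    print(f"{a} vs {b}: t = {t:.3f}, df = {df:.1f}")
```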

ATI – SEQ Terminal

H0: A higher ATI score tends to lead to a higher rating in the SEQ.
H1: A higher ATI score tends to lead to a lower rating in the SEQ.

As we stated before, we did not include SEQ – Dual pane and SEQ – Default in this examination due to lacking significance. With this metric we also wanted to test whether user experience influences the perception of difficulty. We did this by performing a Spearman rank correlation, which gave a correlation coefficient of 0.32, indicating a weak positive (uphill) relationship between the two variables (see Figure x).

Figure x shows the correlation between the two variables. It is a weak one: the variation is clearly visible, but the “mean line” reveals a tendency. This indicates a weak positive correlation between the user’s ATI and the SEQ – Terminal rating they gave for the different tasks. The p-value of this test was 0.084, which is above the 0.05 significance threshold, so this result is not significant and can’t be generalized. The tendency points in the direction of H0 rather than H1, but since the result is not significant, we cannot reject the H0 on the basis of this test.

ATI – Terminal Time

H0: The higher the ATI score, the more time the participants need in the terminal.
H1: The higher the ATI score, the less time the participants need in the terminal.

We had already found a correlation between the ATI and the overall task completion time and wanted to take a closer look at the terminal in particular. We suspected that the same relationship holds especially for the terminal, as it is a more specialized tool to begin with and would probably benefit from higher experience even more. To test the two values we again performed a Spearman rank correlation, which resulted in a correlation coefficient of -0.436, indicating a weak negative (downhill) relationship (see Figure x).

This indicates that the higher a user’s ATI score is, the less time they tend to need in the terminal. The p-value of this test was 0.0160, so the result is significant. We reject the H0 and accept the H1.

Individual Tasks

We grouped Tasks 1, 7 and 8 together, as we classified them as sorting tasks: they all require the user to sort by different properties like creation date or file type. We observed in our study that the standard GUI file manager performed best in this kind of task, followed by the dual-pane file manager, and the terminal was the slowest by far. The t-tests showed a significant difference between all three file managers (p-values: Explorer-Double: 0.02460492907900039, Explorer-Terminal: 5.665753681250677e-13, Terminal-Double: 2.0392690643939165e-09). Therefore our observation is confirmed, and the normal explorer-type file manager is best suited for those tasks.

Task 2 required the user to compare multiple files in different folders with each other. We expected the dual-pane file manager to be best for this task because of the advantage of having two separate views next to each other. The dual-pane was indeed the fastest in the test, followed by the explorer and the terminal. The t-tests showed a very significant difference between the terminal and the explorer (7.141357355290888e-05) and also between the terminal and the dual-pane file manager (2.2598083647636781e-07). The difference between the two GUI file managers came close but was not significant (0.06611518192098476). So although the dual-pane was the fastest, as we guessed, we couldn’t demonstrate an advantage of the dual-pane over the standard explorer.

Task 3 required the user to identify specific images in a folder and delete them. We suspected that the terminal would perform worst because of its text-based nature, and the results seem to reflect that. The standard explorer performed best, the terminal worst, and the dual-pane file manager was in the middle. The t-tests showed significant differences between all three file managers (p-values: Terminal-Explorer: 1.2612578660548344e-14, Terminal-Double: 5.3428252105596115e-11, Double-Explorer: 2.30767e-05). This shows that the terminal is significantly slower in the task involving media. To our surprise, the explorer was also significantly faster than the dual-pane, which we didn’t expect.

Task 4 required the user to create 10 text files and name them in ascending numerical order. We suspected the terminal would come out ahead here, and that seems to be the case. The terminal was the fastest, and the t-tests show a significant difference between the terminal and the other two file managers (p-values: Terminal-Explorer: 9.391477983322305e-06, Terminal-Double: 1.981008456767521e-06). The difference between the two non-terminal file managers was not significant. This makes the terminal the best option for this task.

In Task 5 the user had to find a specific file in a folder with numerous sub- and sub-subfolders. The standard explorer performed best, followed by the dual-pane file manager and, far behind, the terminal. The t-tests show a significant difference between all of the file managers (p-values: Terminal-Explorer: 2.988885563031086e-09, Terminal-Double: 2.49122440291807e-05, Explorer-Double: 4.590e-05). The t-tests confirm the observations, and the explorer is the fastest for this task.

Task 6 required the user to open a file, read the content and then change the file’s name based on the content. The terminal performed worst in the test, and the other two were roughly equal in completion speed. Our t-tests show a significant difference between both GUI-based file managers and the terminal (p-values: Terminal-Explorer: 1.2517755687174636e-09, Terminal-Double: 5.2801354659045624e-09) but no significant difference between the explorer and the dual-pane (p-value: 0.2936247233926097). Therefore we can’t identify a best option here, but we’ve shown that the terminal is the worst one.
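The significance decisions for the individual tasks come down to comparing each p-value with a threshold. The sketch below assumes a threshold of 0.05 and uses the p-values quoted above for the sorting tasks, Task 2 and Task 6; the dictionary layout and the function name are illustrative.

```python
ALPHA = 0.05  # assumed significance threshold

# p-values as reported in this post, keyed by task and file manager pair
reported_p_values = {
    "sorting (1, 7, 8)": {
        "Explorer-Double": 0.02460492907900039,
        "Explorer-Terminal": 5.665753681250677e-13,
        "Terminal-Double": 2.0392690643939165e-09,
    },
    "task 2": {
        "Terminal-Explorer": 7.141357355290888e-05,
        "Terminal-Double": 2.2598083647636781e-07,
        "Explorer-Double": 0.06611518192098476,
    },
    "task 6": {
        "Terminal-Explorer": 1.2517755687174636e-09,
        "Terminal-Double": 5.2801354659045624e-09,
        "Explorer-Double": 0.2936247233926097,
    },
}


def non_significant_pairs(p_values_by_task, alpha=ALPHA):
    """List, per task, the comparisons whose p-value is not below alpha."""
    return {
        task: [pair for pair, p in pairs.items() if p >= alpha]
        for task, pairs in p_values_by_task.items()
    }


undecided = non_significant_pairs(reported_p_values)
for task, pairs in undecided.items():
    print(task, "->", pairs or "all comparisons significant")
```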

Error Rate

We counted the errors the subjects made during the test in order to compare the file managers’ overall susceptibility to errors. We defined an error as a wrong action that was either not part of the task to be completed or slowed down the completion because it was unnecessary or wrong. We did not include typing errors, as the terminal is completely text-based and could not be compared with the other two in this respect. However, we did count entered commands that resulted in an error message, so we counted overall mistakes in a command rather than individual letter mistakes. Our results showed more errors with the terminal than with the other two. The t-tests showed a significant difference between the terminal and both the explorer and the dual-pane, but none between the explorer and the dual-pane (p-values: Terminal-Explorer: 1.4177950494795577e-05, Terminal-Double: 5.6020634451905884e-05, Double-Explorer: 0.5860…). We can conclude that there is no significant difference in error rate between the dual-pane and the explorer file managers, but the terminal is significantly more prone to errors.
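The counting rule can be illustrated with a short sketch. The category names and the observations are hypothetical; the only point is that individual typos are left out, while commands that end in an error message and wrong or unnecessary actions are counted.

```python
from collections import Counter

# Hypothetical annotation categories; only these count as errors
COUNTED_KINDS = {"wrong_action", "unnecessary_action", "failed_command"}

# Hypothetical observations from the videos: (file manager, kind of event)
observations = [
    ("terminal", "typo"),            # single letter mistake: not counted
    ("terminal", "failed_command"),  # command ended in an error message
    ("terminal", "wrong_action"),
    ("explorer", "unnecessary_action"),
    ("dual-pane", "wrong_action"),
    ("terminal", "typo"),
]

errors_per_manager = Counter(
    manager for manager, kind in observations if kind in COUNTED_KINDS
)
print(dict(errors_per_manager))
```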