Bulletin of Applied Computing and Information Technology
Technical Report T1: Report on the eighth BRACElet workshop: BRACElet Technical Report, 01/08 AUT University, Auckland
07:01
Tony Clear, Anne Philpott, Simon, Phil Robbins

Clear, T., Philpott, A., Robbins, P. & Simon. (2009). Report on the eighth BRACElet workshop: BRACElet Technical Report, 01/08 AUT University, Auckland. Bulletin of Applied Computing and Information Technology, 7(1). Retrieved February 4, 2012 from http://www.naccq.ac.nz/bacit/0701/2009Clear_BRACElet_Report.htm

1. Introduction

This paper reports on the activities of the eighth BRACElet workshop, held on 4 July 2008 concurrently with the NACCQ Conference at AUT University. The BRACElet project is a longitudinal, multi-institutional, multinational investigation into the code reading, code comprehension and code writing skills of novice programmers.

2. Background - Workshops to Date
3. The Goals of the NACCQ 2008 BRACElet Workshop

The purpose of this eighth workshop was to analyse assessment data from novice programmers contributed by participating institutions, thus furthering the inquiry into how novice programmers comprehend and write computer programs. Recognised educational frameworks of cognitive sophistication such as SOLO (Biggs & Collis, 1982) and Bloom (Anderson et al., 2001) were to be applied as adapted for the programming domain. Analysis activities would include assessing the consistency of SOLO ratings using recognised statistical techniques. It was hoped that the outcome of this active workshop would include plans for data collection and further analysis relevant to each institution.

4. Participants

The workshop was attended by nine participants from one overseas institution and three institutions within New Zealand (Table 1). Other AUT participants attended the presentation sessions.

Table 1. Eighth workshop participants
5. Preliminary Agenda for BRACElet workshops

Appendix A below gives the initial agenda for the workshops. They were originally planned as two half-day sessions but, with most of the attendees present from the outset, effectively became one session that continued throughout the day.

6. Progress of the workshop

The workshop began with a brief review of the SOLO taxonomy as adapted in prior BRACElet work for application to program reading tasks. An interpretation of the taxonomy was outlined from the NACCQ conference paper by Clear et al. (2008), to update workshop participants on the classification approach. These categories are tabulated in table 2.

The workshop then moved into a practical session in which the participants broke into small groups to analyse a corpus of data brought to the workshop, and to classify student responses according to the SOLO categories given in table 2. Sets of questions and student responses from assessments and examinations from AUT, SIT and the University of Newcastle were available as data for analysis, covering different classes and question types, with both reading and writing tasks.

Table 2. SOLO Categories Derived in Clear et al. (2008)
The groups worked in pairs or threes to conduct their analysis jointly.

6.1. Analysis groups

Three groups worked independently to analyse and classify the contributed data, applying the given SOLO categories. The first group worked on introductory writing tasks, using the AUT-supplied data from a set of very concrete exercises and responses (n=37) assigned to Programming Fundamentals students - a CS0-type course for a cohort who had been struggling with the standard CS1. The second group worked on a more intermediate set of paired reading and writing questions and responses (n=29) from a University of Newcastle examination at the CS1 level. The third group worked on a more advanced level (CS3) set of student responses (n=12) to a pathfinder algorithm tracing assignment.

6.2. Analysis groups - findings

The first group found themselves trying to apply descriptions written for code reading tasks to a code writing task, and felt that although the SOLO categories were valid, new descriptions were needed. It was also felt that, unlike the code reading tasks, the code writing tasks were quite restricted, in that even a correct answer could not be said to demonstrate relational thinking.

The students had been given a Book class with two attribute variables and one accessor method. Question 1 was "write an accessor method, getName, to return the name of the book". A correct answer required two lines of code (a signature and a return statement) plus braces. Question 2 was similar, requiring a mutator method. Question 3 was "complete the method isAvailable so that it returns false if the book is on loan but true otherwise". A skeleton method had been provided with a local Boolean variable that had been initialised to true and returned. There was an instance variable, onLoan, which was true when someone had borrowed the book. Students were expected to write a simple if statement (returning !onLoan was possible, but not expected from these students).
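As a rough illustration of the three questions described above, the sketch below shows what correct answers might look like. It assumes Java, which the report does not name. Only getName, isAvailable and onLoan come from the report; the field `name`, the mutator name `setName`, the local variable `available`, the constructor and the `setOnLoan` helper are hypothetical, since the original Book class is not reproduced here.

```java
// Hypothetical reconstruction of the exercise. Only getName, isAvailable
// and onLoan are named in the report; everything else is assumed.
class Book {
    private String name;     // assumed field name
    private boolean onLoan;  // true when someone has borrowed the book

    Book(String name) {      // assumed constructor, for the usage example only
        this.name = name;
        this.onLoan = false;
    }

    // Question 1: accessor (a signature and a return statement, plus braces).
    public String getName() {
        return name;
    }

    // Question 2: mutator (the method name setName is an assumption).
    public void setName(String name) {
        this.name = name;
    }

    // Helper so the example can change the loan state; not part of the exercise.
    public void setOnLoan(boolean onLoan) {
        this.onLoan = onLoan;
    }

    // Question 3: the skeleton supplied a local Boolean initialised to true
    // and returned; students were expected to add the if statement.
    public boolean isAvailable() {
        boolean available = true;
        if (onLoan) {
            available = false;
        }
        return available;
    }

    // The more compact alternative mentioned in the text.
    public boolean isAvailableCompact() {
        return !onLoan;
    }
}
```

Both versions of isAvailable behave identically. The if-statement form mirrors the supplied skeleton, which may be why it was the expected answer from this cohort.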
The group felt that these questions were of the Bloom type "change in representation", as the requirements were almost pseudocode. The group suggested that the following list (table 3) might be suitable SOLO descriptions:

Table 3. Suggested SOLO Categories Applicable to Program Writing Tasks
On this basis, most of the responses rated by the group were either multistructural (the code worked) or prestructural (the code either did not compile or did not work at all).

The second group produced a set of categorisations for questions 23 and 25f from the dataset, which traversed the full set of available SOLO classifications. A brief demonstration of interrater reliability analysis on the dataset for question 23, using SPSS and categorical data analysis with Kendall's W, indicated a very high degree of agreement [.994] between the pair of raters (see table 4 below). However, the rating was conducted jointly by consensus, so high agreement would be expected.

Table 4. Newcastle Question 23 SOLO Classification Test Statistics
a: Kendall's Coefficient of Concordance
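For readers unfamiliar with the statistic, the following is a minimal sketch of how Kendall's W can be computed from a raters-by-responses table. It uses the textbook formula with the usual correction for tied ranks, W = 12S / (m²(n³ - n) - mΣT), where m is the number of raters, n the number of responses, S the sum of squared deviations of the rank sums from their mean, and T the sum of (t³ - t) over each rater's groups of tied scores. It is not a reproduction of the SPSS procedure used at the workshop. The class name `KendallW` and the idea of encoding SOLO categories as ascending numbers are assumptions made for illustration only.

```java
import java.util.Arrays;

// Sketch of Kendall's coefficient of concordance with tie correction.
// Assumes each SOLO category has been mapped to a number (hypothetical encoding).
final class KendallW {
    private KendallW() {}

    // Average ranks (1-based) for one rater; tied scores share the mean rank.
    static double[] averageRanks(double[] scores) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[a], scores[b]));
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && scores[order[j + 1]] == scores[order[i]]) j++;
            double mean = (i + j) / 2.0 + 1.0;  // mean of ranks i+1 .. j+1
            for (int k = i; k <= j; k++) ranks[order[k]] = mean;
            i = j + 1;
        }
        return ranks;
    }

    // ratings[r][s] is the score given by rater r to response s.
    // Returns NaN when every rater gives all responses the same score.
    static double compute(double[][] ratings) {
        int m = ratings.length;
        int n = ratings[0].length;
        double[] rankSums = new double[n];
        double tieTerm = 0.0;
        for (double[] rater : ratings) {
            double[] ranks = averageRanks(rater);
            for (int s = 0; s < n; s++) rankSums[s] += ranks[s];
            // Tie correction: sum of (t^3 - t) over groups of tied scores.
            double[] sorted = rater.clone();
            Arrays.sort(sorted);
            int i = 0;
            while (i < n) {
                int j = i;
                while (j + 1 < n && sorted[j + 1] == sorted[i]) j++;
                double t = j - i + 1;
                tieTerm += t * t * t - t;
                i = j + 1;
            }
        }
        double meanRankSum = m * (n + 1) / 2.0;
        double s = 0.0;
        for (double r : rankSums) s += (r - meanRankSum) * (r - meanRankSum);
        double denom = (double) m * m * ((double) n * n * n - n) - m * tieTerm;
        return 12.0 * s / denom;
    }
}
```

A call such as `KendallW.compute(new double[][]{{1, 2, 3, 4}, {1, 2, 3, 4}})` describes two raters in complete agreement. Because W is rank-based, it treats the categories as ordered, so it inherits any doubt about the ordinality of the SOLO scale.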
Action Point: This dataset could now be independently rated by a further set of raters, which might prove illuminating.

The third group worked on a more advanced level (CS3) set of student responses (n=12) to a pathfinder algorithm tracing assignment. These responses required some careful analysis before beginning the SOLO classification. Classifying the task itself took some time. On referring to the question types of the "Common Framework" in table 5, the group concluded that it was a new type of task not represented to date.

Table 5. Common Framework component IV: the Assessment Framework (ex. Whalley & Robbins, 2007)
The task required students to simulate how a pathfinder algorithm operated, after first observing the operation of the game software and then being given a question with set positions marked on a grid (representing a board, including occupied squares, a starting square and a target square). In addition, students were given a printout of the code and a sample dataset, and asked to depict the simulation of finding the shortest path to the target. In some respects it equated to a more sophisticated "fixed code - reading" type of question, and in other respects it was a "change in representation - reading" type of task. Mapping the task against the revised Bloom taxonomy (Anderson et al., 2001), we concluded that the task was probably beyond the "apply" level and better reflected an "analyse" task. We defined it as a new type of question, a "simulation task".

While initially thinking that the task as assigned was innately "multistructural", on conducting the SOLO rating the group encountered one relational response in which the student had not only traced the algorithm, but critiqued it by making observations about the advantages and disadvantages of the algorithm's design. In addition, they noted that devising a suitable representation to depict the response was in fact a second task, which they assessed as being at the more advanced revised Bloom level of "create", and perhaps innately demanding a relational level of response. Had a predefined response structure been specified (e.g. scaffolding by providing some form of empty diagrammatic or tabular framework to be filled in), a more multistructural response level might have been achievable. Responses were further categorised by whether students had represented the "data" elements of the simulation and/or the "process" elements.

6.2.1. Reflections on the ease of SOLO Classification

Several implications arise from this work of the three groups in relation to SOLO classification.
1) The purpose of SOLO classification was unclear to some participants, and its value as a benchmarking and differentiating tool with an underlying theoretical basis needed to be explained.

2) The guidelines provided in table 2 from the paper by Clear et al. (2008) did not appear to guarantee consistency in classification, and it appeared that a workshop approach to develop a common understanding was required (e.g. for RE, what counts as a 'minor error' that retains a relational coding, as opposed to a relational "fudge" that should more fairly be coded as unistructural or prestructural?). The "descriptions" in table 2 above addressed reading tasks only, not writing tasks, and needed expansion. The existing "descriptions" also needed expansion to better phrase the concepts, if rating is to be conducted based only on those descriptions - but, given the comments just made in 2), is that a valid premise? Action: expand descriptions (Tony Clear)

3) The implicit value system, with its hierarchy of cognitive sophistication from prestructural upwards, proved a distractor for raters. The applicability of different "natural" SOLO ratings for different types of question did not appear immediately obvious or legitimate. This also led to arguments about the "ordinality" of the scale, especially with RE and MO set within the implied hierarchy. The scale was evidently "categorical", however, which satisfied the form of analysis in the paper. In future work it could be statistically tested for a tendency towards ordinality, at least at the major points of the scale. However, the intermediate points (RE and ME) raised some questions, and should perhaps be excluded from any ordinal scale if ordinality is to be a confirmed property.

4) The rating exercises did suggest that "MO" was not readily distinguishable as a category from "ME" and should be culled from the scale, making it more parsimonious.
So the workshop has also resulted in a useful refinement of the reading scale, depicted below in table 6, but still with the limitation of its code "reading" focus in the descriptions.

Table 6. Modified SOLO Categories from those in Clear et al. (2008)
5) Finally, table 7 depicts the combined reading and writing SOLO scale, developed as a result of the workshop. This modified classification scheme now needs to be tested in the field. Notably, an 'RE' category description for writing is absent, and it remains to be seen whether this category will become evident from further work with SOLO and 'writing' tasks.

Table 7. Modified SOLO Categories to Reflect both Program Reading and Writing Tasks
7. Future work & publication ideas

Arising from the workshop, a set of specific follow-up actions, publication targets and authorship teams was set, as a way of carrying the work forward to tangible outcomes. Specific actions agreed were:

1) arrange access to the BRACElet wiki at AUT for participants Simon, Ken and Mike (Tony)
2) this to include access to the original question sets from the early papers (Tony)
3) Mike to be given the email list of BRACElet project members (Tony)
4) MIT students' portfolio data to be included in the next set of analysis, ethics approvals to be arranged (Mike Lopez)

Proposed publications with due dates and teams are outlined in table 8 below. This invitation to contribute naturally also extends to Raymond Lister as one of the principals of BRACElet, who was unable to make this workshop due to ITiCSE commitments.

Table 8. Publication Targets
References

Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., Raths, J., & Wittrock, M. C. (Eds.). (2001). A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. Addison Wesley Longman Inc.

Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy (Structure of the Observed Learning Outcome). New York: Academic Press.

Clear, T., Whalley, J., Lister, R., Carbone, A., Hu, M., Sheard, J., et al. (2008). Reliably Classifying Novice Programmer Exam Results using the SOLO Taxonomy. In S. Mann & M. Lopez (Eds.), 21st Annual NACCQ Conference (Vol. 1, pp. 23-30). Auckland, New Zealand: NACCQ.

Kerievsky, J. (2006). Smell to Refactoring Cheat Sheet. Retrieved 02 Aug 2008, from http://weblogs.java.net/blog/wwake/archive/2006/05/smell_to_refact.html

Lister, R., Adams, E. S., Fitzgerald, S., Fone, W., Hamer, J., Lindholm, M., McCartney, R., Moström, E., Sanders, K., Seppälä, O., Simon, B., & Thomas, L. (2004). A multi-national study of reading and tracing skills in novice programmers. SIGCSE Bulletin, 36(4), 119-150.

Lister, R., Whalley, J., & Clear, T. (2006). For Discussion: A Framework for a Meta-Project on Students Programmers (BRACElet Technical Report No. 0106). Auckland: Auckland University of Technology.

Parsons, D., & Haden, P. (2006). Parson's programming puzzles: a fun and effective learning tool for first programming courses. Eighth Australasian Computing Education Conference (ACE2006), Hobart, Australia, 157-163.

Soloway, E. (1986). Learning to program = learning to construct mechanisms and explanations. Communications of the ACM, 29(9), 850-858.

Whalley, J., Clear, T., & Lister, R. (2007). The Many Ways of the BRACElet Project. Bulletin of Applied Computing and IT, 5(1). Retrieved June 3, 2007 from http://www.naccq.co.nz/bacit/0501/2007Whalley_BRACELET_Ways.htm

Whalley, J. L., Lister, R., Thompson, E., Clear, T., Robbins, P., Kumar, P. K. A., & Prasad, C. (2006). An Australasian Study of Reading and Comprehension Skills in Novice Programmers, using the Bloom and SOLO Taxonomies. Proceedings of the Eighth Australasian Computing Education Conference (ACE2006), Hobart, Australia. CRPIT, 52, 243-252.

Whalley, J., & Robbins, P. (2007, Jun). Report on the fourth BRACElet workshop. Bulletin of Applied Computing and Information Technology, 5(1). ISSN 1176-4120. Retrieved December 17, 2007 from http://www.naccq.co.nz/bacit/0501/2007Whalley_BRACELET_Workshop.htm

03 Aug 2008

Appendix A

"A Multi Institutional Study of Program Reading and Writing Skills in Novice Programming Students".

Purpose

These workshops are open both to current participants in the Bracelet project and to those with an interest in analysing how novice programmers acquire skills in reading and writing programs. The workshops will be run as active working sessions with the aim of producing a tangible output, and informing the next phase of the study for each participating institution. It is hoped that the outcome of the workshop will include plans for data collection and further analysis relevant to each institution. It is intended that the results of the workshop may be published in the form of a Bracelet technical report. The workshops have been designed in two parts, to enable participants either to participate for the full day or to attend other half-day workshops if they wish.

The Bracelet study, initiated in 2004, has been inquiring into how novice programmers comprehend and write computer programs. Participants will analyse assessment data from novice programmers contributed by participating institutions. Recognised educational frameworks of cognitive sophistication such as SOLO and BLOOM will be applied as adapted for the programming domain. Analysis activities will include assessment of consistency of ratings using recognised statistical techniques.

Preparatory work for the workshop

For the morning session, participants are requested to bring with them any data collected from their institutions' examinations or other assessment instruments. Ideally this data will include two pre-identified code reading and code writing questions with answers, deemed by the contributors to be at a comparable cognitive level. This data may be sent to the coordinators beforehand, or attendees may bring it along as resource material for discussion and review at the workshop. For the afternoon session, participants should be able to work with the data contributed in the morning session.
NACCQ BRACElet Workshop Agenda (AUT University 4 July 2008)

**MORNING SESSION**

11:00-11:40 Intro & welcome. Brief presentation outlining SOLO taxonomy from forthcoming NACCQ 2008 paper, and reviewing contributed data available for analysis. Select one or two questions and work through answers together, assigning SOLO categories to responses to demo working of process.

11:40-12:30 Break into two groups and conduct SOLO analyses on assigned questions and datasets. Capture results into spreadsheets. One group will work with Phil's data from the programming fundamentals group. The other group will include Anne's "simulation task" assessment data for more experienced students, which will involve more exploratory analysis of the intersection between reading and writing code. The artefacts are richer, with a focus on the design of algorithms. We hope to capture useful results nonetheless. Questions here will first involve task identification, then categorisation. Working with this set of data is expected to continue after lunch.

12:30-1:00 Lunch

1:00-1:30 General, audience-led discussion about the outcomes, issues identified, category boundary definitions and exemplars. Review of participants' contributed questions and their relationship to categories identified.

1:30-2:00 Continue with working on Anne's data in two groups.

2:00-2:20 Discussion relating to insights arising. End with a quick (10 minute?) preview of the remainder of the day, in case anyone is still wondering whether to stay or go to another workshop.

**AFTERNOON SESSION**

2:30-4:30 Brief recap on morning's work for new participants. Introduction to statistical methods for assessing reliability of multiple raters. Continue to work in groups, but individually conduct analysis of interrater reliability [IRR] from assigned question (explain in plain English from Phil), using SPSS and "Kendall's W". May also need to have one group replicate analysis of Anne's data, in parallel, prior to assessing IRR.

3:30-4:00 Afternoon Tea

4:30-6:00 General, audience-led discussion about the findings, highlighting problem areas for further analysis, and a plan ahead for publication and future work at each institution. In groups, review questions with low interrater reliability, and discuss problem questions. Where possible, develop agreed rubrics to accurately classify responses, and highlight boundaries between classifications. Discuss plans for extending BRACElet in each institution.

Publications Discussion

A revision of the BRACElet "compact" (e.g. how you get to be a co-author on the next study). Arrange for jointly authored publication(s) arising from the workshop, e.g. Technical Report, BACIT special edition/section, targeted conferences e.g. ACE, Koli Calling, SIGCSE?, ITiCSE