
JA6. Learning Journal for Unit 6

Statement

Your learning journal entry must be a reflective statement that considers the following questions:

1. Describe what you did

This was the sixth week of the course; it covered testing and verifying the correctness of IR systems, along with the metrics available to monitor various aspects of such systems. I started the week by completing the required reading and taking notes, then went through the lecture notes along with some additional YouTube videos. I also completed the discussion assignment and the self-quiz.

2. Describe your reactions to what you did

I found this week's topic interesting, but it was hard to apply practically; this was clear in the discussion assignment, where we had to make certain assumptions and build our arguments and claims on top of them. I was also a bit relieved that there was no programming assignment this week.

3. Describe any feedback you received or any specific interactions you had. Discuss how they were helpful

I did not receive any feedback that was worth mentioning.

4. Describe your feelings and attitudes

I felt confused, and realized how complex the topic was, while reading my classmates' discussion assignment responses; each of them made different assumptions and built their arguments on top of them, and those arguments did not convince me most of the time. Some people forgot to state their assumptions at all, so you would see them using numbers without any explanation of where those numbers came from.

5. Describe what you learned

The week started with the gold standard (ground truth) and how it determines relevance; an IR system is only as good as its ability to satisfy user needs, and no metric can measure that precisely. The text also explained the difference between a query and an information need: the query is the expression of an underlying information need, and a document is judged relevant with respect to the information need rather than to the query itself (Manning et al., 2009).

The text then moved on to standard test collections; each one contains a document collection, a set of queries (information needs), and relevance judgments marking which documents are relevant to each query. The queries from the collection are run through the IR system, and the results are compared with the relevance judgments for each query; the closer the system's results are to those judgments, the better the system is (Manning et al., 2009).
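To make that evaluation loop concrete for myself, here is a minimal TypeScript sketch of it; the collection, the search function, and the document IDs are all made-up stand-ins, not part of any real test collection.

```typescript
// A tiny, hypothetical "test collection": queries plus relevance judgments.
interface JudgedQuery {
  id: string;
  text: string;
  relevantDocs: Set<string>; // document IDs judged relevant to this query
}

const collection: JudgedQuery[] = [
  { id: "q1", text: "tropical fish tanks", relevantDocs: new Set(["d3", "d7", "d9"]) },
  { id: "q2", text: "jaguar speed", relevantDocs: new Set(["d2", "d5"]) },
];

// Stand-in for the IR system under test; a real system would retrieve from an index.
function search(query: string): string[] {
  return query.includes("fish") ? ["d3", "d4", "d7"] : ["d1", "d2"];
}

// Run every query through the system and compare results with the judgments.
for (const q of collection) {
  const retrieved = search(q.text);
  const hits = retrieved.filter((docId) => q.relevantDocs.has(docId)).length;
  console.log(`${q.id}: ${hits} of ${retrieved.length} retrieved documents are judged relevant`);
}
```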

The text then covered the metrics used to evaluate unranked and ranked retrieval results. The metrics for unranked result sets are precision, recall, accuracy, and the F-measure; the metrics for ranked results include precision-recall curves, MAP (mean average precision), precision at k, R-precision, the break-even point, ROC (receiver operating characteristic) curves, sensitivity, specificity, and (normalized discounted) cumulative gain (Manning et al., 2009).
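The unranked metrics, at least, are simple enough to write down in code. This is my own small sketch (not code from the text) of precision, recall, F1, and precision at k, using made-up document IDs:

```typescript
// Precision: what fraction of the retrieved documents are relevant?
function precision(retrieved: string[], relevant: Set<string>): number {
  const hits = retrieved.filter((d) => relevant.has(d)).length;
  return retrieved.length === 0 ? 0 : hits / retrieved.length;
}

// Recall: what fraction of all relevant documents were retrieved?
function recall(retrieved: string[], relevant: Set<string>): number {
  const hits = retrieved.filter((d) => relevant.has(d)).length;
  return relevant.size === 0 ? 0 : hits / relevant.size;
}

// Balanced F-measure (F1): harmonic mean of precision and recall.
function f1(retrieved: string[], relevant: Set<string>): number {
  const p = precision(retrieved, relevant);
  const r = recall(retrieved, relevant);
  return p + r === 0 ? 0 : (2 * p * r) / (p + r);
}

// Precision at k: precision computed over only the top k ranked results.
function precisionAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return precision(ranked.slice(0, k), relevant);
}

// Example with made-up data: 2 of 4 retrieved documents are relevant,
// out of 5 relevant documents overall.
const relevant = new Set(["d1", "d2", "d3", "d4", "d5"]);
const retrieved = ["d1", "d9", "d3", "d8"];
console.log(precision(retrieved, relevant));       // 0.5
console.log(recall(retrieved, relevant));          // 0.4
console.log(f1(retrieved, relevant));              // ~0.44
console.log(precisionAtK(retrieved, relevant, 2)); // 0.5
```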

6. What surprised me or caused me to wonder?

I was surprised by how many metrics are available to evaluate ranked systems, and by how they differ from the metrics for unranked systems, where results are simply unordered sets of documents.

Another surprise was the big difference in the sizes of the test collections: the text mentions that the Cranfield collection contains about 1,400 documents, while the GOV2 collection contains 25 million documents; other collections focus on cross-language IR (Manning et al., 2009).

7. What happened that felt particularly challenging? Why was it challenging for me?

Understanding the differences between the metrics and what they mean was quite hard for me; I had to read the text multiple times to understand the difference between precision and recall, and I still don't get it fully, let alone the more advanced metrics like MAP and R-precision. The confusion in this area was also evident in the discussion assignment and in the varying assumptions people made.

8. What skills and knowledge do I recognize that I am gaining?

I now understand the broad picture of how IR systems are evaluated and the metrics used to evaluate them. I also understand how standard test collections are built and used to evaluate IR systems, and their importance in the IR field.

9. What am I realizing about myself as a learner?

I realized that I am not comfortable with the mathematical side of IR; every time I see a formula, I get confused and have to read it multiple times to understand it. I am more comfortable with the programming side of IR, and more interested in its practical side than in its theoretical side.

10. In what ways am I able to apply the ideas and concepts gained to my own experience?

I have started building my own IR system in TypeScript, since I am most comfortable writing it. I will use this project to learn more about programming and IR problems; I intend to build things from scratch rather than use third-party libraries, since the goal is to learn, not to build a production-ready system. I have already hosted the code on GitHub, and I will make it public once I have something to show.

References

  • Manning, C. D., Raghavan, P., & Schütze, H. (2009). An introduction to information retrieval (Online ed.). Cambridge University Press. Chapter 8: Evaluation in information retrieval. Retrieved from http://nlp.stanford.edu/IR-book/pdf/08eval.pdf