JA5. Learning Journal for Unit 5
Statement
Your learning journal entry must be a reflective statement that considers the following questions:
1. Describe what you did
This was the fifth week of the course. It covered possible improvements to the way we rank documents and how to put the components of an IR system together. I started the week by reading the required reading and taking notes, and then reviewed the lecture notes along with some additional YouTube videos. I then started working on the assignment, which was to implement a simple search engine using the vector space model. I also completed the discussion assignment and both the graded quiz and the self-quiz.
2. Describe your reactions to what you did
I found this week's topic interesting, but it was hard to put into practice. Every time I read something in the book and went back to the code to implement it, I would get confused. Still, I spent an interesting 10 hours implementing the search engine, and I learned a lot.
3. Describe any feedback you received or any specific interactions you had. Discuss how they were helpful
I did not receive any feedback that was worth mentioning.
4. Describe your feelings and attitudes
I felt torn between the theory I learned from the text and the actual implementation of that theory in the programming assignment. I also felt grateful to my classmates, whose discussion posts on how to decide which type of query to use, and on the differences between these types, were interesting to read.
5. Describe what you learned
This week was all about possible improvements to the way we rank documents and how to put the components of an IR system together. Manning et al. (2009) begin chapter 7 by discussing inexact top K document retrieval, index elimination, champion lists, static quality scores and ordering, impact ordering, and cluster pruning. All of these are techniques for making ranking more efficient. They reduce the number of documents that have to be fully scored, at the cost of possibly returning K documents that are close to, but not exactly, the true top K.
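To make index elimination and champion lists concrete, here is a minimal Python sketch. It assumes a toy in-memory index (a hypothetical dict mapping each term to {doc_id: tf-idf weight}), a precomputed idf dictionary, and a simple score that sums the query terms' weights in each document. The function names and the parameters r and idf_threshold are illustrative, not taken from the book.

```python
from collections import defaultdict

# Hypothetical toy index: term -> {doc_id: tf-idf weight of the term in that doc}.
Index = dict[str, dict[int, float]]


def build_champion_lists(index: Index, r: int) -> dict[str, set[int]]:
    """For each term, keep the r documents with the highest weight for that term."""
    champions: dict[str, set[int]] = {}
    for term, postings in index.items():
        top_r = sorted(postings, key=postings.get, reverse=True)[:r]
        champions[term] = set(top_r)
    return champions


def inexact_top_k(
    query_terms: list[str],
    index: Index,
    champions: dict[str, set[int]],
    idf: dict[str, float],
    k: int,
    idf_threshold: float,
) -> list[int]:
    """Approximate top-k retrieval using index elimination plus champion lists."""
    # Index elimination: drop query terms whose idf is below the threshold,
    # since very common terms contribute little to the ranking.
    terms = [t for t in query_terms if t in index and idf.get(t, 0.0) >= idf_threshold]
    # Champion lists: only documents in the union of the kept terms' lists are scored.
    candidates = set().union(*(champions[t] for t in terms))
    scores: dict[int, float] = defaultdict(float)
    for doc in candidates:
        for t in terms:
            scores[doc] += index[t].get(doc, 0.0)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With a small r, a document that appears in no champion list is never scored at all, which is why the result is only an approximation of the true top K.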
Then it covers the components of a complete IR system: tiered indexes, query-term proximity, the design of parsing and scoring functions, and how all of these pieces are put together. Finally, it discusses how vector space scoring interacts with query operators.
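Query-term proximity is based on ω, the width (in words) of the smallest window in a document that contains all the query terms. A smaller ω suggests a more relevant document. The sketch below computes ω with a sliding window over a tokenized document. The tokenization (lowercased, whitespace-split words) and the function name smallest_window are assumptions made for illustration.

```python
import math


def smallest_window(tokens: list[str], query_terms: set[str]) -> float:
    """Return ω: the width, in words, of the smallest window of `tokens`
    that contains every term in `query_terms` (math.inf if one is missing)."""
    if not query_terms:
        return 0
    counts: dict[str, int] = {}  # occurrences of each query term inside the window
    covered = 0                  # how many distinct query terms the window contains
    best = math.inf
    left = 0
    for right, token in enumerate(tokens):
        if token in query_terms:
            counts[token] = counts.get(token, 0) + 1
            if counts[token] == 1:
                covered += 1
        # While the window covers all query terms, record it and shrink from the left.
        while covered == len(query_terms):
            best = min(best, right - left + 1)
            dropped = tokens[left]
            if dropped in query_terms:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best
```

For the sentence "the quality of mercy is not strained" and the query {strained, mercy}, the smallest window is "mercy is not strained", so ω = 4.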
6. What surprised me or caused me to wonder?¶
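I was surprised that I managed to build a search engine that works. Although it does not look like anything we have learned in this course so far, building it was a good experience. I was also surprised that an index built for ranked (vector space) retrieval can still answer Boolean queries, as long as a term's weight is non-zero in every document that contains it. It can also answer wildcard queries, by expanding the wildcard into the matching terms and adding them to the query vector, even though this may not seem intuitive. Phrase queries are the exception, because representing a document as a vector discards the order of its terms.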
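As a small illustration of that point, here is a sketch of answering a Boolean AND query and a wildcard query from a weighted index. The index layout (a hypothetical dict of term -> {doc_id: weight}, with a non-zero weight stored for every occurrence), the fnmatch-style wildcard syntax, and the summed-weight score are all assumptions. A real engine would use cosine scores and a dedicated wildcard index such as a permuterm or k-gram index rather than scanning the vocabulary.

```python
from fnmatch import fnmatchcase

# Hypothetical vector space index: term -> {doc_id: weight}; a term's weight is
# assumed to be non-zero in every document that contains it.
Index = dict[str, dict[int, float]]


def boolean_and(index: Index, terms: list[str]) -> set[int]:
    """Answer an AND query by checking only whether each term's weight is non-zero."""
    result = None
    for t in terms:
        docs = {d for d, w in index.get(t, {}).items() if w != 0}
        result = docs if result is None else result & docs
    return result or set()


def expand_wildcard(index: Index, pattern: str) -> list[str]:
    """Expand a wildcard such as 'comput*' into the matching vocabulary terms."""
    return sorted(t for t in index if fnmatchcase(t, pattern))


def score(index: Index, query_terms: list[str]) -> dict[int, float]:
    """Sum query-term weights per document (a simple stand-in for cosine scoring)."""
    scores: dict[int, float] = {}
    for t in query_terms:
        for d, w in index.get(t, {}).items():
            scores[d] = scores.get(d, 0.0) + w
    return scores
```

For a query like "comput* engine", the wildcard is expanded first and the resulting terms are scored like any other free-text query terms: score(index, expand_wildcard(index, "comput*") + ["engine"]).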
7. What happened that felt particularly challenging? Why was it challenging for me?¶
The most challenging part was implementing the search engine (the IR system), because I couldn't decide exactly what to build. I had to read the book and review the lecture notes multiple times to understand the theory behind the various parts of the system, but when I went back to the code, I would get confused again.
In the end, I built the system by trial and error. I picked a query and manually searched for the documents relevant to it, using my text editor's search feature. Then I would run, log, debug, and change the relevant parts of the code until I got the desired results.
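That manual check can be scripted so that every code change is compared against the same hand-picked answers. This is a minimal sketch. It assumes the engine returns a ranked list of document IDs and that the documents found with the text editor form the relevant set. compare_with_manual is a hypothetical helper, not part of the assignment.

```python
def compare_with_manual(ranked: list[str], relevant: set[str], k: int) -> dict:
    """Compare the engine's top-k results with the documents found by hand for one query."""
    top_k = ranked[:k]
    found = relevant.intersection(top_k)
    return {
        # Relevant documents (found by hand) that the engine did not return in its top k.
        "missing": sorted(relevant - found),
        # Documents the engine returned that were not in the hand-picked set.
        "unexpected": sorted(set(top_k) - relevant),
        # Share of the hand-picked documents that appear in the top k (recall at k).
        "found_fraction": len(found) / len(relevant) if relevant else 1.0,
    }
```

Running this after each change makes it obvious whether a tweak to the scoring moved the results toward or away from the desired ones.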
8. What skills and knowledge do I recognize that I am gaining?¶
While implementing the IR system, I learned about the vector space model, what a vector is, how to compute the dot product of two vectors, and how to calculate the cosine similarity between two vectors. I also learned Python language features such as lists and dictionaries, and how to use them to implement the IR system.
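Cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(q, d) = (q · d) / (|q| |d|). Below is a minimal Python sketch using dictionaries as sparse vectors ({term: weight}), which is one natural way to use the dictionaries mentioned above. The function names are illustrative, and the weights are assumed to be precomputed (for example, tf-idf).

```python
import math


def dot(u: dict[str, float], v: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as {term: weight} dictionaries."""
    # Iterate over the smaller vector; terms missing from the other contribute 0.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def norm(u: dict[str, float]) -> float:
    """Euclidean length of a sparse vector."""
    return math.sqrt(dot(u, u))


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return dot(u, v) / (nu * nv)
```

For example, the query {"car": 1.0, "insurance": 1.0} against the document {"car": 3.0, "auto": 4.0} gives a dot product of 3 and lengths √2 and 5, so the cosine is 3 / (5√2) ≈ 0.424.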
9. What am I realizing about myself as a learner?¶
I realized that, while I am most comfortable writing JavaScript/TypeScript, which I have done professionally for 5 years, it was not hard for me to pick up Python and start writing code in it. I also realized that I translate code into TypeScript in my head before I write it in Python.
10. In what ways am I able to apply the ideas and concepts gained to my own experience?¶
The optimizations for IR ranking are great ideas that can be taken out of the IR context and applied in many other areas of programming. For example, the idea of tiered indexes can be applied to a database system. We can keep multiple indexes for the same table, say one per year, and return results from the most recent index first. We would then move on to older indexes only if the user has gone through all the previous results or has explicitly asked for a specific year.
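The per-year idea above can be sketched with a Python generator. Older tiers are read lazily, only once the caller has consumed every result from the newer years. The layout (a hypothetical dict of year -> {search key: row ids}) and the helper names are assumptions for illustration, not a real database API. The book's tiered indexes are split by term weight rather than by year.

```python
from itertools import islice
from typing import Iterator, Optional

# Hypothetical layout: one index per year, each mapping a search key to row ids.
YearIndexes = dict[int, dict[str, list[int]]]


def tiered_lookup(indexes: YearIndexes, key: str, year: Optional[int] = None) -> Iterator[int]:
    """Yield matching row ids, newest tier first.

    Because this is a generator, an older year's index is only consulted after
    the caller has consumed every result from the newer years. If `year` is
    given, only that year's index is searched.
    """
    if year is not None:
        yield from indexes.get(year, {}).get(key, [])
        return
    for y in sorted(indexes, reverse=True):  # most recent tier first
        yield from indexes[y].get(key, [])


def page(results: Iterator[int], size: int) -> list[int]:
    """Take the next page of results from a lazy result iterator."""
    return list(islice(results, size))
```

Keeping one generator per user session means each call to page continues where the previous page stopped, moving to an older year only when the newer ones run out.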
References
- Manning, C. D., Raghavan, P., & Schütze, H. (2009). An Introduction to Information Retrieval (Online ed.). Cambridge, UK: Cambridge University Press. Chapter 7: Computing Scores in a Complete Search System. http://nlp.stanford.edu/IR-book/pdf/07system.pdf