JA4 - Classification

Statement

Your learning journal entry must be a reflective statement that considers the following questions:

1. Describe what you did

This was the fourth week of the course, and its topic was classification. I started the week by reading the recommended material in the learning guide and watching the lecture videos. I then completed the discussion assignment, which asked about the differences between normalizing data with min-max scaling and standardizing it with the z-score. The programming assignment was a practical exercise that used R to apply the logistic regression algorithm to a dataset.
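The two rescaling techniques from the discussion can be compared side by side in a minimal sketch. The course itself used R; Python is used here only for illustration, and the function names and sample values are hypothetical. Min-max maps each value to (x - min) / (max - min), so the result lies in [0, 1]; the z-score subtracts the mean and divides by the standard deviation.

```python
import statistics

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center values on the mean and express them in standard-deviation units."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), as R's sd() uses
    return [(v - mean) / sd for v in values]

ages = [10, 20, 30, 40, 50]  # hypothetical sample values
print(min_max(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(ages))  # mean 0, standard deviation 1
```

Min-max keeps fixed bounds but is sensitive to outliers, because a single extreme value stretches the range; the z-score has no fixed bounds but puts variables measured in different units on a comparable scale.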

2. Describe your reactions to what you did

Classification is an interesting subject. This week we looked at logistic regression, a form of regression suited to a categorical (qualitative) response. We also looked at KNN and noticed how changing the value of K can completely change the results. I also learned about the LDA and QDA algorithms, which classify data using Bayes' theorem (James et al., 2013).
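The Bayes-theorem idea behind LDA can be sketched for a single predictor. Assuming each class is normally distributed with its own mean but a shared variance, the textbook's discriminant for class k is delta_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 * sigma^2) + log(pi_k), and an observation is assigned to the class with the largest delta_k. The Python below is a minimal illustration with hypothetical data and function names, not a replacement for R's MASS::lda().

```python
import math

def fit_lda_1d(samples):
    """Estimate class means, priors, and a pooled variance.

    samples: dict mapping class label -> list of 1-D observations.
    """
    n = sum(len(xs) for xs in samples.values())
    k = len(samples)
    means = {c: sum(xs) / len(xs) for c, xs in samples.items()}
    priors = {c: len(xs) / n for c, xs in samples.items()}
    # One variance shared by all classes: the key LDA assumption.
    ss = sum((x - means[c]) ** 2 for c, xs in samples.items() for x in xs)
    var = ss / (n - k)
    return means, priors, var

def lda_predict(model, x):
    """Return the class whose linear discriminant score is largest at x."""
    means, priors, var = model

    def delta(c):
        mu = means[c]
        return x * mu / var - mu ** 2 / (2 * var) + math.log(priors[c])

    return max(means, key=delta)

# Hypothetical training data: class "A" centered near 2, class "B" near 6.
model = fit_lda_1d({"A": [1.0, 2.0, 3.0], "B": [5.0, 6.0, 7.0]})
print(lda_predict(model, 3.5))  # A
print(lda_predict(model, 4.5))  # B
```

With equal priors the boundary falls halfway between the two class means. QDA drops the shared-variance assumption and estimates a separate variance for each class, which makes the decision boundary quadratic in x.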

3. Describe any feedback you received or any specific interactions you had. Discuss how they were helpful

I did not receive any feedback worth mentioning.

4. Describe your feelings and attitudes

The discussion assignment helped me understand how the scale and units of variables affect the appearance of graphs, which can lead to wrong conclusions; I found that interesting. The programming assignment was relatively easy because the dataset is small and easy to visualize, but I still had to consult the R documentation to finish it properly.

5. Describe what you learned

From James et al. (2013), we learned why linear regression is problematic for classification and for categorical responses in general. We also learned about widely used classifiers: logistic regression, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and K-Nearest Neighbors (KNN). The chapter moves from one technique to the next, comparing each with the previous one and explaining the problems it solves.

The tutorials (King, n.d.) and (Skand, 2017) helped me understand logistic regression and the KNN algorithm, the ideas of odds and the logit function, and how to apply them in R.
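The relationship between probability, odds, and the logit can be checked numerically. Logistic regression models the log-odds as a linear function, log(p / (1 - p)) = beta0 + beta1 * x, and the logistic function maps that linear score back to a probability between 0 and 1. The coefficients and function names below are hypothetical; in R, such a model would be fitted with glm(..., family = binomial) rather than written by hand.

```python
import math

def odds(p):
    """Odds of an event with probability p; p = 0.8 gives odds of about 4 (4 to 1)."""
    return p / (1 - p)

def logit(p):
    """Log-odds: the quantity logistic regression models as linear in the predictors."""
    return math.log(odds(p))

def logistic(z):
    """Inverse of the logit: maps any real score to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted coefficients for a one-predictor model.
beta0, beta1 = -3.0, 0.5

def predict_proba(x):
    """Probability of class 1 given predictor x."""
    return logistic(beta0 + beta1 * x)

for x in (2, 6, 10):
    p = predict_proba(x)
    # Classify with the usual 0.5 probability threshold.
    print(x, round(p, 3), "class 1" if p >= 0.5 else "class 0")
```

Because the logit and logistic functions are inverses, logistic(logit(p)) returns p, and a score of 0 corresponds to a probability of exactly 0.5.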

6. What surprised me or caused me to wonder?

How much changing the value of K in KNN can change the results. The programming assignment asked us to try K = 1, which classified the observation as Red; changing to K = 3 flipped the classification to Blue. Personally, I could not decide which class it belonged to, because the observation lay right on the border and seemed to belong to both classes.
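The flip between K = 1 and K = 3 can be reproduced with a small majority-vote KNN. The four training points below are hypothetical, chosen so that the single nearest neighbor is Red while two of the three nearest are Blue; they are not the assignment's data. Distances are Euclidean, and the function name is illustrative (in R, the class package's knn() performs the same kind of vote).

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of ((x, y), label) pairs.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return votes.most_common(1)[0][0]

# Hypothetical toy data (not the assignment's data set).
train = [
    ((1.0, 0.0), "Red"),    # distance 1.0 from the origin
    ((0.0, 1.5), "Blue"),   # distance 1.5
    ((-1.6, 0.0), "Blue"),  # distance 1.6
    ((3.0, 3.0), "Red"),    # distance about 4.24
]
query = (0.0, 0.0)

print(knn_predict(train, query, k=1))  # Red
print(knn_predict(train, query, k=3))  # Blue
```

With K = 1 the prediction depends entirely on the single closest point, so the classifier is very flexible and sensitive to noise; a larger K averages over more neighbors and gives a smoother boundary, which is why a point near the border can change class.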

7. What happened that felt particularly challenging? Why was it challenging to me?

The formulas in the textbook were so complex that I could only skim them, without a deep understanding of what they represent. Fortunately, R provides functions that implement these formulas, so you don't need to memorize them.

8. What skills and knowledge do I recognize that I am gaining?

I am now able to define the classification problem and explain why it is complex, and I have a good understanding of the K-nearest neighbors concept and how it works. The LDA part is still confusing to me, but I expect it to become clearer in the coming weeks.

9. What am I realizing about myself as a learner?

I am realizing that it is hard for me to understand a topic without a practical example, and the labs in the textbook and the assignments in this course really help me understand the topics.

10. In what ways am I able to apply the ideas and concepts gained to my own experience?

As mentioned, I now have a better idea of how to solve a classification problem: start from historical data, train a model on it, and then use the model to predict the class of new observations outside the training set.

11. Describe one important thing that you are thinking about in relation to the activity

An interesting classification problem at work is deciding whether an offer is suitable for a user, based on historical data about how people reacted to the offer and how similar they are to the user in question.

References

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R (Chapter 4). New York, NY: Springer. Retrieved from https://www.stat.berkeley.edu/users/rabbee/s154/ISLR_First_Printing.pdf
King, W. B. (n.d.). Logistic regression tutorial. Retrieved from http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html
Skand, K. (2017, October 8). kNN(k-Nearest Neighbor) algorithm in R. Retrieved from https://rstudio-pubs-static.s3.amazonaws.com/316172_a857ca788d1441f8be1bcd1e31f0e875.html