Prediction Algorithm Output and what it means: If all my predNB values are under 0.50, is it ever guessing "YES"?
When I search through my predNB vector (the output vector of the predict(NB) function, which contains the likelihood of success for every
observation) using a max(predNB) statement, it returns the value 0.13. I am assuming this means that the maximum likelihood of a
stock having a 10% gain day is 13%.
If that is the case, will the algorithm predict "NO" every single time, because it will never see a likelihood of success over 50%? And if
that is true, why would the AUC metric (accuracy) ever be below 99.3%, since the observations that saw a 10% gain represent
only 0.7% of my total dataset (which would mean that by simply guessing "NO" every single time, I should only be wrong 0.7% of the time)?
But my AUC is certainly not 99%.
I'm just a little confused about how the algorithm is working. The math of things is not adding up.
Answers and follow-up questions
Answer or follow-up question 1
The AUC is a ranking measure. In other words, it measures how well your predictions rank the observations,
that is, how often an observation with y=1 receives a higher score than an observation with y=0.
Note that the scores (predictions) are not necessarily probabilities; they are merely scores.
We can only call them probabilities if the scores are calibrated (see later in the course).
The actual magnitude of the scores is irrelevant. What is important is that if you sort
the observations from high to low by the predicted score, then the higher ranked observations
are more likely to have y=1 and the lower ranked observations are more likely to have y=0.
I will cover this in depth in the section about model evaluation.
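A minimal sketch in R of the ranking point above. It uses simulated data, not the stock dataset from the question; the helper auc_rank is hypothetical (it is not from any package) and computes the AUC with the rank-sum formula. The scores are deliberately kept below 0.13 to mimic the situation in the question.

```r
# Hypothetical helper: AUC from ranks (Mann-Whitney rank-sum formula), base R only.
auc_rank <- function(score, y) {
  r  <- rank(score)        # average ranks, so ties are handled
  n1 <- sum(y == 1)        # number of positives
  n0 <- sum(y == 0)        # number of negatives
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
n <- 10000
y <- rbinom(n, 1, 0.007)                     # simulated: roughly 0.7% positives
score <- 0.13 * plogis(rnorm(n) + 1.5 * y)   # simulated scores, all below 0.13

auc_original <- auc_rank(score, y)
auc_rescaled <- auc_rank(score / 0.13, y)    # stretched to the (0, 1) range
auc_logged   <- auc_rank(log(score), y)      # any increasing transform

print(max(score))
print(c(auc_original, auc_rescaled, auc_logged))
```

The three AUC values should agree, because rescaling or taking the log changes the magnitude of the scores but not their order.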
Also, do not confuse accuracy with AUC. They are completely different performance measures,
and I will cover both in detail later in the course.
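To make the difference concrete, here is a rough sketch in R on simulated data (again, not the dataset from the question). The 0.5 cutoff is an assumption about how the scores would be turned into "YES"/"NO" labels; the AUC never uses a cutoff at all.

```r
set.seed(42)
n <- 10000
y <- rbinom(n, 1, 0.007)                     # simulated: roughly 0.7% positives
score <- 0.13 * plogis(rnorm(n) + 1.5 * y)   # simulated scores, all below 0.13

# Accuracy needs class labels, so it depends on a cutoff (0.5 assumed here).
pred_class <- ifelse(score > 0.5, 1, 0)
n_yes      <- sum(pred_class)                # no score exceeds 0.5, so always "NO"
accuracy   <- mean(pred_class == y)          # equals the share of negatives

# AUC only needs the ordering of the scores (rank-sum formula, base R).
r   <- rank(score)
n1  <- sum(y == 1)
n0  <- sum(y == 0)
auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(n_yes)
print(accuracy)
print(auc)
```

With a 0.5 cutoff every observation is labeled "NO", so the accuracy is simply the share of negatives (about 99.3% when 0.7% are positives), whereas the AUC reflects how well the scores order the positives above the negatives and will generally be a different number.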
Michel Ballings