
No, it's not the same as overfitting. It's nothing to do with the algorithm at all. The problem is with the training data, which contains patterns induced by the way the data was collected rather than the reality of what you want to classify.

For example, imagine that you wanted to train an algorithm to distinguish photos of dogs from photos of humans. So you collect a bunch of photos of both dogs and humans and use them to train a classifier. You do all the proper cross-validation, bootstrapping, etc. to ensure that you are not overfitting, and you get really good results. Then, looking at the misclassifications, you notice something: all the photos taken at a downward angle toward the ground are classified as dog photos, and all the photos taken facing straight ahead are classified as human photos. It turns out that in your training set, most of the dog photos are taken at a downward angle while most of the human photos are taken facing straight ahead, because humans are taller than dogs, and your machine learning algorithm identified this feature as the most reliable way to distinguish the two groups of photos in your training set.

In this hypothetical example, no overfitting occurred. The difference in photo angles is a real difference in the training sets that you provided to the algorithm, and the algorithm did its job and correctly identified this difference between the two groups of photos as a reliable predictor. The problem is that your training set has a variable (photo angle) that is highly correlated with what you want to classify (species). This is considered an unwanted bias (and not a reliable indicator) because the correlation is caused by the means of data collection (most photos are taken from human head height) and has nothing to do with the subject of the photos.
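A minimal synthetic sketch of that failure mode (the "photos," angles, and threshold below are all made up for illustration, not a real pipeline):

```python
import random

random.seed(0)

# Hypothetical synthetic "photos": just a camera angle and a label.
# Collection bias: dog photos are mostly shot at a downward angle and
# human photos straight ahead, because photographers are human-height.
def biased_sample():
    if random.random() < 0.5:
        return (random.gauss(-30, 10), "dog")    # tilted down
    return (random.gauss(0, 10), "human")        # straight ahead

# A classifier that only looks at the angle and ignores the subject.
def classify(angle):
    return "dog" if angle < -15 else "human"

def accuracy(data):
    return sum(classify(a) == y for a, y in data) / len(data)

# On a second sample collected the same biased way, the spurious
# feature keeps working, so a held-out test set won't flag anything.
biased_test = [biased_sample() for _ in range(2000)]
print(round(accuracy(biased_test), 2))  # roughly 0.93

# On photos where the angle is unrelated to the subject, it collapses.
unbiased_test = [(random.gauss(-15, 15), y)
                 for y in ["dog", "human"] * 1000]
print(round(accuracy(unbiased_test), 2))  # roughly 0.5
```

Note that the held-out biased test set looks fine, which is why standard cross-validation never catches this.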



I think you’re arguing semantics a bit. What you’re saying checks out, but one could say that overfitting was occurring but the test dataset distribution was not wide enough to catch it.


As I understand it, if it's overfitting, testing on a second random sample gathered in the same way as the training set should degrade performance. That's not the case here.

(Though maybe the term as used in industry is less strict.)


It's not semantics. Overfitting is a completely unrelated problem. Overfitting is an issue with the machine learning algorithm and how you're applying it, and can be fixed by changing how you use the algorithm, while what we're talking about here is a problem with the training data. There is nothing that can fix a biased training data set short of getting new training data that doesn't contain that bias.

It would be like if your car was driving in circles and you called a mechanic to fix your steering, and they told you that the actual problem was that both right wheels were missing. That's not a steering problem, and no repair to the steering system will fix it. The only fix is to put new wheels on.


Overfitting is about fitting too precisely to the sample inputs, such as if "downward angle" + "brown blob" (one specific dog breed) + "leash" + "lots of green" (grass) were all required to identify a dog. GP's example wasn't that; it was just identifying the wrong thing.
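A toy sketch of that distinction, with entirely hypothetical features: a memorized conjunction is perfect on the training examples but breaks on any dog that deviates from them.

```python
# Hypothetical training "photos" as feature dicts.
train = [
    ({"angle": "down", "color": "brown", "leash": True, "grass": True}, "dog"),
    ({"angle": "ahead", "color": "brown", "leash": False, "grass": False}, "human"),
]

def overfit_classify(photo):
    # Overfit rule: demands the exact conjunction seen in training.
    if (photo["angle"] == "down" and photo["color"] == "brown"
            and photo["leash"] and photo["grass"]):
        return "dog"
    return "human"

# Perfect on the training set...
print(all(overfit_classify(x) == y for x, y in train))  # True

# ...but a black dog off its leash is misclassified.
print(overfit_classify({"angle": "down", "color": "black",
                        "leash": False, "grass": True}))  # human
```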


Instead of overfitting, this is more related to exploitation vs. exploration. That we see more men in programming might just be because women are not given opportunities to explore programming as a career.

When AI makes a decision, right now, people only use the probability output. If hiring A has a 0.6 probability while hiring B has 0.4, then we will hire A instead of B. However, if we consider the confidence intervals, the decision might not be that clear. Say it's +/- 0.5 for A but +/- 0.2 for B. If exploration is considered too, it's very likely that we would give B a chance.
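A rough sketch of that idea. The scores and interval widths mirror the numbers above; sampling a plausible score from each interval (a Thompson-sampling-style rule) is just one illustrative way to fold exploration in:

```python
import random

random.seed(1)

# Hypothetical hiring scores: point estimate plus an uncertainty
# half-width standing in for a confidence interval.
candidates = {
    "A": {"score": 0.6, "half_width": 0.5},  # wide interval: little data
    "B": {"score": 0.4, "half_width": 0.2},  # narrow interval: well measured
}

# Point-estimate rule: always pick the higher score, so A always wins.
greedy = max(candidates, key=lambda c: candidates[c]["score"])
print(greedy)  # A

# Exploration-aware rule: draw a plausible score from each interval
# and pick the best draw, so B sometimes wins.
def sample_pick():
    draws = {c: random.uniform(v["score"] - v["half_width"],
                               v["score"] + v["half_width"])
             for c, v in candidates.items()}
    return max(draws, key=draws.get)

picks = [sample_pick() for _ in range(10000)]
print(round(picks.count("B") / len(picks), 2))  # B wins a meaningful fraction
```

With these particular intervals, B wins roughly 30% of the draws, so B genuinely gets a chance instead of never being hired.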

AI is in the realm of probabilistic decision making, which is not how people normally reason. The bias is not from the training side; it's the decision-making process incorporating AI that should change.



