
It's unpopular to talk about, but there is quantifiably more crime in some minority communities in the US.

There are some crimes, like drug use, that are indeed over-reported in some communities due to over-policing.

But there are other crimes, like homicide, that are nearly universally reported, regardless of where they are committed. And those crimes are much more frequent in majority-minority parts of the country.

Turning a blind eye to this problem is a disservice to those communities, because they are the same communities that are most commonly the victims of crimes.



The problem is that these systems are fundamentally unable to distinguish causation from correlation. Which is admittedly a hard problem even for humans, but at least we have some capacity to tease these out.

In this case, the increased crime in these communities is not caused by their minority status, but rather by a multitude of other factors with historical and societal origins.


> The problem is that these systems are fundamentally unable to distinguish causation from correlation. Which is admittedly a hard problem even for humans, but at least we have some capacity to tease these out.

I'm not sure why causation vs. correlation is relevant here. When, say, someone is up for parole the judge will review their past criminal history, conduct in prison, whether or not they will have a support structure when they are released, etc. None of these factors are causal. A judge cannot point to a past crime, or to conduct in prison and say "this factor will cause you to reoffend". No, the decision is made based on factors that are correlated with higher rates of recidivism.


It matters which correlating factors you choose. While the criteria you mentioned are merely correlated for a specific individual, on a macro level there are definite causative reasons why a lack of a support structure etc could lead someone to recidivism. A human is able to look at all of those and consider whether they are plausible causes and, importantly, test their assumptions.


The USA has spent 4+ centuries kicking certain communities down the Maslow hierarchy. This was done on purpose through laws introducing redlining, Jim Crow, no access to the GI Bill, slavery, etc. I'd like a world where we could judge people fairly on the same standard, but we purposely created economic underclasses, and that comes with a certain amount of desperation that leads to crime.

White Americans = 70% of the population, $100 trillion in wealth.

African Americans = 14% of the population, $2.9 trillion in wealth.

African Americans in particular owned almost 10-12% of the land in this country (true wealth) and were promised more from the government in reparations (40 acres) not long ago, but discriminatory policies stripped that land away from them over 100 years.

Being in the USA for 150-200 years results in at least $500k-$1 million in wealth purely from land and home value appreciation.

The average black person has about ~$500 in wealth. This isn't a fluke; this was designed... This is a country that criminalizes being poor more and more in many ways, so we can fall into the trap of revictimizing the underclass we created, this time using algos, if we are not careful.


Yes, African Americans are disadvantaged as compared to whites. But how is that relevant to my previous comment?

When a judge is evaluating whether or not to grant parole, is he or she looking for a causal factor that will directly cause the convict to re-offend or is the judge evaluating the convict's situation with factors that are correlated with re-offending?


And you're also confusing correlation with causation here. Yes, "governmental policies have discriminated against African Americans and prevented the accumulation of wealth" is a valid hypothesis, but the data you've presented doesn't show it.

Showing causation with whole-group statistics is very, very hard.


It's not a hypothesis. It's hard to track money flows, but not as hard as the anti-reparations crowd makes it seem.

The majority of the $500k-$1m baseline in white middle-to-upper-class wealth is in homes and land handed down as inheritances; this is not debatable. That property allows a certain amount of leverage to invest in education, businesses, etc.

The Ben & Jerry's founder talked about this in vivid detail on the campaign trail with Bernie Sanders. If he had been black where he grew up: no GI Bill = no cheap housing = no appreciation of property/land over his childhood = no financial leverage to build his company.

I can take risks and fail without going bankrupt thanks to familial wealth, that is a tremendous luxury not afforded to the group I'm mentioning.


Unfortunately for that argument, the vast majority of white wealth is not in the hands of the middle-upper class, and I highly doubt that the wealth of the 1% (or 0.1%) is mostly in homes and land inheritance. And if you're counting "middle-upper" as 80-95%, I think the top 5% has ~70% of the total wealth in the US... (for reference, the 95th percentile is ~$2.5m, or comfortably above your $500k-$1m baseline; $1m is the 88th percentile, $500k the 80th)

(And in any case, my point was just that it wasn't shown as causation in the stats you cited)


>...the vast majority of white wealth is not in the hands of the middle-upper class

You are correct on that point. More than 50% of the $100 trillion is held by the top 10%.

On the other point, I meant that the first $500k to $1 million of wealth was due to inheritance... not the total wealth of the 1%.

Follow up: At one point, any white male could move westward and get free land (whether indigenous-owned or not). 40-100 acres after a century-plus of property value appreciation is quite a bit of wealth. The $500k-$1m number I use as a baseline for white wealth is very conservative.


Cynically, does a bank care why zipcode is correlated with failure to repay loans? Forcing the bank to act as if the probability distributions are different from how they really are sounds like a very awkward way to redistribute wealth. Maybe the right answer would be reparations for whatever society did to everyone in those communities and then total freedom of association for banks and others to give loans as they see fit. Another option would be to give a bad credit subsidy to people in the cases where their low estimated trustworthiness is judged to be due to someone else's error. For example if a factory gives me lead poisoning, my interest rates go up because people with high blood lead content are less likely to keep good credit, but the factory has to pay the difference because they were liable.

This reminds me of the argument for UBI. Instead of having thousands of tiny charity programs sprinkled all over society, why not let the economy make itself as efficient as possible and then hand out the charity in units of dollars?



> unable to distinguish causation from correlation

More specifically, the current breed of machine learning is a correlation engine; the only strength these networks have is finding correlations in the absence of context or explanation.


> there is quantifiably more crime in some minority communities

Sure... and what about the chronic lead exposure issues in many of these areas, the effects of long-term overpolicing of minor crimes on family stability, and the impact of (lack of) generational wealth and education from being used as de facto or literal slaves for generations?

The 'minority' part here is a correlative factor, not a causative one.


An AI classifier doesn't know, or care, why a particular population has higher crime and lower income, it only recognizes and reports on the pattern.


...which is exactly the kind of "treating culturally- and situationally-contextual results like inherent facts" bias the article is talking about.


And if those results change, so will the algorithm's outputs! But asking the algorithm to make the change seems to be a bit much.

Honestly, my preferred method of solving this would be to train the algorithm on a data set with all of the forbidden values included along with anything else the creator feels relevant - zip code, income, familial status, favorite sport, education - and then when running in production, against real people, don't give it the restricted information. Yes, you could theoretically extract race, gender and other protected stats from the information the algorithm actually uses in prod - but it has no incentive to, since a less-noisy signal is already provided.

For instance, suppose the optimal algorithm for your data set is some linear function of X, Y and Z - let's say X+Y+Z to keep things simple. X,Y and Z are all normally distributed variables, mean of 0 and the same standard deviation. Y has a 0.5 correlation with X, and a -0.5 correlation with Z. If not provided Y, your algorithm might come up with 1.5X+0.5Z as an approximation - extracting a bit of the signal for Y from the things it does have access to. It's suboptimal, but better than just X+Z. Unfortunately, Y is verboten - we're not allowed to discriminate on it, and this approximation ends up with results that track Y. So instead we train with X, Y and Z as inputs, so the derived model is X+Y+Z - and we can drop Y from that model in production, leading to a model that (while less accurate) shouldn't unfairly track Y.
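That toy setup is easy to check numerically. Below is a minimal numpy sketch on synthetic data (all variable names and coefficients are from the hypothetical above, not from any real system): fitting the target on X and Z alone leaks Y's signal into their coefficients (~1.5 and ~0.5), while fitting on all three and zeroing out Y's term in "production" keeps the X and Z coefficients at their true values of 1.

```python
# Sketch of "train with the protected variable, drop it at inference".
# X, Y, Z ~ N(0, 1) with corr(X, Y) = 0.5, corr(Y, Z) = -0.5, corr(X, Z) = 0.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

X = rng.normal(size=n)
Z = rng.normal(size=n)
# Construct Y with the desired correlations; sqrt(0.5) noise keeps Var(Y) = 1.
Y = 0.5 * X - 0.5 * Z + np.sqrt(0.5) * rng.normal(size=n)

target = X + Y + Z  # the "optimal algorithm" from the example

# Trained WITHOUT Y: the fit extracts Y's signal from X and Z instead.
coef_xz, *_ = np.linalg.lstsq(np.column_stack([X, Z]), target, rcond=None)
print(coef_xz)  # ~ [1.5, 0.5] -- results now track Y through X and Z

# Trained WITH Y: the model recovers the true coefficients [1, 1, 1],
# and Y's term can simply be dropped when scoring real people.
coef_xyz, *_ = np.linalg.lstsq(np.column_stack([X, Y, Z]), target, rcond=None)
pred = coef_xyz[0] * X + coef_xyz[2] * Z  # production prediction, no Y term
```

The dropped-Y model is less accurate than 1.5X + 0.5Z, but its remaining coefficients no longer act as a proxy for the forbidden variable, which is the trade-off the comment describes.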


> And if those results change, so will the algorithm's outputs! But asking the algorithm to make the change seems to be a bit much.

The problem is that the output of those algorithms is used to drive decision making that has the effect of maintaining the status quo, by removing opportunities to change it.


The real problem here is a deep political schizophrenia in modern society, or at least parts of it, which demands decisions be deliberately biased towards the outcomes they politically desire. These people then turn around and describe results that are not biased as "biased", which is utterly Orwellian.

I think your comment shows that you understand this. You accept that a decision may be correct, when measured in totally cold and statistical terms. But such decisions would not "change the status quo" and that would be a problem.

But that position is a deeply political one. Why should decisions at banks, tech firms, or wherever be deliberately biased to change the status quo? It's social engineering, a field with a long and terrible track record of catastrophic failure. Failure both to actually change reality, and failure in terms of the resulting human cost.

Injecting bias into otherwise unbiased decisions by manipulating ML models, or by manipulating people (threatening them if they don't toe the line), is never a good thing.


Maintaining the status quo is also a political position, though. In general, there's simply no way to interact with other people at scale without politics coming into play. It can be inadvertent, in a sense that there was no specific intent for "social engineering" - but if one's ethics prioritizes outcome over intent, it doesn't really matter.


Which loops back around to my original point, which is that the notion that we should alter or influence the algorithm because its output does not match our worldview or politics is not the removal of bias; it is the deliberate injection of our personal biases.

The whole point of using the algorithm was to make sure personal biases aren't impacting the decision. If we're going to alter the algorithm because we don't like the result, then why are we bothering to use an algorithm in the first place? Just use a human to make the decision. At least in that scenario potential biases have an identifiable source, as opposed to an opaque program that may have been made by engineers who deliberately tuned it to avoid any disparities because they think any disparity in outcome is fundamentally problematic.


This is exactly the problem. An AI classifier without sufficient context is simply a very efficient discrimination machine. We must make sure that AI systems have enough context and protection from human bias before large scale deployment.


EXACTLY


But I think you're missing the point. Deliberately altering the model to produce equal results despite unequal patterns in reality is not the elimination of bias, it's introducing bias.

Is the goal of the AI model to predict crime rates in a hypothetical world where everyone has equal rates of lead exposure? Or is the goal of the model to predict crime rates in the real world?


The answer entirely depends - "Deliberately altering the model to produce equal results despite unequal patterns in reality" is so vague as to be meaningless in this discussion.

> Is the goal of the AI model to predict crime rates in a hypothetical world where everyone has equal rates of lead exposure? Or is the goal of the model to predict crime rates in the real world?

The goal is to use the results of a model for something (pet peeve, I hate the use of "AI" to describe what are usually pretty standard statistical or ML models). The model you create, and how you apply/interpret it, depend entirely on what you're actually trying to accomplish or change with the results.

Depending on what that is, the kind of "bias reflection" we're discussing is hugely problematic.


And that "something" we want the AI to do is usually grounded in the real world in some way, be it shopping patterns, or crime. And the real world has disparities.

For example, crime rates are not equal between men and women. If we force our AI to assign equal risk of crime to men and women, then we will have introduced a bias that either under-predicts the rate of male crime or over-predicts the rate of female crime.


"Truth" is irrelevant and academic.

What matters is: What are the outcomes and consequences of active systems, AI or not.

For instance: How does the algo cope with derivatives of its own output being fed back into itself as input at a later stage?


If truth is irrelevant then just return random output and call it a day.

The reality is, truth is relevant, and sometimes the truth is inconvenient. Tech workers may want to build an AI that measures risk of recidivism and produces uniform risks across race and gender. But the truth is, rates of recidivism are not the same across all groups. If we produce the desired outcome of equal reported risk, then the consequence is that men have their risk underreported to put them on parity with women, or vice versa.


It depends on what you're using it for and why. If you're concretely distributing police resources, you probably want short-term prediction of actual aggregate crime rates (but also want to consider the risk of overpolicing based on crime rates only recorded higher because of historical overpolicing). On the other hand, if you want to understand anything about the populations involved on some abstract level, merely plugging in predicted crime rates based on historical data won't help at all.


This comment chain is in reference to law enforcement and the justice system. I would hope that our law enforcement and justice systems are operating on real world data, rather than a hypothetical world's data.


Yes, law enforcement should use this model to determine where to assign police resources to minimize crime.

The justice system should absolutely not use this system for any purposes, since justice is based on the circumstances of the individual case in front of it, not the societal statistics which apply in the aggregate but may not apply to that specific case.


> Yes, law enforcement should use this model to determine where to assign police resources to minimize crime.

And if an activist engineer deliberately biased the model to avoid indicating disparities in crime, then we will have sabotaged the police's ability to allocate resources. Hence why the assumption that disparate outcomes are indicative of a biased model is a problem.


You're still not getting it.

We already know where crime is occurring. We don't need an AI model for that.

People aren't arguing about not using biased data, they're arguing that the model needs to be designed and trained so that the bias in the data doesn't affect the predictions in the model. And yes, that means deliberately de-engineering bias out of the model, which may involve introducing a counter-bias.

For example, you and others kept bringing up race earlier as a legitimate bias for criminal profiling. But socioeconomic status is far more correlated with propensity to commit criminal acts than race. A model of crime in LA based on race, for example, would assume that people in Ladera Heights are just as likely to commit crime as people in South LA because they have the same race...but Ladera Heights has a fraction of the crime as South LA (and several times the average income). Similarly, you would expect South LA to have less crime than the largely Caucasian Joshua Tree or Fontana...but both cities have higher crime rates than South LA, and for a period were some of the most dangerous cities in California. (Joshua Tree was the inspiration for, and original setting of, Breaking Bad. Fontana used to be known as the Felony Flats.)


No, I have repeatedly and explicitly stated that protected classes like race should not be used as inputs into any sort of model. The notion that I have said that we should use race as an input to a model is completely false.

What I am saying is that pointing to disparities in the outcome of these models to claim that the models are biased is not, on its own, a reasonable conclusion. As you point out, people in Joshua Tree and South LA have higher rates of crime than average. So if our model flags people from these areas as high risk more frequently than other places, is our algorithm biased? If we deliberately make the model produce uniform results across different locations because an engineer feels it's problematic to have a model that produces different results between different geographic cohorts, then have we mitigated bias? No, that engineer intentionally introduced his or her own bias to make the model adhere to his or her worldview.


I would absolutely not want the justice system to use historical statistical information without considering factors like lead exposure and past overpolicing, and even pure boots-on-ground distributions of law enforcement resources should take into account historical over- and under-policing and its effects on the statistics being used to distribute those resources.


Okay, that's your position. Plenty of people do not want an AI that, say, gives parole to someone that is a high risk of re-offending because someone engineered the AI to be more lenient towards people who grew up in areas with greater lead exposure.


Consider if someone grew up in an area of high lead exposure but was, one way or another, protected from it as a child [1]. Should they be treated as having the same recidivism risk factor as someone who suffered the full effects of lead exposure? If not, how do you separate the effects of lead exposure from other factors to recidivism rates?

[1] See the 1999 documentary Blast from the Past.


The better approach is to define what inputs are going to be used, explicitly. Some, like race and religion, are not going to be used, as discrimination on the basis of race or religion is off limits. But age, past offences, and the like probably are.

But I think we're diverging considerably from the original point: that forcing an AI to produce equal outcomes despite unequal behavior in the real world is not the elimination of bias, it's the deliberate introduction of bias. If we have an AI that predicts recidivism rates, and we engineer it to produce equal predicted rates across all groups despite different rates between groups in the real world, then we are deliberately introducing bias. The truth, regrettable though it may be, is that a magical AI that operates with 100% accuracy - where the only people it flags would have re-offended - is going to produce disparities, because recidivism rates are not equal.


I'm sure an algorithm looking at the current state of the world would surmise Europeans are geniuses when compared to the rest of the world population. Totally ignoring the $150+ trillion of wealth stolen at gunpoint over the last 2+ centuries.

The blindspots end up being extremely problematic.


Can you elaborate on how European imperialism is related to whether or not we should take lead exposure into account during parole hearings in the United States?


>Deliberately altering the model to produce equal results despite unequal patterns in reality is not the elimination of bias, it's introducing bias.

Statisticians refer to this process as "controlling for confounding factors." What really matters is what questions you're asking. Data is too often abused, not always intentionally, by people with vague questions.
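A minimal numpy sketch of that idea, on synthetic data (the variables and coefficients here are invented for illustration, not drawn from any real dataset): a confounder C drives both the feature X and the outcome, so a naive regression of outcome on X alone overstates X's effect, while including C in the regression recovers the true effect.

```python
# Controlling for a confounding factor with synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

C = rng.normal(size=n)                             # confounder
X = 0.8 * C + rng.normal(size=n)                   # feature influenced by C
outcome = 1.0 * X + 2.0 * C + rng.normal(size=n)   # true effect of X is 1.0

# Naive: regress outcome on X only -> coefficient inflated to ~1.98,
# because X absorbs part of C's effect.
naive, *_ = np.linalg.lstsq(X[:, None], outcome, rcond=None)

# Controlled: include C as a regressor -> X's coefficient is near 1.0.
controlled, *_ = np.linalg.lstsq(np.column_stack([X, C]), outcome, rcond=None)
print(naive[0], controlled[0])
```

Which regression is "right" depends entirely on the question being asked, which is the point above: the data doesn't abuse itself, vague questions do.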


You're missing the point when asking the final question. Predicting crime rates is not the goal in itself (unless you're running some kind of crime rate bet).

If you're going to use a model trained on simple, biased data, to get, say, insurance estimates for a for-profit company, the model will probably successfully increase profits, so it was a good model.

On the other hand, if you're going to use the same model to help with sentencing, where your goal is to see equality and justice, then the model will do very badly, since it will punish many people for the community/skin they happened to be born in.


The problem is, some groups do commit crimes at higher rates than others. If we are engineering the model to produce outputs with no disparities when disparities do exist in the real world, then our model is going to be biased.

For instance, men commit more crimes than women. If we are building an AI that predicts risk of committing crime (say, estimating rates of recidivism) and we forcibly make it report equal rates between men and women, then we will be creating a discriminatory system, because it will either under-report the risk of men or over-report the risk of women in order to achieve parity. Engineering parity of outcome in the model when the real-world outcomes have disparities necessarily results in bias.


Is it introducing bias or is it introducing appropriate weighting? The most crime filled neighborhood in America may not be the inner city you have pictured in your mind but Lower Manhattan.


> Is the goal of the AI model to predict crime rates in a hypothetical world where everyone has equal rates of lead exposure?

What a terrible example. There are many other features being omitted that would predict crime rates. If anything, this is an example of enforcing your own bias on the model by not including all relevant features.


Great, then go ahead and include additional features. The problem is, recidivism rates in real life are not equal, so as the model increases in accuracy it will inevitably produce disparities in its results.


If you already know what you want the model to say ahead of time and are tweaking it to fit that narrative then there is no point in creating a model in the first place.


You are correct about the distribution of certain crimes, but you've entirely missed the point.*

Problems like violence have multiple causes, but are widely understood to be linked to problems of poverty, inequality, and marginalization. Violence is also self-perpetuating through social networks. When an incautious user of statistical tools fails to investigate the causal story behind such findings, they're going to get the wrong answer. When a policymaker acts on incomplete or misleading findings, they're going to make the problem worse.

*Unless you're arguing that it's the fact of being non-white which makes people more violent, in which case I have nothing to say to you.


> Unless you're arguing that it's the fact of being non-white which makes people more violent

This is not how I interpreted OP's comment at all. I just read it as a note that data shows a correlation between minority communities and crime. "Minority" is a flexible, relative term, to my understanding - which, notably, in recent years is frequently taken to mean "non-white", but over time has been applied to many different groups of people (Caucasian as well).

Interpreting it as meaning "non-white" in this context is yet another good example of our predisposed biases and relativistic understandings at play, I suppose.

Not to harp on you - just noting the differences in our perceptions.


I totally understand, and maybe I shouldn't have included that, but I do see phrases like "It's unpopular to talk about" get used as dog whistles all the time in other forums (and in real life). There's a whole subculture dedicated to dressing up racism in pseudo-scientific language (race realism etc.)

At the end of the day, there's enough real, vile bigotry out there in our society, I think it's important to be extra clear when discussing topics like this.


I'm glad you did. It's a great example of the problem we are discussing.

I agree, there are many negative biases in the real world. More so, I'd argue, on the internet, where people feel safer at a distance to state disagreeable opinions.

It's worth noting, especially when we are discussing the nature of how an AI learns.


There is more crime in poor communities everywhere, and minority communities are more likely to be poor. Moreover, the sorts of crimes that poor people commit are more likely to be prosecuted, and minorities are more likely to be prosecuted for them.



