I wonder if it would be feasible to have the words of two languages living in the same word2vec vectorspace. Then, to get a feel for a word in a poem written in the other language, you could see which words from your own language it is a summation of.
The problem is that this provides a different experience than directly and inherently understanding all the different meanings/connotations of a word in the poem's native language.
What if one language has a word that simply has no equivalent in another language? Wouldn't that word be 'off the charts'?
Chinese has many characters that are pretty much exclusively used for names and making sounds when transliterating other languages. If we're trying to put Chinese characters and English words on a vectorspace, you're going to be missing a lot of Chinese characters which have no meaning.
(1) There is the fact that a certain character was chosen at the time, which can be full of spite, praise, ignorance, or other human characteristics;
(2) There is the character's significance in context (often personal names, toponyms and other types of names are appraised upon aesthetics - for instance if one character has 'fire' and one has 'water', or one character has 'culture' and one has 'weaponry', the name has 'balance' and suggests a certain sort of culture/inhabitant/personal character.);
(3) There is a relationship to other words and transliterations of a contemporary period or similar geospatial origin;
(4) There is the body of other uses of that character with which a particular audience may or may not be familiar, the difference between distinct audiences itself being a possible motivator for character choice;
(5) There are multiple potential pronunciations for a character (in any given historical period) which may layer meaning by suggesting other words or meanings;
... and so on. I agree that translation is not as trivial to approach naively as suggested by the parent, but to ascribe no meaning to the characters used is equally wrong. As just one example, half an hour ago I spent a few minutes investigating the different historical transliterations for a local toponym... at least three (3) different characters were used for the same sound through the last 1000 years, and to me this merely indicates its foreign linguistic origin (probably a Yi language, which is classified in the Lolo-Burmese branch of the Sino-Tibetan language family, being closer to Tibetan than Chinese).
Tangentially, I am involved in a cross-cultural poetry group in the city where I live in China (昆明), and our pictures were recently featured in the local paper. Local people started stopping me on the street! After 15 years, living in China still sometimes makes me feel like some kind of rainbow unicorn exotic curio. Also, one of the most famous poets of the Tang Dynasty, Li Bai, is thought by some scholars to have been born in what is now Kyrgyzstan.
> What if one language has a word that simply has no equivalent in another language? Wouldn't that word be 'off the charts'?
No. It'll still be mapped onto the word2vec vectors, but it would just take on bizarre, extreme, and highly unstable values due to maximum likelihood training. It'll associate that untranslatable word with all sorts of random other words like 'cheese' simply because 'cheese' happened to be used within a few sentences of the untranslatable, and the mapping will vary unpredictably with different hyperparameters, random seeds, datasets, implementations etc.
This often happens; for example, in a simple logistic regression, suppose you have 10 datapoints and 1 variable that separates them perfectly. What will maximum likelihood do? It won't give you a sensible odds ratio like 10, it'll give you an OR of ∞, because an infinite OR makes the likelihood of the observed perfect separation exactly 1, while any finite OR implies a likelihood slightly below 1. (A logistic regression using any kind of regularization or Bayesian prior will behave much better.) You can see the same thing in image-classification CNNs when you give them a photo of something not in the training set: they just come up with whatever is the closest match for the flimsiest and most superficial of reasons. Or if you try to use a NN for time-series prediction and extrapolate out the prediction, you'll get a single confident, extreme trend, because it's not reflecting any of the real uncertainty.
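To make the perfect-separation point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the 10 toy datapoints, the no-intercept one-variable model, the helper names `sigmoid` and `fit_slope`, and the step size. It fits the slope by gradient ascent, once with pure maximum likelihood and once with an L2 penalty.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_slope(xs, ys, steps, lr=0.5, l2=0.0):
    """Gradient ascent on the (optionally L2-penalised) log-likelihood
    of a one-variable, no-intercept logistic regression."""
    w = 0.0
    for _ in range(steps):
        grad = sum((y - sigmoid(w * x)) * x for x, y in zip(xs, ys)) - l2 * w
        w += lr * grad / len(xs)
    return w

# Hypothetical toy data: x < 0 is always class 0, x > 0 is always class 1.
xs = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Pure maximum likelihood: the slope never settles, it only grows.
mle_short = fit_slope(xs, ys, steps=100)
mle_long = fit_slope(xs, ys, steps=10000)

# Same model with an L2 penalty: the slope converges to a finite value.
ridge_short = fit_slope(xs, ys, steps=100, l2=1.0)
ridge_long = fit_slope(xs, ys, steps=10000, l2=1.0)

print("MLE slope after 100 / 10000 steps:", mle_short, mle_long)
print("L2 slope after 100 / 10000 steps: ", ridge_short, ridge_long)
print("implied odds ratio per unit x (MLE, 10000 steps):", math.exp(mle_long))
```

With separable data the unpenalised gradient is always positive, so the fitted slope (and the odds ratio `exp(w)`) depends only on how long you let the optimizer run; the penalised fit stops moving after a few dozen steps.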
This is why being able to ask a NN about its uncertainty is so useful: the regular NNs will just confidently predict bullshit because that's what you trained them to do. There's some nifty stuff about this in "Uncertainty in Deep Learning" http://mlg.eng.cam.ac.uk/yarin/blog_2248.html , Gal 2016.
Why would an untranslatable word take on bizarre values? If there are enough training examples of the untranslatable word, why couldn't it actually have a sensible mapping in the vectorspace?
p.s. thanks for the reference to that thesis. I've been wondering about uncertainty in predictions for a side project to predict athletic performance that I'm working on, so this is very timely.
word2vec is trying to predict the co-occurrence of words; if there is nothing translatable about it, then it will be effectively random. When you fit a maximum likelihood to random stuff and force it to attribute pattern to the noise, it's not going to come up with any sensible mapping.
But just because it's "untranslatable" from, say, Chinese to English, doesn't mean that we can't see many Chinese-Chinese co-occurrences of the word, so this Chinese word would still have a vector of significance relative to other Chinese words, right?
That is, if you ran word2vec on a Chinese corpus, this word would be represented adequately in the vectorspace.
Wouldn't it then make sense, if you could find a proper mapping from one language's space to the other (I don't know if this part is realistic, but maybe by finding correspondences between already-translated words), that this word would not land in a bizarre spot were it to be expressed as a combination of vectors from the other language?
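A rough sketch of that idea, in plain Python: fit a linear map between the two spaces from a seed dictionary of already-translated pairs, then push a word with no seed translation through it. All of the data here is made up for illustration: the 2-D vectors, the three seed pairs, the `zh_unpaired` vector, and the assumption that the two spaces really are related by a linear map. Real word2vec spaces have hundreds of dimensions and only approximately line up.

```python
import math

def fit_linear_map(src, tgt):
    """Least-squares 2x2 matrix W minimising sum ||W s - t||^2 over seed
    pairs, via the normal equations W = (sum t s^T)(sum s s^T)^-1."""
    a = [[sum(s[i] * s[j] for s in src) for j in range(2)] for i in range(2)]
    b = [[sum(t[i] * s[j] for s, t in zip(src, tgt)) for j in range(2)]
         for i in range(2)]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    return [[sum(b[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply_map(w, v):
    return [w[0][0] * v[0] + w[0][1] * v[1], w[1][0] * v[0] + w[1][1] * v[1]]

def cosine(u, v):
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))

# Hypothetical English vectors.
en = {"water": [1.0, 0.0], "fire": [0.0, 1.0],
      "steam": [0.7, 0.7], "stone": [-1.0, 0.0]}

# Hypothetical Chinese vectors: same geometry, rotated by 90 degrees.
zh = {"shui": [0.0, 1.0], "huo": [-1.0, 0.0], "shi": [0.0, -1.0]}

# Seed dictionary of already-translated pairs (Chinese key, English key).
seed = [("shui", "water"), ("huo", "fire"), ("shi", "stone")]
w = fit_linear_map([zh[c] for c, _ in seed], [en[e] for _, e in seed])

# A Chinese word with no entry in the seed dictionary.
zh_unpaired = [-0.8, 0.6]
mapped = apply_map(w, zh_unpaired)

# Rank English words by cosine similarity to the mapped vector.
nearest = sorted(((word, cosine(mapped, vec)) for word, vec in en.items()),
                 key=lambda pair: pair[1], reverse=True)
for word, score in nearest:
    print(word, round(score, 3))
```

In this toy setup the unpaired word lands between existing English words rather than somewhere bizarre, which is the outcome the comment is hoping for. Whether that holds on real corpora depends on how well the linear-map assumption fits.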
> What if one language has a word that simply has no equivalent in another language?
It happens fairly often, and a typical translation is to use a phrase to represent the meaning and context, instead of a single word. A common example:
"Vive la France!" is often translated as "Long live France!"