
Embedding layers in Keras

2017-10-11 07:05 | Category: DeepLearning | Research Notes | Tags: layer, Embedding

An Embedding layer turns positive integers (indexes) into dense vectors of fixed size.

Why should we use an embedding layer? Two main reasons:


1. One-hot encoded vectors are high-dimensional and sparse. Let's assume that we are doing Natural Language Processing (NLP) and have a dictionary of 2000 words. This means that, when using one-hot encoding, each word is represented by a vector containing 2000 integers, 1999 of which are zeros. On a big dataset this approach is not computationally efficient (see the sketch after this list).

2. The vectors of each embedding get updated while training the neural network. As the t-SNE visualization further down shows, similarities between words emerge in a multi-dimensional space. This lets us visualize relationships not only between words, but between anything that can be turned into a vector through an embedding layer.
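To make the size difference concrete, here is a minimal NumPy sketch (the word index 7 is just an illustrative choice; 32 is one of the typical embedding lengths mentioned below):

  import numpy as np

  vocab_size = 2000        # dictionary of 2000 words, as in reason 1
  embedding_dim = 32       # a typical embedding length

  # One-hot: the word with index 7 becomes a 2000-dimensional vector with a single 1
  one_hot = np.zeros(vocab_size)
  one_hot[7] = 1.0
  print(one_hot.shape)     # (2000,) -- 1999 of the entries are zeros

  # Dense embedding: the same word is simply row 7 of a (2000, 32) matrix
  embedding_matrix = np.random.uniform(-0.05, 0.05, size=(vocab_size, embedding_dim))
  print(embedding_matrix[7].shape)   # (32,)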


This concept might still be a bit vague. Let's have a look at what an embedding layer does with an example of words. The idea originates from word embeddings; you can look up word2vec if you are interested in reading more. Let's take this sentence as an example (do not take it too seriously):

“deep learning is very deep”

The first step in using an embedding layer is to encode this sentence by indices. In this case we assign an index to each unique word. The sentence then looks like this:

1 2 3 4 1
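In Keras, this index encoding can be done with the Tokenizer utility. A minimal sketch (the Tokenizer orders words by frequency starting at index 1, so for this particular sentence it happens to reproduce exactly the indices above):

  from keras.preprocessing.text import Tokenizer

  sentence = "deep learning is very deep"
  tokenizer = Tokenizer()
  tokenizer.fit_on_texts([sentence])

  print(tokenizer.word_index)                      # {'deep': 1, 'learning': 2, 'is': 3, 'very': 4}
  print(tokenizer.texts_to_sequences([sentence]))  # [[1, 2, 3, 4, 1]]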

The embedding matrix gets created next. We decide how many 'latent factors' are assigned to each index; basically, this means how long we want each vector to be. Common choices are lengths like 32 and 50. Let's assign 6 latent factors per index in this post to keep it readable. The embedding matrix then looks like this:

[Figure: Embedding matrix]
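In Keras this matrix is just the weight of an Embedding layer. A minimal sketch with 5 indices (0-4) and 6 latent factors; note that the values start out random rather than matching the figure, and only become meaningful during training:

  import numpy as np
  from keras.models import Sequential
  from keras.layers import Embedding

  model = Sequential()
  model.add(Embedding(input_dim=5, output_dim=6, input_length=5))  # indices 0-4, 6 factors each

  print(model.get_weights()[0].shape)    # (5, 6): one 6-dimensional row per index

  sentence = np.array([[1, 2, 3, 4, 1]]) # "deep learning is very deep"
  print(model.predict(sentence).shape)   # (1, 5, 6): each index replaced by its row of the matrix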

So, instead of ending up with huge one-hot encoded vectors, we can use an embedding matrix to keep the size of each vector much smaller. In short, the word "deep" gets represented by the vector [.32, .02, .48, .21, .56, .15]. However, the word is not literally replaced by the vector itself; it is replaced by an index that is used to look up its vector in the embedding matrix. Once again, this is computationally efficient when using very big datasets. Because the embedded vectors also get updated during the training process of the deep neural network, we can explore which words are similar to each other in a multi-dimensional space. By using dimensionality reduction techniques like t-SNE, these similarities can be visualized (https://lvdmaaten.github.io/tsne/).

[Figure: t-SNE visualization of word embeddings]
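A minimal sketch of such a visualization with scikit-learn's TSNE; embedding_matrix here is a random stand-in for a trained (vocab_size, output_dim) weight matrix, e.g. model.get_weights()[0], so a plot of it shows no real structure:

  import numpy as np
  import matplotlib.pyplot as plt
  from sklearn.manifold import TSNE

  # Stand-in for a trained embedding weight matrix of shape (vocab_size, output_dim)
  embedding_matrix = np.random.uniform(-0.05, 0.05, size=(2000, 32))

  # Project the 32-dimensional embeddings down to 2 dimensions
  coords = TSNE(n_components=2).fit_transform(embedding_matrix)

  plt.scatter(coords[:, 0], coords[:, 1], s=2)
  plt.show()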


From: https://medium.com/towards-data-science/deep-learning-4-embedding-layers-f9a02d55ac12



The output of the embedding layer has 3 dimensions: (batch_size, sequence_length, output_dim). This works well with an LSTM or GRU (see below), but if you want a plain binary classifier on top you need to flatten the output to 2 dimensions first:


  from keras.models import Sequential
  from keras.layers import Embedding, Flatten, Dense

  model = Sequential()
  model.add(Embedding(3, 10, input_length=X.shape[1]))  # X: 2D array of integer indices
  model.add(Flatten())
  model.add(Dense(1, activation='sigmoid'))
  model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
  model.fit(X, y, batch_size=200, epochs=700, verbose=0, validation_split=0.2, shuffle=True)
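Note that Flatten is what makes the Dense classifier possible here: the Embedding layer outputs a tensor of shape (batch_size, input_length, 10), Flatten reshapes it to (batch_size, input_length * 10), and Dense(1) with a sigmoid then produces a single probability per sample. This is also why input_length must be specified: without it, Keras cannot compute the size of the flattened vector.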

An LSTM layer has memory across time steps and consumes sequences directly, so the 3D output of the embedding works as-is in this case; there is no need to flatten anything:

   

  from keras.models import Sequential
  from keras.layers import Embedding, LSTM, Dense

  model = Sequential()
  model.add(Embedding(vocab_size, 10))   # vocab_size: number of distinct indices in X
  model.add(LSTM(5))                     # consumes the 3D embedding output directly
  model.add(Dense(1, activation='sigmoid'))
  model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
  model.fit(X, y, epochs=500, verbose=0, validation_split=0.2, shuffle=True)

From: http://www.orbifold.net/default/2017/01/10/embedding-and-tokenizer-in-keras/



keras.layers.embeddings.Embedding(input_dim, output_dim, embeddings_initializer='uniform', embeddings_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None)


  • input_dim: int > 0. Size of the vocabulary, i.e., maximum integer index + 1 (if the indices run from 0 without gaps, this is the number of unique integers).

  • output_dim: int > 0. Dimension of the dense embedding, i.e., the number of 'latent factors' per index in the example above.

  • embeddings_initializer: Initializer for the embeddings matrix.

  • embeddings_regularizer: Regularizer function applied to the embeddings matrix

  • embeddings_constraint: Constraint function applied to the embeddings matrix

  • input_length: Length of input sequences, when it is constant. It equals input.shape[1]. This argument is required if you are going to connect Flatten and then Dense layers on top (without it, the shape of the Dense outputs cannot be computed).


Input shape:

   2D tensor with shape (batch_size, sequence_length)

Output shape:

   3D tensor with shape (batch_size, sequence_length, output_dim).
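A minimal sketch of these shapes (the specific numbers are arbitrary example values):

  import numpy as np
  from keras.models import Sequential
  from keras.layers import Embedding

  model = Sequential()
  model.add(Embedding(input_dim=1000, output_dim=64, input_length=10))

  # Input: a batch of 32 sequences, each containing 10 integer indices in [0, 999]
  x = np.random.randint(1000, size=(32, 10))

  print(model.predict(x).shape)   # (32, 10, 64) = (batch_size, sequence_length, output_dim)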


From: https://keras.io/layers/embeddings/






