ztqakita's Blog
    • Posts
    • Introduction
    • Algorithms
      • Complexity & Divide and Conquer
      • Dynamic Programming
      • Greedy & Back-track & Branch and Bound
    • Compiler
      • Lexical Analysis & Parsing
      • Semantic Analysis & Runtime Environment
      • Syntax-directed Translation
    • Computational Neuroscience
      • Ionic Currents
      • Neuroscience Basic Knowledge
    • Database System
      • Database System Lecture Note 1
      • Database System Lecture Note 2
      • Database System Lecture Note 3
      • Database System Lecture Note 4
    • DL
      • Convolutional Neural Network
      • Introduction to Deep Learning
      • Optimization for Deep Learning
      • Recursive Neural Network
      • Self-attention
      • Transformer
    • Life Learning
      • Architectures of neuronal circuits
      • how to model
      • Lecture James McClelland
      • Lecture Yao Xin
    • ML
      • Basic Concepts
      • Classification
      • Decision Tree
      • KNN
      • Perceptron
      • SOM
      • Support Vector Machines
    • Operating System
      • CPU Scheduling
      • File System
      • Introduction & OS Structure
      • Mass-Storage Structure & I/O System
      • Memory Management
      • Process & Threads
      • Process Synchronization
    • Paper Reading
      • Continuous-attractor Neural Network
      • Few-Shot Class-Incremental Learning
      • Integrated understanding system
      • Push-pull feedback
      • reservoir decision making network
      • Task representations in neural networks
    Self-attention

    I. Self-attention Overview Input: a sequence of vectors (the length is not fixed). Output: each vector gets a label (POS tagging), the whole sequence gets one label (sentiment analysis), or the model decides the number of labels itself (seq2seq). Self-attention can handle global information, while FC layers handle local information. Self-attention is the key module of the Transformer, which is covered in other articles. II. How does it work? First, we need to figure out the relevance between each pair of vectors.
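    Below is a minimal numpy sketch of the relevance computation that dot-product self-attention performs; the projection matrices `Wq`, `Wk`, `Wv` and the toy sequence `X` are illustrative assumptions, not code from the post.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Dot-product self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # pairwise relevance of vector i to vector j
    alpha = np.exp(scores - scores.max(axis=-1, keepdims=True))
    alpha /= alpha.sum(axis=-1, keepdims=True)    # softmax over each row
    return alpha @ V                              # each output is a weighted sum of all values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                       # 4 input vectors of dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)        # (4, 8): one output vector per input vector
```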

    June 11, 2021
    Transformer

    I. Transformer Overview A seq2seq model. Input: a sequence of vectors. Output: a sequence (the length is not fixed). II. Encoder It is essentially a self-attention model! For a single block, the structure can be understood as follows: unlike plain self-attention, a residual connection is used, adding the intermediate result $a$ produced by self-attention back to its input $b$; the sum passes through layer normalization to give a new output. That new output is fed into the FC layer, another residual is added, and a second layer normalization produces the result of the block. Once one block is understood, the whole Encoder is simply a network built from n such blocks. First, the words are converted into word embeddings, and positional encoding information is added to them. After every multi-head attention or feed-forward sub-layer comes residual + layer normalization, and this design is the key innovation of the Transformer Encoder. III. Decoder: Autoregressive (AT) The model structure of the Decoder is shown as follows. Going through the figure step by step, layer by layer: Masked Multi-head Self-attention: when producing each vector $b^i$, the model can no longer look at later positions and may only relate to the earlier inputs. In detail, to obtain $b^2$ we only need to take dot products with $k^1$ and $k^2$.
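    As a rough illustration of the block structure described above (self-attention, residual add, layer normalization, FC, residual, layer normalization again), here is a numpy sketch; `attn` and `ffn` are placeholder callables standing in for the real multi-head attention and feed-forward sub-layers.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each vector to zero mean and unit variance across features."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_block(x, attn, ffn):
    """One encoder block: each sub-layer is wrapped in residual add + layer norm."""
    a = attn(x)                # self-attention result a
    h = layer_norm(x + a)      # residual add, then layer normalization
    f = ffn(h)                 # position-wise feed-forward network
    return layer_norm(h + f)   # second residual add + layer normalization
```

    The full Encoder would then stack n such blocks on top of the embedded, positionally encoded input.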

    June 11, 2021
    Recursive Neural Network

    I. RNN Structure Overview The input is a sequence of vectors. Note: changing the order of the input sequence changes the output. The same network is used at every time step; connections drawn in the same color share the same weights. When the values stored in the memory are different, the output will also be different. II. Types of RNN An Elman network's memory stores the values of the hidden-layer output, while a Jordan network's memory stores the values of the output.
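    A small numpy sketch of the two memory schemes may help; the weight matrices here are hypothetical placeholders and bias terms are omitted.

```python
import numpy as np

def elman_step(x, memory, Wx, Wm, Wy):
    """Elman network: the memory stores the previous hidden-layer output."""
    h = np.tanh(x @ Wx + memory @ Wm)
    y = h @ Wy
    return y, h        # the new hidden output h is written back into memory

def jordan_step(x, memory, Wx, Wm, Wy):
    """Jordan network: the memory stores the previous network output."""
    h = np.tanh(x @ Wx + memory @ Wm)
    y = h @ Wy
    return y, y        # the new output y is written back into memory
```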

    October 15, 2020
    Convolutional Neural Network

    I. CNN Structure Overview II. Convolution Note: 1. Every element in a filter is a network parameter to be learned. 2. The stride is how far the filter moves from its previous position at each step. 3. The filter size is chosen by the programmer. From the picture we can see that the largest values in a feature map indicate where the filter's feature is present. The same process is repeated for every filter, generating more feature maps. For color images, a filter cube is used instead of a matrix.
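    To make the filter/stride/feature-map mechanics concrete, here is a plain numpy sketch of a single-channel convolution; the toy image and filter values are made up for illustration.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2-D convolution: slide the filter over the image with the given
    stride; each dot product becomes one entry of the feature map."""
    H, W = image.shape
    k, _ = kernel.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    fmap = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            fmap[i, j] = np.sum(patch * kernel)   # large value -> the filter's feature is present here
    return fmap

image = np.random.rand(6, 6)
kernel = np.random.rand(3, 3)                     # in a CNN these filter values are learned
print(conv2d(image, kernel, stride=1).shape)      # (4, 4)
```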

    September 14, 2020
    Introduction to Deep Learning

    I. Basic Concepts 1. Fully Connected Feedforward Network 2. Matrix Operation Every layer has a weight matrix and a bias vector; using matrix operations we can compute the output $y$. Tip: using a GPU can speed up matrix operations. II. Why Deep Learning? 1. Modularization For a neural network, more neurons is not always better: the examples show that adding layers (going deeper) improves accuracy more effectively, and this is the idea of modularization. For example, when training the model below, you can use basic classifiers as modules, and each basic classifier can then have sufficient training examples.
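    As a quick sketch of the per-layer matrix operation, the snippet below computes $y$ layer by layer with a sigmoid activation; the layer sizes and activation choice are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Fully connected feedforward pass: each layer computes sigma(W @ a + b)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)       # one matrix multiply + bias + activation per layer
    return a

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(4, 16))]   # two layers: 8 -> 16 -> 4
biases  = [rng.normal(size=16), rng.normal(size=4)]
print(forward(rng.normal(size=8), weights, biases).shape)        # (4,)
```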

    August 10, 2020
    Optimization for Deep Learning

    Some notation: $\theta_t$: the model parameters at time step $t$. $\nabla L(\theta_t)$ or $g_t$: the gradient at $\theta_t$, used to compute $\theta_{t+1}$. $m_{t+1}$: the momentum accumulated from time step 0 to time step $t$, also used to compute $\theta_{t+1}$. I. Adaptive Learning Rates In gradient descent we need to set the learning rate so that training converges properly and finds a local minimum, but sometimes it is difficult to choose a proper value for the learning rate.
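    Using the notation above, the following is a hedged sketch of two common update rules (classical momentum and an Adagrad-style adaptive learning rate); the exact formulas discussed in the post may differ.

```python
import numpy as np

def momentum_step(theta, g, m, lr=0.01, beta=0.9):
    """Momentum: m_{t+1} = beta * m_t - lr * g_t; theta_{t+1} = theta_t + m_{t+1}."""
    m_next = beta * m - lr * g
    return theta + m_next, m_next

def adagrad_step(theta, g, accum, lr=0.01, eps=1e-8):
    """Adagrad-style adaptive learning rate: divide the step by the root of the
    accumulated squared gradients, shrinking the effective rate for parameters
    that have already seen large gradients."""
    accum = accum + g ** 2
    theta_next = theta - lr * g / (np.sqrt(accum) + eps)
    return theta_next, accum
```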

    July 25, 2020