[CS224n] (assignment3) Dependency Parsing (cs231n assignment3)
Study summary

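(1) On the difficulty of the five assignments, see the Stanford student's review of the CS224n assignments. Roughly: this year the transformer became the focus of the course, taught by the head TA John, a third-year PhD student, who originally wanted students to hand-write an encoder-decoder (this was dropped after students pushed back). Assignment 5 is the hardest: you train a vanilla model and a pretrained model and compare the results. The first three assignments are the same as in previous years; assignments 4 and 5 were newly added in 2021.
(2) Assignment 1 briefly explores the properties of word vectors; assignment 2 derives the formulas for training word vectors (the most calculus-intensive assignment of the course); assignment 3 is the only one that touches more traditional linguistic concepts and algorithms, namely dependency parsing. Assignment 4 builds a machine translation model, except that the language pair now involves Cherokee (a Native American language).
(3) Assignment 5 is this year's "headline" addition, following the major trend in NLP: in the math part we explore properties of multi-head attention; in the coding part we reimplement some pretraining data-processing code (span corruption) and implement a variant of attention.
(4) The Stanford NLP group has only 4 to 7 professors, while CMU has 30+, with dedicated courses beyond NLP itself for machine translation, question answering, search engines, and so on. Stanford runs four quarters a year (10 or 11 weeks each), so many tasks (such as information extraction and dialogue systems) cannot be covered in the course. Because of time limits and technology trends, the course also contains fewer and fewer linguistics-oriented concepts. Concepts from The Beauty of Mathematics (《數學之美》) such as information theory, hidden Markov models, TF-IDF, and word segmentation are not covered in 224n and have become less relevant in the BERT era; the technology does iterate quickly.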
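Contents
Study summary
1. Review of dependency parsing
2. CS224n assignment requirements
3. The questions
Question 1
Question 2
Question 3
Question 4: parsing multiple sentences
Question 5
5.1 The parser_model.py file
1) The `embedding layer` function:
2) The `forward` function:
5.2 The run.py file
A neural-network-based dependency parser
4. Experiment and results
Reference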
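1. Review of dependency parsing
Many problems can be recast as classification problems. A transition-based dependency parser turns the problem of predicting a tree structure into the problem of predicting a sequence of actions.
One common approach:
Encoder: computes the hidden vector representation of each word.
Decoder: scores all candidate actions for the current state.
For assignment 3, Stanford provides the parsing data files (manually annotated).
The training data file train.conll looks like this: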
    1   In           _  ADP    IN   _  5   case      _  _
    2   an           _  DET    DT   _  5   det       _  _
    3   Oct.         _  PROPN  NNP  _  5   compound  _  _
    4   19           _  NUM    CD   _  5   nummod    _  _
    5   review       _  NOUN   NN   _  45  nmod      _  _
    6   of           _  ADP    IN   _  9   case      _  _
    7   ``           _  PUNCT  ``   _  9   punct     _  _
    8   The          _  DET    DT   _  9   det       _  _
    9   Misanthrope  _  NOUN   NN   _  5   nmod      _  _
    10  ''           _  PUNCT  ''   _  9   punct     _  _
    ........
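Sentences in the file are separated by blank lines. Each line describes one token with 10 columns, whose meanings are as follows (the dev and test sets use the same format):
ID: word index, an integer starting at 1 for each new sentence; may be a range for multiword tokens.
FORM: word form or punctuation symbol.
LEMMA: lemma or stem of the word form.
UPOSTAG: universal part-of-speech tag, drawn from a revised version of the Google universal POS tags.
XPOSTAG: language-specific part-of-speech tag; underscore if not available.
FEATS: list of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
HEAD: head of the current token, which is either a value of ID or zero (0).
DEPREL: Universal Stanford dependency relation to the HEAD (root iff HEAD = 0), or a defined language-specific subtype of one.
DEPS: list of secondary dependencies (head-deprel pairs).
MISC: any other annotation.
Some examples:
(1) Part-of-speech tags in the XPOSTAG column, for example:
NNP: noun, proper, singular
VBZ: verb, present tense, 3rd person singular
(2) Dependency relations in the DEPREL column, for example:
nsubj: nominal subject
dobj: direct object
punct: punctuation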
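As a rough illustration of the 10-column layout described above, the sketch below splits one token line into named fields. The helper name `parse_conll_line` and the whitespace splitting are assumptions made for this example only; the reader in the starter code may work differently, and the sketch assumes plain integer IDs (no multiword ranges).

```python
# Hypothetical helper (not part of the starter code): split one token line
# of a CoNLL file into the 10 named columns listed above.
FIELDS = ["ID", "FORM", "LEMMA", "UPOSTAG", "XPOSTAG",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def parse_conll_line(line):
    """Return a dict mapping column name -> value for one token line."""
    columns = line.split()
    if len(columns) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
    token = dict(zip(FIELDS, columns))
    # ID and HEAD are integers here; HEAD == 0 would mean the head is ROOT.
    token["ID"] = int(token["ID"])
    token["HEAD"] = int(token["HEAD"])
    return token


token = parse_conll_line("2 an _ DET DT _ 5 det _ _")
print(token["FORM"], token["XPOSTAG"], token["HEAD"], token["DEPREL"])
```

Reading the line this way makes the dependency explicit: token 2 ("an") attaches to token 5 with relation det.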
There is also a file, en-cw.txt, which contains the word vectors.
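2. CS224n assignment requirements
Neural Transition-Based Dependency Parsing (44 points)
Assignment: a neural, transition-based dependency parser
Goal: maximize the UAS (Unlabeled Attachment Score)
First read the readme file in the assignment carefully and make sure you have all the dependencies listed in local_env.yml: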
    # 1. Activate your old environment:
    conda activate cs224n
    # 2. Install docopt
    conda install docopt
    # 3. Install pytorch, torchvision, and tqdm
    conda install pytorch torchvision -c pytorch
    conda install -c anaconda tqdm
If you would rather create a new virtual environment:
    # 1. Create an environment with dependencies specified in local_env.yml
    #    (note that this can take some time depending on your laptop):
    conda env create -f local_env.yml
    # 2. Activate the new environment:
    conda activate cs224n_a3
    # To deactivate an active environment, use
    conda deactivate
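A dependency parser analyses the structure of a sentence based on dependency grammar (see the previous lecture). Dependency parsers can be transition-based, graph-based, feature-based, and so on. This assignment asks for a transition-based dependency parser.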
At every step it maintains a partial parse, which is represented as follows:
A stack of words that are currently being processed.
A buffer of words yet to be processed.
A list of dependencies predicted by the parser
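Initially, the stack contains only ROOT, the list of dependencies is empty, and the buffer contains all the words of the sentence in order. At each step the parser applies one transition, and this repeats until the buffer is empty and only ROOT remains on the stack:
(1) SHIFT: removes the first word from the buffer and pushes it onto the stack.
(2) LEFT-ARC: merges the two dependency subtrees on top of the stack with a left arc: the second item from the top becomes a dependent of the top item and is removed from the stack.
(3) RIGHT-ARC: merges the two dependency subtrees on top of the stack with a right arc: the top item becomes a dependent of the second item and is removed from the stack.
That is, at every step the parser acts as a classifier: it picks whichever of the three actions has the highest predicted probability and applies it.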
Note: in the starter code parser_transitions.py, the unidirectional_predict function contains return [("RA" if pp.stack[1] is "right" else "LA") if len(pp.buffer) == 0 else "S". Change is to ==; otherwise Python emits the following warning:
    SyntaxWarning: "is" with a literal. Did you mean "=="?
      return [("RA" if pp.stack[1] is "right" else "LA") if len(pp.buffer) == 0 else "S"
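To see why == is the right operator here, the small sketch below contrasts value comparison with identity comparison. The function name `choose_arc` is hypothetical, and the claim that the runtime-built string is a distinct object reflects typical CPython behaviour rather than a language guarantee.

```python
# `==` compares values; `is` compares object identity.
expected = "right"
# Build an equal string at runtime so that it is usually a separate object
# from `expected` (typical CPython behaviour, not a language guarantee).
word = "".join(["ri", "ght"])

print(word == expected)   # value comparison: the check the parser needs
print(word is expected)   # identity comparison: may be False for equal strings


def choose_arc(first_word):
    """Mirror of the corrected condition: RA if the word equals "right", else LA."""
    return "RA" if first_word == "right" else "LA"


print(choose_arc(word), choose_arc("left"))
```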
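3. The questions
Question 1
(1) (4 points) Sentence: I parsed this sentence correctly.
Question: at each step, which transition was applied and which new dependency (if any) was added? The first three steps are given below:
answer:
The resulting dependency tree is:
Question 2
(2) (2 points) A sentence containing n words is parsed in how many steps? Explain briefly in 1-2 sentences.
Answer: 2n steps. Every word in the sentence needs exactly two transitions before it is removed from the stack: one SHIFT (push onto the stack) and one arc transition (LEFT-ARC or RIGHT-ARC, which removes it as a dependent). Each parsing step performs only one of these transitions for one word.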
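The 2n count can be checked with a small stand-alone simulator. This is only a sketch: `run_transitions` is a hypothetical helper written for this example, and the transition sequence below is one sequence assumed to produce a sensible tree for the sentence from Question 1, not the official answer key.

```python
# Minimal stand-alone transition simulator (illustration only).
def run_transitions(sentence, transitions):
    """Apply S / LA / RA transitions; return (stack, buffer, dependencies)."""
    stack, buffer, dependencies = ["ROOT"], list(sentence), []
    for t in transitions:
        if t == "S":        # move the first buffer word onto the stack
            stack.append(buffer.pop(0))
        elif t == "LA":     # top of stack becomes head of the item below it
            dependencies.append((stack[-1], stack.pop(-2)))
        elif t == "RA":     # item below the top becomes head of the top
            dependencies.append((stack[-2], stack.pop(-1)))
        else:
            raise ValueError(f"unknown transition: {t}")
    return stack, buffer, dependencies


sentence = ["I", "parsed", "this", "sentence", "correctly"]
# Assumed sequence: each word is shifted once and removed by one arc.
transitions = ["S", "S", "LA", "S", "S", "LA", "RA", "S", "RA", "RA"]

stack, buffer, dependencies = run_transitions(sentence, transitions)
print(len(transitions), "steps for", len(sentence), "words")
for head, dependent in dependencies:
    print(f"{head} -> {dependent}")
```

Each of the n words accounts for exactly one SHIFT and one arc, which is where the 2n total comes from.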
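Question 3
(3) (6 points) Implement the constructor __init__ and the parse_step function of the PartialParse class in parser_transitions.py, i.e. the transition mechanics of the parser. You can test with python parser_transitions.py part_c.
1) First, the constructor __init__ of the PartialParse class in parser_transitions.py:
Here dependencies is a list of tuples, where each tuple represents one dependency in the form (head, dependent).
Note:
(1) The root is represented by the ROOT token.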
(2)If you need to use the sentence object to initialize anything, make sure to not directly reference the sentence object. That is, remember to NOT modify the sentence object.
    class PartialParse(object):
        def __init__(self, sentence):
            """Initializes this partial parse.

            @param sentence (list of str): The sentence to be parsed as a list of words.
                                            Your code should not modify the sentence.
            """
            # The sentence being parsed is kept for bookkeeping purposes. Do NOT alter it in your code.
            self.sentence = sentence

            ### YOUR CODE HERE (3 Lines)
            ### Your code should initialize the following fields:
            ###     self.stack: The current stack represented as a list with the top of the stack as the
            ###                 last element of the list.
            ###     self.buffer: The current buffer represented as a list with the first item on the
            ###                  buffer as the first item of the list
            ###     self.dependencies: The list of dependencies produced so far. Represented as a list of
            ###             tuples where each tuple is of the form (head, dependent).
            ###             Order for this list doesn't matter.
            ###
            ### Note: The root token should be represented with the string "ROOT"
            ### Note: If you need to use the sentence object to initialize anything, make sure to not directly
            ###       reference the sentence object. That is, remember to NOT modify the sentence object.
            self.stack = ['ROOT']
            self.buffer = sentence.copy()  # shallow copy, so the sentence itself is never modified
            self.dependencies = []
            ### END YOUR CODE
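2) Next, still in parser_transitions.py, the parse_step function of the PartialParse class:
Here stack.pop(-2) removes the second-to-last element of stack (which admittedly bends the definition of a stack a little). For example:
    stack = [1]
    stack.append(2)
    stack.append(3)
    print(stack)
    # the last element of the stack list is 3
    print("Last element of the stack:", stack[-1])
    # pop the second-to-last element; pop returns the removed value
    # here the second-to-last element is 2
    print(stack.pop(-2))
    # print the stack contents again
    print(stack)
The output is:
    [1, 2, 3]
    Last element of the stack: 3
    2
    [1, 3]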
Back to the solution for this question:
    def parse_step(self, transition):
        """Performs a single parse step by applying the given transition to this partial parse

        @param transition (str): A string that equals "S", "LA", or "RA" representing the shift,
                                 left-arc, and right-arc transitions. You can assume the provided
                                 transition is a legal transition.
        """
        ### YOUR CODE HERE (~7-12 Lines)
        ### TODO:
        ###     Implement a single parsing step, i.e. the logic for the following as
        ###     described in the pdf handout:
        ###         1. Shift
        ###         2. Left Arc
        ###         3. Right Arc
        if transition == 'S':
            # self.stack.append(self.buffer.pop(0))
            # is equivalent to the two statements below
            self.stack.append(self.buffer[0])
            self.buffer.pop(0)
        elif transition == 'LA':
            # self.dependencies.append((self.stack[-1], self.stack.pop(-2)))
            # is equivalent to the two statements below: the first word (head) points to the second (dependent)
            self.dependencies.append((self.stack[-1], self.stack[-2]))
            self.stack.pop(-2)
        elif transition == 'RA':
            # self.dependencies.append((self.stack[-2], self.stack.pop(-1)))
            self.dependencies.append((self.stack[-2], self.stack[-1]))
            self.stack.pop(-1)
        ### END YOUR CODE
3) We can test the PartialParse class by parsing a sentence:
    # the sentence to parse:
    sentence = ["parse", "this", "sentence"]
    # the list of transitions to apply:
    dependencies = PartialParse(sentence).parse(["S", "S", "S", "LA", "RA", "RA"])
    # sort the resulting dependencies:
    dependencies = tuple(sorted(dependencies))
    # the dependencies we expect from a successful parse:
    expected = (('ROOT', 'parse'), ('parse', 'sentence'), ('sentence', 'this'))
In this example, the sorted dependencies produced by PartialParse are identical to the expected dependencies.
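Question 4: parsing multiple sentences
The DummyModel class:
Each sentence is first placed in the buffer. DummyModel's predict method generates the transitions: while the buffer still has elements, it shifts them one by one onto the stack; once the buffer is empty, every word is on the stack and the arcs are built. If the first word that was pushed onto the stack (stack[1], just above ROOT) is "right", DummyModel returns RA; otherwise it returns LA (the "left" case).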
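The rule described above can be sketched as a plain function. The name `dummy_predict` and the `SimpleNamespace` stand-ins are hypothetical and exist only for this example; the real DummyModel in the starter code wraps the same idea in a class.

```python
from types import SimpleNamespace


def dummy_predict(partial_parses):
    """Sketch of the rule described above, for objects with .stack and .buffer."""
    transitions = []
    for pp in partial_parses:
        if len(pp.buffer) > 0:
            transitions.append("S")       # words remain: keep shifting
        elif pp.stack[1] == "right":
            transitions.append("RA")      # first shifted word is "right"
        else:
            transitions.append("LA")      # the "left" case
    return transitions


# Stand-in objects (hypothetical) with the two attributes the rule reads.
still_shifting = SimpleNamespace(stack=["ROOT", "right"], buffer=["arcs", "only"])
all_right = SimpleNamespace(stack=["ROOT", "right", "arcs", "only"], buffer=[])
all_left = SimpleNamespace(stack=["ROOT", "left", "arcs", "only"], buffer=[])

predictions = dummy_predict([still_shifting, all_right, all_left])
print(predictions)
```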
You can first test the minibatch_parse function:
    # the list of sentences to parse:
    sentences = [["right", "arcs", "only"],
                 ["right", "arcs", "only", "again"],
                 ["left", "arcs", "only"],
                 ["left", "arcs", "only", "again"]]
    # parse in minibatches:
    # DummyModel() supplies the transitions to apply; 2 is the batch_size
    deps = minibatch_parse(sentences, DummyModel(), 2)
    # expected dependencies:
    # deps[0]: (('ROOT', 'right'), ('arcs', 'only'), ('right', 'arcs'))
    # deps[1]: (('ROOT', 'right'), ('arcs', 'only'), ('only', 'again'), ('right', 'arcs'))
    # deps[2]: (('only', 'ROOT'), ('only', 'arcs'), ('only', 'left'))
    # deps[3]: (('again', 'ROOT'), ('again', 'arcs'), ('again', 'left'), ('again', 'only'))
The deps obtained after parsing the list of sentences match the expected dependencies given in the comments above.
(4) (8 points) We could have the classifier predict one transition at a time for a single sentence, but it is more efficient to predict the next transition for many partial parses at once, i.e. to parse in minibatches with the following algorithm:
Implement the minibatch_parse function in parser_transitions.py; you can test it with python parser_transitions.py part_d.
    def minibatch_parse(sentences, model, batch_size):
        """Parses a list of sentences in minibatches using a model.

        @param sentences (list of list of str): A list of sentences to be parsed
                                                (each sentence is a list of words and each word is of type string)
        @param model (ParserModel): The model that makes parsing decisions. It is assumed to have a function
                                    model.predict(partial_parses) that takes in a list of PartialParses as input and
                                    returns a list of transitions predicted for each parse. That is, after calling
                                        transitions = model.predict(partial_parses)
                                    transitions[i] will be the next transition to apply to partial_parses[i].
        @param batch_size (int): The number of PartialParses to include in each minibatch

        @return dependencies (list of dependency lists): each element is the list of dependencies
                                                         for the corresponding sentence (same order)
        """
        dependencies = []

        ### YOUR CODE HERE (~8-10 Lines)
        ### TODO:
        ###     Implement the minibatch parse algorithm. Note that the pseudocode for this algorithm is given in the pdf handout.
        ###
        ###     Note: A shallow copy (as denoted in the PDF) can be made with the "=" sign in python, e.g.
        ###                 unfinished_parses = partial_parses[:].
        ###             Here `unfinished_parses` is a shallow copy of `partial_parses`.
        ###             In Python, a shallow copied list like `unfinished_parses` does not contain new instances
        ###             of the object stored in `partial_parses`. Rather both lists refer to the same objects.
        ###             In our case, `partial_parses` contains a list of partial parses. `unfinished_parses`
        ###             contains references to the same objects. Thus, you should NOT use the `del` operator
        ###             to remove objects from the `unfinished_parses` list. This will free the underlying memory that
        ###             is being accessed by `partial_parses` and may cause your code to crash.
        partial_parses = [PartialParse(sentence) for sentence in sentences]
        unfinished_parses = partial_parses[:]  # shallow copy

        while len(unfinished_parses) > 0:
            # take the first batch_size parses from the unfinished parses
            minibatch_partial_parses = unfinished_parses[:batch_size]
            # the model predicts the next transition for each partial parse in the minibatch
            minibatch_transitions = model.predict(minibatch_partial_parses)
            # apply the predicted transition to each partial parse in the minibatch
            for transition, partial_parse in zip(minibatch_transitions, minibatch_partial_parses):
                partial_parse.parse_step(transition)
            # drop completed parses (empty buffer and a stack of size 1) from the unfinished list
            unfinished_parses = [
                partial_parse for partial_parse in unfinished_parses
                if not (len(partial_parse.buffer) == 0 and len(partial_parse.stack) == 1)
            ]

        for partial_parse in partial_parses:
            dependencies.append(partial_parse.dependencies)
        ### END YOUR CODE

        return dependencies
Question 5
(5) (12 points) Model training.
The model extracts a feature vector representing the current state. We use the feature set proposed in the original neural dependency parsing paper (A Fast and Accurate Dependency Parser using Neural Networks).
The functions that extract these features are already implemented in utils/parser_utils.py. The feature vector consists of a list of tokens (e.g. the last word on the stack, the first word in the buffer, and so on). They can be represented as a list of integers $\mathbf{w}=\left[w_{1}, w_{2}, \ldots, w_{m}\right]$, where $m$ is the number of features, each $0 \leq w_{i}<|V|$ is the index of a token in the vocabulary, and $|V|$ is the vocabulary size. The network then looks up the embedding of each word and concatenates them into a single input vector: $\mathbf{x}=\left[\mathbf{E}_{w_{1}}, \ldots, \mathbf{E}_{w_{m}}\right] \in \mathbb{R}^{d m}$
$\mathbf{E} \in \mathbb{R}^{|V| \times d}$ is the embedding matrix, where each row $\mathbf{E}_{w}$ is the embedding of word $w$.
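The lookup-and-concatenate step can be sketched in plain Python to make the shapes explicit. The matrix values and the helper name `lookup_and_concat` are made up for this illustration; the assignment itself performs this step on PyTorch tensors.

```python
# Toy embedding matrix E with |V| = 4 rows and d = 2 columns (made-up numbers).
E = [
    [0.1, 0.2],   # embedding of token 0
    [0.3, 0.4],   # embedding of token 1
    [0.5, 0.6],   # embedding of token 2
    [0.7, 0.8],   # embedding of token 3
]


def lookup_and_concat(E, w):
    """Concatenate the rows E[w_1], ..., E[w_m] into one flat list of length d * m."""
    x = []
    for index in w:
        x.extend(E[index])
    return x


w = [2, 0, 3]                  # m = 3 feature indices, each in [0, |V|)
x = lookup_and_concat(E, w)
print(x, len(x))               # the length is d * m
```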
The network:
$$
\begin{aligned}
\mathbf{h} &= \operatorname{ReLU}\left(\mathbf{x}\mathbf{W}+\mathbf{b}_{1}\right) \\
\mathbf{l} &= \mathbf{h}\mathbf{U}+\mathbf{b}_{2} \\
\hat{\mathbf{y}} &= \operatorname{softmax}(\mathbf{l})
\end{aligned}
$$
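To make the three equations concrete, here is a minimal plain-Python sketch of the forward computation. All weights and sizes are made-up toy values, and the helper names (`matvec`, `relu`, `softmax`, `forward`) are chosen for this example; the assignment implements the same computation with PyTorch tensors.

```python
import math


def matvec(x, W):
    """Row vector x (length n) times matrix W (n x k) -> list of length k."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    shift = max(v)                      # subtract the max for numerical stability
    exps = [math.exp(a - shift) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, W, b1, U, b2):
    h = relu([a + b for a, b in zip(matvec(x, W), b1)])   # h = ReLU(xW + b1)
    l = [a + b for a, b in zip(matvec(h, U), b2)]         # l = hU + b2
    return softmax(l)                                     # y_hat = softmax(l)


# Made-up toy sizes: input dim 2, hidden dim 2, 3 classes (one per transition).
x = [1.0, -2.0]
W = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.5]
U = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
b2 = [0.0, 0.0, 0.0]

y_hat = forward(x, W, b1, U, b2)
print(y_hat)
```

The output is a probability distribution over the three classes; the predicted transition is the index with the largest value.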
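Requirements: parser_model.py contains the scaffold for implementing this network. Complete the __init__, embedding_lookup, and forward functions, then complete the train_for_epoch and train functions in run.py. Finally, run python run.py to train the model and evaluate its predictions on the test set (from the Penn Treebank, annotated with Universal Dependencies).
Note: this assignment asks you to implement the linear layer and the embedding layer yourself, so do not use torch.nn.Linear or torch.nn.Embedding directly.
5.1 The parser_model.py file
Explanation: for torch.index_select, see the official documentation. Its signature is index_select(input, dim, index):
dim=0 indexes rows; dim=1 indexes columns.
index is a 1-D tensor of indices; for example, with dim=0, an index of tensor([0, 2]) selects row 0 and row 2.
All word embeddings are stacked into one embedding matrix, and embedding_lookup handles the mapping from indices to embedding vectors. It resembles the one_hot_lookup used earlier when writing an RNN, where the string hello is represented as the input x_data = [1, 0, 2, 2, 3].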
    # prepare the data
    idx2char = ['e', 'h', 'l', 'o']
    # the input is the string "hello"
    x_data = [1, 0, 2, 2, 3]
    y_data = [3, 1, 2, 3, 2]
    one_hot_lookup = [[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]]
    def embedding_lookup(self, w):
        """ Utilize `w` to select embeddings from embedding matrix `self.embeddings`
            @param w (Tensor): input tensor of word indices (batch_size, n_features)

            @return x (Tensor): tensor of embeddings for words represented in w
                                (batch_size, n_features * embed_size)
        """
        ### YOUR CODE HERE (~1-4 Lines)
        ### TODO:
        ###     1) For each index `i` in `w`, select `i`th vector from self.embeddings
        ###     2) Reshape the tensor using `view` function if necessary
        ###
        ### Note: All embedding vectors are stacked and stored as a matrix. The model receives
        ###       a list of indices representing a sequence of words, then it calls this lookup
        ###       function to map indices to sequence of embeddings.
        ###
        ###       This problem aims to test your understanding of embedding lookup,
        ###       so DO NOT use any high level API like nn.Embedding
        ###       (we are asking you to implement that!). Pay attention to tensor shapes
        ###       and reshape if necessary. Make sure you know each tensor's shape before you run the code!
        ###
        ### Pytorch has some useful APIs for you, and you can use either one
        ### in this problem (except nn.Embedding). These docs might be helpful:
        ###     Index select: https://pytorch.org/docs/stable/torch.html#torch.index_select
        ###     Gather: https://pytorch.org/docs/stable/torch.html#torch.gather
        ###     View: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view
        ###     Flatten: https://pytorch.org/docs/stable/generated/torch.flatten.html
        x = torch.index_select(self.embeddings, 0, w.flatten()).reshape(w.shape[0], -1)
        ### END YOUR CODE
        return x
Note that PyTorch's cross-entropy loss (nn.CrossEntropyLoss) already includes the softmax, so the forward function must not apply softmax again.
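The reason the model can return raw logits is that a cross-entropy loss on logits amounts to a log-softmax followed by a negative log-likelihood. The sketch below shows that computation for a single example in plain Python; the function names and the logit values are made up for illustration and this is not PyTorch's implementation.

```python
import math


def log_softmax(logits):
    shift = max(logits)     # subtract the max for numerical stability
    log_total = shift + math.log(sum(math.exp(a - shift) for a in logits))
    return [a - log_total for a in logits]


def cross_entropy_from_logits(logits, target):
    """Loss for one example: -log softmax(logits)[target]."""
    return -log_softmax(logits)[target]


logits = [2.0, 0.5, -1.0]    # raw scores for one example (made-up values)
target = 0                   # index of the correct class

loss = cross_entropy_from_logits(logits, target)
print(loss)
```

Applying softmax inside forward and then passing the result to such a loss would normalize twice and distort the loss value.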
    def forward(self, w):
        """ Run the model forward.

            Note that we will not apply the softmax function here because it is included in the loss function nn.CrossEntropyLoss

            PyTorch Notes:
                - Every nn.Module object (PyTorch model) has a `forward` function.
                - When you apply your nn.Module to an input tensor `w` this function is applied to the tensor.
                    For example, if you created an instance of your ParserModel and applied it to some `w` as follows,
                    the `forward` function would be called on `w` and the result would be stored in the `output` variable:
                        model = ParserModel()
                        output = model(w) # this calls the forward function
                - For more details checkout: https://pytorch.org/docs/stable/nn.html#torch.nn.Module.forward

        @param w (Tensor): input tensor of tokens (batch_size, n_features)

        @return logits (Tensor): tensor of predictions (output after applying the layers of the network)
                                 without applying softmax (batch_size, n_classes)
        """
        ### YOUR CODE HERE (~3-5 lines)
        ### TODO:
        ###     Complete the forward computation as described in write-up. In addition, include a dropout layer
        ###     as declared in `__init__` after ReLU function.
        ###
        ### Please see the following docs for support:
        ###     Matrix product: https://pytorch.org/docs/stable/torch.html#torch.matmul
        ###     ReLU: https://pytorch.org/docs/stable/nn.html?highlight=relu#torch.nn.functional.relu
        x = self.embedding_lookup(w)  # (bs, feat * emb)
        h = F.relu(torch.matmul(x, self.embed_to_hidden_weight) + self.embed_to_hidden_bias)
        if self.training:
            h = self.dropout(h)
        logits = torch.matmul(h, self.hidden_to_logits_weight) + self.hidden_to_logits_bias
        ### END YOUR CODE
        return logits
5.2 The run.py file
The train_for_epoch and train functions.
The train function:
    optimizer = optim.Adam(parser.model.parameters())
    loss_func = nn.CrossEntropyLoss()
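In the train function above, each epoch of the for loop calls train_for_epoch once, which returns the UAS score on the dev set. Note that the dropout layer is active only during training; it must be disabled at test time.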
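The train/test difference can be sketched with a plain-Python version of inverted dropout. This is an illustrative assumption about how such a layer behaves (zero with probability p, rescale survivors by 1 / (1 - p), do nothing in evaluation mode); the function name `dropout` and the `rng` parameter are invented for this example and are not the starter code.

```python
import random


def dropout(values, p, training, rng):
    """Inverted dropout: zero each value with probability p while training and
    scale the survivors by 1 / (1 - p); return the input unchanged otherwise."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)        # fixed seed so the sketch is repeatable
h = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(h, p=0.5, training=True, rng=rng)
eval_out = dropout(h, p=0.5, training=False, rng=rng)
print(train_out)   # some entries zeroed, the rest doubled
print(eval_out)    # identical to h
```

In the assignment code this switch is made by parser.model.train() before the training loop and parser.model.eval() before evaluation.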
The train_for_epoch function:
That is: forward pass, compute the loss, backpropagate, and take an optimizer step. Just follow the TODO hints:
    def train_for_epoch(parser, train_data, dev_data, optimizer, loss_func, batch_size):
        """ Train the neural dependency parser for single epoch.

        @return dev_UAS (float): Unlabeled Attachment Score (UAS) for dev data
        """
        # Places model in "train" mode, i.e. apply dropout layer
        parser.model.train()
        n_minibatches = math.ceil(len(train_data) / batch_size)
        loss_meter = AverageMeter()

        with tqdm(total=(n_minibatches)) as prog:
            for i, (train_x, train_y) in enumerate(minibatches(train_data, batch_size)):
                # clear the gradients
                optimizer.zero_grad()
                # store loss for this batch here
                loss = 0.
                train_x = torch.from_numpy(train_x).long()
                train_y = torch.from_numpy(train_y.nonzero()[1]).long()

                ### YOUR CODE HERE (~4-10 lines)
                ### TODO:
                ###      1) Run train_x forward through model to produce `logits`
                ###      2) Use the `loss_func` parameter to apply the PyTorch CrossEntropyLoss function.
                ###         This will take `logits` and `train_y` as inputs. It will output the CrossEntropyLoss
                ###         between softmax(`logits`) and `train_y`. Remember that softmax(`logits`)
                ###         are the predictions (y^ from the PDF).
                ###      3) Backprop losses
                ###      4) Take step with the optimizer
                ### Please see the following docs for support:
                ###     Optimizer Step: https://pytorch.org/docs/stable/optim.html#optimizer-step
                logits = parser.model(train_x)  # calling the module runs its forward function
                loss = loss_func(logits, target=train_y)
                loss.backward()
                optimizer.step()
                ### END YOUR CODE
                prog.update(1)
                loss_meter.update(loss.item())

        print("Average Train Loss: {}".format(loss_meter.avg))

        print("Evaluating on dev set")
        parser.model.eval()  # Places model in "eval" mode, i.e. don't apply dropout layer
        dev_UAS, _ = parser.parse(dev_data)
        print("- dev UAS: {:.2f}".format(dev_UAS * 100.0))
        return dev_UAS
Final results of running the code:
dev UAS: 88.60 (dev set)
test UAS: 89.08
Here UAS stands for Unlabeled Attachment Score.
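As a reminder of what the metric measures, the sketch below computes UAS as the fraction of tokens whose predicted head matches the gold head, ignoring relation labels. The function name and the head lists are made up for illustration; the scoring in the starter code may differ in details such as how tokens are aggregated across sentences.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty list of tokens")
    correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
    return correct / len(gold_heads)


# Made-up heads for a 5-token sentence (0 stands for ROOT).
gold = [2, 0, 4, 2, 2]
pred = [2, 0, 2, 2, 2]

uas = unlabeled_attachment_score(gold, pred)
print(f"UAS: {uas * 100:.2f}")
```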
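A neural-network-based dependency parser
The neural network model uses a fully connected network plus softmax classification (dependency parsing features ==> word embeddings ==> ReLU(xW + b1) ==> Dropout ==> pred(h_drop U + b2) ==> softmax_cross_entropy_with_logits)
1. The load_and_preprocess_data function returns the parser, the embedding matrix, the training set (after feature extraction), the dev set, and the test set
    parser, embeddings, train_examples, dev_set, test_set = load_and_preprocess_data(debug)
2. Create a model instance model = ParserModel(config, embeddings), where ParserModel inherits from Model
The model is initialized with the config and embeddings arguments and calls the build method of the parent class Model; the subclass ParserModel overrides add_placeholders(), add_prediction_op(), add_loss_op(self.pred), and add_training_op(self.loss).
3. Train the model
    model.fit(session, saver, parser, train_examples, dev_set)
4. Parse the test set:
    UAS, dependencies = parser.parse(test_set)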
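4. Experiment and results
The progress bars here come from the tqdm package. The bracketed part of each bar reads: elapsed time < estimated remaining time, then the iteration rate (iterations per second).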
    SyntaxWarning: "is" with a literal. Did you mean "=="?
      return [("RA" if pp.stack[1] is "right" else "LA") if len(pp.buffer) == 0 else "S"
    ================================================================================
    INITIALIZING
    ================================================================================
    Loading data...
    took 2.58 seconds
    Building parser...
    took 1.66 seconds
    Loading pretrained embeddings...
    took 8.80 seconds
    Vectorizing data...
    took 1.95 seconds
    Preprocessing training data...
    took 65.94 seconds
    took 0.18 seconds

    ================================================================================
    TRAINING
    ================================================================================
    Epoch 1 out of 10
    100%|██████████| 1848/1848 [02:07<00:00, 14.54it/s]
    Average Train Loss: 0.1781648173605725
    Evaluating on dev set
    1445850it [00:00, 28400918.10it/s]
    - dev UAS: 84.76
    New best dev UAS! Saving model.

    Epoch 2 out of 10
    100%|██████████| 1848/1848 [02:21<00:00, 13.09it/s]
    Average Train Loss: 0.11059159756480873
    Evaluating on dev set
    1445850it [00:00, 21319509.36it/s]
    - dev UAS: 86.57
    New best dev UAS! Saving model.

    Epoch 3 out of 10
    100%|██████████| 1848/1848 [02:29<00:00, 12.35it/s]
    Average Train Loss: 0.09602350440255297
    Evaluating on dev set
    1445850it [00:00, 21010828.57it/s]
    - dev UAS: 87.23
    New best dev UAS! Saving model.

    Epoch 4 out of 10
    100%|██████████| 1848/1848 [02:24<00:00, 12.78it/s]
    Average Train Loss: 0.08655059076765012
    Evaluating on dev set
    1445850it [00:00, 18410020.64it/s]
    - dev UAS: 88.03
    New best dev UAS! Saving model.

    Epoch 5 out of 10
    100%|██████████| 1848/1848 [02:29<00:00, 12.32it/s]
    Average Train Loss: 0.07943204295664251
    Evaluating on dev set
    1445850it [00:00, 24345468.35it/s]
    - dev UAS: 88.25
    New best dev UAS! Saving model.

    Epoch 6 out of 10
    100%|██████████| 1848/1848 [02:28<00:00, 12.42it/s]
    Average Train Loss: 0.07376304407124266
    Evaluating on dev set
    1445850it [00:00, 23765859.77it/s]
    - dev UAS: 88.06

    Epoch 7 out of 10
    100%|██████████| 1848/1848 [02:11<00:00, 14.08it/s]
    Average Train Loss: 0.06907538355638583
    Evaluating on dev set
    1445850it [00:00, 16358657.93it/s]
    - dev UAS: 88.15

    Epoch 8 out of 10
    100%|██████████| 1848/1848 [02:12<00:00, 13.92it/s]
    Average Train Loss: 0.06480039135468277
    Evaluating on dev set
    1445850it [00:00, 20698658.75it/s]
    - dev UAS: 88.45
    New best dev UAS! Saving model.

    Epoch 9 out of 10
    100%|██████████| 1848/1848 [02:31<00:00, 12.22it/s]
    Average Train Loss: 0.061141976250085606
    Evaluating on dev set
    1445850it [00:00, 22635715.12it/s]
    - dev UAS: 88.41

    Epoch 10 out of 10
    100%|██████████| 1848/1848 [02:18<00:00, 13.36it/s]
    Average Train Loss: 0.05778654704870277
    Evaluating on dev set
    1445850it [00:00, 30163164.76it/s]
    - dev UAS: 88.60
    New best dev UAS! Saving model.

    ================================================================================
    TESTING
    ================================================================================
    Restoring the best model weights found on the dev set
    Final evaluation on test set
    2919736it [00:00, 31476695.98it/s]
    - test UAS: 89.08
    Done!
Reference
(1) A detailed explanation of the transition-based dependency parser
(2) Stanford CS224N course assignments
(3) CS224N Lecture 5: Dependency Parsing
(4) Duan Zhihua (段智華)'s assignment3 notes