Replication of some chapter 3 results

Author

Kim Young Jin

Published

June 28, 2024

from micrograd import Neuron, Value
import matplotlib.pyplot as plt
import mnist_loader  # noqa
import torch
import random
import torch.nn as nn  # noqa
# We are inputting 1 and want an output of 0

for w, b in [(0.6, 0.9), (2.0, 2.0)]:
    neuron = Neuron(1)
    neuron.w = [Value(w)]
    neuron.b = Value(b)

    losses = []
    for i in range(300):
        pred = neuron([1.0])
        loss = (pred - Value(0)) ** 2
        losses.append(loss.data)
        for p in neuron.parameters():
            p.grad = 0
        loss.backward()
        for p in neuron.parameters():
            p.data = p.data - 0.15 * p.grad

    # Add label
    plt.plot(losses, label=f"w={w}, b={b}")

# Display legend
plt.legend()

We can see that the neuron that starts out badly wrong (w=2.0, b=2.0, whose initial output is about 0.98 against a target of 0) learns very slowly at first, precisely when the predicted and actual outputs are furthest apart. That is the opposite of what we want: the model should learn fastest when its predictions are most wrong. Switching from the quadratic cost to the cross-entropy loss remedies this.
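The slow start comes from the sigmoid itself. For a single sigmoid neuron with input x, output a = σ(wx + b) and target y, the gradient of the quadratic cost carries a factor of σ'(z), which is nearly zero when the neuron is saturated (as it is for w=2.0, b=2.0). The cross-entropy cost cancels that factor. A quick derivation (standard calculus, not taken from the notebook; the code above omits the factor 1/2, which only rescales the gradient by 2):

$$
C_{\text{quad}} = \tfrac{1}{2}(a - y)^2
\;\Rightarrow\;
\frac{\partial C}{\partial w} = (a - y)\,\sigma'(z)\,x,
\qquad
\frac{\partial C}{\partial b} = (a - y)\,\sigma'(z)
$$

$$
C_{\text{xent}} = -\big[\,y\ln a + (1 - y)\ln(1 - a)\,\big]
\;\Rightarrow\;
\frac{\partial C}{\partial w} = (a - y)\,x,
\qquad
\frac{\partial C}{\partial b} = (a - y)
$$

With cross-entropy the gradient is proportional to the error a − y, so the further the prediction is from the target, the faster the neuron learns.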

# We are inputting 1 and want an output of 0
for w, b in [(0.6, 0.9), (2.0, 2.0)]:
    neuron = Neuron(1)
    neuron.w = [Value(w)]
    neuron.b = Value(b)

    losses = []
    for i in range(300):
        pred = neuron([1.0])
        # Log loss
        loss = -Value(0) * pred.log() - (1 - Value(0)) * (1 - pred).log()
        losses.append(loss.data)
        for p in neuron.parameters():
            p.grad = 0
        loss.backward()
        for p in neuron.parameters():
            p.data = p.data - 0.15 * p.grad

    # Add label
    plt.plot(losses, label=f"w={w}, b={b}")

# Display legend
plt.legend()

We can see that the problem faced earlier has been fixed: with the cross-entropy loss the initial plateau disappears, and the loss falls quickly right from the start, when the prediction is furthest from the target.
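For comparison, here is a minimal sketch of the same single-neuron experiment written directly in PyTorch. It is not part of the original notebook; it assumes a sigmoid activation, the same learning rate of 0.15, and the same log loss written out by hand.

import torch
import matplotlib.pyplot as plt

x, target = torch.tensor(1.0), torch.tensor(0.0)
for w0, b0 in [(0.6, 0.9), (2.0, 2.0)]:
    w = torch.tensor(w0, requires_grad=True)
    b = torch.tensor(b0, requires_grad=True)
    losses = []
    for _ in range(300):
        pred = torch.sigmoid(w * x + b)
        # Same log loss as above, written out by hand
        loss = -target * torch.log(pred) - (1 - target) * torch.log(1 - pred)
        losses.append(loss.item())
        loss.backward()
        with torch.no_grad():
            w -= 0.15 * w.grad
            b -= 0.15 * b.grad
            w.grad.zero_()
            b.grad.zero_()
    plt.plot(losses, label=f"w={w0}, b={b0}")
plt.legend()

The resulting curves should match the micrograd version, since autograd computes the same gradients for the same loss.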

Regularization

Next, we run some experiments to test the effects of regularization. We first reproduce the chapter's overfitting baseline with network2, training on a small subset of MNIST, and move to PyTorch afterwards.

import network2
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
# Convert zip iterators to lists
training_data = list(training_data)
validation_data = list(validation_data)
test_data = list(test_data)
net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
net.large_weight_initializer()
net.SGD(
    training_data[:1000],  # train on only 1,000 examples so overfitting is easy to see
    400,  # epochs
    10,  # mini-batch size
    0.5,  # learning rate (eta)
    evaluation_data=test_data,
    monitor_evaluation_accuracy=True,
    monitor_training_cost=True,
)
Epoch 0 training complete
Cost on training data: 1.9157265803666599
Accuracy on evaluation data: 5336 / 10000
Epoch 1 training complete
Cost on training data: 1.4233085154513578
Accuracy on evaluation data: 6481 / 10000
Epoch 2 training complete
Cost on training data: 1.140610742579291
Accuracy on evaluation data: 7009 / 10000
Epoch 3 training complete
Cost on training data: 0.9673757700312096
Accuracy on evaluation data: 7278 / 10000
Epoch 4 training complete
Cost on training data: 0.8219014211106735
Accuracy on evaluation data: 7518 / 10000
Epoch 5 training complete
Cost on training data: 0.7009795108242491
Accuracy on evaluation data: 7621 / 10000
Epoch 6 training complete
Cost on training data: 0.6210731479476688
Accuracy on evaluation data: 7749 / 10000
Epoch 7 training complete
Cost on training data: 0.5649499666413196
Accuracy on evaluation data: 7741 / 10000
Epoch 8 training complete
Cost on training data: 0.4967537573921582
Accuracy on evaluation data: 7913 / 10000
Epoch 9 training complete
Cost on training data: 0.4456219552812798
Accuracy on evaluation data: 7890 / 10000
Epoch 10 training complete
Cost on training data: 0.39901535860495846
Accuracy on evaluation data: 7977 / 10000
Epoch 11 training complete
Cost on training data: 0.36146464324057803
Accuracy on evaluation data: 7938 / 10000
Epoch 12 training complete
Cost on training data: 0.3545073579313579
Accuracy on evaluation data: 7913 / 10000
Epoch 13 training complete
Cost on training data: 0.3087184094679663
Accuracy on evaluation data: 8051 / 10000
Epoch 14 training complete
Cost on training data: 0.2797147873623128
Accuracy on evaluation data: 8115 / 10000
Epoch 15 training complete
Cost on training data: 0.2525545832423591
Accuracy on evaluation data: 8133 / 10000
Epoch 16 training complete
Cost on training data: 0.2380955942826455
Accuracy on evaluation data: 8096 / 10000
Epoch 17 training complete
Cost on training data: 0.22646415893206082
Accuracy on evaluation data: 8108 / 10000
Epoch 18 training complete
Cost on training data: 0.20683416913226718
Accuracy on evaluation data: 8138 / 10000
Epoch 19 training complete
Cost on training data: 0.19279183350114984
Accuracy on evaluation data: 8127 / 10000
Epoch 20 training complete
Cost on training data: 0.1866037912492253
Accuracy on evaluation data: 8145 / 10000
Epoch 21 training complete
Cost on training data: 0.17397172190964394
Accuracy on evaluation data: 8126 / 10000
Epoch 22 training complete
Cost on training data: 0.16138418421655057
Accuracy on evaluation data: 8127 / 10000
Epoch 23 training complete
Cost on training data: 0.15258576699007645
Accuracy on evaluation data: 8159 / 10000
Epoch 24 training complete
Cost on training data: 0.14559589206163331
Accuracy on evaluation data: 8160 / 10000
Epoch 25 training complete
Cost on training data: 0.13733030674883276
Accuracy on evaluation data: 8176 / 10000
Epoch 26 training complete
Cost on training data: 0.13286504240839203
Accuracy on evaluation data: 8163 / 10000
Epoch 27 training complete
Cost on training data: 0.12490575508106312
Accuracy on evaluation data: 8185 / 10000
Epoch 28 training complete
Cost on training data: 0.11897385331053419
Accuracy on evaluation data: 8192 / 10000
Epoch 29 training complete
Cost on training data: 0.11416971608153519
Accuracy on evaluation data: 8190 / 10000
Epoch 30 training complete
Cost on training data: 0.11001761063509545
Accuracy on evaluation data: 8187 / 10000
Epoch 31 training complete
Cost on training data: 0.10569402966303736
Accuracy on evaluation data: 8204 / 10000
Epoch 32 training complete
Cost on training data: 0.1004963762997966
Accuracy on evaluation data: 8208 / 10000
Epoch 33 training complete
Cost on training data: 0.09764112227836957
Accuracy on evaluation data: 8180 / 10000
Epoch 34 training complete
Cost on training data: 0.09319934983369883
Accuracy on evaluation data: 8193 / 10000
Epoch 35 training complete
Cost on training data: 0.08868491774835854
Accuracy on evaluation data: 8206 / 10000
Epoch 36 training complete
Cost on training data: 0.08595086097925883
Accuracy on evaluation data: 8207 / 10000
Epoch 37 training complete
Cost on training data: 0.08350067214392239
Accuracy on evaluation data: 8198 / 10000
Epoch 38 training complete
Cost on training data: 0.07869753555787781
Accuracy on evaluation data: 8223 / 10000
Epoch 39 training complete
Cost on training data: 0.07594792345922877
Accuracy on evaluation data: 8217 / 10000
Epoch 40 training complete
Cost on training data: 0.07346669031585368
Accuracy on evaluation data: 8211 / 10000
Epoch 41 training complete
Cost on training data: 0.0708839567187532
Accuracy on evaluation data: 8221 / 10000
Epoch 42 training complete
Cost on training data: 0.06831180691768503
Accuracy on evaluation data: 8218 / 10000
Epoch 43 training complete
Cost on training data: 0.06711002154610325
Accuracy on evaluation data: 8236 / 10000
Epoch 44 training complete
Cost on training data: 0.06405338526915547
Accuracy on evaluation data: 8215 / 10000
Epoch 45 training complete
Cost on training data: 0.06160733031909577
Accuracy on evaluation data: 8232 / 10000
Epoch 46 training complete
Cost on training data: 0.0596664860482716
Accuracy on evaluation data: 8220 / 10000
Epoch 47 training complete
Cost on training data: 0.05782624278400594
Accuracy on evaluation data: 8228 / 10000
Epoch 48 training complete
Cost on training data: 0.05620444418301355
Accuracy on evaluation data: 8224 / 10000
Epoch 49 training complete
Cost on training data: 0.05465434524748508
Accuracy on evaluation data: 8224 / 10000
Epoch 50 training complete
Cost on training data: 0.053404822360543044
Accuracy on evaluation data: 8236 / 10000
Epoch 51 training complete
Cost on training data: 0.051435838493430454
Accuracy on evaluation data: 8220 / 10000
Epoch 52 training complete
Cost on training data: 0.04995871675143006
Accuracy on evaluation data: 8222 / 10000
Epoch 53 training complete
Cost on training data: 0.049178511975360266
Accuracy on evaluation data: 8232 / 10000
Epoch 54 training complete
Cost on training data: 0.04761860543521613
Accuracy on evaluation data: 8231 / 10000
Epoch 55 training complete
Cost on training data: 0.045694060305748366
Accuracy on evaluation data: 8236 / 10000
Epoch 56 training complete
Cost on training data: 0.04471523134867321
Accuracy on evaluation data: 8231 / 10000
Epoch 57 training complete
Cost on training data: 0.04353039157951212
Accuracy on evaluation data: 8234 / 10000
Epoch 58 training complete
Cost on training data: 0.042274507860818385
Accuracy on evaluation data: 8225 / 10000
Epoch 59 training complete
Cost on training data: 0.04128529084302505
Accuracy on evaluation data: 8232 / 10000
Epoch 60 training complete
Cost on training data: 0.040370416208562215
Accuracy on evaluation data: 8228 / 10000
Epoch 61 training complete
Cost on training data: 0.03927593215490957
Accuracy on evaluation data: 8234 / 10000
Epoch 62 training complete
Cost on training data: 0.03849301313041344
Accuracy on evaluation data: 8234 / 10000
Epoch 63 training complete
Cost on training data: 0.03784698326101495
Accuracy on evaluation data: 8241 / 10000
Epoch 64 training complete
Cost on training data: 0.036834324900328834
Accuracy on evaluation data: 8233 / 10000
Epoch 65 training complete
Cost on training data: 0.036043977294875976
Accuracy on evaluation data: 8234 / 10000
Epoch 66 training complete
Cost on training data: 0.03532049078790165
Accuracy on evaluation data: 8224 / 10000
Epoch 67 training complete
Cost on training data: 0.03457195553625183
Accuracy on evaluation data: 8234 / 10000
Epoch 68 training complete
Cost on training data: 0.03396945778541524
Accuracy on evaluation data: 8233 / 10000
Epoch 69 training complete
Cost on training data: 0.033330702513722874
Accuracy on evaluation data: 8235 / 10000
Epoch 70 training complete
Cost on training data: 0.03261064682887517
Accuracy on evaluation data: 8238 / 10000
Epoch 71 training complete
Cost on training data: 0.03190709529638142
Accuracy on evaluation data: 8241 / 10000
Epoch 72 training complete
Cost on training data: 0.03152396913807024
Accuracy on evaluation data: 8235 / 10000
Epoch 73 training complete
Cost on training data: 0.03076630409249701
Accuracy on evaluation data: 8237 / 10000
Epoch 74 training complete
Cost on training data: 0.03023149206147074
Accuracy on evaluation data: 8241 / 10000
Epoch 75 training complete
Cost on training data: 0.029662199460815355
Accuracy on evaluation data: 8235 / 10000
Epoch 76 training complete
Cost on training data: 0.029187429963762536
Accuracy on evaluation data: 8245 / 10000
Epoch 77 training complete
Cost on training data: 0.028691340398401113
Accuracy on evaluation data: 8236 / 10000
Epoch 78 training complete
Cost on training data: 0.028152612115837413
Accuracy on evaluation data: 8246 / 10000
Epoch 79 training complete
Cost on training data: 0.02781349540189489
Accuracy on evaluation data: 8244 / 10000
Epoch 80 training complete
Cost on training data: 0.027304669134411593
Accuracy on evaluation data: 8250 / 10000
Epoch 81 training complete
Cost on training data: 0.026856672884772514
Accuracy on evaluation data: 8242 / 10000
Epoch 82 training complete
Cost on training data: 0.02639586449219745
Accuracy on evaluation data: 8240 / 10000
Epoch 83 training complete
Cost on training data: 0.02595796469117669
Accuracy on evaluation data: 8243 / 10000
Epoch 84 training complete
Cost on training data: 0.025575379849845398
Accuracy on evaluation data: 8248 / 10000
Epoch 85 training complete
Cost on training data: 0.025208345723867195
Accuracy on evaluation data: 8247 / 10000
Epoch 86 training complete
Cost on training data: 0.024804570034370994
Accuracy on evaluation data: 8253 / 10000
Epoch 87 training complete
Cost on training data: 0.024434796921913336
Accuracy on evaluation data: 8251 / 10000
Epoch 88 training complete
Cost on training data: 0.024103135768802273
Accuracy on evaluation data: 8245 / 10000
Epoch 89 training complete
Cost on training data: 0.02376708064146453
Accuracy on evaluation data: 8255 / 10000
Epoch 90 training complete
Cost on training data: 0.023434898537771164
Accuracy on evaluation data: 8248 / 10000
Epoch 91 training complete
Cost on training data: 0.023070941073319938
Accuracy on evaluation data: 8236 / 10000
Epoch 92 training complete
Cost on training data: 0.022738641763154207
Accuracy on evaluation data: 8246 / 10000
Epoch 93 training complete
Cost on training data: 0.022405034123781458
Accuracy on evaluation data: 8248 / 10000
Epoch 94 training complete
Cost on training data: 0.022116731931907656
Accuracy on evaluation data: 8243 / 10000
Epoch 95 training complete
Cost on training data: 0.021811183501928306
Accuracy on evaluation data: 8240 / 10000
Epoch 96 training complete
Cost on training data: 0.021542550179585496
Accuracy on evaluation data: 8254 / 10000
Epoch 97 training complete
Cost on training data: 0.0212327173512337
Accuracy on evaluation data: 8247 / 10000
Epoch 98 training complete
Cost on training data: 0.020955587687164007
Accuracy on evaluation data: 8247 / 10000
Epoch 99 training complete
Cost on training data: 0.02070081988279502
Accuracy on evaluation data: 8243 / 10000
Epoch 100 training complete
Cost on training data: 0.020464101024792077
Accuracy on evaluation data: 8246 / 10000
Epoch 101 training complete
Cost on training data: 0.020183177431458525
Accuracy on evaluation data: 8241 / 10000
Epoch 102 training complete
Cost on training data: 0.01990947657685897
Accuracy on evaluation data: 8249 / 10000
Epoch 103 training complete
Cost on training data: 0.019673095002806538
Accuracy on evaluation data: 8235 / 10000
Epoch 104 training complete
Cost on training data: 0.01945071220597259
Accuracy on evaluation data: 8239 / 10000
Epoch 105 training complete
Cost on training data: 0.019183272328162273
Accuracy on evaluation data: 8240 / 10000
Epoch 106 training complete
Cost on training data: 0.01896155749194874
Accuracy on evaluation data: 8253 / 10000
Epoch 107 training complete
Cost on training data: 0.018737027644866
Accuracy on evaluation data: 8246 / 10000
Epoch 108 training complete
Cost on training data: 0.01852884659595162
Accuracy on evaluation data: 8245 / 10000
Epoch 109 training complete
Cost on training data: 0.018326055157791767
Accuracy on evaluation data: 8243 / 10000
Epoch 110 training complete
Cost on training data: 0.01809023758910947
Accuracy on evaluation data: 8248 / 10000
Epoch 111 training complete
Cost on training data: 0.017889669482253606
Accuracy on evaluation data: 8249 / 10000
Epoch 112 training complete
Cost on training data: 0.017689472412734882
Accuracy on evaluation data: 8244 / 10000
Epoch 113 training complete
Cost on training data: 0.01748919382769571
Accuracy on evaluation data: 8255 / 10000
Epoch 114 training complete
Cost on training data: 0.01729858085590458
Accuracy on evaluation data: 8250 / 10000
Epoch 115 training complete
Cost on training data: 0.017126279843091046
Accuracy on evaluation data: 8253 / 10000
Epoch 116 training complete
Cost on training data: 0.016934041161704855
Accuracy on evaluation data: 8253 / 10000
Epoch 117 training complete
Cost on training data: 0.01674109717372653
Accuracy on evaluation data: 8254 / 10000
Epoch 118 training complete
Cost on training data: 0.01655962677581138
Accuracy on evaluation data: 8245 / 10000
Epoch 119 training complete
Cost on training data: 0.01639757707029055
Accuracy on evaluation data: 8245 / 10000
Epoch 120 training complete
Cost on training data: 0.01621579584970799
Accuracy on evaluation data: 8253 / 10000
Epoch 121 training complete
Cost on training data: 0.01605936365039096
Accuracy on evaluation data: 8248 / 10000
Epoch 122 training complete
Cost on training data: 0.01589021235514923
Accuracy on evaluation data: 8256 / 10000
Epoch 123 training complete
Cost on training data: 0.01572518834641798
Accuracy on evaluation data: 8256 / 10000
Epoch 124 training complete
Cost on training data: 0.015569978429947886
Accuracy on evaluation data: 8247 / 10000
Epoch 125 training complete
Cost on training data: 0.015407319294389883
Accuracy on evaluation data: 8250 / 10000
Epoch 126 training complete
Cost on training data: 0.015251287024967676
Accuracy on evaluation data: 8252 / 10000
Epoch 127 training complete
Cost on training data: 0.015108016005986392
Accuracy on evaluation data: 8249 / 10000
Epoch 128 training complete
Cost on training data: 0.014959029113495606
Accuracy on evaluation data: 8251 / 10000
Epoch 129 training complete
Cost on training data: 0.014821946295991628
Accuracy on evaluation data: 8246 / 10000
Epoch 130 training complete
Cost on training data: 0.014676781773088376
Accuracy on evaluation data: 8250 / 10000
Epoch 131 training complete
Cost on training data: 0.014547077394996423
Accuracy on evaluation data: 8258 / 10000
Epoch 132 training complete
Cost on training data: 0.014406349446049009
Accuracy on evaluation data: 8260 / 10000
Epoch 133 training complete
Cost on training data: 0.014274582991979683
Accuracy on evaluation data: 8260 / 10000
Epoch 134 training complete
Cost on training data: 0.01415044622375008
Accuracy on evaluation data: 8266 / 10000
Epoch 135 training complete
Cost on training data: 0.014012607992043
Accuracy on evaluation data: 8260 / 10000
Epoch 136 training complete
Cost on training data: 0.01389321569049412
Accuracy on evaluation data: 8264 / 10000
Epoch 137 training complete
Cost on training data: 0.013766664380162422
Accuracy on evaluation data: 8251 / 10000
Epoch 138 training complete
Cost on training data: 0.013653798693873268
Accuracy on evaluation data: 8256 / 10000
Epoch 139 training complete
Cost on training data: 0.01352502677394385
Accuracy on evaluation data: 8261 / 10000
Epoch 140 training complete
Cost on training data: 0.013409418654958455
Accuracy on evaluation data: 8269 / 10000
Epoch 141 training complete
Cost on training data: 0.013301199019208685
Accuracy on evaluation data: 8268 / 10000
Epoch 142 training complete
Cost on training data: 0.013180497870486077
Accuracy on evaluation data: 8270 / 10000
Epoch 143 training complete
Cost on training data: 0.013077935972272886
Accuracy on evaluation data: 8268 / 10000
Epoch 144 training complete
Cost on training data: 0.01296117787556624
Accuracy on evaluation data: 8263 / 10000
Epoch 145 training complete
Cost on training data: 0.012856299557458047
Accuracy on evaluation data: 8260 / 10000
Epoch 146 training complete
Cost on training data: 0.012751451883080927
Accuracy on evaluation data: 8263 / 10000
Epoch 147 training complete
Cost on training data: 0.012645519491190157
Accuracy on evaluation data: 8268 / 10000
Epoch 148 training complete
Cost on training data: 0.012545166466142342
Accuracy on evaluation data: 8267 / 10000
Epoch 149 training complete
Cost on training data: 0.012447508071661858
Accuracy on evaluation data: 8262 / 10000
Epoch 150 training complete
Cost on training data: 0.01234312271672617
Accuracy on evaluation data: 8270 / 10000
Epoch 151 training complete
Cost on training data: 0.012247949541335238
Accuracy on evaluation data: 8265 / 10000
Epoch 152 training complete
Cost on training data: 0.012152139704774108
Accuracy on evaluation data: 8268 / 10000
Epoch 153 training complete
Cost on training data: 0.012064314828212068
Accuracy on evaluation data: 8265 / 10000
Epoch 154 training complete
Cost on training data: 0.011966400675534445
Accuracy on evaluation data: 8273 / 10000
Epoch 155 training complete
Cost on training data: 0.011874914838803683
Accuracy on evaluation data: 8271 / 10000
Epoch 156 training complete
Cost on training data: 0.011782724235167658
Accuracy on evaluation data: 8273 / 10000
Epoch 157 training complete
Cost on training data: 0.0116966319554314
Accuracy on evaluation data: 8270 / 10000
Epoch 158 training complete
Cost on training data: 0.011610256479557782
Accuracy on evaluation data: 8273 / 10000
Epoch 159 training complete
Cost on training data: 0.011520105258763945
Accuracy on evaluation data: 8271 / 10000
Epoch 160 training complete
Cost on training data: 0.011439755468064356
Accuracy on evaluation data: 8274 / 10000
Epoch 161 training complete
Cost on training data: 0.011357655493793929
Accuracy on evaluation data: 8268 / 10000
Epoch 162 training complete
Cost on training data: 0.011283409102421852
Accuracy on evaluation data: 8276 / 10000
Epoch 163 training complete
Cost on training data: 0.01118870672659996
Accuracy on evaluation data: 8269 / 10000
Epoch 164 training complete
Cost on training data: 0.01110691983469005
Accuracy on evaluation data: 8270 / 10000
Epoch 165 training complete
Cost on training data: 0.011027037584351687
Accuracy on evaluation data: 8268 / 10000
Epoch 166 training complete
Cost on training data: 0.0109480839360458
Accuracy on evaluation data: 8270 / 10000
Epoch 167 training complete
Cost on training data: 0.010868154795374474
Accuracy on evaluation data: 8272 / 10000
Epoch 168 training complete
Cost on training data: 0.010794628856790733
Accuracy on evaluation data: 8269 / 10000
Epoch 169 training complete
Cost on training data: 0.010718020864434652
Accuracy on evaluation data: 8275 / 10000
Epoch 170 training complete
Cost on training data: 0.010640652611700866
Accuracy on evaluation data: 8272 / 10000
Epoch 171 training complete
Cost on training data: 0.010568505850681333
Accuracy on evaluation data: 8276 / 10000
Epoch 172 training complete
Cost on training data: 0.010495144281480608
Accuracy on evaluation data: 8270 / 10000
Epoch 173 training complete
Cost on training data: 0.010423459700472372
Accuracy on evaluation data: 8276 / 10000
Epoch 174 training complete
Cost on training data: 0.01035215785574867
Accuracy on evaluation data: 8274 / 10000
Epoch 175 training complete
Cost on training data: 0.010281598389256895
Accuracy on evaluation data: 8273 / 10000
Epoch 176 training complete
Cost on training data: 0.010213743367284523
Accuracy on evaluation data: 8275 / 10000
Epoch 177 training complete
Cost on training data: 0.010144601049374632
Accuracy on evaluation data: 8273 / 10000
Epoch 178 training complete
Cost on training data: 0.010075490292771319
Accuracy on evaluation data: 8277 / 10000
Epoch 179 training complete
Cost on training data: 0.010008294371522152
Accuracy on evaluation data: 8277 / 10000
Epoch 180 training complete
Cost on training data: 0.009942008567899162
Accuracy on evaluation data: 8274 / 10000
Epoch 181 training complete
Cost on training data: 0.009878198710323437
Accuracy on evaluation data: 8274 / 10000
Epoch 182 training complete
Cost on training data: 0.009813496530298511
Accuracy on evaluation data: 8278 / 10000
Epoch 183 training complete
Cost on training data: 0.009745402673000931
Accuracy on evaluation data: 8280 / 10000
Epoch 184 training complete
Cost on training data: 0.009683709284427975
Accuracy on evaluation data: 8280 / 10000
Epoch 185 training complete
Cost on training data: 0.009620426739445657
Accuracy on evaluation data: 8279 / 10000
Epoch 186 training complete
Cost on training data: 0.009557455763009055
Accuracy on evaluation data: 8277 / 10000
Epoch 187 training complete
Cost on training data: 0.00949805534433109
Accuracy on evaluation data: 8277 / 10000
Epoch 188 training complete
Cost on training data: 0.009438038102481235
Accuracy on evaluation data: 8274 / 10000
Epoch 189 training complete
Cost on training data: 0.009375645959080825
Accuracy on evaluation data: 8278 / 10000
Epoch 190 training complete
Cost on training data: 0.009315332442636128
Accuracy on evaluation data: 8274 / 10000
Epoch 191 training complete
Cost on training data: 0.0092563441875046
Accuracy on evaluation data: 8277 / 10000
Epoch 192 training complete
Cost on training data: 0.009197103377941369
Accuracy on evaluation data: 8276 / 10000
Epoch 193 training complete
Cost on training data: 0.00913937219535242
Accuracy on evaluation data: 8276 / 10000
Epoch 194 training complete
Cost on training data: 0.009083971707609098
Accuracy on evaluation data: 8279 / 10000
Epoch 195 training complete
Cost on training data: 0.00902560657927742
Accuracy on evaluation data: 8280 / 10000
Epoch 196 training complete
Cost on training data: 0.008970855935428792
Accuracy on evaluation data: 8278 / 10000
Epoch 197 training complete
Cost on training data: 0.008913885902286431
Accuracy on evaluation data: 8274 / 10000
Epoch 198 training complete
Cost on training data: 0.008859597006419211
Accuracy on evaluation data: 8275 / 10000
Epoch 199 training complete
Cost on training data: 0.008805134195939951
Accuracy on evaluation data: 8277 / 10000
Epoch 200 training complete
Cost on training data: 0.008753320472002191
Accuracy on evaluation data: 8276 / 10000
Epoch 201 training complete
Cost on training data: 0.008699193168168104
Accuracy on evaluation data: 8276 / 10000
Epoch 202 training complete
Cost on training data: 0.008646628714748658
Accuracy on evaluation data: 8278 / 10000
Epoch 203 training complete
Cost on training data: 0.008596697512196652
Accuracy on evaluation data: 8282 / 10000
Epoch 204 training complete
Cost on training data: 0.00854450520511543
Accuracy on evaluation data: 8282 / 10000
Epoch 205 training complete
Cost on training data: 0.008494266012365406
Accuracy on evaluation data: 8281 / 10000
Epoch 206 training complete
Cost on training data: 0.00844535278211304
Accuracy on evaluation data: 8280 / 10000
Epoch 207 training complete
Cost on training data: 0.008396311920896578
Accuracy on evaluation data: 8280 / 10000
Epoch 208 training complete
Cost on training data: 0.008348879753283944
Accuracy on evaluation data: 8284 / 10000
Epoch 209 training complete
Cost on training data: 0.00830025892889786
Accuracy on evaluation data: 8278 / 10000
Epoch 210 training complete
Cost on training data: 0.008253635655113563
Accuracy on evaluation data: 8285 / 10000
Epoch 211 training complete
Cost on training data: 0.008206207699730876
Accuracy on evaluation data: 8282 / 10000
Epoch 212 training complete
Cost on training data: 0.008159949913241944
Accuracy on evaluation data: 8282 / 10000
Epoch 213 training complete
Cost on training data: 0.008115473505970949
Accuracy on evaluation data: 8279 / 10000
Epoch 214 training complete
Cost on training data: 0.008070457971559742
Accuracy on evaluation data: 8277 / 10000
Epoch 215 training complete
Cost on training data: 0.008025276871048191
Accuracy on evaluation data: 8281 / 10000
Epoch 216 training complete
Cost on training data: 0.00798175692971702
Accuracy on evaluation data: 8277 / 10000
Epoch 217 training complete
Cost on training data: 0.007937385276313748
Accuracy on evaluation data: 8286 / 10000
Epoch 218 training complete
Cost on training data: 0.00789409095460702
Accuracy on evaluation data: 8287 / 10000
Epoch 219 training complete
Cost on training data: 0.007851964134100609
Accuracy on evaluation data: 8284 / 10000
Epoch 220 training complete
Cost on training data: 0.007810620015641798
Accuracy on evaluation data: 8283 / 10000
Epoch 221 training complete
Cost on training data: 0.007768607301514054
Accuracy on evaluation data: 8280 / 10000
Epoch 222 training complete
Cost on training data: 0.007727871181417616
Accuracy on evaluation data: 8283 / 10000
Epoch 223 training complete
Cost on training data: 0.00768651891795349
Accuracy on evaluation data: 8284 / 10000
Epoch 224 training complete
Cost on training data: 0.007645607395488211
Accuracy on evaluation data: 8286 / 10000
Epoch 225 training complete
Cost on training data: 0.007606159909610643
Accuracy on evaluation data: 8283 / 10000
Epoch 226 training complete
Cost on training data: 0.0075671767015055395
Accuracy on evaluation data: 8285 / 10000
Epoch 227 training complete
Cost on training data: 0.007528717093426312
Accuracy on evaluation data: 8284 / 10000
Epoch 228 training complete
Cost on training data: 0.007490312469768559
Accuracy on evaluation data: 8286 / 10000
Epoch 229 training complete
Cost on training data: 0.007451488872162558
Accuracy on evaluation data: 8289 / 10000
Epoch 230 training complete
Cost on training data: 0.0074139164053383805
Accuracy on evaluation data: 8287 / 10000
Epoch 231 training complete
Cost on training data: 0.00737624578357679
Accuracy on evaluation data: 8288 / 10000
Epoch 232 training complete
Cost on training data: 0.007339102825272781
Accuracy on evaluation data: 8285 / 10000
Epoch 233 training complete
Cost on training data: 0.007303170786097757
Accuracy on evaluation data: 8284 / 10000
Epoch 234 training complete
Cost on training data: 0.007266443130721465
Accuracy on evaluation data: 8288 / 10000
Epoch 235 training complete
Cost on training data: 0.0072306431437006245
Accuracy on evaluation data: 8288 / 10000
Epoch 236 training complete
Cost on training data: 0.007195464529128684
Accuracy on evaluation data: 8288 / 10000
Epoch 237 training complete
Cost on training data: 0.007160239334472621
Accuracy on evaluation data: 8291 / 10000
Epoch 238 training complete
Cost on training data: 0.007125011344912007
Accuracy on evaluation data: 8287 / 10000
Epoch 239 training complete
Cost on training data: 0.0070907315932205815
Accuracy on evaluation data: 8286 / 10000
Epoch 240 training complete
Cost on training data: 0.0070564370106393146
Accuracy on evaluation data: 8286 / 10000
Epoch 241 training complete
Cost on training data: 0.007022949284325297
Accuracy on evaluation data: 8287 / 10000
Epoch 242 training complete
Cost on training data: 0.006989332090196003
Accuracy on evaluation data: 8285 / 10000
Epoch 243 training complete
Cost on training data: 0.0069575643021553045
Accuracy on evaluation data: 8286 / 10000
Epoch 244 training complete
Cost on training data: 0.006922972969011052
Accuracy on evaluation data: 8287 / 10000
Epoch 245 training complete
Cost on training data: 0.006890322585898007
Accuracy on evaluation data: 8289 / 10000
Epoch 246 training complete
Cost on training data: 0.0068582171355924875
Accuracy on evaluation data: 8287 / 10000
Epoch 247 training complete
Cost on training data: 0.006826825664746198
Accuracy on evaluation data: 8287 / 10000
Epoch 248 training complete
Cost on training data: 0.006795318387755124
Accuracy on evaluation data: 8286 / 10000
Epoch 249 training complete
Cost on training data: 0.006764382520680391
Accuracy on evaluation data: 8289 / 10000
Epoch 250 training complete
Cost on training data: 0.006732631530410944
Accuracy on evaluation data: 8289 / 10000
Epoch 251 training complete
Cost on training data: 0.006701933020287373
Accuracy on evaluation data: 8289 / 10000
Epoch 252 training complete
Cost on training data: 0.006671259538895656
Accuracy on evaluation data: 8289 / 10000
Epoch 253 training complete
Cost on training data: 0.006641157579902452
Accuracy on evaluation data: 8285 / 10000
Epoch 254 training complete
Cost on training data: 0.0066109929829551144
Accuracy on evaluation data: 8290 / 10000
Epoch 255 training complete
Cost on training data: 0.006581506524839147
Accuracy on evaluation data: 8287 / 10000
Epoch 256 training complete
Cost on training data: 0.006551805690698094
Accuracy on evaluation data: 8287 / 10000
Epoch 257 training complete
Cost on training data: 0.006523026610429851
Accuracy on evaluation data: 8287 / 10000
Epoch 258 training complete
Cost on training data: 0.006494546730728464
Accuracy on evaluation data: 8288 / 10000
Epoch 259 training complete
Cost on training data: 0.0064660179659774344
Accuracy on evaluation data: 8288 / 10000
Epoch 260 training complete
Cost on training data: 0.0064367914723149336
Accuracy on evaluation data: 8285 / 10000
Epoch 261 training complete
Cost on training data: 0.006408727168911711
Accuracy on evaluation data: 8288 / 10000
Epoch 262 training complete
Cost on training data: 0.006380707310155465
Accuracy on evaluation data: 8288 / 10000
Epoch 263 training complete
Cost on training data: 0.006354243273849885
Accuracy on evaluation data: 8287 / 10000
Epoch 264 training complete
Cost on training data: 0.006326277361867922
Accuracy on evaluation data: 8286 / 10000
Epoch 265 training complete
Cost on training data: 0.0062991969040432005
Accuracy on evaluation data: 8288 / 10000
Epoch 266 training complete
Cost on training data: 0.006271358916423154
Accuracy on evaluation data: 8287 / 10000
Epoch 267 training complete
Cost on training data: 0.006244483116143672
Accuracy on evaluation data: 8287 / 10000
Epoch 268 training complete
Cost on training data: 0.006218052193244496
Accuracy on evaluation data: 8288 / 10000
Epoch 269 training complete
Cost on training data: 0.006191649878248733
Accuracy on evaluation data: 8287 / 10000
Epoch 270 training complete
Cost on training data: 0.006165531628267799
Accuracy on evaluation data: 8286 / 10000
Epoch 271 training complete
Cost on training data: 0.006140146955646749
Accuracy on evaluation data: 8289 / 10000
Epoch 272 training complete
Cost on training data: 0.006113950325261941
Accuracy on evaluation data: 8286 / 10000
Epoch 273 training complete
Cost on training data: 0.006088633003724945
Accuracy on evaluation data: 8287 / 10000
Epoch 274 training complete
Cost on training data: 0.0060632176283099675
Accuracy on evaluation data: 8285 / 10000
Epoch 275 training complete
Cost on training data: 0.006038103695593386
Accuracy on evaluation data: 8286 / 10000
Epoch 276 training complete
Cost on training data: 0.006013333670245106
Accuracy on evaluation data: 8285 / 10000
Epoch 277 training complete
Cost on training data: 0.005988708172878443
Accuracy on evaluation data: 8284 / 10000
Epoch 278 training complete
Cost on training data: 0.005964067741801058
Accuracy on evaluation data: 8287 / 10000
Epoch 279 training complete
Cost on training data: 0.005939554584149414
Accuracy on evaluation data: 8289 / 10000
Epoch 280 training complete
Cost on training data: 0.005915545598152728
Accuracy on evaluation data: 8291 / 10000
Epoch 281 training complete
Cost on training data: 0.005891857072004096
Accuracy on evaluation data: 8290 / 10000
Epoch 282 training complete
Cost on training data: 0.00586778250095322
Accuracy on evaluation data: 8291 / 10000
Epoch 283 training complete
Cost on training data: 0.005844640036610796
Accuracy on evaluation data: 8289 / 10000
Epoch 284 training complete
Cost on training data: 0.005820750862614823
Accuracy on evaluation data: 8285 / 10000
Epoch 285 training complete
Cost on training data: 0.005797367894445975
Accuracy on evaluation data: 8288 / 10000
Epoch 286 training complete
Cost on training data: 0.005774598955034867
Accuracy on evaluation data: 8294 / 10000
Epoch 287 training complete
Cost on training data: 0.005751373286306451
Accuracy on evaluation data: 8289 / 10000
Epoch 288 training complete
Cost on training data: 0.005728814078936483
Accuracy on evaluation data: 8290 / 10000
Epoch 289 training complete
Cost on training data: 0.005706300804766538
Accuracy on evaluation data: 8291 / 10000
Epoch 290 training complete
Cost on training data: 0.005683950365914777
Accuracy on evaluation data: 8292 / 10000
Epoch 291 training complete
Cost on training data: 0.005662042864370976
Accuracy on evaluation data: 8292 / 10000
Epoch 292 training complete
Cost on training data: 0.005639337813036541
Accuracy on evaluation data: 8294 / 10000
Epoch 293 training complete
Cost on training data: 0.005617549865176806
Accuracy on evaluation data: 8294 / 10000
Epoch 294 training complete
Cost on training data: 0.00559550286933176
Accuracy on evaluation data: 8295 / 10000
Epoch 295 training complete
Cost on training data: 0.005574252235835078
Accuracy on evaluation data: 8294 / 10000
Epoch 296 training complete
Cost on training data: 0.005552555131129523
Accuracy on evaluation data: 8294 / 10000
Epoch 297 training complete
Cost on training data: 0.005531417672113423
Accuracy on evaluation data: 8296 / 10000
Epoch 298 training complete
Cost on training data: 0.005510311184376747
Accuracy on evaluation data: 8295 / 10000
Epoch 299 training complete
Cost on training data: 0.005488967115247259
Accuracy on evaluation data: 8296 / 10000
Epoch 300 training complete
Cost on training data: 0.005468213013422956
Accuracy on evaluation data: 8296 / 10000
Epoch 301 training complete
Cost on training data: 0.00544741925827183
Accuracy on evaluation data: 8296 / 10000
Epoch 302 training complete
Cost on training data: 0.005426749424850801
Accuracy on evaluation data: 8295 / 10000
Epoch 303 training complete
Cost on training data: 0.005406443086656321
Accuracy on evaluation data: 8293 / 10000
Epoch 304 training complete
Cost on training data: 0.005386234965373989
Accuracy on evaluation data: 8294 / 10000
Epoch 305 training complete
Cost on training data: 0.005365939021066804
Accuracy on evaluation data: 8296 / 10000
Epoch 306 training complete
Cost on training data: 0.005345641025678909
Accuracy on evaluation data: 8296 / 10000
Epoch 307 training complete
Cost on training data: 0.005325901343876197
Accuracy on evaluation data: 8296 / 10000
Epoch 308 training complete
Cost on training data: 0.005306269327524977
Accuracy on evaluation data: 8296 / 10000
Epoch 309 training complete
Cost on training data: 0.005286343727428884
Accuracy on evaluation data: 8295 / 10000
Epoch 310 training complete
Cost on training data: 0.005266895148305276
Accuracy on evaluation data: 8294 / 10000
Epoch 311 training complete
Cost on training data: 0.005247517707361536
Accuracy on evaluation data: 8297 / 10000
Epoch 312 training complete
Cost on training data: 0.005228179453774824
Accuracy on evaluation data: 8296 / 10000
Epoch 313 training complete
Cost on training data: 0.005209324835678492
Accuracy on evaluation data: 8297 / 10000
Epoch 314 training complete
Cost on training data: 0.005190128908498655
Accuracy on evaluation data: 8298 / 10000
Epoch 315 training complete
Cost on training data: 0.005171385294364427
Accuracy on evaluation data: 8300 / 10000
Epoch 316 training complete
Cost on training data: 0.00515268691928425
Accuracy on evaluation data: 8299 / 10000
Epoch 317 training complete
Cost on training data: 0.005133891651082554
Accuracy on evaluation data: 8303 / 10000
Epoch 318 training complete
Cost on training data: 0.005115622974893162
Accuracy on evaluation data: 8300 / 10000
Epoch 319 training complete
Cost on training data: 0.005097437333308749
Accuracy on evaluation data: 8303 / 10000
Epoch 320 training complete
Cost on training data: 0.005079048777339515
Accuracy on evaluation data: 8301 / 10000
Epoch 321 training complete
Cost on training data: 0.005061337321066521
Accuracy on evaluation data: 8300 / 10000
Epoch 322 training complete
Cost on training data: 0.005042854658846221
Accuracy on evaluation data: 8297 / 10000
Epoch 323 training complete
Cost on training data: 0.005025381949058676
Accuracy on evaluation data: 8300 / 10000
Epoch 324 training complete
Cost on training data: 0.005007264397397749
Accuracy on evaluation data: 8299 / 10000
Epoch 325 training complete
Cost on training data: 0.004989667240309711
Accuracy on evaluation data: 8300 / 10000
Epoch 326 training complete
Cost on training data: 0.004972418969779527
Accuracy on evaluation data: 8297 / 10000
Epoch 327 training complete
Cost on training data: 0.004954857529917982
Accuracy on evaluation data: 8298 / 10000
Epoch 328 training complete
Cost on training data: 0.004937466818368719
Accuracy on evaluation data: 8298 / 10000
Epoch 329 training complete
Cost on training data: 0.004920286690389597
Accuracy on evaluation data: 8298 / 10000
Epoch 330 training complete
Cost on training data: 0.004903642835898288
Accuracy on evaluation data: 8297 / 10000
Epoch 331 training complete
Cost on training data: 0.004886593368479418
Accuracy on evaluation data: 8297 / 10000
Epoch 332 training complete
Cost on training data: 0.004869638991650194
Accuracy on evaluation data: 8296 / 10000
Epoch 333 training complete
Cost on training data: 0.00485271148272832
Accuracy on evaluation data: 8297 / 10000
Epoch 334 training complete
Cost on training data: 0.004836033610728998
Accuracy on evaluation data: 8295 / 10000
Epoch 335 training complete
Cost on training data: 0.004819477012448158
Accuracy on evaluation data: 8296 / 10000
Epoch 336 training complete
Cost on training data: 0.00480316030175109
Accuracy on evaluation data: 8295 / 10000
Epoch 337 training complete
Cost on training data: 0.00478705986224669
Accuracy on evaluation data: 8297 / 10000
Epoch 338 training complete
Cost on training data: 0.004770778022837434
Accuracy on evaluation data: 8295 / 10000
Epoch 339 training complete
Cost on training data: 0.004754551562360567
Accuracy on evaluation data: 8296 / 10000
Epoch 340 training complete
Cost on training data: 0.004738621303450828
Accuracy on evaluation data: 8297 / 10000
Epoch 341 training complete
Cost on training data: 0.004722850165936068
Accuracy on evaluation data: 8294 / 10000
Epoch 342 training complete
Cost on training data: 0.004707135056049125
Accuracy on evaluation data: 8298 / 10000
Epoch 343 training complete
Cost on training data: 0.004691459445012149
Accuracy on evaluation data: 8298 / 10000
Epoch 344 training complete
Cost on training data: 0.004675963184852911
Accuracy on evaluation data: 8297 / 10000
Epoch 345 training complete
Cost on training data: 0.00466052074112378
Accuracy on evaluation data: 8297 / 10000
Epoch 346 training complete
Cost on training data: 0.004645277381835308
Accuracy on evaluation data: 8297 / 10000
Epoch 347 training complete
Cost on training data: 0.00462996261494318
Accuracy on evaluation data: 8297 / 10000
Epoch 348 training complete
Cost on training data: 0.004614739791104975
Accuracy on evaluation data: 8297 / 10000
Epoch 349 training complete
Cost on training data: 0.004599992358616343
Accuracy on evaluation data: 8295 / 10000
Epoch 350 training complete
Cost on training data: 0.004584734088606013
Accuracy on evaluation data: 8297 / 10000
Epoch 351 training complete
Cost on training data: 0.004570103992380071
Accuracy on evaluation data: 8295 / 10000
Epoch 352 training complete
Cost on training data: 0.004555271513090721
Accuracy on evaluation data: 8296 / 10000
Epoch 353 training complete
Cost on training data: 0.004540581544866581
Accuracy on evaluation data: 8297 / 10000
Epoch 354 training complete
Cost on training data: 0.004526186229498569
Accuracy on evaluation data: 8298 / 10000
Epoch 355 training complete
Cost on training data: 0.004511761068871661
Accuracy on evaluation data: 8298 / 10000
Epoch 356 training complete
Cost on training data: 0.004497118379147558
Accuracy on evaluation data: 8294 / 10000
Epoch 357 training complete
Cost on training data: 0.004482847142019097
Accuracy on evaluation data: 8294 / 10000
Epoch 358 training complete
Cost on training data: 0.004468448747955489
Accuracy on evaluation data: 8293 / 10000
Epoch 359 training complete
Cost on training data: 0.004454301662470557
Accuracy on evaluation data: 8294 / 10000
Epoch 360 training complete
Cost on training data: 0.004440241078412804
Accuracy on evaluation data: 8296 / 10000
Epoch 361 training complete
Cost on training data: 0.004426393304710121
Accuracy on evaluation data: 8294 / 10000
Epoch 362 training complete
Cost on training data: 0.004412696739676035
Accuracy on evaluation data: 8294 / 10000
Epoch 363 training complete
Cost on training data: 0.004398766430765669
Accuracy on evaluation data: 8292 / 10000
Epoch 364 training complete
Cost on training data: 0.0043851297529393434
Accuracy on evaluation data: 8293 / 10000
Epoch 365 training complete
Cost on training data: 0.004371278417418942
Accuracy on evaluation data: 8294 / 10000
Epoch 366 training complete
Cost on training data: 0.004357756426617245
Accuracy on evaluation data: 8295 / 10000
Epoch 367 training complete
Cost on training data: 0.004344284032422745
Accuracy on evaluation data: 8296 / 10000
Epoch 368 training complete
Cost on training data: 0.004330980365147555
Accuracy on evaluation data: 8293 / 10000
Epoch 369 training complete
Cost on training data: 0.004317686829298213
Accuracy on evaluation data: 8293 / 10000
Epoch 370 training complete
Cost on training data: 0.0043043472696091154
Accuracy on evaluation data: 8294 / 10000
Epoch 371 training complete
Cost on training data: 0.0042911592000915155
Accuracy on evaluation data: 8293 / 10000
Epoch 372 training complete
Cost on training data: 0.004278081739233593
Accuracy on evaluation data: 8294 / 10000
Epoch 373 training complete
Cost on training data: 0.0042650820843650825
Accuracy on evaluation data: 8294 / 10000
Epoch 374 training complete
Cost on training data: 0.004252218806671443
Accuracy on evaluation data: 8294 / 10000
Epoch 375 training complete
Cost on training data: 0.004239365142552146
Accuracy on evaluation data: 8294 / 10000
Epoch 376 training complete
Cost on training data: 0.004226469181877731
Accuracy on evaluation data: 8294 / 10000
Epoch 377 training complete
Cost on training data: 0.004213789008565988
Accuracy on evaluation data: 8293 / 10000
Epoch 378 training complete
Cost on training data: 0.004201082977746353
Accuracy on evaluation data: 8294 / 10000
Epoch 379 training complete
Cost on training data: 0.004188461670676879
Accuracy on evaluation data: 8296 / 10000
Epoch 380 training complete
Cost on training data: 0.0041759358215385746
Accuracy on evaluation data: 8296 / 10000
Epoch 381 training complete
Cost on training data: 0.004163631798612785
Accuracy on evaluation data: 8292 / 10000
Epoch 382 training complete
Cost on training data: 0.0041512850871808
Accuracy on evaluation data: 8295 / 10000
Epoch 383 training complete
Cost on training data: 0.004138751495244101
Accuracy on evaluation data: 8295 / 10000
Epoch 384 training complete
Cost on training data: 0.0041265318885868865
Accuracy on evaluation data: 8295 / 10000
Epoch 385 training complete
Cost on training data: 0.004114300724664924
Accuracy on evaluation data: 8296 / 10000
Epoch 386 training complete
Cost on training data: 0.004102243418451178
Accuracy on evaluation data: 8294 / 10000
Epoch 387 training complete
Cost on training data: 0.004090173483661543
Accuracy on evaluation data: 8295 / 10000
Epoch 388 training complete
Cost on training data: 0.004078249277886824
Accuracy on evaluation data: 8295 / 10000
Epoch 389 training complete
Cost on training data: 0.0040663651513213744
Accuracy on evaluation data: 8294 / 10000
Epoch 390 training complete
Cost on training data: 0.004054492619413702
Accuracy on evaluation data: 8292 / 10000
Epoch 391 training complete
Cost on training data: 0.004042732274433442
Accuracy on evaluation data: 8292 / 10000
Epoch 392 training complete
Cost on training data: 0.004030998674520286
Accuracy on evaluation data: 8297 / 10000
Epoch 393 training complete
Cost on training data: 0.0040193176064631094
Accuracy on evaluation data: 8292 / 10000
Epoch 394 training complete
Cost on training data: 0.004007769817727777
Accuracy on evaluation data: 8293 / 10000
Epoch 395 training complete
Cost on training data: 0.003996267221335551
Accuracy on evaluation data: 8293 / 10000
Epoch 396 training complete
Cost on training data: 0.0039846249761465655
Accuracy on evaluation data: 8292 / 10000
Epoch 397 training complete
Cost on training data: 0.003973323580052852
Accuracy on evaluation data: 8297 / 10000
Epoch 398 training complete
Cost on training data: 0.003961815179733587
Accuracy on evaluation data: 8293 / 10000
Epoch 399 training complete
Cost on training data: 0.0039505331603134535
Accuracy on evaluation data: 8293 / 10000
class VanillaMNIST(nn.Module):
    """A plain 784-30-10 fully connected network with a sigmoid after each linear layer."""

    def __init__(self):
        super().__init__()

        self.linear = nn.Sequential(
            nn.Linear(784, 30), nn.Sigmoid(), nn.Linear(30, 10), nn.Sigmoid()
        )

    def forward(self, x):
        return self.linear(x)
# Training loop
training_data = training_data[:1000]
evaluation_data = test_data

n = len(training_data)
epochs = 400
mini_batch_size = 10
alpha = 0.5
model = VanillaMNIST()

costs = []
accuracies = []

# Experiment: comparing regularized and non-regularized models
for j in range(epochs):
    random.shuffle(training_data)
    mini_batches = [
        training_data[k : k + mini_batch_size] for k in range(0, n, mini_batch_size)
    ]
    epoch_loss = 0
    for mini_batch in mini_batches:
        total_loss = torch.tensor([0.0])
        for x, y in mini_batch:
            x = torch.tensor(x).squeeze(1)
            y = torch.tensor(y).squeeze(1)

            # Calculate log loss and add to loss
            y_hat = model(x)
            loss = y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)
            total_loss += sum(loss)

        log_loss = -(1 / len(mini_batch)) * total_loss
        model.zero_grad()
        log_loss.backward()
        for p in model.parameters():
            p.data = p.data - alpha * p.grad
        epoch_loss += log_loss.item()

    # Evaluate model on evaluation_data
    correct = 0
    for x, y in evaluation_data:
        x = torch.tensor(x).squeeze(1)
        y_hat = model(x)
        if torch.argmax(y_hat) == y:
            correct += 1

    costs.append(epoch_loss / len(mini_batches))
    accuracies.append(correct / len(evaluation_data))
    print(
        f"Epoch {j}, Loss: {epoch_loss / len(mini_batches)}, Accuracy: {correct/len(evaluation_data)}"
    )
Epoch 0, Loss: 2.4531275963783266, Accuracy: 0.7109
Epoch 1, Loss: 1.2405188855528833, Accuracy: 0.8002
Epoch 2, Loss: 0.8996775972843171, Accuracy: 0.8283
Epoch 3, Loss: 0.6996856625378132, Accuracy: 0.8622
Epoch 4, Loss: 0.5463169002532959, Accuracy: 0.8649
Epoch 5, Loss: 0.46642671793699264, Accuracy: 0.874
Epoch 6, Loss: 0.3940893569588661, Accuracy: 0.8774
Epoch 7, Loss: 0.31663369677960873, Accuracy: 0.8706
Epoch 8, Loss: 0.26746165841817854, Accuracy: 0.8696
Epoch 9, Loss: 0.22563948720693588, Accuracy: 0.8668
Epoch 10, Loss: 0.1941769213229418, Accuracy: 0.8728
Epoch 11, Loss: 0.16567592516541482, Accuracy: 0.8714
Epoch 12, Loss: 0.14149875968694686, Accuracy: 0.8695
Epoch 13, Loss: 0.12648260202258826, Accuracy: 0.8735
Epoch 14, Loss: 0.11159584682434798, Accuracy: 0.8771
Epoch 15, Loss: 0.10187144234776496, Accuracy: 0.8752
Epoch 16, Loss: 0.090857108309865, Accuracy: 0.8723
Epoch 17, Loss: 0.0825225205719471, Accuracy: 0.8732
Epoch 18, Loss: 0.07501107199117542, Accuracy: 0.8708
Epoch 19, Loss: 0.06901620700955391, Accuracy: 0.8744
Epoch 20, Loss: 0.06235401973128319, Accuracy: 0.8715
Epoch 21, Loss: 0.05752668173983693, Accuracy: 0.8731
Epoch 22, Loss: 0.053382117561995984, Accuracy: 0.871
Epoch 23, Loss: 0.048515842566266655, Accuracy: 0.8706
Epoch 24, Loss: 0.045617444226518276, Accuracy: 0.8739
Epoch 25, Loss: 0.042514854790642854, Accuracy: 0.8726
Epoch 26, Loss: 0.040018187407404184, Accuracy: 0.8749
Epoch 27, Loss: 0.037688900977373124, Accuracy: 0.8745
Epoch 28, Loss: 0.03576532278209925, Accuracy: 0.8738
Epoch 29, Loss: 0.03388645425438881, Accuracy: 0.871
Epoch 30, Loss: 0.031943229897879066, Accuracy: 0.872
Epoch 31, Loss: 0.030804977854713797, Accuracy: 0.8725
Epoch 32, Loss: 0.029267427837476136, Accuracy: 0.8724
Epoch 33, Loss: 0.027988012125715615, Accuracy: 0.8727
Epoch 34, Loss: 0.02692528400570154, Accuracy: 0.8734
Epoch 35, Loss: 0.02578953033313155, Accuracy: 0.8739
Epoch 36, Loss: 0.02473074817098677, Accuracy: 0.8727
Epoch 37, Loss: 0.023948485157452525, Accuracy: 0.8734
Epoch 38, Loss: 0.023083161497488618, Accuracy: 0.8722
Epoch 39, Loss: 0.022216263893060386, Accuracy: 0.8724
Epoch 40, Loss: 0.021428094282746314, Accuracy: 0.8729
Epoch 41, Loss: 0.0207216667663306, Accuracy: 0.8725
Epoch 42, Loss: 0.020075792497955262, Accuracy: 0.8734
Epoch 43, Loss: 0.019438890577293932, Accuracy: 0.8726
Epoch 44, Loss: 0.018888883148320018, Accuracy: 0.8729
Epoch 45, Loss: 0.018310073078610004, Accuracy: 0.8743
Epoch 46, Loss: 0.01786278264131397, Accuracy: 0.8723
Epoch 47, Loss: 0.017336098682135342, Accuracy: 0.8726
Epoch 48, Loss: 0.016839303341694176, Accuracy: 0.8724
Epoch 49, Loss: 0.016438655094243585, Accuracy: 0.8735
Epoch 50, Loss: 0.016002707018051298, Accuracy: 0.8733
Epoch 51, Loss: 0.015571096451021732, Accuracy: 0.8734
Epoch 52, Loss: 0.015197516572661697, Accuracy: 0.8726
Epoch 53, Loss: 0.01483210429083556, Accuracy: 0.8721
Epoch 54, Loss: 0.0144432037579827, Accuracy: 0.8726
Epoch 55, Loss: 0.014115709997713565, Accuracy: 0.8724
Epoch 56, Loss: 0.013824242195114493, Accuracy: 0.8727
Epoch 57, Loss: 0.013513712235726416, Accuracy: 0.8724
Epoch 58, Loss: 0.013189801815897226, Accuracy: 0.8726
Epoch 59, Loss: 0.012927961926907301, Accuracy: 0.8727
Epoch 60, Loss: 0.012648520213551818, Accuracy: 0.8728
Epoch 61, Loss: 0.01237380321836099, Accuracy: 0.8719
Epoch 62, Loss: 0.01211765622952953, Accuracy: 0.8729
Epoch 63, Loss: 0.011900372395757586, Accuracy: 0.8733
Epoch 64, Loss: 0.01165993848349899, Accuracy: 0.8731
Epoch 65, Loss: 0.011433309789281339, Accuracy: 0.8723
Epoch 66, Loss: 0.011227149807382375, Accuracy: 0.8728
Epoch 67, Loss: 0.01101183719234541, Accuracy: 0.8725
Epoch 68, Loss: 0.010828937229234725, Accuracy: 0.8726
Epoch 69, Loss: 0.01060554861323908, Accuracy: 0.8718
Epoch 70, Loss: 0.010421425142558291, Accuracy: 0.8725
Epoch 71, Loss: 0.010229339357465506, Accuracy: 0.8725
Epoch 72, Loss: 0.0100756605155766, Accuracy: 0.8721
Epoch 73, Loss: 0.009899040868040174, Accuracy: 0.8715
Epoch 74, Loss: 0.009749302093405276, Accuracy: 0.8715
Epoch 75, Loss: 0.009569611989427358, Accuracy: 0.8733
Epoch 76, Loss: 0.009425292597152293, Accuracy: 0.8727
Epoch 77, Loss: 0.009264195358846337, Accuracy: 0.8721
Epoch 78, Loss: 0.009127193087479099, Accuracy: 0.8724
Epoch 79, Loss: 0.00898524905089289, Accuracy: 0.8723
Epoch 80, Loss: 0.008832463757134973, Accuracy: 0.8723
Epoch 81, Loss: 0.008708758614957332, Accuracy: 0.8713
Epoch 82, Loss: 0.008584543475881218, Accuracy: 0.8721
Epoch 83, Loss: 0.008455280263442546, Accuracy: 0.8722
Epoch 84, Loss: 0.0083364762715064, Accuracy: 0.8723
Epoch 85, Loss: 0.008208162419032305, Accuracy: 0.8729
Epoch 86, Loss: 0.008105757315643131, Accuracy: 0.8724
Epoch 87, Loss: 0.007989216134883464, Accuracy: 0.8722
Epoch 88, Loss: 0.007875825720839202, Accuracy: 0.8722
Epoch 89, Loss: 0.007766608237288892, Accuracy: 0.8721
Epoch 90, Loss: 0.00767501711146906, Accuracy: 0.8722
Epoch 91, Loss: 0.007565572921885177, Accuracy: 0.8719
Epoch 92, Loss: 0.0074633844604250045, Accuracy: 0.8718
Epoch 93, Loss: 0.007358712801942602, Accuracy: 0.8719
Epoch 94, Loss: 0.007279909494100139, Accuracy: 0.8724
Epoch 95, Loss: 0.007189764491049573, Accuracy: 0.8724
Epoch 96, Loss: 0.007097371772397309, Accuracy: 0.8716
Epoch 97, Loss: 0.0070083232445176695, Accuracy: 0.8723
Epoch 98, Loss: 0.0069213293492794035, Accuracy: 0.8721
Epoch 99, Loss: 0.006838312244508415, Accuracy: 0.8727
Epoch 100, Loss: 0.006754095115466044, Accuracy: 0.8717
Epoch 101, Loss: 0.006674795488361269, Accuracy: 0.8716
Epoch 102, Loss: 0.006585179039975628, Accuracy: 0.8728
Epoch 103, Loss: 0.006517461003968492, Accuracy: 0.8721
Epoch 104, Loss: 0.006449149603722617, Accuracy: 0.8722
Epoch 105, Loss: 0.006372452431824058, Accuracy: 0.8722
Epoch 106, Loss: 0.0063019146304577585, Accuracy: 0.8722
Epoch 107, Loss: 0.006214031298877671, Accuracy: 0.8722
Epoch 108, Loss: 0.006165632943157107, Accuracy: 0.8726
Epoch 109, Loss: 0.006090059682028368, Accuracy: 0.8726
Epoch 110, Loss: 0.006027454623254016, Accuracy: 0.8723
Epoch 111, Loss: 0.005960115757770837, Accuracy: 0.8725
Epoch 112, Loss: 0.005901065523503348, Accuracy: 0.8723
Epoch 113, Loss: 0.005839881401043385, Accuracy: 0.8726
Epoch 114, Loss: 0.005779328050557524, Accuracy: 0.8723
Epoch 115, Loss: 0.005719172375975177, Accuracy: 0.8725
Epoch 116, Loss: 0.005660945548443124, Accuracy: 0.8724
Epoch 117, Loss: 0.0056067858287133275, Accuracy: 0.872
Epoch 118, Loss: 0.005553687383653596, Accuracy: 0.8721
Epoch 119, Loss: 0.005491173712071032, Accuracy: 0.8723
Epoch 120, Loss: 0.005438129890244454, Accuracy: 0.8724
Epoch 121, Loss: 0.0053864287014584985, Accuracy: 0.8724
Epoch 122, Loss: 0.005330148183275014, Accuracy: 0.8723
Epoch 123, Loss: 0.005280320981983095, Accuracy: 0.8724
Epoch 124, Loss: 0.005229571026284248, Accuracy: 0.8722
Epoch 125, Loss: 0.0051804867188911885, Accuracy: 0.8726
Epoch 126, Loss: 0.005136047217529267, Accuracy: 0.8724
Epoch 127, Loss: 0.005084951400058344, Accuracy: 0.8727
Epoch 128, Loss: 0.005040062270127237, Accuracy: 0.8724
Epoch 129, Loss: 0.00499373213853687, Accuracy: 0.8726
Epoch 130, Loss: 0.004950132307130844, Accuracy: 0.872
Epoch 131, Loss: 0.004904215610586107, Accuracy: 0.8723
Epoch 132, Loss: 0.004856875179102644, Accuracy: 0.8723
Epoch 133, Loss: 0.004813711573369801, Accuracy: 0.8721
Epoch 134, Loss: 0.00477546792360954, Accuracy: 0.8722
Epoch 135, Loss: 0.004735157394316047, Accuracy: 0.8721
Epoch 136, Loss: 0.004693779923254624, Accuracy: 0.8723
Epoch 137, Loss: 0.004653818474616855, Accuracy: 0.8725
Epoch 138, Loss: 0.004610908387112431, Accuracy: 0.8723
Epoch 139, Loss: 0.004574819676345214, Accuracy: 0.8725
Epoch 140, Loss: 0.004536065114079974, Accuracy: 0.8723
Epoch 141, Loss: 0.004500375627540052, Accuracy: 0.8723
Epoch 142, Loss: 0.004460379459196702, Accuracy: 0.8722
Epoch 143, Loss: 0.004422831879928708, Accuracy: 0.8723
Epoch 144, Loss: 0.004387139801401645, Accuracy: 0.872
Epoch 145, Loss: 0.0043527864816132935, Accuracy: 0.8723
Epoch 146, Loss: 0.004314843084430322, Accuracy: 0.8725
Epoch 147, Loss: 0.004283120817271993, Accuracy: 0.8724
Epoch 148, Loss: 0.004246901428559795, Accuracy: 0.8719
Epoch 149, Loss: 0.004213269703905098, Accuracy: 0.8723
Epoch 150, Loss: 0.004179239843506366, Accuracy: 0.872
Epoch 151, Loss: 0.004146985519910231, Accuracy: 0.8722
Epoch 152, Loss: 0.004116269942605868, Accuracy: 0.8722
Epoch 153, Loss: 0.004082613876089453, Accuracy: 0.8722
Epoch 154, Loss: 0.004052915449719876, Accuracy: 0.872
Epoch 155, Loss: 0.004019025803427212, Accuracy: 0.8726
Epoch 156, Loss: 0.003991376632475294, Accuracy: 0.8722
Epoch 157, Loss: 0.00395896740956232, Accuracy: 0.8723
Epoch 158, Loss: 0.003932468290440738, Accuracy: 0.8721
Epoch 159, Loss: 0.003902717604069039, Accuracy: 0.8721
Epoch 160, Loss: 0.0038727907254360616, Accuracy: 0.8722
Epoch 161, Loss: 0.003844907159218565, Accuracy: 0.8718
Epoch 162, Loss: 0.0038146300194785, Accuracy: 0.8719
Epoch 163, Loss: 0.0037885506550082936, Accuracy: 0.8718
Epoch 164, Loss: 0.003761927291052416, Accuracy: 0.8717
Epoch 165, Loss: 0.0037343191372929143, Accuracy: 0.8715
Epoch 166, Loss: 0.0037076471466571093, Accuracy: 0.8722
Epoch 167, Loss: 0.003683471523690969, Accuracy: 0.8718
Epoch 168, Loss: 0.0036561070953030138, Accuracy: 0.8716
Epoch 169, Loss: 0.0036313842370873316, Accuracy: 0.8715
Epoch 170, Loss: 0.003605616603163071, Accuracy: 0.8715
Epoch 171, Loss: 0.003582297961693257, Accuracy: 0.8714
Epoch 172, Loss: 0.0035547465935815126, Accuracy: 0.8717
Epoch 173, Loss: 0.0035323927376884967, Accuracy: 0.8716
Epoch 174, Loss: 0.003508404188323766, Accuracy: 0.8719
Epoch 175, Loss: 0.003486423323629424, Accuracy: 0.8715
Epoch 176, Loss: 0.003464667713851668, Accuracy: 0.8713
Epoch 177, Loss: 0.003438445977517404, Accuracy: 0.8715
Epoch 178, Loss: 0.0034181723801884802, Accuracy: 0.8714
Epoch 179, Loss: 0.0033946451381780206, Accuracy: 0.8714
Epoch 180, Loss: 0.003375308957183734, Accuracy: 0.8714
Epoch 181, Loss: 0.003352334484225139, Accuracy: 0.8714
Epoch 182, Loss: 0.0033308669796679167, Accuracy: 0.8714
Epoch 183, Loss: 0.003310680742142722, Accuracy: 0.8712
Epoch 184, Loss: 0.003289305975777097, Accuracy: 0.8714
Epoch 185, Loss: 0.0032678735826630143, Accuracy: 0.8716
Epoch 186, Loss: 0.003249023495009169, Accuracy: 0.8715
Epoch 187, Loss: 0.003228127608308569, Accuracy: 0.8712
Epoch 188, Loss: 0.0032079139054985717, Accuracy: 0.8714
Epoch 189, Loss: 0.0031888568628346547, Accuracy: 0.871
Epoch 190, Loss: 0.0031706344644771888, Accuracy: 0.8711
Epoch 191, Loss: 0.003150515223969705, Accuracy: 0.8713
Epoch 192, Loss: 0.003133080011466518, Accuracy: 0.8714
Epoch 193, Loss: 0.0031122158240759743, Accuracy: 0.8712
Epoch 194, Loss: 0.0030953169090207665, Accuracy: 0.8713
Epoch 195, Loss: 0.003074698029085994, Accuracy: 0.8712
Epoch 196, Loss: 0.003058700527762994, Accuracy: 0.871
Epoch 197, Loss: 0.0030408232571789997, Accuracy: 0.8713
Epoch 198, Loss: 0.0030220175167778507, Accuracy: 0.8713
Epoch 199, Loss: 0.0030064428329933434, Accuracy: 0.8713
Epoch 200, Loss: 0.0029877432656940073, Accuracy: 0.871
Epoch 201, Loss: 0.0029709012963576244, Accuracy: 0.8711
Epoch 202, Loss: 0.002955302106565796, Accuracy: 0.8712
Epoch 203, Loss: 0.002937178130960092, Accuracy: 0.8713
Epoch 204, Loss: 0.0029213359107961878, Accuracy: 0.871
Epoch 205, Loss: 0.002905478093889542, Accuracy: 0.8712
Epoch 206, Loss: 0.002889775988296606, Accuracy: 0.8712
Epoch 207, Loss: 0.002872818151372485, Accuracy: 0.8711
Epoch 208, Loss: 0.0028572181588970126, Accuracy: 0.8714
Epoch 209, Loss: 0.0028422262269305067, Accuracy: 0.8713
Epoch 210, Loss: 0.002825307647581212, Accuracy: 0.8715
Epoch 211, Loss: 0.002811451707384549, Accuracy: 0.8714
Epoch 212, Loss: 0.0027960397017886864, Accuracy: 0.8712
Epoch 213, Loss: 0.002781112891389057, Accuracy: 0.8713
Epoch 214, Loss: 0.00276749431330245, Accuracy: 0.8712
Epoch 215, Loss: 0.002751429205527529, Accuracy: 0.8711
Epoch 216, Loss: 0.0027380616537993775, Accuracy: 0.8712
Epoch 217, Loss: 0.002723842751001939, Accuracy: 0.8713
Epoch 218, Loss: 0.002709630559547804, Accuracy: 0.8713
Epoch 219, Loss: 0.002695045521832071, Accuracy: 0.8714
Epoch 220, Loss: 0.0026809486822457983, Accuracy: 0.871
Epoch 221, Loss: 0.00266716729325708, Accuracy: 0.8709
Epoch 222, Loss: 0.002654139047372155, Accuracy: 0.871
Epoch 223, Loss: 0.002639865797245875, Accuracy: 0.8711
Epoch 224, Loss: 0.0026271248620469124, Accuracy: 0.871
Epoch 225, Loss: 0.002613979387097061, Accuracy: 0.8709
Epoch 226, Loss: 0.00260116049204953, Accuracy: 0.871
Epoch 227, Loss: 0.0025885800941614434, Accuracy: 0.871
Epoch 228, Loss: 0.0025751701486296952, Accuracy: 0.871
Epoch 229, Loss: 0.0025622550456319004, Accuracy: 0.8709
Epoch 230, Loss: 0.002550723711028695, Accuracy: 0.871
Epoch 231, Loss: 0.002537417007260956, Accuracy: 0.8712
Epoch 232, Loss: 0.0025241345341783018, Accuracy: 0.871
Epoch 233, Loss: 0.0025120416458230465, Accuracy: 0.8712
Epoch 234, Loss: 0.002500651905138511, Accuracy: 0.871
Epoch 235, Loss: 0.0024891715066041797, Accuracy: 0.8712
Epoch 236, Loss: 0.0024773248785641046, Accuracy: 0.8713
Epoch 237, Loss: 0.0024639483547070993, Accuracy: 0.8711
Epoch 238, Loss: 0.002453864697017707, Accuracy: 0.8713
Epoch 239, Loss: 0.002442161334911361, Accuracy: 0.8713
Epoch 240, Loss: 0.00243002197064925, Accuracy: 0.8714
Epoch 241, Loss: 0.002419068419840187, Accuracy: 0.8712
Epoch 242, Loss: 0.002407979649724439, Accuracy: 0.8712
Epoch 243, Loss: 0.0023968364484608174, Accuracy: 0.8711
Epoch 244, Loss: 0.0023848529922543093, Accuracy: 0.8712
Epoch 245, Loss: 0.002374072760867421, Accuracy: 0.871
Epoch 246, Loss: 0.002364567670156248, Accuracy: 0.8709
Epoch 247, Loss: 0.0023526177264284343, Accuracy: 0.8708
Epoch 248, Loss: 0.002342516577336937, Accuracy: 0.8709
Epoch 249, Loss: 0.002331469123600982, Accuracy: 0.871
Epoch 250, Loss: 0.0023213181976461782, Accuracy: 0.8709
Epoch 251, Loss: 0.0023112335213227196, Accuracy: 0.8709
Epoch 252, Loss: 0.0023000263603171335, Accuracy: 0.8709
Epoch 253, Loss: 0.002289841834572144, Accuracy: 0.871
Epoch 254, Loss: 0.00228015550179407, Accuracy: 0.871
Epoch 255, Loss: 0.0022699701314559204, Accuracy: 0.8711
Epoch 256, Loss: 0.0022601132368436083, Accuracy: 0.871
Epoch 257, Loss: 0.0022504444426158445, Accuracy: 0.8708
Epoch 258, Loss: 0.0022413146717008203, Accuracy: 0.8709
Epoch 259, Loss: 0.002230835007503629, Accuracy: 0.8709
Epoch 260, Loss: 0.0022212238347856326, Accuracy: 0.8709
Epoch 261, Loss: 0.00221187682938762, Accuracy: 0.8709
Epoch 262, Loss: 0.002202078797854483, Accuracy: 0.8708
Epoch 263, Loss: 0.002193367148283869, Accuracy: 0.8709
Epoch 264, Loss: 0.002183799280319363, Accuracy: 0.871
Epoch 265, Loss: 0.002174963945435593, Accuracy: 0.8709
Epoch 266, Loss: 0.002165203649783507, Accuracy: 0.871
Epoch 267, Loss: 0.0021560277676326224, Accuracy: 0.8711
Epoch 268, Loss: 0.0021474329760530963, Accuracy: 0.8711
Epoch 269, Loss: 0.002138752254541032, Accuracy: 0.8712
Epoch 270, Loss: 0.002129816602100618, Accuracy: 0.871
Epoch 271, Loss: 0.0021205379627645014, Accuracy: 0.8709
Epoch 272, Loss: 0.0021121595607837664, Accuracy: 0.8712
Epoch 273, Loss: 0.0021040490386076273, Accuracy: 0.871
Epoch 274, Loss: 0.002094379081390798, Accuracy: 0.871
Epoch 275, Loss: 0.0020858890347881243, Accuracy: 0.871
Epoch 276, Loss: 0.0020783076633233578, Accuracy: 0.8712
Epoch 277, Loss: 0.0020697774388827384, Accuracy: 0.8713
Epoch 278, Loss: 0.00206155615975149, Accuracy: 0.8712
Epoch 279, Loss: 0.0020534948847489433, Accuracy: 0.8712
Epoch 280, Loss: 0.002044400015147403, Accuracy: 0.8712
Epoch 281, Loss: 0.0020370573655236514, Accuracy: 0.871
Epoch 282, Loss: 0.002028559994651005, Accuracy: 0.8712
Epoch 283, Loss: 0.002021205201162957, Accuracy: 0.8711
Epoch 284, Loss: 0.0020127742271870376, Accuracy: 0.871
Epoch 285, Loss: 0.002004965897940565, Accuracy: 0.8712
Epoch 286, Loss: 0.0019970442206249574, Accuracy: 0.871
Epoch 287, Loss: 0.0019892251683631913, Accuracy: 0.8711
Epoch 288, Loss: 0.0019814619119279085, Accuracy: 0.8711
Epoch 289, Loss: 0.0019742965916520914, Accuracy: 0.8711
Epoch 290, Loss: 0.0019663532212143764, Accuracy: 0.871
Epoch 291, Loss: 0.0019586466933833434, Accuracy: 0.8711
Epoch 292, Loss: 0.0019517541368259117, Accuracy: 0.8712
Epoch 293, Loss: 0.001944709583185613, Accuracy: 0.8712
Epoch 294, Loss: 0.0019366715245996602, Accuracy: 0.8711
Epoch 295, Loss: 0.0019297550624469296, Accuracy: 0.8713
Epoch 296, Loss: 0.0019225001736776902, Accuracy: 0.8712
Epoch 297, Loss: 0.0019149558906792663, Accuracy: 0.8713
Epoch 298, Loss: 0.001907810341217555, Accuracy: 0.8713
Epoch 299, Loss: 0.0019006688188528643, Accuracy: 0.8713
Epoch 300, Loss: 0.0018933655810542405, Accuracy: 0.8711
Epoch 301, Loss: 0.0018862318247556686, Accuracy: 0.8713
Epoch 302, Loss: 0.0018795065802987665, Accuracy: 0.8713
Epoch 303, Loss: 0.0018727180315181613, Accuracy: 0.8712
Epoch 304, Loss: 0.0018659405471407808, Accuracy: 0.8713
Epoch 305, Loss: 0.0018592538722441532, Accuracy: 0.8713
Epoch 306, Loss: 0.0018523339219973423, Accuracy: 0.871
Epoch 307, Loss: 0.0018459917162545025, Accuracy: 0.8712
Epoch 308, Loss: 0.0018391731262090616, Accuracy: 0.8714
Epoch 309, Loss: 0.0018326091690687462, Accuracy: 0.871
Epoch 310, Loss: 0.001825822796090506, Accuracy: 0.8713
Epoch 311, Loss: 0.0018191748426761478, Accuracy: 0.8711
Epoch 312, Loss: 0.0018127130664652214, Accuracy: 0.8711
Epoch 313, Loss: 0.0018068717484129593, Accuracy: 0.8711
Epoch 314, Loss: 0.0018000303907319904, Accuracy: 0.8711
Epoch 315, Loss: 0.0017939521314110608, Accuracy: 0.871
Epoch 316, Loss: 0.001787262193101924, Accuracy: 0.8709
Epoch 317, Loss: 0.001781275784887839, Accuracy: 0.8711
Epoch 318, Loss: 0.0017752518854103983, Accuracy: 0.8711
Epoch 319, Loss: 0.0017690376954851673, Accuracy: 0.871
Epoch 320, Loss: 0.0017626609787112103, Accuracy: 0.8712
Epoch 321, Loss: 0.0017566242528846488, Accuracy: 0.8711
Epoch 322, Loss: 0.0017507885175291448, Accuracy: 0.871
Epoch 323, Loss: 0.0017445094406139106, Accuracy: 0.8711
Epoch 324, Loss: 0.001738639596151188, Accuracy: 0.8712
Epoch 325, Loss: 0.0017330055718775838, Accuracy: 0.8711
Epoch 326, Loss: 0.0017273072953685187, Accuracy: 0.8712
Epoch 327, Loss: 0.0017207844328368082, Accuracy: 0.8713
Epoch 328, Loss: 0.0017152234722743742, Accuracy: 0.8712
Epoch 329, Loss: 0.0017093861455214209, Accuracy: 0.8712
Epoch 330, Loss: 0.001703717683267314, Accuracy: 0.8711
Epoch 331, Loss: 0.0016980954818427562, Accuracy: 0.8711
Epoch 332, Loss: 0.0016924311791080982, Accuracy: 0.871
Epoch 333, Loss: 0.0016867760970490054, Accuracy: 0.871
Epoch 334, Loss: 0.0016811643372057006, Accuracy: 0.8711
Epoch 335, Loss: 0.001675440715334844, Accuracy: 0.871
Epoch 336, Loss: 0.00167006987525383, Accuracy: 0.8709
Epoch 337, Loss: 0.001664695209765341, Accuracy: 0.8711
Epoch 338, Loss: 0.0016590836699469946, Accuracy: 0.8709
Epoch 339, Loss: 0.0016538941190810874, Accuracy: 0.8711
Epoch 340, Loss: 0.0016486171932774596, Accuracy: 0.8711
Epoch 341, Loss: 0.0016430574402329513, Accuracy: 0.8711
Epoch 342, Loss: 0.0016376413294347002, Accuracy: 0.8712
Epoch 343, Loss: 0.0016324285630253143, Accuracy: 0.8711
Epoch 344, Loss: 0.0016271665808744729, Accuracy: 0.871
Epoch 345, Loss: 0.0016221266047796235, Accuracy: 0.871
Epoch 346, Loss: 0.0016166865325067193, Accuracy: 0.8711
Epoch 347, Loss: 0.001611373133782763, Accuracy: 0.8712
Epoch 348, Loss: 0.0016063527099322529, Accuracy: 0.8711
Epoch 349, Loss: 0.0016020132141420617, Accuracy: 0.8711
Epoch 350, Loss: 0.0015968477178830653, Accuracy: 0.8711
Epoch 351, Loss: 0.001591335761186201, Accuracy: 0.8711
Epoch 352, Loss: 0.001586663743655663, Accuracy: 0.8711
Epoch 353, Loss: 0.0015814493678044529, Accuracy: 0.8711
Epoch 354, Loss: 0.001576767333317548, Accuracy: 0.8711
Epoch 355, Loss: 0.001571656679152511, Accuracy: 0.8711
Epoch 356, Loss: 0.0015664439179818145, Accuracy: 0.8712
Epoch 357, Loss: 0.0015619764442089945, Accuracy: 0.8712
Epoch 358, Loss: 0.0015573275077622383, Accuracy: 0.871
Epoch 359, Loss: 0.0015525633713696153, Accuracy: 0.8711
Epoch 360, Loss: 0.0015474984649335966, Accuracy: 0.8712
Epoch 361, Loss: 0.001542620529071428, Accuracy: 0.8712
Epoch 362, Loss: 0.0015381045304820873, Accuracy: 0.8712
Epoch 363, Loss: 0.001533472722803708, Accuracy: 0.8713
Epoch 364, Loss: 0.0015287217087461614, Accuracy: 0.8711
Epoch 365, Loss: 0.0015244885924039408, Accuracy: 0.8712
Epoch 366, Loss: 0.0015197245820309036, Accuracy: 0.8713
Epoch 367, Loss: 0.0015152887077420018, Accuracy: 0.8712
Epoch 368, Loss: 0.0015105630664038472, Accuracy: 0.8713
Epoch 369, Loss: 0.0015060471434844658, Accuracy: 0.8713
Epoch 370, Loss: 0.0015017509550671092, Accuracy: 0.8713
Epoch 371, Loss: 0.0014970530266873539, Accuracy: 0.8713
Epoch 372, Loss: 0.0014926615337026306, Accuracy: 0.8713
Epoch 373, Loss: 0.0014883424041909166, Accuracy: 0.8713
Epoch 374, Loss: 0.0014837359666125848, Accuracy: 0.8713
Epoch 375, Loss: 0.0014795465845963917, Accuracy: 0.8712
Epoch 376, Loss: 0.0014751584330224433, Accuracy: 0.8712
Epoch 377, Loss: 0.0014710961113451049, Accuracy: 0.8712
Epoch 378, Loss: 0.0014666162710636854, Accuracy: 0.8713
Epoch 379, Loss: 0.0014625303848879413, Accuracy: 0.8713
Epoch 380, Loss: 0.0014581719791749493, Accuracy: 0.871
Epoch 381, Loss: 0.001453880798071623, Accuracy: 0.8712
Epoch 382, Loss: 0.001449954669806175, Accuracy: 0.871
Epoch 383, Loss: 0.0014454539271537214, Accuracy: 0.871
Epoch 384, Loss: 0.00144140433287248, Accuracy: 0.871
Epoch 385, Loss: 0.0014372784172883258, Accuracy: 0.8712
Epoch 386, Loss: 0.0014331030694302172, Accuracy: 0.8713
Epoch 387, Loss: 0.0014291053081979044, Accuracy: 0.8711
Epoch 388, Loss: 0.0014250443107448518, Accuracy: 0.8711
Epoch 389, Loss: 0.001421023474249523, Accuracy: 0.8713
Epoch 390, Loss: 0.001416933377913665, Accuracy: 0.8713
Epoch 391, Loss: 0.0014130184188252315, Accuracy: 0.871
Epoch 392, Loss: 0.0014092936442466452, Accuracy: 0.8713
Epoch 393, Loss: 0.001405337260221131, Accuracy: 0.8711
Epoch 394, Loss: 0.001401318472926505, Accuracy: 0.8709
Epoch 395, Loss: 0.0013972655023098924, Accuracy: 0.871
Epoch 396, Loss: 0.0013935671478975565, Accuracy: 0.8711
Epoch 397, Loss: 0.0013894930999958888, Accuracy: 0.8711
Epoch 398, Loss: 0.0013857105001807213, Accuracy: 0.8711
Epoch 399, Loss: 0.0013816894940100611, Accuracy: 0.8709
plt.plot(costs, label="Cost")
plt.title("Cost")
plt.show()
plt.plot(accuracies, label="Accuracy")
plt.title("Accuracy")
plt.show()

We can see that the accuracy plateaus after a certain point even though the cost keeps going down, which suggests the model is overfitting the 1,000 training examples.
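
One way to check this (a quick sketch of my own, not part of the run above) is to also measure the cross-entropy cost on the held-out evaluation data after each epoch: if that cost flattens or starts rising while the training cost keeps falling, the plateau really is overfitting rather than an optimization problem. The helper below reuses the same per-example loss as the training loop; the names one_hot and evaluation_cost are hypothetical.

# Sketch (assumed helpers, not from the original notebook): average cross-entropy on held-out data
import torch

def one_hot(label, num_classes=10):
    # evaluation_data stores integer labels, so build the 10-dimensional target vector
    v = torch.zeros(num_classes)
    v[int(label)] = 1.0
    return v

def evaluation_cost(model, evaluation_data):
    total = 0.0
    with torch.no_grad():
        for x, y in evaluation_data:
            x = torch.tensor(x).squeeze(1)
            y_vec = one_hot(y)
            y_hat = model(x)
            # same cross-entropy as the training loop, summed over the 10 output units
            total += -(y_vec * torch.log(y_hat) + (1 - y_vec) * torch.log(1 - y_hat)).sum().item()
    return total / len(evaluation_data)

Calling evaluation_cost(model, evaluation_data) once per epoch, next to the accuracy check, and plotting the result alongside costs would make any divergence between the two curves visible.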

# Training loop
training_data = training_data[:1000]
evaluation_data = test_data

n = len(training_data)
epochs = 100
mini_batch_size = 10
alpha = 0.5
lambd = 0.1
model = VanillaMNIST()

costs = []
accuracies = []

# Experiment: comparing regularized and non-regularized models
for j in range(epochs):
    random.shuffle(training_data)
    mini_batches = [
        training_data[k : k + mini_batch_size] for k in range(0, n, mini_batch_size)
    ]
    epoch_loss = 0
    for mini_batch in mini_batches:
        total_loss = torch.tensor([0.0])
        for x, y in mini_batch:
            x = torch.tensor(x).squeeze(1)
            y = torch.tensor(y).squeeze(1)

            # Calculate log loss and add to loss
            y_hat = model(x)
            loss = y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)
            total_loss += sum(loss)

        m = len(mini_batch)
        # L2 penalty over every parameter (weights and biases), scaled by the mini-batch size
        l2_reg = sum((p**2).sum() for p in model.parameters())
        log_loss = -(1 / m) * total_loss + (lambd / (2 * m)) * l2_reg
        model.zero_grad()
        log_loss.backward()
        for p in model.parameters():
            p.data = p.data - alpha * p.grad
        epoch_loss += log_loss.item()

    # Evaluate model on evaluation_data
    correct = 0
    for x, y in evaluation_data:
        x = torch.tensor(x).squeeze(1)
        y_hat = model(x)
        if torch.argmax(y_hat) == y:
            correct += 1

    costs.append(epoch_loss / len(mini_batches))
    accuracies.append(correct / len(evaluation_data))
    print(
        f"Epoch {j}, Loss: {epoch_loss / len(mini_batches)}, Accuracy: {correct/len(evaluation_data)}"
    )
Epoch 0, Loss: 2.811732646226883, Accuracy: 0.688
Epoch 1, Loss: 2.098552111387253, Accuracy: 0.7897
Epoch 2, Loss: 1.9765801417827606, Accuracy: 0.806
Epoch 3, Loss: 1.9365905106067658, Accuracy: 0.8352
Epoch 4, Loss: 1.9170798242092133, Accuracy: 0.8167
Epoch 5, Loss: 1.910514018535614, Accuracy: 0.82
Epoch 6, Loss: 1.8914796423912048, Accuracy: 0.8404
Epoch 7, Loss: 1.8967450559139252, Accuracy: 0.8208
Epoch 8, Loss: 1.8929740846157075, Accuracy: 0.7972
Epoch 9, Loss: 1.884774169921875, Accuracy: 0.8036
Epoch 10, Loss: 1.8786686623096467, Accuracy: 0.8362
Epoch 11, Loss: 1.8796998131275178, Accuracy: 0.8405
Epoch 12, Loss: 1.8953367912769317, Accuracy: 0.7959
Epoch 13, Loss: 1.8880756640434264, Accuracy: 0.8247
Epoch 14, Loss: 1.8752827227115632, Accuracy: 0.7868
Epoch 15, Loss: 1.8846649825572968, Accuracy: 0.8347
Epoch 16, Loss: 1.8643178629875183, Accuracy: 0.8252
Epoch 17, Loss: 1.8734481954574584, Accuracy: 0.8158
Epoch 18, Loss: 1.882199420928955, Accuracy: 0.8237
Epoch 19, Loss: 1.8695294904708861, Accuracy: 0.8312
Epoch 20, Loss: 1.8701487267017365, Accuracy: 0.7798
Epoch 21, Loss: 1.8746571886539458, Accuracy: 0.8275
Epoch 22, Loss: 1.8737100040912629, Accuracy: 0.8015
Epoch 23, Loss: 1.8749737966060638, Accuracy: 0.7958
Epoch 24, Loss: 1.8804458105564117, Accuracy: 0.7597
Epoch 25, Loss: 1.8606840109825133, Accuracy: 0.8365
Epoch 26, Loss: 1.8901359486579894, Accuracy: 0.8173
Epoch 27, Loss: 1.8789975214004517, Accuracy: 0.8171
Epoch 28, Loss: 1.8712555146217347, Accuracy: 0.8295
Epoch 29, Loss: 1.8660310840606689, Accuracy: 0.8339
Epoch 30, Loss: 1.8687487435340882, Accuracy: 0.8134
Epoch 31, Loss: 1.8768685281276702, Accuracy: 0.85
Epoch 32, Loss: 1.868555223941803, Accuracy: 0.8298
Epoch 33, Loss: 1.8685732638835908, Accuracy: 0.8234
Epoch 34, Loss: 1.8782653164863587, Accuracy: 0.8134
Epoch 35, Loss: 1.8778847789764403, Accuracy: 0.7236
Epoch 36, Loss: 1.862051661014557, Accuracy: 0.7965
Epoch 37, Loss: 1.8617102301120758, Accuracy: 0.8355
Epoch 38, Loss: 1.859102840423584, Accuracy: 0.8115
Epoch 39, Loss: 1.8760147869586945, Accuracy: 0.8351
Epoch 40, Loss: 1.8777650582790375, Accuracy: 0.8285
Epoch 41, Loss: 1.8769587755203248, Accuracy: 0.8204
Epoch 42, Loss: 1.887664693593979, Accuracy: 0.8441
Epoch 43, Loss: 1.894026916027069, Accuracy: 0.7511
Epoch 44, Loss: 1.8941694283485413, Accuracy: 0.842
Epoch 45, Loss: 1.8728662788867951, Accuracy: 0.8325
Epoch 46, Loss: 1.88304829955101, Accuracy: 0.8531
Epoch 47, Loss: 1.8651540327072142, Accuracy: 0.816
Epoch 48, Loss: 1.868752690553665, Accuracy: 0.8364
Epoch 49, Loss: 1.8535161626338958, Accuracy: 0.839
Epoch 50, Loss: 1.8771829319000244, Accuracy: 0.8512
Epoch 51, Loss: 1.8845791721343994, Accuracy: 0.7985
Epoch 52, Loss: 1.8726356601715088, Accuracy: 0.8214
Epoch 53, Loss: 1.8638122189044952, Accuracy: 0.8263
Epoch 54, Loss: 1.8748895335197449, Accuracy: 0.8187
Epoch 55, Loss: 1.8802372539043426, Accuracy: 0.8083
Epoch 56, Loss: 1.8966482651233674, Accuracy: 0.83
Epoch 57, Loss: 1.8658703136444093, Accuracy: 0.8292
Epoch 58, Loss: 1.8631226658821105, Accuracy: 0.8379
Epoch 59, Loss: 1.8755247700214386, Accuracy: 0.849
Epoch 60, Loss: 1.8732005119323731, Accuracy: 0.7769
Epoch 61, Loss: 1.8729154205322265, Accuracy: 0.8411
Epoch 62, Loss: 1.8653748643398285, Accuracy: 0.8209
Epoch 63, Loss: 1.8790792655944824, Accuracy: 0.8025
Epoch 64, Loss: 1.8639093244075775, Accuracy: 0.829
Epoch 65, Loss: 1.874306182861328, Accuracy: 0.7657
Epoch 66, Loss: 1.873477020263672, Accuracy: 0.8424
Epoch 67, Loss: 1.8652974033355714, Accuracy: 0.8212
Epoch 68, Loss: 1.8774346148967742, Accuracy: 0.8436
Epoch 69, Loss: 1.8679407298564912, Accuracy: 0.8155
Epoch 70, Loss: 1.8858467376232146, Accuracy: 0.8535
Epoch 71, Loss: 1.8650659394264222, Accuracy: 0.8127
Epoch 72, Loss: 1.889530326128006, Accuracy: 0.8398
Epoch 73, Loss: 1.8544638645648956, Accuracy: 0.8284
Epoch 74, Loss: 1.8877349662780762, Accuracy: 0.8206
Epoch 75, Loss: 1.8774409627914428, Accuracy: 0.835
Epoch 76, Loss: 1.8813735735416413, Accuracy: 0.8131
Epoch 77, Loss: 1.892322566509247, Accuracy: 0.8273
Epoch 78, Loss: 1.8682674264907837, Accuracy: 0.8324
Epoch 79, Loss: 1.8690711200237273, Accuracy: 0.845
Epoch 80, Loss: 1.877561490535736, Accuracy: 0.8352
Epoch 81, Loss: 1.8552240777015685, Accuracy: 0.8346
Epoch 82, Loss: 1.8648957848548888, Accuracy: 0.8352
Epoch 83, Loss: 1.864109970331192, Accuracy: 0.843
Epoch 84, Loss: 1.8665581059455871, Accuracy: 0.8332
Epoch 85, Loss: 1.859978528022766, Accuracy: 0.8399
Epoch 86, Loss: 1.8604642963409423, Accuracy: 0.7741
Epoch 87, Loss: 1.8703574883937835, Accuracy: 0.8261
Epoch 88, Loss: 1.8701434445381164, Accuracy: 0.8374
Epoch 89, Loss: 1.8783128380775451, Accuracy: 0.8478
Epoch 90, Loss: 1.8842784976959228, Accuracy: 0.8366
Epoch 91, Loss: 1.864534913301468, Accuracy: 0.7543
Epoch 92, Loss: 1.889243975877762, Accuracy: 0.8016
Epoch 93, Loss: 1.8540599608421326, Accuracy: 0.8119
Epoch 94, Loss: 1.8750174713134766, Accuracy: 0.8096
Epoch 95, Loss: 1.850728349685669, Accuracy: 0.8362
Epoch 96, Loss: 1.8600666892528535, Accuracy: 0.8332
Epoch 97, Loss: 1.870409198999405, Accuracy: 0.8343
Epoch 98, Loss: 1.8518181240558624, Accuracy: 0.8012
Epoch 99, Loss: 1.8821170127391815, Accuracy: 0.8406
plt.plot(costs, label="Cost")
plt.title("Cost")
plt.show()
plt.plot(accuracies, label="Accuracy")
plt.title("Accuracy")
plt.show()

Regularization doesn’t seem to be working here: the cost barely moves from around 1.87 and the evaluation accuracy hovers between roughly 0.80 and 0.85, worse than the unregularized run. I need to figure out what’s going wrong.
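
A likely culprit (my reading of the loop above, not verified by rerunning it): in the chapter 3 formulation the L2 term is scaled by λ/(2n), where n is the size of the whole training set, and only the weights are penalized, not the biases. The loop above divides the penalty by the mini-batch size m = 10 and sums over every parameter, which makes the regularization roughly 100 times stronger than intended for n = 1000 and may be why the cost stays pinned near the weight norm instead of tracking the cross-entropy. A sketch of the adjusted term, meant as a drop-in replacement for the corresponding lines inside the mini-batch loop and using the same variable names:

n = len(training_data)  # scale by the full training-set size (1000 here), not the mini-batch size
m = len(mini_batch)
# penalize the weights only; bias parameters are skipped by name
l2_reg = sum((p**2).sum() for name, p in model.named_parameters() if name.endswith("weight"))
log_loss = -(1 / m) * total_loss + (lambd / (2 * n)) * l2_reg

With that change the penalty becomes a small correction to the cross-entropy rather than the dominant part of the cost, which is closer to what the chapter 3 experiment intends; whether it also recovers the improved test accuracy would need another run.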