References
Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Alessandro Moschitti, Bo Pang, and Walter Daelemans, editors, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1724–1734. Doha, Qatar, October 2014. Association for Computational Linguistics. URL: https://aclanthology.org/D14-1179/, doi:10.3115/v1/D14-1179.
J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, November 1997. URL: https://doi.org/10.1162/neco.1997.9.8.1735, doi:10.1162/neco.1997.9.8.1735.
Warren McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.
M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969.
Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, November 1958. URL: http://www.ncbi.nlm.nih.gov/pubmed/13602029.
David Rumelhart, James McClelland, and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. MIT Press, Cambridge, MA, 1986.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, 6000–6010. Red Hook, NY, USA, 2017. Curran Associates Inc.