Thesis on the topic "ТЮМГУ | Development of Drum Transcription Software Based on Convolutional Neural Networks"

Work on the topic: Development of Drum Transcription Software Based on Convolutional Neural Networks
Grade: good.
Originality at the time of publication: 50+% on antiplagiat.ru.
Purchase details are provided below.
https://studentu24.ru/list/suppliers/Anastasiya1---1326

Description of the work

MINISTRY OF SCIENCE AND HIGHER EDUCATION OF THE RUSSIAN FEDERATION
Federal State Autonomous Educational Institution of Higher Education
«UNIVERSITY OF TYUMEN»
SCHOOL OF ADVANCED STUDIES (SAS)

GRADUATE QUALIFICATION WORK
Bachelor's thesis
DEVELOPMENT OF DRUM TRANSCRIPTION SOFTWARE BASED ON CONVOLUTIONAL NEURAL NETWORKS / РАЗРАБОТКА ПРОГРАММНОГО ОБЕСПЕЧЕНИЯ ТРАНСКРИПЦИИ БАРАБАНОВ НА ОСНОВЕ СВЕРТОЧНЫХ НЕЙРОННЫХ СЕТЕЙ

09.03.03 Applied Informatics
Major «Information Technologies and Systems Analysis»

Tyumen 2023


TABLE OF CONTENTS
LIST OF ABBREVIATIONS
INTRODUCTION
PREMISE OF THE WORK
CHAPTER 1. DATASET
1.1. AVAILABLE DATASETS
1.2. SONG SELECTION
1.3. FREQUENCY ANALYSIS
1.4. FEATURE SELECTION
1.5. SOURCE SEPARATION
1.6. PREPROCESSING & LABELING
CHAPTER 2. MODEL
2.1. CONVOLUTIONAL NEURAL NETWORKS
2.2. MODEL ARCHITECTURE
2.3. HYPERPARAMETERS
2.4. ITERATIVE LABELING
2.5. RESULTS & PERFORMANCE
CHAPTER 3. SOFTWARE DESIGN
3.1. SURVEY CASE STUDY
3.2. BENCHMARKING
3.3. APPLICATION IDEOLOGY
3.4. UX/UI
CHAPTER 4. FUTURE ENDEAVORS
4.1. ALTERNATIVES & LIMITATIONS
4.2. FUTURE IMPROVEMENTS
CONCLUSION
BIBLIOGRAPHY

LIST OF ABBREVIATIONS
AI – Artificial Intelligence
ADT – Automatic Drum Transcription
CNN – Convolutional Neural Network
CPU – Central Processing Unit
CRNN – Convolutional-Recurrent Neural Network
DAW – Digital Audio Workstation
dB – Decibel
DNN – Deep Neural Network
GeMM – General Matrix Multiply
GPU – Graphics Processing Unit
Hz – Hertz
MFCC – Mel-Frequency Cepstral Coefficients
MIDI – Musical Instrument Digital Interface
MIR – Music Information Retrieval
ReLU – Rectified Linear Unit
RNN – Recurrent Neural Network
STFT – Short-Time Fourier Transform
UI – User Interface
UX – User Experience
ViT – Vision Transformer
ZCR – Zero Crossing Rate

INTRODUCTION
Machine learning and AI automation are slowly but surely integrating into our daily lives, and this transformation is more visible now than ever. Take, for instance, ChatGPT [Training language models to follow instructions with human feedback, 2022], the generative pre-trained transformer model that has shaken up the academic world and turned it firmly on its head. It can write articles, blog posts, and summaries, simulate a job interview, write a program, give advice, and converse with you almost like a human being would. Consequently, the biggest question that arises is: what sort of skills do we need to cultivate when technological advancements devalue the skills and talents that were useful in the past? Do I need to learn how to program, or can I use the code snippets the chatbot gives me? Do I still need to be a skilled writer, or could I delegate some aspects of that to an AI? Similarly, the job of creating music scores from performances comes under scrutiny, and this is the focus of my project: generating drum notation from raw audio data using a Convolutional Neural Network (CNN) model, with critical thought as the main crux for teaching beginners.
Time and time again you will find that it is incredibly tempting to become complacent with new technology such as this: you might use the code a generative model offered without reviewing it first, leading to severe security issues; you might take some information it gives for granted, which can lead you down the wrong path; you might develop bad practices just by blindly following orders. Other times you think to yourself, “I can do way better than that,” taking the core premises of the outputs and restructuring them in different formations. AI models have the potential to give you some of the essentials, leaving you to take these essentials apart and put them back together in a way that makes sense for you. The main point I try to convey is that it remains essential to maintain critical thought when using AI-driven software. Therefore, my philosophy in developing the project I named DrummerScore is that while it does not have to be perfect, it does have to work well enough to provide the user with drum scores they can then critically reconsider on their own.
In Chapter 1, I explain the need to create my own dataset and give a brief introduction to frequency analysis. Next, we evaluate which spectral and temporal features would be most beneficial for the task of identifying drum sounds with a CNN. After choosing the features, we decide on the preprocessing and labeling techniques whose output serves as input for the model. Chapter 2 focuses on the methodology and architecture used in creating the CNN model. Subsequently, I explain the process of iterative labeling and, finally, the performance of the resulting model. Chapter 3 delves into the software design aspect, covering a series of surveys taken by drummers at a local drumming school to learn how they create drum scores. Additionally, software benchmarking is performed to determine whether similar solutions exist and to analyze their features, such as audio visualization. Based on the results, I formulate an application ideology and a potential UX/UI mockup. Finally, in Chapter 4, future improvements, alternatives, and limitations of the project are discussed. The conclusion summarizes the findings and insights gained from the research.
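The frequency analysis that Chapter 1 builds on starts from the Short-Time Fourier Transform (STFT) listed in the abbreviations. As a minimal illustrative sketch only, not code from the thesis itself (the function name and parameter values below are my own assumptions), an STFT magnitude spectrogram can be computed with plain NumPy:

```python
import numpy as np

def stft_magnitude(signal, frame_size=2048, hop=512):
    """Split the signal into Hann-windowed frames and return |rFFT| per frame."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_size] * window
                       for i in range(n_frames)])
    # Rows are time frames, columns are frequency bins from 0 up to sr/2
    return np.abs(np.fft.rfft(frames, axis=1))

# Sanity check: a 440 Hz sine sampled at 22050 Hz should peak near
# bin 440 / (22050 / 2048) ≈ 41
sr = 22050
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 440 * t))
peak_bin = int(spec.mean(axis=0).argmax())
```

Spectrograms like this (or their mel-scaled variants) are the typical two-dimensional input that makes drum sound identification tractable for a CNN; features such as MFCCs and spectral contrast are derived from the same representation.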
The significance of my project is that it is among the first attempts to package a trained Automatic Drum Transcription (ADT) neural network into easily accessible software for drummers of any skill level. Another aspect discussed is audio visualization, which is very limited in current scoring software and which DrummerScore also aims to improve. I believe that remedying these aspects and adding neural-network-assisted scoring will enhance both the software's teaching capacity and its accessibility.

BIBLIOGRAPHY

1. Bakhuis D. pigeonXT - Quickly annotate data in Jupyter Lab [Website] / D. Bakhuis.
2. Callender L. Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset / L. Callender, C. Hawthorne, J. Engel. 2020. P. 1-11.
3. Casadesus J. Tux Guitar [Website] / J. Casadesus.
4. Chabaud M. Automatic drum transcription [Website] / M. Chabaud.
5. Cough Recognition Based on Mel-Spectrogram and Convolutional Neural Network / Q. Zhou [et al.] // Frontiers in Robotics and AI. 2021. Vol. 8. P. 1-7.
6. Defossez A. Hybrid Spectrogram and Waveform Source Separation / A. Defossez. 2021. P. 1-11.
7. Dittmar C. Real-Time Transcription and Separation of Drum Recordings Based on NMF Decomposition. / C. Dittmar, D. Gartner // DAFx. 2014. P. 187-194.
8. Drumstik [Website].
9. Gajhede N. Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples / N. Gajhede, O. Beck, H. Purwins // Proceedings of the Audio Mostly 2016 (AM ’16). Norrkoping, Sweden: ACM, 2016. P. 111-115.
10. Germanidis A. pigeon - Quickly annotate data on Jupyter [Website] / A. Germanidis.
11. Giannakopoulos T. Audio Features / T. Giannakopoulos, A. Pikrakis // Introduction to Audio Analysis. Elsevier, 2014. P. 59-103.
12. GPU Computing Revolution: CUDA / R.S. Dehal [et al.] // 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN). Greater Noida (UP), India: IEEE, 2018. P. 197-201.
13. Groove Scribe [Website].
14. Holz T. Automatic drum transcription with deep neural networks / T. Holz. 2022. P. 1-63.
15. Jones J. Gear Review: Groove Scribe – The Free Transcription Tool That Will Change Your Life [Website] / J. Jones.
16. Kay A. Introduction and Review of Statistics / A. Kay // Operational Amplifier Noise. Elsevier, 2012. P. 1-11.
17. Kingma D.P. Adam: A Method for Stochastic Optimization / D.P. Kingma, J. Ba. arXiv:1412.6980 [cs]. arXiv, 2017. P. 1-15.
18. Kuhn M. Applied Predictive Modeling / M. Kuhn, K. Johnson. New York, NY: Springer New York, 2013. P. 419.
19. Kumar A. Difference: Binary, Multiclass & Multi-label Classification - Data Analytics [Website] / A. Kumar.
20. Learning to Groove with Inverse Sequence Transformations / J. Gillick [et al.] arXiv:1905.06118 [cs, eess, stat]. arXiv, 2019. P. 1-11.
21. librosa: Audio and Music Signal Analysis in Python / B. McFee [et al.] // Python in Science Conference. Austin, Texas, 2015. P. 18-24.
22. Lopez-Poveda E.A. Development of Fundamental Aspects of Human Auditory Perception / E.A. Lopez-Poveda // Development of Auditory and Vestibular Systems. Elsevier, 2014. P. 287-314.
23. Lunaverus. AnthemScore - Automatic Music Transcription Software [Website] / Lunaverus.
24. matplotlib [Website] / T.A. Caswell [et al.].
25. MuseScore [Website].
26. Music type classification by spectral contrast feature / Dan-Ning Jiang [et al.] // Proceedings of the IEEE International Conference on Multimedia and Expo (ICME). Lausanne, Switzerland: IEEE, 2002. P. 113-116.
27. O’Shaughnessy D. Speech Communication: Human and Machine (Addison-Wesley Series in Electrical Engineering) / D. O’Shaughnessy. Reading, Mass.: Addison-Wesley Pub. Co, 1987. P. 1-568.
28. O’Shea K. An Introduction to Convolutional Neural Networks / K. O’Shea, R. Nash arXiv:1511.08458 [cs]. arXiv, 2015. P. 1-11.
29. Pakhomov D. DrummerScore - Automatic Drum Transcription (ADT) [Website] / D. Pakhomov.
30. Plotly Technologies Inc. Collaborative data science [Website] / Plotly Technologies Inc.
31. PyTorch: An Imperative Style, High-Performance Deep Learning Library / A. Paszke [et al.] // Advances in Neural Information Processing Systems. Curran Associates, Inc., 2019. Vol. 32. P. 1-12.
32. Reid L. Musink [Website] / L. Reid.
33. Roeder L. Netron, Visualizer for neural network, deep learning, and machine learning models [Website] / L. Roeder.
34. Rupali G. Analysis of MFCC and Multitaper MFCC Feature Extraction Methods / G. Rupali, S.K. Bhatia // International Journal of Computer Applications. 2015. Vol. 131. № 4. P. 7-10.
35. Songsterr [Website].
36. Stevens S.S. A Scale for the Measurement of the Psychological Magnitude Pitch / S.S. Stevens, J. Volkmann, E.B. Newman // The Journal of the Acoustical Society of America. 1937. Vol. 8. № 3. P. 185-190.
37. Toontrack. Superior Drummer 3 [Website] / Toontrack.
38. Training language models to follow instructions with human feedback / L. Ouyang [et al.] arXiv:2203.02155 [cs]. arXiv, 2022. P. 1-68.
39. Vause R. Daily Drum [Website] / R. Vause.
40. Vogl R. Towards multi-instrument drum transcription / R. Vogl, G. Widmer, P. Knees arXiv:1806.06676 [cs, eess]. arXiv, 2018. P. 1-8.
41. Writing Drums in Musink Pro [Website].
42. Yang J. Spectral contrast enhancement: Algorithms and comparisons / J. Yang, F.-L. Luo, A. Nehorai // Speech Communication. 2003. Vol. 39. № 1-2. P. 33-46.
43. YoshiMan. Building an Audio Classification Model for Automatic Drum Transcription — Here’s What I Learnt [Website] / YoshiMan.
