
DrStef/README.md

Machine Learning - Deep Learning Projects

Advanced projects

This section contains Research and Development projects in Machine Learning and Deep Learning that require original development. They call on our expertise in Digital Signal Processing, Optimization, Calculus, and Linear Algebra.


        Automatic Environmental Sound Classification (ESC) leverages the ESC-50 dataset (and its ESC-10 subset) developed by Karol J. Piczak, as detailed in his paper: Karol J. Piczak. 2015. "ESC: Dataset for Environmental Sound Classification." In Proceedings of the 23rd ACM International Conference on Multimedia (MM '15). Association for Computing Machinery, New York, NY, USA, 1015–1018. https://doi.org/10.1145/2733373.2806390

        This dataset serves as a foundation for research in audio event recognition.

        Advancements in ESC Using Multi-Feature CNNs:

        We propose a two-stage classification approach with multi-feature Convolutional Neural Networks (CNNs) that reaches up to 99% accuracy. This high accuracy is attributed to innovative pre-processing techniques that combine mel-spectrograms with complex wavelet transforms (CWT).
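As a small illustration of the mel-spectrogram side of the feature set, the sketch below places band centers on a mel-warped frequency axis. It assumes the widely used HTK-style mel formula, 8 bands, and a 22050 Hz upper limit (half of a 44.1 kHz sampling rate); none of these parameters comes from the project itself, and the function names are hypothetical.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_mels bands equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + (i + 1) * step) for i in range(n_mels)]


# Assumed parameters for illustration only (not taken from the project).
centers = mel_band_centers(n_mels=8, f_min=0.0, f_max=22050.0)
for c in centers:
    print(f"{c:9.1f} Hz")
```

The centers are evenly spaced in mel but increasingly far apart in Hz, which is why a mel-spectrogram gives more resolution to low frequencies than a linear-frequency spectrogram.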

        Resolution of Remaining Classification Challenges:

        A notable challenge in ESC-10 sound classification was the confusion between "sea waves" and "rain" sounds. This issue was addressed by developing an original transformation of the complex CWT, termed aT-CWT. This transformation replaces the phase component of the CWT for stationary and pseudo-stationary sounds with a Gaussian distribution, enhancing the model's ability to differentiate between similar-sounding environmental events.
        By integrating the aT-CWT transformation, the multi-feature CNN model has now achieved 100% accuracy in classifying environmental sounds from the ESC-10 dataset.
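The exact definition of aT-CWT is not given here, so the following is only one possible reading of "replacing the phase with a Gaussian distribution": keep the magnitude of each complex coefficient and draw a new phase from a zero-mean Gaussian. The function name, the `sigma` parameter, and the tiny coefficient matrix are all hypothetical, and the stationarity test that decides when to apply the transformation is not shown.

```python
import cmath
import random
from typing import List


def replace_phase_with_gaussian(
    coeffs: List[List[complex]], sigma: float = 1.0, seed: int = 0
) -> List[List[complex]]:
    """Keep each coefficient's magnitude; draw its phase from N(0, sigma^2).

    This is an illustrative assumption, not the author's aT-CWT definition.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        [cmath.rect(abs(c), rng.gauss(0.0, sigma)) for c in row]
        for row in coeffs
    ]


# Hypothetical 2x2 matrix standing in for complex CWT coefficients.
cwt = [[1 + 1j, 0 - 2j], [3 + 0j, -1 + 1j]]
transformed = replace_phase_with_gaussian(cwt)
```

Under this reading the magnitude scalogram is unchanged, and only the phase image fed to the network differs.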


          We develop an automatic unsupervised classification model, or automatic diagnosis model, for detecting failures or breakdowns of industrial machinery based on their acoustic characteristics, recorded with an 8-microphone circular array.

          The model is based on the MIMII dataset made available by Hitachi, Ltd. under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
          https://zenodo.org/records/3384388

          Unlike most classification models found in the literature, this study deviates somewhat from the rules of the original challenge, which is the classification of noisy signals. Since we have access to multiple channels, it makes practical sense to denoise the signals before starting the classification process. The challenge here is therefore to turn the 8-microphone array into a "sensor" for monitoring industrial machinery sounds in noisy environments. We then apply the classification model to these denoised signals to automatically identify anomalies, failures, or breakdowns.

          Rather than classifying various types of machines (pumps, fans, valves, sliders), our focus will be:

          • Concentrating on a specific machine type: valves.
          • Denoising the recordings using MVDR beamforming combined with a custom, fixed Generalized Sidelobe Canceler (GSC).
          • Applying unsupervised classification techniques (auto-encoder, etc.) to two sets of signals: single microphone recordings and the denoised GSC output, to detect defective valves and demonstrate the benefits of MVDR beamforming combined with GSC.
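For reference, the textbook forms of the two beamformers named above can be written as follows. The symbols are assumptions introduced here: $N = 8$ microphones, $\mathbf{R}$ the $N \times N$ noise (or noise-plus-interference) covariance matrix, $\mathbf{d}$ the steering vector toward the machine, and $\mathbf{B}$ an $N \times (N-1)$ blocking matrix. The project's custom, fixed GSC may differ from this generic formulation.

```latex
% MVDR: minimize output power subject to a distortionless response toward d
\mathbf{w}_{\mathrm{MVDR}}
  = \arg\min_{\mathbf{w}} \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
  \quad \text{s.t.} \quad \mathbf{w}^{H}\mathbf{d} = 1
  \;\;\Longrightarrow\;\;
\mathbf{w}_{\mathrm{MVDR}}
  = \frac{\mathbf{R}^{-1}\mathbf{d}}{\mathbf{d}^{H}\mathbf{R}^{-1}\mathbf{d}}

% GSC: the same constrained problem split into a fixed and an unconstrained part
\mathbf{w}_{\mathrm{GSC}} = \mathbf{w}_{q} - \mathbf{B}\,\mathbf{w}_{a},
\qquad
\mathbf{w}_{q} = \frac{\mathbf{d}}{\mathbf{d}^{H}\mathbf{d}},
\qquad
\mathbf{B}^{H}\mathbf{d} = \mathbf{0},
\qquad
\mathbf{w}_{a} = \left(\mathbf{B}^{H}\mathbf{R}\,\mathbf{B}\right)^{-1}
                 \mathbf{B}^{H}\mathbf{R}\,\mathbf{w}_{q}
```

Because $\mathbf{B}^{H}\mathbf{d} = \mathbf{0}$, the lower branch contains no component from the look direction, so subtracting it reduces noise without distorting the target signal.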

          Applications
          - Rotating machinery failure detection: bearings, motors, rotors.
          - HVAC Fault detection and diagnosis (FDD): pumps, compressors, valves.

          In this project, we are developing effective methods for classifying mitochondrial genomes (DNA sequences) using Digital Signal Processing (DSP), Machine Learning (ML), and Deep Learning (DL). This research is ongoing, and we plan to publish our results regularly. As a starting point, we analyzed the paper:
          "ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels" by Gurjit S. Randhawa, Kathleen A. Hill and Lila Kari. https://doi.org/10.1186/s12864-019-5571-y

          ML-DSP, the alignment-free DNA sequence classification approach proposed by Gurjit S. Randhawa, has proven to be very effective.
          By introducing a simple alignment technique alongside short Fast Fourier Transforms (FFTs), termed ML-FFT + SoftAlign, we have surpassed the performance of ML-DSP, particularly on challenging datasets such as those from Fungi and Insects.
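To make the DSP view of a DNA sequence concrete, here is a minimal sketch of an ML-DSP-style feature pipeline: map bases to numbers, take the magnitude spectrum, and compare spectra with a Pearson-correlation distance. It assumes a purine/pyrimidine mapping and equal-length sequences, uses a naive DFT that is only practical for short toy inputs, and does not implement ML-FFT or SoftAlign. The sequences and function names are hypothetical.

```python
import cmath
import math
from typing import List

# Assumed purine/pyrimidine mapping: A, G -> -1 and C, T -> +1.
PP_MAP = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def to_signal(seq: str) -> List[float]:
    """Turn a DNA string into a discrete numerical signal."""
    return [PP_MAP[base] for base in seq.upper()]


def magnitude_spectrum(x: List[float]) -> List[float]:
    """Naive O(n^2) DFT magnitude; real genomes would need an FFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def pearson_distance(u: List[float], v: List[float]) -> float:
    """Distance (1 - r) / 2, where r is the Pearson correlation of u and v."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return (1.0 - cov / (su * sv)) / 2.0


# Hypothetical toy sequences of equal length.
spec_a = magnitude_spectrum(to_signal("ACGTACGTAC"))
spec_b = magnitude_spectrum(to_signal("ACGTTCGTAC"))
d_ab = pearson_distance(spec_a, spec_b)
print(f"distance = {d_ab:.4f}")
```

A matrix of such pairwise distances can then serve as the feature input to a supervised classifier.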

        • Deep Learning and Digital Signal Processing: Voice Activity Detection (VAD)

        • Machine Learning and Digital Signal Processing: Sound Source Localization (SSL)



      Standard projects

      This section is a portfolio of Machine Learning projects with Python and various visualization and analysis tools. Most of these projects were carried out within the framework of IBM certifications. They are presented as Jupyter Notebooks.
      Some projects have been improved by incorporating more in-depth data analysis, better graphs, and advanced ML techniques.

          In this project, we predict whether the Falcon 9 first stage will land successfully. The project includes: SpaceX data collection, data wrangling, web scraping, EDA with SQL queries and data visualization, a SpaceX Launch Records dashboard, launch site location analysis with Folium, and Machine Learning classification with hyperparameter optimization and selection of the best model among KNN, Decision Tree, SVM, and Logistic Regression.
          A wide range of small projects with various ML techniques, prediction, and supervised and unsupervised classification: Linear Regression, Polynomial Regression, Non-Linear Regression, Recommendation Systems, KNN, Customer Segmentation with K-Means, Hierarchical Clustering, Density-Based Clustering, Logistic Regression.
          The project consists of finding the best model for predicting home prices in King County, Washington State, USA, based on a dataset of homes sold between May 2014 and May 2015. Prediction accuracy was improved by implementing a spline regression model.
          One Jupyter Notebook includes interactive Folium maps (interactive maps will not display on GitHub).

          Loan Status Prediction using Supervised Classification Algorithms: KNN, Decision Tree, SVM, Logistic Regression.
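The classification projects above tune hyperparameters before comparing models. As a minimal sketch of that idea, the code below picks the number of neighbors k for a KNN classifier by leave-one-out accuracy, using only the standard library. The toy points and labels are hypothetical, and the notebooks themselves may well use a library grid search instead.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_predict(train_X: List[Point], train_y: List[int], x: Point, k: int) -> int:
    """Majority label among the k training points closest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X: List[Point], y: List[int], k: int) -> float:
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


def best_k(X: List[Point], y: List[int], candidates: Sequence[int]) -> int:
    """Candidate k with the highest leave-one-out accuracy (first one on ties)."""
    return max(candidates, key=lambda k: loo_accuracy(X, y, k))


# Hypothetical toy data: two well-separated clusters of three points each.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
chosen = best_k(X, y, candidates=[1, 3, 5])
```

The same loop structure extends to other models by swapping the prediction function and the hyperparameter being searched.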

      Data Analysis

          We use an older housing-price dataset derived from the U.S. Census Service to present insights based on our experience in statistics: the median value of houses bounded by the Charles River and of owner-occupied units built before 1940, the relationship between nitric oxide concentrations and the proportion of non-retail business acres per town, and the impact of the weighted distance to the five Boston employment centres on the median value of owner-occupied homes.

          Dataset: a car dataset including various makes, specifications, and prices.
          After cleaning the dataset, running statistics, and identifying the most relevant variables, we develop several models that predict the price of a car from a set of features.

          Word Cloud

          Folium with markers

          Choropleth

        • Databases and SQL for Data Science

        • Stock extraction & visualization - yFinance, web scraping



      Digital Signal Processing


      Modeling and Scientific Computing


          "Figure 8" toroid

          Gyroid

          Truncated cuboctahedron

          Helicoid-Catenoid

        • Linear Algebra problems




      🔭 I’m currently working on advanced projects in ML & DL
      👯 I’m looking to collaborate on Digital Signal Processing, Machine Learning, Deep Learning
      📫 How to reach me: [email protected]
