About me

My name is Joao Marcos Visotaky Junior.

I have a bachelor's degree in Computer Science from UNESP (Universidade Estadual Paulista) and work on personal Data Science projects to gain experience in solving business problems.

As the projects below demonstrate, I am able to implement a complete (“end-to-end”) Data Science project, from gathering business requirements to publishing the solution in the cloud, including building tools that give non-technical users access to the Machine Learning models.

In my work as a Data Scientist, I aim to improve the company's decision-making process using Data Science tools.

Skills

Programming Languages and Databases

  • Python with a focus on Data Analysis.
  • Web Scraping with Python.
  • SQL for data storage and extraction.
  • PySpark and Databricks for Big Data projects.
  • Databases: SQLite, PostgreSQL, MySQL.

Statistics and Machine Learning

  • Descriptive Statistics (location, dispersion, asymmetry, kurtosis and density).
  • Algorithms: Supervised Learning (Regression, Classification and Learning to Rank).
  • Data balancing, attribute selection and dimensionality reduction techniques.
  • Algorithm performance metrics (RMSE, MAE, MAPE, Confusion Matrix, Precision, Recall, cumulative gain curve, ROC curve and AUC).
  • Machine Learning Stack: scikit-learn and SciPy.

Data Visualization

  • Visualization Libraries: Matplotlib, Plotly, Seaborn, Folium.
  • Visualization Tools: Telegram Bot, Streamlit.

Software Engineering

  • Git, GitHub, GitLab, Virtual Environments, Cookiecutter.
  • Streamlit API, Flask and Python APIs.
  • MLOps: DVC, Heroku, AWS (S3, EC2, Lambda), Databricks.

Professional Experience

April/2022: Data Scientist at KPMG Brazil

Building data solutions for large Brazilian companies in the industrial sector.

Oct/21 - Mar/22: 3 Complete Data Science Projects

Built data solutions for business problems using public data from Data Science competitions, covering each problem from the framing of the business challenge to the publication of the Machine Learning model with Cloud Computing tools.

2013-2021: Judicial Clerk

Worked in Accounting and Court Case Distribution. Responsible for preparing judicial calculations.

2013: Trading System Optimization Using a Genetic Algorithm

Graduation Project for the Computer Science degree. In this work I analyzed the profitability of a genetic algorithm dedicated to buying and selling shares of Petrobras, Brazil's largest oil company.

Data Science Projects

Exploratory Data Analysis of Real Estate Data

In this project I used data analysis tools to identify which properties were below market price.

Tools used:

  • Git and GitHub.
  • Python, Pandas, Matplotlib, Numpy and Seaborn.
  • Jupyter Notebook and VS Code.
  • Interactive maps with Plotly and Folium.
  • Heroku Cloud.
  • Streamlit Python Framework.

Sales Forecast of a Drugstore Chain

In this project I used Regression Machine Learning algorithms to forecast sales for the next 6 weeks, with a separate estimate for each store. In addition, I created a Telegram bot that gives access to the forecast for each store.

Tools used:

  • Git and GitHub.
  • Python, Pandas, Matplotlib, Numpy and Seaborn.
  • Jupyter Notebook and VS Code.
  • Boruta Feature Selector.
  • Linear Regression, Random Forest and XGBoost.
  • Hyperparameter Fine Tuning.
  • Model Deployment on Heroku Cloud.
  • Telegram Bot to access forecasts.

Optimizing the Marketing Budget with Machine Learning

In this project I identified the customers of a health insurer who are most likely to buy an additional insurance product (cross-sell).

Tools used:

  • Git, GitLab and GitHub.
  • Python, Pandas, Matplotlib, Numpy and Seaborn.
  • Jupyter Notebook and VS Code.
  • ExtraTrees as Feature Selector.
  • Linear Regression, RandomForest, ExtraTrees and KNN (K-Nearest Neighbors) Models.
  • Data Storage: AWS S3 bucket.
  • Deployment: Docker (container with Lambda function) + AWS (S3 Bucket and Lambda).
  • Model Visualization: Streamlit app hosted on AWS EC2 instance.

Machine Learning Optimization on Databricks

In this project I used Databricks tools to train, test and optimize various Machine Learning models.

Tools used:

  • Python, Pandas and PySpark.
  • Linear Regression, RandomForest and KNN (K-Nearest Neighbors) Models.
  • Model Optimization: Hyperopt Library.
  • Model and Experiments Registry: MLflow.

Get in touch