About me

My name is Joao Marcos Visotaky Junior.

I have a bachelor's degree in Computer Science from UNESP (Universidade Estadual Paulista) and work on personal Data Science projects to gain experience in solving business problems.

As the projects below demonstrate, I am able to implement a complete ("end-to-end") Data Science project, from gathering business requirements to publishing the solution in the cloud, including building tools that give non-technical users access to the Machine Learning models.

In my work as a Data Scientist, I aim to improve the company's decision-making process using Data Science tools.

Skills

Programming Languages and Databases

Python, with a focus on Data Analysis.
Web Scraping with Python.
SQL for data storage and extraction.
PySpark and Databricks for Big Data projects.
Databases: SQLite, PostgreSQL, MySQL.

Statistics and Machine Learning

Descriptive Statistics (location, dispersion, skewness, kurtosis and density).
Algorithms: Supervised Learning (Regression, Classification and Learning to Rank).
Data balancing, attribute selection and dimensionality reduction techniques.
Algorithm performance metrics (RMSE, MAE, MAPE, Confusion Matrix, Precision, Recall, Cumulative Gain curve, ROC curve and AUC).
Machine Learning Stack: Scikit-learn and SciPy.

Data Visualization

Visualization Libraries: Matplotlib, Plotly, Seaborn, Folium.
Visualization Tools: Telegram Bot, Streamlit.

Software Engineering

Git, GitHub, GitLab, Virtual Environment, Cookiecutter.
Streamlit API, Flask and Python APIs.
MLOps: DVC, Heroku, AWS (S3, EC2, Lambda Function), Databricks.

Professional Experience

April/2022: Data Scientist at KPMG Brazil

Building data solutions for large Brazilian companies in the industrial sector.

Oct/21 - Mar/22: 3 Complete Data Science Projects

Built data solutions for business problems using public data from Data Science competitions, taking each project from the framing of the business challenge to the publication of the Machine Learning model with Cloud Computing tools.

2013-2021: Judicial Clerk

Worked in accounting and court case distribution. Responsible for the preparation of judicial calculations.

2013: Trading System optimization using genetic algorithm

Graduation Project for the Computer Science degree. In this work I analyzed the profitability of a genetic algorithm dedicated to buying and selling shares of Petrobras, Brazil's largest oil company.

Data Science Projects

Exploratory Data Analysis of Real Estate Data

In this project I used data analysis tools to identify which properties were below market price.

Tools used:

Git and GitHub.
Python, Pandas, Matplotlib, Numpy and Seaborn.
Jupyter Notebook and VS Code.
Interactive maps with Plotly and Folium.
Heroku Cloud.
Streamlit Python Framework.
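
As an illustration of the idea behind this analysis, the sketch below flags properties priced well below the median of their region. It is only a minimal example under stated assumptions: the listings, the region names, the `below_market` helper and the 10% discount threshold are all hypothetical and are not taken from the project itself.

```python
from collections import defaultdict
from statistics import median

# Hypothetical listings: (property id, region, asking price).
listings = [
    ("A1", "north", 300_000),
    ("A2", "north", 450_000),
    ("A3", "north", 500_000),
    ("B1", "south", 200_000),
    ("B2", "south", 210_000),
    ("B3", "south", 150_000),
]


def below_market(listings, discount=0.10):
    """Return ids of properties priced at least `discount` below their region's median."""
    prices_by_region = defaultdict(list)
    for _, region, price in listings:
        prices_by_region[region].append(price)

    # Median asking price per region serves as the "market price" proxy.
    medians = {region: median(prices) for region, prices in prices_by_region.items()}

    return [
        pid
        for pid, region, price in listings
        if price <= medians[region] * (1 - discount)
    ]


bargains = below_market(listings)
print(bargains)
```

Using the regional median rather than the mean keeps a few very expensive properties from distorting the reference price.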

Sales Forecast of a Drugstore Chain

In this project I used Regression Machine Learning algorithms to build sales estimates for the next six weeks. Estimates were made for each store. In addition, I created a Telegram bot that gives access to the forecasts for each store.

Tools used:

Git and GitHub.
Python, Pandas, Matplotlib, Numpy and Seaborn.
Jupyter Notebook and VS Code.
Boruta Feature Selector.
Linear Regression, Random Forest and XGBoost.
Hyperparameter Fine Tuning.
Model Deployment on Heroku Cloud.
Telegram Bot to access forecasts.
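
Regression forecasts of this kind are commonly compared using the error metrics listed under Skills (MAE, MAPE and RMSE). The sketch below shows how these three metrics can be computed in plain Python. The sales figures are made up for illustration and do not come from the project.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean Absolute Error: average size of the forecast error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error (assumes no actual value is zero)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more than MAE."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical weekly sales of one store and the corresponding forecasts.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"MAPE: {mape(actual, predicted):.2%}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
```

MAPE is often the easiest of the three to communicate to a business audience, since it expresses the error as a percentage of actual sales.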

Optimizing the Marketing Budget with Machine Learning

In this project I identified the customers of a health insurer who are most likely to buy an additional insurance product (cross-sell).

Tools Used:

Git, GitLab and GitHub.
Python, Pandas, Matplotlib, Numpy and Seaborn.
Jupyter Notebook and VS Code.
ExtraTrees as Feature Selector.
Linear Regression, RandomForest, ExtraTrees and KNN (K-Nearest Neighbors) Models.
Data Storage: AWS S3 bucket.
Deployment: Docker (container with Lambda function) + AWS (S3 Bucket and Lambda).
Model Visualization: Streamlit app hosted on AWS EC2 instance.
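
A cross-sell problem like this is usually framed as ranking customers by predicted purchase propensity and contacting only the top of the list. The sketch below computes the cumulative gain for such a ranking, that is, the share of all buyers reached when only a fraction of the customer base is contacted. The scores, labels and the `cumulative_gain` helper are hypothetical and only illustrate the metric. They are not the project's code or data.

```python
def cumulative_gain(scores, labels, top_fraction):
    """Share of all buyers captured when contacting the top `top_fraction` of customers by score."""
    # Rank customers from highest to lowest predicted propensity.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    captured = sum(label for _, label in ranked[:k])
    total = sum(labels)
    return captured / total if total else 0.0


# Hypothetical model scores and true outcomes (1 = bought the second insurance).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.6, 0.5]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]

gain = cumulative_gain(scores, labels, top_fraction=0.4)
print(f"Contacting the top 40% of customers reaches {gain:.0%} of the buyers")
```

With a limited marketing budget, this number indicates how much of the potential demand is covered for a given number of contacts.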

Machine Learning Optimization on Databricks

In this project I used Databricks tools to train, test and optimize several Machine Learning models.

Tools Used:

Python, Pandas and PySpark.
Linear Regression, RandomForest and KNN (K-Nearest Neighbors) Models.
Model Optimization: Hyperopt Library.
Model and Experiments Registry: MLflow.

Welcome to my Data Science portfolio

Get in touch