## Statistical Analysis: Pandas and Seaborn on a Kaggle Dataset

When doing Statistical Analysis, curiosity and intuition are two of a Data Scientist’s most powerful tools. The third one may be Pandas.

Today we’ll use Python’s Pandas library for Data Analysis, and Seaborn for Data Visualization. Often, when facing a Data problem, we must first dive into the Dataset and learn about it: its properties, its variables’ distributions. We need to immerse ourselves in the domain.
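To make that first dive concrete, here is a minimal sketch of the kind of first look Pandas allows. The toy DataFrame and its column names (`country`, `year`, `value`) are made up for illustration and are not taken from the Kaggle dataset or the project's code; with a real file you would start from `pd.read_csv` instead.

```python
import pandas as pd

# Hypothetical toy data standing in for a Kaggle CSV (assumption, not the
# article's dataset). With a real file: df = pd.read_csv("your_file.csv")
df = pd.DataFrame({
    "country": ["AR", "AR", "BR", "BR", "CL", "CL"],
    "year": [2000, 2010, 2000, 2010, 2000, 2010],
    "value": [12.5, 14.0, 9.8, None, 11.2, 10.1],
})

shape = df.shape                         # (rows, columns)
dtypes = df.dtypes                       # type of each column
missing = df.isna().sum()                # missing values per column
summary = df["value"].describe()         # count, mean, std, quartiles
counts = df["country"].value_counts()    # rows per category

print(shape)
print(dtypes)
print(missing)
print(summary)
print(counts)
```

In a Jupyter Notebook you would usually skip the `print` calls and just evaluate each expression in its own cell.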

Web Scraping with Scrapy comes into play whenever you need to generate your own dataset. Sometimes Kaggle is not enough.

Sometimes you open a big Dataset with Python’s Pandas, try to get a few metrics, and the whole thing just freezes horribly. Dask DataFrames may solve your problem.

As a Data Scientist, I spend about a third of my time looking at data and trying to get meaningful insights, the discipline some call exploratory data analysis. Today we will look at two of the tools I use the most for it, following closely the code I uploaded on this GitHub project. One is Jupyter Notebooks, and the other is a Python library called Pandas.