Performing Analysis of Meteorological Data
By: Hrishikesh Dherange
Introduction:
Hello folks, welcome to my blog. In this post we will analyze meteorological (weather) data. The dataset is taken from Kaggle and is available at "https://www.kaggle.com/muthuj7/weather-dataset". All operations are done with standard Python libraries: NumPy and pandas for the analysis, and matplotlib and seaborn for the visualization.
Methodology:
So, let's start by importing the dataset into Google Colaboratory, better known as 'Colab'. Any Python environment will work; it comes down to personal preference. Because Colab is a cloud-based environment, importing the dataset takes one extra step: the file must first be uploaded to Colab, and then it can be read in the usual way. Here's how,
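The original code cell is not preserved here, so the following is only a minimal sketch of the upload step. It assumes the `google.colab.files.upload()` helper that Colab provides; the wrapper name `upload_dataset` is hypothetical, and the fallback branch exists only so the snippet also runs outside Colab.

```python
def upload_dataset():
    """Open Colab's file picker and return a dict of {file name: file bytes}."""
    try:
        # google.colab only exists inside a Colab runtime
        from google.colab import files
    except ImportError:
        print("Not running in Colab: read the CSV from a local path instead.")
        return {}
    return files.upload()


uploaded = upload_dataset()
print(list(uploaded.keys()))  # names of the uploaded files
```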
This will prompt you to select the dataset file from your local computer. After uploading it, we need to read the data and store it in a variable.
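A sketch of the read step with pandas follows. The file name `weatherHistory.csv` is an assumption about what the download is called, and the helper `load_dataset` is hypothetical. To keep the snippet runnable anywhere, it reads two rows copied from the preview shown later in this post (with a shortened column list) instead of the real file.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read the weather CSV from a file path or a file-like object."""
    return pd.read_csv(source)


# Two rows copied from the preview table, standing in for the uploaded file.
sample_csv = io.StringIO(
    "Formatted Date,Summary,Precip Type,Temperature (C),Apparent Temperature (C),Humidity\n"
    "2006-04-01 00:00:00.000 +0200,Partly Cloudy,rain,9.472222,7.388889,0.89\n"
    "2006-04-01 01:00:00.000 +0200,Partly Cloudy,rain,9.355556,7.227778,0.86\n"
)
dataset = load_dataset(sample_csv)
print(dataset.shape)

# In Colab, after the upload step, the call would instead look like:
# dataset = load_dataset("weatherHistory.csv")  # file name is an assumption
```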
The uploaded data is now stored in the variable 'dataset', on which we can perform the analysis and the visualization.
Moving on to data cleaning, it is good practice to first check whether the dataset contains any null values, since they are common and can affect the results or accuracy of the analysis. So, let's check it out.
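One common way to run this check is `isnull().sum()`, which counts the missing entries per column. The small frame below is made up for illustration and is not the real dataset.

```python
import pandas as pd

# Illustrative data only: two of the four 'Precip Type' entries are missing.
dataset = pd.DataFrame(
    {
        "Precip Type": ["rain", None, "snow", None],
        "Humidity": [0.89, 0.86, 0.89, 0.83],
    }
)

# isnull() marks each missing cell as True; sum() counts them per column.
null_counts = dataset.isnull().sum()
print(null_counts)
```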
Executing the above cell returns the number of rows containing null values for each feature in our dataset. Let's see whether any feature has null values.
Feature | Null count (dtype: int64) |
---|---|
Formatted Date | 0 |
Summary | 0 |
Precip Type | 517 |
Temperature (C) | 0 |
Apparent Temperature (C) | 0 |
Humidity | 0 |
Wind Speed (km/h) | 0 |
Wind Bearing (degrees) | 0 |
Visibility (km) | 0 |
Pressure (millibars) | 0 |
Daily Summary | 0 |
Here we can see that the feature 'Precip Type' contains null values in 517 rows. There are various ways to deal with null values, and the right choice depends on how important the feature is to the analysis. For now we will analyze only Humidity and Apparent Temperature (C), so the nulls in 'Precip Type' do not affect us.
Let's take a look at our dataset,
Here we can clearly see that the timestamps in 'Formatted Date' are not in a proper format. They are hard to work with in this form, and we would not be able to do proper analysis or visualization with them.
Index | Formatted Date | Summary | Precip Type | Temperature (C) | Apparent Temperature (C) | Humidity | Wind Speed (km/h) | Wind Bearing (degrees) | Visibility (km) | Pressure (millibars) | Daily Summary |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2006-04-01 00:00:00.000 +0200 | Partly Cloudy | rain | 9.472222 | 7.388889 | 0.89 | 14.1197 | 251 | 15.8263 | 1015.13 | Partly cloudy throughout the day. |
1 | 2006-04-01 01:00:00.000 +0200 | Partly Cloudy | rain | 9.355556 | 7.227778 | 0.86 | 14.2646 | 259 | 15.8263 | 1015.63 | Partly cloudy throughout the day. |
2 | 2006-04-01 02:00:00.000 +0200 | Mostly Cloudy | rain | 9.377778 | 9.377778 | 0.89 | 3.9284 | 204 | 14.9569 | 1015.94 | Partly cloudy throughout the day. |
3 | 2006-04-01 03:00:00.000 +0200 | Partly Cloudy | rain | 8.288889 | 5.944444 | 0.83 | 14.1036 | 269 | 15.8263 | 1016.41 | Partly cloudy throughout the day. |
4 | 2006-04-01 04:00:00.000 +0200 | Mostly Cloudy | rain | 8.755556 | 6.977778 | 0.83 | 11.0446 | 259 | 15.8263 | 1016.51 | Partly cloudy throughout the day. |
Let's get this sorted. As the next data cleaning step, we will convert 'Formatted Date' to a proper datetime and set it as the index.
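A sketch of this step, assuming `pd.to_datetime` with `utc=True` followed by `set_index`. The explicit `format` string is my own addition to make the parsing unambiguous; the original cell may have relied on automatic inference. The two timestamps are copied from the preview above.

```python
import pandas as pd

dataset = pd.DataFrame(
    {
        "Formatted Date": [
            "2006-04-01 00:00:00.000 +0200",
            "2006-04-01 01:00:00.000 +0200",
        ],
        "Humidity": [0.89, 0.86],
    }
)

# Parse the strings, including their +0200 offset, and convert them to UTC.
dataset["Formatted Date"] = pd.to_datetime(
    dataset["Formatted Date"], format="%Y-%m-%d %H:%M:%S.%f %z", utc=True
)

# Use the parsed timestamps as the row index.
dataset = dataset.set_index("Formatted Date")
print(dataset.index)
```

Because of the conversion to UTC, the first timestamp moves from midnight on 1 April (+0200) to 22:00 on 31 March, which matches the indexed table in this post.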
And with that, 'Formatted Date' is set as the index of our dataset, so we can now work with the data efficiently and with ease.
Formatted Date | Summary | Precip Type | Temperature (C) | Apparent Temperature (C) | Humidity | Wind Speed (km/h) | Wind Bearing (degrees) | Visibility (km) | Pressure (millibars) | Daily Summary |
---|---|---|---|---|---|---|---|---|---|---|
2006-03-31 22:00:00+00:00 | Partly Cloudy | rain | 9.472222 | 7.388889 | 0.89 | 14.1197 | 251 | 15.8263 | 1015.13 | Partly cloudy throughout the day. |
2006-03-31 23:00:00+00:00 | Partly Cloudy | rain | 9.355556 | 7.227778 | 0.86 | 14.2646 | 259 | 15.8263 | 1015.63 | Partly cloudy throughout the day. |
2006-04-01 00:00:00+00:00 | Mostly Cloudy | rain | 9.377778 | 9.377778 | 0.89 | 3.9284 | 204 | 14.9569 | 1015.94 | Partly cloudy throughout the day. |
2006-04-01 01:00:00+00:00 | Partly Cloudy | rain | 8.288889 | 5.944444 | 0.83 | 14.1036 | 269 | 15.8263 | 1016.41 | Partly cloudy throughout the day. |
2006-04-01 02:00:00+00:00 | Mostly Cloudy | rain | 8.755556 | 6.977778 | 0.83 | 11.0446 | 259 | 15.8263 | 1016.51 | Partly cloudy throughout the day |
We are only concerned with 'Apparent Temperature (C)' and 'Humidity', so let's keep those two columns and ignore the rest of the data for this analysis.
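Selecting the two columns can be sketched as below. The variable name `df` is hypothetical, and the frame is a small stand-in built from values in the preview table.

```python
import pandas as pd

dataset = pd.DataFrame(
    {
        "Summary": ["Partly Cloudy", "Mostly Cloudy"],
        "Apparent Temperature (C)": [7.388889, 9.377778],
        "Humidity": [0.89, 0.89],
        "Wind Speed (km/h)": [14.1197, 3.9284],
    }
)

# Passing a list of column names keeps only those columns, in that order.
df = dataset[["Apparent Temperature (C)", "Humidity"]]
print(df)
```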
Clear enough with the data now.
Now for the last data cleaning step: extracting the data for one common month across the years. Considering the whole year would be messy and would make the pattern hard to analyze, so here we take only the monthly mean for April.
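One way to do this, assuming a datetime index, is to resample to monthly means and then keep only the rows whose month is April. The names `monthly_mean` and `april_mean` and the sample values are made up for illustration.

```python
import pandas as pd

# Illustrative readings: one in March 2006, two in April 2006, one in April 2007.
index = pd.to_datetime(
    ["2006-03-15 12:00", "2006-04-01 00:00", "2006-04-20 00:00", "2007-04-10 00:00"],
    utc=True,
)
df = pd.DataFrame(
    {
        "Apparent Temperature (C)": [5.0, 8.0, 12.0, 11.0],
        "Humidity": [0.80, 0.90, 0.70, 0.75],
    },
    index=index,
)

# "MS" groups rows by calendar month and labels each group with the month start.
monthly_mean = df.resample("MS").mean()

# Keep only the April row of every year.
april_mean = monthly_mean[monthly_mean.index.month == 4]
print(april_mean)
```

Months with no readings show up as NaN rows in `monthly_mean`; they drop out here because none of them fall in April.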
Now that we have the required data, let's move forward with the visualization.
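The original plotting cell is not preserved, so this is only one possible sketch using matplotlib: a line per variable, with the year on the x-axis. The April means below are made-up placeholders, not values from the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder April means for illustration; not values from the dataset.
april_mean = pd.DataFrame(
    {
        "Apparent Temperature (C)": [10.0, 11.0, 9.5],
        "Humidity": [0.80, 0.75, 0.78],
    },
    index=pd.to_datetime(["2006-04-01", "2007-04-01", "2008-04-01"], utc=True),
)

years = april_mean.index.year

fig, ax = plt.subplots(figsize=(10, 4))
# One line per variable so both trends can be compared across years.
ax.plot(years, april_mean["Apparent Temperature (C)"], marker="o",
        label="Apparent Temperature (C)")
ax.plot(years, april_mean["Humidity"], marker="o", label="Humidity")
ax.set_xlabel("Year")
ax.set_title("April averages per year")
ax.legend()
```

Humidity is a fraction between 0 and 1 while temperature is in degrees Celsius, so on a shared axis the humidity line will look nearly flat; a second y-axis is an option if its variation matters.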
The output of the above cell would look like this:
Here we can see that the average apparent temperature in April is approximately the same from year to year, with slight differences, and that the humidity level stays nearly the same over the whole 10 years.
There is, however, a sharp rise in temperature in 2009 and a fall in temperature in 2015. Hence we can conclude that global warming has caused uncertainty in temperature over the past 10 years, while the average humidity has remained constant throughout.