63 Assignment 2
63.1 Smoothing
In this assignment, you will delve into the application of different smoothing techniques on time series data. Utilizing meteorological data, your task is to create a series of plots that demonstrate the effects of various smoothing methods.
Choose 3 out of the 5 following smoothing-related tasks.
63.1.1 Comparative Smoothing Methods Analysis
- Goal: Showcase three smoothing techniques – Rolling Average, Savitzky-Golay, and Resampling – on the same time series data.
- Task: Overlay these methods over the actual data in a single plot. Ensure each method uses the same window size for consistency. Describe in a few lines the differences you see.
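As a starting point, the three methods could be computed on a common series as sketched below. The synthetic daily temperature series, the 7-day window, and the polynomial order of 2 are assumptions made for illustration; they are not the course dataset or required settings.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

# Synthetic daily temperatures: a hypothetical stand-in for the course data.
rng = np.random.default_rng(0)
days = np.arange(365)
index = pd.date_range("2023-01-01", periods=365, freq="D")
temp = pd.Series(
    15 + 10 * np.sin(2 * np.pi * (days - 100) / 365) + rng.normal(0, 3, 365),
    index=index,
    name="temp",
)

window = 7  # 7 samples = 7 days, shared by all three methods

# Centered rolling average; the first and last 3 values are NaN.
rolling = temp.rolling(window, center=True).mean()

# Savitzky-Golay: local polynomial fit (order 2 here) over the same window.
savgol = pd.Series(
    savgol_filter(temp.to_numpy(), window_length=window, polyorder=2),
    index=temp.index,
)

# Resampling: one mean value per 7-day bin, so the result has fewer points.
resampled = temp.resample("7D").mean()

# Same-length series collected side by side, ready to overlay in one plot.
comparison = pd.DataFrame({"raw": temp, "rolling": rolling, "savgol": savgol})
```

The rolling and Savitzky-Golay results keep the original sampling rate, while the resampled series has one value per bin, so it is usually drawn with markers or steps when overlaid on the others.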
63.1.2 Rolling Average Window Size Impact
- Goal: Analyze the effect of varying window sizes on the Rolling Average method.
- Task: Produce a plot with three lines, each representing the Rolling Average with a different window size. Describe in a few lines the differences you see.
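One possible way to set up this comparison is sketched below. The synthetic series and the window sizes (7, 30, and 90 days) are illustrative assumptions, and `roughness` is a hypothetical helper measure (mean absolute day-to-day change) that gives a number to go with the visual impression.

```python
import numpy as np
import pandas as pd

# Synthetic daily temperatures: a hypothetical stand-in for the course data.
rng = np.random.default_rng(1)
days = np.arange(365)
index = pd.date_range("2023-01-01", periods=365, freq="D")
temp = pd.Series(
    15 + 10 * np.sin(2 * np.pi * (days - 100) / 365) + rng.normal(0, 3, 365),
    index=index,
)

windows = [7, 30, 90]  # window sizes in samples (days); arbitrary choices

# One column per window size; centered windows leave NaN at both ends.
smoothed = pd.DataFrame(
    {f"{w} days": temp.rolling(w, center=True).mean() for w in windows}
)

# Mean absolute day-to-day change: smaller values mean a smoother curve.
roughness = smoothed.diff().abs().mean()
```

Calling `smoothed.plot()` would draw the three lines on one set of axes. Larger windows give smoother curves but lose more points at the edges and flatten short-lived features.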
63.1.3 Savitzky-Golay Polynomial Order Variation
- Goal: Investigate how changing the polynomial order affects the Savitzky-Golay smoothing method.
- Task: Create a plot with three lines, where each represents the Savitzky-Golay method with a different polynomial order. Describe in a few lines the differences you see.
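A minimal sketch of this comparison, assuming SciPy's `savgol_filter` and a synthetic series, is given below. The window length of 31 and the orders 1, 3, and 7 are arbitrary illustrative values; the only hard constraint is that the polynomial order must be smaller than the window length.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

# Synthetic daily temperatures: a hypothetical stand-in for the course data.
rng = np.random.default_rng(2)
days = np.arange(365)
index = pd.date_range("2023-01-01", periods=365, freq="D")
temp = pd.Series(
    15 + 10 * np.sin(2 * np.pi * (days - 100) / 365) + rng.normal(0, 3, 365),
    index=index,
)

window = 31          # fixed window length (must exceed every polynomial order)
orders = [1, 3, 7]   # polynomial orders to compare

# One column per polynomial order, all with the same window length.
smoothed = pd.DataFrame(
    {
        f"order {p}": savgol_filter(temp.to_numpy(), window_length=window, polyorder=p)
        for p in orders
    },
    index=temp.index,
)

# Mean absolute day-to-day change: higher orders follow the noise more closely.
roughness = smoothed.diff().abs().mean()
```

With a fixed window, a low order behaves much like a moving average, while a high order lets the fitted polynomial bend with the data and therefore smooths less.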
63.1.4 Kernel Shape Influence in Rolling Mean
- Goal: Explore the impact of different kernel shapes on the Rolling Mean.
- Task: Generate a plot displaying three lines, each using a different kernel shape in the Rolling Mean. We encourage you to use kernel shapes that we did not showcase in class. See this list of kernels. Describe in a few lines the differences you see.
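In pandas, a weighted rolling mean can be requested through the `win_type` argument of `rolling`, which relies on SciPy's window functions. The sketch below assumes that approach; the synthetic series, the window of 31 days, and the three kernel names (`triang`, `parzen`, `cosine`) are illustrative choices and may or may not overlap with the kernels shown in class.

```python
import numpy as np
import pandas as pd

# Synthetic daily temperatures: a hypothetical stand-in for the course data.
rng = np.random.default_rng(3)
days = np.arange(365)
index = pd.date_range("2023-01-01", periods=365, freq="D")
temp = pd.Series(
    15 + 10 * np.sin(2 * np.pi * (days - 100) / 365) + rng.normal(0, 3, 365),
    index=index,
)

window = 31
kernels = ["triang", "parzen", "cosine"]  # names of SciPy window functions

# win_type turns the plain rolling mean into a kernel-weighted mean.
# This needs SciPy installed, because pandas looks the window up there.
smoothed = pd.DataFrame(
    {k: temp.rolling(window, win_type=k, center=True).mean() for k in kernels}
)
```

Kernels that put more weight near the center of the window tend to follow the data more closely than a flat (boxcar) kernel of the same width. Some kernels, such as the Gaussian, take extra parameters that are passed to `mean`, for example `mean(std=5)`.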
63.1.5 Moving Average with Confidence Interval
- Goal: Plot a Moving Average along with a 75% confidence interval.
- Task: Design a plot illustrating both the Moving Average and its 75% confidence interval.
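The task does not say how the interval should be built, so the sketch below makes an explicit assumption: the interval is read as the band that would contain about 75% of the observations in each window. It is computed in two ways, from the rolling standard deviation under a normal approximation and from rolling quantiles. The synthetic series and the 30-day window are also assumptions.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd

# Synthetic daily temperatures: a hypothetical stand-in for the course data.
rng = np.random.default_rng(4)
days = np.arange(365)
index = pd.date_range("2023-01-01", periods=365, freq="D")
temp = pd.Series(
    15 + 10 * np.sin(2 * np.pi * (days - 100) / 365) + rng.normal(0, 3, 365),
    index=index,
)

level = 0.75
window = 30
roll = temp.rolling(window, center=True)
mean = roll.mean()

# Option A (assumes roughly normal data): mean +/- z * rolling std,
# where z is the two-sided normal quantile for the chosen level.
z = NormalDist().inv_cdf(0.5 + level / 2)
std = roll.std()
lower, upper = mean - z * std, mean + z * std

# Option B (no normality assumption): rolling 12.5% and 87.5% quantiles.
q_lower = roll.quantile(0.5 - level / 2)
q_upper = roll.quantile(0.5 + level / 2)

# Fraction of observations inside band A, ignoring the NaN edges.
coverage = temp.between(lower, upper)[mean.notna()].mean()
```

With matplotlib, the band is typically drawn with `fill_between(mean.index, lower, upper, alpha=0.3)` underneath the moving-average line. If the interval is instead meant for the mean itself, the standard deviation would be divided by the square root of the window size.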
63.2 Outliers
Using the same datasets from the outliers/challenge lecture, you will write rolling functions to detect outliers in time series data. Together, we already wrote the code that uses the Z-score method to detect outliers. Now, you will write:
- a rolling function to detect outliers using the IQR method.
- a rolling function to detect outliers using the Median Absolute Deviation (MAD) method.
Choose only one column of one of the datasets to perform the outlier detection. You can choose any dataset and any column you prefer.
- Plot 2 graphs: one for the IQR method and one for the MAD method with the same window size.
- Make a table that shows the number of outliers detected by each method for different parameter values, such as window size and threshold. What did you learn from this table?
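The general shape of the two rolling detectors and of the summary table could look like the sketch below. It is one possible reading of the task, not the reference solution: the function names, the synthetic series with three injected spikes, the IQR multiplier `k=1.5`, the MAD threshold of 3, and the scaling constant 1.4826 (which makes the MAD comparable to a standard deviation for normal data) are all assumptions.

```python
import numpy as np
import pandas as pd


def rolling_iqr_outliers(series, window, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] of their centered window."""
    roll = series.rolling(window, center=True)
    q1, q3 = roll.quantile(0.25), roll.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def rolling_mad_outliers(series, window, threshold=3.0):
    """Flag points far from the window median, in units of the scaled MAD."""
    roll = series.rolling(window, center=True)
    median = roll.median()
    # MAD of each window: median of absolute deviations from that window's median.
    mad = roll.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    return (series - median).abs() > threshold * 1.4826 * mad


# Synthetic series with three injected spikes (hypothetical test data).
rng = np.random.default_rng(5)
index = pd.date_range("2023-01-01", periods=300, freq="D")
values = pd.Series(rng.normal(20, 1, 300), index=index)
spikes = [50, 150, 250]
values.iloc[spikes] += 15

# Count detections for several window sizes with fixed thresholds.
rows = []
for window in (15, 30, 60):
    rows.append(
        {
            "window": window,
            "IQR (k=1.5)": int(rolling_iqr_outliers(values, window).sum()),
            "MAD (threshold=3)": int(rolling_mad_outliers(values, window).sum()),
        }
    )
summary = pd.DataFrame(rows).set_index("window")
```

Points near the start and end of the series are never flagged here, because the centered windows are incomplete there and the comparisons against NaN evaluate to False. Passing `min_periods` to `rolling` would change that behavior.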