For our second assignment we were asked to perform a few tasks involving the analysis of a dataset.
Task 1 - List (bullet list of items) five "insights", chunks of knowledge, or deeper questions that you either
encountered or gained while exploring the data.
- Cereals that appeal more to older generations (All-Bran, etc.) seem to appear mainly on shelf 3. Perhaps shelf placement says something about each cereal's target audience?
- Perhaps more hot cereals could be included? I am fairly sure there are more than just 3 brands of hot cereals on the market currently.
- It seems that most cereals have either 25% or 100% vitamin fortification. This raises the question of why there are no cereals with fortification levels between these two amounts.
- Calories per serving do not exceed 150 for any cereal, and those that have 150 calories are few. Perhaps more cereals with high calorie densities should have been included in the dataset?
- It seems that most cereals have a weight of one ounce per serving, yet the serving size varies quite a lot despite this. Perhaps these two attributes could be used to determine the approximate density of each cereal?
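
The density idea in the last bullet could be sketched roughly as follows. This is only an illustration: it assumes the serving weight is given in ounces and the serving size in cups, the helper name `ounces_per_cup` is hypothetical, and the three rows are made-up placeholder values, not figures from the dataset.

```python
def ounces_per_cup(weight_oz, cups):
    """Approximate density: serving weight divided by serving volume.

    Returns None when the volume is missing or not positive, since the
    ratio is meaningless in that case.
    """
    if cups is None or cups <= 0:
        return None
    return weight_oz / cups


# Placeholder rows (name, weight in ounces, serving size in cups).
# These values are invented for illustration, not taken from the dataset.
servings = [
    ("Cereal A", 1.0, 0.5),
    ("Cereal B", 1.0, 1.0),
    ("Cereal C", 1.0, 0.0),  # unusable serving size
]

densities = {name: ounces_per_cup(w, c) for name, w, c in servings}

for name, density in densities.items():
    print(name, "n/a" if density is None else round(density, 2))
```

A higher value would suggest a denser cereal, since the same one-ounce serving fits in a smaller volume.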
Task 2 - Write one paragraph about the process you used to do the exploration and analysis.
I loaded the dataset into Excel. I then went through each
column manually and searched for patterns in the data. Since the dataset is not
very large, this was not very difficult. I also checked whether any of the
attributes correlated with one another.
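
I did this check by eye in Excel, but the same idea could be sketched in code. The example below is a minimal illustration, not the method I actually used: it computes the Pearson correlation coefficient for every pair of columns. The column names and values are assumptions made up for the example and are not taken from the dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers.

    Returns None if either list has no variation, because the
    coefficient is undefined in that case.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def correlation_pairs(columns):
    """Map each pair of column names to its Pearson correlation."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Invented placeholder values, one entry per cereal in each column.
columns = {
    "calories": [70, 110, 120, 150],
    "vitamins": [25, 25, 100, 25],
    "cups": [0.33, 1.0, 0.75, 0.5],
}

for pair, r in correlation_pairs(columns).items():
    print(pair, "undefined" if r is None else round(r, 2))
```

Values near 1 or -1 would point to a strong linear relationship between two attributes, while values near 0 would suggest little or none.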
Task 3 - Write one paragraph about challenges or problems that you encountered in doing the analysis this
way.
I attempted to create a graph using the simple Excel charts,
but this did not prove to be very helpful compared to simply searching through
the data manually. However, in larger datasets, a chart or other visual
representation of the data would likely be necessary as combing through the
data manually would be impractical. I feel that a more capable data analysis tool, or
a dataset better suited to Excel, would make it easier to discern
patterns within the data.