Data Science

Advanced Data Science Concepts – Part 1

Data Science is a field of study concerned with the methods used to process data and extract useful information (knowledge). The field is quite broad, but we are going to examine the following sub-topics.


Discussion on Homoscedasticity and Heteroscedasticity

Heteroscedasticity (also spelled heteroskedasticity) is a concept in regression analysis in which the conditional variance of one variable changes with the value of a second variable. It normally occurs in situations where there are significant differences among the sizes of the observations (visit the discussion section to add more explanation).
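To make the definition concrete, here is a minimal sketch that simulates residuals whose spread grows with a second variable x and then compares the residual variance in the low-x and high-x halves of the sample. The noise model (standard deviation proportional to x), the function names and the sample size are all assumptions chosen for illustration; comparing two halves is only a rough diagnostic, not a formal test.

```python
import random
import statistics


def simulate(n=2000, seed=0):
    """Generate (x, residual) pairs whose noise spread grows with x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        # Assumed noise model: standard deviation proportional to x.
        residual = rng.gauss(0.0, 0.5 * x)
        data.append((x, residual))
    return data


def variance_by_half(data):
    """Return the residual variance of the low-x half and the high-x half."""
    ordered = sorted(data, key=lambda pair: pair[0])
    mid = len(ordered) // 2
    low = [r for _, r in ordered[:mid]]
    high = [r for _, r in ordered[mid:]]
    return statistics.pvariance(low), statistics.pvariance(high)


low_var, high_var = variance_by_half(simulate())
print(f"variance (low x):  {low_var:.2f}")
print(f"variance (high x): {high_var:.2f}")
print(f"ratio high/low:    {high_var / low_var:.2f}")
```

With homoscedastic residuals the two variances would be roughly equal; here the high-x half should show a clearly larger variance.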

For most of us in Business, Economics and Finance who mostly use time series and panel data for our research, there will be a seminar on Heteroscedasticity coming soon. The topic is: Can Heteroscedasticity be modeled? The day and time will be announced soon. Please read textbooks and articles on Heteroscedasticity and raise questions that will be addressed in the seminar.

In fact, Big Data Analytics has now moved on to Industrial Internet of Things (IIoT) and Ecosystemic Data Analytics. What can a developing nation do today to stay economically afloat in a globally connected world? It must resort to Big Data Analytics or better still, the wider Data Science in every sector, in every industry and in every organisation.

It gladdens my heart to read from a Businessday report on the 2016 56th NBA Conference in Port Harcourt that the legal profession in Nigeria is set to equip members to conduct their chamber preparations for law court appearances on a special legal data analytics platform. Bravo to the legal profession! You guys have truly entered the current millennium in your legal practice! Congratulations again!!!

Thanks, Prof. DJO, for reading my paper on Autoregressive Conditional Heteroscedasticity (the ARCH/GARCH family of models) and making some intuitive remarks. I want to say here that Homoscedasticity and Heteroscedasticity are two sides of a coin. Whereas the error term entering the classical regression model is assumed to be homoscedastic (i.e. its variance is assumed to be constant over time), heteroscedasticity becomes an issue of concern if this assumption is violated. So, the difference between homoscedasticity and heteroscedasticity lies in whether the variance of the error term is constant or not. Just as you rightly advised, the two concepts will be thoroughly considered in the upcoming seminar.
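In symbols, the distinction can be sketched as follows. The notation (y_t, x_t, θ, ε_t, σ², α_0, α_1, β_1 and the information set F_{t-1}) is assumed here purely for illustration and is not taken from the paper mentioned above; the ARCH(1) and GARCH(1,1) equations are the standard textbook forms.

```latex
% Classical regression with a homoscedastic error term: constant variance
y_t = x_t'\theta + \varepsilon_t, \qquad
\operatorname{Var}(\varepsilon_t \mid x_t) = \sigma^2 \quad \text{for all } t

% Heteroscedasticity: the error variance is allowed to differ across t
\operatorname{Var}(\varepsilon_t \mid x_t) = \sigma_t^2

% ARCH(1): the conditional variance depends on the previous squared error
\operatorname{Var}(\varepsilon_t \mid \mathcal{F}_{t-1}) = \sigma_t^2
  = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2,
  \qquad \alpha_0 > 0,\ \alpha_1 \ge 0

% GARCH(1,1): adds the previous conditional variance
\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2,
  \qquad \beta_1 \ge 0
```

Setting α_1 = β_1 = 0 collapses the last two equations to the constant-variance case, which is one way to see the two concepts as two sides of a coin.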


Data Consolidation and Aggregation

Data Consolidation refers to the process of collecting data from multiple data sources and combining them into a single data store. During the process, format conversion occurs: disparate data formats are unified into a single format. This concept is different from data aggregation, where data (either from one source or from various sources) is summarized and then used for analysis purposes.
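The difference between the two concepts can be sketched in a few lines. In this hedged example, two hypothetical sources store sales records with different field names, date formats and number types; consolidation converts both into one uniform record list, and aggregation then summarizes that list per customer. All field names and values are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical source A: ISO dates, numeric amounts.
source_a = [
    {"customer": "Ada", "date": "2016-08-21", "amount": 120.0},
    {"customer": "Bayo", "date": "2016-08-22", "amount": 80.0},
]
# Hypothetical source B: other field names, day/month/year dates, string amounts.
source_b = [
    {"name": "Ada", "sold_on": "23/08/2016", "total": "50.50"},
    {"name": "Chidi", "sold_on": "23/08/2016", "total": "200"},
]


def from_a(row):
    """Convert a source A row to the uniform format."""
    return {
        "customer": row["customer"],
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        "amount": float(row["amount"]),
    }


def from_b(row):
    """Convert a source B row to the uniform format."""
    return {
        "customer": row["name"],
        "date": datetime.strptime(row["sold_on"], "%d/%m/%Y").date(),
        "amount": float(row["total"]),
    }


# Consolidation: every record from every source, in one store and one format.
consolidated = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]

# Aggregation: summarize the consolidated records (total amount per customer).
totals = defaultdict(float)
for row in consolidated:
    totals[row["customer"]] += row["amount"]

print(len(consolidated), "consolidated records")
print(dict(totals))
```

Note that consolidation keeps every record, while aggregation reduces them to one summary value per group.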

Data consolidation has the following benefits:

  • Reduces Inefficiencies: If data is consolidated, higher efficiency is achieved via reduction in time taken to perform operations on the dataset.
  • Facilitates Effective Data Analysis: Better data analysis can be performed on consolidated data. It also becomes possible to use a wider range of analysis tools on a consolidated dataset.
  • Reduces Duplication: The process of data consolidation handles the issue of having data duplicated in more than one data store.



  • Similar data collected by different methods can be combined in a single repository
  • Data collection from multiple channels could be unified
  • Duplicate data detection and control
  • Optimized search and retrieval systems
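Duplicate detection, mentioned in the list above, can be sketched as follows. The example assumes that a normalized name plus email address is enough to identify the same person across collection channels; that matching rule, the record fields and the function names are hypothetical, and real systems often need fuzzier matching.

```python
def normalize(record):
    """Build a comparison key; assumes name + email identify a person."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def deduplicate(records):
    """Keep the first record seen for each key and report the rest."""
    seen = {}
    duplicates = []
    for record in records:
        key = normalize(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen[key] = record
    return list(seen.values()), duplicates


records = [
    {"name": "Ada Obi", "email": "ada@example.com", "source": "web form"},
    {"name": "ada  obi", "email": "ADA@example.com ", "source": "paper survey"},
    {"name": "Bayo Ade", "email": "bayo@example.com", "source": "web form"},
]

unique, duplicates = deduplicate(records)
print(len(unique), "unique,", len(duplicates), "duplicate")
```

Keeping the rejected records in a separate list, rather than discarding them silently, makes it possible to review the matching rule later.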


Ecosystemic Data Analysis

This concept derives from the Ecosystem Model, a mathematical model used to represent an existing ecological system.

Observations from the field can be used to make predictions and to find ecological relationships in the data. Examples include the relationship of water and sunlight to the rate of photosynthesis, or the relationship between predator and prey populations.
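As an illustration of the predator and prey relationship, here is a minimal sketch of the classic Lotka-Volterra equations, integrated with a simple Euler step. The choice of this particular model, the parameter values and the starting populations are assumptions made for illustration; they are not fitted to any field observations.

```python
def lotka_volterra(prey, predators, steps=20000, dt=0.001,
                   alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Simulate predator and prey populations with Euler steps.

    alpha: prey growth rate, beta: predation rate,
    delta: predator growth per prey eaten, gamma: predator death rate.
    All values are illustrative assumptions.
    """
    history = [(prey, predators)]
    for _ in range(steps):
        # Compute both changes from the current state before updating.
        d_prey = (alpha * prey - beta * prey * predators) * dt
        d_pred = (delta * prey * predators - gamma * predators) * dt
        prey += d_prey
        predators += d_pred
        history.append((prey, predators))
    return history


history = lotka_volterra(prey=10.0, predators=5.0)

# Print one sample every 2 time units (2000 steps of size 0.001).
for step in range(0, len(history), 2000):
    prey, predators = history[step]
    print(f"t={step * 0.001:5.1f}  prey={prey:7.2f}  predators={predators:7.2f}")
```

In a model like this, the two populations rise and fall in linked cycles: more prey supports more predators, which in turn reduces the prey.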

A number of research works have been carried out in this area. One such work, by Julina Quintero et al., highlighted the epidemiological techniques and the results of a global ecosystemic study that focuses on the complex relationships existing among ecological, economic, social and political factors.

Another study, carried out by Carol A. Darling, focuses on an ecosystemic analysis of the role of family in education. In this study, respondents with families from various international organizations participated, and the role of family concerns as it relates to education was examined from an ecosystemic view. The findings of this research indicated a strong correlation

Another group worked on analysing an online repository called Maarjam. This database holds data on patterns observable among living things confined to a given region.

Another study was conducted by Jeroen Raes et al. It attempted to determine the usefulness of ecological parameters such as adaptation to environmental factors, diversity variation and molecular traits from the gene pool of a given ecosystem. The results showed that climatic factors such as weather are the primary determining factors of the bio-molecular repertoire of each of the samples. However, the key limiting factor was found to be these same factors.
