We would like to review approaches to mortality modeling using large amounts of data, a field that has evolved from the 1970s to the present.
Our primary work is in financial services and health care. In financial services, we provide services in mortality modeling, Federal Reserve Board (FRB) reporting for Comprehensive Capital Analysis and Review (CCAR) applications, and customer churn models. In health care, we work on infectious diseases, pharmacokinetics, proteomics, genomics, psychometrics, and U.S. medical claims databases. We use Hadoop, SAS, and large databases including Oracle, DB2, Teradata, and VoltDB.
In 1998 we provided a large text-mining solution to the insurance industry. More recently, in 2014-2015, we used the considerable benefits of Hadoop to deliver a text-processing system that handles much larger volumes of data.
Since 2001, we have used methods for large databases and SAS datastores built on a family of approaches to ETL sometimes named ELT, for "Extract, Load, and Transform". We also use higher-level versions of the same approaches to load rules, vocabularies, and schema definitions. A brief note on ETL done as ELT appears below.
For analytics, beginning in the 1980s, we used SAS, S, MATLAB, and Perl. We later used S-PLUS. Today we use R, Revolution R, SAS, MATLAB, Perl, NONMEM, Monolix, BUGS/Stan, MALLET (for topic modeling), Mahout, and domain-specific packages. When they apply, we use multi-level, nonlinear mixed-effects (NLME) and Bayesian methods; we prefer MATLAB for dynamic statistical models; for work on model identifiability, we use Maple.
We acknowledge the relatively widespread perspective that roughly one third of data science is hacking. We also acknowledge the perspective that statistical science is merely "old statistics", i.e., that data science without statistical science is both possible and desirable.
Our own perspective, though, is that handling very large data stores is simpler and cleaner when we can make use of formal methods. With formal methods, together with other advanced techniques from statistics and mathematics, we make as much use as possible of the "science" part of "data science". We can then also make the best possible use of Hadoop, very large databases, and tools that combine the best of both worlds.
To solve the big data problem at hand using methods that are both vendor-independent and non-doctrinaire, please email Rich Haney at RichHaney@bigdata2.net.
Big Data2 Consulting offers solutions for big data with a focus on the Big Data 2.0 paradigm.
For us, Big Data 2.0 involves use of Hadoop and very large databases.
Our experience is that ETL done de facto as Extract, Load, and Transform (ELT) is relatively common.
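The defining feature of ELT is that raw data is landed in the database first and transformed there, rather than being transformed in flight. A minimal sketch of that pattern, using SQLite and entirely illustrative table and column names:

```python
# ELT sketch: load raw data untransformed, then transform inside the
# database engine with SQL. Names here are illustrative only.
import sqlite3

# -- Extract: raw records as they arrive from a source system,
#    with untrimmed, untyped text fields.
raw_rows = [
    ("2023-01-05", " 100.50 "),
    ("2023-01-06", " 99.25 "),
]

conn = sqlite3.connect(":memory:")

# -- Load: land the data as-is in a staging table.
conn.execute("CREATE TABLE staging_claims (claim_date TEXT, amount TEXT)")
conn.executemany("INSERT INTO staging_claims VALUES (?, ?)", raw_rows)

# -- Transform: clean and type the data in-database, where the
#    engine's parallelism and optimizer can do the heavy lifting.
conn.execute("""
    CREATE TABLE claims AS
    SELECT DATE(claim_date) AS claim_date,
           CAST(TRIM(amount) AS REAL) AS amount
    FROM staging_claims
""")

total = conn.execute("SELECT SUM(amount) FROM claims").fetchone()[0]
print(total)  # 199.75
```

On a Hadoop or Teradata-scale platform the same shape applies; the staging and transform steps simply run where the data lives instead of in an external ETL tool.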
An algebraic perspective can also simplify tasks that involve automated generation of the actual code for the system itself.
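To make the idea concrete, here is a hypothetical sketch of that approach: query operations are represented as terms of a small algebra, and the code (here, SQL) is generated by structural recursion over those terms. All class and field names are illustrative, not from any particular system.

```python
# A tiny relational "algebra" whose terms compose, plus a compiler
# that generates SQL from any term. Illustrative names throughout.
from dataclasses import dataclass, field


@dataclass
class Table:           # a base relation
    name: str

@dataclass
class Select:          # algebraic selection: filter rows by a predicate
    source: object
    predicate: str

@dataclass
class Project:         # algebraic projection: keep only named columns
    source: object
    columns: list = field(default_factory=list)


def to_sql(expr):
    """Compile an algebra term to a SQL string by structural recursion."""
    if isinstance(expr, Table):
        return expr.name
    if isinstance(expr, Select):
        return f"SELECT * FROM ({to_sql(expr.source)}) WHERE {expr.predicate}"
    if isinstance(expr, Project):
        cols = ", ".join(expr.columns)
        return f"SELECT {cols} FROM ({to_sql(expr.source)})"
    raise TypeError(f"unknown algebra term: {expr!r}")


# Compose terms algebraically, then generate the code.
query = Project(Select(Table("claims"), "amount > 100"), ["claim_date"])
print(to_sql(query))
# SELECT claim_date FROM (SELECT * FROM (claims) WHERE amount > 100)
```

Because every term is closed under composition, larger pipelines are built by nesting terms, and the generator needs no changes; that closure property is what makes the algebraic view attractive for code generation.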