How Independent Health Scaled Up Their Data Processing Workflows Using Python, PySpark and AWS Glue
Named one of the best companies to work for in New York State for 14 consecutive years, Independent Health is known for staying on the cutting edge of technology. While preparing to migrate their data to the cloud, the Information Management leadership team at Independent Health needed to ensure that the entire team of data architects and data engineers could quickly level up their skills in Python programming and in the new cloud technologies they would soon be using on a daily basis.
Edlitera was brought on to design and facilitate a custom training on Python programming for data engineering, built around Apache Spark (via PySpark), the most popular analytics engine for large-scale data processing, along with other cloud data engineering services and libraries. As a result of the training, the data migration and integration team was able to hit the ground running with PySpark and other data engineering services in the cloud.
About the Training
Client
Independent Health
Industry
Health Insurance
Group
Data Engineering
Situation
Group with diverse skill sets, using SQL and various ETL tools, preparing for cloud migration
Solution
Twenty-four-week hybrid training program on Python coding, PySpark and AWS Glue
Outcome
The team started converting workflows to use PySpark and AWS Glue
"I went from having little Python and no PySpark experience to having confidence to write Python and PySpark code while people are watching in our Cloud Build Lab in just a couple of weeks. I would like to thank Ciprian for the training and for his patience."
Venkat K.,
Data Architect at Independent Health
Situation
A newly formed group, the data migration and integration team is responsible for crucial data-enabled processes. The team had primarily used Oracle and ETL tools, but the organization is in the process of migrating targeted data assets to the cloud. Team members come from diverse backgrounds, with varying degrees of experience in SQL, ETL tools and other programming languages.
The team was also looking for a new data analysis and scripting tool to use uniformly within the new cloud environment and for its most critical workflows. Time was of the essence: the team needed to become expert users of the new tools and environment as soon as possible.
Solution
Python was identified as the best tool to manage different processes and workflows both in the cloud and in local environments, thanks to its large ecosystem, active community and mature solutions for data processing, data analysis and data engineering. Edlitera was brought in to bring the team up to the needed baseline in Python programming, as well as in other data engineering services and tools.
Edlitera designed a custom training curriculum that combined theoretical concepts and hands-on practice, with a strong focus on using Python and other data engineering services and tools to design and build highly scalable data pipelines.
In-class code concepts and examples were followed by in-class practice problems and assigned homework that allowed participants to get comfortable with Python and other data engineering services within the cloud environment.
The focus of the training was three-fold: Python programming fundamentals, large-scale data processing with PySpark, and building and scheduling ETL jobs with AWS Glue.
What Participants Say
“The interactive nature of the training and the combination of Python scripting and cloud tools were both instrumental in the learning process. Great discussions and exercises.”
Beth P., Senior Data Architect at Independent Health
“Ciprian has been an excellent teacher. His training has equipped me with how to deal with real world Big Data scenarios, especially for ETL purposes. Ciprian really went above and beyond with his efforts throughout this training.”
Sushma D., Data Architect and Engineer at Independent Health
Results & Outcomes
By the end of the course, participants had learned how to use Python as a general-purpose programming language and how to design, build and test custom data pipelines. They also got hands-on practice using Jupyter notebooks to author and run PySpark code for large-scale data processing.
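To give a flavor of the pipeline work covered, here is a minimal PySpark sketch of the read-transform-verify pattern participants practiced. The claim records, column names and filter logic are hypothetical illustrations, not Independent Health's actual data or workflows:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Start a local Spark session; in production this would run on a cluster.
spark = SparkSession.builder.appName("claims-pipeline-demo").getOrCreate()

# Hypothetical input: raw claim records with an amount and a status column.
raw = spark.createDataFrame(
    [("C-001", 120.0, "approved"),
     ("C-002", 80.5, "denied"),
     ("C-003", 310.0, "approved")],
    ["claim_id", "amount", "status"],
)

# Transformation step: keep approved claims and add a derived column.
approved = (
    raw.filter(F.col("status") == "approved")
       .withColumn("amount_rounded", F.round("amount", 0))
)

# A lightweight pipeline test: denied claims must not pass through.
assert approved.filter(F.col("status") == "denied").count() == 0

approved.show()
```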
Finally, participants learned how to deploy and schedule data transformation jobs in the cloud, and how to leverage data stored in data lakes, databases and data warehouses. Emphasis was also placed on security, identity and access management policies, as well as on Python and cloud best practices.
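For the deployment side, a typical AWS Glue ETL script follows the skeleton below. The database, table and S3 bucket names are placeholders rather than real resources, and in practice access to them would be governed by the IAM policies mentioned above:

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard AWS Glue job setup: resolve the job name passed in by the service.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from a Glue Data Catalog table (placeholder database/table names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="example_table"
)

# Convert to a Spark DataFrame for transformations, then write the result
# to a data lake location in S3 as Parquet (placeholder bucket).
df = source.toDF()
df.write.mode("overwrite").parquet("s3://example-bucket/processed/")

# Signal successful completion so Glue records metrics and job bookmarks.
job.commit()
```

Once deployed, a script like this can be run on demand or scheduled with a Glue trigger, with job bookmarks handling incremental processing and IAM policies scoping access to the underlying data.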