Getting into Data Engineering
As a data engineer, my job is to design and build systems that collect, store, and process large amounts of data. This involves creating pipelines that allow data to flow between different systems and ensuring that the data is organized and structured to meet the needs of data scientists and analysts. Data engineering plays a critical role in helping organizations make better decisions and solve problems using data, and I am proud to be a part of this important field.
I regularly get questions about getting into data engineering: skill sets, interviews, good companies, salaries, and more. In this post, I will try to answer some of the most frequently asked ones.
How do I become a data engineer as a fresh college graduate?
To become a data engineer as a fresh college graduate, you should acquire the following skills:
- Proficiency in at least one programming language, such as Python or Java
- Strong SQL skills and knowledge of database design and management
- Understanding of data modelling concepts and techniques
- Familiarity with cloud computing platforms and their use in data engineering projects
- Basic understanding of data visualization tools and techniques
In addition to these technical skills, it is essential to develop strong communication and teamwork skills, as data engineering often involves collaborating with cross-functional teams. Practical experience through internships or projects can also be valuable in building your skills and portfolio.
No one expects a fresher to have all of the above skills, but familiarity with them is greatly rewarded.
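To make the SQL and data-modelling items above more concrete, here is a minimal sketch of a star schema. A star schema is one fact table of measurements joined to a descriptive dimension table. The sketch uses Python's built-in sqlite3 module, and the table names (`dim_product`, `fact_sales`), columns and sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import List, Tuple


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one dimension and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension table: descriptive attributes, one row per product (hypothetical)
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            category   TEXT NOT NULL
        );
        -- Fact table: numeric measures plus a foreign key to the dimension
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            quantity   INTEGER NOT NULL,
            amount     REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Laptop", "Electronics"), (2, "Phone", "Electronics"), (3, "Desk", "Furniture")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 2, 1000.0), (2, 2, 1, 300.0), (3, 3, 4, 800.0), (4, 1, 1, 500.0)],
    )
    return conn


def revenue_by_category(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Join the fact table to its dimension and aggregate a measure per category."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY revenue DESC
        """
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_category(build_star_schema()))
```

Keeping descriptive attributes in the dimension and numbers in the fact table is what makes "revenue by category" a single join plus `GROUP BY`. That kind of query shows up constantly in analytics work.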
How is data engineering different from software engineering?
Data engineering and software engineering are two fields that involve different types of work.
Data engineering involves building systems to collect, store, and process large amounts of data. This may involve building pipelines to move data between different systems, designing and building databases, and creating algorithms to extract insights from data.
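The "pipeline" idea can be illustrated with a minimal extract-transform-load (ETL) sketch in plain Python. It assumes a hypothetical CSV export of orders as the source and an in-memory SQLite database as the destination. Real pipelines usually rely on dedicated processing and orchestration tools, but the three stages are the same.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, List

# Hypothetical raw source data, standing in for a file or API export
RAW_ORDERS_CSV = """order_id,customer,amount,country
1,alice,120.50,IN
2,bob,,US
3,carol,75.00,in
1,alice,120.50,IN
"""


def extract(raw_csv: str) -> List[Dict[str, str]]:
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: Iterable[Dict[str, str]]) -> List[tuple]:
    """Transform: drop rows with missing amounts, deduplicate on order_id,
    cast types and normalise country codes to upper case."""
    seen = set()
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip duplicate records
        seen.add(order_id)
        clean.append((order_id, row["customer"], float(row["amount"]), row["country"].upper()))
    return clean


def load(conn: sqlite3.Connection, records: List[tuple]) -> None:
    """Load: write the cleaned records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_csv: str) -> sqlite3.Connection:
    """Chain the three stages: source text in, populated database out."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw_csv)))
    return conn


if __name__ == "__main__":
    conn = run_pipeline(RAW_ORDERS_CSV)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping each stage a separate function makes it easy to test the cleaning rules in isolation. It also lets you swap the source or destination later without touching the transformation logic.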
Software engineering involves building and improving software applications. This may involve writing code to build a new application or modifying existing code to improve an existing application.
Data engineering and software engineering use different tools and techniques. Data engineering often involves working with large datasets and using programming languages such as Python or R to manipulate and analyze data. Software engineering may involve using a variety of programming languages and frameworks to build and maintain software applications.
The bar for software engineering interviews also differs from that for data engineering, because data engineering interviews focus on a narrower, data-specific set of skills.
What are the tools used by data engineers?
There is no single stack that everyone uses, as each role is different, but most of us work with some subset of the tools below. Common tools used in data engineering include:
- Programming languages: Python (including PySpark, the Python API for Apache Spark), SQL, etc.
- ETL tools: Apache Beam, Apache Spark, Talend, Informatica, etc.
- Data storage systems: Relational databases (MySQL, PostgreSQL), NoSQL databases (MongoDB, Cassandra), data lakes (Amazon S3, Google Cloud Storage)
- Data processing frameworks: Apache Hadoop, Apache Flink, etc.
- Data visualization tools: Tableau, Power BI, Matplotlib, etc.
- Cloud computing platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), etc.
These tools help data engineers design and build systems to collect, store, and process data, as well as create pipelines to move data between different systems and extract insights from the data.
What are the salary ranges in India for data engineers?
There is no single correct answer to this. India is a land of extremes, and you can find two people doing the same job with vastly different incomes. Based on my industry experience, here is a rough estimate:
- Freshers (0 years of experience): Data engineers with no experience may start at around INR 3 to 8 lakhs per annum
- Juniors (1 to 5 years of experience): The range here is INR 5 to 12 lakhs per annum
- Seniors (5 to 10 years of experience): This varies a lot, but I mostly see INR 10 to 20 lakhs per annum
- Principals (10+ years of experience): These engineers are highly skilled and easily make above INR 20 lakhs per annum
Tech giants like Amazon, Meta, and Google, however, pay a lot more than these ranges. Their fresher salaries start at close to INR 25 lakhs per annum including bonuses and stock, and principals easily get more than INR 1.5 crores.
For readers not familiar with the Indian numbering system: 1 lakh = 100,000 rupees and 1 crore = 10,000,000 rupees.
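The conversion is simple arithmetic. As a small illustration, the hypothetical helpers below convert lakh and crore amounts to rupees. They also format a rupee amount with Indian digit grouping, where the last three digits form one group and the digits before them are grouped in pairs. This is a sketch, not a library function.

```python
LAKH = 100_000        # 1 lakh  = 1,00,000 rupees
CRORE = 10_000_000    # 1 crore = 1,00,00,000 rupees


def lakhs_to_rupees(lakhs: float) -> int:
    """Convert an amount in lakhs to whole rupees."""
    return round(lakhs * LAKH)


def crores_to_rupees(crores: float) -> int:
    """Convert an amount in crores to whole rupees."""
    return round(crores * CRORE)


def format_indian(amount: int) -> str:
    """Group digits the Indian way: the last three digits, then pairs of two.
    Assumes a non-negative whole-rupee amount."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    digits = str(amount)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])  # peel off two digits at a time
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return ",".join(groups + [tail])


if __name__ == "__main__":
    print(format_indian(lakhs_to_rupees(25)))    # 25 lakhs
    print(format_indian(crores_to_rupees(1.5)))  # 1.5 crores
```

For example, the INR 25 lakh fresher figure above works out to 25,00,000 rupees (2,500,000 in international notation). INR 1.5 crores is 1,50,00,000 rupees (15,000,000).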
Is a technical college degree required for data engineering jobs?
It depends on the company and its hiring criteria. Tech giants like Google and Amazon generally don’t insist on a degree as long as you have the required skill set and experience and can crack their interview rounds.
Some companies do expect a technical degree, or at least certification in certain tools. It is best to check the job description and confirm with the recruiter if in doubt.
Can you describe the interview process for the post of data engineer?
Data engineering interview loops differ based on the role and the company. Based on my experience, these are the most common rounds; you might face all of them or a subset:
Initial Screening: A conversation with your recruiter about your experience, the role, and how well the two fit.
Online Test: Many companies send candidates an online test of 30 to 60 minutes, consisting of technical multiple-choice questions (MCQs) and/or programming assignments.
Phone Interview: The candidate meets someone from the company to discuss their past experience and possibly solve one or two technical problems, such as SQL queries or easy-to-medium data structure questions. This round decides whether the candidate moves on to the full interview loop.
Onsite: This is typically a loop of 3 to 5 rounds in which the candidate meets a set of people, including the hiring manager, to solve SQL, data modelling, and system/pipeline design questions, along with behavioural questions. These rounds form the heart of the interview process.
Team Fitment: Usually conducted by the hiring manager, this round mostly covers the candidate’s projects and the specific technologies used in them. There may also be many behavioural questions to judge how well the candidate fits the team.
Bar Raiser: A special round conducted mostly by technology giants like Amazon. An interviewer from outside the hiring team assesses the candidate to provide a neutral perspective, counter any bias, and break any deadlock.
I hope this gives you a good idea of the rounds.
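To give a flavour of the SQL questions in the phone and onsite rounds, here is one classic easy-level example: finding the second-highest salary. It is sketched with Python's built-in sqlite3 module. The `employees` table and its rows are hypothetical, and actual questions vary by company.

```python
import sqlite3
from typing import Optional


def second_highest_salary(conn: sqlite3.Connection) -> Optional[int]:
    # The subquery finds the top salary; the outer query takes the maximum
    # of everything strictly below it. MAX over no rows yields NULL -> None.
    row = conn.execute(
        """
        SELECT MAX(salary)
        FROM employees
        WHERE salary < (SELECT MAX(salary) FROM employees)
        """
    ).fetchone()
    return row[0]


def make_sample_db() -> sqlite3.Connection:
    """Build a small hypothetical employees table, including a tie at the top."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [(1, "asha", 1200000), (2, "ravi", 1800000), (3, "meera", 1800000), (4, "john", 900000)],
    )
    return conn


if __name__ == "__main__":
    print(second_highest_salary(make_sample_db()))
```

Comparing with a strict `<` against the maximum means a tie for the top salary is handled correctly. The query also returns NULL (None in Python) when there is no second value. Both are edge cases interviewers often probe.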
Can you recommend some free resources to gain expertise for data engineering jobs?
- YouTube can be a great source of free tutorial videos. Look for videos published by freeCodeCamp (www.freecodecamp.org), as they curate excellent content
- For SQL coding practice, you can use www.hackerrank.com or www.datalemur.com
- For data warehousing concepts, I recommend this: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/
- A great GitHub repository, the Data Engineering Cookbook: https://github.com/andkret/Cookbook. This might be the only resource you need for your first data engineering job
- Free Tableau (data visualization) tutorial: www.tableau.com/training
- A great LinkedIn post highlighting free Apache Spark resources: Apache Spark Free
I will update this post with more resources as I find them.
I hope that this article has helped to clear up some of the common questions and misconceptions about data engineering. Whether you are just starting out in the field or have been working as a data engineer for years, I hope that you have learned something new and useful from our discussion. If you have any other questions about data engineering or want to learn more about the topic, don’t hesitate to reach out to me!