Roles and Responsibilities:
- Building end-to-end data pipelines for ML models and other data-driven solutions, such that the pipeline is directly usable for deployment/implementation.
- Building and maintaining data pipelines: data cleaning, transformation, roll-up, pre-processing, etc. (a minimal sketch follows this list).
- Building/developing data-insight solutions for various teams such as Credit, Collections, Distribution, Vigilance, HR, etc.
- Building automation solutions using Python, SQL, Docker, etc. as required.
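For illustration, a minimal sketch of the cleaning and roll-up steps such a pipeline might contain; the input file and column names here are hypothetical, not part of the actual stack:

    import pandas as pd

    # Load raw records (hypothetical file and columns)
    df = pd.read_csv("payments.csv", parse_dates=["payment_date"])

    # Cleaning: drop duplicates and rows missing key fields
    df = df.drop_duplicates().dropna(subset=["account_id", "amount"])

    # Roll-up: monthly totals per account, ready for a downstream model
    monthly = (
        df.groupby(["account_id", df["payment_date"].dt.to_period("M")])["amount"]
        .sum()
        .reset_index(name="monthly_total")
    )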
Technical Skills:
Must have:
1. Primary skill set:
- High proficiency in Python, along with good knowledge of SQL (joins, nested queries, etc.)
- Data analysis experience: able to understand and identify the required data points and the data acquisition mechanism for structured and unstructured data (text/JSON/XML) feeding a machine learning data pipeline.
- Knowledge of Python libraries such as pandas and SQLAlchemy (or other Python SQL libraries); good to have: matplotlib, NumPy, SciPy, scikit-learn, NLTK.
- Working knowledge of Git repositories (GitHub, GitLab, etc.)
2. Data management skill sets:
- Ability to understand data models and create ETL jobs using Python scripts (see the sketch after this list).
- Ability to automate regular data acquisition, application processes, etc. using Python scripts.
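A minimal ETL sketch of the kind described above, using pandas and SQLAlchemy with a join over a nested query; the connection string, table names, and columns are hypothetical:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string; the real DSN would come from config
    engine = create_engine("postgresql://user:password@localhost:5432/crm")

    # Extract: a join over a nested query, pushed down to the database
    query = """
        SELECT c.customer_id, c.region, s.total_spend
        FROM customers c
        JOIN (
            SELECT customer_id, SUM(amount) AS total_spend
            FROM orders
            GROUP BY customer_id
        ) s ON s.customer_id = c.customer_id
    """
    df = pd.read_sql(query, engine)

    # Transform: simple cleaning and derivation in pandas
    df["region"] = df["region"].str.strip().str.upper()
    df["high_value"] = df["total_spend"] > 10_000

    # Load: write the result to a reporting table
    df.to_sql("customer_spend_summary", engine, if_exists="replace", index=False)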
Good to have (must be open to learning these if not already familiar):
- Web API technology:
- Experience in REST API development using any of Django, Flask, FastAPI, etc. (highly appreciated; see the sketch below).
- Deployment of web APIs on the cloud using Docker (highly appreciated).
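A minimal sketch of the kind of REST API mentioned above, using FastAPI; the endpoint, fields, and scoring rule are hypothetical placeholders for a real model call:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class ScoreRequest(BaseModel):
        account_id: str
        amount: float

    @app.post("/score")
    def score(req: ScoreRequest) -> dict:
        # Placeholder rule standing in for a real model prediction
        return {"account_id": req.account_id, "risk_flag": req.amount > 10_000}

An app like this is typically served with uvicorn (e.g. uvicorn main:app) and can then be packaged into a Docker image for cloud deployment.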
Other useful skills:
- Working knowledge of Linux (highly appreciated).
- Ability to work on problems independently or with minimal support.
- Understanding of big data concepts and knowledge of Spark (PySpark; see the sketch after this list).
- Cloud experience (AWS/Azure/GCP)
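For the Spark point above, a minimal PySpark sketch of the same kind of roll-up shown earlier with pandas, assuming a hypothetical payments dataset:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("rollup-demo").getOrCreate()

    # Hypothetical input path; schema inferred for brevity
    df = spark.read.csv("hdfs:///data/payments.csv", header=True, inferSchema=True)

    # Same monthly roll-up per account, but distributed across the cluster
    monthly = (
        df.groupBy("account_id", F.date_format("payment_date", "yyyy-MM").alias("month"))
        .agg(F.sum("amount").alias("monthly_total"))
    )
    monthly.show()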