
Sharpen your teeth in Data Analytics (and maybe create a portfolio while you’re at it)


 

As a novice data analyst myself, I truly understand the mind of a fellow beginner raring to go out into the world, tame a wild data set and analyse the heck out of it. Understandable, then, is the urge to pick up the same old Kaggle data set that is begging not to be analysed for the umpteenth time, only to arrive at the same insights as the 24,637 people who came before you. One must not be utterly taken aback, then, to see the face of their interviewer (seeing the same analysis for the thirteenth time that day) turn red in anger without any conditional formatting applied.

So, one might naturally enquire about the remedy for such an ailment. I would point you towards the less traversed yet endlessly fascinating direction of obscure, publicly available datasets. From an exhaustive data set of all known passengers of the RMS Titanic to the largest reference data set of the human genome, these not only make remarkably interesting candidates for analytics projects but also set you apart in the eyes of interviewers.

The data out there is so diverse that every budding data enthusiast is bound to discover something that piques their analytical interest.


So here are some of my favourites that you can check out:
  1. Data.gov.in: Do your parents boast that they used to buy a whole week’s worth of groceries for just 15 rupees in 2009? Bust out the last 15 years of Consumer Price Index (CPI) data from the Government of India’s official data repository to prove them wrong, once and for all. Developed by the National Informatics Centre (NIC) under the aegis of the Ministry of Electronics and Information Technology (aka MeitY), Data.gov.in hosts data from more than 6,00,000 resources spanning sectors such as crime, judiciary, urban development and sports. In this data haven, everyone is sure to find something that would make a worthy addition to their portfolio. On top of that, you get access to a wide selection of their APIs as well. Bonus points for no sign-up being necessary.

  2. Awesome Public Datasets (GitHub): From Swiss apartment models to the biggest crowdsourced database of the American gut microbiome, ‘Awesome Public Datasets’ is a source of admittedly more global, yet no less amusing, datasets to explore in search of your next project. These datasets were painstakingly collected and tidied from blogs and user responses. Most of them are completely free and part of the open-source movement. Again, no obligation to sign up!

  3. Sindresorhus’s Awesome Collection (GitHub): This list, my friends, is the GitHub equivalent of Sir Ravindra Jadeja, thanks to the all-rounder variety of resources it holds. Not only is it home to learning resources on everything from fintech to Generative AI, it also holds free books, public datasets and much more. This list is a one-stop shop to learn anything and everything! Do, however, make sure to put your blinders on while you visit this page; otherwise you’re guaranteed to get distracted along the way (speaking from personal experience).

  4. Figshare: This one is for all the academically minded folks out there. Figshare has an endless trove of datasets across close to 25 categories, from economics to earth sciences. Be it China’s Covid-19 case data from January 2020 or the native plant species of any US state, if you can think of it, this repository probably has it. Its clean UX sets it apart from the typical academic website, making it easier for a newbie to find their way around. The good news? You can download up to 20 GB of this data for FREE! (thank me later)

  5. Google Trends: Did you know that search interest in the term “big data” peaked in October 2018? Wanna know why? Then here’s your homework: find out through Google’s own repository of all things trends and keywords. Alright! I must admit this one isn’t very “obscure”, but it deserves a mention nonetheless. A pioneer in “nowcasting”, Google Trends is the backbone of all sorts of projects that need real-time updates, the OECD’s weekly GDP tracker being a good example. Being the good Samaritans they are, they have an extremely helpful section right up front to teach newbies how to make the most of this data as well.

  6. World Bank Open Data: Wondering how the GDP of the world’s nations has changed over the past 30 years? No biggie, the World Bank has you covered! With its ‘World Bank Open Data’ initiative, it has made a true wealth of financial and fiscal data available to the masses. The data can be downloaded in CSV, XML and Excel formats, and you also get access to their own DataBank and thematic tables for easy understanding. All at the click of a button!

  7. OECD Data Explorer: In their own words, the Organisation for Economic Co-operation and Development (OECD) (phew) is an international organisation that works to build better policies for better lives. But they’re not all talk: they’ve made data available to one and all on topics ranging from tobacco consumption and variation in body weight across nationalities to wildfires. This excellent selection can help you analyse everything from levels of alcoholism across states in India to variation in the occurrence of obesity within the country. Truly a great way to spend one’s Saturday, don’t you think? (just kidding)

  8. UCI Machine Learning Repository: Geared towards machine learning enthusiasts, this is an excellent repository of more than 600 datasets for newbies getting familiar with ML. It will take you from ML-clueless to ML connoisseur in no time!
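Taking up the CPI challenge from the Data.gov.in entry? The core arithmetic is a simple ratio: multiply a past price by today’s CPI and divide by the CPI of the earlier year, and you get its equivalent in today’s money. Here’s a minimal Python sketch of that idea. The function names and the index values are my own placeholders, not real Data.gov.in figures, so plug in the actual CPI series you download.

```python
def adjust_for_inflation(price_then: float, cpi_then: float, cpi_now: float) -> float:
    """Express a past price in today's money using the ratio of two CPI values."""
    if cpi_then <= 0 or cpi_now <= 0:
        raise ValueError("CPI values must be positive")
    return price_then * cpi_now / cpi_then


def cumulative_inflation_pct(cpi_then: float, cpi_now: float) -> float:
    """Percentage change in the price level between two CPI readings."""
    if cpi_then <= 0:
        raise ValueError("CPI values must be positive")
    return (cpi_now / cpi_then - 1) * 100


# HYPOTHETICAL index values for illustration only (2009 treated as base = 100);
# replace them with the actual CPI series downloaded from Data.gov.in.
CPI_2009 = 100.0
CPI_2024 = 250.0

weekly_groceries_2009 = 15.0  # rupees, the figure from the family legend

if __name__ == "__main__":
    today = adjust_for_inflation(weekly_groceries_2009, CPI_2009, CPI_2024)
    print(f"Rs {weekly_groceries_2009:.0f} in 2009 is about Rs {today:.2f} today")
    print(f"Cumulative inflation: {cumulative_inflation_pct(CPI_2009, CPI_2024):.1f}%")
```

With these placeholder numbers, the script prints that Rs 15 in 2009 is about Rs 37.50 today, with cumulative inflation of 150.0%. Because the calculation only depends on two index values, the same function can turn an entire CPI series into a “what would this cost today” column for your portfolio chart.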

So, I hope that, equipped with these sources, you will make your portfolio stand out like a kangaroo in a penguin enclosure. Do always remember the wise words of Franklin D. Roosevelt: “The only thing we have to fear is fear itself, and maybe not backing up important data” (don’t quote me on that, though).

Till we meet again, Data comrades!
 
About the Author
Vasudev Pandey
I am a budding data scientist and mechatronics engineer with a passion for history and finance. I write about anything and everything I find interesting.

Comments

  1. Thank you for sharing these public datasets with the TakeOff Talent community, Vasudev. This will surely help many people who are confused around how to build a portfolio in analytics as well as in data science.

  2. great article, thank you so much for sharing vasudev

  3. Thanks Vasu...very helpful

  4. This will surely help the DS community. - Dan


