BDAA
Java - JDK vs JRE
We need to know about Java first. Java has two components, as follows: -
JRE (Java Runtime Environment) for running Java applications, and JDK (Java Development Kit) for developing them. The JDK includes the JRE, the compiler, and other development tools.
So, JDK = JRE + development tools, so always install the JDK.
How to Check which Java is installed and where?
We can check if JDK is installed by running this command in the command prompt (CMD or terminal):
for JRE version
java -version
for JDK version
javac -version
To run these commands, press the Windows key + R, type cmd, and press Enter.
A black window (the Command Prompt) opens; type the first command above and press Enter, then the second command and press Enter. If you see a response to both, then the JRE as well as the JDK are installed.
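As a small aside, the version strings these commands print can also be checked programmatically. The sketch below is a hypothetical helper (parse_java_version is not a standard function) that pulls the version number out of typical java -version / javac -version output:

```python
import re

def parse_java_version(version_output):
    """Extract (major, minor, patch) from typical `java -version` or
    `javac -version` output, e.g. 'java version "11.0.2"' or 'javac 22.0.2'.
    Returns None if no version number is found."""
    match = re.search(r'(\d+)\.(\d+)\.(\d+)', version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

print(parse_java_version('java version "11.0.2" 2019-01-15 LTS'))  # (11, 0, 2)
print(parse_java_version('javac 22.0.2'))                          # (22, 0, 2)
```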
In case you want to know where Java is installed (e.g., on the C: drive or the D: drive), there are three methods.
Method 1: Using Command Prompt
Type this command in CMD and press Enter:
where java
or
where javac
This will show the exact path where Java (JDK) is installed.
Method 2: Using Environment Variables
In CMD, type:
echo %JAVA_HOME%
If JAVA_HOME is set, it will display the installation path.
Method 3: Manually Check in C Drive
Open File Explorer (Win + E).
Go to C:\Program Files\Java (for 64-bit Java).
Or check C:\Program Files (x86)\Java (for 32-bit Java).
Inside, you’ll find a folder like jdk-22.0.2, which is your JDK installation directory.
1. java.exe (Coffee Cup Icon) ☕
This is the Java Runtime Environment (JRE) executable.
It runs Java applications.
The coffee cup icon represents Java's branding.
2. javac.exe (Blue Rectangle with Lines) 📄
This is the Java Compiler (javac stands for "Java Compiler").
It compiles .java files into .class files (bytecode).
The blue rectangle with lines is the default Windows icon for executables that don’t have a custom icon.
Why No Coffee Cup Icon on javac.exe?
Java assigns the coffee cup icon only to java.exe, since it is used to run Java programs.
javac.exe is just a command-line tool for compiling Java code, so it does not have a custom icon.
Summary:
java.exe ☕ → Runs Java programs.
javac.exe 📄 → Compiles Java code (but no coffee cup icon).
Set the JAVA_HOME Environment Variable
Open Environment Variables:
Press Win + R, type sysdm.cpl, and hit Enter.
Go to the Advanced tab and click Environment Variables.
Set JAVA_HOME:
Under System Variables, click New (or Edit if JAVA_HOME exists).
Enter:
Variable Name: JAVA_HOME
Variable Value: The path to your correct Java installation (e.g., C:\Java or C:\Program Files\Java\jdk-22.0.2). Avoid pointing JAVA_HOME at C:\Program Files\Common Files\Oracle\Java\javapath, which only contains launcher shortcuts, not a full JDK.
Update the PATH Variable:
In the System Variables section, find Path and click Edit.
Click New and add %JAVA_HOME%\bin.
Verify the Configuration
Close and reopen the command prompt, then run:
echo %JAVA_HOME%
java -version
Ensure it displays the correct Java path and version.
Question - What are system variables in simple terms?
Ans. System variables are global environment variables in Windows that define system-wide settings, such as paths to important software like Java, Python, and Hadoop. They help programs locate required executables and libraries.
In other words, system variables are like a toolbox label for a plumber: they tell the system where to find the right tools (software) so it can do its job properly.
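These variables can also be read from Python via os.environ, which is an easy way to see what the "toolbox labels" currently say. A minimal sketch (DEMO_HOME is a made-up variable so the example does not depend on what is installed on your machine):

```python
import os

# Read an environment variable; .get() returns None if it is not set.
java_home = os.environ.get("JAVA_HOME")
print("JAVA_HOME:", java_home)

# Setting a variable this way affects only the current process,
# just like `set NAME=value` in a single Command Prompt session.
os.environ["DEMO_HOME"] = r"C:\Java"
print(os.environ["DEMO_HOME"])  # C:\Java
```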
1. Download and Install Java 11
Download Java 11 (LTS) from:
Install it in a separate folder, e.g.,
C:\Program Files\Java\jdk-11
2. Set Up Environment Variables for Java 11
Open Environment Variables:
Press Win + R, type sysdm.cpl, and hit Enter.
Go to the Advanced tab → Click Environment Variables.
Modify JAVA_HOME for Java 11:
Find JAVA_HOME in System Variables and edit it.
Set the value to Java 11's installation path:
C:\Program Files\Java\jdk-11
Modify Path Variable:
In System Variables, find Path and click Edit.
Add this new entry at the top:
%JAVA_HOME%\bin
This ensures that the system picks Java 11 first when needed.
3. Switch Between Java Versions Manually
Whenever you want to switch between Java 11 and Java 22, use these commands:
Switch to Java 11 (for Hadoop)
set JAVA_HOME=C:\Program Files\Java\jdk-11
set PATH=%JAVA_HOME%\bin;%PATH%
Switch to Java 22 (if needed)
set JAVA_HOME=C:\java
set PATH=%JAVA_HOME%\bin;%PATH%
💡 These settings apply only to the current Command Prompt session.
For a permanent switch, update JAVA_HOME in Environment Variables.
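The reason the %JAVA_HOME%\bin entry should come first is that Windows searches PATH directories left to right and runs the first java.exe it finds. The toy model below illustrates this lookup order; the directory listing is a made-up dict standing in for the real file system:

```python
# Toy model of PATH resolution: search directories left to right,
# return the first one that contains the executable.
def resolve_executable(path_entries, name, dir_contents):
    for directory in path_entries:
        if name in dir_contents.get(directory, []):
            return directory + "\\" + name
    return None

dir_contents = {
    r"C:\Program Files\Java\jdk-11\bin": ["java.exe", "javac.exe"],
    r"C:\java\bin": ["java.exe", "javac.exe"],
}

# With the Java 11 bin directory first, Java 11 wins:
path = [r"C:\Program Files\Java\jdk-11\bin", r"C:\java\bin"]
print(resolve_executable(path, "java.exe", dir_contents))
# C:\Program Files\Java\jdk-11\bin\java.exe
```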
4. Verify the Setup
After switching, check which Java version is active:
java -version
echo %JAVA_HOME%
It should now show Java 11 when set for Hadoop.
Hadoop Installation
binary-aarch64: Hadoop compiled for ARM-based processors (e.g., Apple M1/M2, Raspberry Pi, some servers).
binary-lean: A lighter version of Hadoop with fewer dependencies, useful for minimal installations.
binary (default): The standard, full-featured Hadoop binary for most systems (use this one for Windows).
checksum/signature: Files used to verify integrity and ensure the download isn’t corrupted or tampered with.
What to do?
✅ Download the "binary" version (without -aarch64 or -lean) for your Windows setup.
Next Step:
Once the download is complete, extract the ZIP/TAR file to:
C:\Hadoop\Hadoop
Set HADOOP_HOME
Open System Properties (Win + R, type sysdm.cpl, and press Enter).
Go to the Advanced tab → Click Environment Variables.
Under System Variables, click New:
Variable Name: HADOOP_HOME
Variable Value: C:\Hadoop\Hadoop
Click OK to save.
Awesome! ✅ Now, let’s add Hadoop’s bin directory to the Path variable.
Step 2: Add Hadoop to System Path
Open Environment Variables again (Win + R → type sysdm.cpl → Enter).
Under System Variables, find and select Path, then click Edit.
Click New and add:
C:\Hadoop\Hadoop\bin
Click OK to save and close all windows.
Great! ✅ Now, let's verify the Hadoop setup.
Step 3: Verify Hadoop Installation
Open Command Prompt (cmd).
Run the following command to check if Hadoop is recognized:
hadoop version
If everything is set up correctly, it should display the Hadoop version (e.g., Hadoop 3.4.1).
The response from the Command Prompt is as follows: -
Hadoop 3.4.1
Source code repository https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1
Compiled by mthakur on 2024-10-09T14:57Z
Compiled on platform linux-x86_64
Compiled with protoc 3.23.4
From source with checksum 7292fe9dba5e2e44e3a9f763fce3e680
This command was run using /C:/Hadoop/Hadoop/share/hadoop/common/hadoop-common-3.4.1.jar
Next Step: Configure Hadoop
Now, we need to set up Hadoop for standalone mode and configure essential files.
Step 4: Configure HADOOP_CONF_DIR
Open Environment Variables (sysdm.cpl).
Under System Variables, click New and add:
Variable Name: HADOOP_CONF_DIR
Variable Value: C:\Hadoop\Hadoop\etc\hadoop
Click OK and close all windows.
Step 5: Edit core-site.xml
Navigate to:
C:\Hadoop\Hadoop\etc\hadoop
Open the file core-site.xml in Notepad or any text editor.
Replace the content with the following configuration:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
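To sanity-check a configuration file like this, the property name/value pairs can be read back with Python's standard xml.etree module. This sketch parses the snippet inline for illustration; in practice you would point ET.parse at the actual file under C:\Hadoop\Hadoop\etc\hadoop:

```python
import xml.etree.ElementTree as ET

# The core-site.xml content from above, inlined as a string.
core_site = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

root = ET.fromstring(core_site)
props = {p.findtext("name"): p.findtext("value") for p in root.findall("property")}
print(props["fs.defaultFS"])  # hdfs://localhost:9000
```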
Step 6: Edit hdfs-site.xml
Navigate to:
C:\Hadoop\Hadoop\etc\hadoop
Open hdfs-site.xml in Notepad or any text editor.
Replace the content with the following configuration:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:\Hadoop\data\namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:\Hadoop\data\datanode</value>
  </property>
</configuration>
Create these directories by running mkdir C:\Hadoop\data\namenode and mkdir C:\Hadoop\data\datanode. If they already exist, the response from the commands will be as follows: -
C:\Users\DEV>mkdir C:\Hadoop\data\namenode
A subdirectory or file C:\Hadoop\data\namenode already exists.
C:\Users\DEV>mkdir C:\Hadoop\data\datanode
A subdirectory or file C:\Hadoop\data\datanode already exists.
Step 7: Edit mapred-site.xml
Navigate to:
C:\Hadoop\Hadoop\etc\hadoop
Open mapred-site.xml in Notepad or any text editor.
Make sure it contains this configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Save and close the file.
Step 8: Edit yarn-site.xml
Navigate to:
C:\Hadoop\Hadoop\etc\hadoop
Open yarn-site.xml in Notepad or any text editor.
Replace its content with:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Save and close the file.
Basic commands?
What is a DataFrame?
import pandas as pd
# Create a DataFrame (your digital recipe book)
df = pd.DataFrame({
    "Dish": ["Pizza", "Burger", "Pasta"],
    "Ingredients": ["Cheese, Dough", "Patty, Bun", "Noodles, Sauce"],
    "Time to Cook (mins)": [20, 15, 25],
    "Calories": [500, 400, 450]
})
# View the DataFrame
print(df)
What is a library? Pandas & matplotlib?
What is a Library?
A library in Python is like a set of specialized kitchen tools for a chef.
Imagine you’re a chef, and you have a regular kitchen with basic tools like knives and pans (Python’s built-in tools).
But what if you want to make fancy dishes, like sushi or soufflé? You’d need special tools like a rice cooker, pastry blender, or sous-vide machine.
In Python, libraries are those specialized tools that help you do specific tasks:
Pandas: Your tool for organizing and managing recipes (data).
Matplotlib: Your tool for presenting your dishes beautifully (visualizations).
What is Pandas (to a Chef)?
Pandas is like your recipe organizer.
It’s your kitchen assistant who helps you keep all your recipes (data) in a neat and tidy table (DataFrame).
You can:
Add recipes (rows of data).
Edit ingredients (columns of data).
Find and filter recipes (querying data).
Sort recipes (organize data by time or calories).
Example for a Chef
Let’s say you’re working with a digital recipe book:
import pandas as pd

# Create a DataFrame (recipe book)
recipes = pd.DataFrame({
    "Dish": ["Pizza", "Burger", "Pasta"],
    "Calories": [500, 400, 450],
    "Cooking Time (mins)": [20, 15, 25]
})

# View your recipes
print(recipes)
Output:
     Dish  Calories  Cooking Time (mins)
0   Pizza       500                   20
1  Burger       400                   15
2   Pasta       450                   25
With Pandas, you can:
Sort by calories:
recipes.sort_values(by="Calories")
Find dishes with less than 20 minutes cooking time:
recipes[recipes["Cooking Time (mins)"] < 20]
Pandas = your digital sous-chef!
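Putting the two one-liners above into a complete, runnable sketch (same recipe data as before):

```python
import pandas as pd

recipes = pd.DataFrame({
    "Dish": ["Pizza", "Burger", "Pasta"],
    "Calories": [500, 400, 450],
    "Cooking Time (mins)": [20, 15, 25]
})

# Sort by calories, lowest first
by_calories = recipes.sort_values(by="Calories")
print(by_calories["Dish"].tolist())  # ['Burger', 'Pasta', 'Pizza']

# Dishes that cook in under 20 minutes
quick = recipes[recipes["Cooking Time (mins)"] < 20]
print(quick["Dish"].tolist())  # ['Burger']
```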
What is Matplotlib (to a Chef)?
Matplotlib is like your presentation toolkit.
After you cook your dishes, you want to present them beautifully.
Matplotlib is your tool for:
Making beautiful food presentations (bar charts, pie charts, line charts).
Highlighting which dishes are the most popular or healthiest.
Example for a Chef
Let’s visualize the calories of your dishes:
import matplotlib.pyplot as plt

# Dishes and their calories
dishes = ["Pizza", "Burger", "Pasta"]
calories = [500, 400, 450]

# Create a bar chart
plt.bar(dishes, calories, color="lightgreen", edgecolor="black")
plt.title("Calories in Each Dish")
plt.xlabel("Dish")
plt.ylabel("Calories")
plt.show()
Output:
A bar chart showing the calories of each dish.
Matplotlib = your plating and presentation artist!
Summary
Library: Your set of specialized tools (like kitchen gadgets).
Pandas: Your assistant for managing data (like a digital recipe book).
Matplotlib: Your presentation expert for showcasing data beautifully (like plating dishes).
Reasons for BDAA for Admin Officers?
The burden of Big Data analysis is poised to fall heavily on administrative officers due to the increasing adoption of data-driven decision-making in governance. Training them now can preemptively equip them with the necessary skills to effectively leverage data for public administration. Here are the reasons, examples, and solutions:
Reasons for Increased Burden
Data-Driven Policy Making:
Governments are adopting evidence-based policies to address complex issues like poverty, health, and urban planning.
Example: The Indian government’s Aspirational Districts Programme uses data to track development metrics in backward districts.
Implication: Administrative officers will need to interpret data to ensure accurate tracking and intervention.
Emergence of Digital Infrastructure:
Initiatives like Digital India and Smart Cities Mission generate vast amounts of data through IoT, sensors, and citizen feedback systems.
Example: Smart Cities dashboards provide real-time data on traffic, water usage, and waste management.
Implication: Officers will need to extract actionable insights from these dashboards.
Increased Accountability and Transparency:
Right to Information (RTI) Act demands data transparency, increasing the need for organized data analysis.
Example: Officers managing e-Governance portals must analyze citizen grievance trends and ensure quick redressal.
Implication: Without proper training, officers might struggle with compliance and responsiveness.
Integration of Emerging Technologies:
Technologies like AI, ML, and Big Data analytics are being integrated into government systems for predictive modeling.
Example: AI is used in crop yield predictions to support farmers.
Implication: Officers must understand the basics of these technologies to collaborate effectively with technical teams.
Rapid Expansion of Citizen Databases:
The use of massive databases like Aadhaar, PM Kisan, and Ujjwala Yojana requires in-depth understanding of data handling.
Implication: Officers will increasingly manage, analyze, and safeguard sensitive citizen data.
Examples of Big Data in Governance
Health:
Example: During the COVID-19 pandemic, Big Data helped analyze infection trends, vaccination progress, and resource allocation.
Challenge: Many officers were unprepared to handle the scale and complexity of data during emergencies.
Education:
Example: Data from Diksha Portal and UDISE+ is used to monitor student outcomes and teacher performance.
Challenge: Lack of trained officers delays effective interventions.
Environment:
Example: Pollution monitoring systems in Delhi use data from sensors for air quality management.
Challenge: Interpreting this data to enforce actionable measures requires analytical skills.
Solutions
Comprehensive Training Programs:
Organize workshops on Big Data fundamentals:
Data cleaning and visualization.
Tools like Tableau, Power BI, or Python basics.
Example: ISTM or state administrative institutes can lead these efforts.
Hands-On Practice:
Provide real-world datasets (e.g., rainfall, health indicators) for officers to analyze in training.
Example: Create simulated governance scenarios for decision-making.
Collaborations with Technical Experts:
Foster partnerships with institutions like IITs, NITs, and NIC to upskill officers in data analytics.
Example: Data analytics courses tailored for administrative officers.
Creation of Dedicated Data Teams:
Establish Big Data cells in each department with trained officers supported by IT professionals.
Example: Data-driven cells in Smart City projects.
Policy-Level Integration:
Mandate Big Data proficiency as a required skill in promotion exams for officers.
Example: Incorporate data-related case studies in administrative training.
Incentivizing Continuous Learning:
Offer certifications or rewards for officers excelling in Big Data analysis.
Example: Certification programs on Coursera, or government-recognized courses.
Benefits of Training Now
Proactive Problem Solving: Officers can anticipate trends and take timely actions.
Efficient Resource Allocation: Data insights will improve governance efficiency.
Enhanced Career Progression: Trained officers will be better equipped for senior roles.
Public Trust: Data-driven governance fosters transparency and trust.
Conclusion
Big Data is reshaping governance. By training administrative officers now, the government can ensure smoother transitions into data-driven frameworks, more efficient public service delivery, and better policy outcomes. Investing in their capabilities will have a long-lasting impact on governance quality and citizen satisfaction.
5V of Big Data
Tools of Big Data Analysis?
What is Big Data & Relevancy in Governance?
Definition: Big Data refers to large, complex datasets that cannot be processed using traditional methods.
Characteristics (5 Vs):
Volume: Massive amounts of data (e.g., citizen records, economic data).
Velocity: Data generated at high speed (e.g., real-time traffic data).
Variety: Different types (structured: spreadsheets; unstructured: emails, images).
Veracity: Reliability of data (e.g., social media rumors vs. authentic data).
Value: Insights derived (e.g., better policymaking).
Example:
"Imagine analyzing millions of Aadhaar transactions to find patterns in healthcare access."
Big Data in Governance?
Monitoring welfare schemes (e.g., MNREGA fund usage).
Analyzing traffic data for urban planning.
Detecting fraud in subsidies using data patterns.
Using weather data to prepare for disasters.
Exercise:
Data: Provide a mock dataset of 1,000 citizen transactions with fields like Transaction ID, Amount, Date, Region, and Scheme.
Task: Ask participants to identify anomalies (e.g., unusually high claims in a specific region).
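The exercise above can be sketched in pandas. The data here is a tiny made-up stand-in for the 1,000-row mock dataset, and the "three times the median" rule is just one simple way to flag unusually high claims:

```python
import pandas as pd

# Tiny stand-in for the mock citizen-transaction dataset.
transactions = pd.DataFrame({
    "Transaction ID": [1, 2, 3, 4, 5, 6],
    "Amount": [1200, 1100, 1300, 9800, 1250, 1150],
    "Region": ["North", "South", "East", "East", "West", "North"],
    "Scheme": ["MNREGA"] * 6
})

# Flag amounts more than three times the overall median as anomalies.
threshold = 3 * transactions["Amount"].median()
anomalies = transactions[transactions["Amount"] > threshold]
print(anomalies[["Transaction ID", "Region", "Amount"]])
# Transaction 4 (East, 9800) stands out
```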
Tools for Big Data Analysis?
Microsoft Excel (basic), Tableau, Hadoop, Spark.
Region, Scheme, Beneficiaries, Funds Utilized (in Lakhs)
North, MNREGA, 5000, 300
South, MNREGA, 4000, 250
East, MNREGA, 4500, 275
West, MNREGA, 4800, 290
Task:
Create a pivot table to summarize total beneficiaries and funds by region.
Identify which region utilized funds most efficiently.
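The same pivot-table task can be done in pandas instead of Excel. A sketch using the table above ("Beneficiaries per Lakh" is a derived efficiency measure, not a column in the original data):

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"],
    "Scheme": ["MNREGA"] * 4,
    "Beneficiaries": [5000, 4000, 4500, 4800],
    "Funds Utilized (in Lakhs)": [300, 250, 275, 290]
})

# Pivot table: total beneficiaries and funds by region.
summary = df.pivot_table(index="Region",
                         values=["Beneficiaries", "Funds Utilized (in Lakhs)"],
                         aggfunc="sum")

# Efficiency: beneficiaries served per lakh of funds (higher is better).
summary["Beneficiaries per Lakh"] = (
    summary["Beneficiaries"] / summary["Funds Utilized (in Lakhs)"]
)
print(summary.sort_values("Beneficiaries per Lakh", ascending=False))
# North comes out most efficient (about 16.7 beneficiaries per lakh)
```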
Big Data success in India?
COVID-19 Management: Explain how the Aarogya Setu app used Big Data to track infections and allocate resources.
Example: "Data collected from millions of app users helped in identifying infection clusters and planning containment zones."
Departments in Government of India requiring Big Data Analysis, website or interface to learn about Big Data in Governance, Government of India Initiatives in Big Data Analysis?
https://ndap.niti.gov.in/
Several departments within the Government of India actively utilize big data analytics to enhance decision-making and service delivery:
Ministry of Finance: Employs data analytics to improve tax administration and compliance. Both direct and indirect tax departments utilize big data and AI/ML techniques to make tax administration more effective and taxpayer-friendly.
Ministry of Electronics and Information Technology (MeitY): Under the Digital India programme, MeitY has launched numerous e-services and initiatives that leverage big data to promote innovation and improve governance.
NITI Aayog: Through its Data Management and Analysis Vertical, NITI Aayog addresses issues related to streamlining data usage for public policy. A key initiative is the National Data and Analytics Platform (NDAP), which democratizes access to public government data by making it accessible, interoperable, and interactive.
National Informatics Centre (NIC): Provides data analytics services to various government departments, aiding in the development of e-Government applications and managing the National Knowledge Network.
To learn more about big data in governance and the Government of India's initiatives in big data analysis, the following platforms offer valuable resources:
Open Government Data (OGD) Platform India: A single-point access to datasets, documents, services, tools, and applications published by various government ministries and departments. It facilitates data sharing and promotes innovation over non-personal data.
National Data and Analytics Platform (NDAP): Developed by NITI Aayog, NDAP aims to democratize access to public government data by making it accessible, interoperable, and interactive. It hosts datasets from various government agencies and provides tools for analytics and visualization.
These platforms serve as comprehensive resources for understanding and engaging with the Government of India's big data initiatives.
Step wise practical on data by Govt of India?
Access Government Data
Platforms to Access Data:
Open Government Data (OGD) Platform India: data.gov.in
National Data and Analytics Platform (NDAP): ndap.niti.gov.in
Steps:
Visit the website.
Browse or search for datasets by categories (e.g., education, health, finance).
Download datasets in formats like CSV, JSON, or Excel.
Understand the Data
Steps:
Read the metadata or data dictionary provided with the dataset.
Understand key variables, data types, and the context of the dataset.
Clean and preprocess the data for analysis using tools like Excel or Python libraries.
Recommended Tools:
Spreadsheet Software: Microsoft Excel, Google Sheets
Python Libraries: Pandas, NumPy
Explore the Data
Steps:
Perform descriptive statistics to summarize the data.
Create visualizations to identify patterns and trends.
Use tools to filter and query data for specific insights.
Recommended Tools:
Visualization: Tableau Public, Power BI, Matplotlib, Seaborn (Python)
Querying Data: SQL, Python (Pandas)
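A small pandas sketch of this explore step; the rainfall figures here are made-up illustrative numbers, not an actual government dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Kerala", "Rajasthan", "Assam", "Punjab"],
    "Annual Rainfall (mm)": [3055, 313, 2818, 649]
})

# Descriptive statistics: count, mean, std, min, quartiles, max.
print(df["Annual Rainfall (mm)"].describe())

# Query: states with above-average rainfall.
wet = df[df["Annual Rainfall (mm)"] > df["Annual Rainfall (mm)"].mean()]
print(wet["State"].tolist())  # ['Kerala', 'Assam']
```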
Perform Advanced Analysis
Steps:
Use statistical methods for deeper insights.
Apply machine learning models for predictions or classifications.
Use geographic data for spatial analysis.
Recommended Tools:
Statistical Analysis: R, Python (Statsmodels, Scikit-learn)
GIS and Spatial Analysis: QGIS, ArcGIS, Google Earth Engine
Generate Insights
Steps:
Compile insights into reports or dashboards.
Share findings with stakeholders or the public.
Recommended Tools:
Dashboards: Power BI, Tableau, Streamlit (Python)
Documenting: Jupyter Notebooks, Google Docs
Step wise practical on Govt of India data?
Download a Dataset
Go to the Open Government Data (OGD) Platform India: https://data.gov.in.
Search for a dataset. For example:
Dataset: "State-wise Monthly Rainfall Data (2004-2020)"
Link: State-wise Rainfall Data
Download the dataset as a CSV file.
Set Up Tools
Use Google Colab (no installation required) or Jupyter Notebook.
Google Colab: Open Google Colab and create a new notebook.
Jupyter Notebook: If installed, launch it from your computer.
Upload the Dataset
Open your notebook in Google Colab or Jupyter.
Upload the CSV file.
Google Colab: Use the upload button in the sidebar or write the code to upload.
from google.colab import files
uploaded = files.upload()
Save the file in your working directory if using Jupyter.
Write Python Code
Here is a complete script to showcase your big data analysis. Copy and paste it into your notebook.
Python Code:
# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# Step 1: Load the dataset
# Replace 'rainfall.csv' with the name of your downloaded file
file_name = "rainfall.csv"
data = pd.read_csv(file_name)
# Step 2: Display the first few rows of the dataset
print("Dataset Overview:")
print(data.head())
# Step 3: Summary of data
print("\nDataset Summary:")
print(data.describe())
# Step 4: Clean the data (e.g., handling missing values)
print("\nChecking for missing values:")
print(data.isnull().sum())
# Dropping rows with missing values
data_cleaned = data.dropna()
# Step 5: Analyze data
# Example: Calculate average annual rainfall per state
average_rainfall = data_cleaned.groupby("State")["Annual Rainfall"].mean()
print("\nAverage Annual Rainfall per State:")
print(average_rainfall)
# Step 6: Visualize the data
# Example: Bar plot of average rainfall
average_rainfall.plot(kind="bar", figsize=(12, 6))
plt.title("Average Annual Rainfall by State")
plt.xlabel("State")
plt.ylabel("Average Rainfall (mm)")
plt.tight_layout()
plt.show()
Explain Your Steps During the Session
Load Data: Show participants how you loaded the dataset into the notebook.
Overview: Explain the dataset structure using head() and describe().
Data Cleaning: Discuss why handling missing values is crucial.
Analysis: Show how you calculated the average rainfall for each state.
Visualization: Explain how the bar chart represents the results.
Practice the Steps
Run this practical multiple times before the session to ensure smooth execution.
Familiarize yourself with the dataset and outputs so you can confidently explain the results.
Value and Velocity in Big Data?
1. Value in Big Data
Value refers to the usefulness and insights derived from analyzing Big Data. It focuses on how the data can be transformed into meaningful and actionable information, leading to better decision-making, innovation, and business opportunities. Not all data is valuable, so the challenge is to extract value from the massive amounts of data being collected.
Examples of Value in Big Data:
Healthcare: Predictive Analytics for Patient Care
Example: Hospitals collect massive amounts of patient data (medical history, diagnostics, lab results). Using Big Data analytics, healthcare providers can identify patterns in this data to predict patient health outcomes. For instance, by analyzing vital signs and historical records, hospitals can predict when patients might need readmission or when certain complications might arise. This improves patient care and reduces costs.
Value: The value here is derived from saving lives, improving care quality, and reducing hospital readmissions.
E-commerce: Personalized Recommendations
Example: Platforms like Amazon collect vast amounts of data about customer preferences, browsing behavior, and purchase history. By analyzing this data, Amazon can provide personalized product recommendations to customers, increasing sales and improving the user experience.
Value: The value for the business is in increased revenue and customer satisfaction through more targeted marketing and recommendations.
Finance: Fraud Detection
Example: Banks and financial institutions collect data on millions of transactions daily. By analyzing patterns in this data, Big Data tools can detect fraudulent activities in real-time, such as unusual transaction amounts, locations, or behavior.
Value: The value lies in protecting customers from fraud, saving money for the bank, and maintaining trust.
Supply Chain Optimization
Example: A retail company can use Big Data to monitor its entire supply chain in real-time, adjusting inventory levels based on consumer demand, sales data, and delivery patterns.
Value: Optimizing supply chains leads to reduced costs, faster deliveries, and improved customer satisfaction.
The main takeaway is that Value in Big Data is about turning raw data into insights that can help businesses or organizations make smarter decisions.
2. Velocity in Big Data
Velocity refers to the speed at which data is generated, collected, and processed. In Big Data, data is produced at an extremely fast rate, often in real-time, and it needs to be processed quickly to extract timely insights and take action.
Examples of Velocity in Big Data:
Social Media Monitoring (Twitter, Facebook)
Example: Twitter processes around 500 million tweets per day (about 6,000 tweets per second). Social media platforms must analyze data in real-time to detect trends, monitor user behavior, and respond to emerging topics or crises (e.g., viral trends, breaking news).
Velocity: The velocity here is high because the data is being generated and analyzed in real-time, often requiring instant processing to capitalize on trends or prevent the spread of misinformation.
Stock Market Trading
Example: Financial markets generate an enormous amount of data in real-time, with millions of transactions happening every second. Big Data algorithms analyze this information in real-time to enable high-frequency trading (HFT), where traders make buying and selling decisions based on market fluctuations in milliseconds.
Velocity: In this case, velocity is extremely high, as data must be processed in real-time to gain a competitive edge and make split-second financial decisions.
Smart Cities and IoT
Example: In a smart city, IoT sensors collect real-time data about traffic flow, air quality, electricity usage, and water systems. To manage city services efficiently, this data must be analyzed in real-time to adjust traffic lights, optimize energy usage, or predict maintenance needs.
Velocity: The velocity is high because the data is continuously streaming, and decisions must be made in real-time to keep city systems running smoothly.
Real-Time Fraud Detection
Example: Credit card companies like Visa and Mastercard process thousands of transactions per second. To detect and prevent fraud, they need to analyze these transactions in real-time, identifying suspicious activities as they occur.
Velocity: Fraud detection systems rely on high-velocity data processing to identify and stop fraudulent transactions as they happen.
Streaming Services (Netflix, YouTube)
Example: Platforms like Netflix and YouTube collect data in real-time about user behavior, such as what people are watching, how long they stay on the platform, and what content they interact with. This data is analyzed in real-time to provide personalized recommendations and to improve user experience.
Velocity: The velocity is high because data is generated continuously while users are interacting with the platform, requiring immediate processing.
Summary of Value and Velocity in Big Data
Value: Refers to how useful the data is in generating insights that lead to better decision-making and outcomes. Examples include personalized recommendations (e-commerce), predictive analytics (healthcare), and fraud detection (finance).
Velocity: Refers to the speed at which data is generated and processed, often in real-time. Examples include social media monitoring (Twitter), financial trading (stock markets), and IoT data streams (smart cities).
Both Value and Velocity are crucial in Big Data analysis. Value is what you get from data insights, while Velocity is how fast you need to process the data to get those insights in time for them to be useful.
What is Veracity in the 5 Vs of Big Data?
In the context of the 5 V’s of Big Data, Veracity refers to the quality and reliability of the data. It emphasizes the importance of ensuring that the data being processed is trustworthy, accurate, and meaningful. In Big Data, since data comes from multiple sources and is generated at a high speed, there is often uncertainty or inconsistency regarding its quality. Not all the data collected is accurate, complete, or reliable, which can make analysis difficult.
Explaining "Veracity" in Big Data
Veracity in Big Data highlights:
Data quality: Is the data clean and accurate, or does it have errors, noise, or inconsistencies?
Trustworthiness: Can the data be trusted to make decisions? Is it coming from reliable sources?
Uncertainty: How confident can you be in the insights derived from the data, given its possible flaws or bias?
The veracity of Big Data is often one of the most challenging aspects because inaccurate or low-quality data can lead to faulty conclusions or poor decision-making.
Examples to Illustrate Veracity
Here are some practical examples of Veracity to make the concept clearer:
1. Social Media Data (Low Veracity)
Example: Consider a social media platform like Twitter, where millions of posts are generated every minute. While a lot of valuable insights can be derived from social media data (e.g., public sentiment, trends), not all data is reliable.
Issues: Some posts may contain spam, fake news, irrelevant content, or even misinformation. For example, bots generating fake comments, or biased opinions from people trying to manipulate trends, can affect the trustworthiness of the analysis.
Challenge: The platform needs to filter out this low-quality data (noise) and focus on authentic interactions to derive meaningful insights. This uncertainty about the reliability of user-generated content is an example of low veracity in Big Data.
2. IoT Devices (Sensor Data)
Example: A smart city relies on sensors for monitoring traffic, pollution, weather, and more. These sensors generate large amounts of data, but sometimes, the data might be incomplete or inaccurate.
Issues: Sensors can malfunction, sending inconsistent or corrupted data (e.g., temperature sensors recording faulty readings due to weather conditions). Different sensors might also produce conflicting data for the same location.
Challenge: When analyzing the data, it’s critical to filter out incorrect or inconsistent readings to make accurate decisions. The uncertainty about the correctness of the data captured from IoT devices reflects veracity issues.
3. Customer Data in E-commerce
Example: An online retailer collects data about customers, including purchasing habits, preferences, and demographics.
Issues: Some customers might enter incorrect information (such as wrong email addresses or fake phone numbers). Additionally, duplicate or incomplete records (e.g., customers having multiple accounts or missing information in their profiles) can reduce data quality.
Challenge: Ensuring that the customer data is clean, deduplicated, and accurate is essential to make trustworthy business decisions, like personalized recommendations or targeted marketing. The reliability and accuracy of customer profiles reflect the veracity of the data.
4. Financial Data (High Veracity Needed)
Example: In the financial industry, banks and trading platforms collect vast amounts of data about transactions, stock prices, and economic indicators.
Issues: Small errors in this data (e.g., incorrect transaction records, duplicate entries, or delays in real-time updates) can have major financial consequences. If inaccurate data is used for trading algorithms or investment decisions, it can lead to significant losses.
Challenge: Financial data needs to be precise, accurate, and real-time to ensure reliable decisions. This is why the veracity of data is critical in the financial sector, where even minor errors can lead to large-scale impacts.
Dealing with Veracity Issues
To deal with veracity in Big Data, companies often implement:
Data cleansing techniques: Removing duplicates, correcting errors, and filling in missing information to improve data quality.
Filtering unreliable sources: Detecting and eliminating data from low-trust sources, such as fake accounts or spam.
Consistency checks: Ensuring that data coming from different sources or sensors is consistent with expected ranges or known facts.
For example, in e-commerce, customer data can be cross-checked with third-party services to verify accuracy, and social media platforms may use algorithms to flag or remove misleading or irrelevant content.
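The cleansing steps above can be sketched in pandas. This is a minimal illustration on invented records; the column names, the duplicate, and the "plausible age" range are all assumptions for demonstration, not from a real dataset.

```python
import pandas as pd

# Hypothetical customer records with a duplicate, a missing email, and an out-of-range age
df = pd.DataFrame({
    'customer_id': [1, 2, 2, 3, 4],
    'email': ['a@x.com', 'b@x.com', 'b@x.com', None, 'd@x.com'],
    'age': [34, 28, 28, 45, 190],
})

# 1. Data cleansing: remove exact duplicate records
df = df.drop_duplicates()

# 2. Filter unreliable entries: drop records missing a required field
df = df.dropna(subset=['email'])

# 3. Consistency check: keep only values in a plausible range
df = df[df['age'].between(0, 120)]

print(df)  # only customers 1 and 2 survive the checks
```

Each step mirrors one of the techniques listed above; in a real pipeline the rules (required fields, valid ranges) would come from domain knowledge rather than being hard-coded.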
Summary of Veracity in Big Data
Veracity in Big Data refers to the quality, reliability, and trustworthiness of the data. It's about ensuring that the data you’re analyzing is accurate and clean because poor-quality data leads to faulty insights. High veracity means data is consistent, error-free, and can be trusted, while low veracity means the data is uncertain, inconsistent, or potentially misleading.
Examples:
Social media data often suffers from low veracity due to misinformation or spam.
IoT sensors can produce inconsistent or faulty readings that need to be filtered.
Customer data in e-commerce may contain errors or duplicates, affecting data quality.
Financial data requires high veracity to avoid costly mistakes in real-time decision-making.
Ensuring high veracity is a critical step in any Big Data project to guarantee accurate and valuable insights.
What is Volume in 5V of Big Data?
In the context of the 5 V’s of Big Data, the Volume refers to the sheer amount of data being generated, stored, and processed. It highlights the vast quantity of data that comes from various sources, often measured in terabytes (TB), petabytes (PB), or even zettabytes (ZB). Volume is often considered the defining characteristic of Big Data, as traditional databases and systems struggle to handle the large scale.
Explaining "Volume" in Big Data
Volume in Big Data means "how much" data is generated. This data can come from a variety of sources:
Social media posts: Every day, platforms like Facebook, Twitter, and Instagram generate billions of posts, likes, comments, and shares.
Sensor data: In a smart city, millions of sensors on traffic lights, buildings, and streets constantly send streams of data about temperature, traffic, air quality, etc.
Transactional data: E-commerce platforms like Amazon process millions of transactions per minute, logging customer details, purchases, reviews, and more.
These examples show the massive volume that Big Data systems handle daily. The critical challenge is storing and processing this data efficiently.
Example to Illustrate Volume
Let’s say you’re explaining the Volume aspect to someone:
Example 1: Social Media (Facebook)
Volume: Facebook generates 4 petabytes of data daily, including likes, shares, posts, photos, and videos.
Explanation: Imagine how much data that is—it's equivalent to 1 million HD movies being stored every day. Traditional databases wouldn’t be able to handle this volume, but Big Data systems like Hadoop or Spark can process such a massive amount of data efficiently.
Example 2: E-commerce (Amazon)
Volume: Amazon generates around 1 million transactions per second during peak times, producing terabytes of customer purchase data, product reviews, recommendations, and delivery information.
Explanation: To make it relatable, you could say that in one minute, Amazon handles as much data as a small company might handle in an entire year!
Example 3: Internet of Things (IoT)
Volume: In a smart city, IoT devices (smart meters, traffic sensors, air quality monitors) generate terabytes of data per day. Each sensor continuously sends data on energy usage, vehicle movement, or environmental conditions.
Explanation: Think of this as the city "talking" to itself—millions of sensors collecting information simultaneously, and all this data needs to be processed in real-time.
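The scale claims above can be sanity-checked with quick arithmetic. Assuming an HD movie is roughly 4 GB (an assumption for illustration), 4 petabytes per day does work out to about a million movies:

```python
# Back-of-the-envelope check on the "4 PB ≈ 1 million HD movies" claim
GB = 10**9           # bytes in a gigabyte (decimal units)
PB = 10**15          # bytes in a petabyte
daily_data = 4 * PB  # ~4 PB generated per day
movie_size = 4 * GB  # assumed size of one HD movie

movies_per_day = daily_data / movie_size
print(f"{movies_per_day:,.0f} HD movies per day")  # 1,000,000
```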
Summary of Volume in Big Data
The volume in Big Data refers to the massive amount of data generated by various digital processes, devices, and interactions. As technology advances, this volume continues to grow exponentially.
For citing examples of volume:
Social media generates 4 petabytes of data daily.
E-commerce platforms like Amazon handle 1 million transactions per second.
Smart cities produce terabytes of sensor data every day from IoT devices.
These examples clearly demonstrate how the volume of Big Data is far beyond the capabilities of traditional data systems, necessitating specialized tools and techniques for managing such enormous datasets.
Practical on Big Data Analysis?
To define Big Data Analysis and perform a practical demonstration of it on any computer with an internet connection, you can leverage cloud-based services, open-source tools, and datasets available online. Here’s how you can approach this:
1. Defining Big Data Analysis
Big Data Analysis refers to the process of examining large and complex datasets, often involving millions or billions of records. These datasets are so large that traditional data processing tools struggle to handle them. Big Data Analysis involves extracting useful information from this data to identify patterns, trends, correlations, and insights. It typically involves:
Volume: Huge amounts of data.
Variety: Different types of data (text, video, images, structured/unstructured data).
Velocity: Speed at which data is generated and processed.
Veracity: Trustworthiness of data.
Value: Insights and actionable information extracted from the data.
2. Tools and Technologies for Big Data Analysis
To perform a practical Big Data analysis, you can use the following tools and coding environments that are accessible via the internet:
a) Python and Libraries for Big Data Analysis
Python is a versatile language for Big Data analysis with its rich ecosystem of libraries.
Pandas: For data manipulation and analysis.
NumPy: For numerical data processing.
Dask: For parallel computing and handling larger-than-memory datasets.
PySpark: Python API for Apache Spark, which is a popular distributed computing framework.
Matplotlib/Seaborn: For data visualization.
You can run Python code locally or via cloud-based environments (e.g., Google Colab or Jupyter Notebooks).
b) Apache Spark (via PySpark)
Apache Spark is one of the most powerful tools for Big Data processing. It's fast, and built for distributed computing. You can use PySpark to work with Spark in Python. This is suitable for analyzing huge datasets using a distributed computing model.
You can use cloud services like Google Colab, which allows running PySpark code without needing local setup.
c) Cloud Platforms
If you want to explore more serious Big Data analysis:
Google Cloud (BigQuery): Offers a managed data warehouse with massive scale and SQL-based querying.
Amazon Web Services (AWS S3 + EMR): Store data in S3 and use Elastic MapReduce for Big Data analysis using Hadoop or Spark.
Azure Data Lake Analytics: For scalable cloud-based analysis.
3. Performing a Practical Big Data Analysis
Here’s how you can run a Big Data Analysis practical using publicly available datasets and tools accessible from any internet-connected computer.
Step 1: Get a Dataset
You need a large dataset to demonstrate Big Data. Some useful sources:
Kaggle: Free datasets related to various domains.
Example: Kaggle Datasets
Google Dataset Search: Search engine for finding large datasets.
Open Data Portals: Websites like data.gov or European Union Open Data Portal provide free public datasets.
For example, you could use:
New York City Taxi Trip Data: Over 1 billion records of taxi rides in NYC.
Global Temperature Data: Huge datasets for climate research.
COVID-19 Data: Large-scale, globally collected datasets.
Step 2: Set Up an Environment (Google Colab for Example)
Go to Google Colab. This is a free, cloud-based Python environment with access to many Big Data libraries.
Install necessary libraries (Pandas, PySpark, Dask, etc.).
Step 3: Analyze the Data
Here’s an example of Big Data analysis using PySpark in Google Colab.
Install PySpark on Colab:
!pip install pyspark
Load and Analyze Data (using NYC taxi data as an example):
from pyspark.sql import SparkSession
# Initialize Spark session
spark = SparkSession.builder.appName("BigDataAnalysis").getOrCreate()
# Load data (assuming NYC taxi data is available as a CSV on cloud storage or local drive)
df = spark.read.csv("/path-to-your-nyc-taxi-data.csv", header=True, inferSchema=True)
# Show data schema
df.printSchema()
# Example: Calculate average trip distance
df.select('trip_distance').summary('mean').show()
Run Queries and Visualize
Filter data based on conditions (e.g., find all trips over 10 miles).
Visualize patterns such as trip duration vs distance using matplotlib/seaborn.
Summarize Findings: You can summarize the findings in terms of trends or anomalies discovered, such as peak travel times, average fare per distance, or other insights.
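As a quick illustration of the filtering step, here is the same idea in pandas on a tiny invented sample (the `trip_distance` column name matches the NYC taxi schema, but the values are made up; in PySpark the equivalent would be `df.filter(df.trip_distance > 10)`):

```python
import pandas as pd

# Tiny invented sample standing in for the NYC taxi data
trips = pd.DataFrame({
    'trip_distance': [1.2, 12.5, 3.4, 15.0, 0.8],
    'total_amount':  [6.5, 42.0, 12.0, 55.5, 5.0],
})

# Find all trips over 10 miles
long_trips = trips[trips['trip_distance'] > 10]
print(long_trips)
```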
Step 4: Interpretation and Reporting
After running the practical analysis, you can report:
The size and variety of the dataset (to demonstrate the "Big" part of Big Data).
The velocity at which insights were derived using PySpark.
The veracity of the data and whether any cleaning was required.
The value in terms of insights (e.g., how the data analysis can help optimize services like taxis or resource allocation).
A practical can be done on the elements of data analytics (data source, data lake, data processing, data analysis, data interpretation, and data use) with examples and resources
1. Data Source
The data source is where the data originates. It can come from multiple sources like databases, APIs, sensors, or publicly available datasets.
Example: NYC Taxi Dataset
Source: The NYC Taxi & Limousine Commission provides an open dataset that records every trip made by a taxi in NYC. You can download data from NYC Taxi Data.
Practical Task:
Go to the website and download a month's worth of NYC Taxi data (e.g., January 2023).
You can also directly import it into Google Colab:
!wget https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet
2. Data Lake
A data lake is a centralized repository that allows you to store structured and unstructured data at scale. A popular tool for this is AWS S3, but you can also use Google Cloud Storage or even local storage on your system or Google Colab environment for simplicity.
Example with Google Colab or AWS S3:
Local Data Lake (Colab): The dataset you downloaded can be considered part of a local "data lake" in your Google Colab environment.
AWS S3 for Data Lake:
Create an AWS S3 bucket.
Upload the NYC Taxi dataset (or other datasets).
Use AWS CLI or Boto3 to access the data from your data lake programmatically.
import boto3
# Access your AWS S3 bucket
s3 = boto3.resource('s3')
bucket = s3.Bucket('your-bucket-name')
# Download the dataset
bucket.download_file('your-file.csv', 'local-filename.csv')
3. Data Processing
Data Processing is the step where raw data is cleaned and transformed into a more usable format. This typically involves removing errors, missing values, and outliers, as well as transforming the data into formats that are more suitable for analysis.
Example: Data Cleaning and Transformation with Pandas:
Data Cleaning: Use Pandas to handle missing data and remove irrelevant columns from the dataset.
import pandas as pd
# Load the dataset
df = pd.read_parquet("yellow_tripdata_2023-01.parquet")
# Check for missing data
print(df.isnull().sum())
# Remove missing values
df_cleaned = df.dropna()
# Remove irrelevant columns
df_cleaned = df_cleaned[['tpep_pickup_datetime', 'tpep_dropoff_datetime', 'passenger_count', 'trip_distance', 'total_amount']]
Data Transformation: Convert columns (e.g., timestamps) into useful formats and create new columns if necessary (e.g., trip duration).
# Convert pickup and dropoff times to datetime
df_cleaned['pickup_datetime'] = pd.to_datetime(df_cleaned['tpep_pickup_datetime'])
df_cleaned['dropoff_datetime'] = pd.to_datetime(df_cleaned['tpep_dropoff_datetime'])
# Create a new column for trip duration in minutes
df_cleaned['trip_duration'] = (df_cleaned['dropoff_datetime'] - df_cleaned['pickup_datetime']).dt.total_seconds() / 60
4. Data Analysis
Data Analysis involves applying statistical and machine learning techniques to discover patterns or trends in the data.
Example: Basic Descriptive Statistics and Visualizations:
Descriptive Statistics: Calculate basic statistics such as mean, median, and distribution of trips and fares.
# Descriptive statistics for trip distance and total amount
df_cleaned[['trip_distance', 'total_amount']].describe()
Data Visualization: Use Matplotlib and Seaborn to visualize data, like the distribution of trip distances or total fare amounts.
import matplotlib.pyplot as plt
import seaborn as sns
# Plot distribution of trip distances
plt.figure(figsize=(10,6))
sns.histplot(df_cleaned['trip_distance'], bins=50, kde=False)
plt.title('Distribution of Trip Distances')
plt.xlabel('Trip Distance (miles)')
plt.ylabel('Frequency')
plt.show()
Advanced Analysis: You can apply machine learning models to predict the fare based on trip distance, passenger count, etc., using libraries like Scikit-Learn.
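As a minimal sketch of that idea, here is a fare-from-distance prediction using a simple least-squares line fitted with NumPy (a stand-in for a full Scikit-Learn pipeline; the distance and fare values are invented and chosen to lie on a line):

```python
import numpy as np

# Invented sample: trip distance (miles) and fare (dollars)
distance = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
fare     = np.array([6.0, 9.0, 12.0, 18.0, 27.0])

# Fit fare ≈ slope * distance + intercept by least squares
slope, intercept = np.polyfit(distance, fare, 1)

# Predict the fare for a 4-mile trip
predicted = slope * 4.0 + intercept
print(f"Predicted fare for 4 miles: ${predicted:.2f}")  # $15.00
```

On real taxi data you would add more features (passenger count, time of day) and move to a proper model with train/test splits.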
5. Data Interpretation
Data Interpretation involves understanding the results from the data analysis and drawing meaningful conclusions. This step is key to turning raw data into insights that can drive decision-making.
Example: Insights from NYC Taxi Data:
Insight 1: Discover the busiest hours for NYC taxis based on pickup times.
df_cleaned['hour'] = df_cleaned['pickup_datetime'].dt.hour
hourly_trips = df_cleaned.groupby('hour').size()
hourly_trips.plot(kind='bar', figsize=(10,6), title='Number of Taxi Trips per Hour')
plt.xlabel('Hour of Day')
plt.ylabel('Number of Trips')
plt.show()
Interpretation: The plot might show a peak during rush hours (8-9 AM, 5-6 PM), indicating the busiest periods for taxi drivers.
Insight 2: Determine the correlation between trip distance and total fare.
# Calculate correlation
correlation = df_cleaned[['trip_distance', 'total_amount']].corr()
print("Correlation between trip distance and total fare:\n", correlation)
Interpretation: A positive correlation would suggest that longer trips tend to generate higher fares, but analyzing outliers (like very short trips with high fares) can offer deeper insights.
6. Data Use
Data Use refers to applying the insights to solve real-world problems or improve decision-making. This could be in business strategy, operations, customer service, etc.
Example: Taxi Fleet Optimization:
Problem: A taxi company might want to optimize its fleet based on the busiest times and locations.
Use Case: Using the insights from the analysis, the company could:
Increase the number of taxis available during peak hours.
Position taxis near high-demand drop-off locations.
Adjust fare pricing dynamically based on demand.
Sharing the Insights:
You can export your cleaned dataset or analysis to a CSV for sharing or further analysis.
# Save the processed and analyzed data to a CSV
df_cleaned.to_csv('processed_nyc_taxi_data.csv', index=False)
# Download the CSV file
from google.colab import files
files.download('processed_nyc_taxi_data.csv')
Resources:
Here are some useful resources for each stage:
Data Sources:
NYC Taxi Data: NYC Taxi & Limousine Commission Data
Kaggle Datasets: Kaggle
Data Lake:
AWS S3: Amazon S3
Google Cloud Storage: Google Cloud Storage
Data Processing:
Pandas Documentation: Pandas
AWS Glue (Data Processing Service): AWS Glue
Data Analysis:
Matplotlib Documentation: Matplotlib
Seaborn Documentation: Seaborn
Data Interpretation & Use:
Data Science for Business: Data Science for Business Book
By following these steps, you can practically demonstrate the entire Data Analytics Pipeline with an example dataset. You will have touched on all key elements, from sourcing the data to drawing valuable insights and making data-driven decisions.
My SQL coding
SQL Datatypes
CHAR(50)
VARCHAR(50)
BLOB(1000)
INT
TINYINT
BIGINT
BIT
FLOAT
DOUBLE
DECIMAL
NUMERIC
BOOLEAN
DATE = YYYY-MM-DD
YEAR
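Several of these datatypes can be tried out with Python's built-in sqlite3 module. Note this is a sketch: SQLite accepts these type names but maps them to its own storage classes, so lengths like VARCHAR(50) are not enforced the way MySQL enforces them, and the table and columns below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # temporary in-memory database
cur = conn.cursor()

# Invented table using several of the datatypes listed above
cur.execute("""
    CREATE TABLE employee (
        id        INT,
        name      VARCHAR(50),
        salary    DECIMAL(10, 2),
        is_active BOOLEAN,
        hire_date DATE          -- stored as 'YYYY-MM-DD' text
    )
""")
cur.execute("INSERT INTO employee VALUES (1, 'Alice', 55000.50, 1, '2023-04-01')")
conn.commit()

for row in cur.execute("SELECT name, salary, hire_date FROM employee"):
    print(row)  # ('Alice', 55000.5, '2023-04-01')
```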
Lookup functions, count functions and customisation of functions in data analysis
To explain Lookup functions, Count functions, and the Customization of functions in data analysis with practical examples, you can use a spreadsheet tool (e.g., Google Sheets, Excel) or a Python-based environment (e.g., Google Colab, Jupyter Notebook). Below is a step-by-step guide on how you can practically explain each of these concepts:
1. Lookup Functions: VLOOKUP, HLOOKUP, and Python Pandas Equivalent
Lookup functions allow you to search for specific data in a table or range and return a corresponding value from another column or row. These functions are extremely useful when you need to extract data based on specific criteria.
Example in Excel or Google Sheets:
VLOOKUP (Vertical Lookup): This searches for a value in the first column of a range and returns a value in the same row from another column.
Formula:
=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
Example: Imagine you have a table with Employee IDs in Column A and Employee Names in Column B. To find the name of the employee with ID 102, you can use:
=VLOOKUP(102, A1:B10, 2, FALSE)
HLOOKUP (Horizontal Lookup): Similar to VLOOKUP, but it searches for a value in the top row and returns a value in the same column from another row.
Practical Example in Python using Pandas:
In Python, you can achieve similar functionality using Pandas.
import pandas as pd
# Create a sample dataset (similar to Excel table)
data = {'Employee_ID': [101, 102, 103, 104],
'Employee_Name': ['Alice', 'Bob', 'Charlie', 'Diana']}
# Convert to DataFrame
df = pd.DataFrame(data)
# Use a lookup function to find the employee name for Employee ID 102
employee_id = 102
employee_name = df.loc[df['Employee_ID'] == employee_id, 'Employee_Name'].values[0]
print(f'Employee with ID {employee_id} is {employee_name}')
This code will print:
Employee with ID 102 is Bob
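Another common pandas analogue of VLOOKUP is a merge, which looks up many values at once instead of one at a time (reusing the same invented employee table):

```python
import pandas as pd

employees = pd.DataFrame({'Employee_ID': [101, 102, 103, 104],
                          'Employee_Name': ['Alice', 'Bob', 'Charlie', 'Diana']})

# IDs to look up, like a column of VLOOKUP formulas
lookups = pd.DataFrame({'Employee_ID': [102, 104]})

# A left merge behaves like VLOOKUP with exact match (range_lookup = FALSE)
result = lookups.merge(employees, on='Employee_ID', how='left')
print(result)
```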
2. Count Functions: COUNT, COUNTIF, COUNTIFS in Excel/Sheets and Python Equivalent
Count functions are used to count the number of cells that contain data, meet certain criteria, or match specific conditions.
Example in Excel or Google Sheets:
COUNT: Counts the number of cells that contain numbers.
=COUNT(A1:A10)
COUNTIF: Counts the number of cells that meet a specific criterion.
=COUNTIF(A1:A10, ">5")
COUNTIFS: Counts the number of cells that meet multiple criteria.
=COUNTIFS(A1:A10, ">5", B1:B10, "<10")
Practical Example in Python using Pandas:
You can achieve the same functionality in Pandas.
# Sample dataset
data = {'Product': ['A', 'B', 'A', 'B', 'C', 'A'],
'Sales': [5, 15, 10, 5, 25, 10]}
df = pd.DataFrame(data)
# COUNT: Total number of sales entries
count_sales = df['Sales'].count()
print(f"Total Sales Entries: {count_sales}")
# COUNTIF: Number of sales greater than 10
countif_sales = (df['Sales'] > 10).sum()
print(f"Sales > 10: {countif_sales}")
# COUNTIFS: Number of products 'A' with sales > 5
countifs = len(df[(df['Product'] == 'A') & (df['Sales'] > 5)])
print(f"Product A with Sales > 5: {countifs}")
Output:
Total Sales Entries: 6
Sales > 10: 2
Product A with Sales > 5: 2
3. Customization of Functions: Custom Functions in Excel and Python
Custom functions allow you to create your own logic or operation. In Excel, you can use VBA (Visual Basic for Applications) for complex custom functions. In Python, you can define functions with custom logic using the def keyword.
Custom Function in Excel:
In Excel, you can create custom functions with VBA:
Open Excel.
Press Alt + F11 to open the VBA editor.
Insert a new Module.
Write a simple custom function, for example, to calculate the square of a number.
Function SquareNumber(num As Double) As Double
SquareNumber = num * num
End Function
Now, you can use =SquareNumber(A1) in your Excel sheet.
Custom Function in Python:
In Python, you can easily define a custom function. For example, to create a custom function that calculates the discounted price of a product:
# Custom function to calculate discount
def calculate_discount(price, discount_percentage):
    discount_amount = price * (discount_percentage / 100)
    discounted_price = price - discount_amount
    return discounted_price
# Test the function
price = 100
discount_percentage = 20
print(f"Discounted Price: {calculate_discount(price, discount_percentage)}")
Output:
Discounted Price: 80.0
You can create custom functions for any business logic, such as calculating tax, applying conditions to data, or even complex transformations.
Practical Steps to Demonstrate These in a Session:
Start with Lookup Functions:
Open Google Sheets or Excel.
Load a dataset and demonstrate VLOOKUP by finding a value.
Explain how Python's Pandas library can perform similar lookup tasks with .loc indexing.
Move to Count Functions:
Show how to use COUNT, COUNTIF, and COUNTIFS in Excel/Sheets.
In Python, demonstrate how to count entries and apply conditional counts using Pandas.
Custom Functions:
Show how to create custom functions in Excel using VBA.
Then, define custom functions in Python and explain how it allows you to create reusable code.
Resources:
Google Sheets: Google Sheets
Excel VBA Custom Functions: Excel VBA Documentation
Python Pandas Documentation: Pandas
Google Colab: Google Colab
These practical examples give users hands-on experience with common data analysis functions while showing how to transition between spreadsheet-based analysis (Excel, Google Sheets) and Python-based analysis (Pandas).
Is Excel a part of Big Data analysis? If not, why was the above question explained through Excel?
Excel is not typically considered a primary tool for Big Data Analysis because it has limitations in handling large datasets, scalability, and advanced processing capabilities. Big Data refers to massive datasets that cannot be processed or analyzed using traditional data tools, like Excel, which struggles when the data size exceeds its row limit (approximately 1 million rows).
However, Excel is widely used for basic data analysis and small to medium-sized datasets in businesses and educational settings. The reason I explained your earlier question using Excel is because:
Excel is Familiar: Many people, especially in non-technical fields or those beginning data analysis, are familiar with Excel or Google Sheets. It's easier to understand fundamental data analysis concepts like lookup functions, count functions, and custom functions in Excel because of its simplicity and visual interface.
Demonstrating Concepts: The concepts of lookup, count, and customization of functions can be effectively demonstrated in Excel for teaching purposes. These concepts are also transferable to more advanced tools like Python and SQL when you're ready to scale up to Big Data environments.
Transitional Learning: By showing how certain analysis functions work in Excel, which is familiar, I can help build a bridge to more advanced tools used for Big Data (like Python, Pandas, Spark, etc.). Many beginners in data analysis start with Excel and gradually move to advanced programming languages and platforms.
Why Excel is Not Ideal for Big Data Analysis:
Data Size Limitations: Excel can handle only up to ~1 million rows, but Big Data usually involves millions or even billions of rows.
Processing Power: Excel lacks the computational power and speed to efficiently process large volumes of data. Tools like Hadoop, Spark, or databases like Google BigQuery are used in Big Data environments for their distributed processing capabilities.
Scalability and Automation: Big Data tools are designed to handle data across distributed systems and can easily scale. Excel doesn’t have the same level of automation or scalability.
When to Transition from Excel to Big Data Tools:
If your data exceeds Excel's limitations, or you're dealing with datasets that involve complex processing or advanced analytics (e.g., machine learning, real-time data streaming), it’s time to transition to Big Data platforms such as:
Python & Pandas: For advanced data processing.
SQL/NoSQL Databases: For querying large datasets.
Hadoop/Spark: For distributed data processing.
Cloud Platforms: AWS S3, Google BigQuery, or Azure for large data storage and analysis.
Conclusion:
While Excel is useful for small-scale data analysis, it's not suitable for Big Data. The earlier explanation was framed around Excel because the question related to fundamental data analysis concepts, which can be more easily understood and visualized in a simpler tool like Excel before moving on to more complex Big Data environments.
Exercise on Rainfall Pattern Analysis
- Go to the website: https://data.gov.in
- Search for a dataset. For simplicity, let’s use the following dataset:
- Dataset Name: State-wise Monthly Rainfall Data (2004-2020)
- Direct Link to Download: Rainfall Dataset
- Download the Dataset:
- Click the Download button on the page.
- Save the file on your computer.
- The file name will likely be rainfall.csv.
from google.colab import files  # needed for the upload widget

# Upload the file
uploaded = files.upload()

import pandas as pd  # Importing the Pandas library

file_name = "D67. Rainfall Monthly Total_1_0.csv"
data = pd.read_csv(file_name)

print("Dataset Overview:")
print(data.head())
Step 4: Check for Missing Values and Dataset Structure
This step ensures that the data is clean and ready for analysis.
- Copy and paste the following code into a new cell in Google Colab:
# Check for missing values in the dataset
print("\nMissing Values in Each Column:")
print(data.isnull().sum())  # Shows the count of missing values for each column

# Display the total number of rows and columns in the dataset
print("\nDataset Dimensions:")
print(f"Rows: {data.shape[0]}, Columns: {data.shape[1]}")
What to Expect
- A list of all column names in the dataset.
- A count of missing values in each column (if any).
- The number of rows and columns in the dataset.
Step 5: Clean the Data (Handle Missing Values)
If there are missing values in the dataset, we need to handle them. We’ll remove rows with missing values for simplicity.
- Copy and paste the following code into a new cell in Google Colab:
# Remove rows with missing values
data_cleaned = data.dropna()

# Confirm that missing values have been removed
print("Missing Values After Cleaning:")
print(data_cleaned.isnull().sum())

# Display the first few rows of the cleaned dataset
print("\nCleaned Dataset Overview:")
print(data_cleaned.head())
What This Code Does
- Removes rows with missing values using data.dropna().
- Confirms that there are no missing values left.
- Displays the first few rows of the cleaned dataset for verification.
Step 6: Analyze the Data (Calculate Average Rainfall by State)
This step involves grouping the data by state and calculating the average annual rainfall for each state.
- Copy and paste the following code into a new cell in Google Colab:
# Group by state and calculate the average annual rainfall (column names assumed)
average_rainfall = data_cleaned.groupby("State")["Annual Rainfall"].mean()
# Display the results
print("Average Annual Rainfall by State:")
print(average_rainfall)
Step 6 (Revised): Check Column Names
- First, let’s confirm the exact column names in the dataset. Run the following code to display all the column names:
print(data_cleaned.columns.tolist())
- Check the output carefully. Look for the column names that correspond to State and Annual Rainfall. For example, they could be:
- "State" might actually be " State" or "STATE".
- "Annual Rainfall" might be "Annual_Rainfall" or "Rainfall (mm)".
For instance, if the columns are " State" and "Rainfall (mm)", your code will look like this:
average_rainfall = data_cleaned.groupby(" State")["Rainfall (mm)"].mean()
print("Average Annual Rainfall by State:")
print(average_rainfall)
Step 6 (Revised): Calculate Average Rainfall
- Copy and paste the following code into a new cell:
# Calculate the mean of the 'Total_rainfall' column
average_rainfall = data_cleaned['Total_rainfall'].mean()
# Display the result
print("Average Rainfall (across all months and years):")
print(f"{average_rainfall:.2f} mm")
What This Code Does
- Takes the 'Total_rainfall' column and calculates the mean value.
- Displays the average rainfall rounded to 2 decimal places.
Step 7: Visualize Rainfall Trend Over Time
- Copy and paste the following code into a new cell in Google Colab:
import matplotlib.pyplot as plt

# Convert 'Month-Year' to datetime format for better visualization
data_cleaned['Month-Year'] = pd.to_datetime(data_cleaned['Month-Year'])

# Plotting the rainfall trend
plt.figure(figsize=(12, 6))
plt.plot(data_cleaned['Month-Year'], data_cleaned['Total_rainfall'], marker='o', linestyle='-', color='b')

# Adding title and labels
plt.title("Rainfall Trend Over Time", fontsize=16)
plt.xlabel("Month-Year", fontsize=12)
plt.ylabel("Total Rainfall (mm)", fontsize=12)

# Improving x-axis readability
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()

# Show the plot
plt.show()
What This Code Does
- Converts the 'Month-Year' column into a datetime format so that it can be plotted on the x-axis.
- Creates a line plot showing how rainfall varies over time.
- Adds markers, labels, and a title to make the visualization informative.
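Monthly points can look noisy. One optional refinement is aggregating to yearly totals with resample; a sketch on a tiny hand-made series (the real notebook would apply this to data_cleaned after the datetime conversion):

```python
import pandas as pd

# Toy monthly data standing in for the real dataset
s = pd.DataFrame({
    "Month-Year": pd.to_datetime(["2011-01-01", "2011-02-01", "2012-01-01"]),
    "Total_rainfall": [0.0, 14.0, 30.0],
})

# Sum each calendar year's rainfall ("YS" = year-start frequency)
yearly = s.set_index("Month-Year")["Total_rainfall"].resample("YS").sum()
print(yearly)  # 2011 -> 14.0, 2012 -> 30.0
```

Plotting `yearly` instead of the monthly series gives a smoother long-term trend line.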
Step 1: Check the 'Month-Year' Column
We need to inspect the 'Month-Year' column to see what it contains.
- Run this code to display unique values in the 'Month-Year' column:
print(data_cleaned['Month-Year'].unique())
2. Check the output:
- Look for invalid, corrupted, or strange date formats (e.g., Jan-11 or empty strings).
- Confirm if all values are consistent (e.g., YYYY-MM or Month-Year).
Step 2: Fix or Format the 'Month-Year' Column
If the dates are in a format like Jan-11, we need to specify the correct date format. Use this code:
# Convert 'Month-Year' to datetime format, assuming it's in 'MMM-YY' (e.g., Jan-11)
data_cleaned['Month-Year'] = pd.to_datetime(data_cleaned['Month-Year'], format='%b-%y', errors='coerce')
# Check if any invalid dates exist
print("Invalid dates after conversion:")
print(data_cleaned[data_cleaned['Month-Year'].isna()])

# Drop rows with invalid dates
data_cleaned = data_cleaned.dropna(subset=['Month-Year'])
print("Conversion successful!")
Explanation of the Code
- format='%b-%y': Tells Python to interpret the dates as abbreviated month and two-digit year (e.g., Jan-11 = January 2011).
- errors='coerce': Replaces invalid date entries with NaT (Not a Time).
- dropna(subset=['Month-Year']): Removes rows where the date could not be converted.
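The effect of errors='coerce' is easy to see on a tiny example with one deliberately bad entry:

```python
import pandas as pd

# Two valid 'MMM-YY' strings and one junk entry
raw = pd.Series(["Jan-11", "Feb-11", "not a date"])
parsed = pd.to_datetime(raw, format="%b-%y", errors="coerce")

print(parsed.isna().sum())                 # 1 -> the junk entry became NaT
print(parsed.dropna().dt.year.tolist())    # [2011, 2011]
```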
Next step
data_cleaned = data_cleaned.dropna(subset=['Month-Year'])
Step 8: Visualize the Rainfall Trend
Now, plot the data to see how rainfall trends over time.
# Plotting the rainfall trend
plt.figure(figsize=(12, 6))
plt.plot(data_cleaned['Month-Year'], data_cleaned['Total_rainfall'], marker='o', linestyle='-', color='b')

# Adding title and labels
plt.title("Rainfall Trend Over Time", fontsize=16)
plt.xlabel("Month-Year", fontsize=12)
plt.ylabel("Total Rainfall (mm)", fontsize=12)

# Improving x-axis readability
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()

# Show the plot
plt.show()
Exercise on Rainfall executed?
from google.colab import files
uploaded = files.upload()
D67. Rainfall Monthly Total_1_0.csv(text/csv) - 1513 bytes, last modified: 11/30/2024 - 100% done
Saving D67. Rainfall Monthly Total_1_0.csv to D67. Rainfall Monthly Total_1_0 (1).csv
import pandas as pd
file_name = "D67. Rainfall Monthly Total_1_0.csv"
data = pd.read_csv(file_name)
print("Dataset Overview:")
print(data.head())
Dataset Overview:
Month-Year Total_rainfall
0 Jan-11 0.0
1 Feb-11 14.0
2 Mar-11 0.0
3 Apr-11 0.0
4 May-11 0.0
print("Column Names in the Dataset:")
print(data.columns)
Column Names in the Dataset:
Index(['Month-Year', 'Total_rainfall'], dtype='object')
print("\nMissing Values in Each Column:")
print(data.isnull().sum())
Missing Values in Each Column:
Month-Year 0
Total_rainfall 0
dtype: int64
print("\nDataset Dimensions:")
print(f"Rows: {data.shape[0]}, Columns: {data.shape[1]}")
Dataset Dimensions:
Rows: 132, Columns: 2
data_cleaned = data.dropna()
print("Missing Values After Cleaning:")
print(data_cleaned.isnull().sum())
Missing Values After Cleaning:
Month-Year 0
Total_rainfall 0
dtype: int64
print("\nCleaned Dataset Overview:")
print(data_cleaned.head())
Cleaned Dataset Overview:
Month-Year Total_rainfall
0 Jan-11 0.0
1 Feb-11 14.0
2 Mar-11 0.0
3 Apr-11 0.0
4 May-11 0.0
print("Column Names in the Dataset:")
print(data_cleaned.columns)
Column Names in the Dataset:
Index(['Month-Year', 'Total_rainfall'], dtype='object')
average_rainfall = data_cleaned["Total_rainfall"].mean()
print("Average Rainfall (across all months and years):")
print(f"{average_rainfall:.2f} mm")
Average Rainfall (across all months and years):
57.00 mm
import matplotlib.pyplot as plt
data_cleaned['Month-Year'] = pd.to_datetime(data_cleaned['Month-Year'], format='%b-%y', errors='coerce')
print("Invalid dates after conversion:")
print(data_cleaned[data_cleaned['Month-Year'].isna()])
Invalid dates after conversion:
Empty DataFrame
Columns: [Month-Year, Total_rainfall]
Index: []
data_cleaned = data_cleaned.dropna(subset=['Month-Year'])
print("Conversion successful!")
Conversion successful!
plt.figure(figsize=(12, 6))
plt.plot(data_cleaned['Month-Year'], data_cleaned['Total_rainfall'], marker='o', linestyle='-', color='b')
plt.title("Rainfall Trend Over Time", fontsize=16)
plt.xlabel("Month-Year", fontsize=12)
plt.ylabel("Total Rainfall (mm)", fontsize=12)
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
Exercise on Rainfall executed 2.0?
from google.colab import files
uploaded = files.upload()
D67. Rainfall Monthly Total_1_0.csv(text/csv) - 1513 bytes, last modified: 11/30/2024 - 100% done
Saving D67. Rainfall Monthly Total_1_0.csv to D67. Rainfall Monthly Total_1_0.csv
import pandas as pd
file_name ="D67. Rainfall Monthly Total_1_0.csv"
data = pd.read_csv(file_name)
print("Dataset Overview:")
print(data.head())
Dataset Overview:
Month-Year Total_rainfall
0 Jan-11 0.0
1 Feb-11 14.0
2 Mar-11 0.0
3 Apr-11 0.0
4 May-11 0.0
print("Column Names in the Dataset:")
print(data.columns)
Column Names in the Dataset:
Index(['Month-Year', 'Total_rainfall'], dtype='object')
print("\nMissing Values in Each Column:")
print(data.isnull().sum())
Missing Values in Each Column:
Month-Year 0
Total_rainfall 0
dtype: int64
print("\nDataset Dimensions:")
print(f"Rows: {data.shape[0]}, Columns: {data.shape[1]}")
Dataset Dimensions:
Rows: 132, Columns: 2
data_cleaned = data.dropna()
print("Missing Values After Cleaning:")
print(data_cleaned.isnull().sum())
Missing Values After Cleaning:
Month-Year 0
Total_rainfall 0
dtype: int64
print("\nCleaned Dataset Overview:")
print(data_cleaned.head())
Cleaned Dataset Overview:
Month-Year Total_rainfall
0 Jan-11 0.0
1 Feb-11 14.0
2 Mar-11 0.0
3 Apr-11 0.0
4 May-11 0.0
print("Column Names in the Dataset:")
print(data_cleaned.columns)
Column Names in the Dataset:
Index(['Month-Year', 'Total_rainfall'], dtype='object')
average_rainfall = data_cleaned["Total_rainfall"].mean()
print("Average Rainfall (across all months and years):")
print(f"{average_rainfall:.2f} mm")
Average Rainfall (across all months and years):
57.00 mm
import matplotlib.pyplot as plt
data_cleaned['Month-Year'] = pd.to_datetime(data_cleaned['Month-Year'], format='%b-%y', errors='coerce')
print("Invalid dates after conversion:")
print(data_cleaned[data_cleaned['Month-Year'].isna()])
Invalid dates after conversion:
Empty DataFrame
Columns: [Month-Year, Total_rainfall]
Index: []
data_cleaned = data_cleaned.dropna(subset=['Month-Year'])
print("Conversion successful!")
Conversion successful!
plt.figure(figsize=(12,6))
plt.plot(data_cleaned['Month-Year'], data_cleaned['Total_rainfall'], marker='o', linestyle='-', color='b')
plt.title("Rainfall Trend Over Time", fontsize=16)
plt.xlabel("Month-Year", fontsize=12)
plt.ylabel("Total Rainfall (mm)", fontsize=12)
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
Exercise on Month, Region, Sales, Profit?
Step 1: Create a Custom CSV File
Copy and paste the following data into a plain text editor (e.g., Notepad):
Month,Region,Sales,Profit
January,North,20000,5000
January,South,18000,4500
January,East,22000,6000
January,West,15000,3000
February,North,25000,7000
February,South,20000,5000
February,East,27000,8000
February,West,16000,3500
March,North,30000,9000
March,South,22000,5500
March,East,29000,8500
March,West,17000,4000
2. Save it as a CSV file:
File Name: sales_data.csv
Step 2: Upload the CSV File to Google Colab
from google.colab import files
# Upload the CSV file
uploaded = files.upload()
What to Expect
Once uploaded, Colab will display:
Uploading sales_data.csv
Saving sales_data.csv to sales_data.csv
Step 3: Load and Display the Dataset
import pandas as pd # Importing Pandas for data handling
# Load the uploaded CSV file into a Pandas DataFrame
file_name = "sales_data.csv"
data = pd.read_csv(file_name)
# Display the first few rows of the dataset
print("Dataset Overview:")
print(data.head())
What to Expect
You should see the first 5 rows of the dataset, with columns Month, Region, Sales, and Profit.
Step 4: Visualize Sales by Region
import matplotlib.pyplot as plt
# Group the data by Region and sum the Sales
region_sales = data.groupby("Region")["Sales"].sum()
# Plot a bar chart
plt.figure(figsize=(8, 6))
region_sales.plot(kind="bar", color="skyblue", edgecolor="black")
# Add title and labels
plt.title("Total Sales by Region", fontsize=16)
plt.xlabel("Region", fontsize=12)
plt.ylabel("Total Sales (in $)", fontsize=12)
# Show the plot
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
What to Expect
A bar chart showing total sales for each region (North, South, East, West).
Step 5: Visualize Profit Trends by Month
# Group the data by Month and sum the Profit
month_profit = data.groupby("Month")["Profit"].sum()
# Sort the months in calendar order
month_order = ["January", "February", "March"]
month_profit = month_profit.reindex(month_order)
# Plot a line chart
plt.figure(figsize=(8, 6))
month_profit.plot(kind="line", marker='o', color="green", linewidth=2)
# Add title and labels
plt.title("Profit Trends by Month", fontsize=16)
plt.xlabel("Month", fontsize=12)
plt.ylabel("Total Profit (in $)", fontsize=12)
# Show the plot
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
What This Code Does
Groups the data by Month and calculates the total Profit for each month.
Sorts the months in the correct calendar order (January → February → March).
Creates a line chart with green markers and lines to represent the profit trend.
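A pivot table shows Sales for every Month and Region at once, complementing the two separate charts above. A sketch with a few of the sales_data.csv rows inlined so it runs standalone:

```python
import pandas as pd

# Same shape as sales_data.csv; a few rows inlined for a self-contained example
data = pd.DataFrame({
    "Month":  ["January", "January", "February", "February"],
    "Region": ["North", "South", "North", "South"],
    "Sales":  [20000, 18000, 25000, 20000],
    "Profit": [5000, 4500, 7000, 5000],
})

# One table answering both questions at once: rows = Month, columns = Region
pivot = data.pivot_table(index="Month", columns="Region", values="Sales", aggfunc="sum")
print(pivot)
```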
Exercise on Vehicle Traffic data?
Step 1: Create a Dataset
Open Notepad and paste the following data:
Day,Intersection,Vehicles Counted,Average Speed
Monday,Intersection A,1200,45
Monday,Intersection B,800,50
Monday,Intersection C,1500,40
Tuesday,Intersection A,1300,42
Tuesday,Intersection B,900,48
Tuesday,Intersection C,1400,38
Wednesday,Intersection A,1250,43
Wednesday,Intersection B,850,46
Wednesday,Intersection C,1350,39
Thursday,Intersection A,1400,41
Thursday,Intersection B,920,47
Thursday,Intersection C,1450,37
Friday,Intersection A,1500,40
Friday,Intersection B,980,45
Friday,Intersection C,1600,35
Saturday,Intersection A,1700,38
Saturday,Intersection B,1100,42
Saturday,Intersection C,1800,33
Sunday,Intersection A,1600,39
Sunday,Intersection B,1050,44
Sunday,Intersection C,1750,34
Save the file as traffic_data.csv on your desktop.
Step 2: Upload the CSV File to Google Colab
Open Google Colab and create a new notebook.
from google.colab import files
# Upload the CSV file
uploaded = files.upload()
A file upload dialog will appear. Select the file traffic_data.csv from your desktop and upload it.
What to Expect
Once uploaded, Colab will display:
Uploading traffic_data.csv
traffic_data.csv(text/csv) - 717 bytes, last modified: 12/1/2024 - 100% done
Saving traffic_data.csv to traffic_data.csv
Step 3: Load and Display the Dataset
import pandas as pd # Importing Pandas for data handling
# Load the uploaded CSV file into a Pandas DataFrame
file_name = "traffic_data.csv"
data = pd.read_csv(file_name)
# Display the first few rows of the dataset
print("Dataset Overview:")
print(data.head())
Step 4: Total Vehicles Counted by Day
Let’s calculate the total number of vehicles counted for each day and create a bar chart.
import matplotlib.pyplot as plt
# Group data by Day and sum the Vehicles Counted
daywise_traffic = data.groupby("Day")["Vehicles Counted"].sum()
# Plot a bar chart
plt.figure(figsize=(8, 6))
daywise_traffic.plot(kind="bar", color="skyblue", edgecolor="black")
# Add title and labels
plt.title("Total Vehicles Counted by Day", fontsize=16)
plt.xlabel("Day", fontsize=12)
plt.ylabel("Total Vehicles", fontsize=12)
# Show the plot
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
What This Code Does
Groups the data by Day and calculates the total number of vehicles for each day.
Creates a bar chart to visualize traffic patterns throughout the week.
What to Expect
A bar chart showing total vehicle counts for each day (Monday to Sunday)
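The dataset also carries an Average Speed column, which the steps above don’t use. A sketch computing mean speed per intersection, with a small slice of traffic_data.csv inlined so it runs on its own:

```python
import pandas as pd

# A slice of traffic_data.csv, inlined for a self-contained example
data = pd.DataFrame({
    "Day": ["Monday", "Monday", "Tuesday", "Tuesday"],
    "Intersection": ["Intersection A", "Intersection B", "Intersection A", "Intersection B"],
    "Vehicles Counted": [1200, 800, 1300, 900],
    "Average Speed": [45, 50, 42, 48],
})

# Mean speed per intersection: lower speeds often flag congested spots
speed = data.groupby("Intersection")["Average Speed"].mean()
print(speed)
```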
Step 5: Analyze Traffic for Odd and Even Days
We’ll classify the days into Odd and Even, calculate the total vehicle count for each category, and create a comparative bar chart.
# Classify days into Odd and Even based on their position in the week
odd_days = ["Monday", "Wednesday", "Friday", "Sunday"]
data["Day Type"] = data["Day"].apply(lambda x: "Odd" if x in odd_days else "Even")
# Group data by Day Type and sum the Vehicles Counted
odd_even_traffic = data.groupby("Day Type")["Vehicles Counted"].sum()
# Plot a bar chart for Odd and Even days
plt.figure(figsize=(8, 6))
odd_even_traffic.plot(kind="bar", color=["lightcoral", "skyblue"], edgecolor="black")
# Add title and labels
plt.title("Traffic Comparison: Odd vs Even Days", fontsize=16)
plt.xlabel("Day Type", fontsize=12)
plt.ylabel("Total Vehicles", fontsize=12)
# Show the plot
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
What This Code Does
Classifies days into Odd (Monday, Wednesday, Friday, Sunday) and Even (Tuesday, Thursday, Saturday).
Groups data by these categories and calculates the total vehicle count for each.
Creates a bar chart to compare traffic between Odd and Even days.
What to Expect
A bar chart comparing total vehicle counts for Odd and Even days.
Highlights whether traffic is heavier on odd or even days, which is critical for policies like Odd-Even Rules.
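Note that real odd-even rules key on the calendar date’s parity, not a fixed list of weekdays. If the dataset had actual dates, the classification could be derived from them directly; a toy sketch with made-up dates:

```python
import pandas as pd

# Hypothetical dates; the course dataset only has weekday names
dates = pd.Series(pd.to_datetime(["2024-12-01", "2024-12-02", "2024-12-03"]))

# Classify each row by whether the day-of-month is odd or even
parity = dates.dt.day.map(lambda d: "Odd" if d % 2 else "Even")
print(parity.tolist())  # ['Odd', 'Even', 'Odd']
```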
odd_days = ["Monday", "Wednesday", "Friday", "Sunday"]
data["Day Type"] = data["Day"].apply(lambda x: "Odd" if x in odd_days else "Even")
odd_even_traffic = data.groupby("Day Type")["Vehicles Counted"].sum()
plt.figure(figsize=(8,6))
odd_even_traffic.plot(kind="bar", color=["lightcoral", "skyblue"], edgecolor="black")
plt.title("Traffic Comparison: Odd vs Even Days", fontsize=16)
plt.xlabel("Day Type", fontsize=12)
plt.ylabel("Total Vehicles", fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
Exercise on Open Government Data?
We’ll use the Open Government Data (OGD) Platform India as our primary resource. Here's how you can proceed:
Step 1: Access the OGD Platform
Visit the OGD Platform: Open your web browser and go to data.gov.in.
Step 2: Search for a Dataset
Use the Search Function: On the homepage, you'll find a search bar. Type in keywords related to the data you're interested in, such as "rainfall," "population," or "traffic."
Filter Results: After searching, you can filter the results by file format (e.g., CSV) to find datasets that are easy to work with.
Step 3: Select and Download a Dataset
Choose a Dataset: Browse through the search results and select a dataset that fits your needs. For example, you might choose "State-wise Monthly Rainfall Data."
Download the Dataset: On the dataset's page, look for a download link or button, often labeled "Download" or "CSV." Click it to download the file to your computer.
Step 4: Upload the Dataset to Google Colab
Open Google Colab: Navigate to Google Colab and create a new notebook.
Upload the File:
In Colab, click on the folder icon on the left sidebar to open the file explorer.
Click the upload icon (a file with an upward arrow) and select the CSV file you downloaded.
Step 5: Load and Explore the Data
Import Necessary Libraries:
import pandas as pd
Load the Data:
# Replace 'filename.csv' with the actual name of your file
data = pd.read_csv('filename.csv')
View the Data:
# Display the first few rows of the dataset
data.head()
Step 6: Analyze and Visualize the Data
Check for Missing Values:
data.isnull().sum()
Basic Statistics:
data.describe()
Data Visualization:
import matplotlib.pyplot as plt
Plot Data (example for rainfall data):
# Ensure 'Date' column is in datetime format
data['Date'] = pd.to_datetime(data['Date'])
# Plot
plt.figure(figsize=(10, 5))
plt.plot(data['Date'], data['Rainfall'], marker='o')
plt.title('Monthly Rainfall Over Time')
plt.xlabel('Date')
plt.ylabel('Rainfall (mm)')
plt.grid(True)
plt.show()
Additional Resources
Smart Cities Mission Data Portal: For city-specific data, visit the Smart Cities Mission Data Portal.
NCRB (National Crime Records Bureau) Data
Let’s begin working on the NCRB_CII-2019_Table_17A.1.csv file you downloaded.
Step 1: Upload the CSV File to Google Colab
Open Google Colab and create a new notebook.
Add the following code in a new cell to upload the file:
from google.colab import files

# Upload the CSV file
uploaded = files.upload()
Run the cell by pressing Shift + Enter.
A file upload dialog will appear. Select the file NCRB_CII-2019_Table_17A.1.csv from your computer and upload it.
What to Expect
Once uploaded, Colab should display something like this:
Uploading NCRB_CII-2019_Table_17A.1.csv
NCRB_CII-2019_Table_17A.1.csv(text/csv) - 18087 bytes, last modified: 12/1/2024 - 100% done
Saving NCRB_CII-2019_Table_17A.1.csv to NCRB_CII-2019_Table_17A.1.csv
Step 2: Load and Display the Dataset
Add the following code to a new cell in Google Colab:
import pandas as pd
# Importing Pandas for data handling
# Load the uploaded CSV file into a Pandas DataFrame
file_name = "NCRB_CII-2019_Table_17A.1.csv"
data = pd.read_csv(file_name)
# Display the first few rows of the dataset
print("Dataset Overview:")
print(data.head())
Run the cell by pressing Shift + Enter.
What to Expect
You should see the first 5 rows of the dataset, including the column names and some data.
Step 4: Analyze the Dataset
Let’s start with a simple analysis: Identify the top 5 crime categories with the highest number of cases reported during the year.
# Sort the dataset by 'Cases Reported during the year' in descending order
top_crimes = data.sort_values(by='Cases Reported during the year', ascending=False)
# Select the top 5 rows
top_5_crimes = top_crimes[['Crime Head', 'Cases Reported during the year']].head(5)
# Display the results
print("Top 5 Crime Categories by Cases Reported:")
print(top_5_crimes)
What to Expect
A list of the top 5 crime categories with the highest number of cases reported during the year.
Top 5 Crime Categories by Cases Reported:
Crime Head Cases Reported during the year
141 Total Cognizable IPC crimes 3225701
60 Offences Affecting the Human Body (Total) 1050945
139 Miscellaneous IPC Crimes(Total) 860209
104 Offences against Property (Total) 854618
88 Theft 675916
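Notice that the top rows are aggregate totals ("Total Cognizable IPC crimes", "... (Total)"), which crowd out individual offences such as Theft. An optional refinement is filtering out rows whose Crime Head mentions "Total" before ranking; a sketch on a minimal stand-in for the table:

```python
import pandas as pd

# Minimal stand-in for the NCRB table: aggregate rows mixed with individual offences
data = pd.DataFrame({
    "Crime Head": ["Total Cognizable IPC crimes", "Offences against Property (Total)",
                   "Theft", "Riots"],
    "Cases Reported during the year": [3225701, 854618, 675916, 50000],
})

# Drop aggregate rows so totals don't crowd out individual crime categories
individual = data[~data["Crime Head"].str.contains("Total", case=False)]
top = individual.sort_values("Cases Reported during the year", ascending=False).head(5)
print(top)
```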
Step 5: Visualize the Top 5 Crime Categories by Cases Reported
import matplotlib.pyplot as plt
# Extract the crime categories and case numbers
crime_categories = top_5_crimes['Crime Head']
case_numbers = top_5_crimes['Cases Reported during the year']
# Create the bar chart
plt.figure(figsize=(10, 6))
plt.barh(crime_categories, case_numbers, color='skyblue', edgecolor='black')
# Add title and labels
plt.title("Top 5 Crime Categories by Cases Reported", fontsize=16)
plt.xlabel("Number of Cases", fontsize=12)
plt.ylabel("Crime Categories", fontsize=12)
# Show the plot
plt.tight_layout()
plt.show()
What to Expect
A horizontal bar chart showcasing the top 5 crime categories with their respective number of cases reported during the year.
import matplotlib.pyplot as plt
crime_categories = top_5_crimes['Crime Head']
case_numbers = top_5_crimes['Cases Reported during the year']
plt.figure(figsize=(10, 6))
plt.barh(crime_categories, case_numbers, color='skyblue', edgecolor='black')
plt.title("Top 5 Crime Categories by Cases Reported", fontsize=16)
plt.xlabel("Number of Cases", fontsize=12)
plt.ylabel("Crime Categories", fontsize=12)
plt.tight_layout()
plt.show()
A simple plot for a chef?
import matplotlib.pyplot as plt
# Your dishes and their quantities
dishes = ["Pizza", "Burger", "Pasta"]
quantities = [5, 3, 7]
# Create a horizontal bar chart
plt.barh(dishes, quantities, color='skyblue')
# Add a title and labels
plt.title("Dish Quantities at Buffet")
plt.xlabel("Quantity (kg)")
plt.ylabel("Dishes")
# Show the chart
plt.show()
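One small readability tweak: sorting the bars so the largest dish ends up on top of the chart. A sketch (the Agg backend and savefig are only there so it also runs outside Colab):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this also runs without a display
import matplotlib.pyplot as plt

dishes = ["Pizza", "Burger", "Pasta"]
quantities = [5, 3, 7]

# Sort the (quantity, dish) pairs so barh draws the largest bar last, i.e. on top
pairs = sorted(zip(quantities, dishes))
qty_sorted, dishes_sorted = zip(*pairs)

plt.barh(dishes_sorted, qty_sorted, color="skyblue")
plt.title("Dish Quantities at Buffet (sorted)")
plt.xlabel("Quantity (kg)")
plt.savefig("buffet_sorted.png")
```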
CRPF csv file & Predictions
Copy and paste the following into Notepad and save it as a .csv file:
CRPF_Incidents_Large.csv
Incident_ID,Location,Date,Incident_Type,Casualties,Reported_By,Severity
1,Assam,2024-08-23,Encounter,0,Patrol Unit B,Low
2,Odisha,2024-04-10,Riots,0,Unit D,Low
3,Odisha,2024-06-10,Encounter,2,Unit D,Medium
4,West Bengal,2024-07-27,Patrolling,6,Unit C,Medium
5,Jammu & Kashmir,2024-07-22,Patrolling,0,Patrol Unit A,Medium
6,Maharashtra,2024-08-08,IED Blast,4,Unit C,Low
7,Bihar,2024-10-30,Encounter,7,Local Intel,Low
8,Assam,2024-07-19,Search Operation,9,Unit D,Low
9,Assam,2024-07-14,Riots,5,Local Intel,Medium
10,Kerala,2024-12-06,Encounter,8,Unit D,Medium
11,Odisha,2025-01-13,Search Operation,1,Patrol Unit A,Medium
12,Bihar,2024-10-08,Search Operation,8,Patrol Unit B,High
13,Assam,2024-06-01,Encounter,5,Local Intel,High
14,Odisha,2024-11-26,Encounter,8,Unit C,Low
15,Bihar,2024-07-24,Search Operation,6,Unit D,High
16,West Bengal,2024-11-29,Search Operation,2,Patrol Unit A,Low
17,Kerala,2024-10-22,Riots,4,Unit C,Medium
18,Jammu & Kashmir,2024-03-04,IED Blast,3,Patrol Unit B,High
19,Assam,2024-09-06,Patrolling,4,Unit C,Low
20,Chhattisgarh,2024-07-29,Encounter,7,Local Intel,High
21,Assam,2024-05-01,Riots,7,Unit C,Low
22,West Bengal,2024-06-08,Search Operation,10,Patrol Unit A,Medium
23,Jammu & Kashmir,2024-05-14,Ambush,2,Local Intel,High
24,Jammu & Kashmir,2024-07-02,IED Blast,2,Unit C,Low
25,Odisha,2024-11-04,Ambush,8,Local Intel,Low
26,Bihar,2024-09-28,IED Blast,7,Patrol Unit A,High
27,Chhattisgarh,2024-06-12,IED Blast,4,Local Intel,Low
28,Maharashtra,2024-04-26,Patrolling,2,Unit C,Medium
29,Jammu & Kashmir,2024-08-13,Encounter,10,Unit C,Low
30,Maharashtra,2024-09-16,Encounter,10,Patrol Unit A,High
31,Odisha,2024-02-09,Search Operation,2,Unit D,Low
32,Maharashtra,2024-02-11,Encounter,8,Unit D,High
33,Bihar,2025-01-02,Encounter,10,Unit D,Low
34,Odisha,2024-08-27,Search Operation,4,Unit D,Medium
35,Bihar,2024-07-26,Search Operation,7,Patrol Unit B,Low
36,Kerala,2024-09-29,Search Operation,0,Local Intel,High
37,Jammu & Kashmir,2024-09-14,Encounter,4,Local Intel,High
38,Maharashtra,2024-10-09,Patrolling,10,Unit C,Medium
39,Maharashtra,2024-10-17,Patrolling,9,Local Intel,Low
40,Kerala,2024-02-09,Patrolling,3,Unit D,Medium
41,Maharashtra,2024-10-06,Ambush,0,Patrol Unit A,Low
42,Jammu & Kashmir,2024-02-29,Search Operation,8,Unit D,Medium
43,Odisha,2024-11-07,IED Blast,1,Unit D,Medium
44,West Bengal,2024-02-03,Riots,5,Local Intel,Medium
45,Maharashtra,2024-11-26,Ambush,2,Local Intel,Low
46,Kerala,2025-01-14,Encounter,6,Patrol Unit B,Medium
47,Assam,2024-05-20,Riots,2,Patrol Unit A,Low
48,Bihar,2024-04-14,Encounter,7,Local Intel,High
49,Bihar,2024-05-22,Encounter,10,Patrol Unit A,Medium
50,West Bengal,2024-07-07,Encounter,9,Unit C,Low
51,Chhattisgarh,2024-07-29,Search Operation,1,Unit D,Medium
52,Bihar,2024-11-18,Riots,6,Patrol Unit B,Medium
53,Assam,2024-09-09,Encounter,4,Patrol Unit B,Medium
54,West Bengal,2025-01-02,Search Operation,8,Patrol Unit A,Medium
55,Chhattisgarh,2024-03-04,Search Operation,9,Patrol Unit B,Medium
56,Jammu & Kashmir,2024-06-25,Ambush,0,Patrol Unit B,Medium
57,Maharashtra,2024-05-22,Riots,2,Local Intel,Medium
58,Assam,2024-08-17,Riots,5,Patrol Unit B,High
59,Assam,2024-11-02,Search Operation,4,Unit C,Medium
60,Assam,2024-04-16,Encounter,6,Patrol Unit B,Medium
61,Maharashtra,2024-10-09,Patrolling,6,Patrol Unit B,Low
62,Kerala,2024-07-06,IED Blast,7,Unit D,Low
63,Maharashtra,2024-06-29,IED Blast,1,Unit D,High
64,Chhattisgarh,2024-08-09,Patrolling,7,Patrol Unit A,Low
65,Bihar,2024-06-17,Ambush,5,Patrol Unit A,Low
66,Chhattisgarh,2024-07-05,Patrolling,4,Unit C,Low
67,West Bengal,2024-09-25,Patrolling,9,Unit C,Medium
68,Odisha,2024-09-26,Encounter,8,Local Intel,Medium
69,Assam,2024-03-29,Encounter,7,Unit D,Medium
70,Odisha,2024-12-05,IED Blast,3,Patrol Unit B,High
71,West Bengal,2024-05-29,IED Blast,0,Local Intel,Low
72,Odisha,2024-02-18,IED Blast,10,Unit C,Low
73,Jammu & Kashmir,2025-01-25,Riots,7,Local Intel,High
74,Maharashtra,2024-03-03,Patrolling,2,Patrol Unit A,Low
75,Assam,2024-02-27,Patrolling,5,Unit D,Low
76,Jammu & Kashmir,2024-10-28,Search Operation,5,Patrol Unit B,Medium
77,Maharashtra,2025-01-26,Search Operation,2,Patrol Unit A,High
78,Assam,2024-08-31,Encounter,6,Patrol Unit B,Low
79,Chhattisgarh,2024-03-02,IED Blast,10,Unit C,High
80,Chhattisgarh,2024-05-09,Riots,7,Patrol Unit A,High
81,Bihar,2024-10-16,Encounter,7,Patrol Unit B,Medium
82,Bihar,2024-10-12,Ambush,8,Patrol Unit A,Low
83,Chhattisgarh,2024-12-05,Search Operation,0,Unit D,Low
84,Bihar,2024-05-06,Riots,2,Unit D,Low
85,Bihar,2024-05-12,Riots,6,Unit C,Low
86,Chhattisgarh,2024-03-01,Search Operation,3,Patrol Unit B,High
87,West Bengal,2024-11-05,IED Blast,1,Patrol Unit A,Medium
88,Jammu & Kashmir,2024-01-29,Riots,10,Patrol Unit B,High
89,Assam,2024-05-13,Patrolling,10,Patrol Unit A,Low
90,Assam,2024-10-08,Ambush,0,Unit C,Low
91,Jammu & Kashmir,2024-06-22,IED Blast,10,Unit C,High
92,Assam,2025-01-08,Patrolling,8,Patrol Unit B,High
93,Assam,2025-01-22,IED Blast,2,Local Intel,High
94,Maharashtra,2024-10-06,Patrolling,7,Local Intel,High
95,Jammu & Kashmir,2024-04-11,IED Blast,4,Local Intel,Medium
96,Bihar,2024-09-16,Ambush,8,Unit D,Low
97,Bihar,2024-10-20,Patrolling,2,Unit C,Low
98,West Bengal,2024-01-30,Encounter,9,Patrol Unit B,Low
99,Chhattisgarh,2024-07-26,Riots,4,Unit C,High
100,Kerala,2024-03-12,Ambush,6,Local Intel,Low
101,West Bengal,2024-09-20,IED Blast,10,Local Intel,Low
102,Bihar,2024-02-21,Riots,2,Patrol Unit A,Low
103,Assam,2025-01-13,Riots,0,Unit D,Medium
104,Chhattisgarh,2025-01-13,Patrolling,3,Unit D,Low
105,Odisha,2024-11-28,IED Blast,2,Unit C,Low
106,Jammu & Kashmir,2024-10-17,Ambush,8,Patrol Unit A,Medium
107,Bihar,2024-05-16,IED Blast,9,Patrol Unit A,Low
108,West Bengal,2024-03-12,Encounter,6,Local Intel,High
109,Odisha,2024-05-28,Encounter,4,Unit C,Medium
110,Jammu & Kashmir,2024-09-24,Encounter,1,Local Intel,Low
111,West Bengal,2024-03-30,Search Operation,4,Unit C,Low
112,West Bengal,2024-03-18,Patrolling,8,Unit D,Low
113,West Bengal,2024-03-01,Riots,8,Patrol Unit A,Medium
114,Jammu & Kashmir,2024-08-05,Encounter,6,Patrol Unit B,High
115,Maharashtra,2024-11-02,Search Operation,7,Patrol Unit B,Medium
116,West Bengal,2024-08-21,Encounter,7,Patrol Unit B,High
117,Odisha,2024-02-29,IED Blast,9,Unit C,Medium
118,Odisha,2024-10-22,Search Operation,5,Patrol Unit B,Medium
119,Jammu & Kashmir,2024-01-30,Patrolling,4,Unit C,High
120,Odisha,2024-02-13,Patrolling,5,Local Intel,Low
121,West Bengal,2024-03-14,IED Blast,9,Patrol Unit A,Medium
122,Kerala,2024-05-15,Search Operation,0,Local Intel,Medium
123,Chhattisgarh,2024-02-14,Riots,6,Patrol Unit A,Medium
124,Odisha,2025-01-15,Riots,1,Local Intel,Low
125,Chhattisgarh,2024-10-04,Patrolling,0,Patrol Unit B,High
126,Bihar,2025-01-18,Search Operation,8,Unit D,High
127,Bihar,2025-01-11,IED Blast,3,Unit D,Low
128,Assam,2024-04-25,Patrolling,8,Local Intel,High
129,Kerala,2024-11-21,Encounter,3,Patrol Unit A,Medium
130,Bihar,2024-04-01,Patrolling,7,Unit D,Medium
131,Assam,2024-11-16,Ambush,1,Unit D,High
132,West Bengal,2024-10-13,IED Blast,1,Local Intel,Low
133,Odisha,2024-03-07,Patrolling,2,Patrol Unit A,Low
134,West Bengal,2024-03-09,Riots,10,Unit C,Medium
135,Bihar,2024-10-18,Riots,2,Unit D,Low
136,Assam,2024-08-14,Riots,3,Local Intel,High
137,Jammu & Kashmir,2024-07-30,Ambush,6,Unit D,Low
138,Kerala,2024-11-04,Search Operation,1,Local Intel,High
139,Odisha,2024-08-26,Riots,1,Patrol Unit A,High
140,Odisha,2024-02-13,IED Blast,9,Patrol Unit A,High
141,West Bengal,2024-05-24,Search Operation,1,Patrol Unit A,Low
142,West Bengal,2024-07-31,Encounter,1,Patrol Unit B,Medium
143,West Bengal,2024-12-08,Ambush,3,Unit C,High
144,West Bengal,2024-06-29,Search Operation,4,Local Intel,High
145,Bihar,2024-02-01,Patrolling,4,Unit C,Medium
146,Odisha,2024-07-04,Search Operation,1,Patrol Unit B,High
147,Maharashtra,2024-08-31,IED Blast,8,Local Intel,High
148,Assam,2025-01-01,Search Operation,3,Local Intel,Medium
149,Maharashtra,2024-04-12,Riots,1,Patrol Unit A,Medium
150,Odisha,2024-03-04,Riots,5,Patrol Unit B,High
151,Kerala,2024-08-22,Patrolling,10,Patrol Unit B,Medium
152,Kerala,2024-06-25,IED Blast,10,Unit C,Low
153,Bihar,2024-04-05,Search Operation,4,Patrol Unit B,Low
154,Bihar,2025-01-08,Search Operation,10,Unit D,Medium
155,Bihar,2024-05-25,Riots,3,Unit C,Medium
156,Assam,2024-07-22,Encounter,2,Unit C,High
157,West Bengal,2024-02-24,Search Operation,1,Local Intel,Medium
158,Maharashtra,2024-08-23,Ambush,10,Local Intel,Medium
159,Jammu & Kashmir,2024-11-05,Ambush,1,Patrol Unit B,Medium
160,Assam,2024-05-02,Patrolling,3,Local Intel,High
161,Chhattisgarh,2024-10-06,Search Operation,8,Patrol Unit A,Low
162,Maharashtra,2024-05-28,Search Operation,5,Local Intel,Low
163,Jammu & Kashmir,2024-11-07,Search Operation,8,Patrol Unit B,Low
164,Odisha,2024-11-29,Patrolling,9,Local Intel,Medium
165,West Bengal,2024-04-20,Patrolling,3,Patrol Unit A,High
166,Chhattisgarh,2025-01-13,Search Operation,8,Local Intel,Low
167,West Bengal,2024-09-23,Riots,10,Patrol Unit B,Medium
168,Kerala,2024-12-31,Search Operation,4,Patrol Unit A,Medium
169,Kerala,2024-06-04,Patrolling,6,Patrol Unit A,Medium
170,Odisha,2024-09-23,Encounter,7,Patrol Unit B,Low
171,Kerala,2024-02-05,Encounter,0,Local Intel,High
172,Jammu & Kashmir,2024-04-12,Encounter,1,Local Intel,Medium
173,Chhattisgarh,2024-08-26,Encounter,0,Patrol Unit A,High
174,Assam,2024-02-09,IED Blast,6,Patrol Unit A,Medium
175,Jammu & Kashmir,2024-09-06,Riots,6,Patrol Unit B,High
176,West Bengal,2024-02-15,Search Operation,5,Unit C,High
177,Odisha,2024-04-10,Riots,0,Unit C,Medium
178,Maharashtra,2024-10-12,Patrolling,9,Local Intel,Low
179,Maharashtra,2024-09-09,Riots,4,Unit D,Medium
180,Kerala,2024-12-04,Riots,5,Patrol Unit A,Medium
181,Bihar,2024-04-09,IED Blast,6,Unit C,Medium
182,Odisha,2024-10-09,Riots,2,Patrol Unit B,Medium
183,Chhattisgarh,2024-03-03,Search Operation,3,Unit D,High
184,Jammu & Kashmir,2024-01-27,Riots,10,Patrol Unit A,Low
185,West Bengal,2024-02-29,Patrolling,1,Unit D,High
186,Chhattisgarh,2024-04-11,Patrolling,1,Local Intel,Low
187,Assam,2024-10-17,Patrolling,1,Local Intel,High
188,Assam,2025-01-20,Search Operation,3,Patrol Unit B,Low
189,Jammu & Kashmir,2024-11-11,IED Blast,0,Local Intel,Medium
190,Jammu & Kashmir,2024-07-26,Patrolling,9,Patrol Unit B,High
191,Assam,2024-04-20,IED Blast,1,Unit C,Low
192,Assam,2024-02-05,Patrolling,8,Unit C,Medium
193,Maharashtra,2024-09-13,IED Blast,1,Patrol Unit B,Medium
194,Kerala,2024-12-03,Search Operation,10,Local Intel,High
195,Maharashtra,2024-01-30,Patrolling,9,Patrol Unit A,Low
196,Kerala,2024-02-12,Patrolling,4,Unit C,Medium
197,Maharashtra,2025-01-18,Patrolling,9,Patrol Unit B,High
198,West Bengal,2024-03-21,Search Operation,10,Local Intel,Low
199,Odisha,2024-10-19,Encounter,5,Patrol Unit A,High
200,Kerala,2024-05-11,Riots,8,Patrol Unit B,Low
201,Jammu & Kashmir,2025-01-14,Riots,0,Unit D,Low
202,Kerala,2024-06-06,Patrolling,5,Unit C,High
203,Bihar,2024-07-11,Ambush,9,Unit D,Medium
204,Maharashtra,2024-12-22,Search Operation,8,Unit D,Low
205,Jammu & Kashmir,2024-11-04,Ambush,0,Local Intel,Medium
206,Jammu & Kashmir,2024-11-04,Patrolling,9,Patrol Unit B,Medium
207,Bihar,2024-09-28,IED Blast,10,Local Intel,High
208,Maharashtra,2024-05-02,IED Blast,10,Unit D,Low
209,Chhattisgarh,2024-08-08,Patrolling,2,Unit C,Medium
210,Chhattisgarh,2024-12-03,Encounter,4,Unit C,High
211,Kerala,2024-05-21,Patrolling,0,Local Intel,Medium
212,Kerala,2025-01-21,Patrolling,1,Local Intel,Low
213,Odisha,2024-08-19,Search Operation,2,Local Intel,High
214,Chhattisgarh,2024-02-04,Riots,10,Unit D,Medium
215,West Bengal,2024-06-05,IED Blast,8,Patrol Unit A,Low
216,Jammu & Kashmir,2024-05-02,Riots,0,Unit C,High
217,Jammu & Kashmir,2024-08-26,Search Operation,6,Unit D,Medium
218,Odisha,2024-05-26,Encounter,0,Unit D,High
219,Bihar,2024-10-30,Riots,1,Unit D,Medium
220,Odisha,2024-10-13,Search Operation,6,Unit C,High
221,Odisha,2024-04-16,Encounter,10,Patrol Unit A,Low
222,Chhattisgarh,2024-04-18,Patrolling,4,Patrol Unit B,High
223,West Bengal,2024-09-05,Patrolling,6,Local Intel,High
224,Maharashtra,2024-09-14,Search Operation,9,Patrol Unit B,Medium
225,Maharashtra,2024-06-24,IED Blast,7,Unit C,Low
226,Chhattisgarh,2024-06-03,Search Operation,7,Patrol Unit A,High
227,Bihar,2024-04-19,Ambush,10,Patrol Unit B,Low
228,Assam,2024-09-09,Encounter,3,Patrol Unit A,Medium
229,Maharashtra,2024-08-06,Riots,4,Unit C,Low
230,Kerala,2024-05-17,Encounter,5,Local Intel,Medium
231,Chhattisgarh,2024-12-01,Ambush,3,Local Intel,Medium
232,Maharashtra,2024-02-25,Patrolling,7,Unit D,Low
233,Odisha,2024-07-02,Ambush,5,Unit D,Medium
234,Assam,2024-04-09,Search Operation,10,Unit C,Low
235,West Bengal,2024-09-05,IED Blast,0,Patrol Unit B,High
236,Assam,2025-01-26,Encounter,6,Patrol Unit A,High
237,Odisha,2024-11-15,Riots,1,Patrol Unit B,Medium
238,Chhattisgarh,2024-02-15,Search Operation,0,Patrol Unit B,High
239,Odisha,2024-04-08,Encounter,4,Unit C,Low
240,Jammu & Kashmir,2024-08-26,Patrolling,6,Unit C,Medium
241,Bihar,2024-05-05,Ambush,3,Unit C,Low
242,Jammu & Kashmir,2025-01-04,Riots,10,Patrol Unit A,Medium
243,Maharashtra,2024-02-27,IED Blast,6,Patrol Unit B,High
244,Assam,2024-08-01,Search Operation,3,Unit D,Medium
245,Assam,2024-09-05,Patrolling,8,Unit D,Medium
246,Maharashtra,2024-07-16,Search Operation,3,Unit D,Medium
247,Kerala,2024-06-07,Encounter,10,Local Intel,Low
248,Chhattisgarh,2024-05-09,Patrolling,2,Local Intel,Low
249,Chhattisgarh,2024-03-22,Riots,7,Patrol Unit B,Medium
250,Kerala,2024-11-04,Encounter,9,Patrol Unit A,Low
251,Jammu & Kashmir,2024-10-23,Riots,10,Patrol Unit A,Medium
252,Kerala,2024-06-18,Ambush,7,Unit D,Low
253,Kerala,2024-03-05,Riots,2,Patrol Unit B,Medium
254,Jammu & Kashmir,2024-12-13,Ambush,4,Unit C,High
255,Kerala,2024-04-29,Search Operation,4,Unit C,High
256,Odisha,2024-02-27,Riots,6,Unit C,High
257,Odisha,2024-05-14,IED Blast,10,Unit D,Low
258,Kerala,2024-03-07,Search Operation,3,Patrol Unit A,Low
259,Chhattisgarh,2024-09-04,IED Blast,8,Unit C,Medium
260,Odisha,2024-04-01,IED Blast,10,Unit C,Medium
261,Assam,2024-02-26,Patrolling,1,Unit D,High
262,Kerala,2024-02-27,Patrolling,2,Patrol Unit B,Medium
263,West Bengal,2024-06-25,Patrolling,10,Local Intel,Medium
264,Jammu & Kashmir,2024-12-15,Ambush,1,Unit D,High
265,Kerala,2024-09-14,Search Operation,4,Patrol Unit A,Medium
266,Assam,2024-01-29,Riots,8,Patrol Unit A,Medium
267,Assam,2025-01-22,IED Blast,5,Unit C,Low
268,Kerala,2024-07-10,Patrolling,3,Patrol Unit B,Low
269,Assam,2024-06-11,Search Operation,0,Unit C,High
270,Assam,2024-10-20,IED Blast,1,Unit C,Medium
271,Kerala,2024-09-17,Ambush,4,Unit C,Medium
272,West Bengal,2024-04-09,Encounter,5,Patrol Unit B,Low
273,Odisha,2024-12-29,Encounter,3,Patrol Unit B,High
274,Kerala,2024-12-08,Ambush,7,Unit C,Medium
275,Maharashtra,2025-01-16,Ambush,7,Unit D,Medium
276,Maharashtra,2024-07-21,Ambush,4,Local Intel,Medium
277,Maharashtra,2024-08-24,Patrolling,4,Unit D,Medium
278,Chhattisgarh,2024-05-17,Encounter,7,Unit D,High
279,West Bengal,2024-09-05,IED Blast,9,Unit C,High
280,Kerala,2024-12-16,Search Operation,8,Patrol Unit B,High
281,Odisha,2024-05-24,Patrolling,9,Patrol Unit B,Low
282,Jammu & Kashmir,2024-08-06,IED Blast,1,Local Intel,High
283,Chhattisgarh,2024-03-14,Riots,4,Patrol Unit B,Medium
284,Chhattisgarh,2024-06-30,Ambush,1,Unit D,Medium
285,Maharashtra,2024-09-06,Encounter,7,Unit D,Low
286,Chhattisgarh,2024-10-26,Patrolling,0,Patrol Unit A,High
287,Maharashtra,2024-04-21,Search Operation,5,Unit D,Medium
288,Kerala,2024-11-07,Patrolling,4,Unit C,High
289,Kerala,2024-09-20,Ambush,1,Local Intel,High
290,West Bengal,2024-11-15,Patrolling,2,Unit D,Medium
291,Odisha,2024-12-14,Riots,5,Unit C,Low
292,Bihar,2024-09-07,IED Blast,10,Unit C,Medium
293,Kerala,2024-12-21,Riots,2,Patrol Unit B,Low
294,Maharashtra,2024-08-28,Search Operation,3,Unit D,Low
295,Odisha,2025-01-18,Patrolling,3,Local Intel,Medium
296,Maharashtra,2024-10-05,Patrolling,1,Local Intel,High
297,Chhattisgarh,2024-06-01,Patrolling,0,Patrol Unit A,Medium
298,West Bengal,2024-08-01,Patrolling,4,Patrol Unit B,High
299,Bihar,2025-01-11,Encounter,4,Patrol Unit B,Medium
300,Odisha,2024-04-24,Patrolling,10,Unit C,Low
301,Chhattisgarh,2024-09-14,Ambush,10,Unit D,High
302,Chhattisgarh,2024-10-06,Search Operation,7,Unit D,Medium
303,Chhattisgarh,2024-08-26,IED Blast,10,Patrol Unit A,High
304,Maharashtra,2024-09-05,Patrolling,10,Unit C,Medium
305,West Bengal,2025-01-15,Patrolling,10,Unit C,Medium
306,Jammu & Kashmir,2024-06-14,Ambush,10,Unit C,Low
307,Bihar,2024-09-07,Patrolling,6,Unit C,Low
308,West Bengal,2024-02-19,Ambush,5,Patrol Unit A,High
309,Odisha,2024-12-01,Encounter,0,Unit C,Low
310,Jammu & Kashmir,2024-03-29,Riots,8,Unit C,Low
311,West Bengal,2024-09-08,Patrolling,1,Patrol Unit B,Low
312,Kerala,2024-05-24,Search Operation,5,Local Intel,High
313,Chhattisgarh,2024-05-04,IED Blast,7,Unit C,Low
314,Odisha,2024-04-13,Search Operation,6,Local Intel,Low
315,Chhattisgarh,2024-03-19,Encounter,3,Local Intel,Medium
316,Maharashtra,2025-01-08,Ambush,9,Local Intel,Low
317,Kerala,2024-08-01,Search Operation,10,Unit C,Low
318,West Bengal,2024-07-24,Patrolling,10,Patrol Unit A,Low
319,Kerala,2024-08-31,IED Blast,3,Unit D,Low
320,Kerala,2024-02-27,Ambush,3,Unit C,High
321,West Bengal,2024-12-27,Ambush,9,Local Intel,Medium
322,Assam,2024-02-29,Ambush,1,Local Intel,Low
323,Kerala,2024-12-31,Ambush,2,Unit D,Low
324,Kerala,2024-03-11,IED Blast,8,Unit C,Medium
325,Assam,2024-12-21,Encounter,4,Patrol Unit A,High
326,Odisha,2024-04-19,Patrolling,10,Local Intel,Low
327,Kerala,2024-05-03,Encounter,4,Patrol Unit A,Medium
328,Assam,2024-04-20,Patrolling,2,Unit D,Medium
329,Chhattisgarh,2024-10-03,Encounter,8,Unit D,High
330,Chhattisgarh,2025-01-02,IED Blast,7,Unit C,Low
331,Chhattisgarh,2024-08-19,IED Blast,5,Patrol Unit A,High
332,Bihar,2024-08-22,Encounter,10,Patrol Unit B,Low
333,Bihar,2024-03-25,Riots,6,Patrol Unit A,Low
334,Kerala,2024-01-30,Riots,7,Unit D,High
335,Maharashtra,2024-08-15,IED Blast,8,Patrol Unit A,High
336,Jammu & Kashmir,2024-04-27,Patrolling,0,Local Intel,Low
337,Bihar,2024-10-01,Search Operation,8,Unit D,Low
338,Odisha,2024-05-22,IED Blast,1,Unit D,High
339,Jammu & Kashmir,2024-03-11,Ambush,8,Patrol Unit A,High
340,Maharashtra,2024-08-22,Riots,8,Unit C,Low
341,Assam,2024-08-04,Riots,3,Patrol Unit B,High
342,Odisha,2024-11-03,Ambush,10,Local Intel,Medium
343,West Bengal,2024-02-26,Search Operation,10,Patrol Unit A,High
344,Jammu & Kashmir,2024-11-16,Search Operation,4,Patrol Unit A,Low
345,Odisha,2024-09-23,Ambush,5,Local Intel,Medium
346,Kerala,2024-07-31,IED Blast,7,Unit C,Medium
347,Odisha,2024-04-05,IED Blast,10,Patrol Unit B,Low
348,Assam,2024-10-29,IED Blast,3,Patrol Unit A,Low
349,Jammu & Kashmir,2024-09-18,IED Blast,3,Patrol Unit B,Medium
350,Jammu & Kashmir,2024-05-06,Search Operation,6,Local Intel,Medium
351,Jammu & Kashmir,2024-12-31,Patrolling,10,Local Intel,High
352,Chhattisgarh,2025-01-23,Riots,0,Patrol Unit A,High
353,Kerala,2024-06-06,Ambush,10,Patrol Unit A,High
354,Maharashtra,2024-09-13,IED Blast,5,Unit D,High
355,Kerala,2024-10-07,Search Operation,4,Local Intel,Medium
356,Jammu & Kashmir,2024-08-04,Search Operation,3,Unit C,High
357,Maharashtra,2024-08-13,Patrolling,9,Patrol Unit A,Medium
358,Maharashtra,2024-04-01,Patrolling,1,Patrol Unit A,High
359,Assam,2025-01-06,Riots,9,Unit C,High
360,Bihar,2025-01-05,Search Operation,9,Patrol Unit A,High
361,Bihar,2024-09-11,Search Operation,2,Unit D,High
362,Maharashtra,2024-06-15,Ambush,2,Patrol Unit B,High
363,West Bengal,2024-07-19,IED Blast,1,Unit D,Low
364,Chhattisgarh,2024-04-24,Riots,9,Patrol Unit A,Medium
365,West Bengal,2024-02-01,Riots,1,Local Intel,Low
366,Kerala,2024-11-24,Riots,0,Unit C,Low
367,Kerala,2024-10-15,Patrolling,2,Patrol Unit B,Low
368,Assam,2024-12-27,Riots,6,Unit C,High
369,Odisha,2024-07-11,Ambush,8,Unit D,Medium
370,West Bengal,2024-09-30,Encounter,6,Local Intel,High
371,Maharashtra,2024-11-29,Ambush,2,Unit D,Low
372,Maharashtra,2024-07-29,Riots,8,Unit C,High
373,Odisha,2024-07-09,Patrolling,5,Patrol Unit B,Low
374,Bihar,2024-03-19,Search Operation,1,Patrol Unit B,High
375,Kerala,2024-02-14,Search Operation,4,Local Intel,High
376,Maharashtra,2024-07-21,Patrolling,9,Unit C,High
377,Assam,2024-02-04,Search Operation,7,Local Intel,Low
378,Maharashtra,2024-06-07,IED Blast,2,Unit C,Medium
379,Chhattisgarh,2024-02-20,Ambush,2,Local Intel,High
380,Assam,2024-11-13,Riots,9,Local Intel,Low
381,West Bengal,2024-09-17,Ambush,2,Unit D,High
382,West Bengal,2024-03-04,IED Blast,9,Unit C,Low
383,Kerala,2024-09-16,Ambush,3,Patrol Unit B,Medium
384,Assam,2024-11-11,Search Operation,1,Patrol Unit A,Medium
385,Odisha,2024-10-13,Patrolling,9,Local Intel,High
386,Chhattisgarh,2025-01-21,Search Operation,2,Unit D,High
387,Jammu & Kashmir,2024-03-27,Search Operation,1,Patrol Unit A,High
388,Bihar,2024-04-28,Ambush,3,Patrol Unit B,High
389,Jammu & Kashmir,2024-09-28,Ambush,1,Local Intel,High
390,Chhattisgarh,2024-07-11,Ambush,2,Patrol Unit B,High
391,Odisha,2024-05-08,Search Operation,7,Patrol Unit B,Low
392,Kerala,2024-09-21,Riots,0,Patrol Unit B,High
393,Jammu & Kashmir,2024-11-03,Encounter,0,Unit C,Medium
394,Chhattisgarh,2024-05-09,Encounter,6,Patrol Unit B,High
395,Odisha,2024-04-07,Patrolling,8,Patrol Unit B,Low
396,Bihar,2024-04-17,Ambush,6,Unit C,Low
397,Bihar,2024-05-18,IED Blast,10,Patrol Unit A,Medium
398,Maharashtra,2024-08-08,Encounter,5,Patrol Unit A,Low
399,Bihar,2024-03-13,Encounter,2,Unit D,High
400,Odisha,2024-03-10,Encounter,8,Local Intel,High
401,Jammu & Kashmir,2024-07-22,Encounter,4,Unit D,Low
402,Jammu & Kashmir,2024-05-17,Patrolling,7,Local Intel,Medium
403,Bihar,2024-06-04,IED Blast,0,Patrol Unit B,High
404,Odisha,2024-08-01,Search Operation,2,Local Intel,High
405,Jammu & Kashmir,2025-01-09,Encounter,10,Patrol Unit B,Low
406,Kerala,2024-04-17,Search Operation,10,Unit D,High
407,Jammu & Kashmir,2024-12-13,Encounter,2,Patrol Unit A,Medium
408,Maharashtra,2024-07-16,Encounter,2,Unit D,Medium
409,Odisha,2025-01-03,Riots,1,Unit D,High
410,Maharashtra,2024-12-23,IED Blast,3,Patrol Unit A,High
411,Jammu & Kashmir,2024-02-05,Search Operation,1,Unit D,Medium
412,Maharashtra,2025-01-22,Encounter,9,Patrol Unit B,High
413,Jammu & Kashmir,2024-10-02,IED Blast,4,Patrol Unit A,Medium
414,Jammu & Kashmir,2024-04-24,IED Blast,6,Patrol Unit A,Low
415,Odisha,2024-08-06,IED Blast,9,Local Intel,Medium
416,Chhattisgarh,2024-02-19,IED Blast,2,Unit D,Low
417,Bihar,2024-07-09,Search Operation,3,Local Intel,Low
418,Jammu & Kashmir,2024-02-14,IED Blast,2,Patrol Unit B,Low
419,Assam,2024-12-22,Ambush,8,Patrol Unit B,Low
420,Odisha,2024-04-12,Riots,6,Patrol Unit B,High
421,Jammu & Kashmir,2024-03-11,Riots,10,Local Intel,High
422,Odisha,2024-02-09,Riots,10,Unit C,Medium
423,Jammu & Kashmir,2024-08-28,Search Operation,7,Patrol Unit B,Low
424,Bihar,2024-02-12,Encounter,1,Local Intel,Medium
425,Assam,2024-08-14,IED Blast,8,Local Intel,High
426,Odisha,2024-10-13,IED Blast,1,Patrol Unit B,High
427,West Bengal,2025-01-18,Patrolling,8,Patrol Unit B,Medium
428,West Bengal,2024-04-20,Encounter,2,Patrol Unit A,High
429,Kerala,2024-06-21,Ambush,5,Local Intel,High
430,Chhattisgarh,2024-02-25,Riots,4,Patrol Unit A,High
431,West Bengal,2024-04-17,Encounter,5,Patrol Unit A,Medium
432,Maharashtra,2024-05-31,Search Operation,6,Patrol Unit B,Medium
433,Chhattisgarh,2024-11-15,Riots,3,Unit C,High
434,Assam,2024-12-20,Ambush,5,Unit D,Medium
435,Assam,2024-04-16,Riots,0,Local Intel,Medium
436,Bihar,2024-10-20,Encounter,5,Local Intel,Medium
437,Odisha,2024-06-06,IED Blast,2,Patrol Unit B,High
438,Assam,2024-04-26,Riots,6,Patrol Unit A,Low
439,Maharashtra,2024-03-05,Encounter,7,Unit C,Low
440,Odisha,2024-10-13,Ambush,3,Patrol Unit A,High
441,Maharashtra,2024-09-12,Riots,4,Patrol Unit B,High
442,Assam,2024-04-30,IED Blast,3,Patrol Unit B,Medium
443,Kerala,2024-03-21,Riots,6,Unit D,High
444,Assam,2024-07-29,Riots,5,Local Intel,Low
445,West Bengal,2024-02-07,Encounter,3,Unit D,Low
446,Assam,2024-12-07,Patrolling,2,Local Intel,Medium
447,Bihar,2024-07-08,IED Blast,8,Local Intel,Medium
448,Assam,2024-03-19,Search Operation,10,Patrol Unit B,High
449,Chhattisgarh,2024-08-08,Ambush,4,Unit C,Medium
450,Chhattisgarh,2024-07-30,Ambush,3,Unit D,Medium
451,Assam,2024-05-12,Ambush,7,Patrol Unit A,Low
452,Odisha,2024-05-31,Patrolling,9,Patrol Unit A,Medium
453,Maharashtra,2024-02-26,IED Blast,9,Local Intel,High
454,Jammu & Kashmir,2024-09-20,IED Blast,1,Local Intel,Low
455,Bihar,2024-11-19,Search Operation,9,Patrol Unit A,High
456,Jammu & Kashmir,2024-10-01,Riots,0,Unit D,High
457,Assam,2024-12-12,Patrolling,9,Patrol Unit A,High
458,Odisha,2024-03-09,IED Blast,2,Patrol Unit A,Medium
459,Assam,2024-05-20,IED Blast,4,Patrol Unit B,High
460,Maharashtra,2024-08-04,Search Operation,7,Patrol Unit B,Low
461,Kerala,2024-06-07,Patrolling,10,Patrol Unit A,Medium
462,Odisha,2024-09-26,Search Operation,1,Local Intel,Low
463,Kerala,2024-10-07,Patrolling,7,Unit D,High
464,Odisha,2024-05-13,Encounter,5,Patrol Unit B,Medium
465,Chhattisgarh,2024-02-15,Ambush,7,Unit D,High
466,Maharashtra,2024-04-10,Encounter,5,Unit C,Low
467,Jammu & Kashmir,2024-12-27,Riots,1,Unit D,Medium
468,Chhattisgarh,2024-10-02,Patrolling,10,Patrol Unit B,High
469,Bihar,2024-02-16,Encounter,1,Patrol Unit A,Medium
470,Bihar,2024-05-30,Encounter,5,Unit C,Low
471,Jammu & Kashmir,2024-05-23,Riots,1,Local Intel,Low
472,Bihar,2024-06-27,IED Blast,3,Patrol Unit B,Medium
473,Bihar,2024-10-31,IED Blast,7,Unit C,Medium
474,Jammu & Kashmir,2024-06-12,Riots,8,Patrol Unit A,High
475,Jammu & Kashmir,2024-12-07,Ambush,6,Local Intel,Medium
476,Maharashtra,2024-07-06,Encounter,8,Patrol Unit B,High
477,Odisha,2024-07-31,Ambush,4,Unit D,Medium
478,Odisha,2024-12-20,Patrolling,1,Unit C,Low
479,Assam,2024-05-21,Patrolling,6,Unit C,Low
480,Bihar,2024-07-15,IED Blast,6,Unit C,Medium
481,West Bengal,2024-06-13,Riots,3,Patrol Unit A,Low
482,Odisha,2024-09-02,IED Blast,3,Patrol Unit B,High
483,Jammu & Kashmir,2024-12-31,Patrolling,1,Local Intel,Medium
484,Bihar,2024-04-10,IED Blast,9,Local Intel,Low
485,Odisha,2024-07-22,Patrolling,0,Unit C,High
486,Bihar,2024-10-08,Ambush,2,Unit D,High
487,Kerala,2024-12-14,IED Blast,6,Unit C,Low
488,Jammu & Kashmir,2024-11-10,Patrolling,7,Local Intel,Medium
489,Assam,2024-07-17,Patrolling,4,Unit D,Medium
490,Maharashtra,2024-05-18,Patrolling,4,Local Intel,Low
491,Jammu & Kashmir,2024-04-01,Patrolling,1,Unit D,Medium
492,Kerala,2024-06-01,Ambush,5,Unit D,Low
493,West Bengal,2024-11-23,Riots,6,Unit D,Medium
494,Assam,2024-12-30,Encounter,6,Patrol Unit B,High
495,West Bengal,2024-05-16,Patrolling,7,Patrol Unit B,Medium
496,Assam,2024-12-01,IED Blast,9,Patrol Unit A,Medium
497,Odisha,2024-09-16,Riots,0,Patrol Unit B,Medium
498,Kerala,2024-06-15,IED Blast,3,Patrol Unit B,Medium
499,Chhattisgarh,2024-11-23,Search Operation,9,Patrol Unit A,High
500,Jammu & Kashmir,2024-02-29,Encounter,6,Unit C,Medium
501,West Bengal,2024-11-19,Search Operation,3,Patrol Unit B,High
502,Maharashtra,2024-02-21,Encounter,10,Local Intel,Low
503,Kerala,2024-11-24,Ambush,10,Unit D,Medium
504,Odisha,2024-12-31,Patrolling,5,Unit D,Medium
505,Bihar,2024-04-06,Patrolling,7,Patrol Unit B,High
506,Maharashtra,2024-10-28,Search Operation,7,Patrol Unit B,Low
507,Chhattisgarh,2024-09-08,Search Operation,2,Patrol Unit B,Low
508,Jammu & Kashmir,2024-11-13,Encounter,6,Unit C,High
509,Assam,2024-10-26,Search Operation,9,Unit C,Medium
510,Bihar,2024-10-18,Encounter,9,Unit C,Low
511,Assam,2024-02-25,Search Operation,0,Unit D,High
512,Kerala,2024-03-25,Ambush,5,Patrol Unit A,High
513,Assam,2024-08-14,Riots,2,Unit D,Medium
514,Assam,2024-10-29,Patrolling,8,Unit D,Low
515,Odisha,2024-08-17,Patrolling,0,Patrol Unit A,Low
516,Odisha,2024-06-12,Patrolling,4,Patrol Unit A,Medium
517,Kerala,2024-12-26,Encounter,7,Unit C,Low
518,Jammu & Kashmir,2024-04-24,Ambush,6,Unit D,Medium
519,Chhattisgarh,2025-01-17,IED Blast,8,Unit D,Medium
520,Odisha,2024-04-26,Riots,10,Local Intel,High
521,Odisha,2024-07-25,Ambush,1,Patrol Unit A,High
522,Assam,2024-08-13,IED Blast,6,Local Intel,High
523,Chhattisgarh,2024-07-15,Search Operation,1,Local Intel,High
524,Jammu & Kashmir,2024-03-12,Patrolling,6,Unit D,Low
525,Bihar,2024-04-25,Encounter,7,Unit C,High
526,West Bengal,2024-10-18,Encounter,1,Local Intel,Low
527,Assam,2024-06-10,Ambush,6,Unit D,High
528,Jammu & Kashmir,2024-10-03,Search Operation,4,Patrol Unit A,High
529,Assam,2024-04-08,Ambush,2,Unit D,High
530,Jammu & Kashmir,2025-01-15,Riots,1,Local Intel,Low
531,Odisha,2024-05-25,Encounter,8,Unit C,High
532,Maharashtra,2024-09-24,IED Blast,5,Unit C,High
533,Bihar,2025-01-04,Ambush,3,Unit C,High
534,Bihar,2024-11-25,Encounter,10,Unit D,Low
535,Maharashtra,2024-09-29,IED Blast,1,Patrol Unit A,High
536,Assam,2024-05-18,IED Blast,9,Unit C,Low
537,Odisha,2024-03-05,Encounter,5,Local Intel,High
538,Maharashtra,2025-01-03,Encounter,5,Unit C,Medium
539,Assam,2024-08-29,Riots,6,Unit D,High
540,Odisha,2024-09-11,Riots,4,Unit D,Medium
541,Jammu & Kashmir,2024-04-24,Riots,3,Local Intel,Low
542,Chhattisgarh,2024-10-17,Patrolling,2,Unit C,High
543,Chhattisgarh,2024-04-22,Ambush,8,Local Intel,High
544,Odisha,2024-03-13,Ambush,5,Unit C,Low
545,Bihar,2025-01-03,Ambush,2,Patrol Unit B,High
546,Chhattisgarh,2024-12-03,Search Operation,9,Local Intel,Low
547,West Bengal,2024-11-15,Ambush,1,Unit D,High
548,Bihar,2024-12-10,Encounter,3,Local Intel,Medium
549,Chhattisgarh,2024-03-01,Patrolling,3,Local Intel,High
550,Assam,2024-05-16,Encounter,5,Patrol Unit B,Medium
551,Assam,2024-06-02,Encounter,2,Patrol Unit A,Medium
552,Kerala,2024-02-23,Encounter,6,Unit D,Low
553,Assam,2024-06-18,Patrolling,1,Unit C,Medium
554,Bihar,2024-07-20,Encounter,3,Unit D,High
555,West Bengal,2024-07-12,IED Blast,10,Patrol Unit A,Low
556,West Bengal,2024-05-14,IED Blast,2,Unit C,High
557,Maharashtra,2024-04-02,Ambush,0,Patrol Unit A,Medium
558,Maharashtra,2024-06-12,Encounter,2,Unit C,Low
559,Jammu & Kashmir,2024-06-12,Ambush,8,Patrol Unit A,High
560,Kerala,2024-07-24,IED Blast,3,Patrol Unit B,High
561,West Bengal,2024-03-27,Riots,0,Unit D,Medium
562,Odisha,2025-01-01,Ambush,4,Unit C,High
563,Kerala,2024-06-03,Encounter,8,Unit C,High
564,Odisha,2024-02-11,Patrolling,3,Local Intel,High
565,West Bengal,2024-07-21,Riots,10,Unit C,Medium
566,Maharashtra,2024-08-22,Encounter,3,Unit C,Low
567,Chhattisgarh,2024-07-30,Ambush,1,Local Intel,Low
568,West Bengal,2024-08-18,Search Operation,5,Patrol Unit B,Low
569,Assam,2024-09-22,IED Blast,10,Unit C,Low
570,Assam,2024-10-14,IED Blast,1,Local Intel,Low
571,Odisha,2024-09-23,IED Blast,9,Patrol Unit B,Medium
572,Jammu & Kashmir,2024-04-11,Patrolling,7,Unit D,Low
573,Jammu & Kashmir,2024-08-05,Patrolling,0,Local Intel,High
574,Chhattisgarh,2024-10-24,Encounter,8,Unit C,Low
575,Kerala,2024-05-30,Encounter,0,Patrol Unit B,Medium
576,Bihar,2024-09-01,IED Blast,2,Patrol Unit B,Low
577,Odisha,2024-02-10,Riots,2,Local Intel,Low
578,Chhattisgarh,2024-08-02,Search Operation,7,Patrol Unit B,High
579,Jammu & Kashmir,2024-03-24,Search Operation,2,Patrol Unit A,Low
580,Kerala,2024-02-23,Search Operation,0,Local Intel,Medium
581,Maharashtra,2025-01-16,Patrolling,9,Unit D,Medium
582,Odisha,2024-05-08,Search Operation,0,Patrol Unit B,Low
583,Assam,2025-01-21,IED Blast,0,Unit D,Medium
584,Chhattisgarh,2024-07-05,Patrolling,6,Local Intel,Low
585,Kerala,2025-01-08,IED Blast,8,Patrol Unit A,Medium
586,Chhattisgarh,2024-05-20,Search Operation,5,Unit C,High
587,Jammu & Kashmir,2025-01-22,Encounter,8,Unit D,Medium
588,Kerala,2024-11-10,Encounter,0,Patrol Unit B,Low
589,Bihar,2024-07-04,IED Blast,4,Unit D,Low
590,West Bengal,2024-03-07,Ambush,1,Unit D,Low
591,West Bengal,2024-12-18,Encounter,4,Patrol Unit B,High
592,Maharashtra,2024-07-18,Patrolling,6,Unit D,Low
593,West Bengal,2024-10-17,Patrolling,1,Unit C,High
594,Odisha,2024-05-19,Patrolling,5,Unit D,Medium
595,West Bengal,2024-09-21,Ambush,6,Patrol Unit B,Low
596,Chhattisgarh,2024-09-28,Patrolling,9,Patrol Unit B,Medium
597,Odisha,2024-04-04,Patrolling,0,Unit D,Medium
598,Chhattisgarh,2024-12-15,Ambush,4,Patrol Unit A,High
599,Odisha,2024-04-16,Riots,8,Unit C,Low
600,Assam,2024-10-31,Search Operation,6,Local Intel,High
601,Assam,2024-04-25,IED Blast,7,Patrol Unit B,High
602,West Bengal,2024-05-05,Encounter,9,Patrol Unit A,Low
603,Assam,2024-02-09,Patrolling,7,Patrol Unit A,High
604,Assam,2024-06-28,Riots,7,Unit C,High
605,Jammu & Kashmir,2024-05-11,Encounter,0,Unit C,Low
606,Jammu & Kashmir,2024-06-09,Riots,0,Patrol Unit B,Low
607,Maharashtra,2024-10-01,IED Blast,4,Local Intel,High
608,Assam,2025-01-01,Patrolling,6,Local Intel,Low
609,Kerala,2024-09-29,IED Blast,1,Unit D,Medium
610,Odisha,2024-12-14,Riots,3,Local Intel,High
611,Maharashtra,2024-08-06,Patrolling,8,Patrol Unit B,Medium
612,Odisha,2024-06-17,Search Operation,1,Local Intel,Medium
613,Maharashtra,2024-12-23,Encounter,3,Patrol Unit B,Medium
614,Jammu & Kashmir,2024-04-08,Search Operation,2,Patrol Unit A,Low
615,Kerala,2024-08-25,Riots,3,Local Intel,Low
616,Chhattisgarh,2024-12-19,Riots,10,Patrol Unit B,High
617,Maharashtra,2024-07-29,Ambush,0,Local Intel,High
618,West Bengal,2024-12-30,Patrolling,1,Patrol Unit B,High
619,Bihar,2024-06-18,Encounter,0,Unit D,Medium
620,Assam,2024-03-18,Ambush,5,Unit D,Low
621,Odisha,2024-08-19,Patrolling,9,Local Intel,Low
622,Odisha,2024-12-20,Search Operation,6,Unit C,Low
623,Assam,2024-04-17,Encounter,7,Local Intel,High
624,Maharashtra,2024-06-09,Search Operation,1,Patrol Unit A,Medium
625,Chhattisgarh,2025-01-06,Patrolling,9,Local Intel,High
626,West Bengal,2024-08-13,Encounter,0,Unit C,Medium
627,Maharashtra,2025-01-05,Patrolling,9,Patrol Unit A,High
628,Kerala,2024-10-23,Patrolling,10,Local Intel,High
629,Chhattisgarh,2025-01-25,IED Blast,10,Local Intel,Medium
630,Odisha,2024-04-02,IED Blast,10,Local Intel,Medium
631,Assam,2024-05-29,Encounter,0,Unit C,Medium
632,Jammu & Kashmir,2024-02-02,Riots,6,Unit C,Medium
633,Jammu & Kashmir,2024-11-19,Encounter,10,Unit D,High
634,Maharashtra,2025-01-21,Riots,7,Local Intel,Medium
635,Jammu & Kashmir,2024-03-09,Ambush,7,Unit D,Low
636,Jammu & Kashmir,2024-12-15,Search Operation,6,Patrol Unit B,Medium
637,Maharashtra,2024-12-11,Patrolling,3,Patrol Unit A,High
638,Bihar,2024-04-03,Ambush,3,Patrol Unit B,High
639,Odisha,2024-10-25,Search Operation,7,Unit D,Medium
640,Assam,2024-10-10,IED Blast,7,Patrol Unit A,High
641,Bihar,2024-06-16,Encounter,6,Unit C,Low
642,Assam,2024-11-16,Encounter,7,Patrol Unit A,High
643,Odisha,2024-08-27,Riots,10,Patrol Unit A,Low
644,Bihar,2024-03-26,IED Blast,8,Local Intel,High
645,Assam,2024-10-01,Riots,0,Unit C,High
646,Odisha,2024-03-24,Patrolling,9,Unit C,Medium
647,Kerala,2024-05-31,Riots,10,Unit C,Low
648,Assam,2024-03-06,Search Operation,6,Patrol Unit A,Medium
649,Jammu & Kashmir,2024-08-17,Riots,5,Patrol Unit B,High
650,Odisha,2024-12-06,Ambush,5,Patrol Unit B,Low
651,Maharashtra,2024-09-22,IED Blast,1,Unit D,High
652,Maharashtra,2024-12-31,Encounter,9,Local Intel,Medium
653,Kerala,2024-06-19,Patrolling,6,Local Intel,High
654,Odisha,2025-01-17,IED Blast,5,Unit D,High
655,Kerala,2024-04-02,Search Operation,10,Local Intel,Medium
656,West Bengal,2024-03-22,Encounter,10,Patrol Unit B,Low
657,Maharashtra,2025-01-14,Encounter,0,Local Intel,High
658,Odisha,2024-04-24,Riots,1,Unit D,High
659,Maharashtra,2024-11-05,IED Blast,8,Unit D,Low
660,West Bengal,2024-07-11,Ambush,5,Patrol Unit B,Medium
661,West Bengal,2024-03-04,Encounter,2,Unit D,Medium
662,Assam,2024-10-21,Search Operation,2,Patrol Unit A,Low
663,Kerala,2024-06-15,Search Operation,8,Unit D,Medium
664,West Bengal,2024-09-29,Patrolling,1,Patrol Unit A,High
665,Bihar,2024-04-09,Riots,3,Patrol Unit A,Medium
666,Maharashtra,2024-04-17,Ambush,3,Patrol Unit A,High
667,Maharashtra,2024-07-18,Riots,5,Local Intel,Medium
668,Assam,2024-02-21,Patrolling,3,Patrol Unit B,High
669,Jammu & Kashmir,2024-08-05,IED Blast,1,Unit C,High
670,West Bengal,2024-09-24,Patrolling,6,Patrol Unit A,Medium
671,Jammu & Kashmir,2024-09-12,Search Operation,6,Local Intel,Medium
672,West Bengal,2024-09-08,IED Blast,1,Unit D,Medium
673,Kerala,2024-11-24,Ambush,3,Unit C,Medium
674,Chhattisgarh,2024-02-05,Ambush,4,Local Intel,Medium
675,Odisha,2024-05-12,Ambush,7,Unit C,Low
676,Odisha,2024-08-24,Encounter,0,Unit D,Low
677,Bihar,2024-03-28,IED Blast,5,Patrol Unit A,Low
678,Assam,2024-11-02,IED Blast,4,Patrol Unit B,High
679,Kerala,2024-02-06,Ambush,0,Unit C,Low
680,Chhattisgarh,2024-09-01,Search Operation,6,Local Intel,High
681,Jammu & Kashmir,2024-10-26,Riots,1,Unit C,Low
682,Jammu & Kashmir,2024-09-14,Search Operation,9,Local Intel,Low
683,Jammu & Kashmir,2024-02-14,Search Operation,10,Patrol Unit A,Low
684,West Bengal,2024-12-27,Ambush,9,Unit D,Medium
685,Assam,2024-02-29,Encounter,9,Unit D,Medium
686,Bihar,2024-03-10,Patrolling,0,Unit C,Low
687,Jammu & Kashmir,2024-03-04,Search Operation,9,Patrol Unit B,High
688,West Bengal,2024-03-29,Riots,8,Local Intel,Low
689,Maharashtra,2024-05-05,Encounter,7,Unit C,Medium
690,Jammu & Kashmir,2024-03-21,Encounter,4,Patrol Unit A,Medium
691,Odisha,2025-01-01,Patrolling,0,Patrol Unit B,Low
692,Maharashtra,2024-04-15,Encounter,5,Unit C,High
693,Kerala,2024-08-24,Riots,8,Unit C,High
694,Assam,2025-01-10,Patrolling,3,Patrol Unit A,Medium
695,Bihar,2024-10-11,Encounter,5,Unit C,Medium
696,Kerala,2024-06-19,Patrolling,6,Patrol Unit B,Medium
697,Bihar,2024-11-29,Riots,3,Patrol Unit A,High
698,Chhattisgarh,2024-09-02,Riots,1,Unit D,Medium
699,Assam,2024-03-01,Ambush,2,Patrol Unit B,High
700,Chhattisgarh,2024-10-04,Ambush,7,Unit C,Low
701,West Bengal,2024-11-27,Search Operation,7,Patrol Unit A,High
702,Assam,2024-07-05,Riots,4,Unit D,High
703,West Bengal,2024-11-22,Search Operation,6,Patrol Unit B,Low
704,Kerala,2024-03-31,Search Operation,0,Patrol Unit B,Medium
705,Jammu & Kashmir,2024-04-03,Patrolling,7,Patrol Unit A,Low
706,Kerala,2024-12-17,IED Blast,7,Patrol Unit B,High
707,Assam,2024-03-19,Patrolling,4,Patrol Unit A,Medium
708,Chhattisgarh,2024-05-30,Patrolling,4,Patrol Unit A,Low
709,Jammu & Kashmir,2024-02-29,Riots,4,Unit D,High
710,Assam,2024-11-27,Encounter,3,Patrol Unit A,Low
711,Chhattisgarh,2024-09-05,Search Operation,4,Unit C,Medium
712,Jammu & Kashmir,2024-08-10,Encounter,9,Patrol Unit B,Low
713,West Bengal,2024-10-17,Patrolling,9,Unit D,Medium
714,Chhattisgarh,2024-07-07,Riots,8,Local Intel,Medium
715,Kerala,2024-10-24,Ambush,4,Patrol Unit B,Low
716,Odisha,2024-02-29,IED Blast,10,Patrol Unit A,Low
717,Jammu & Kashmir,2024-12-01,IED Blast,0,Local Intel,Medium
718,Odisha,2024-03-23,Ambush,6,Patrol Unit B,High
719,Assam,2024-12-28,Encounter,10,Unit D,High
720,Chhattisgarh,2024-10-20,Encounter,10,Unit D,Medium
721,Jammu & Kashmir,2024-06-24,Ambush,8,Unit D,High
722,West Bengal,2024-09-22,Patrolling,4,Local Intel,Low
723,Bihar,2024-04-18,Search Operation,2,Unit C,Medium
724,Jammu & Kashmir,2024-08-24,Ambush,8,Patrol Unit A,Medium
725,Odisha,2024-07-11,Search Operation,4,Unit C,Low
726,Jammu & Kashmir,2024-10-13,IED Blast,9,Patrol Unit B,Medium
727,Bihar,2024-08-06,Encounter,3,Local Intel,Low
728,Odisha,2024-11-30,Riots,9,Local Intel,Medium
729,Kerala,2024-04-29,Ambush,0,Unit C,Medium
730,Maharashtra,2024-12-15,IED Blast,9,Unit D,Low
731,Bihar,2024-02-26,Riots,8,Unit D,Medium
732,Odisha,2024-08-01,Riots,2,Patrol Unit A,High
733,Kerala,2024-03-19,Riots,5,Unit D,Low
734,Maharashtra,2024-06-06,Ambush,6,Patrol Unit A,Low
735,Jammu & Kashmir,2024-04-29,Search Operation,9,Patrol Unit B,Low
736,Jammu & Kashmir,2024-06-30,Riots,0,Unit C,Low
737,West Bengal,2024-11-04,Ambush,4,Unit D,High
738,Odisha,2024-07-27,Search Operation,8,Unit D,High
739,Odisha,2024-04-03,Search Operation,10,Unit C,Medium
740,Jammu & Kashmir,2024-06-09,Search Operation,4,Unit D,High
741,Jammu & Kashmir,2024-02-22,Search Operation,8,Unit C,Medium
742,West Bengal,2024-10-29,IED Blast,10,Patrol Unit B,Medium
743,Maharashtra,2024-02-01,Ambush,10,Local Intel,Low
744,Jammu & Kashmir,2024-05-20,IED Blast,9,Unit C,High
745,Jammu & Kashmir,2024-08-12,Search Operation,10,Unit C,High
746,Odisha,2024-10-16,IED Blast,9,Unit D,High
747,Kerala,2025-01-20,IED Blast,9,Patrol Unit A,Medium
748,Jammu & Kashmir,2024-09-25,Encounter,8,Local Intel,High
749,Odisha,2024-08-13,Patrolling,0,Patrol Unit A,Medium
750,Maharashtra,2024-03-19,IED Blast,9,Unit C,Medium
751,Jammu & Kashmir,2024-09-14,Encounter,2,Local Intel,Medium
752,Jammu & Kashmir,2024-08-18,Encounter,2,Local Intel,High
753,Bihar,2024-02-29,Encounter,7,Unit D,Medium
754,Jammu & Kashmir,2024-06-10,Patrolling,0,Local Intel,High
755,Odisha,2024-11-30,IED Blast,9,Unit C,High
756,Assam,2024-02-24,IED Blast,8,Patrol Unit A,Low
757,Kerala,2024-11-23,Patrolling,10,Unit C,Low
758,West Bengal,2024-05-10,Ambush,5,Local Intel,Low
759,Jammu & Kashmir,2024-08-27,Riots,4,Patrol Unit B,High
760,Maharashtra,2024-10-22,Riots,1,Local Intel,High
761,West Bengal,2024-12-12,Patrolling,1,Unit D,High
762,Bihar,2024-05-17,Riots,3,Local Intel,Medium
763,Assam,2024-10-28,Patrolling,10,Patrol Unit A,Medium
764,West Bengal,2024-08-01,IED Blast,5,Unit D,High
765,West Bengal,2024-03-08,Patrolling,1,Unit D,Medium
766,Assam,2024-02-24,Search Operation,6,Local Intel,Low
767,Jammu & Kashmir,2024-07-02,Encounter,2,Patrol Unit B,Medium
768,Assam,2024-08-14,Search Operation,8,Patrol Unit A,Medium
769,Kerala,2024-02-23,IED Blast,2,Unit D,Low
770,West Bengal,2024-06-27,Patrolling,2,Local Intel,Low
771,Maharashtra,2024-04-04,Patrolling,0,Unit D,Low
772,Bihar,2025-01-07,Patrolling,3,Unit D,Medium
773,Maharashtra,2024-01-30,Search Operation,3,Unit C,Medium
774,Kerala,2024-06-05,Ambush,5,Patrol Unit A,Medium
775,Chhattisgarh,2024-05-20,Encounter,5,Unit D,High
776,Chhattisgarh,2024-06-27,Riots,5,Patrol Unit B,Medium
777,Chhattisgarh,2024-12-19,Patrolling,5,Unit C,High
778,West Bengal,2025-01-01,Search Operation,5,Patrol Unit B,High
779,Assam,2024-03-22,Search Operation,4,Patrol Unit A,High
780,Odisha,2024-12-08,Riots,9,Unit D,Low
781,Jammu & Kashmir,2024-10-10,IED Blast,5,Local Intel,Low
782,Odisha,2024-05-12,Riots,9,Local Intel,High
783,Kerala,2024-05-03,Riots,4,Unit D,Medium
784,Assam,2024-02-08,Ambush,3,Unit C,Medium
785,Maharashtra,2025-01-25,Ambush,3,Patrol Unit B,Low
786,Maharashtra,2024-06-30,Ambush,10,Patrol Unit A,Low
787,Bihar,2024-12-26,Patrolling,0,Patrol Unit B,High
788,Chhattisgarh,2024-07-08,Ambush,2,Patrol Unit A,High
789,West Bengal,2024-12-04,Ambush,3,Unit D,Low
790,Jammu & Kashmir,2024-05-12,Patrolling,0,Patrol Unit B,Medium
791,Kerala,2025-01-25,Riots,0,Local Intel,Low
792,West Bengal,2024-12-09,Patrolling,9,Local Intel,High
793,West Bengal,2024-05-15,Search Operation,6,Local Intel,High
794,Kerala,2024-05-17,Encounter,7,Local Intel,High
795,Kerala,2024-10-05,Search Operation,3,Patrol Unit A,Medium
796,West Bengal,2024-08-19,Ambush,8,Local Intel,Low
797,Maharashtra,2024-09-05,Patrolling,9,Patrol Unit A,Medium
798,Assam,2024-03-09,Ambush,6,Unit D,Low
799,Maharashtra,2024-05-10,Patrolling,5,Patrol Unit A,High
800,Maharashtra,2024-09-15,Encounter,4,Local Intel,Low
801,Chhattisgarh,2024-06-01,Riots,8,Patrol Unit B,Medium
802,Chhattisgarh,2024-05-06,Encounter,7,Patrol Unit B,High
803,Kerala,2024-02-13,Encounter,7,Patrol Unit A,Low
804,Bihar,2024-09-18,Search Operation,10,Unit C,Low
805,Maharashtra,2025-01-19,Patrolling,2,Unit D,Medium
806,Bihar,2024-07-24,Riots,4,Unit C,Medium
807,Jammu & Kashmir,2024-05-11,Ambush,10,Unit D,Low
808,Maharashtra,2024-11-22,Riots,8,Patrol Unit B,Low
809,Kerala,2024-01-27,Riots,7,Unit C,Low
810,West Bengal,2025-01-25,Search Operation,5,Unit D,Medium
811,Odisha,2024-11-12,Encounter,4,Unit C,Low
812,Chhattisgarh,2024-09-09,Ambush,9,Unit C,Low
813,Jammu & Kashmir,2024-04-09,Patrolling,1,Patrol Unit B,Medium
814,West Bengal,2024-06-22,Riots,8,Local Intel,High
815,Jammu & Kashmir,2024-10-22,Riots,6,Unit D,High
816,Kerala,2024-09-19,Search Operation,2,Unit C,High
817,Odisha,2024-07-16,Encounter,0,Unit D,Medium
818,Kerala,2024-04-12,IED Blast,1,Unit C,High
819,Assam,2024-03-05,Ambush,7,Local Intel,High
820,Chhattisgarh,2024-08-15,Ambush,7,Patrol Unit A,High
821,West Bengal,2024-02-01,Encounter,1,Patrol Unit B,Medium
822,Assam,2024-07-30,Search Operation,6,Local Intel,Medium
823,Odisha,2024-04-30,Ambush,5,Unit C,Medium
824,Bihar,2024-04-15,Ambush,10,Patrol Unit B,Low
825,Chhattisgarh,2024-01-30,IED Blast,10,Local Intel,Medium
826,Kerala,2024-03-01,Riots,2,Unit C,High
827,Odisha,2024-02-15,IED Blast,3,Local Intel,Medium
828,Maharashtra,2024-04-29,Encounter,0,Unit D,Low
829,Maharashtra,2024-06-22,Search Operation,5,Patrol Unit A,High
830,Assam,2024-03-15,Search Operation,0,Local Intel,Low
831,Jammu & Kashmir,2024-10-18,Search Operation,8,Unit D,Medium
832,West Bengal,2024-05-13,Riots,10,Unit D,High
833,Odisha,2024-06-28,Ambush,9,Unit D,Low
834,Kerala,2024-05-28,Riots,3,Unit D,Medium
835,Maharashtra,2024-05-09,Patrolling,7,Patrol Unit A,High
836,Assam,2024-03-21,Riots,4,Unit C,Medium
837,West Bengal,2025-01-23,Encounter,1,Patrol Unit B,Medium
838,Kerala,2024-06-14,Search Operation,3,Patrol Unit A,Medium
839,Chhattisgarh,2024-04-23,IED Blast,1,Local Intel,High
840,Chhattisgarh,2024-07-12,IED Blast,1,Unit C,Medium
841,Maharashtra,2024-05-13,Riots,1,Unit D,Medium
842,Jammu & Kashmir,2024-08-28,Ambush,4,Unit C,Medium
843,Odisha,2024-09-22,Ambush,9,Patrol Unit B,Low
844,Bihar,2024-12-17,Encounter,6,Patrol Unit A,Medium
845,Assam,2024-03-19,Patrolling,9,Local Intel,Low
846,Bihar,2024-08-01,Encounter,4,Unit C,Medium
847,Assam,2024-04-16,Riots,8,Patrol Unit B,High
848,Jammu & Kashmir,2024-11-26,Ambush,9,Local Intel,High
849,Chhattisgarh,2024-06-22,Encounter,0,Unit D,High
850,West Bengal,2024-09-04,Patrolling,5,Patrol Unit A,High
851,Jammu & Kashmir,2024-10-22,Search Operation,2,Patrol Unit B,High
852,Kerala,2024-02-16,Ambush,10,Unit C,Low
853,West Bengal,2024-03-01,Search Operation,4,Unit D,Low
854,West Bengal,2024-09-07,Encounter,6,Patrol Unit B,Medium
855,Chhattisgarh,2024-02-06,Encounter,10,Local Intel,Low
856,Jammu & Kashmir,2024-01-31,Search Operation,9,Unit D,Low
857,Kerala,2024-10-26,Ambush,6,Patrol Unit A,High
858,Bihar,2024-12-12,Riots,0,Patrol Unit A,High
859,Maharashtra,2024-03-21,IED Blast,1,Patrol Unit B,Low
860,Chhattisgarh,2024-12-27,Encounter,7,Unit D,High
861,Maharashtra,2024-12-17,Ambush,8,Unit C,Low
862,Odisha,2024-03-13,Patrolling,5,Patrol Unit A,High
863,Maharashtra,2024-06-08,IED Blast,9,Unit D,Low
864,West Bengal,2024-08-05,IED Blast,4,Local Intel,Medium
865,Maharashtra,2025-01-01,Encounter,9,Unit D,Medium
866,Kerala,2024-03-31,Ambush,5,Local Intel,High
867,Assam,2024-11-19,IED Blast,4,Unit D,Medium
868,West Bengal,2024-03-16,IED Blast,5,Unit D,Medium
869,Jammu & Kashmir,2024-12-30,Search Operation,7,Local Intel,Medium
870,Chhattisgarh,2024-11-23,IED Blast,9,Local Intel,Low
871,Maharashtra,2024-05-18,Patrolling,1,Unit C,Medium
872,Bihar,2024-08-15,Riots,0,Patrol Unit B,Low
873,West Bengal,2024-05-05,Riots,6,Patrol Unit A,High
874,Bihar,2024-05-07,Patrolling,5,Local Intel,Low
875,Bihar,2024-07-22,IED Blast,9,Patrol Unit A,High
876,Jammu & Kashmir,2024-08-28,Search Operation,7,Unit C,High
877,Kerala,2024-07-05,Patrolling,0,Patrol Unit B,Low
878,Assam,2024-05-12,Patrolling,7,Patrol Unit A,Medium
879,Kerala,2024-11-07,Patrolling,7,Patrol Unit A,Low
880,Jammu & Kashmir,2024-04-12,Search Operation,6,Unit D,Low
881,Chhattisgarh,2024-04-09,Ambush,3,Patrol Unit B,Low
882,West Bengal,2025-01-16,Patrolling,7,Patrol Unit A,Medium
883,Chhattisgarh,2024-07-04,Patrolling,6,Unit D,Medium
884,West Bengal,2024-10-07,Search Operation,6,Local Intel,Low
885,Maharashtra,2024-07-07,Encounter,5,Patrol Unit A,High
886,Maharashtra,2025-01-22,Encounter,0,Patrol Unit A,Low
887,Assam,2024-07-01,Patrolling,3,Unit C,Low
888,Odisha,2024-09-07,Search Operation,3,Local Intel,High
889,Assam,2024-04-16,Patrolling,2,Patrol Unit A,High
890,Bihar,2024-12-02,IED Blast,3,Local Intel,High
891,Chhattisgarh,2024-12-01,Riots,0,Unit C,Low
892,Assam,2024-06-10,Riots,6,Patrol Unit B,High
893,Assam,2024-04-28,Ambush,6,Unit C,High
894,Assam,2024-06-07,Search Operation,5,Unit C,High
895,Jammu & Kashmir,2024-12-16,Ambush,9,Unit D,Low
896,Odisha,2024-08-23,Ambush,0,Patrol Unit B,High
897,Odisha,2024-03-24,Patrolling,3,Local Intel,Low
898,Assam,2024-02-15,IED Blast,4,Unit C,Medium
899,Odisha,2024-02-21,Encounter,3,Patrol Unit A,High
900,Jammu & Kashmir,2024-04-15,Ambush,0,Patrol Unit A,High
901,Odisha,2024-05-21,Search Operation,7,Unit D,High
902,Assam,2024-09-18,Ambush,6,Unit C,High
903,West Bengal,2024-12-14,Encounter,10,Unit D,Medium
904,Kerala,2025-01-12,Riots,10,Unit C,Medium
905,Jammu & Kashmir,2024-07-08,IED Blast,2,Local Intel,Low
906,Odisha,2024-03-27,Encounter,1,Unit D,Low
907,Maharashtra,2024-12-30,IED Blast,8,Unit C,High
908,Bihar,2024-11-24,Ambush,1,Unit C,Medium
909,West Bengal,2024-06-05,Ambush,0,Patrol Unit B,Low
910,Bihar,2024-10-19,Patrolling,1,Patrol Unit A,High
911,Odisha,2024-12-18,Search Operation,3,Local Intel,Medium
912,Jammu & Kashmir,2024-05-27,Riots,2,Unit D,Medium
913,Kerala,2024-06-24,Ambush,6,Patrol Unit A,Low
914,Odisha,2024-11-03,Riots,6,Unit D,Low
915,West Bengal,2024-03-10,IED Blast,2,Patrol Unit B,High
916,Odisha,2024-02-08,Patrolling,1,Local Intel,High
917,Maharashtra,2024-04-11,Patrolling,7,Unit D,Low
918,Assam,2024-09-27,Search Operation,4,Unit D,Low
919,Chhattisgarh,2024-10-21,Ambush,3,Patrol Unit B,Medium
920,Kerala,2024-11-08,Ambush,8,Local Intel,High
921,Chhattisgarh,2024-10-24,Search Operation,10,Patrol Unit B,High
922,Jammu & Kashmir,2024-07-02,IED Blast,8,Patrol Unit A,Low
923,Bihar,2024-05-16,IED Blast,0,Unit D,Low
924,Bihar,2024-08-25,Encounter,2,Unit D,High
925,West Bengal,2024-03-20,Ambush,7,Local Intel,High
926,Bihar,2024-04-18,Ambush,7,Patrol Unit B,Low
927,Kerala,2024-11-08,Ambush,9,Unit C,High
928,Assam,2024-02-24,IED Blast,8,Unit C,Medium
929,Bihar,2025-01-05,Riots,8,Patrol Unit B,Low
930,Chhattisgarh,2024-09-06,Encounter,0,Local Intel,High
931,Maharashtra,2024-12-17,Encounter,1,Patrol Unit B,High
932,Bihar,2024-11-19,Search Operation,5,Unit D,High
933,Maharashtra,2024-09-21,Encounter,6,Patrol Unit A,Low
934,Jammu & Kashmir,2024-03-11,Encounter,1,Patrol Unit A,High
935,Maharashtra,2024-03-19,Search Operation,4,Patrol Unit B,Medium
936,Odisha,2024-06-26,Riots,4,Unit C,Medium
937,Kerala,2024-12-31,Ambush,7,Patrol Unit B,Low
938,Jammu & Kashmir,2024-04-06,Search Operation,6,Unit C,High
939,West Bengal,2024-03-19,Riots,5,Patrol Unit A,Low
940,Chhattisgarh,2024-12-28,Riots,3,Patrol Unit B,Medium
941,Kerala,2024-05-02,IED Blast,3,Patrol Unit A,High
942,Kerala,2024-04-19,Search Operation,4,Patrol Unit A,Medium
943,Assam,2024-02-28,Patrolling,4,Local Intel,Low
944,Assam,2024-07-04,IED Blast,8,Patrol Unit B,High
945,Odisha,2024-11-19,Patrolling,4,Patrol Unit A,High
946,Jammu & Kashmir,2024-08-11,Search Operation,0,Unit D,Low
947,Bihar,2024-11-28,Riots,3,Local Intel,Low
948,Bihar,2024-02-21,Encounter,5,Unit C,Medium
949,Kerala,2024-03-16,Search Operation,7,Unit D,Medium
950,Bihar,2024-10-30,Encounter,5,Patrol Unit A,Medium
951,Bihar,2024-06-11,IED Blast,4,Local Intel,Medium
952,West Bengal,2024-03-13,Search Operation,9,Patrol Unit A,High
953,Odisha,2024-06-12,Search Operation,4,Unit C,Medium
954,Kerala,2024-06-14,Encounter,3,Patrol Unit A,High
955,Bihar,2024-05-09,Encounter,2,Patrol Unit A,Medium
956,Jammu & Kashmir,2024-02-02,IED Blast,3,Patrol Unit B,Medium
957,Chhattisgarh,2024-07-30,Ambush,5,Unit C,Medium
958,Bihar,2024-12-08,IED Blast,9,Unit D,Low
959,Odisha,2024-02-04,IED Blast,6,Local Intel,High
960,Bihar,2024-07-05,Patrolling,5,Patrol Unit A,Medium
961,West Bengal,2024-11-09,Ambush,3,Patrol Unit A,High
962,Chhattisgarh,2025-01-18,Ambush,7,Unit C,Low
963,Maharashtra,2024-02-23,IED Blast,6,Unit C,Medium
964,Maharashtra,2024-07-10,Encounter,2,Patrol Unit A,Medium
965,Assam,2024-09-23,Riots,6,Patrol Unit B,High
966,Assam,2024-05-12,Encounter,4,Patrol Unit B,High
967,Odisha,2024-02-08,Encounter,4,Local Intel,Low
968,West Bengal,2024-10-01,IED Blast,6,Unit C,Low
969,Assam,2024-03-19,Riots,9,Local Intel,Low
970,Assam,2025-01-19,Riots,5,Unit D,Low
971,Kerala,2025-01-22,Encounter,7,Local Intel,High
972,Bihar,2024-05-30,Encounter,5,Unit D,Medium
973,Maharashtra,2024-01-28,Ambush,2,Patrol Unit A,Medium
974,Jammu & Kashmir,2024-09-22,Patrolling,10,Patrol Unit A,Low
975,Kerala,2024-04-11,Ambush,1,Unit D,Low
976,Chhattisgarh,2024-09-25,Search Operation,4,Patrol Unit B,Medium
977,Odisha,2024-11-16,Search Operation,0,Unit C,Medium
978,Bihar,2024-07-31,IED Blast,10,Unit C,Medium
979,Odisha,2024-05-02,Search Operation,3,Unit C,Medium
980,Bihar,2024-07-06,Riots,10,Patrol Unit B,Low
981,West Bengal,2024-07-22,Search Operation,1,Unit C,Medium
982,Bihar,2024-09-09,Patrolling,10,Patrol Unit B,Low
983,Assam,2024-05-26,Riots,0,Patrol Unit A,Medium
984,Jammu & Kashmir,2024-11-11,Encounter,1,Unit D,High
985,Bihar,2024-08-20,IED Blast,3,Patrol Unit B,Medium
986,West Bengal,2024-11-17,Ambush,3,Patrol Unit A,High
987,West Bengal,2024-10-29,Patrolling,6,Unit D,Low
988,Kerala,2024-07-24,Riots,2,Local Intel,Low
989,Maharashtra,2024-02-21,Encounter,4,Patrol Unit A,Medium
990,Maharashtra,2024-09-05,Search Operation,7,Unit C,Medium
991,Jammu & Kashmir,2024-04-05,Encounter,7,Local Intel,Low
992,West Bengal,2024-11-26,Encounter,7,Local Intel,High
993,West Bengal,2024-05-25,Search Operation,5,Patrol Unit B,High
994,Bihar,2025-01-02,Encounter,9,Unit D,High
995,Maharashtra,2024-03-07,Search Operation,8,Unit C,High
996,Maharashtra,2024-10-18,Encounter,2,Patrol Unit A,Medium
997,Maharashtra,2024-07-20,Riots,10,Unit D,Medium
998,Maharashtra,2024-05-28,Patrolling,8,Unit D,Low
999,Assam,2024-04-03,Ambush,10,Patrol Unit A,High
1000,Chhattisgarh,2024-05-28,Ambush,3,Unit C,High
Step 1: Upload the CSV File to Google Colab
In Google Colab, click on the folder icon on the left sidebar to open the file explorer.
Click the upload icon (a file with an upward arrow) in the file explorer.
Select the CRPF_Incidents_Large.csv file you downloaded.
Step 2: Import Required Libraries and Read the CSV File
In a new code cell in Colab, copy and paste the following code to import the required libraries and load the CSV file using pandas:
import pandas as pd
# Load the uploaded CSV file
file_path = "/content/CRPF_Incidents_Large.csv"
df = pd.read_csv(file_path)
# Display the first few rows
print("Data Preview:")
print(df.head())
Run the code cell.
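Before moving on, it is worth a quick sanity check of what was loaded: confirm the row/column count and look for missing values. A minimal sketch (the small inline DataFrame below is a made-up stand-in for the df loaded from the CSV; in Colab, run these checks on your actual df):

```python
import pandas as pd

# Hypothetical mini-sample mirroring the CSV's columns, so this runs standalone.
df = pd.DataFrame({
    "Incident_ID": [1, 2, 3],
    "Location": ["Kerala", "Bihar", "Assam"],
    "Date": ["2024-02-23", "2025-01-07", "2024-03-22"],
    "Incident_Type": ["IED Blast", "Patrolling", "Search Operation"],
    "Casualties": [2, 3, 4],
    "Reported_By": ["Unit D", "Unit D", "Patrol Unit A"],
    "Severity": ["Low", "Medium", "High"],
})

# Shape tells you how many rows survived the load; isnull().sum() flags gaps.
print("Shape (rows, columns):", df.shape)
print("Missing values per column:")
print(df.isnull().sum())
```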
Step 3: Analyze and Visualize the Data
In a new code cell, paste the following code to generate some basic insights and visualize the data using matplotlib:
import matplotlib.pyplot as plt
# Basic statistics
print("Dataset Overview:")
print(df.describe())
# Incident Type Distribution
incident_counts = df['Incident_Type'].value_counts()
plt.figure(figsize=(8, 6))
incident_counts.plot(kind='bar', title="Incident Type Distribution")
plt.xlabel("Incident Type")
plt.ylabel("Number of Incidents")
plt.show()
# Severity Distribution
severity_counts = df['Severity'].value_counts()
plt.figure(figsize=(8, 6))
severity_counts.plot(kind='pie', autopct='%1.1f%%', title="Severity Distribution")
plt.ylabel("") # To remove y-label for pie chart
plt.show()
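Beyond the two plots above, cross-tabulating incident type against severity shows how the two categorical columns relate. A hedged sketch using a few made-up sample rows in place of the full df:

```python
import pandas as pd

# Hypothetical sample rows standing in for the loaded df.
df = pd.DataFrame({
    "Incident_Type": ["Ambush", "Ambush", "Riots", "IED Blast", "Riots"],
    "Severity": ["High", "Low", "Medium", "High", "Low"],
})

# Each cell counts how often a severity level occurs for an incident type.
ct = pd.crosstab(df["Incident_Type"], df["Severity"])
print(ct)
```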
Step 4: Preparing Data for Prediction
To predict future incidents or identify patterns, we can start by encoding categorical data and applying a basic machine learning model.
Here’s the code to prepare the data:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, accuracy_score
# Encode categorical data
label_encoders = {}
for column in ['Location', 'Incident_Type', 'Reported_By', 'Severity']:
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column])
    label_encoders[column] = le
# Features and target variable
X = df[['Location', 'Incident_Type', 'Casualties', 'Reported_By']]
y = df['Severity'] # Predicting severity
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Predictions
y_pred = model.predict(X_test)
# Evaluation
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy Score:")
print(accuracy_score(y_test, y_pred))
What This Code Does:
Encodes Categorical Data: Converts text columns (e.g., Location, Incident_Type, etc.) into numeric format using LabelEncoder.
Defines Features and Target:
Features (X): Includes Location, Incident_Type, Casualties, and Reported_By.
Target (y): Predicts Severity.
Splits the Dataset: Divides the data into training (70%) and testing (30%) sets.
Trains a Random Forest Classifier: A robust algorithm for classification tasks.
Evaluates the Model: Prints the classification report and accuracy score.
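If you want to see exactly how LabelEncoder turned text into numbers, inspect its classes_ attribute. A small standalone sketch with sample severity labels:

```python
from sklearn.preprocessing import LabelEncoder

# Fit an encoder on sample severity labels to reveal the text -> number mapping.
le = LabelEncoder()
codes = le.fit_transform(["Low", "Medium", "High", "Medium"])

# classes_ holds the original labels in sorted order; the index is the code.
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)  # {'High': 0, 'Low': 1, 'Medium': 2}
```

Note that the mapping is alphabetical, so "High" becomes 0: the encoded Severity column is not ordered Low < Medium < High.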
Step 5: Predicting New Data
Here’s the code to test the model with new data:
# Example new data for prediction
new_data = pd.DataFrame({
    "Location": ["Chhattisgarh", "Jammu & Kashmir", "Assam"],
    "Incident_Type": ["Encounter", "IED Blast", "Riots"],
    "Casualties": [3, 5, 0],
    "Reported_By": ["Patrol Unit A", "Patrol Unit B", "Unit C"]
})
# Encode a copy with the same encoders, so the readable text values are kept
encoded = new_data.copy()
for column in ['Location', 'Incident_Type', 'Reported_By']:
    encoded[column] = label_encoders[column].transform(encoded[column])
# Make predictions on the encoded copy
predictions = model.predict(encoded)
predicted_severity = label_encoders['Severity'].inverse_transform(predictions)
# Display predictions alongside the original, readable inputs
new_data['Predicted_Severity'] = predicted_severity
print("Predictions for New Data:")
print(new_data)
Step 6: Visualizing Feature Importance
This step helps explain which features contribute the most to the model's predictions:
# Feature Importance Visualization
importances = model.feature_importances_
features = X.columns
plt.figure(figsize=(8, 6))
plt.bar(features, importances)
plt.title("Feature Importance")
plt.xlabel("Features")
plt.ylabel("Importance Score")
plt.show()
What These Code Blocks Do:
New Data Prediction:
You provide new incident details (like Location, Incident_Type, etc.).
The model predicts the severity for these incidents.
Feature Importance:
Visualizes which features (e.g., Location, Casualties) have the most impact on predictions.
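A sorted listing of the importances can complement the bar chart in Step 6. The sketch below trains a throwaway RandomForest on synthetic data with the same column names, since the real X and model live in your Colab session:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic dataset standing in for the real X and y.
rng = np.random.RandomState(42)
X = pd.DataFrame({
    "Location": rng.randint(0, 8, 100),
    "Incident_Type": rng.randint(0, 6, 100),
    "Casualties": rng.randint(0, 11, 100),
    "Reported_By": rng.randint(0, 5, 100),
})
y = rng.randint(0, 3, 100)

model = RandomForestClassifier(random_state=42).fit(X, y)

# A descending Series reads more easily than a chart for a quick ranking.
ranking = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranking)
```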
Expected Output:
A table with the following structure:
Location Incident_Type Casualties Reported_By Predicted_Severity
0 Chhattisgarh Encounter 3 Patrol Unit A High
1 Jammu & Kashmir IED Blast 5 Patrol Unit B Medium
2 Assam Riots 0 Unit C Low
Two CSV Files
Save a file with the name user_data.csv containing the following data:
IP address,Video streaming history,Subscription details,Payment methods,Email ID,Return history,Purchase amount till this date
16.249.66.250,A; Two; House; Still,Non-Member,UPI,hicksmonica@yahoo.com,Yes,23963.08
148.142.102.204,Law; Various; Its; Pattern,Monthly Plan,Net Banking,lopezrobert@hotmail.com,Yes,30181.04
51.161.116.64,Nearly; Wish; Until; Figure; Heavy,Monthly Plan,Credit Card,jeffreyjones@yahoo.com,No,3987.4
7.144.239.20,Body,Annual Plan,Debit Card,matthew29@hotmail.com,No,12419.92
18.175.69.100,Seek; Partner,Monthly Plan,Credit Card,pittmanmartha@hotmail.com,No,34413.69
134.93.246.240,Issue; Impact; Produce,Annual Plan,UPI,mccoydavid@wilson.com,No,21630.25
156.106.194.245,Point; Goal; Do,Prime Member,Net Banking,trevorleach@yahoo.com,No,21322.08
104.128.150.214,Remain; Safe,Non-Member,Credit Card,tlawrence@walker.com,No,2291.95
135.119.234.89,Rise; Friend; Of,Annual Plan,UPI,njenkins@gmail.com,Yes,14929.72
60.108.118.92,Necessary; Fund; Watch,Non-Member,Wallet,bobby36@yahoo.com,No,29847.41
136.205.199.241,Minute,Annual Plan,Net Banking,brownjustin@guerrero.com,Yes,26946.59
162.200.44.125,Agency; Station; Class; Dark,Non-Member,Wallet,jeremy94@turner-williams.com,Yes,15305.29
214.201.97.62,Institution; Activity,Monthly Plan,Net Banking,masonalison@morris.info,Yes,2952.34
173.116.126.73,Listen,Non-Member,Credit Card,thenry@hotmail.com,No,26587.54
200.33.19.153,Physical; Open,Prime Member,Wallet,robinbrown@wang.com,Yes,4624.42
158.136.72.189,Within,Prime Member,Credit Card,heather81@yahoo.com,Yes,13940.5
109.151.171.74,Book; Deal; Bring,Non-Member,Net Banking,cherylmartinez@thompson.org,No,14858.83
38.169.249.110,House; Guess; Job,Annual Plan,UPI,murrayshannon@hotmail.com,No,5087.71
157.165.166.12,Worry; Open; Unit; State; Travel,Prime Member,Debit Card,justinstewart@gmail.com,No,49074.21
115.57.168.31,Late; Late; Life,Annual Plan,UPI,davidprice@mckee.com,No,2833.29
84.239.206.146,Whom; Notice; Even; Provide; Red,Annual Plan,Credit Card,benjamin46@brown.com,Yes,47998.63
84.122.48.97,View; The; Push; Wide,Monthly Plan,Credit Card,steven51@patterson.biz,Yes,4409.41
178.207.210.34,Happy,Annual Plan,Debit Card,jessicavargas@hotmail.com,No,41068.32
158.100.50.9,Grow,Non-Member,Credit Card,marqueztroy@hotmail.com,Yes,27929.42
22.18.44.219,Relationship; Chance; Set; Leg,Annual Plan,Debit Card,rwilson@figueroa.biz,Yes,24622.55
148.89.93.169,Point; Lawyer; Or; Can,Non-Member,UPI,bbrown@pearson.net,Yes,38228.83
138.77.131.251,Charge; Concern; Particular; Time; Manage,Prime Member,Credit Card,lukejackson@matthews.org,Yes,47985.63
165.97.182.96,Effect,Prime Member,Debit Card,wallssue@reid.com,No,35305.84
106.128.93.87,Often; Alone,Prime Member,Credit Card,william56@davis.org,No,40920.14
53.17.16.119,Knowledge,Prime Member,Wallet,soliswilliam@green-vang.com,No,23781.67
109.19.205.72,Miss; Reflect; Can; Tax; Stop,Annual Plan,Wallet,thomasbrittney@hotmail.com,No,43031.38
71.146.225.237,Future,Monthly Plan,Wallet,joe13@wright-miller.net,Yes,23321.46
21.31.166.98,Opportunity; Actually; Together; Soon,Monthly Plan,UPI,sanchezsusan@love.com,Yes,5268.54
108.136.93.118,Into; Serve; Instead; Throughout,Monthly Plan,Debit Card,jennifermoore@brooks.com,No,42462.77
220.60.47.80,Conference; Feeling; Last; Begin,Non-Member,Debit Card,alexander74@gmail.com,Yes,29034.87
176.84.115.17,Several; Focus,Monthly Plan,Credit Card,lindsay15@gmail.com,Yes,47548.81
39.151.159.215,Able; Enjoy; Court; Evidence; System,Prime Member,Credit Card,gregoryrodgers@roach.com,Yes,28603.16
179.148.117.141,Article,Prime Member,Wallet,xevans@evans-bernard.com,Yes,11972.92
56.190.248.26,Before; Budget,Annual Plan,Credit Card,zimmermanmark@hotmail.com,No,25587.69
10.56.81.199,Into; Time; Oil,Annual Plan,UPI,ronaldgibson@hobbs.com,Yes,42815.87
199.150.24.125,Plan; Unit; Year; Affect; Month,Prime Member,Debit Card,reynoldsnicole@yahoo.com,Yes,41338.66
101.35.216.60,How; Magazine; Stand; She,Annual Plan,Wallet,claytonosborn@yahoo.com,Yes,13705.38
30.252.125.137,Positive; Until; From; Should; Surface,Prime Member,Credit Card,huangalejandro@hotmail.com,Yes,12053.22
121.216.116.30,Hot,Monthly Plan,UPI,davidjacobs@kim.net,No,40311.92
124.198.170.153,Great; First; Behavior; Might,Prime Member,Debit Card,roger07@guerrero.com,Yes,36252.62
118.37.23.210,Down; Gun,Annual Plan,Debit Card,gina67@yahoo.com,No,47386.7
81.137.234.96,Possible,Non-Member,Net Banking,ismith@yahoo.com,Yes,43595.38
11.150.244.131,Law; Here,Annual Plan,Wallet,qgraham@yahoo.com,No,8571.84
14.125.86.127,Leave; Feeling,Annual Plan,Net Banking,james31@baker.com,No,43662.52
183.46.87.119,Decision; Across; Special; Effort; Over,Annual Plan,UPI,michael43@sullivan.org,Yes,1652.87
Save another file with the name other_user_data.csv containing the following data:
IP address,Video streaming history,Subscription details,Payment methods,Email ID,Return history,Purchase amount till this date
18.9.163.26,Large; Five,Annual Plan,Credit Card,george86@hotmail.com,Yes,1674.08
221.165.221.79,Move; Painting,Monthly Plan,Net Banking,laura43@yahoo.com,Yes,18098.02
111.194.239.150,Should; Someone; Job; Skill; Art,Free Trial,Net Banking,joseph83@gmail.com,No,11372.89
55.236.143.56,Ask; Without; Notice; Will; Increase,Annual Plan,Wallet,walkerwilliam@griffin.info,No,48952.71
40.213.23.90,Minute; Develop; Discover; Business; Address,Monthly Plan,Wallet,mconrad@gmail.com,Yes,12848.38
220.31.152.251,Person; On,Free Trial,Wallet,macdonaldbriana@gmail.com,No,6908.27
66.181.34.162,Throw; Everybody; Edge,Free Trial,Credit Card,cynthiajackson@yahoo.com,Yes,33165.87
33.245.226.169,Half; Again; Arm,Free Trial,Credit Card,charlesshelley@gmail.com,Yes,26850.97
73.22.92.105,Different; Choice; Foot; Necessary,Non-Member,UPI,tylerarmstrong@yahoo.com,Yes,40660.61
141.223.118.141,Boy; Already; Child; Laugh; Different,Annual Plan,Wallet,nmacdonald@yahoo.com,No,22967.07
133.65.96.220,Standard; Kid; Yet,Free Trial,UPI,cabrerakimberly@johnson-hodge.org,Yes,14482.93
3.131.249.169,Anyone; Modern; Design; Sell; Mission,Annual Plan,Debit Card,michael14@rivera.net,No,14943.28
78.170.205.208,Share; Member; View; Family; Usually,Monthly Plan,Debit Card,crawfordnathan@hotmail.com,No,17755.28
152.126.183.244,More; Least,Monthly Plan,Net Banking,rosslatoya@hotmail.com,No,43124.02
177.115.49.255,Movie; Station,Non-Member,UPI,ucook@weaver-travis.info,No,27948.94
140.177.14.194,Factor; Defense; Coach; People,Free Trial,Net Banking,lindsey96@gmail.com,No,21471.99
182.213.192.93,Night,Monthly Plan,Debit Card,pphillips@gmail.com,No,35868.33
101.200.116.231,Provide,Free Trial,Wallet,bpatterson@reyes.com,Yes,6876.03
138.96.57.81,Heavy,Monthly Plan,Wallet,walkeredwin@yahoo.com,No,18834.24
68.107.215.152,Poor; Source; Design; Rather,Free Trial,Wallet,charles44@austin-johnson.com,Yes,34290.03
169.138.103.32,Political,Annual Plan,Net Banking,zachary62@herring.biz,Yes,13364.47
175.143.228.29,Plant,Free Trial,Credit Card,deanna21@edwards.com,Yes,21003.82
141.166.141.220,Crime,Free Trial,Net Banking,jessicajohnson@yahoo.com,No,33059.17
148.82.16.138,Benefit; Into; Discuss; Represent,Free Trial,Credit Card,shannonphillips@howell.com,Yes,40392.81
196.170.232.44,Pretty,Annual Plan,Wallet,staceyfreeman@cabrera.org,No,5569.28
218.32.104.4,Network; Pretty; Unit; Culture; Foot,Monthly Plan,Debit Card,hhorton@gmail.com,Yes,30174.75
112.37.19.101,Offer; Player; Accept; Grow; Clear,Non-Member,Net Banking,reevescaroline@yahoo.com,No,12240.03
176.72.6.248,Democratic; Tv; Analysis,Non-Member,Wallet,virginia75@griffith.com,No,9182.99
70.32.180.102,He; Article; Low; Light; Difference,Monthly Plan,UPI,jrowe@gates-buchanan.info,No,10707.69
28.224.146.240,Change; Along; Method; Thus,Annual Plan,Net Banking,jennifermitchell@yahoo.com,Yes,19257.34
105.177.12.4,Something; Recognize; Too,Monthly Plan,Debit Card,graygreg@lawrence.com,Yes,27640.53
111.186.84.137,Sport; Such,Free Trial,Wallet,cummingswilliam@harris.com,Yes,40863.27
213.118.229.231,Reflect; Itself; Again; Radio; Than,Monthly Plan,Debit Card,michael59@rogers.biz,Yes,27174.34
22.185.171.76,Himself; Else; Including; Record; Major,Monthly Plan,Credit Card,owenscarolyn@hotmail.com,No,8571.43
29.150.187.187,Example; Draw; Message,Free Trial,UPI,lthomas@perez.com,No,17541.53
218.172.34.132,Teacher; Drive; Candidate; Beat,Free Trial,Credit Card,mannsamuel@harris.com,Yes,22669.28
5.137.68.53,Ability; House,Monthly Plan,Wallet,chadgarcia@yahoo.com,No,4982.53
140.56.9.53,Up,Non-Member,UPI,pjackson@gmail.com,Yes,26744.73
28.184.129.161,Month; Race; This; Many,Non-Member,Credit Card,howelljonathan@suarez-williams.biz,No,43690.33
174.221.2.166,Dark; Protect,Monthly Plan,Wallet,angelaandrews@elliott-rodriguez.com,No,3766.25
38.132.49.124,Western; Low; College,Free Trial,Debit Card,geraldrodriguez@hotmail.com,No,15378.1
35.233.113.185,Yeah; Could; Environmental; Power,Annual Plan,UPI,cynthia41@hotmail.com,Yes,26035.79
89.239.149.109,Scientist; Standard; Rule; Amount,Free Trial,Debit Card,sherigrant@davis.info,No,31944.84
23.64.234.180,Factor; Consider; Assume,Free Trial,Net Banking,kdixon@miranda.org,No,23435.71
4.163.167.221,Sometimes; Else; Table; Music,Annual Plan,Net Banking,powellkeith@wilson.biz,No,34815.31
198.98.205.122,Tonight,Annual Plan,Debit Card,jennifer42@hotmail.com,Yes,31493.27
86.253.237.81,Down,Non-Member,Credit Card,ncarlson@santiago.com,No,41932.71
159.117.159.234,Read; Prevent,Annual Plan,Wallet,webbstephen@gmail.com,No,31304.67
143.67.90.90,On; Discuss; Visit; Avoid,Annual Plan,Wallet,vanessagreen@chen.com,Yes,2032.34
183.131.115.45,Many,Free Trial,Wallet,philip65@kent.biz,Yes,15690.88
Step 1: Use the following code to upload the files using google.colab.files:
from google.colab import files
uploaded = files.upload()
This will open a file upload dialog. Upload your user_data.csv and other_user_data.csv files (the dialog lets you select both at once).
If you uploaded only one file, run the same code again to upload the second:
uploaded = files.upload()  # Upload the second file
Step 2: Load the uploaded files into pandas DataFrames using this code:
import pandas as pd
# Load the first file
user_data = pd.read_csv("user_data.csv")
# Load the second file
other_user_data = pd.read_csv("other_user_data.csv")
Step 3: Verify that the files are loaded correctly by displaying the first few rows of each DataFrame:
# Display the first few rows of the first DataFrame
print("User Data:")
print(user_data.head())
# Display the first few rows of the second DataFrame
print("\nOther User Data:")
print(other_user_data.head())
Step 4: Combine both DataFrames into one.
# Combine the two DataFrames; ignore_index=True rebuilds a clean 0..n-1 index
combined_data = pd.concat([user_data, other_user_data], ignore_index=True)
Identify the top 5 purchasing persons by sorting the combined data based on the "Purchase amount till this date" column.
# Sort combined data by "Purchase amount till this date" in descending order
top_5_purchasers = combined_data.sort_values(by="Purchase amount till this date", ascending=False).head(5)
# Display the top 5 purchasers
print(top_5_purchasers)
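As an aside, pandas' nlargest does the sort-and-take-top in one call, which avoids sorting the whole frame. A sketch on a tiny made-up version of combined_data:

```python
import pandas as pd

# Hypothetical mini version of combined_data.
combined_data = pd.DataFrame({
    "Email ID": ["a@x.com", "b@x.com", "c@x.com", "d@x.com"],
    "Purchase amount till this date": [100.0, 400.0, 250.0, 300.0],
})

# Top 3 rows by purchase amount, highest first.
top = combined_data.nlargest(3, "Purchase amount till this date")
print(top)
```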
Step 5
Analyze the distribution of "Purchase amount till this date" across all users using a histogram.
import matplotlib.pyplot as plt
# Plot histogram of purchase amounts
plt.figure(figsize=(10, 6))
plt.hist(combined_data["Purchase amount till this date"], bins=20, edgecolor='k')
plt.title("Distribution of Purchase Amounts", fontsize=16)
plt.xlabel("Purchase Amount", fontsize=14)
plt.ylabel("Frequency", fontsize=14)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
Step 6
Analyze the trend of average purchase amount for different subscription types.
# Group by subscription details and calculate average purchase amount
subscription_trend = combined_data.groupby("Subscription details")["Purchase amount till this date"].mean()
# Plot the trend
plt.figure(figsize=(10, 6))
subscription_trend.plot(kind='bar', color='skyblue', edgecolor='k')
plt.title("Average Purchase Amount by Subscription Type", fontsize=16)
plt.xlabel("Subscription Type", fontsize=14)
plt.ylabel("Average Purchase Amount", fontsize=14)
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
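An average can mislead when subscription groups have very different sizes, so it helps to report the group count alongside the mean. A sketch on a tiny made-up combined_data:

```python
import pandas as pd

# Hypothetical mini version of combined_data.
combined_data = pd.DataFrame({
    "Subscription details": ["Annual Plan", "Annual Plan", "Monthly Plan", "Non-Member"],
    "Purchase amount till this date": [100.0, 300.0, 50.0, 80.0],
})

# agg() computes both statistics in one pass per group.
summary = combined_data.groupby("Subscription details")["Purchase amount till this date"].agg(["mean", "count"])
print(summary)
```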