# Legal Text Analysis Project Documentation - Part 2

*Continued from Previous Setup Documentation*
## Database Creation Verification

We've successfully created our SQLite database with all required tables and structures. Let's review what we've accomplished and how to verify each component.
### Current Project Status

1. **Database Location and Access**
Our database is located at:
```bash
D:/Projects/LegalTextAnalysis/data/legal_text.db
```
We confirmed its existence and read/write permissions with:
```bash
sqlite3 D:/Projects/LegalTextAnalysis/data/legal_text.db ".databases"
# Output: main: D:\Projects\LegalTextAnalysis\data\legal_text.db r/w

sqlite3 D:/Projects/LegalTextAnalysis/data/legal_text.db ".tables"
# Output: legal_cases  text_metrics  tf_idf_scores
```
Each table serves a specific purpose in our analysis pipeline:
- `legal_cases`: Stores raw case data and basic metrics
- `text_metrics`: Holds advanced text analysis measurements
- `tf_idf_scores`: Contains term frequency-inverse document frequency analysis

To interact with the database and dataset, we have several options:
```bash
# Open SQLite console
sqlite3 D:/Projects/LegalTextAnalysis/data/legal_text.db

# View table structure
.schema legal_cases
.schema text_metrics
.schema tf_idf_scores

# Exit SQLite console
.quit
```
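Beyond the console, the same structure can be reproduced programmatically. The sketch below creates the three tables in an in-memory database; the column definitions are illustrative assumptions (only `length_category`, `term`, and `score` appear elsewhere in this documentation), so substitute the output of `.schema` for the real definitions.

```python
import sqlite3

# In-memory database so the sketch runs anywhere without touching data/.
conn = sqlite3.connect(":memory:")

# Assumed columns for illustration; check `.schema` for the real layout.
conn.executescript("""
CREATE TABLE legal_cases (
    case_id   INTEGER PRIMARY KEY,
    case_text TEXT,
    word_count INTEGER
);
CREATE TABLE text_metrics (
    case_id INTEGER REFERENCES legal_cases(case_id),
    length_category TEXT,
    avg_sentence_length REAL
);
CREATE TABLE tf_idf_scores (
    case_id INTEGER REFERENCES legal_cases(case_id),
    term  TEXT,
    score REAL
);
""")

# List the created tables, mirroring the `.tables` dot-command.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['legal_cases', 'text_metrics', 'tf_idf_scores']
conn.close()
```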
Create `database_utils.py` in your scripts directory:

```python
import sqlite3
import pandas as pd

def connect_db():
    """Establish a connection to our SQLite database."""
    return sqlite3.connect('D:/Projects/LegalTextAnalysis/data/legal_text.db')

def read_csv_to_df(csv_path):
    """Read our legal text CSV file into a pandas DataFrame."""
    return pd.read_csv(csv_path)

def insert_legal_cases(df, conn):
    """Insert data from a DataFrame into the legal_cases table."""
    df.to_sql('legal_cases', conn, if_exists='append', index=False)
```
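The helpers above can be exercised end to end. This sketch uses an in-memory database and made-up sample rows so it runs anywhere; in the project, `connect_db()` opens the file under `data/` and the DataFrame comes from `read_csv_to_df()`.

```python
import sqlite3
import pandas as pd

# Hypothetical sample rows standing in for the real CSV data.
df = pd.DataFrame({
    "case_id": [1, 2],
    "case_text": ["First sample opinion.", "Second sample opinion."],
})

# In-memory stand-in for connect_db() so the example is self-contained.
conn = sqlite3.connect(":memory:")

# Same call insert_legal_cases() makes: append rows, creating the
# table on first use if it does not exist yet.
df.to_sql("legal_cases", conn, if_exists="append", index=False)

count = conn.execute("SELECT COUNT(*) FROM legal_cases").fetchone()[0]
print(count)  # 2
conn.close()
```

Because `if_exists='append'` is used, re-running an ingest script adds duplicate rows rather than replacing them, so it's worth checking the row count before and after each load.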
Now that our database structure is set up, we can proceed with our text analysis pipeline.
Here are some helpful SQLite commands for exploring our data:
```sql
-- Count total cases
SELECT COUNT(*) FROM legal_cases;

-- View data distribution
SELECT length_category, COUNT(*)
FROM text_metrics
GROUP BY length_category;

-- Find top TF-IDF terms
SELECT term, AVG(score) AS avg_score
FROM tf_idf_scores
GROUP BY term
ORDER BY avg_score DESC
LIMIT 10;
```
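These queries can also be run from Python and land directly in a DataFrame via `pandas.read_sql_query`. The sketch below demonstrates the top-terms query against an in-memory table with invented scores; in the project you would pass the connection from `connect_db()` instead.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

# Hypothetical scores; the real table is filled by the analysis pipeline.
pd.DataFrame({
    "term":  ["court", "plaintiff", "court", "appeal"],
    "score": [0.9, 0.5, 0.7, 0.3],
}).to_sql("tf_idf_scores", conn, index=False)

# Same top-terms query as above, returned as a DataFrame.
top_terms = pd.read_sql_query(
    """
    SELECT term, AVG(score) AS avg_score
    FROM tf_idf_scores
    GROUP BY term
    ORDER BY avg_score DESC
    LIMIT 10
    """,
    conn,
)
print(top_terms)  # "court" ranks first with avg_score 0.8
conn.close()
```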