Dictionary: response: fields[41]{name,type,label,description,min,max,cardinality,enumeration,null_count,examples}: Unique Key,Integer,Unique Key,"A system‑generated integer that uniquely identifies each complaint record. The dataset contains one million distinct values (uniqueness ratio 1), so every row can be referenced by this key.","11465364","48478173",1000000,null,0, Created Date,DateTime,Created Date,"Timestamp indicating when the complaint was entered into the system. Dates range from 2010‑01‑01 to 2020‑12‑23, with a mean of approximately 2015‑11‑10. The field is never null. Values are highly distinct: about 99.7 % of records fall into the \"Other\" bucket, and even the most frequent timestamp (01/24/2013 12:00:00 AM) occurs only 347 times. The frequent midnight (12:00:00 AM) values suggest that some records carry a date without a time of day.","2010-01-01T00:00:00+00:00","2020-12-23T01:25:51+00:00",841014,null,0,"Other (841,004) [997333]\n01/24/2013 12:00:00 AM [347]\n01/07/2014 12:00:00 AM [315]\n01/08/2015 12:00:00 AM [283]\n02/16/2015 12:00:00 AM [269]" Closed Date,DateTime,Closed Date,"Timestamp indicating when the complaint was formally closed. About 97 % of records have a non‑null value (28,619 are null). Because closure timestamps are highly distinct, about 97 % of records fall into the \"Other\" bucket of values outside the ten most frequent. The minimum (1900‑01‑01) and maximum (2100‑01‑01) are implausible and are likely placeholder values rather than real closure times.","1900-01-01T00:00:00+00:00","2100-01-01T00:00:00+00:00",688837,null,28619,"Other (688,827) [968897]\n(NULL) [28619]\n11/15/2010 12:00:00 AM [384]\n11/07/2012 12:00:00 AM [329]\n12/09/2010 12:00:00 AM [267]" Agency,String,Agency,"Short code (usually an acronym) identifying the agency that processed the complaint (e.g., NYPD, HPD, DSNY). The most common codes are NYPD (26 %), HPD (26 %), and DOT (13 %). An \"Other\" category accounts for 3 % of entries.","3-1-1",TLC,28,null,0,"NYPD [265116]\nHPD [258033]\nDOT [132462]\nDSNY [81606]\nDEP [75895]" Agency Name,String,Agency Name,"Full name of the agency corresponding to the agency code. 
The most frequent names correspond to the most common codes (NYPD, HPD, DOT) with nearly identical counts, but the field has 553 distinct values against 28 agency codes, so names do not map one-to-one to codes; an \"Other\" category of 543 less common names accounts for about 10 % of records.","3-1-1",Valuation Policy,553,null,0,"New York City Police Depa… [265038]\nDepartment of Housing Pre… [258019]\nDepartment of Transportat… [132462]\nOther (543) [103974]\nDepartment of Environment… [75895]" Complaint Type,String,Complaint Type,"High‑level category describing the nature of the complaint (e.g., Noise - Residential, HEAT/HOT WATER). The ten most frequent categories cover roughly 44 % of all complaints; the remaining 277 categories, grouped in the \"Other\" bucket, cover about 56 %. The minimum (../../WEB-INF/web.xml;x=) and maximum (ZTESTINT) look like test or injected strings rather than real categories.","../../WEB-INF/web.xml;x=",ZTESTINT,287,null,0,"Other (277) [563561]\nNoise - Residential [89439]\nHEAT/HOT WATER [56639]\nIllegal Parking [45032]\nBlocked Driveway [42356]" Descriptor,String,Descriptor,"A more specific description within a complaint type (e.g., Loud Music/Party, HEAT). The ten most frequent descriptors cover about 32 % of records; about 67 % fall into the \"Other\" bucket, and 3,001 records are null.",1 Missed Collection,unknown odor/taste in drinking water (QA6),1392,null,3001,"Other (1,382) [674871]\nLoud Music/Party [93646]\nENTIRE BUILDING [36885]\nHEAT [35088]\nNo Access [31631]" Location Type,String,Location Type,"Classification of the physical location where the complaint was reported (e.g., Residential Building, Street/Sidewalk). RESIDENTIAL BUILDING is the most common value (about 26 %), and about 24 % of records are null. Labels overlap and are inconsistently cased (e.g., RESIDENTIAL BUILDING alongside a mixed-case Residential Building variant, and Street alongside Street/Sidewalk), so related categories may need to be merged before analysis.","1-, 2- and 3- Family Home",Wooded Area,162,null,239131,"RESIDENTIAL BUILDING [255562]\n(NULL) [239131]\nStreet/Sidewalk [145653]\nResidential Building/Hous… [92765]\nStreet [92190]" Incident Zip,String,Incident Zip,"ZIP code where the incident occurred. About 95 % of records have a value (54,978 are null), but values are spread widely: about 83 % of records fall into the \"Other\" bucket outside the ten most common ZIPs. 
The top values (11226, 10467, 11207) are in Brooklyn and the Bronx, and the minimum (*) and maximum (XXXXX) show that some entries are not valid ZIP codes.",*,XXXXX,535,null,54978,"Other (525) [827654]\n(NULL) [54978]\n11226 [17114]\n10467 [14495]\n11207 [12872]" Incident Address,String,Incident Address,"Street address string for the incident location. About 83 % of records contain an address (174,700 are null); each of the ten most frequent addresses accounts for well under 1 % of records (the top one, 655 EAST 230 STREET, appears 1,538 times), and the \"Other\" bucket covers about 82 %.",* *,west 155 street and edgecombe avenue,341996,null,174700,"Other (341,986) [819378]\n(NULL) [174700]\n655 EAST 230 STREET [1538]\n78-15 PARSONS BOULEVARD [694]\n672 EAST 231 STREET [663]" Street Name,String,Street Name,"Name of the street at the incident site. The field is populated in about 83 % of records; common streets include BROADWAY, GRAND CONCOURSE, and OCEAN AVENUE, but about 79 % of records fall into the \"Other\" bucket.",*,wyckoff avenue,14837,null,174720,"Other (14,827) [787222]\n(NULL) [174720]\nBROADWAY [9702]\nGRAND CONCOURSE [5851]\nOCEAN AVENUE [3946]" Cross Street 1,String,Cross Street 1,"First cross street intersecting the incident location. About 68 % of records have a value (about 32 % are null); the ten most common values cover under 6 % of records, the \"Other\" bucket holds about 62 %, and non-street values such as BEND appear among the top entries.",1 AVE,mermaid,16238,null,320401,"Other (16,228) [623317]\n(NULL) [320401]\nBEND [12562]\nBROADWAY [8548]\n3 AVENUE [6154]" Cross Street 2,String,Cross Street 2,"Second cross street intersecting the incident location. Similar to Cross Street 1: about 68 % populated (about 32 % null), the ten most common values cover about 5 % of records, and the \"Other\" bucket holds about 63 %.",1 AVE,surf,16486,null,323644,"Other (16,476) [626168]\n(NULL) [323644]\nBEND [12390]\nBROADWAY [8833]\nDEAD END [5626]" Intersection Street 1,String,Intersection Street 1,"First street in an intersection at the incident site. 
The field is null for about 77 % of records (present for about 23 %); the ten most common values (e.g., BROADWAY) cover under 2 % of records, and the \"Other\" bucket holds about 22 %.",1 AVE,flatlands AVE,11237,null,767422,"(NULL) [767422]\nOther (11,227) [215482]\nBROADWAY [3761]\nCARPENTER AVENUE [2918]\nBEND [2009]" Intersection Street 2,String,Intersection Street 2,"Second street in an intersection at the incident site. Similar distribution to Intersection Street 1: about 77 % of records are null, and most populated values fall into the \"Other\" category.",1 AVE,glenwood RD,11674,null,767709,"(NULL) [767709]\nOther (11,664) [216748]\nBROADWAY [3462]\nBEND [1942]\n2 AVENUE [1690]" Address Type,String,Address Type,"Type of address used (e.g., ADDRESS, INTERSECTION, BLOCKFACE). The majority (71 %) are simple street addresses; about 13 % are intersections and about 13 % have no value.",ADDRESS,PLACENAME,6,"(NULL)\nADDRESS\nBLOCKFACE\nINTERSECTION\nLATLONG\nPLACENAME",125802,"ADDRESS [710380]\nINTERSECTION [133361]\n(NULL) [125802]\nBLOCKFACE [22620]\nLATLONG [7421]" City,String,City,"Postal city or borough name where the incident occurred. BROOKLYN is most common (30 %), followed by NEW YORK (19 %) and BRONX (18 %); about 6 % are missing, and some values (e.g., YORKTOWN HEIGHTS) lie outside the five boroughs.",*,YORKTOWN HEIGHTS,382,null,61963,"BROOKLYN [296254]\nNEW YORK [189069]\nBRONX [181168]\nOther (372) [171028]\n(NULL) [61963]" Landmark,String,Landmark,"Notable landmark near the incident site. Most records lack a value (91 %); the populated values are mostly street names (e.g., EAST 230 STREET, EAST 231 STREET, BROADWAY) rather than landmarks in the usual sense.",1 AVENUE,ZULETTE AVENUE,5915,null,912779,"(NULL) [912779]\nOther (5,905) [80508]\nEAST 230 STREET [1545]\nEAST 231 STREET [1291]\nBROADWAY [1148]" Facility Type,String,Facility Type,"Type of facility involved in the complaint. 
The dominant value is \"N/A\" (~63 %), followed by \"Precinct\" (19 %) and \"DSNY Garage\" (3 %); School and School District are rare, and about 15 % of entries are null.",DSNY Garage,School District,6,"(NULL)\nDSNY Garage\nN/A\nPrecinct\nSchool\nSchool District",145478,"N/A [628279]\nPrecinct [193259]\n(NULL) [145478]\nDSNY Garage [32310]\nSchool [617]" Status,String,Status,"Current status of the complaint record. Over 95 % are closed, with small percentages pending (2 %), open (1 %), in progress, or assigned. Rare values include Unspecified and Closed - Testing.",Assigned,Unspecified,10,"Assigned\nClosed\nClosed - Testing\nEmail Sent\nIn Progress\nOpen\nPending\nStarted\nUnassigned\nUnspecified",0,"Closed [952522]\nPending [20119]\nOpen [12340]\nIn Progress [7841]\nAssigned [6651]" Due Date,DateTime,Due Date,"Scheduled date for resolution of the complaint. Approximately two thirds of records (about 65 %) lack a due date. Populated values are highly distinct (the most frequent timestamp occurs only 214 times), and the minimum (1900‑01‑02) is likely a placeholder rather than a real due date.","1900-01-02T00:00:00+00:00","2021-06-17T16:34:13+00:00",345077,null,647794,"(NULL) [647794]\nOther (345,067) [350849]\n04/08/2015 10:00:58 AM [214]\n05/02/2014 03:32:17 PM [183]\n03/30/2018 10:10:39 AM [172]" Resolution Description,String,Resolution Description,"Narrative summary of actions taken to resolve the complaint (e.g., police response, inspection). The most frequent descriptions are standard police or housing department responses; the \"Other\" bucket (about 53 % of records) contains the many less common statements.",A DOB violation was issued for failing to comply with an existing Stop Work Order.,"Your request was submitted to the Department of Homeless Services. The City?s outreach team will assess the homeless individual and offer appropriate assistance within 2 hours. If you asked to know the outcome of your request, you will get a call within 2 hours. 
No further status will be available through the NYC 311 App, 311, or 311 Online.",1216,null,20480,"Other (1,206) [532002]\nThe Police Department res… [91408]\nThe Department of Housing… [72962]\nThe Police Department res… [63868]\nService Request status fo… [52155]" Resolution Action Updated Date,DateTime,Resolution Action Updated Date,"Timestamp of the last update to the resolution. Most records have a value (≈98 %); dates range from 2009‑12‑31 to 2020‑12‑23, with a mean around 2015‑11‑19.","2009-12-31T01:35:00+00:00","2020-12-23T06:56:14+00:00",690314,null,15072,"Other (690,304) [982378]\n(NULL) [15072]\n11/15/2010 12:00:00 AM [385]\n11/07/2012 12:00:00 AM [336]\n12/09/2010 12:00:00 AM [273]" Community Board,String,Community Board,"Community board designation, combining a board number with a borough name (e.g., 12 MANHATTAN, 01 BROOKLYN). Boards outside the ten most frequent are grouped into an \"Other\" bucket covering about 75 % of records, and about 5 % are \"0 Unspecified\".",0 Unspecified,Unspecified STATEN ISLAND,77,null,0,"Other (67) [751635]\n0 Unspecified [49878]\n12 MANHATTAN [29845]\n12 QUEENS [23570]\n01 BROOKLYN [21714]" BBL,String,BBL,"Borough‑Block‑Lot identifier for NYC real‑estate parcels: a 10-digit code made of a 1-digit borough code, a 5-digit block, and a 4-digit lot. About 76 % of records have a BBL (243,046 are null), so roughly a quarter of complaints are not tied to a parcel; because parcels are highly distinct, about 75 % of records fall into the \"Other\" bucket. The reported minimum and maximum are identical (0140694020) and start with 0, which is not a valid borough code, so these two statistics appear unreliable.","0140694020","0140694020",268383,null,243046,"Other (268,373) [751031]\n(NULL) [243046]\n2048330028 [1566]\n4068290001 [696]\n4015110001 [664]" X Coordinate (State Plane),Integer,X Coordinate (State Plane),"Easting coordinate in the New York State Plane system. 
About 91 % of records fall into the \"Other\" bucket; the ten most frequent coordinates together cover under 1 % of records, and about 9 % are null.","913281","1067220",102556,null,85327,"Other (102,546) [908877]\n(NULL) [85327]\n1022911 [1568]\n1037000 [701]\n1023174 [675]" Y Coordinate (State Plane),Integer,Y Coordinate (State Plane),"Northing coordinate in the New York State Plane system. Similar to the X coordinate: about 91 % in the \"Other\" bucket, under 1 % with one of the top ten values, and about 9 % null.","121152","271876",116092,null,85327,"Other (116,082) [908868]\n(NULL) [85327]\n264242 [1566]\n202363 [706]\n211606 [665]" Open Data Channel Type,String,Open Data Channel Type,"Medium through which the complaint was submitted (PHONE, ONLINE, MOBILE, OTHER, UNKNOWN). Phone is most common (~50 %), followed by unknown (~23 %), online (~18 %), mobile (~8 %), and other (~1.5 %).",MOBILE,UNKNOWN,5,"MOBILE\nONLINE\nOTHER\nPHONE\nUNKNOWN",0,"PHONE [497606]\nUNKNOWN [230402]\nONLINE [177334]\nMOBILE [79892]\nOTHER [14766]" Park Facility Name,String,Park Facility Name,Name of a park or recreation facility involved in the complaint. The vast majority (≈99 %) are unspecified; a few records reference specific parks such as Central Park.,"\"Uncle\" Vito F. Maranzano Glendale Playground",Zimmerman Playground,1889,null,0,"Unspecified [993141]\nOther (1,879) [5964]\nCentral Park [261]\nRiverside Park [136]\nProspect Park [129]" Park Borough,String,Park Borough,"Borough associated with the park. Its value counts are identical to those of Borough, so the field appears to repeat the incident borough rather than identify a park location: Brooklyn (~30 %), Queens (~23 %), Manhattan (~20 %).",BRONX,Unspecified,6,"BRONX\nBROOKLYN\nMANHATTAN\nQUEENS\nSTATEN ISLAND\nUnspecified",0,"BROOKLYN [296081]\nQUEENS [228818]\nMANHATTAN [195488]\nBRONX [180142]\nUnspecified [49878]" Vehicle Type,String,Vehicle Type,"Type of vehicle involved in a complaint (e.g., Car Service, Ambulette). 
Nearly all records (over 99.9 %) are null; the few populated values are dominated by \"Car Service\" (317 records).",Ambulette / Paratransit,Green Taxi,5,"(NULL)\nAmbulette / Paratransit\nCar Service\nCommuter Van\nGreen Taxi",999652,"(NULL) [999652]\nCar Service [317]\nAmbulette / Paratransit [19]\nCommuter Van [11]\nGreen Taxi [1]" Taxi Company Borough,String,Taxi Company Borough,"Borough where the taxi company operates. Almost all entries are null (over 99.9 %); the few populated values name boroughs such as Brooklyn or Queens, and the 11 distinct values mix upper and title case (e.g., BRONX, Staten Island).",BRONX,Staten Island,11,null,999156,"(NULL) [999156]\nBROOKLYN [207]\nQUEENS [194]\nMANHATTAN [171]\nBRONX [127]" Taxi Pick Up Location,String,Taxi Pick Up Location,"Location where the passenger was picked up by the taxi (e.g., JFK Airport, Intersection). Over 99 % of records lack this information; among populated records, the literal value \"Other\" is the most common.",1 5 AVENUE MANHATTAN,YORK AVENUE AND EAST 70 STREET,1903,null,992129,"(NULL) [992129]\nOther [4091]\nOther (1,893) [2021]\nJFK Airport [562]\nIntersection [486]" Bridge Highway Name,String,Bridge Highway Name,"Name of the bridge or highway involved in a traffic complaint. Almost all entries are null (over 99.7 %); the most common names are Belt Pkwy, BQE/Gowanus Expwy, and Grand Central Pkwy.","145th St. Br - Lenox Ave","Willis Ave Br - 125th St/1st Ave",68,null,997711,"(NULL) [997711]\nOther (58) [851]\nBelt Pkwy [276]\nBQE/Gowanus Expwy [254]\nGrand Central Pkwy [186]" Bridge Highway Direction,String,Bridge Highway Direction,"Direction of travel on the bridge or highway. Similar sparsity to Bridge Highway Name; top directions include East/Long Island Bound and North/Bronx Bound.",Bronx Bound,Westbound/To Goethals Br,50,null,997691,"(NULL) [997691]\nOther (40) [1064]\nEast/Long Island Bound [210]\nNorth/Bronx Bound [208]\nEast/Queens Bound [197]" Road Ramp,String,Road Ramp,"Indicates whether a traffic complaint concerns the main roadway or a ramp (e.g., Roadway, Ramp). 
The field is almost always null; the few values are predominantly \"Roadway\" or \"Ramp.\"",N/A,Roadway,4,"(NULL)\nN/A\nRamp\nRoadway",997693,"(NULL) [997693]\nRoadway [1731]\nRamp [555]\nN/A [21]" Bridge Highway Segment,String,Bridge Highway Segment,"Segment identifier for a bridge or highway. As with other traffic fields, most records lack this value; the small number of entries includes segment IDs and descriptive exit names.","1-1-1265963747",Wythe Ave/Kent Ave (Exit 31),937,null,997556,"(NULL) [997556]\nOther (927) [2159]\nRamp [92]\nRoadway [54]\nClove Rd/Richmond Rd (Exi… [23]" Latitude,Float,Latitude,"Decimal latitude coordinate (WGS 84). About 75 % of records have a latitude (254,695 are null); populated values are highly distinct, so about 74 % of records fall into the \"Other\" bucket. The minimum (40.1123853, 1,153 records) pairs with longitude -77.5195844, a point well outside New York City that may be a geocoding placeholder.","40.1123853","40.9128688",353694,null,254695,"Other (353,684) [739574]\n(NULL) [254695]\n40.89187241649303 [1538]\n40.1123853 [1153]\n40.89238451539139 [663]" Longitude,Float,Longitude,"Decimal longitude coordinate (WGS 84). Same sparsity as Latitude: about 75 % populated, with about 74 % of records in the \"Other\" bucket. The minimum (-77.5195844) belongs to the same out-of-city point noted for Latitude.","-77.5195844","-73.7005968",353996,null,254695,"Other (353,986) [739574]\n(NULL) [254695]\n-73.86016845296459 [1538]\n-77.5195844 [1153]\n-73.8592161325675 [663]" Location,String,Location,"String representation of the geographic point in parentheses (e.g., \"(40.1123853, -77.5195844)\"). 
The field is populated for about 75 % of records (254,695 are null), and about 74 % of records fall into the \"Other\" bucket.","(40.1123853, -77.5195844)","(40.9128688, -73.9024731)",375772,null,254695,"Other (375,762) [739574]\n(NULL) [254695]\n(40.89187241649303, -73.8… [1538]\n(40.1123853, -77.5195844) [1153]\n(40.89238451539139, -73.8… [663]" enum_threshold: 10 num_examples: 5 truncate_str: 25 attribution: "Generated by qsvlite v11.0.2 describegpt\nCommand line: target/debug/qsvlite describegpt /tmp/NYC_311_SR_2010-2020-sample-1M.csv --all --fresh --format toon -o nyc311-describegpt.toon\nPrompt file: Default v7.0.0\nModel: openai/gpt-oss-20b\nLLM API URL: http://localhost:1234/v1\nLanguage: \nTimestamp: 2025-12-19T02:14:17.966385+00:00\n\nWARNING: Label and Description generated by an LLM and may contain inaccuracies. Verify before using!\n" reasoning: "We need to generate labels and descriptions for each field. Use the summary stats and frequency distribution to guide context.\n\nWe must output JSON where keys are field names as given: \"Unique Key\", etc.\n\nNeed human-friendly label (likely same as field name). Description should explain meaning, possible values, data characteristics like cardinality, typical values, nulls, etc.\n\nWe'll produce for all 41 fields. Use frequency distribution to note top values and others. Also summary stats provide min max, range, mean, null counts, uniqueness ratio, cardinality, etc.\n\nWe need concise but full description. Let's craft.\n\nAlso consider that some fields have \"\" meaning each record unique; we can mention uniqueness.\n\nWe must use JSON with double quotes. Ensure no trailing commas.\n\nLet's produce.\n\nWe'll create for each field:\n\n- Unique Key: label \"Unique Key\". Description: unique identifier, integer, 1M records, all unique.\n\n- Created Date: label \"Created Date\". description: timestamp of complaint creation; range 2010-01-01 to 2020-12-23; many nulls? summary shows nullcount 0. 
Provide distribution: top dates etc.\n\n- Closed Date: similar, with many nulls (most are null). Use frequency distribution: other category majority 96% unknown.\n\n- Agency: agency that handled complaint. Top agencies like NYPD, HPD, DOT. Many other.\n\n- Agency Name: full name of agency. Provide mapping to codes.\n\n- Complaint Type: category of complaint e.g., Noise - Residential; top types etc.\n\n- Descriptor: more detailed description within type. Provide example values.\n\n- Location Type: classification of location (Residential Building, Street/Sidewalk, etc). Many nulls.\n\n- Incident Zip: zip code of incident; many nulls and distribution.\n\n- Incident Address: street address string.\n\n- Street Name: name of street involved.\n\n- Cross Street 1 & 2: cross streets.\n\n- Intersection Street 1 & 2: intersection streets.\n\n- Address Type: type of address (ADDRESS, INTERSECTION, etc).\n\n- City: city borough.\n\n- Landmark: notable landmark near incident.\n\n- Facility Type: type of facility; many N/A.\n\n- Status: status of complaint (Closed, Pending, etc). Most closed.\n\n- Due Date: due date for resolution. 
Many nulls.\n\n- Resolution Description: description of resolution action.\n\n- Resolution Action Updated Date: timestamp when resolution was last updated.\n\n- Community Board: community board number; many unspecified.\n\n- BBL: Borough/Block/Lot identifier.\n\n- Borough: borough string.\n\n- X Coordinate (State Plane): numeric coordinate.\n\n- Y Coordinate (State Plane).\n\n- Open Data Channel Type: channel used to submit complaint (PHONE, UNKNOWN, ONLINE, etc).\n\n- Park Facility Name: name of park facility; mostly unspecified.\n\n- Park Borough: borough of park.\n\n- Vehicle Type: type of vehicle involved in incident.\n\n- Taxi Company Borough: borough where taxi company is based.\n\n- Taxi Pick Up Location: pickup location for taxis.\n\n- Bridge Highway Name: name of bridge or highway.\n\n- Bridge Highway Direction: direction of travel.\n\n- Road Ramp: type of ramp.\n\n- Bridge Highway Segment: segment identifier.\n\n- Latitude: decimal latitude; many nulls.\n\n- Longitude: decimal longitude; many nulls.\n\n- Location: string coordinate representation.\n\nWe need to include uniqueness ratio, cardinality etc? Description may mention cardinality and uniqueness ratio for key fields. Use summary stats: e.g., Unique Key has uniqueness_ratio 1, cardinality 1000000. For others like Latitude, uniqueness_ratio ~0.0014 maybe.\n\nBut we can keep description concise but incorporate some of these insights.\n\nLet's produce JSON accordingly. Ensure field names exactly as given (with spaces). Use backslashes? No need; just keys in JSON string with quotes.\n\nWill generate.\n\n" token_usage: prompt: 13313 completion: 3416 total: 16729 elapsed: 52109 Description: response: "**Dataset Description**\n\nThis NYC 311 complaints dataset contains 1 000 000 distinct incident records spanning a decade from early 2010 to late 2020. 
Each record is uniquely identified by an integer key and captures when the complaint was created, when it was closed (if applicable), the responsible agency, the nature of the problem, location details (street, ZIP code, borough, and optional geographic coordinates), the status of the case, and any resolution notes. Many fields have long tails of low‑frequency values: values outside each field's ten most frequent are grouped into an “Other” bucket, which covers about 56 % of records for complaint type and more than 67 % for descriptor. This means most complaints use a type or descriptor outside the top ten, not that they were filed under a generic “Other” category. Created dates have a mean in early November 2015 and a skewness close to zero (about -0.09), so the distribution is roughly symmetric around that mean. Geographically, Brooklyn and Queens together account for just over half of incidents, and a large proportion of records lack a precise address or coordinates. The key field contains no duplicates, but there is substantial missingness in due dates, coordinates, and optional descriptive fields.\n\n---\n\n### Notable Characteristics\n\n- **High cardinality & uniqueness** – the `Unique Key` field has 100 % uniqueness; other fields such as `Incident Zip`, `Incident Address`, and `Street Name` have many distinct values, and most of their records fall into the “Other” bucket (e.g., 83 % for ZIP codes, 82 % for addresses, 79 % for street names). \n- **Missingness** – about 2.9 % of records lack a `Closed Date`; about 25 % have no latitude/longitude; roughly 8–9 % have null X/Y State Plane coordinates. 
\n- **Skewed distributions** – some categorical fields are dominated by a few values (e.g., NYPD and HPD handle just over half of all complaints in `Agency`, and about 95 % of `Status` values are Closed), while others such as `Complaint Type` and `Descriptor` have long tails largely grouped under “Other.” \n- **Temporal spread** – created dates run from 2010‑01‑01 to 2020‑12‑23 with a mean in November 2015; no single timestamp accounts for more than a few hundred complaints. \n- **Potential outliers** – `Closed Date` (1900‑01‑01 to 2100‑01‑01) and `Due Date` (minimum 1900‑01‑02) contain implausible values that are likely placeholders, and 1,153 records carry the coordinate (40.1123853, -77.5195844), which lies outside New York City; consider filtering these before computing durations or mapping incidents. \n- **PII/Privacy** – addresses, ZIP codes, boroughs, and coordinates can be used to approximate location and may contain personally identifying information; users should treat these fields with care in public sharing. \n- **Data quality issues** – the prevalence of “Other” buckets masks a large number of unique but low‑frequency values, some labels are inconsistently cased (e.g., in `Location Type` and `Taxi Company Borough`), and `Complaint Type` includes test-like values such as ZTESTINT, so fine‑grained analysis may require further cleaning or enrichment. \n\n---\n\n*Attribution: Generated by qsvlite v11.0.2 describegpt\nCommand line: target/debug/qsvlite describegpt /tmp/NYC_311_SR_2010-2020-sample-1M.csv --all --fresh --format toon -o nyc311-describegpt.toon\nPrompt file: Default v7.0.0\nModel: openai/gpt-oss-20b\nLLM API URL: http://localhost:1234/v1\nLanguage: \nTimestamp: 2025-12-19T02:14:45.515058+00:00\n\nWARNING: Description generated by an LLM and may contain inaccuracies. Verify before using!\n*" reasoning: "We need to produce a dataset description using summary stats and frequency distribution but not output per field. Summarize overall dataset: 1 million records, fields etc. Provide notable characteristics bullet list. Use data insights from statistics: uniqueness ratio of key 1; Created Date mean around 2015-11-10; distribution skewness negative small -0.0857 indicates left skew? Actually negative indicates left tail (lower values). 
But maybe not needed.\n\nWe should note that many fields have high missing percentages e.g., Closed Date null 2.86%; Created Date has many unique values but some top frequencies show \"Other\" majority ~99.73%. Agency distribution: NYPD 26%, HPD 26%, etc. Complaint Type: \"Other\" 56% dominant; descriptor: \"Other\" 67%. Location type: top two categories each about 25% and 24%; others lower. Incident Zip: 82% other; incident address similar 81% other; street name 78% other. So dataset has many unknown or 'Other' values.\n\nAlso high cardinality fields like Unique Key unique 1 million.\n\nMissing values: various percentages: e.g., X/Y coordinates null ~8.5%. Latitude/Longitude null 25%.\n\nDuplicates? Unique key ensures no duplicates.\n\nPII? Contains addresses, zip codes, coordinates may be PII; also names of agencies but not personal info. So potential privacy concerns.\n\nOutliers: Created Date range 10 years; some extreme dates in Closed Date 1900-2100 range but only few outliers maybe null counts.\n\nAlso \"Other\" categories large may mask many unique values.\n\nNow produce description paragraph(s) 1-8 sentences.\n\nLet's write." token_usage: prompt: 15640 completion: 983 total: 16623 elapsed: 27545 Tags: response: "nyc_311, public_complaint, city_services, noise_complaint, heat_and_water, parking_violation, geospatial_reporting, open_data, complaint_resolution" reasoning: "We need to produce tags in TOON format: lowercase with underscores. Up to 10 unique tags based on dataset content. Use summary stats and freq distribution to guide.\n\nDataset is NYC 311 complaints data from 2010-2020. Key aspects: city, borough, complaint types (noise, heat, parking), agency, location type residential etc., dates, status, resolution, open data channel types. 
So tags could be:\n\n- nyc_311_complaints\n- public_safety\n- noise_complaint\n- building_code\n- transportation_incidents\n- city_government_data\n- geospatial_data\n- service_request_tracking\n- citizen_reporting\n- open_data\n\nBut we need no more than 10 tags. Let's think which are most important: dataset is NYC 311. So 'nyc_311', 'public_complaint', 'city_services', 'noise', 'heat', 'parking', 'geolocation', 'open_data', 'agency_response', maybe 'complaint_resolution'.\n\nBut we should choose generic tags that help find similar datasets. Maybe:\n\n- nyc_311\n- public_complaints\n- city_government\n- noise_and_safety\n- building_inspections\n- transportation_incidents\n- geospatial_reporting\n- open_data_dataset\n- citizen_service_requests\n- complaint_resolution\n\nNeed to keep <=10.\n\nLet's pick 9 tags: 'nyc_311', 'public_complaint', 'city_services', 'noise_complaint', 'heat_and_water', 'parking_violation', 'geospatial_data', 'open_data', 'complaint_resolution'.\n\nBut ensure lower case underscore only. So:\n\n- nyc_311\n- public_complaint\n- city_services\n- noise_complaint\n- heat_and_water\n- parking_violation\n- geospatial_reporting\n- open_data\n- complaint_resolution\n\nThat's 9 tags.\n\nNow add Attribution placeholder \"{GENERATED_BY_SIGNATURE}\". Since we produce plain text, after tags. The instructions: \"Add an Attribution with the placeholder ... after the Tags.\" So likely list tags in markdown separated by commas or new lines? They didn't specify format but says 'after the Tags'. Could be a line like:\n\nnyc_311, public_complaint, city_services, noise_complaint, heat_and_water, parking_violation, geospatial_reporting, open_data, complaint_resolution\n\nThen next line: {GENERATED_BY_SIGNATURE}\n\nLet's output that." 
token_usage: prompt: 15616 completion: 557 total: 16173 elapsed: 20299 num_tags: 10 tag_vocab: null attribution: "Generated by qsvlite v11.0.2 describegpt\nCommand line: target/debug/qsvlite describegpt /tmp/NYC_311_SR_2010-2020-sample-1M.csv --all --fresh --format toon -o nyc311-describegpt.toon\nPrompt file: Default v7.0.0\nModel: openai/gpt-oss-20b\nLLM API URL: http://localhost:1234/v1\nLanguage: \nTimestamp: 2025-12-19T02:15:05.815978+00:00\n\nWARNING: Tags generated by an LLM and may contain inaccuracies. Verify before using!"