irc_disentangle - Issue with splitting data #6906
Comments
Thank you, I will try this out!
On Tue, Jun 11, 2024 at 3:55 AM Vincent Lau ***@***.***> wrote:
I added streaming=True after the dataset name, and it works. I hope it helps.
Also, if you install datasets==2.15.0, this bug does not happen.
I don't know why, but both approaches work.
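For anyone landing here, this is a minimal sketch of the streaming workaround described above. The helper names load_irc_streaming and example_at are hypothetical and only for illustration. Note that a streamed dataset is an iterable rather than an indexable table, so ds['train'][1050] has to be replaced by skipping forward to that position.

```python
from itertools import islice


def load_irc_streaming(split="train"):
    """Load one split lazily instead of building the local Arrow cache."""
    # Imported here so example_at() below works even without `datasets` installed.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that reads examples on demand.
    return load_dataset("irc_disentangle", split=split, streaming=True)


def example_at(stream, index):
    """Return the item at position `index` of any iterable, or None if it is shorter."""
    # islice skips the first `index` items and yields at most one more.
    return next(islice(stream, index, index + 1), None)
```

With this, example_at(load_irc_streaming("train"), 1050) would stand in for ds['train'][1050]. Each call walks the stream from the start, so for many lookups it is probably better to iterate once and collect what you need.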
I found that there is still a strange bug in v2.15.0 of datasets. It seems that the *.arrow file cannot be created; it may be related to an index of the subsets. I am still trying to debug it. In the meantime, one of the most efficient workarounds may be to use Google Colab to build this index in ~/huggingface/datasets, and then download the files to replace the local ones. It works!
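A related alternative to copying the cache directory by hand, sketched here as an assumption and not as what the comment above actually did: export the dataset on the machine where loading works (for example Colab) with save_to_disk, copy the resulting folder over, and read it locally with load_from_disk. The function names and the folder name irc_disentangle_export are hypothetical.

```python
from pathlib import Path


def export_dataset(target_dir="irc_disentangle_export"):
    """Run on the machine where loading works: download, then save as Arrow files."""
    from datasets import load_dataset

    ds = load_dataset("irc_disentangle")
    ds.save_to_disk(target_dir)  # writes one sub-folder per split
    return Path(target_dir)


def import_dataset(source_dir="irc_disentangle_export"):
    """Run locally after copying the exported folder over."""
    source = Path(source_dir)
    # Fail early with a clear message if the folder was not copied yet.
    if not source.is_dir():
        raise FileNotFoundError(f"No exported dataset found at {source}")

    from datasets import load_from_disk

    return load_from_disk(str(source))
```

The folder written by save_to_disk does not depend on the loading script, so reading it back locally should avoid the step that fails in this issue.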
Yeah, I did try what you suggested and it didn't work. I was able to get a
local copy from someone who had accessed the dataset in the past. Let me know
when you end up fixing this bug.
On Tue, Jun 11, 2024 at 10:33 PM Vincent Lau ***@***.***> wrote the comment shown in full above.
Describe the bug
I am trying to load your dataset from Python using datasets.load_dataset("irc_disentangle"), and I get this error message:
ValueError: Instruction "train" corresponds to no data!
Steps to reproduce the bug
import datasets
ds = datasets.load_dataset('irc_disentangle')
ds
Expected behavior
The data should load into ds and be accessible like this:
ds['train'][1050], ds['train'][1055]
Environment info
I tried Python 3.12 and 3.10.